Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Tathagata Das
> *Sent:* Thursday, October 27, 2016 10:17 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Watermarking in Structured Streaming to drop late data
>
> Hi all
>
> I would highly recommend to all users-dev

RE: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread assaf.mendelson
papageorgopoylos [via Apache Spark Developers List]
Sent: Thursday, October 27, 2016 10:17 AM
To: Mendelson, Assaf
Subject: Re: Watermarking in Structured Streaming to drop late data

Hi all

I would highly recommend to all users-devs interested

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Tathagata Das
>> To enable the user to specify details like the lateness threshold, we are
>> considering adding a new method to Dataset. We would like to get more
>> feedback on this API. Here is the design doc:
>>
>> https://docs.google.com/document/d/1z-Pazs5v4rA31azvmYhu4I5x
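The snippet does not show the proposed method's signature (it later shipped in Spark 2.1 as `Dataset.withWatermark(eventTime, delayThreshold)`). To illustrate what a lateness threshold controls, here is a simplified, Spark-free Python model: the watermark trails the largest event time seen so far by the threshold, and records older than the watermark are treated as too late and dropped. The class `LatenessFilter`, the per-batch update rule, and the integer-second timestamps are assumptions made for this sketch; it is not Spark's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical record shape for this sketch: (event_time_seconds, payload).
Record = Tuple[int, str]


@dataclass
class LatenessFilter:
    """Simplified lateness-threshold model (illustration only, not Spark code).

    The watermark is the maximum event time observed so far minus
    `delay_threshold`; records with an event time below it are dropped.
    """

    delay_threshold: int
    max_event_time: Optional[int] = None

    @property
    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None  # no data seen yet, so nothing counts as late
        return self.max_event_time - self.delay_threshold

    def process_batch(self, batch: List[Record]) -> List[Record]:
        # Filter against the watermark computed from *previous* batches.
        wm = self.watermark
        kept = [r for r in batch if wm is None or r[0] >= wm]
        # Then advance the maximum event time (and thus the watermark).
        for event_time, _ in batch:
            if self.max_event_time is None or event_time > self.max_event_time:
                self.max_event_time = event_time
        return kept
```

With a 600-second threshold, after a batch whose latest event is at t=1300 the watermark sits at 700, so a later record stamped 650 is discarded while one stamped 900 is kept.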

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Ofir Manor
s the entire
> aggregation solve this?
>
> Am I missing something here?
>
> *From:* Michael Armbrust [via Apache Spark Developers List]
> *Sent:* Thursday, October 27, 2016 3

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread kostas papageorgopoylos
this?
>
> Am I missing something here?
>
> *From:* Michael Armbrust [via Apache Spark Developers List]
> *Sent:* Thursday, October 27, 2016 3:04 AM
> *To:* Mendelson, Assaf

RE: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread assaf.mendelson
Apache Spark Developers List
Sent: Thursday, October 27, 2016 3:04 AM
To: Mendelson, Assaf
Subject: Re: Watermarking in Structured Streaming to drop late data

And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124

On Wed, Oct 26, 2016 at 4

Re: Watermarking in Structured Streaming to drop late data

2016-10-26 Thread Michael Armbrust
And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124

On Wed, Oct 26, 2016 at 4:56 PM, Tathagata Das wrote:
> Hey all,
>
> We are planning to implement watermarking in Structured Streaming that would
> allow us to handle late, out-of-order data better. Specifically, when

Watermarking in Structured Streaming to drop late data

2016-10-26 Thread Tathagata Das
Hey all,

We are planning to implement watermarking in Structured Streaming that would allow us to handle late, out-of-order data better. Specifically, when we are aggregating over windows on event-time, we can currently end up keeping an unbounded amount of data as state. We want to define watermarks on the
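The problem stated here is that per-window aggregation state can grow without bound. A minimal, Spark-free Python sketch of how a watermark can bound it, under these assumptions: tumbling windows, integer-second event times, and a watermark defined as the maximum event time seen minus a delay threshold. The class `WindowedCounter` and its parameters are hypothetical; the rule Spark actually adopted is described in the design doc and SPARK-18124, not reproduced here. Once a window's end falls at or before the watermark, its count is emitted as final and removed from state, and any later record for that window is dropped.

```python
from typing import Dict, List, Optional, Tuple


class WindowedCounter:
    """Toy event-time tumbling-window count with watermark-bounded state.

    Simplified model for illustration only, not Spark's implementation.
    """

    def __init__(self, window_size: int, delay_threshold: int) -> None:
        self.window_size = window_size
        self.delay_threshold = delay_threshold
        self.max_event_time: Optional[int] = None
        self.state: Dict[int, int] = {}  # window start -> running count

    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay_threshold

    def process_batch(self, event_times: List[int]) -> List[Tuple[int, int]]:
        """Aggregate one batch; return (window_start, count) for finalized windows."""
        wm = self.watermark()
        for t in event_times:
            start = t - t % self.window_size
            # Drop records whose window already closed under the previous watermark.
            if wm is not None and start + self.window_size <= wm:
                continue
            self.state[start] = self.state.get(start, 0) + 1
            if self.max_event_time is None or t > self.max_event_time:
                self.max_event_time = t
        # Advance the watermark, then evict every window ending at or before it.
        wm = self.watermark()
        finalized: List[Tuple[int, int]] = []
        if wm is not None:
            for start in sorted(self.state):  # sorted() copies keys, so popping is safe
                if start + self.window_size <= wm:
                    finalized.append((start, self.state.pop(start)))
        return finalized
```

With `window_size=10` and `delay_threshold=5`, state only ever holds windows that could still receive on-time data, no matter how long the stream runs; a record stamped 2 that arrives after the watermark has passed 10 is simply discarded.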