Hi all

I would highly recommend that all users and developers interested in the
design suggestions / discussions for watermarking in the Structured
Streaming Spark API take a look at the following links along with the
design document. They help explain the notions of watermarks, out-of-order
data, and possible use cases.

https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

Kind Regards


2016-10-27 9:46 GMT+03:00 assaf.mendelson <assaf.mendel...@rsa.com>:

> Hi,
>
> Should comments come here or in the JIRA?
>
> Anyway, I am a little confused about the need to expose this as an API to
> begin with.
>
> Let’s consider for a second the most basic behavior: We have some input
> stream and we want to aggregate a sum over a time window.
>
> This means that the window we should be looking at would span from the
> maximum event time across our data back by the window interval. Everything
> older can be dropped.
>
> When new data arrives, the maximum time cannot move back, so we generally
> drop everything too old.
>
> This basically means we save only the latest time window.
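The model described in the paragraphs above can be sketched as a small simulation (illustrative plain Python, not Spark code; the function name and tuple layout are assumptions for the example): keep a running maximum event time and drop every record older than the latest window's start.

```python
# Sketch of the "keep only the latest window" model: track the maximum event
# time seen so far and drop every record older than max_time - window_ms.
def latest_window_sum(events, window_ms):
    """events: iterable of (event_time_ms, value), possibly out of order."""
    max_time = float("-inf")
    kept = []
    for t, v in events:
        max_time = max(max_time, t)   # the maximum time never moves back
        kept.append((t, v))
        # everything older than the latest window's start can be dropped
        kept = [(kt, kv) for kt, kv in kept if kt > max_time - window_ms]
    return sum(v for _, v in kept)
```

With a 2-second window, an event at t=5000 pushes the window start to 3000, so earlier events at t=1000 and t=500 are dropped and only the latest window's values remain in state.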
>
> This simpler model would only break if we have a secondary aggregation
> which needs the results of multiple windows.
>
> Is this the use case we are trying to solve?
>
> If so, wouldn’t just calculating over the larger time window across the
> entire aggregation solve this?
>
> Am I missing something here?
>
>
>
> *From:* Michael Armbrust [via Apache Spark Developers List]
> *Sent:* Thursday, October 27, 2016 3:04 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Watermarking in Structured Streaming to drop late data
>
>
>
> And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124
>
>
>
> On Wed, Oct 26, 2016 at 4:56 PM, Tathagata Das wrote:
>
> Hey all,
>
>
>
> We are planning to implement watermarking in Structured Streaming, which
> would allow us to handle late, out-of-order data better. Specifically, when
> we are aggregating over windows on event time, we can currently end up
> keeping an unbounded amount of data as state. We want to define watermarks
> on the event time in order to mark and drop data that is "too late" and
> accordingly age out old aggregates that will no longer be updated.
>
>
>
> To enable the user to specify details like the lateness threshold, we are
> considering adding a new method to Dataset. We would like to get more
> feedback on this API. Here is the design doc:
>
>
>
> https://docs.google.com/document/d/1z-Pazs5v4rA31azvmYhu4I5xwqaNQl6ZLIS03xhkfCQ/
>
>
>
> Please comment on the design and proposed APIs.
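The watermark semantics described above can be sketched in plain Python (illustrative only, not the proposed Spark API; function and variable names are assumptions): the watermark is the maximum event time seen so far minus a lateness threshold, records older than the watermark are dropped, and window aggregates that end at or before the watermark are finalized and removed from state, keeping state bounded.

```python
# Sketch of watermark-based state aging: watermark = max event time - late_ms.
def run_with_watermark(events, window_ms, late_ms):
    state = {}        # window start -> running sum (bounded by the watermark)
    finalized = {}    # windows aged out of state; will never be updated again
    max_time = float("-inf")
    for t, v in events:
        max_time = max(max_time, t)
        watermark = max_time - late_ms
        if t >= watermark:                     # drop data that is "too late"
            w = (t // window_ms) * window_ms   # tumbling-window start
            state[w] = state.get(w, 0) + v
        # age out aggregates that can no longer receive non-late data
        for w in [w for w in state if w + window_ms <= watermark]:
            finalized[w] = state.pop(w)
    return state, finalized
```

With a 1-second window and a 500 ms threshold, an event at t=2600 moves the watermark to 2100, so the windows starting at 0 and 1000 are finalized, and a late record at t=50 arriving afterwards is dropped rather than held in state.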
>
>
>
> Thank you very much!
>
>
>
> TD
>
>
>
>
>
> ------------------------------
> View this message in context: RE: Watermarking in Structured Streaming to
> drop late data
> <http://apache-spark-developers-list.1001551.n3.nabble.com/Watermarking-in-Structured-Streaming-to-drop-late-data-tp19589p19591.html>
> Sent from the Apache Spark Developers List mailing list archive
> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at
> Nabble.com.
>
