I agree with you. I think that once we have sessionization, we could aim
for richer processing capabilities per session. As I imagine it, a session
is an ordered sequence of data to which we could apply computation (like
CEP).
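
To make that concrete, here is a rough sketch in plain Python (not Spark code; the `LoginEvent` type and the `failed_before_success` function are made-up names for illustration). It assumes all events for a user are already available, sorts them by event time, and treats a successful login as the end of a session:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LoginEvent(NamedTuple):
    user: str
    ts: int        # event time, e.g. epoch seconds (assumed representation)
    success: bool


def failed_before_success(events: Iterable[LoginEvent]) -> List[Tuple[str, int, int]]:
    """Return (user, success_ts, failed_count) for every successful login."""
    by_user: Dict[str, List[LoginEvent]] = defaultdict(list)
    for event in events:
        by_user[event.user].append(event)

    out: List[Tuple[str, int, int]] = []
    for user in sorted(by_user):
        failed = 0
        # A session is the ordered run of events up to a successful login.
        for event in sorted(by_user[user], key=lambda e: e.ts):
            if event.success:
                out.append((user, event.ts, failed))
                failed = 0  # the successful login closes the session
            else:
                failed += 1
    return out


events = [
    LoginEvent("alice", 3, True),
    LoginEvent("alice", 1, False),
    LoginEvent("bob", 5, True),
    LoginEvent("alice", 2, False),
    LoginEvent("alice", 4, False),  # belongs to a session that is still open
]
result = failed_before_success(events)
print(result)
```

In a streaming setting the sort is the hard part: a failed login that arrives late would change a count that has already been emitted.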


Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Thu, Nov 17, 2016 at 5:16 PM, assaf.mendelson <assaf.mendel...@rsa.com>
wrote:

> It is true that this is sessionizing, but I brought it up as an example of
> finding an ordered pattern in the data.
>
> In general, using a simple window (e.g. 24 hours) in structured streaming
> is explained under grouping by time and is very clear.
>
> What I was trying to figure out is how to handle streaming cases where you
> actually need some sorting to find patterns, especially when some of the
> data may arrive late.
>
> I was trying to figure out whether there is a plan to support this and, if
> so, what the performance implications would be.
>
> Assaf.
>
>
>
> *From:* Ofir Manor [via Apache Spark Developers List]
> *Sent:* Thursday, November 17, 2016 5:13 PM
> *To:* Mendelson, Assaf
> *Subject:* Re: structured streaming and window functions
>
>
>
> Assaf, I think what you are describing is actually sessionizing, by user,
> where a session is ended by a successful login event.
>
> On each session, you want to count the number of failed login events.
>
> If so, this is tracked by
> https://issues.apache.org/jira/browse/SPARK-10816 (work has not started yet).
>
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: [hidden email]
>
>
>
> On Thu, Nov 17, 2016 at 2:52 PM, assaf.mendelson <[hidden email]> wrote:
>
> Is there a plan to support SQL window functions?
>
> Here is an example use case: say we have login logs. For each user, we
> want to add to each successful login the number of failed logins. How
> would you do it with structured streaming?
>
> As this is currently not supported, is there a plan for how to support it
> in the future?
>
> Assaf.
>
>
>
> *From:* Herman van Hövell tot Westerflier-2 [via Apache Spark Developers
> List]
> *Sent:* Thursday, November 17, 2016 1:27 PM
> *To:* Mendelson, Assaf
> *Subject:* Re: structured streaming and window functions
>
>
>
> What kind of window functions are we talking about? Structured streaming
> only supports time window aggregates, not the more general SQL window
> function (sum(x) over (partition by ... order by ...)) aggregates.
>
>
>
> The basic idea is that you use incremental aggregation and store the
> aggregation buffer (not the end result) in a state store after each
> increment. When a new batch comes in, you perform aggregation on that
> batch, merge the result of that aggregation with the buffer in the state
> store, update the state store and return the new result.
>
>
>
> This is much harder than it sounds, because you need to maintain state in
> a fault-tolerant way and you need some eviction policy (watermarks, for
> instance) for aggregation buffers to prevent the state store from growing
> without bound.
>
>
>
> On Thu, Nov 17, 2016 at 12:19 AM, assaf.mendelson <[hidden email]> wrote:
>
> Hi,
>
> I have been trying to figure out how structured streaming handles window
> functions efficiently.
>
> The portion I understand is that whenever new data arrives, it is grouped
> by time and the aggregated data is added to the state.
>
> However, unlike operations such as sum, window functions need the original
> data, and their results can change when data arrives late.
>
> So if I understand correctly, this would mean that we would have to save
> the original data and rerun on it to calculate the window function every
> time new data arrives.
>
> Is this correct? Are there ways to work around this issue?
>
>
>
> Assaf.
>
>
> ------------------------------
>
> View this message in context: structured streaming and window functions
> <http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930.html>
> Sent from the Apache Spark Developers List mailing list archive
> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at
> Nabble.com.
>
>
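
For reference, the incremental aggregation described in the quoted thread (aggregate the new batch, merge it with the buffer in the state store, evict old buffers by watermark) could be sketched roughly as below. This is plain Python, not Spark's implementation; `WindowedSum`, `WINDOW` and `ALLOWED_LATENESS` are hypothetical names and values, and a dict stands in for the state store:

```python
from typing import Dict, List, Tuple

WINDOW = 10            # window length in event-time units (assumed value)
ALLOWED_LATENESS = 10  # how far the watermark trails the max event time (assumed)


class WindowedSum:
    """Toy incremental sum per time window, with watermark-based eviction."""

    def __init__(self) -> None:
        # Stand-in for the state store: window start -> aggregation buffer.
        self.state: Dict[int, int] = {}
        self.max_event_time = 0  # event times are assumed to be non-negative

    def process_batch(self, batch: List[Tuple[int, int]]) -> Dict[int, int]:
        watermark = self.max_event_time - ALLOWED_LATENESS

        # 1. Aggregate the new batch on its own.
        partial: Dict[int, int] = {}
        for ts, value in batch:
            start = ts - ts % WINDOW
            if start + WINDOW <= watermark:
                continue  # window already closed; the late record is dropped
            partial[start] = partial.get(start, 0) + value

        # 2. Merge the batch result into the buffers held in the state store.
        for start, subtotal in partial.items():
            self.state[start] = self.state.get(start, 0) + subtotal

        # 3. Advance the watermark and evict buffers that can no longer change.
        for ts, _ in batch:
            self.max_event_time = max(self.max_event_time, ts)
        watermark = self.max_event_time - ALLOWED_LATENESS
        for start in [s for s in self.state if s + WINDOW <= watermark]:
            del self.state[start]

        # 4. Return the current result (a copy, so callers cannot mutate state).
        return dict(self.state)


agg = WindowedSum()
r1 = agg.process_batch([(1, 5), (12, 1)])
r2 = agg.process_batch([(3, 2), (25, 4)])  # (3, 2) is late but still accepted
r3 = agg.process_batch([(4, 9)])           # too late: window [0, 10) was evicted
print(r1, r2, r3)
```

The sketch ignores fault tolerance entirely, which is a large part of what makes the real problem hard. It also only works because a sum can be merged from partial results; a general SQL window function would need the original rows.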
