On Thu, Oct 20, 2016 at 12:07 AM Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> At the AMPLab we've been working on a research project that looks at
> just the scheduling latencies and on techniques to get lower
> scheduling latency. It moves away from the micro-batch model, but
> reuses the fault tolerance etc. in Spark. However we haven't yet
> figured out all the parts in integrating this with the rest of
> structured streaming. I'll try to post a design doc / SIP about this
> soon.
>
> On a related note - are there other problems users face with
> micro-batch other than latency?
>
I think the fact that micro-batches also serve as the output trigger is a
problem, but Structured Streaming seems to resolve this now.

>
> Thanks
> Shivaram
>
> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust
> <mich...@databricks.com> wrote:
> > I know people are seriously thinking about latency.  So far that has not
> > been the limiting factor for the users I've been working with.
> >
> > On Wed, Oct 19, 2016 at 1:11 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Is anyone seriously thinking about alternatives to microbatches?
> >>
> >> On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust
> >> <mich...@databricks.com> wrote:
> >> > Anything that is actively being designed should be in JIRA, and it seems
> >> > like you found most of it.  In general, release windows can be found on
> >> > the wiki.
> >> >
> >> > 2.1 has a lot of stability fixes as well as the Kafka support you
> >> > mentioned. It may also include some of the following.
> >> >
> >> > The items I'd like to start thinking about next are:
> >> >  - Evicting state from the store based on event time watermarks
> >> >  - Sessionization (grouping together related events by key / eventTime)
> >> >  - Improvements to the query planner (remove some of the restrictions on
> >> >    what queries can be run).
> >> >
> >> > This is roughly in order based on what I've been hearing users hit the
> >> > most. Would love more feedback on what is blocking real use cases.
> >> >
> >> > On Tue, Oct 18, 2016 at 1:51 AM, Ofir Manor <ofir.ma...@equalum.io> wrote:
> >> >>
> >> >> Hi,
> >> >> I hope this is the right forum.
> >> >> I am looking for some information on what to expect from
> >> >> Structured Streaming in its next releases, to help me choose when / where
> >> >> to start using it more seriously (or where to invest in workarounds and
> >> >> where to wait). I couldn't find a good place where such planning is
> >> >> discussed for 2.1 (like, for example, ML and SPARK-15581).
> >> >> I'm aware of the documented 2.0 limits
> >> >> (http://spark.apache.org/docs/2.0.1/structured-streaming-programming-guide.html#unsupported-operations),
> >> >> like no support for multiple aggregation levels, joins strictly against
> >> >> a static dataset (no SCD or stream-stream), limited sources / sinks
> >> >> (like no sink for interactive queries), etc.
> >> >> I'm also aware of some changes that have landed in master, like the new
> >> >> Kafka 0.10 source (and its ongoing improvements) in SPARK-15406, the
> >> >> metrics in SPARK-17731, and some improvements for the file source.
> >> >> If I remember correctly, the discussion on Spark release cadence
> >> >> concluded with a preference for a four-month cycle, with a likely code
> >> >> freeze pretty soon (end of October). So I believe the scope for 2.1
> >> >> should likely be quite clear to some, and that 2.2 planning should
> >> >> likely be starting about now.
> >> >> Any visibility / sharing will be highly appreciated!
> >> >> thanks in advance,
> >> >>
> >> >> Ofir Manor
> >> >>
> >> >> Co-Founder & CTO | Equalum
> >> >>
> >> >> Mobile: +972-54-7801286 <054-780-1286> | Email: ofir.ma...@equalum.io
> >> >
> >> >
> >
> >
>
