Re: Plan on Structured Streaming in next major/minor release?

2018-11-04 Thread JackyLee
Can these things be added to this list? 1. [SPARK-24630] Support SQLStreaming in Spark: this patch defines the Table API for Structured Streaming. 2. [SPARK-25937] Support user-defined schema in Kafka Source & Sink: this patch makes it easier for users to work with Structured Streaming. 3. SS supports

Re: Plan on Structured Streaming in next major/minor release?

2018-11-02 Thread kant kodali
If I can add one thing to this list, I would say stateless aggregations using raw SQL. For example, as I read micro-batches from Kafka, I want to do, say, a count of that micro-batch and spit it out using raw SQL (no count aggregation across batches). On Tue, Oct 30, 2018 at 4:55 PM Jungtaek Lim
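To make the requested semantics concrete, here is a toy model in plain Python (not Spark code; the function names are hypothetical) contrasting a stateless per-micro-batch count with the running count that a stateful streaming aggregation reports. For reference, Spark 2.4 added `DataStreamWriter.foreachBatch`, which hands each micro-batch to user code as a static DataFrame, but that route is programmatic rather than pure SQL.

```python
from typing import Iterable, List


def per_batch_counts(batches: Iterable[List[str]]) -> List[int]:
    """Stateless: each micro-batch is counted on its own; nothing carries over."""
    return [len(batch) for batch in batches]


def running_counts(batches: Iterable[List[str]]) -> List[int]:
    """Stateful: the count accumulates across micro-batches, as a streaming
    aggregation backed by a state store would report it."""
    total = 0
    out = []
    for batch in batches:
        total += len(batch)
        out.append(total)
    return out


batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print(per_batch_counts(batches))  # [2, 1, 3]
print(running_counts(batches))    # [2, 3, 6]
```

The stateless variant needs no state store at all, which is why it is attractive for simple per-batch metrics.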

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
OK, thanks for clarifying. I guess it is one of the major features in the streaming area and would be nice to add, but I also agree it would require a huge investigation. On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote: > Agree. Just curious, could you explain what you mean by "negation"? >> Does it mean

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
> > Agree. Just curious, could you explain what you mean by "negation"? > Does it mean applying retraction on aggregated? > Yeah exactly. Our current streaming aggregation assumes that the input is in append-mode and multiple aggregations break this.
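Why multiple aggregations break the append-mode assumption can be sketched with a toy two-stage pipeline in plain Python (hypothetical names, not Spark's implementation): the first stage counts events per key, and the second counts how many keys currently have each count. When a key's count changes, the first stage must retract ("negate") its previously emitted row; a downstream stage that assumes append-only input overcounts.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy change record: (value, delta). delta = +1 adds a row, -1 retracts one.
Change = Tuple[int, int]


def first_stage(events: List[str], counts: Dict[str, int]) -> List[Change]:
    """Count events per key; whenever a key's count changes, emit a
    retraction of the old count followed by an insertion of the new one."""
    changes: List[Change] = []
    for key in events:
        old = counts.get(key, 0)
        new = old + 1
        counts[key] = new
        if old > 0:
            changes.append((old, -1))  # negate the previously emitted row
        changes.append((new, +1))
    return changes


def second_stage(changes: List[Change], histogram: Counter) -> None:
    """Count how many keys currently have each count, applying deltas."""
    for value, delta in changes:
        histogram[value] += delta
        if histogram[value] == 0:
            del histogram[value]


counts: Dict[str, int] = {}
changes = first_stage(["a", "b", "a"], counts)
histogram: Counter = Counter()
second_stage(changes, histogram)
print(dict(histogram))  # {1: 1, 2: 1}

# An append-only view that ignores retractions overcounts keys with count 1.
naive = Counter(value for value, delta in changes if delta > 0)
print(dict(naive))  # {1: 2, 2: 1}
```

Only two keys exist, yet the append-only view reports three, which is the kind of incorrect result that retraction support would prevent.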

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
Thanks, Michael, for explaining the activity on SS as well as giving your opinion on some items! Replying inline. On Wed, Oct 31, 2018 at 5:44 AM, Michael Armbrust wrote: > Thanks for bringing up some possible future directions for streaming. Here > are some thoughts: > - I personally view all of the activity

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Stavros Kontopoulos
@Michael any update about queryable state? Stavros On Tue, Oct 30, 2018 at 10:43 PM, Michael Armbrust wrote: > Thanks for bringing up some possible future directions for streaming. Here > are some thoughts: > - I personally view all of the activity on Spark SQL also as activity on >

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
Thanks for bringing up some possible future directions for streaming. Here are some thoughts: - I personally view all of the activity on Spark SQL also as activity on Structured Streaming. The great thing about building streaming on catalyst / tungsten is that continued improvement to these

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
Adding more: again, it doesn't mean they're feasible to do. Just a kind of brainstorming. * SPARK-20568: Delete files after processing in structured streaming * There hasn't been consensus regarding supporting this: there were voices for both YES and NO. * Support multiple levels of

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
Yeah, the main intention of this thread is to gauge interest in a possible feature list for Structured Streaming. From what I can see in the Spark community, most of the discussions as well as contributions are for SQL, and I'd like to see similar activity and effort on Structured Streaming.

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Stavros Kontopoulos
Hi Jungtaek, I tried to start the discussion on the dev list a long time ago. I enumerated some use cases, as Michael proposed here. The discussion didn't

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
Stavros, if my memory is right, you were trying to drive queryable state, right? Could you summarize the progress and the reason why it stopped? On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > That is a very interesting list thanks. I

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Stavros Kontopoulos
That is a very interesting list, thanks. I could create a design doc as a starting point for discussion if this is a feature we would like to have. Regards, Stavros On Sun, Oct 21, 2018 at 3:04 PM, JackyLee wrote: > Thanks for raising them. > > FYI, I believe these open issues could also be

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread JackyLee
Thanks for raising them. FYI, I believe these open issues could also be considered: https://issues.apache.org/jira/browse/SPARK-24630 A new ability to express Structured Streaming in pure SQL.

Re: Plan on Structured Streaming in next major/minor release?

2018-10-20 Thread kant kodali
+1 for raising all this. +1 for Queryable State (SPARK-16738 [3]) On Thu, Oct 18, 2018 at 9:59 PM Jungtaek Lim wrote: > Small correction: "timeout" in map/flatMapGroupsWithState would not work > like State TTL when event time and watermark are set. So timeout in >

Re: Plan on Structured Streaming in next major/minor release?

2018-10-18 Thread Jungtaek Lim
Small correction: "timeout" in map/flatMapGroupsWithState would not work like State TTL when event time and watermark are set. So timeout in map/flatMapGroupsWithState is there to guarantee removal of state when the state will no longer be used, similar to what we do with streaming aggregation,
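A toy model in plain Python can illustrate the point that, with event time and a watermark, state removal is driven by the watermark passing a per-key timeout timestamp rather than by wall-clock time since last access. All names here (`ToyStateStore`, `KeyState`, `process_batch`, `idle_ms`) are hypothetical. In Spark itself the timeout is set through `GroupState.setTimeoutTimestamp` with `GroupStateTimeout.EventTimeTimeout`, and the user function is invoked with `GroupState.hasTimedOut` true and is typically expected to call `GroupState.remove()`; this toy simply drops the state in one step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KeyState:
    count: int
    timeout_ts: int  # event-time timestamp after which the state may be dropped


@dataclass
class ToyStateStore:
    """Hypothetical model of per-key state with event-time timeouts."""
    idle_ms: int
    states: Dict[str, KeyState] = field(default_factory=dict)

    def process_batch(self, events: List[Tuple[str, int]], watermark: int) -> List[str]:
        # Update state for keys with new data and push their timeout forward
        # relative to the event time seen (not the processing time).
        for key, event_ts in events:
            st = self.states.setdefault(key, KeyState(count=0, timeout_ts=0))
            st.count += 1
            st.timeout_ts = max(st.timeout_ts, event_ts + self.idle_ms)
        # Keys whose timeout timestamp fell below the watermark are expired.
        expired = [k for k, st in self.states.items() if st.timeout_ts < watermark]
        for k in expired:
            del self.states[k]
        return expired


store = ToyStateStore(idle_ms=10)
print(store.process_batch([("a", 100), ("b", 105)], watermark=90))  # []
print(store.process_batch([("b", 120)], watermark=112))             # ['a']
print(sorted(store.states))                                         # ['b']
```

Key "a" is removed only because the watermark advanced past its timeout timestamp of 110; no amount of wall-clock waiting would remove it if the watermark stalled.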

Plan on Structured Streaming in next major/minor release?

2018-10-18 Thread Jungtaek Lim
Hi devs, while Spark 2.4.0 is still in the release-vote process, I'm seeing some non-SS pull requests being reviewed and merged into the master branch, so I guess discussing the next release is OK. It looks like there's a major TODO left on Structured Streaming: allowing stateful operation in