Re: Plan on Structured Streaming in next major/minor release?

Jungtaek Lim Tue, 30 Oct 2018 00:23:29 -0700

Adding more. Again, this doesn't mean they're feasible to do; it's just
brainstorming.


* SPARK-20568: Delete files after processing in structured streaming
  * There hasn't been consensus regarding supporting this: there were
voices for both YES and NO.
* Support multiple levels of aggregations in structured streaming
  * There are plenty of questions on Stack Overflow about this. While I don't
think it makes sense in structured streaming if it requires an additional
shuffle, there might be another case: group by keys, apply an aggregation, then
apply another aggregation on the aggregated result (the grouping keys don't change).
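
To make the second case concrete, here is a minimal sketch in plain Python. It is not Spark API: the function names (`first_level`, `second_level`, `run`) and the choice of "sum per micro-batch, then running max of those sums" are assumptions made purely for illustration. Both levels use the same key, so the second aggregation can be applied wherever the first one ran and needs no extra shuffle.

```python
from collections import defaultdict


def first_level(batch):
    """First aggregation: sum the values per key within one micro-batch."""
    sums = defaultdict(int)
    for key, value in batch:
        sums[key] += value
    return dict(sums)


def second_level(state, batch_sums):
    """Second aggregation on the aggregated result, keyed by the same key:
    keep the largest per-batch sum seen so far for each key."""
    for key, total in batch_sums.items():
        state[key] = max(state.get(key, total), total)
    return state


def run(batches):
    """Chain both levels over a sequence of micro-batches (hypothetical driver)."""
    state = {}
    for batch in batches:
        state = second_level(state, first_level(batch))
    return state


if __name__ == "__main__":
    batches = [
        [("a", 1), ("a", 2), ("b", 5)],
        [("a", 10), ("b", 1)],
    ]
    print(run(batches))
```

Because the grouping key is unchanged between the two levels, data partitioned by key for the first aggregation is already partitioned correctly for the second.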

On Mon, Oct 22, 2018 at 12:25 PM, Jungtaek Lim <[email protected]> wrote:

> Yeah, the main intention of this thread is to collect interest in a possible
> feature list for structured streaming. From what I can see in the Spark
> community, most of the discussions as well as contributions are for SQL,
> and I wish to see similar activity and effort on structured streaming.
> (Unfortunately, less effort goes into reviewing others' work, design docs as
> well as pull requests; most of the effort seems to be spent on people's own
> work.)
>
> I respect the role of PMC members, so the final decision would be up to PMC
> members, but contributors as well as end users could show their interest and
> discuss requirements on an SPIP, which could be good
> background to persuade PMC members.
>
> Before going deep, I guess we could use this thread to discuss
> possible use cases, and if we would like to move forward with an
> individual item, we could initiate (or resurrect) its discussion thread.
>
> For queryable state, at least there seems to be no workaround in Spark to
> provide something similar, especially as state gets bigger. I may have some
> concerns about the details, but I'll add my thoughts on the discussion thread.
>
> - Jungtaek Lim (HeartSaVioR)
>
> On Mon, Oct 22, 2018 at 1:15 AM, Stavros Kontopoulos <
> [email protected]> wrote:
>
>> Hi Jungtaek,
>>
>> I tried to start the discussion on the dev list a long time ago.
>> I enumerated some use cases as Michael proposed here
>> <http://mail-archives.apache.org/mod_mbox/spark-dev/201712.mbox/%3CCACTd3c_snT=y4r9vod+ebty1fdgtqsxzgjgubox-k8araur...@mail.gmail.com%3E>.
>> The discussion didn't go further.
>>
>> If people find it useful we should start discussing it in detail again.
>>
>> Stavros
>>
>> On Sun, Oct 21, 2018 at 4:54 PM, Jungtaek Lim <[email protected]> wrote:
>>
>>> Stavros, if my memory is right, you were trying to drive queryable
>>> state, right?
>>>
>>> Could you summarize the progress and the reason why the progress
>>> stopped?
>>>
>>> On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos <
>>> [email protected]> wrote:
>>>
>>>> That is a very interesting list, thanks. I could create a design doc as
>>>> a starting point for discussion if this is a feature we would like to
>>>> have.
>>>>
>>>> Regards,
>>>> Stavros
>>>>
>>>> On Sun, Oct 21, 2018 at 3:04 PM, JackyLee <[email protected]> wrote:
>>>>
>>>>> Thanks for raising them.
>>>>>
>>>>> FYI, I believe this open issue could also be considered:
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-24630
>>>>> <https://issues.apache.org/jira/browse/SPARK-24630>
>>>>>
>>>>> A new ability to express Structured Streaming in pure SQL.
>>>>>
>>
>>
