Re: Asking for reviewing PRs regarding structured streaming

2018-07-26 Thread Jungtaek Lim
I'd like to bump this again, since only one of the 6 pull requests has been merged (5 remaining), and the others have not received a substantive (non-code-style) review from committers. https://github.com/apache/spark/pulls/HeartSaVioR All of the pull requests are related to Structured Streaming, and most of them have already been reviewed by

Re: Asking for reviewing PRs regarding structured streaming

2018-07-12 Thread Jungtaek Lim
I recently added more test results to SPARK-24763 [1], which show that the proposal reduces state size according to the ratio of key size to value size, with no performance hit and sometimes even a slight boost. Please refer to the latest comment in the JIRA issue [2] to see the numbers from perf.

Re: Asking for reviewing PRs regarding structured streaming

2018-07-09 Thread Jungtaek Lim
Now I'm adding one more issue (SPARK-24763 [1]), which proposes a new option to enable optimization of state size in streaming aggregation without hurting performance. The idea is to remove the key fields' data from the value, since that data is duplicated between the key and the value in the state row. This requires

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
Ted Yu suggested posting the improved numbers to this thread, and I think it's a good idea, so I'm posting them here as well. I also think explaining the rationale behind my issues would help in understanding why I'm submitting a couple of patches, so I'll explain that first. (Sorry for posting a wall of text.) tl;dr.

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
Bump. I have been having a hard time making additional PRs, since some of them rely on non-merged PRs, so I'm spending additional time decoupling them where possible. https://github.com/apache/spark/pulls/HeartSaVioR There are 5 pending PRs so far, and I may add more sooner or later. Thanks,

Re: Asking for reviewing PRs regarding structured streaming

2018-06-30 Thread Jungtaek Lim
Kind reminder, since around 2 weeks have passed. I've added more PRs during these 2 weeks and am even planning to add more. On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote: > Hi Spark devs, > > I have a couple of pull requests for structured streaming which are getting > older and fading out from earlier pages in

Asking for reviewing PRs regarding structured streaming

2018-06-19 Thread Jungtaek Lim
Hi Spark devs, I have a couple of pull requests for structured streaming which are getting older and fading out from the earlier pages of the PR list. https://github.com/apache/spark/pull/21469 https://github.com/apache/spark/pull/21357 https://github.com/apache/spark/pull/21222 Two of them are in a