Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-08 Thread Wenchen Fan
> Would you mind if I ask the condition of being public API? The module doesn't matter, but the package matters. We have many public APIs in the catalyst module as well. (e.g. DataType) There are 3 packages in Spark SQL that are meant to be private: 1. org.apache.spark.sql.catalyst 2.

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Wenchen Fan
Regarding DS v2, I'd like to remove SPARK-26785 (data source v2 API refactor: streaming write) and SPARK-26956 (remove streaming output mode from data source v2 APIs), and put the umbrella ticket

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Dongjoon Hyun
Thank you for the preparation of 3.0-preview, Xingbo! Bests, Dongjoon. On Tue, Oct 8, 2019 at 2:32 PM Xingbo Jiang wrote: >> What's the process to propose a feature to be included in the final Spark 3.0 release? > I don't know whether there exists any specific process here, normally

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-08 Thread Shane Knapp
jenkins is going down now. On Tue, Oct 8, 2019 at 4:21 PM Shane Knapp wrote: > quick update: > we are definitely going to have our power shut off starting early tomorrow morning (by 4am PDT oct 9th), and expect at least 48 hours before it is restored. > i will be shutting jenkins down

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-08 Thread Sean Owen
I'm generally all for closing pretty old PRs. They can be reopened easily. Closing a PR (a particular proposal for how to resolve an issue) is less drastic than closing a JIRA (a description of an issue). Closing them just delivers the reality, that nobody is going to otherwise revisit it, and can

Auto-closing PRs when there are no feedback or response from its author

2019-10-08 Thread Hyukjin Kwon
Hi all, I think we talked about this before. Roughly speaking, there are two cases of PRs: 1. PRs waiting for review and 2. PRs waiting for the author's reaction. For the first case, we might not have to take any action other than waiting for review. However, we can ping and/or take an action for the second

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-08 Thread Shane Knapp
quick update: we are definitely going to have our power shut off starting early tomorrow morning (by 4am PDT oct 9th), and expect at least 48 hours before it is restored. i will be shutting jenkins down some time this evening, and will update everyone here when i get more information. full

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
> What's the process to propose a feature to be included in the final Spark 3.0 release? I don't know whether there exists any specific process here, normally you just merge the feature into Spark master before release code freeze, and then the feature would probably be included in the

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Li Jin
Thanks for the summary! I have a question that is semi-related: What's the process to propose a feature to be included in the final Spark 3.0 release? In particular, I am interested in https://issues.apache.org/jira/browse/SPARK-28006. I am happy to do the work, so I want to make sure I don't miss

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
Hi all, Thanks for all the feedback. Here is the updated feature list: SPARK-11215 Multiple columns support added to various Transformers: StringIndexer; SPARK-11150 Implement Dynamic

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-08 Thread Xiao Li
Hi, Shane, Thank you for letting us know in advance! Xiao On Tue, Oct 8, 2019 at 12:50 PM Shane Knapp wrote: > here in the lovely bay area, we are currently experiencing some absolutely lovely weather: temps around 20C, light winds, and not a drop of moisture anywhere. > this means

[build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-08 Thread Shane Knapp
here in the lovely bay area, we are currently experiencing some absolutely lovely weather: temps around 20C, light winds, and not a drop of moisture anywhere. this means that wildfire season is here, and our utilities company (PG&E) is very concerned about fires like last year's Camp Fire

Re: Exposing functions to pyspark

2019-10-08 Thread Andrew Melo
Hello again, Is it possible to grab a handle to the underlying DataSourceReader backing a DataFrame? I see that there's no nice way to add extra methods to Dataset, so being able to grab the DataSource backing the dataframe would be a good escape hatch. Cheers Andrew On Mon, Sep 30, 2019 at

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-08 Thread Russell Spitzer
+1 (non-binding). Sounds good to me. On Mon, Oct 7, 2019 at 11:58 PM Wenchen Fan wrote: > +1 > I think this is the most reasonable default behavior among the three. > On Mon, Oct 7, 2019 at 6:06 PM Alessandro Solimando <alessandro.solima...@gmail.com> wrote: >> +1 (non-binding) >> I

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-08 Thread Jacek Laskowski
Hi, I haven't spent much time on it, but the following DEBUG message from WatermarkTracker sparked my interest :) I ran a streaming aggregation in Append mode and got the messages: 19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0: EventTimeStats(15000,1000,8000.0,2)