Re: DataSourceV2 hangouts sync

2018-10-25 Thread Saikat Kanjilal
Ditto, I'd also like to join. I'm in Seattle, and afternoons generally work better for me. On Oct 25, 2018, at 5:02 PM, Wenchen Fan wrote: Big +1 on this! I live in UTC+8 and I'm available from 8 am, which is 5 pm in the bay area. Hopefully we

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
I didn't know I lived in the same time zone as you, Wenchen :D. Monday or Wednesday at 5PM PDT sounds good to me too, FWIW. On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote: > Good point. How about Monday or Wednesday at 5PM PDT then? > Everyone, please reply to me (no need to spam the list) with

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Good point. How about Monday or Wednesday at 5PM PDT then? Everyone, please reply to me (no need to spam the list) with which option works for you and I'll send an invite for the one with the most votes. On Thu, Oct 25, 2018 at 5:14 PM Wenchen Fan wrote: > Friday at the bay area is Saturday at

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Wenchen Fan
Friday in the bay area is Saturday on my side, so it would be great if we could pick a day from Monday to Thursday. On Fri, Oct 26, 2018 at 8:08 AM Ryan Blue wrote: > Since not many people have replied with a time window, how about we aim for 5PM PDT? That should work for Wenchen and most people

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Since not many people have replied with a time window, how about we aim for 5PM PDT? That should work for Wenchen and most people here in the bay area. If that makes it so some people can't attend, we can do the next one earlier for people in Europe. If we go with 5PM PDT, then what day works

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Wenchen Fan
Big +1 on this! I live in UTC+8 and I'm available from 8 am, which is 5 pm in the bay area. Hopefully we can coordinate a time that fits everyone. Thanks, Wenchen On Fri, Oct 26, 2018 at 7:21 AM Dongjoon Hyun wrote: > +1. Thank you for volunteering, Ryan! > Bests, Dongjoon. > On Thu,
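The time-zone arithmetic above can be checked with a short Python sketch. It assumes fixed offsets (PDT = UTC-7, which applied in the bay area in late October 2018) instead of a time-zone database, and the helper name `to_utc8` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption: daylight saving time in effect, so PDT = UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")
UTC_PLUS_8 = timezone(timedelta(hours=8), "UTC+8")


def to_utc8(dt_pdt: datetime) -> datetime:
    """Convert an aware PDT datetime to the UTC+8 wall clock."""
    return dt_pdt.astimezone(UTC_PLUS_8)


# Monday, Oct 29, 2018 at 5 PM PDT, one of the proposed slots.
meeting = datetime(2018, 10, 29, 17, 0, tzinfo=PDT)
print(to_utc8(meeting).strftime("%A %H:%M"))  # Tuesday 08:00

# A Friday 5 PM PDT slot would fall on Saturday morning in UTC+8.
friday = datetime(2018, 10, 26, 17, 0, tzinfo=PDT)
print(to_utc8(friday).strftime("%A %H:%M"))  # Saturday 08:00
```

After US daylight saving time ended on Nov 4, 2018, the bay area moved to PST (UTC-8), so the same 5 PM slot lands at 9 AM in UTC+8.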

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Reynold Xin
I also think we should get this in: https://github.com/apache/spark/pull/22841 It deprecates a confusing and broken window function API, so we can remove it in 3.0 and redesign a better one. See https://issues.apache.org/jira/browse/SPARK-25841 for more information. On Thu, Oct 25, 2018 at

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
Yep, we're going to merge a change to move the k8s tests into a separate profile, and fix up the Scala 2.12 thing. While non-critical, those are pretty nice to have for 2.4. I think that's doable within the next 12 hours even. @skonto I think there's one last minor thing needed on this PR?

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
+1! On Fri, Oct 26, 2018 at 7:21 AM, Dongjoon Hyun wrote: > +1. Thank you for volunteering, Ryan! > Bests, Dongjoon. > On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote: >> +1 >> On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote: >>> +1 >>> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote:

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Wenchen Fan
Any updates on this topic? https://github.com/apache/spark/pull/22827 is merged and 2.4 is unblocked. I'll cut RC5 shortly after the weekend, and it would be great to include the change proposed here. Thanks, Wenchen On Fri, Oct 26, 2018 at 12:55 AM Stavros Kontopoulos

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Dongjoon Hyun
+1. Thank you for volunteering, Ryan! Bests, Dongjoon. On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote: > +1 > On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote: >> +1 >> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote: >>> Although I am not specifically involved in DSv2, I think having this

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Xiao Li
+1 On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote: > +1 > On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote: >> Although I am not specifically involved in DSv2, I think having this kind of meeting is definitely helpful to discuss, move certain effort forward and keep people on the same page.

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Reynold Xin
+1 On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote: > Although I am not specifically involved in DSv2, I think having this kind of meeting is definitely helpful to discuss, move certain effort forward and keep people on the same page. Glad to see this kind of working group happening.

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Li Jin
Although I am not specifically involved in DSv2, I think having this kind of meeting is definitely helpful to discuss, move certain effort forward and keep people on the same page. Glad to see this kind of working group happening. On Thu, Oct 25, 2018 at 5:58 PM John Zhuge wrote: > Great idea!

Re: DataSourceV2 hangouts sync

2018-10-25 Thread John Zhuge
Great idea! On Thu, Oct 25, 2018 at 1:10 PM Ryan Blue wrote: > Hi everyone, There's been some great discussion for DataSourceV2 in the last few months, but it has been difficult to resolve some of the discussions and I don't think that we have a very clear roadmap for getting the work

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Felix Cheung
Yes please! From: Ryan Blue Sent: Thursday, October 25, 2018 1:10 PM To: Spark Dev List Subject: DataSourceV2 hangouts sync Hi everyone, There's been some great discussion for DataSourceV2 in the last few months, but it has been difficult to resolve some of

DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Hi everyone, There's been some great discussion for DataSourceV2 in the last few months, but it has been difficult to resolve some of the discussions and I don't think that we have a very clear roadmap for getting the work done. To coordinate better as a community, I'd like to start a regular

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-10-25 Thread Reynold Xin
I have some pretty serious concerns over this proposal. I agree that there are many things that can be improved, but at the same time I also think the cost of introducing a new IR in the middle is extremely high. Having participated in designing some of the IRs in other systems, I've seen more

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-10-25 Thread Reynold Xin
I incorporated the feedback here and updated the SPIP page: https://github.com/apache/spark-website/pull/156 The new version is live now: https://spark.apache.org/improvement-proposals.html On Fri, Aug 31, 2018 at 4:35 PM Ryan Blue wrote: > +1 > I think this is a great suggestion. I agree

Re: What's a blocker?

2018-10-25 Thread Tom Graves
Ignoring everything else in this thread, to put a sharper point on one issue: in the PR, multiple people said it's not a blocker because it was also a bug/dropped feature in the previous release (note: one was phrased slightly differently, stating it was not a regression, which I read as not

Re: KryoSerializer Implementation - Not using KryoPool

2018-10-25 Thread Sean Owen
It's not so much the KryoSerializerInstance that's the problem, but that it will always make a new Kryo (although at most 1). You mean to supply it with a reference to a pool instead, shared across all KryoSerializerInstances? Plausible, yeah. See https://spark.apache.org/contributing.html for
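To make the shared-pool idea concrete, here is a language-neutral sketch in Python (not Spark's Scala code, and not Kryo's own pooling API). All class names are hypothetical: `ExpensiveSerializer` stands in for a Kryo instance and `SerializerInstance` for a short-lived KryoSerializerInstance. Even though a new wrapper is created per task and then dropped, only one expensive object is built when tasks run one after another.

```python
import threading
from collections import deque


class ExpensiveSerializer:
    """Hypothetical stand-in for a costly-to-construct object such as a Kryo instance."""

    created = 0  # number of constructions, to show the effect of pooling

    def __init__(self):
        ExpensiveSerializer.created += 1

    def dumps(self, value):
        return repr(value).encode()  # placeholder for real serialization


class SharedPool:
    """Minimal thread-safe object pool shared by every serializer instance."""

    def __init__(self, factory):
        self._factory = factory
        self._free = deque()
        self._lock = threading.Lock()

    def borrow(self):
        with self._lock:
            if self._free:
                return self._free.pop()
        return self._factory()  # pool empty: build a new object

    def release(self, obj):
        with self._lock:
            self._free.append(obj)


class SerializerInstance:
    """Short-lived, per-task wrapper that borrows from the shared pool
    instead of constructing its own serializer."""

    def __init__(self, pool):
        self._pool = pool

    def serialize(self, value):
        ser = self._pool.borrow()
        try:
            return ser.dumps(value)
        finally:
            self._pool.release(ser)


pool = SharedPool(ExpensiveSerializer)
for task_id in range(1000):
    # Each "task" creates an instance and lets it fall out of scope.
    SerializerInstance(pool).serialize(task_id)
print(ExpensiveSerializer.created)  # 1
```

With concurrent tasks the pool grows to at most the number of simultaneous borrowers; a real change might also need to bound the pool and make sure borrowed objects carry no leftover per-use state.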

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Stavros Kontopoulos
> I think it's worth getting in a change to just not enable this module, which ought to be entirely safe, and avoid two of the issues we identified. Besides disabling it, when someone wants to run the tests with 2.12, they should be able to do so. So propagating the Scala profile still makes

Re: KryoSerializer Implementation - Not using KryoPool

2018-10-25 Thread Patrick Brown
Based on my (limited) read-through of the code that uses this, it seems that a new KryoSerializerInstance is often created for whatever task and then falls out of scope instead of being reused. I did notice the comment about a pool size of 1; however, if the use is generally how I just

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
I think it's worth getting in a change to just not enable this module, which ought to be entirely safe, and avoid two of the issues we identified. That said, it didn't block RC4, so it need not block RC5. But it should happen today if we're doing it. On Thu, Oct 25, 2018 at 10:47 AM Xiao Li wrote:

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Xiao Li
Hopefully, this will not delay RC5. Since this is not a blocker ticket, RC5 will start once all the blocker tickets are resolved. Thanks, Xiao On Thu, Oct 25, 2018 at 8:44 AM, Sean Owen wrote: > Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :) > On Thu, Oct 25, 2018 at 10:41 AM

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :) On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos wrote: > I agree these tests should be manual for now but should be run somehow before a release to make sure things are working right? > For the other issue:

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Stavros Kontopoulos
I agree these tests should be manual for now but should be run somehow before a release to make sure things are working right? For the other issue: https://issues.apache.org/jira/browse/SPARK-25835 . On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com>

Re: What's a blocker?

2018-10-25 Thread Erik Erlandson
I'd like to expand a bit on the phrase "opportunity cost" to try and make it more concrete: delaying a release means that the community is *not* receiving various bug fixes (and features). Just as a particular example, the wait for 2.3.2 delayed a fix for the Py3.7 iterator breaking change that

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Stavros Kontopoulos
I will open a JIRA for the profile propagation issue and take a look at fixing it. Stavros On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson wrote: > I would be comfortable making the integration testing manual for now. A JIRA for ironing out how to make it reliable for automatic as a goal for

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-25 Thread Wenchen Fan
Personally I don't think it matters. Users can build arbitrary expressions/plans themselves with internal API, and we never guarantee the result. Removing these functions from the function registry is a small patch and easy to review, and to me it's better than a 1000+ LOC patch that removes the

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Erik Erlandson
I would be comfortable making the integration testing manual for now. A JIRA for ironing out how to make it reliable enough to run automatically, as a goal for 3.0, seems like a good idea. On Thu, Oct 25, 2018 at 8:11 AM Sean Owen wrote: > Forking this thread. > Because we'll have another RC, we could

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-25 Thread Dongjoon Hyun
Thank you for the decision, all. As of now, to unblock this, it seems that we are trying to remove them from the function registry: https://github.com/apache/spark/pull/22821 One problem here is that users can easily recover those functions, like this: scala>

Re: What's a blocker?

2018-10-25 Thread Sean Owen
What does "PMC members aren't saying it's a blocker for reasons other than the actual impact the JIRA has" mean that isn't already widely agreed? Likewise "Committers and PMC members should not be saying it's not a blocker because they personally or their company doesn't care about this feature or

Re: What's a blocker?

2018-10-25 Thread Tom Graves
So, to clarify a few things in case people didn't read the entire thread in the PR: the discussion is about the criteria for a blocker, and my real concern is what people are using as criteria for not marking a JIRA as a blocker. The only thing we have documented to mark a JIRA as a

[DISCUSS] Support decimals with negative scale in decimal operation

2018-10-25 Thread Marco Gaido
Hi all, a bit more than one month ago, I sent a proposal for properly handling decimals with negative scales in our operations. This is a long-standing problem in our codebase, as we derived our rules from Hive and SQLServer, where negative scales are forbidden, while in Spark they are not. The
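For readers unfamiliar with negative scales: under the convention used by Java's BigDecimal (value = unscaledValue × 10^(-scale)), a negative scale multiplies the unscaled integer by a positive power of ten, so 12 with scale -2 represents 1200. The Python sketch below uses the standard `decimal` module only to illustrate that convention; the helper `unscaled_and_scale` is hypothetical and says nothing about the operation rules proposed in this thread.

```python
from decimal import Decimal


def unscaled_and_scale(d: Decimal):
    """Split a finite Decimal into (unscaled integer, scale), using the
    BigDecimal convention value = unscaled * 10**(-scale)."""
    if not d.is_finite():
        raise ValueError("NaN and Infinity have no scale")
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(str(digit) for digit in digits))
    return (-unscaled if sign else unscaled), -exponent


print(unscaled_and_scale(Decimal("12.34")))   # (1234, 2): ordinary positive scale
print(unscaled_and_scale(Decimal("1.2E+3")))  # (12, -2): negative scale, value 1200
```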

Stream Stream joins with update and complete mode

2018-10-25 Thread sandeep_katta
According to the documentation, only append mode is supported (http://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html#stream-stream-joins): *As of Spark 2.3, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.* But as per the

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-10-25 Thread Kazuaki Ishizaki
Hi Xiao, Thank you very much for becoming a shepherd. If you feel the discussion has settled, we would appreciate it if you would start a vote. Regards, Kazuaki Ishizaki From: Xiao Li To: Kazuaki Ishizaki Cc: dev, Takeshi Yamamuro Date: 2018/10/22 16:31 Subject: Re: