Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Hyukjin Kwon
-1 Seems there's one critical correctness issue specifically in branch-2.4 ... Please take a look at https://github.com/apache/spark/pull/25593 On Tue, Aug 27, 2019 at 2:38 PM, Takeshi Yamamuro wrote: > Hi, Dongjoon > > I checked that all the tests passed on my Mac/x86_64 env with: > -Pyarn

Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Takeshi Yamamuro
Hi, Dongjoon I checked that all the tests passed on my Mac/x86_64 env with: -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pkubernetes -Pkubernetes-integration-tests -Psparkr maropu@~/spark-2.4.4-rc2:$java -version java version "1.8.0_181" Java(TM) SE Runtime Environment (build
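For readers trying to reproduce this verification, the profile list above assembles into a single Maven invocation along these lines (a sketch only; the exact goal and `build/mvn` path are assumptions, and it should be run from the extracted spark-2.4.4-rc2 directory):

```shell
# Profiles quoted in the message above, collected into one variable.
PROFILES="-Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos \
  -Pkubernetes -Pkubernetes-integration-tests -Psparkr"

# Echoed here so the sketch is safe to run; drop the echo to actually
# launch the (multi-hour) build and test run.
echo ./build/mvn $PROFILES clean test
```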

Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Sean Owen
+1 as per response to RC1. The existing issues identified there seem to have been fixed. On Mon, Aug 26, 2019 at 2:45 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 29th 1AM PST and passes if a

Re: Apache Spark git repo moved to gitbox.apache.org

2019-08-26 Thread Wenchen Fan
yea I think we should, but no need to worry too much about it because gitbox still works in the release scripts. On Tue, Aug 27, 2019 at 3:23 AM Shane Knapp wrote: > revisiting this old thread... > > i noticed from the committers' page on the spark site that the 'apache' > remote should be

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-26 Thread Kazuaki Ishizaki
Thank you for pointing out the problem. The displayed text and the hyperlink point to different URLs. Could you please access https://repository.apache.org/content/repositories/orgapachespark-1331/ (the URL as written)? Sorry for the inconvenience. Kazuaki Ishizaki, From: Takeshi Yamamuro To:

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-26 Thread Takeshi Yamamuro
Hi, Thanks for managing the release! It seems the staging repository has not been exposed yet? https://repository.apache.org/content/repositories/orgapachespark-1328/ On Tue, Aug 27, 2019 at 5:28 AM Kazuaki Ishizaki wrote: > Please vote on releasing the following candidate as Apache Spark

Re: DISCUSS [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-08-26 Thread Tom Graves
Bumping this up. I'm guessing people haven't had time to review; it would be great to get feedback on this. Thanks, Tom On Tuesday, August 6, 2019, 2:27:49 PM CDT, Tom Graves wrote: Hey everyone, I have been working on coming up with a proposal for supporting stage level resource

[VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-26 Thread Kazuaki Ishizaki
Please vote on releasing the following candidate as Apache Spark version 2.3.4. The vote is open until August 29th 2PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.4 [ ] -1 Do not release this package because

Re: Apache Spark git repo moved to gitbox.apache.org

2019-08-26 Thread Shane Knapp
revisiting this old thread... i noticed from the committers' page on the spark site that the 'apache' remote should be 'github.com', and not 'gitbox' as instructed here. so, i did a quick check of the spark repo and found we're still referencing gitbox in a few places: ➜ spark
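The check and cleanup described here can be sketched as follows (a hedged illustration, not the actual commands run; the demo works on a throwaway file rather than the real Spark checkout, and the exact files referencing gitbox are not shown in the truncated message):

```shell
# Work in a scratch directory with a sample remote URL of the old form.
cd "$(mktemp -d)"
printf 'url = https://gitbox.apache.org/repos/asf/spark.git\n' > config

# Step 1: list files still referencing gitbox (as the grep in the thread did).
grep -rl 'gitbox.apache.org' .

# Step 2: rewrite the remote so 'apache' points at github.com instead.
sed -i.bak 's#gitbox.apache.org/repos/asf#github.com/apache#' config
cat config   # now reads: url = https://github.com/apache/spark.git
```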

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Sean Owen
Bringing a side conversation back to main: good news / bad news. We most definitely want one build to run on JDK 8 and JDK 11. That is actually what both of the JDK 11 jobs do right now, so I believe the passing Jenkins job suggests that already works. The downside is I think we haven't

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Matei Zaharia
+1, it’s super messy without that. But great to see this running! > On Aug 26, 2019, at 10:53 AM, Reynold Xin wrote: > > Exactly - I think it's important to be able to create a single binary build. > Otherwise downstream users (the 99.99% won't be building their own Spark but > just pull it

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Reynold Xin
Exactly - I think it's important to be able to create a single binary build. Otherwise downstream users (the 99.99% won't be building their own Spark but just pull it from Maven) will have to deal with the mess, and it's even worse for libraries. On Mon, Aug 26, 2019 at 10:51 AM, Dongjoon Hyun

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Dongjoon Hyun
Oh, right. If you want to publish something to Maven, it will inherit the situation. Thank you for the feedback. :) On Mon, Aug 26, 2019 at 10:37 AM Michael Heuer wrote: > That is not true for any downstream users who also provide a library. > Whatever build mess you create in Apache Spark, we'll

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Michael Heuer
That is not true for any downstream users who also provide a library. Whatever build mess you create in Apache Spark, we'll have to inherit it. ;) michael > On Aug 26, 2019, at 12:32 PM, Dongjoon Hyun wrote: > > As Shane wrote, not yet. > > `one build for works for both` is our

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Shane Knapp
maybe in the future, but not right now as the hadoop 2.7 build is broken. also, i busted dev/run-tests.py in my changes to support java11 in PRBs: https://github.com/apache/spark/pull/25585 quick fix, testing now. On Mon, Aug 26, 2019 at 10:23 AM Reynold Xin wrote: > Would it be possible to

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Reynold Xin
Would it be possible to have one build that works for both? On Mon, Aug 26, 2019 at 10:22 AM Dongjoon Hyun wrote: > Thank you all! > > Let me add more explanation on the current status. > > - If you want to run on JDK8, you need to build on JDK8 > - If you want to run on JDK11, you need

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Dongjoon Hyun
Thank you all! Let me add more explanation on the current status. - If you want to run on JDK8, you need to build on JDK8 - If you want to run on JDK11, you need to build on JDK11. The other combinations will not work. Currently, we have two Jenkins jobs. (1) is the one I pointed, and
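Since build-JDK and run-JDK must match under the status described above, a small guard before building can catch mismatches early. A sketch (the version-string patterns are assumptions based on the `java -version` output quoted earlier in the thread):

```shell
# Map a `java -version` style string to a JDK major version.
# Assumption: JDK 8 reports "1.8.x", JDK 11 reports "11.x".
jdk_major() {
  case "$1" in
    1.8.*) echo 8 ;;
    11.*)  echo 11 ;;
    *)     echo unknown ;;
  esac
}

jdk_major "1.8.0_181"   # -> 8
jdk_major "11.0.4"      # -> 11
```

A build script could compare this against the intended target and abort on mismatch, since a JDK8 build will not run on JDK11 (and vice versa) in the current state.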

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Shixiong(Ryan) Zhu
We were worried about regression when adding Kafka source v2 because it had lots of changes. Hence we copy-pasted the code to keep the Kafka source v1 untouched and provided a config to fall back to v1. On Mon, Aug 26, 2019 at 7:05 AM Jungtaek Lim wrote: > Thanks! The patch is here:

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Gabor Somogyi
OK, starting with this tomorrow... On Mon, 26 Aug 2019, 16:05 Jungtaek Lim, wrote: > Thanks! The patch is here: https://github.com/apache/spark/pull/25583 > > On Mon, Aug 26, 2019 at 11:02 PM Gabor Somogyi > wrote: > >> Just checked this and it's a copy-paste :) It works properly when >>

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jungtaek Lim
Thanks! The patch is here: https://github.com/apache/spark/pull/25583 On Mon, Aug 26, 2019 at 11:02 PM Gabor Somogyi wrote: > Just checked this and it's a copy-paste :) It works properly when > KafkaSourceInitialOffsetWriter used. Pull me in if review needed. > > BR, > G > > > On Mon, Aug 26,

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Gabor Somogyi
Just checked this and it's a copy-paste :) It works properly when KafkaSourceInitialOffsetWriter is used. Pull me in if review needed. BR, G On Mon, Aug 26, 2019 at 3:57 PM Jungtaek Lim wrote: > Nice finding! I don't see any reason to not use > KafkaSourceInitialOffsetWriter from KafkaSource, as

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jungtaek Lim
Nice finding! I don't see any reason to not use KafkaSourceInitialOffsetWriter from KafkaSource, as they're identical. I guess it was copied and pasted sometime before and not addressed yet. As you haven't submitted a patch, I'll submit one shortly, crediting you. I'd close mine and wait
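The fix discussed in this thread amounts to deleting the copied offset-(de)serialization code in the v1 source and delegating to the shared writer. As a language-neutral sketch of that refactor (shown in Python for brevity; all class and method names here are hypothetical stand-ins, not the actual Spark internals):

```python
import json


class InitialOffsetWriter:
    """Stand-in for a shared helper like KafkaSourceInitialOffsetWriter:
    one place that owns the on-disk format for initial offsets."""
    VERSION = 1

    def serialize(self, offsets: dict) -> str:
        return f"v{self.VERSION}\n{json.dumps(offsets)}"

    def deserialize(self, text: str) -> dict:
        header, body = text.split("\n", 1)
        if header != f"v{self.VERSION}":
            raise ValueError(f"unexpected offset log version: {header}")
        return json.loads(body)


class SourceV1:
    """Instead of carrying a copy-pasted serialize/deserialize pair,
    the v1 source delegates to the shared writer."""
    def __init__(self):
        self._writer = InitialOffsetWriter()

    def write_initial_offsets(self, offsets: dict) -> str:
        return self._writer.serialize(offsets)


print(SourceV1().write_initial_offsets({"topic-0": 42}))
```

The point of the delegation is that any future change to the offset log format happens in exactly one class, which is why keeping the duplicated copy around is risky.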

Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Sean Owen
Yes, this does not read raw pickle files. It reads files written in the standard Spark/Hadoop form for binary objects (SequenceFiles) but uses Python pickling for the serialization. See the docs, which say this reads what saveAsPickleFile() writes. On Mon, Aug 26, 2019 at 12:23 AM hxngillani

Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Roland Johann
The error you provided hints that PySpark is reading the files as sequence files, but they were written as plain pickle files without the SequenceFile format in mind. I'm no PySpark expert, but I suggest loading the pickle files as binary files and deserializing them with custom code.
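The custom deserialization suggested above can be sketched with the standard `pickle` module. In Spark you would map a helper like this over `sc.binaryFiles(path)` (which yields (filename, bytes) pairs) instead of calling `pickleFile`, which expects SequenceFile-wrapped records; the helper name below is made up for the example:

```python
import pickle


def load_plain_pickle(raw: bytes):
    """Deserialize bytes written by plain pickle.dump/dumps, i.e. the
    kind of file the original poster likely produced."""
    return pickle.loads(raw)


# Simulate a raw pickle file's contents and read it back:
raw = pickle.dumps({"a": 1, "b": 2})
print(load_plain_pickle(raw))  # {'a': 1, 'b': 2}
```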

[SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jacek Laskowski
Hi, Just found out that KafkaSource [1] does not use KafkaSourceInitialOffsetWriter (of KafkaMicroBatchStream) [2] for initial offsets. Any reason for that? Should I report an issue? Just checking out as I'm with 2.4.3 exclusively and have no idea what's coming for 3.0. [1]

[VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 2.4.4. The vote is open until August 29th 1AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.4 [ ] -1 Do not release this package because