Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Felix Cheung
I don’t read that as saying Sept 2019 is end of life for Python 3.5, though. It only refers to the next release. In any case, I think it would be great to get more Python 3.x release test coverage in the next release. From: shane knapp Sent: Friday, March 29, 2019 4:46 PM

Re: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions!

2019-03-29 Thread Felix Cheung
Definitely the part on the PR. Thanks! From: shane knapp Sent: Thursday, March 28, 2019 11:19 AM To: dev; Stavros Kontopoulos Subject: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions! https://spark.apache.org/developer-tools.html search

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
+1: build source, R tests, R package CRAN check locally, r-hub. From: d_t...@apple.com on behalf of DB Tsai Sent: Wednesday, March 27, 2019 11:31 AM To: dev Subject: [VOTE] Release Apache Spark 2.4.1 (RC9) Please vote on releasing the following candidate as Apache

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
(I think the .invalid is added by the list server.) Personally I’d rather everyone just vote +1 or -1 and not state whether the vote is binding. It’s really the responsibility of the RM to confirm whether a vote is binding. Mistakes have been made otherwise. From: Marcelo Vanzin

Dropping SortExec from SortMergeJoins on presorted data

2019-03-29 Thread tim
Hi all, We ingest our data into dataframes with multiple naturally co-sorted columns. The redundant sort required during large SortMergeJoin operations takes substantial time that we'd like to optimise -- a plain merge should be sufficient. Is there a mechanism to avoid these sorts in general?
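
To illustrate the "plain merge" idea in the message above, here is a minimal Python sketch of an inner merge join over two inputs that are already sorted by key. It is not Spark's SortMergeJoin implementation and not an answer given in the thread; the function name `merge_join` and the sample data are hypothetical.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (key, value); hypothetical row shape for illustration


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Inner-join two lists of (key, value) rows that are ALREADY sorted by key.

    No sorting happens here: both inputs are scanned once with two cursors.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind; advance left cursor
        elif lk > rk:
            j += 1          # right key is behind; advance right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Sample inputs, both pre-sorted by key (assumed, not checked by merge_join).
left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (3, "y"), (4, "z")]
joined = merge_join(left_rows, right_rows)
```

The sketch relies entirely on the caller's guarantee that both inputs are sorted; if that guarantee is wrong, rows are silently dropped, which is one reason a query engine might insist on sorting unless it can prove the ordering.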

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread shane knapp
i'm not opposed to 3.6 at all. On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler wrote: > PyArrow dropping Python 3.4 was mainly due to support going away at > Conda-Forge and other dependencies also dropping it. I think we better > upgrade Jenkins Python while we are at it. Are you all against

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Bryan Cutler
PyArrow dropping Python 3.4 was mainly due to support going away at Conda-Forge and other dependencies also dropping it. I think we better upgrade Jenkins Python while we are at it. Are you all against jumping to Python 3.6 so we are not in the same boat in September? On Thu, Mar 28, 2019 at

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-03-29 Thread Ankur Gupta
Thanks Reynold! That is certainly useful to know. @Chris, would it be possible for you to send out those details if you still have them or, better, create a JIRA so that someone can work on those improvements? If there is already a JIRA, could you please provide a link to it? Additionally, if the

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-03-29 Thread Reynold Xin
We tried enabling blacklisting for some customers, and in the cloud they very quickly end up with 0 executors due to various transient errors. So unfortunately I think the current implementation is terrible for cloud deployments, and shouldn't be on by default. The heart of the issue is that