unsubscribe

2018-08-08 Thread Tarun Kumar

Re: [build system] IMPORTANT: taking centos workers offline for pyarrow upgrade

2018-08-08 Thread shane knapp
test builds started (on ubuntu). if these pass, i will feel comfortable performing the same installation/upgrade steps on the centos workers. upgrade/installation commands:
    conda install python==3.5
    conda install pyarrow=0.10.* -c conda-forge -n py3k
    pip install sphinx
builds:

unsubscribe

2018-08-08 Thread Al Pivonka
-- Those who say it can't be done, are usually interrupted by those doing it.

Re: [build system] IMPORTANT: taking centos workers offline for pyarrow upgrade

2018-08-08 Thread shane knapp
i updated my staging/testing ubuntu server to python 3.5, successfully installed pyarrow 0.10.0 via conda forge. however i'm getting a ton of python package dep failures. i will see if we can get by w/o needing to wipe and recreate every anaconda installation. please hold.

Re: [build system] IMPORTANT: taking centos workers offline for pyarrow upgrade

2018-08-08 Thread shane knapp
well... i've been running into problems (aka dependency hell), and just hit a show-stopper:
UnsatisfiableError: The following specifications were found to be in conflict:
  - pyarrow 0.10.* -> arrow-cpp 0.10.0.* -> python >=2.7,<2.8.0a0
  - python 3.4*
Use "conda info <package>" to see the dependencies
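The conflict in the error above can be reproduced in miniature. The sketch below is a toy model in plain Python, not conda's actual solver: it simplifies the `<2.8.0a0` bound to `<2.8`, and the `CANDIDATES` list and helper names are hypothetical. It checks which interpreter versions satisfy both the arrow-cpp requirement and the environment's `python 3.4*` pin.

```python
# Toy model of the conda conflict above (NOT conda's solver): it only shows
# why the two version constraints cannot hold at the same time.

def parse(version):
    """Turn '2.7' or '3.4.5' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def satisfies_arrow_cpp(py):
    # The arrow-cpp 0.10.0 build conda resolved requires python >=2.7,<2.8.0a0,
    # simplified here to >=2.7,<2.8.
    return parse("2.7") <= py < parse("2.8")

def satisfies_env_pin(py):
    # The existing environment pins python 3.4.*
    return parse("3.4") <= py < parse("3.5")

# Hypothetical candidate interpreter versions, for illustration only.
CANDIDATES = ["2.7.15", "3.4.5", "3.5.6", "3.6.6"]

def compatible_pythons(candidates):
    """Return the candidates that satisfy BOTH constraints."""
    return [v for v in candidates
            if satisfies_arrow_cpp(parse(v)) and satisfies_env_pin(parse(v))]

if __name__ == "__main__":
    print(compatible_pythons(CANDIDATES))  # [] -> no version fits both specs
```

An empty list corresponds to conda's UnsatisfiableError: no interpreter version satisfies both specs, so one of them has to change.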

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-08 Thread Mark Hamstra
I'm inclined to agree. Just saying that it is not a regression doesn't really cut it when it is a now-known data correctness issue. We need something a lot more than nothing before releasing 2.4.0. At a barest minimum, that has to be much more complete and publicly highlighted documentation of the

[build system] IMPORTANT: taking centos workers offline for pyarrow upgrade

2018-08-08 Thread shane knapp
pyarrow 0.10.0 has been released, and this is important to be tested against for the 2.4 release (esp due to memory leak problems, etc) https://issues.apache.org/jira/browse/SPARK-23874 https://github.com/apache/spark/pull/21939 https://issues.apache.org/jira/browse/ARROW-1973 i will be putting

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-08 Thread Imran Rashid
On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote:
>
> SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
> It turns out to be a very complicated issue, there is no consensus about
> what is the right fix yet. Likely
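Why Shuffle+Repartition can give incorrect answers is easier to see with a simplified model. Round-robin repartitioning assigns each row to a partition based on its position in the input, so if a retried upstream task produces the same rows in a different order, the rows land in different partitions. The sketch below is a simplified illustration in plain Python, not Spark's implementation; `round_robin`, the row values, and the "partition 0 already consumed" scenario are all hypothetical.

```python
# Simplified model of round-robin repartitioning (not Spark's actual code).
# Row i of the input goes to partition (start + i) % num_partitions, so the
# assignment depends on the ORDER in which rows arrive.

def round_robin(rows, num_partitions, start=0):
    partitions = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        partitions[(start + position) % num_partitions].append(row)
    return partitions

first_attempt = ["a", "b", "c", "d"]
# Hypothetical retry: same rows, but the upstream output arrives in another order.
retry_attempt = ["b", "a", "d", "c"]

p_first = round_robin(first_attempt, 2)   # [['a', 'c'], ['b', 'd']]
p_retry = round_robin(retry_attempt, 2)   # [['b', 'd'], ['a', 'c']]

# Suppose partition 0 was already consumed from the first attempt, while
# partition 1 had to be recomputed from the retry.
combined = p_first[0] + p_retry[1]
print(combined)  # ['a', 'c', 'a', 'c']: 'a'/'c' duplicated, 'b'/'d' lost
```

Each attempt on its own is a valid partitioning of the same data; the wrong answer only appears when outputs from two attempts with different input orders are mixed.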

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-08 Thread makatun
Steve, thank you for your response. We have tested spark.read with various options. The difference in performance is very small. In particular, inference has virtually no effect in the tested case (the test files have just a few rows). Moreover, the complexity of spark.read remains
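One way to put a number on "polynomial complexity on the number of columns" is to time the same operation at several column counts and fit the exponent k in time ≈ c · columns^k as the slope of a least-squares line on a log-log scale. The sketch below is plain Python; `fit_exponent` is a hypothetical helper, and the timings are synthetic values constructed to grow quadratically, not measurements from this thread.

```python
import math

def fit_exponent(sizes, times):
    """Least-squares slope of log(time) vs log(size), i.e. k in time ~ c * size**k."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic timings (NOT measurements from the thread) that grow as columns**2.
columns = [100, 200, 400, 800, 1600]
seconds = [0.002 * n ** 2 for n in columns]
print(round(fit_exponent(columns, seconds), 3))  # 2.0
```

With real timings, a fitted slope near 1 would suggest linear scaling in the number of columns, while a slope near 2 or higher would support the polynomial behaviour described in the thread.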