Re: Very slow complex type column reads from parquet

2018-06-18 Thread Ryan Blue
Jakub, I'm moving the Spark list to bcc and adding the Parquet list, since you're probably more interested in Parquet tuning. It makes sense that you're getting better performance when you have more matching rows distributed, especially if those rows have a huge column that you need to project.

Re: Jenkins build errors

2018-06-18 Thread shane knapp
i triggered another build against your PR, so let's see if this happens again or was a transient failure. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92038/ shane On Mon, Jun 18, 2018 at 5:30 AM, Petar Zecevic wrote: > Hi, > Jenkins build for my PR

[SPARK-24579] SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2018-06-18 Thread Xiangrui Meng
Hi all, I posted a new SPIP on optimized data exchange between Spark and DL/AI frameworks at SPARK-24579. It took inputs from offline conversations with several Spark committers and contributors at the Spark+AI Summit. Please take a look

Spark FAIR Scheduler vs FIFO Scheduler

2018-06-18 Thread Alessandro Liparoti
Good morning, I have a conceptual question. In an application I am working on, when I write some results to HDFS (*action 1*), I use ~30 executors out of 200. I would like to improve resource utilization in this case. I am aware that repartitioning the df to 200 before action 1 would produce 200

Jenkins build errors

2018-06-18 Thread Petar Zecevic
Hi, Jenkins build for my PR (https://github.com/apache/spark/pull/21109 ; https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92023/testReport/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/) keeps failing. First it

Re: Unsubscribe

2018-06-18 Thread Hadrien Chicault
On Fri, Jun 15, 2018 at 23:17, Mikhail Dubkov wrote: > Unsubscribe > > On Thu, Jun 14, 2018 at 8:38 PM Kumar S, Sajive > wrote: > >> Unsubscribe >> >