-1

Found a problem when reading partitioned tables. Right now, we may create a SQL project/filter operator for every partition. When there are thousands of partitions, this produces a huge number of SQLMetrics (accumulators), which puts high memory pressure on the driver and can then take down the cluster (long GC pauses cause various kinds of timeouts).
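As a back-of-the-envelope illustration of why this hurts the driver, here is a Spark-free sketch. All names in it (`SQLMetric`, `make_operator_metrics`, the per-operator metric count) are illustrative stand-ins, not Spark internals; the point is only that the driver-side object count scales as partitions × operators × metrics-per-operator.

```python
# Toy model of the failure mode: if each partition's read gets its own
# project/filter operator, and each operator registers its own metric
# accumulators on the driver, the live-object count explodes with the
# partition count. Names here are hypothetical, not Spark APIs.

class SQLMetric:
    """Stand-in for one driver-side accumulator tracking one metric."""
    def __init__(self, name):
        self.name = name
        self.value = 0

def make_operator_metrics():
    # Assume a typical operator tracks a handful of metrics.
    return [SQLMetric(n) for n in ("numInputRows", "numOutputRows", "time")]

driver_registry = []          # accumulators the driver must keep alive
num_partitions = 10_000       # "thousands of partitions"
operators_per_partition = 2   # e.g. one project + one filter per partition

for _ in range(num_partitions):
    for _ in range(operators_per_partition):
        driver_registry.extend(make_operator_metrics())

# 10,000 partitions * 2 operators * 3 metrics = 60,000 live accumulators,
# all retained by the driver for the lifetime of the query.
print(len(driver_registry))  # -> 60000
```

With per-query (rather than per-partition) operators, the same model would hold a constant handful of metrics regardless of partition count.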
https://issues.apache.org/jira/browse/SPARK-10339

Will have a fix soon.

On Fri, Aug 28, 2015 at 3:18 PM, Jon Bender <jonathan.ben...@gmail.com> wrote:
> Marcelo,
>
> Thanks for replying -- after looking at my test again, I misinterpreted
> another issue I'm seeing which is unrelated (note I'm not using a
> pre-built binary; rather, I had to build my own with YARN/Hive support,
> as I want to use it on an older cluster (CDH 5.1.0)).
>
> I can start up a PySpark app on YARN, so I don't want to block this. +1
>
> Best,
> Jonathan
>
> On Fri, Aug 28, 2015 at 2:34 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Hi Jonathan,
>>
>> Can you be more specific about what problem you're running into?
>>
>> SPARK-6869 fixed the issue of pyspark vs. the assembly jar by shipping
>> the pyspark archives separately to YARN. With that fix in place, pyspark
>> doesn't need to get anything from the Spark assembly, so it has no
>> problems running on YARN. I just downloaded spark-1.5.0-bin-hadoop2.6.tgz
>> and tried that out, and pyspark works fine on YARN for me.
>>
>> On Fri, Aug 28, 2015 at 2:22 PM, Jonathan Bender <jonathan.ben...@gmail.com> wrote:
>> > -1 for regression on PySpark + YARN support
>> >
>> > It seems like this JIRA, https://issues.apache.org/jira/browse/SPARK-7733,
>> > added a requirement for Java 7 in the build process. Due to some quirks
>> > in the Java archive format between Java 6 and 7, using PySpark with a
>> > YARN uberjar seems to break when compiled with anything after Java 6
>> > (see https://issues.apache.org/jira/browse/SPARK-1920 for reference).
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC2-tp13826p13890.html
>> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>
>> --
>> Marcelo
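For reference on the "archive format quirks" mentioned in the quoted SPARK-1920 discussion: as I understand it, one way an archive crosses from the classic ZIP format into ZIP64 (which older zip readers cannot open) is by exceeding 65,535 entries, and large assembly uberjars can get close to that. The helper below is a hypothetical illustration using Python's standard `zipfile` module, not code from Spark; it only checks the entry-count trigger, ignoring the other ZIP64 triggers (e.g. archives or members over 4 GiB).

```python
# Illustrative (non-Spark) check for one ZIP64 trigger: a zip/jar whose
# entry count exceeds the classic end-of-central-directory limit of
# 65,535 needs ZIP64 extensions, which older readers choke on.
import io
import zipfile

ZIP_MAX_ENTRIES = 0xFFFF  # 65,535: the classic (non-ZIP64) entry limit

def needs_zip64(path_or_file):
    """Return True if the archive's entry count alone forces ZIP64."""
    with zipfile.ZipFile(path_or_file) as zf:
        return len(zf.infolist()) > ZIP_MAX_ENTRIES

# Build a tiny in-memory "jar" to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pyspark/__init__.py", "")

print(needs_zip64(buf))  # a 1-entry archive is safely below the limit
```

In practice one would run such a check against the built assembly jar to see whether it has strayed into ZIP64 territory.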