-1

Found a problem when reading a partitioned table. Right now, we may create a
SQL project/filter operator for every partition. When a table has thousands
of partitions, this produces a huge number of SQLMetrics (accumulators),
which puts high memory pressure on the driver and can take down the cluster
(long GC pauses cause various kinds of timeouts).

https://issues.apache.org/jira/browse/SPARK-10339
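
To illustrate the failure mode, here is a toy model (not Spark's actual
SQLMetrics implementation; the registry and names below are made up for
illustration): if each partition's operator registers its own metric
accumulators with a driver-side registry, the number of live driver objects
grows as partitions times metrics-per-operator.

```python
# Toy model of the failure mode -- NOT Spark's actual SQLMetrics code.
class DriverRegistry:
    """Stands in for the driver-side accumulator registry."""
    def __init__(self):
        self.accumulators = []

    def register(self, name):
        acc = {"name": name, "value": 0}
        # Held for the lifetime of the query, so memory scales with count.
        self.accumulators.append(acc)
        return acc

def plan_scan(registry, num_partitions, metrics_per_operator=3):
    # One project/filter operator per partition, each with its own metrics.
    for p in range(num_partitions):
        for m in range(metrics_per_operator):
            registry.register("partition-%d-metric-%d" % (p, m))

reg = DriverRegistry()
plan_scan(reg, num_partitions=10_000)
print(len(reg.accumulators))  # 30000 driver-side objects for a single scan
```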

Will have a fix soon.

On Fri, Aug 28, 2015 at 3:18 PM, Jon Bender <jonathan.ben...@gmail.com>
wrote:

> Marcelo,
>
> Thanks for replying -- after looking at my test again, I realized I had
> misinterpreted a separate, unrelated issue (note I'm not using a pre-built
> binary; I had to build my own with YARN/Hive support, since I want to use
> it on an older cluster, CDH 5.1.0).
>
> I can start up a pyspark app on YARN, so I don't want to block this.  +1
>
> Best,
> Jonathan
>
> On Fri, Aug 28, 2015 at 2:34 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> Hi Jonathan,
>>
>> Can you be more specific about what problem you're running into?
>>
>> SPARK-6869 fixed the issue of pyspark vs. assembly jar by shipping the
>> pyspark archives separately to YARN. With that fix in place, pyspark
>> doesn't need to get anything from the Spark assembly, so it has no
>> problems running on YARN. I just downloaded
>> spark-1.5.0-bin-hadoop2.6.tgz and tried that out, and pyspark works
>> fine on YARN for me.
>>
>>
>> On Fri, Aug 28, 2015 at 2:22 PM, Jonathan Bender
>> <jonathan.ben...@gmail.com> wrote:
>> > -1 for regression on PySpark + YARN support
>> >
>> > It seems like this JIRA
>> > (https://issues.apache.org/jira/browse/SPARK-7733) added a requirement
>> > for Java 7 in the build process. Due to changes in the Java archive
>> > format between Java 6 and 7, using PySpark with a YARN uberjar seems to
>> > break when the assembly is compiled with anything newer than Java 6
>> > (see https://issues.apache.org/jira/browse/SPARK-1920 for reference).
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC2-tp13826p13890.html
>> > Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>
>
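
For reference, the Java 6/7 archive-format quirk discussed above comes down
to how Python loads modules from zip archives: PySpark was imported from
the assembly jar via Python's zipimport, and (as I understand SPARK-1920)
archives written by newer JDKs used ZIP64 features that zipimport could not
read at the time. A minimal sketch of the import mechanism itself, using a
throwaway zip built here (the module name `greeting` is invented for the
example):

```python
# Sketch of the mechanism involved: Python can import modules from a zip
# archive placed on sys.path (this is how PySpark was loaded from the
# assembly jar). If the archive format is unreadable to zipimport, this
# import path breaks.
import importlib
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
zpath = os.path.join(tmp, "fakelib.zip")

# Write a tiny module into the archive.
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("greeting.py", "def hello():\n    return 'hi'\n")

# Putting the archive on sys.path makes its modules importable.
sys.path.insert(0, zpath)
greeting = importlib.import_module("greeting")
print(greeting.hello())  # hi
```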
