Re: Configuring Spark Memory

2014-07-25 Thread John Omernik
So this is good information for standalone, but how is memory distributed within Mesos? There's coarse-grained mode, where the executor stays active, and there's fine-grained mode, where it appears each task is its own process in Mesos. How do memory allocations work in these cases? Thanks! On Thu,

Re: Suggestion for SPARK-1825

2014-07-25 Thread Colin McCabe
I have a similar issue with SPARK-1767. There are basically three ways to resolve the issue: 1. Use reflection to access classes newer than 0.21 (or whatever the oldest version of Hadoop is that Spark supports) 2. Add a build variant (in Maven this would be a profile) that deals with this. 3.

Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
After upgrading to Spark 1.0.1 from 0.9.1 everything seemed to be going well. Looking at the Mesos slave logs, I noticed: ERROR KryoSerializer: Failed to run spark.kryo.registrator java.lang.ClassNotFoundException: com/mediacrossing/verrazano/kryo/MxDataRegistrator My spark-env.sh has the

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Hi Larry, GraphX's graph constructor leaves the edges in their original partitions by default. To support arbitrary multipass graph partitioning, one idea is to take advantage of that by partitioning the graph externally to GraphX (though possibly using information from GraphX such as the

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Oops, the code should be:

    val unpartitionedGraph: Graph[Int, Int] = ...
    val numPartitions: Int = 128
    def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
    // Get the triplets using GraphX, then use Spark to repartition them
    val partitionedEdges = unpartitionedGraph.triplets
      .map(e

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Michael Armbrust
That query is looking at Fix Version not Target Version. The fact that the first one is still open is only because the bug is not resolved in master. It is fixed in 1.0.2. The second one is partially fixed in 1.0.2, but is not worth blocking the release for. On Fri, Jul 25, 2014 at 4:23 PM,

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Patrick Wendell
The most important issue in this release is actually an amendment to an earlier fix. The original fix caused a deadlock which was a regression from 1.0.0-1.0.1: Issue: https://issues.apache.org/jira/browse/SPARK-1097 1.0.1 Fix: https://github.com/apache/spark/pull/1273/files (had a deadlock)

Re: Suggestion for SPARK-1825

2014-07-25 Thread Patrick Wendell
Yeah, I agree reflection is the best solution. Whenever we do reflection we should clearly document in the code which YARN API version corresponds to which code path. I'm guessing since YARN is adding new features... we'll just have to do this over time. - Patrick On Fri, Jul 25, 2014 at 3:35 PM,
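
For readers unfamiliar with the reflection approach discussed in this thread, here is a minimal sketch of the general pattern in Scala. It is not the code from SPARK-1825 or SPARK-1767. The object name `ReflectiveCall` and both helper names are hypothetical, and the usage below probes `java.lang.String` only so that it runs without Hadoop or YARN on the classpath.

```scala
import scala.util.Try

// Hypothetical helper, for illustration only; not part of Spark, Hadoop, or YARN.
object ReflectiveCall {

  /** Looks up a public zero-argument method by name on the target's class and
    * invokes it. Returns None if the method does not exist (for example, on an
    * older API version) or if the call throws. A void method yields Some(null).
    */
  def invokeIfPresent(target: AnyRef, methodName: String): Option[AnyRef] =
    Try(target.getClass.getMethod(methodName))
      .map(_.invoke(target))
      .toOption

  /** Returns true if the named class can be loaded from the current classpath,
    * false if loading it raises ClassNotFoundException.
    */
  def classAvailable(className: String): Boolean =
    Try(Class.forName(className)).isSuccess
}
```

A caller would branch on the result and, as suggested above, note next to each branch which API version it corresponds to:

```scala
// Newer-API path: the method exists, so use it.
val viaReflection = ReflectiveCall.invokeIfPresent("abc", "length")
// Older-API path: the method is absent, so fall back to another code path.
val missing = ReflectiveCall.invokeIfPresent("abc", "noSuchMethod")
```

The design trade-off is that the compiler no longer checks the method name or signature, so a comment tying each reflective lookup to a specific API version is the main safeguard against silent breakage.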

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Ted Yu
HADOOP-10456 is fixed in Hadoop 2.4.1. Does this mean that synchronization on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for Hadoop 2.4.1? Cheers On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell pwend...@gmail.com wrote: The most important issue in this release is actually an