Memory config issues

2015-01-18 Thread Alessandro Baretta
All, I'm getting out of memory exceptions in SparkSQL GROUP BY queries. I have plenty of RAM, so I should be able to brute-force my way through, but I can't quite figure out what memory option affects what process. My current memory configuration is the following: export

GraphX doc: triangleCount() requirement overstatement?

2015-01-18 Thread Michael Malak
According to https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting: "Note that TriangleCount requires the edges to be in canonical orientation (srcId < dstId)." But isn't this overstating the requirement? Isn't the requirement really that IF there are duplicate
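
For reference, a hedged sketch of what the documented requirement amounts to in code, assuming a SparkContext sc as in spark-shell (the edge values are illustrative):

  import org.apache.spark.graphx._

  // Tiny illustrative edge list; Edge(3L, 1L, 1) is not in canonical orientation.
  val rawEdges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))

  // Reorient each edge so that srcId < dstId, as the 1.2.0 docs request, then
  // partition the graph, which triangleCount() also requires.
  val canonical = rawEdges.map(e => if (e.srcId < e.dstId) e else Edge(e.dstId, e.srcId, e.attr))
  val graph = Graph.fromEdges(canonical, defaultValue = 0)
    .partitionBy(PartitionStrategy.RandomVertexCut)

  val triangleCounts = graph.triangleCount().vertices.collect()

Whether the reorientation step is strictly necessary when there are no duplicate edges is exactly the question raised here.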

Re: Memory config issues

2015-01-18 Thread Alessandro Baretta
Akhil, Ah, very good point. I guess SET spark.sql.shuffle.partitions=1024 should do it. Alex On Sun, Jan 18, 2015 at 10:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote: It's the executor memory (spark.executor.memory) which you can set while creating the Spark context. By default it uses
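
For context, a minimal sketch of the setting being discussed, assuming a SQLContext named sqlContext (1024 is simply the value from the message, not a recommendation):

  // More shuffle partitions means smaller per-task aggregation state for the
  // GROUP BY, at the cost of more (smaller) tasks.
  sqlContext.sql("SET spark.sql.shuffle.partitions=1024")

  // Equivalent programmatic form:
  sqlContext.setConf("spark.sql.shuffle.partitions", "1024")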

Re: RDD order guarantees

2015-01-18 Thread Reynold Xin
Hi Ewan, Not sure if there is a JIRA ticket (there are too many and I lose track). I chatted briefly with Aaron about this. The way we can solve it is to create a new FileSystem implementation that overrides the listStatus method, and then in the Hadoop conf set fs.file.impl to that. Shouldn't be
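
A rough sketch of the approach described above, assuming a SparkContext sc; the class name is hypothetical and this is not the actual patch:

  import org.apache.hadoop.fs.{FileStatus, FileSystem, LocalFileSystem, Path}

  // Hypothetical wrapper: a local FileSystem whose listStatus returns entries
  // in a deterministic (sorted) order.
  class SortedLocalFileSystem extends LocalFileSystem {
    override def listStatus(path: Path): Array[FileStatus] =
      super.listStatus(path).sortBy(_.getPath.toString)
  }

  // Register it for the file:// scheme in the Hadoop configuration Spark uses.
  sc.hadoopConfiguration.setClass("fs.file.impl", classOf[SortedLocalFileSystem], classOf[FileSystem])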

Re: GraphX doc: triangleCount() requirement overstatement?

2015-01-18 Thread Reynold Xin
We will merge https://issues.apache.org/jira/browse/SPARK-3650 for 1.3. Thanks for the reminder! On Sun, Jan 18, 2015 at 8:34 PM, Michael Malak michaelma...@yahoo.com.invalid wrote: According to: https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting Note that

Re: Memory config issues

2015-01-18 Thread Akhil Das
It's the executor memory (spark.executor.memory) which you can set while creating the Spark context. By default it uses 60% of the executor memory (spark.storage.memoryFraction = 0.6) for storage. Now, for memory usage to show up, you need to cache (persist) the RDD. Regarding the OOM exception, you can increase the level of
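
A hedged sketch of what's being described here, with illustrative values (the input path is hypothetical), again assuming the job is launched via spark-submit, which supplies the master URL:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  // Illustrative settings: executor heap size plus the default storage
  // fraction (spark.storage.memoryFraction = 0.6, i.e. 60% of the heap).
  val conf = new SparkConf()
    .setAppName("cache-example")
    .set("spark.executor.memory", "8g")
    .set("spark.storage.memoryFraction", "0.6")
  val sc = new SparkContext(conf)

  // Storage memory only shows up as used once an RDD is actually persisted
  // and materialized.
  val data = sc.textFile("hdfs:///some/hypothetical/path").persist(StorageLevel.MEMORY_ONLY)
  data.count()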

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-18 Thread Ted Yu
Please take a look at SPARK-4048 and SPARK-5108. Cheers On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik g...@il.ibm.com wrote: Hi, I took the source code of Spark 1.2.0 and tried to build it together with hadoop-openstack.jar (to allow Spark access to OpenStack Swift). I used Hadoop 2.6.0.
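
For orientation, a hedged sketch of the kind of wiring involved when pointing Spark at Swift through hadoop-openstack, assuming a SparkContext sc and the hadoop-openstack jar on the driver and executor classpaths; the scheme registration and URL below are illustrative assumptions, and the runtime failures themselves are what SPARK-4048 / SPARK-5108 track:

  // SwiftNativeFileSystem is the filesystem class shipped with hadoop-openstack;
  // the container and service names in the URL are hypothetical.
  sc.hadoopConfiguration.set("fs.swift.impl",
    "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")

  val lines = sc.textFile("swift://some-container.myservice/some/object.txt")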

Re: Semantics of LGTM

2015-01-18 Thread Reynold Xin
Maybe just avoid LGTM as a single token when it is not actually meant according to Patrick's definition; anybody can still leave comments like "The direction of the PR looks good to me", "+1 on the direction", or "The build part looks good to me" ... On Sat, Jan 17, 2015 at 8:49 PM, Kay Ousterhout

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-18 Thread Sean Owen
Agree, I think this can / should be fixed with a slightly more conservative version of https://github.com/apache/spark/pull/3938 related to SPARK-5108. On Sun, Jan 18, 2015 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-4048 and SPARK-5108 Cheers On Sat, Jan 17,