Spark 0.9.1 - saveAsTextFile() exception: _temporary doesn't exist!

2014-06-09 Thread Oleg Proudnikov
Hi All, After a few simple transformations I am trying to save to a local file system. The code works in local mode but not on a standalone cluster. The directory 1.txt/_temporary does exist after the exception. I would appreciate any suggestions. scala> d3.sample(false, 0.01, 1).map( pair
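The usual cause on a standalone cluster is that saveAsTextFile writes through each executor, so a path on the driver's local disk is not visible from the nodes that write the part files. A minimal sketch of the common workarounds, not from the thread (the HDFS path is a made-up placeholder):

    // Each executor writes its own partitions, so the target must be
    // storage every worker can reach (HDFS, NFS, S3, ...):
    d3.saveAsTextFile("hdfs://namenode:9000/user/oleg/1.txt")

    // If the data is small, collect to the driver and write locally:
    import java.io.PrintWriter
    val out = new PrintWriter("/tmp/1.txt")
    d3.collect().foreach(line => out.println(line.toString))
    out.close()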

Re: Using Java functions in Spark

2014-06-07 Thread Oleg Proudnikov
Increasing the number of partitions on the data file solved the problem. On 6 June 2014 18:46, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Additional observation - the map and mapValues are pipelined and executed - as expected - in pairs. This means that there is a simple sequence of steps
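For reference, a sketch of the two usual ways to get more partitions (the path and the count 64 are placeholders):

    // Ask for more input splits when reading the file; the second
    // argument is a minimum, not an exact count:
    val data = sc.textFile("hdfs://.../data.txt", 64)

    // Or re-split an RDD that already exists:
    val wider = data.repartition(64)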

Re: Setting executor memory when using spark-shell

2014-06-06 Thread Oleg Proudnikov
the driver's JVM to be 8g, rather than just the executors'. I think this is the reason why SPARK_MEM was deprecated. See https://github.com/apache/spark/pull/99 On Thu, Jun 5, 2014 at 2:37 PM, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Thank you, Andrew, I am using Spark 0.9.1
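A sketch of the 0.9.x-era way to size the executors without touching the driver's JVM (the 8g value is illustrative; spark.executor.memory is picked up from the system properties, as suggested later in this thread):

    # 0.9.x: set the executor size as a system property when launching the shell
    SPARK_JAVA_OPTS="-Dspark.executor.memory=8g" ./bin/spark-shell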

Re: Setting executor memory when using spark-shell

2014-06-06 Thread Oleg Proudnikov
Thank you, Hassan! On 6 June 2014 03:23, hassan hellfire...@gmail.com wrote: just use -Dspark.executor.memory=

Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Hi All, I am passing Java static methods into the RDD transformations map and mapValues. The first map takes a simple string K to a (K, V) pair, where V is a Java ArrayList of large text strings, 50K each, read from Cassandra. mapValues then processes these text blocks into very small ArrayLists.
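A sketch of the shape being described (TextUtil and its static methods are hypothetical stand-ins for the real Java class):

    // Hypothetical Java helper:
    //   public class TextUtil {
    //     public static java.util.List<String> fetchBlocks(String key) { ... }  // large ~50K text blocks from Cassandra
    //     public static java.util.List<String> summarize(java.util.List<String> v) { ... }
    //   }
    val keys = sc.parallelize(Seq("k1", "k2", "k3"))
    val pairs  = keys.map(k => (k, TextUtil.fetchBlocks(k)))     // (K, V) with a Java ArrayList value
    val result = pairs.mapValues(v => TextUtil.summarize(v))     // small ArrayLists out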

Re: Setting executor memory when using spark-shell

2014-06-06 Thread Oleg Proudnikov
Thank you again, Oleg On 6 June 2014 18:05, Patrick Wendell pwend...@gmail.com wrote: In 1.0+ you can just pass the --executor-memory flag to ./bin/spark-shell. On Fri, Jun 6, 2014 at 12:32 AM, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Thank you, Hassan! On 6 June 2014 03:23
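Concretely (the sizes here are examples; --driver-memory is the matching flag for the driver's JVM):

    # Spark 1.0+: pass the sizes straight to the shell launcher
    ./bin/spark-shell --executor-memory 8g --driver-memory 2g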

Re: Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
2014 16:24, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Hi All, I am passing Java static methods into RDD transformations map and mapValues. The first map is from a simple string K into a (K,V) where V is a Java ArrayList of large text strings, 50K each, read from Cassandra. MapValues

Setting executor memory when using spark-shell

2014-06-05 Thread Oleg Proudnikov
Hi All, Please help me set the executor JVM memory size. I am using the Spark shell, and it appears that the executors start with a predefined JVM heap of 512m as soon as the shell starts. How can I change this setting? I tried setting SPARK_EXECUTOR_MEMORY before launching the Spark shell: export

Re: Setting executor memory when using spark-shell

2014-06-05 Thread Oleg Proudnikov
=$MEMORY_PER_EXECUTOR It doesn't seem particularly clean, but it works. Andrew On Thu, Jun 5, 2014 at 2:15 PM, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Hi All, Please help me set Executor JVM memory size. I am using Spark shell and it appears that the executors are started with a predefined
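A guess at the full line the snippet cuts off, based on the spark.executor.memory property named elsewhere in this thread (the wrapper around =$MEMORY_PER_EXECUTOR is an assumption, not verbatim from the message):

    # assumed reconstruction:
    export SPARK_JAVA_OPTS="-Dspark.executor.memory=$MEMORY_PER_EXECUTOR"
    ./bin/spark-shell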

Re: RDD with a Map

2014-06-04 Thread Oleg Proudnikov
Just a thought... Are you trying to use the RDD as a Map? On 3 June 2014 23:14, Doris Xin doris.s@gmail.com wrote: Hey Amit, You might want to check out PairRDDFunctions http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions. For your use
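A short sketch of the map-like operations PairRDDFunctions gives a (K, V) RDD (spark-shell session with toy data):

    import org.apache.spark.SparkContext._   // brings PairRDDFunctions into scope

    val kv = sc.parallelize(Seq(("a", 1), ("b", 2)))
    kv.lookup("a")              // Seq(1) - a point query, but it scans the RDD
    val m = kv.collectAsMap()   // only for small RDDs: materializes on the driver
    m("b")                      // 2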

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Oleg Proudnikov
It is possible if you use a cartesian product to produce all possible pairs for each IP address and two stages of map-reduce:
- first, by pairs of points, to find the total for each pair, and
- second, by IP address, to find the pair with the maximum count for each IP address.
Oleg On 4 June 2014
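A sketch of that pipeline, assuming (ip, point) input pairs and using a self-join as the per-IP cartesian product (the data and names are invented):

    import org.apache.spark.SparkContext._

    // (ip, point) observations
    val visits = sc.parallelize(Seq(
      ("1.2.3.4", "a"), ("1.2.3.4", "b"), ("1.2.3.4", "a"), ("5.6.7.8", "c")))

    // all point pairs per IP; keep p1 < p2 to drop self-pairs and mirrored duplicates
    val pointPairs = visits.join(visits).filter { case (_, (p1, p2)) => p1 < p2 }

    // stage 1: total per (ip, pair)
    val counts = pointPairs.map { case (ip, pp) => ((ip, pp), 1) }.reduceByKey(_ + _)

    // stage 2: per IP, keep the pair with the maximum count
    val best = counts.map { case ((ip, pp), n) => (ip, (pp, n)) }
                     .reduceByKey((a, b) => if (a._2 >= b._2) a else b)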

Reconnect to an application/RDD

2014-06-03 Thread Oleg Proudnikov
Hi All, Is it possible to run a standalone app that would compute and persist/cache an RDD and then run other standalone apps that would gain access to that RDD? -- Thank you, Oleg

Re: sc.textFileGroupByPath(*/*.txt)

2014-06-01 Thread Oleg Proudnikov
Anwar, I will try this as it might do exactly what I need. I will follow your pattern but use sc.textFile() for each file. I am now thinking that I could start with an RDD of file paths and map it into (path, content) pairs, provided I could read a file on the server. Thank you, Oleg On 1 June
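Two sketches of that idea (the paths are placeholders):

    // Spark 1.0 added exactly this: one (path, content) pair per file
    val docs = sc.wholeTextFiles("file:///data/texts")

    // Or, as described, start from an RDD of paths and read in a map;
    // this only works if every worker sees the same filesystem
    val paths = sc.parallelize(Seq("/data/a.txt", "/data/b.txt"))
    val byPath = paths.map { p =>
      val src = scala.io.Source.fromFile(p)
      try { (p, src.mkString) } finally { src.close() }
    }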