Re: convert List to RDD

2014-06-13 Thread SK
Thanks. But that did not work. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/convert-List-to-RDD-tp7606p7609.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

printing in unit test

2014-06-13 Thread SK
Hi, My unit test is failing (the output is not matching the expected output). I would like to print out the value of the output, but rdd.foreach(r => println(r)) does not work from the unit test. How can I print or write the output to a file or the screen? Thanks. -- View this message in context:
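A minimal sketch of the usual workaround (the data and app name below are made up): when the job runs on a cluster, foreach executes on the executors, so its println output ends up in the executor logs rather than the test's console. Collecting the RDD first brings the data back to the driver JVM, where it can be printed or asserted on.

import org.apache.spark.{SparkConf, SparkContext}

object PrintRddInTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("print-test"))
    val rdd = sc.parallelize(Seq(1, 2, 3))

    // collect() moves the data to the driver, so println output shows up in the test log
    rdd.collect().foreach(println)

    // or write it out to inspect it as a file
    rdd.saveAsTextFile("test-output")

    sc.stop()
  }
}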

overwriting output directory

2014-06-12 Thread SK
Hi, When we have multiple runs of a program writing to the same output file, the execution fails if the output directory already exists from a previous run. Is there some way we can have it overwrite the existing directory, so that we don't have to manually delete it after each run? Thanks for
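One common workaround, sketched below with the Hadoop FileSystem API (the helper and path names are illustrative): delete the output directory before saving, so repeated runs do not fail on a leftover directory.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object OverwriteOutput {
  def saveOverwriting(sc: SparkContext, rdd: RDD[String], out: String): Unit = {
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val path = new Path(out)
    if (fs.exists(path)) fs.delete(path, true)  // recursively remove the previous run's output
    rdd.saveAsTextFile(out)
  }
}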

Re: specifying fields for join()

2014-06-12 Thread SK
This issue is resolved. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/specifying-fields-for-join-tp7528p7544.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
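The resolution is not quoted in the thread, but the usual pattern for joining on a particular field is sketched below (the record shapes are made up): key each RDD by the join field first, since join operates on pair RDDs.

// `sc` is assumed to be an existing SparkContext
val users  = sc.parallelize(Seq(("ID1", "japan"), ("ID2", "usa")))
val visits = sc.parallelize(Seq((145804601L, "ID1"), (145752760L, "ID2")))

// re-key the second RDD by the field to join on, then join on that key
val visitsById = visits.map { case (ts, id) => (id, ts) }
val joined = users.join(visitsById)   // RDD[(String, (String, Long))]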

groupBy question

2014-06-10 Thread SK
After doing a groupBy operation, I have the following result: val res = (ID1,ArrayBuffer((145804601,ID1,japan))) (ID3,ArrayBuffer((145865080,ID3,canada), (145899640,ID3,china))) (ID2,ArrayBuffer((145752760,ID2,usa), (145934200,ID2,usa))) Now I need to output for each group,
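The rest of the question is cut off, so the sketch below only shows one typical next step (the field positions are taken from the sample data): use mapValues to turn each group's buffer into the desired per-group output, here the list of countries per ID.

// `res` is assumed to be the grouped pair RDD shown above,
// roughly RDD[(String, Iterable[(Int, String, String)])]
val perGroup = res.mapValues(recs => recs.map(_._3).mkString(", "))
perGroup.collect().foreach { case (id, countries) => println(s"$id: $countries") }
// e.g. ID3: canada, china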

output tuples in CSV format

2014-06-10 Thread SK
My output is a set of tuples, and when I output it using saveAsTextFile, my file looks as follows: (field1_tup1, field2_tup1, field3_tup1,...) (field1_tup2, field2_tup2, field3_tup2,...) In Spark, is there some way I can simply have it output in CSV format as follows (i.e. without the
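A minimal sketch (the RDD name and the tuple arity are assumptions): format each record as a comma-separated line before saving, which drops the surrounding parentheses.

// `results` is assumed to be an RDD of three-field tuples like the ones above
val asCsv = results.map { case (f1, f2, f3) => Seq(f1, f2, f3).mkString(",") }
// or, for tuples of any arity:
// val asCsv = results.map(_.productIterator.mkString(","))
asCsv.saveAsTextFile("output_csv")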

Errors when building Spark with sbt

2014-06-09 Thread SK
I tried to use sbt/sbt assembly to build spark-1.0.0. I get a lot of "Server access error: Connection refused" errors when it tries to download from repo.eclipse.org and repository.jboss.org. I tried to navigate to these links manually and some of them are obsolete (Error 404).

Re: Task not serializable: collect, take

2014-05-02 Thread SK
Thank you very much. Making the trait serializable worked. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Task-not-serializable-collect-take-tp5193p5236.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
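A sketch of the fix described here, using the trait from the original question further down (the method signature is an assumption): extending Serializable lets Spark ship closures that reference the trait's members to the executors without a NotSerializableException.

import org.apache.spark.rdd.RDD

// marking the trait Serializable is what resolved the error in this thread
trait A extends Serializable {
  def input(master: String): RDD[String]
}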

spark 0.9.1: ClassNotFoundException

2014-05-02 Thread SK
I am using Spark 0.9.1 in standalone mode. In the SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created a directory called mycode in which I have placed some standalone Scala code. I was able to compile it. I ran the code using: ./bin/run-example org.apache.spark.mycode.MyClass

Task not serializable: collect, take

2014-05-01 Thread SK
Hi, I have the following code structure. It compiles OK, but at runtime it aborts with the error: Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: I am running in local (standalone) mode. trait A{ def input(...):

How to declare Tuple return type for a function

2014-04-29 Thread SK
Hi, I am a new user of Spark. I have a class that defines a function as follows. It returns a tuple: (Int, Int, Int). class Sim extends VectorSim { override def input(master:String): (Int,Int,Int) = { sc = new SparkContext(master, Test) val ratings =
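A self-contained sketch of declaring a tuple return type (the trait definition and the method body are assumptions, since the original code is truncated):

trait VectorSim {
  // the return type is written as a Tuple3 of Ints
  def input(master: String): (Int, Int, Int)
}

class Sim extends VectorSim {
  override def input(master: String): (Int, Int, Int) = {
    val counts = (1, 2, 3)   // compute the three values however is appropriate
    counts                   // the last expression is the value returned
  }
}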

packaging time

2014-04-29 Thread SK
Each time I run sbt/sbt assembly to compile my program, packaging takes about 370 seconds (about 6 minutes). How can I reduce this time? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/packaging-time-tp5048.html Sent from the Apache Spark User
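One common way to avoid rebuilding the full Spark assembly on every change (a sketch; the project name and version numbers are assumptions): keep the application in its own sbt project with spark-core as a provided dependency, so sbt package only compiles and jars the application code.

// build.sbt for a standalone application project
name := "my-spark-app"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"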
