Re: IDF model error

2014-11-26 Thread Shivani Rao
doc3s = new IndexedRow(3L, new SSV(22, Array(10, 14, 20, 21), Array(2.0, 0.0, 2.0, 1.0))) val doc4s = new IndexedRow(4L, new SSV(22, Array(3, 7, 13, 20), Array(2.0, 0.0, 2.0, 1.0))) 2014-11-26 10:09 GMT+08:00 Shivani Rao raoshiv...@gmail.com: Hello Spark fans, I am trying to use the IDF model
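For context, here is a minimal sketch of the workflow this thread appears to be building, assuming SSV is an alias for MLlib's SparseVector and Spark 1.1 or later; the wrapper object and app name are illustrative, and the two IndexedRows are taken from the snippet above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.IDF
import org.apache.spark.mllib.linalg.{SparseVector => SSV, Vector}
import org.apache.spark.mllib.linalg.distributed.IndexedRow
import org.apache.spark.rdd.RDD

object IdfSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("idf-sketch"))

    // Each document is a sparse term-frequency vector over a 22-term vocabulary.
    // Every index must be strictly less than the declared size (22); an out-of-range
    // index is a common source of the IndexOutOfBoundsException mentioned in this thread.
    val doc3s = new IndexedRow(3L, new SSV(22, Array(10, 14, 20, 21), Array(2.0, 0.0, 2.0, 1.0)))
    val doc4s = new IndexedRow(4L, new SSV(22, Array(3, 7, 13, 20), Array(2.0, 0.0, 2.0, 1.0)))

    val tf: RDD[Vector] = sc.parallelize(Seq(doc3s, doc4s)).map(_.vector)

    val idfModel = new IDF().fit(tf)     // learn IDF weights from the corpus
    val tfidf = idfModel.transform(tf)   // re-weight each document's term frequencies

    tfidf.collect().foreach(println)
    sc.stop()
  }
}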

IDF model error

2014-11-25 Thread Shivani Rao
Hello Spark fans, I am trying to use the IDF model available in Spark MLlib to create a tf-idf representation of an RDD[Vector]. Below I have attached my MWE. I get the following error: java.lang.IndexOutOfBoundsException: 7 not in [-4,4) at

Jobs get stuck at reduceByKey stage with spark 1.0.1

2014-08-12 Thread Shivani Rao
Hello Spark aficionados, we upgraded from Spark 1.0.0 to 1.0.1 when the new release came out and started noticing some weird errors. Even a simple operation like reduceByKey or count on an RDD gets stuck in cluster mode. This issue does not occur with Spark 1.0.0 (in cluster or local mode) or

Spark, Logging Issues: slf4j or log4j

2014-07-02 Thread Shivani Rao
Hello Spark fans, I am unable to figure out how Spark decides which logger to use. I know that Spark decides this at the time the SparkContext is initialized. From the Spark documentation it is clear that Spark uses log4j, and not slf4j, but I have been able to successfully get Spark

Bug in Spark REPL

2014-06-23 Thread Shivani Rao
I have two jars with the following packages: package a.b.c.d.z found in jar1, and package a.b.e found in jar2. In the Scala REPL (no Spark) both imports work just fine, but in the Spark REPL I found that import a.b.c.d.z gives me the following error: object c is not a member of package a.b. Has

Re: Bug in Spark REPL

2014-06-23 Thread Shivani Rao
Actually I figured it out. The problem was that I was loading the sbt package-ed jar onto the classpath and not the sbt assembly-ed jar. Once I put the right jar in for package a.b.c.d.z, everything worked. Thanks, Shivani. On Mon, Jun 23, 2014 at 4:38 PM, Shivani Rao raoshiv...@gmail.com
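For anyone hitting the same thing: the fix described here is to put the sbt assembly jar (which bundles the project's dependencies) on the classpath instead of the sbt package jar (which contains only the project's own classes). A minimal sbt-assembly setup looks roughly like the following; the plugin version shown is illustrative for the sbt 0.13 era.

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt (sbt-assembly 0.11.x style)
import AssemblyKeys._

assemblySettings

name := "my-spark-job"

Running sbt assembly then produces a single <name>-assembly-<version>.jar under target/ (the exact path depends on the Scala version), and that is the jar to put on the Spark REPL's classpath.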

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
Hello Abhi, I did try that and it did not work. And Eugene, yes, I am assembling the argonaut libraries into the fat jar. So how did you overcome this problem? Shivani On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote: On 20 June 2014 01:46, Shivani Rao raoshiv

Re: How do you run your spark app?

2014-06-20 Thread Shivani Rao
Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a job"? Can you give an example? I am using sbt assembly as well to create a fat jar, and supplying the

Re: How do you run your spark app?

2014-06-20 Thread Shivani Rao
(sample output from the reply omitted) fifa fifa2014 Hope this helps. Thanks, Shrikar On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao raoshiv...@gmail.com wrote

Re: Worker dies while submitting a job

2014-06-20 Thread Shivani Rao
That error typically means that there is a communication error (wrong ports) between the master and the worker. Also check if the worker has write permissions to create the work directory. We were getting this error due to one of the above two reasons. On Tue, Jun 17, 2014 at 10:04 AM, Luis Ángel Vicente

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread Shivani Rao
to disk sounds very lightweight. On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com wrote: I am trying to process a file that contains 4 log lines (not very long) and then write my parsed-out case classes to a destination folder, and I get the following error

Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-18 Thread Shivani Rao
I am trying to process a file that contains 4 log lines (not very long) and then write my parsed out case classes to a destination folder, and I get the following error: java.lang.OutOfMemoryError: Java heap space at

Re: Hanging Spark jobs

2014-06-12 Thread Shivani Rao
I learned this from my co-worker, but it is relevant here. Spark evaluates lazily by default, which means that none of your code executes until you run your saveAsTextFile, so the failure does not tell you much about where the problem is actually occurring. In order to debug this better, you might
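To make that debugging suggestion concrete, here is a sketch that forces each stage with count() so the failure surfaces at the stage that actually causes it. It assumes an existing SparkContext sc; the input path, parseLine, and the userId field are hypothetical.

case class LogLine(userId: String, msg: String)
def parseLine(s: String): LogLine = {
  val parts = s.split("\t", 2)
  LogLine(parts(0), if (parts.length > 1) parts(1) else "")
}

val raw = sc.textFile("hdfs:///logs/input")       // lazy: nothing runs yet
val parsed = raw.map(parseLine)                   // still lazy
println("parsed:  " + parsed.count())             // action: executes textFile + map only

val grouped = parsed.keyBy(_.userId).groupByKey()
grouped.cache()                                   // reuse the shuffle output across actions
println("grouped: " + grouped.count())            // action: executes the shuffle

grouped.saveAsTextFile("hdfs:///logs/output")     // the original final action, now isolated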

Re: Adding external jar to spark-shell classpath in spark 1.0

2014-06-12 Thread Shivani Rao
@Marcelo: The command ./bin/spark-shell --jars jar1,jar2,etc,etc did not work for me on a Linux machine. What I did instead was append to the classpath in the bin/compute-classpath.sh file. I ran the script, then started the Spark shell, and that worked. Thanks, Shivani On Wed, Jun 11, 2014 at 10:52 AM,

Re: using Log4j to log INFO level messages on workers

2014-06-04 Thread Shivani Rao
at 7:18 PM, Shivani Rao raoshiv...@gmail.com wrote: Hello Spark fans, I am trying to log messages from my Spark application. When the main() function attempts to log using log.info(), it works great, but when I try the same command from the code that probably runs on the worker, I initially got

using Log4j to log INFO level messages on workers

2014-06-02 Thread Shivani Rao
Hello Spark fans, I am trying to log messages from my Spark application. When the main() function attempts to log using log.info(), it works great, but when I try the same command from the code that probably runs on the worker, I initially got a serialization error. To solve that, I created a
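A common workaround for that serialization error (not necessarily the exact one this thread ended up with, since the preview is cut off) is to keep the logger out of the serialized closure by declaring it @transient lazy, so each executor re-creates it on first use. A minimal sketch, assuming log4j and an existing SparkContext sc:

import org.apache.log4j.Logger

trait LazyLogging extends Serializable {
  // @transient keeps the Logger out of the serialized closure;
  // lazy re-creates it on each executor the first time it is used.
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

class LineProcessor extends LazyLogging {
  def process(line: String): String = {
    log.info("processing on worker: " + line)   // shows up in the executor's log4j output
    line.toUpperCase
  }
}

// driver side
val processor = new LineProcessor
sc.parallelize(Seq("a", "b", "c")).map(processor.process).collect()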

Re: logging in pyspark

2014-05-22 Thread Shivani Rao
I am having trouble adding logging to the class that does serialization and deserialization. Where is the code for org.apache.spark.Logging located, and is it serializable? On Mon, May 12, 2014 at 10:02 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Ah, yes, that is correct. You

Imports that need to be specified in a Spark application jar?

2014-05-20 Thread Shivani Rao
Hello All, I am learning that there are certain imports the Spark REPL (used to invoke and run code in a Spark shell) performs automatically, which I would have to add explicitly if I need the same functionality in a Spark jar run from the command line. I keep running into a serialization error of an
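For reference, the usual suspects are the imports below: the Spark 1.x shell pulls these in automatically, and SparkContext._ is the one whose absence most often surprises people, because pair-RDD methods stop compiling without it. The wrapper object and app name are illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // implicit conversions, e.g. rddToPairRDDFunctions

object StandaloneJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("standalone-job"))
    // Without the SparkContext._ import above, pair-RDD methods such as reduceByKey
    // do not compile in a standalone build even though they work in the shell.
    val counts = sc.parallelize(Seq("a", "b", "a")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}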

Re: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-05-15 Thread Shivani Rao
This is something that I have bumped into time and again. The object that contains your main() should also be serializable; then you won't have this issue. For example: object Test extends Serializable { def main() { // set up spark context // read your data // create your RDDs (grouped by key)
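Fleshing out that sketch into a compilable form (the paths and the grouping key are illustrative): the enclosing object is Serializable, and the SparkContext stays a local val inside main so it is never dragged into the closures that Spark ships to the executors.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // needed for groupByKey on pair RDDs in Spark 1.x

object Test extends Serializable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("test"))
    val data = sc.textFile("hdfs:///input/data.txt")            // read your data
    val grouped = data.map(line => (line.split(",")(0), line))  // key by the first field
                      .groupByKey()                             // RDDs grouped by key
    grouped.saveAsTextFile("hdfs:///output/grouped")
    sc.stop()
  }
}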

Re: Unable to load native-hadoop library problem

2014-05-14 Thread Shivani Rao
Hello Sophia, you are only providing the Spark jar here (admittedly a Spark jar that contains the Hadoop libraries in it, but that is not sufficient). Where is your Hadoop installed? (Most probably /usr/lib/hadoop/*.) You need to add that to your classpath (using -cp), I guess. Let me know

Re: Spark: issues with running a sbt fat jar due to akka dependencies

2014-05-02 Thread Shivani Rao
had the same situation for a while without issues. On May 1, 2014 8:46 PM, Shivani Rao raoshiv...@gmail.com wrote: Hello Koert, that did not work; I specified it in my email already. But I figured out a way around it by excluding the akka dependencies. Shivani On Tue, Apr 29, 2014 at 12:37 PM
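The akka exclusion mentioned here usually looks something like the following in build.sbt (the third-party coordinates and versions are placeholders); the point is to keep a second copy of akka out of the fat jar so it cannot clash with the one Spark ships.

// build.sbt -- coordinates and versions are illustrative
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  ("com.example" %% "some-library" % "1.2.3")
    .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))
)

Marking spark-core itself as provided has a similar effect for everything Spark already bundles, since sbt-assembly leaves provided dependencies out of the fat jar.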

Re: is it possible to initiate Spark jobs from Oozie?

2014-05-02 Thread Shivani Rao
I have mucked around with this a little bit. The first step to make this happen is to build a fat jar. I wrote a quick blog post (http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html) documenting my learning curve w.r.t. that. The next step is to schedule this as a java action. Since

Running Spark jobs via oozie

2014-05-01 Thread Shivani Rao
Hello Spark fans, I am trying to run a Spark job via Oozie as a java action. The Spark code is packaged as MySparkJob.jar, compiled using sbt assembly (excluding the Spark and Hadoop dependencies). I am able to invoke the Spark job from any client using java -cp

Trouble getting hadoop and spark run along side on my vm

2014-03-17 Thread Shivani Rao
From what I understand, getting Spark to run alongside a Hadoop cluster requires the following: a) a working Hadoop installation, b) a compiled Spark, and c) configuration parameters that point Spark to the right Hadoop conf files. i) Can you let me know the specific steps to take after Spark is compiled (via sbt