Spark-submit and multiple files

2015-03-17 Thread poiuytrez
Hello guys, I am having a hard time understanding how spark-submit behaves with multiple files. I have created two code snippets. Each snippet is composed of a main.py and a work.py. The code works if I paste work.py and then main.py into a pyspark shell. However, neither snippet works when using …
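
A minimal sketch of the setup being described, assuming work.py defines a helper that main.py imports; the file names come from the message above, everything else is illustrative:

    # work.py -- helper module that must be shipped to the executors
    def double(x):
        return 2 * x

    # main.py -- driver script
    from pyspark import SparkContext
    from work import double

    sc = SparkContext(appName="MultiFileExample")
    print(sc.parallelize([1, 2, 3]).map(double).collect())  # [2, 4, 6]
    sc.stop()

submitted with, for example: spark-submit --py-files work.py main.py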

Error when using multiple Python files with spark-submit

2015-03-16 Thread poiuytrez
I have a Spark app which is composed of multiple files. When I launch it using: ../hadoop/spark-install/bin/spark-submit main.py --py-files /home/poiuytrez/naive.py,/home/poiuytrez/processing.py,/home/poiuytrez/settings.py --master spark://spark-m:7077 I am getting an error: 15 …
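
For what it's worth, spark-submit treats everything after the application file as arguments to the application itself, so in the command above --py-files and --master are handed to main.py rather than parsed by spark-submit. A sketch of the reordered command, with the paths kept exactly as quoted above:

    ../hadoop/spark-install/bin/spark-submit \
        --master spark://spark-m:7077 \
        --py-files /home/poiuytrez/naive.py,/home/poiuytrez/processing.py,/home/poiuytrez/settings.py \
        main.py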

Spark 1.3 dataframe documentation

2015-02-24 Thread poiuytrez
Hello, I have built Spark 1.3. I can successfully use the DataFrame API. However, I am not able to find its API documentation for Python. Do you know when the documentation will be available? Best Regards, poiuytrez

Movie Recommendation tutorial

2015-02-23 Thread poiuytrez
Hello, I am following the Movie recommendation with MLlib tutorial (https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html). However, I get RMSE values that are much larger than what is written at step 7 ("The best model was trained with rank = 8 and lambda = 1.0, and numIter = …")
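
A minimal sketch of the tutorial's train-and-evaluate step, assuming training and validation are RDDs of Rating(user, product, rating); the rank and lambda values are the ones quoted above, the rest is illustrative:

    from math import sqrt
    from pyspark.mllib.recommendation import ALS

    model = ALS.train(training, rank=8, iterations=10, lambda_=1.0)
    # Join predictions with the held-out ratings on (user, product) keys
    predictions = model.predictAll(validation.map(lambda r: (r[0], r[1])))
    pairs = (predictions.map(lambda r: ((r[0], r[1]), r[2]))
             .join(validation.map(lambda r: ((r[0], r[1]), r[2]))))
    rmse = sqrt(pairs.map(lambda x: (x[1][0] - x[1][1]) ** 2).mean())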

Re: Movie Recommendation tutorial

2015-02-23 Thread poiuytrez
What do you mean?

Implicit ALS with multiple features

2015-02-19 Thread poiuytrez
Hello, I would like to use the Spark MLlib recommendation (collaborative filtering) library. My goal is to predict what a user would like to buy based on what they bought before. I read in the Spark documentation that Spark supports implicit feedback; however, there is no example of this application.
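
A minimal sketch of implicit-feedback ALS for this use case, assuming purchase counts serve as implicit preference strengths; the data and parameters here are illustrative, not from the original message:

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext(appName="ImplicitALS")
    # (user, product, purchase_count) triples treated as implicit feedback
    purchases = sc.parallelize([(1, 10, 3.0), (1, 11, 1.0), (2, 10, 5.0)])
    ratings = purchases.map(lambda p: Rating(p[0], p[1], p[2]))
    model = ALS.trainImplicit(ratings, rank=10, iterations=10, alpha=0.01)
    print(model.predict(1, 10))  # predicted preference of user 1 for product 10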

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread poiuytrez
Very interesting. It works. When I set SPARK_DRIVER_MEMORY=83971m in spark-env.sh or spark-defaults.conf, it works. However, when I set the --driver-memory option with spark-submit, the memory is not allocated to the Spark master (the web UI shows the correct value of spark.driver.memory …
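
For reference, the two settings the message above reports as working; the value is the one quoted there:

    # conf/spark-env.sh
    export SPARK_DRIVER_MEMORY=83971m

    # conf/spark-defaults.conf
    spark.driver.memory 83971m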

Re: OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
SPARK_LOG_DIR=/hadoop/spark/logs
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/hadoop-install/lib/gcs-connector-1.3.2-hadoop1.jar
export MASTER=spark://spark-m:7077

poiuytrez@spark-m:~$ cat ../hadoop/spark-install/conf/spark-defaults.conf
spark.master spark://spark-m:7077
spark.eventLog.enabled true …

OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
Hello guys, I am trying to run a Random Forest on 30 MB of data. I have a cluster of 4 machines; each machine has 106 GB of RAM and 16 cores. I am getting: 15/02/11 11:01:23 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-3] shutting down …
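
A minimal sketch of the kind of call involved, assuming trainingData is an RDD of LabeledPoint; all parameters are illustrative, not the poster's actual settings. Deep trees and large maxBins inflate the model that is assembled on the driver, which is a common source of driver-side OutOfMemoryError even on small inputs:

    from pyspark.mllib.tree import RandomForest

    model = RandomForest.trainClassifier(
        trainingData, numClasses=2, categoricalFeaturesInfo={},
        numTrees=100, featureSubsetStrategy="auto",
        impurity="gini", maxDepth=8, maxBins=32)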

Re: MLlib linking error Mac OS X

2014-10-20 Thread poiuytrez
This is my error:

14/10/17 10:24:56 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
14/10/17 10:24:56 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

However, it seems to work. What does it mean?

MLlib and pyspark features

2014-10-17 Thread poiuytrez
… in the official JIRA, but I could not find any ticket number corresponding to this. Best, poiuytrez

Re: MLlib linking error Mac OS X

2014-10-17 Thread poiuytrez
Hello MLnick, have you found a solution for installing MLlib on Mac OS? I am also having some trouble installing the dependencies. Best, poiuytrez

Re: Spark SQL - Exception only when using cacheTable

2014-10-13 Thread poiuytrez
… ) t.count() Thank you, poiuytrez

Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread poiuytrez
I am using the Python API. Unfortunately, I cannot find the isCached method equivalent in the documentation (https://spark.apache.org/docs/1.1.0/api/python/index.html), in the SQLContext section.

Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread poiuytrez
Hi Cheng, I am using Spark 1.1.0. This is the stack trace:

14/10/10 12:17:40 WARN TaskSetManager: Lost task 120.0 in stage 7.0 (TID 2235, spark-w-0.c.db.internal): java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer …
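
One plausible shape of the problem, sketched under assumptions: if the table's schema declares a column as IntegerType while the underlying data holds Longs, the in-memory columnar format used by cacheTable can surface exactly this Long-to-Integer ClassCastException. Declaring the column as LongType with an explicit schema avoids the mismatch (applySchema and the types below are the Spark 1.1 Python API; rows and the column names are illustrative):

    from pyspark.sql import SQLContext, StructType, StructField, StringType, LongType

    sqlContext = SQLContext(sc)
    schema = StructType([StructField("customer_id", StringType(), True),
                         StructField("purchaseamount", LongType(), True)])
    transactions = sqlContext.applySchema(rows, schema)
    transactions.registerTempTable("transactions")
    sqlContext.cacheTable("transactions")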

Debug a spark task

2014-10-09 Thread poiuytrez
Hi, I am parsing a CSV file with Spark using the map function. One of the lines of the CSV file makes a task fail (and then the whole job fails). Is there a way to do some debugging to find the line that fails? Best regards, poiuytrez
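
A minimal sketch of one common way to locate the bad line: wrap the parser so failures are tagged instead of thrown, then collect the failures. parse_fields and the file path are illustrative:

    def safe_parse(line):
        try:
            return ("ok", parse_fields(line))
        except Exception as e:
            return ("bad", (line, str(e)))

    parsed = sc.textFile("transactions.csv").map(safe_parse).cache()
    print(parsed.filter(lambda t: t[0] == "bad").take(5))  # inspect the failing lines
    records = parsed.filter(lambda t: t[0] == "ok").map(lambda t: t[1])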

Spark SQL - Exception only when using cacheTable

2014-10-09 Thread poiuytrez
Hello, I have a weird issue. This request works fine: sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count() However, when I cache the table before making the request: sqlContext.cacheTable("transactions") sqlContext.sql("SELECT customer_id FROM transactions WHERE …

Re: Debug a spark task

2014-10-09 Thread poiuytrez
Thanks for the tip!