Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
If you use Scala, you can do: val conf = new SparkConf().setMaster("yarn-client").setAppName("Logistic regression SGD fixed").set("spark.akka.frameSize", "100").setExecutorEnv("SPARK_JAVA_OPTS", "-Dspark.akka.frameSize=100") var sc = new
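The preview above cuts off at the SparkContext creation. A minimal sketch of what the full configuration fragment might look like, assuming a Spark 1.0-era deployment on YARN (the SparkContext line is an assumption, since the archive truncates it; the frame-size value and app name are taken from the snippet):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Configuration sketch reconstructed from the truncated message above.
// Requires Spark (circa 1.0) on the classpath; not runnable standalone.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("Logistic regression SGD fixed")
  // Raise the Akka frame size (in MB) so large task results and
  // gradient updates fit in a single Akka message
  .set("spark.akka.frameSize", "100")
  // Propagate the same setting to the executor JVMs
  .setExecutorEnv("SPARK_JAVA_OPTS", "-Dspark.akka.frameSize=100")
val sc = new SparkContext(conf)
```

The frame-size increase matters here because large sparse SVM models can produce serialized task results bigger than the default Akka frame size, which causes remote actors to drop the connection.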

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
That is exactly the same error that I got. I am still having no success. Regards, Krishna On Mon, Jul 14, 2014 at 11:50 AM, crater cq...@ucmerced.edu wrote: Hi Krishna, Thanks for your help. Are you able to get your 29M data running yet? I fixed the previous problem by setting larger

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
1) the number of partitions, which should match the number of cores; 2) driver memory (you can see it from the executor tab of the Spark WebUI and set it with --driver-memory 10g); 3) the version of Spark you were running. Best, Xiangrui On Mon, Jul 14, 2014 at 12:14 PM, Srikrishna S srikrishna...@gmail.com
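The first of Xiangrui's suggestions, matching the partition count to the core count, can be sketched in Scala; note that driver memory (suggestion 2) must be set from the launch command with --driver-memory before the JVM starts, not from application code. The RDD name and core count below are illustrative assumptions, not values from the thread:

```scala
// Sketch only: assumes an existing SparkContext `sc` and an input RDD `data`.
// Requires a Spark runtime; not runnable standalone.
val numCores = 8  // illustrative: total executor cores available

// Repartition so every core has work in each iteration of SGD;
// too few partitions leaves cores idle, too many adds scheduling overhead.
val repartitioned = data.repartition(numCores)
println(repartitioned.partitions.length)
```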

Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am running logistic regression with SGD on a problem with about 19M parameters (the kdda dataset from the libsvm library). I consistently see that the nodes on my cluster get disconnected, and soon the whole job comes to a grinding halt. 14/07/12 03:05:16 ERROR cluster.YarnClientClusterScheduler:

Re: Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am using the master that I compiled 2 days ago. Can you point me to the JIRA? On Sat, Jul 12, 2014 at 9:13 AM, DB Tsai dbt...@dbtsai.com wrote: Are you using 1.0 or current master? A bug related to this is fixed in master. On Jul 12, 2014 8:50 AM, Srikrishna S srikrishna...@gmail.com wrote

Job getting killed

2014-07-11 Thread Srikrishna S
I am trying to run Logistic Regression on the url dataset (from libsvm), using the exact same code as the example, on a 5-node YARN cluster. I get a pretty cryptic error that says "Killed". Nothing more. Settings: --master yarn-client --verbose --driver-memory 24G --executor-memory 24G

Re: Spark Installation

2014-07-08 Thread Srikrishna S
On Mon, Jul 7, 2014 at 8:07 PM, Srikrishna S srikrishna...@gmail.com wrote: Hi All, Does anyone know what the command line arguments to mvn are to generate the pre-built binary for Spark on Hadoop 2 (CDH5)? I would like to pull in a recent bug fix in spark-master and rebuild the binaries

Spark Installation

2014-07-07 Thread Srikrishna S
Hi All, Does anyone know what the command line arguments to mvn are to generate the pre-built binary for Spark on Hadoop 2 (CDH5)? I would like to pull in a recent bug fix in spark-master and rebuild the binaries in the exact same way as the binaries provided on the website. I have tried

Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
Hi All, I am new to Spark and I am trying to run LogisticRegression (with SGD) using MLlib on a beefy single machine with about 128 GB RAM. The dataset has about 80M rows with only 4 features, so it barely occupies 2 GB on disk. I am running the code using all 8 cores with 20G memory using
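A run like the one described (LogisticRegressionWithSGD over a LIBSVM-format file) would look roughly like this in Scala with the 1.0-era MLlib API; the file path and iteration count are illustrative assumptions, not values from the original message:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

// Sketch only: assumes an existing SparkContext `sc` and a Spark runtime.
// Load a LIBSVM-format dataset as an RDD[LabeledPoint]; path is a placeholder.
val training = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/data.libsvm").cache()

// Train logistic regression with stochastic gradient descent.
// The iteration count is an illustrative choice, not from the thread.
val model = LogisticRegressionWithSGD.train(training, 100)
println(s"Learned ${model.weights.size} weights")
```

Caching the training RDD before iterating matters for the slowness complained about here: without cache(), each SGD iteration re-reads and re-parses the input from disk.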

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
it somehow becomes larger, though it seems unlikely that it would exceed 20 GB) and 2) how many parallel tasks run in each iteration. Matei On Jun 4, 2014, at 6:56 PM, Srikrishna S srikrishna...@gmail.com wrote: I am using the MLLib one (LogisticRegressionWithSGD) with PySpark. I am