Re: Getting error running MLlib example with new cluster
Got it to work on the cluster by changing the master to yarn-cluster instead of local! I do have a couple of follow-up questions. This is the example I was trying to run:
https://github.com/holdenk/learning-spark-examples/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala

1) The example still takes about 1 minute 15 seconds to run (my cluster has 3 m3.large nodes). This seems really long for building a model from data that is about 10 lines long. Is this normal?

2) Any guesses as to why it was able to run on the cluster, but not locally?

Thanks for the help!

On Mon, Apr 27, 2015 at 11:48 AM, Su She suhsheka...@gmail.com wrote: [...]
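For reference, the only change that made it run was the --master flag; a minimal sketch of the working submit command, assuming the same CDH parcel path, jars directory, and jar name as in my original spark-submit command:

```shell
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit \
  --class MLlib \
  --master yarn-cluster \
  --jars $(echo /home/ec2-user/sparkApps/learning-spark/lib/*.jar | tr ' ' ',') \
  /home/ec2-user/sparkApps/learning-spark/target/simple-project-1.1.jar
```

With yarn-cluster the driver itself runs inside a YARN container, so the application output ends up in the YARN container logs rather than on the submitting console.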
Re: Getting error running MLlib example with new cluster
That is mostly the YARN overhead: you're starting up a container for the ApplicationMaster and one for each executor, at least. That still sounds pretty slow, but the defaults aren't tuned for fast startup.

On May 11, 2015 7:00 PM, Su She suhsheka...@gmail.com wrote: [...]
Re: Getting error running MLlib example with new cluster
How did you run the example app? Did you use spark-submit? -Xiangrui

On Thu, Apr 23, 2015 at 2:27 PM, Su She suhsheka...@gmail.com wrote: [...]
Re: Getting error running MLlib example with new cluster
Hello Xiangrui,

I am using this spark-submit command (as I do for all other jobs):

/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit --class MLlib --master local[2] --jars $(echo /home/ec2-user/sparkApps/learning-spark/lib/*.jar | tr ' ' ',') /home/ec2-user/sparkApps/learning-spark/target/simple-project-1.1.jar

Thank you for the help!

Best,

Su

On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng men...@gmail.com wrote: [...]
Getting error running MLlib example with new cluster
I had asked this question before, but wanted to ask again as I think it is related to my pom file or project setup. I have been trying on/off for the past month to try to run this MLlib example:
Getting error running MLlib example with new cluster
Sorry, accidentally sent the last email before finishing. I had asked this question before, but wanted to ask again as I think it is now related to my pom file or project setup. Really appreciate the help!

I have been trying on and off for the past month to run this MLlib example:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala

I am able to build the project successfully. When I run it, it prints:

features in spam: 8
features in ham: 7

and then freezes. According to the UI, the description of the job is "count at DataValidators.scala:38", which corresponds to this line in the code:

val model = lrLearner.run(trainingData)

I've tried just about everything I can think of: changed numFeatures from 1 to 10,000, set executor memory to 1g, and set up a new cluster. At this point I think I might be missing dependencies, as that has usually been the problem in other Spark apps I have tried to run. This is my pom file, which I have used for other successful Spark apps. Please let me know if you think I need any additional dependencies, if there are incompatibility issues, or if there is a pom.xml that would be better to use. Thank you!
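For context, the part of the example that hangs is roughly this (paraphrased from the linked MLlib.scala; the exact file names and numFeatures value may differ from the book's copy):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

// Load one email per line and map each to a term-frequency vector.
val spam = sc.textFile("files/spam.txt")
val ham = sc.textFile("files/ham.txt")
val tf = new HashingTF(numFeatures = 100)
val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

// Label spam 1 and ham 0, then train a logistic regression model.
val positiveExamples = spamFeatures.map(f => LabeledPoint(1, f))
val negativeExamples = hamFeatures.map(f => LabeledPoint(0, f))
val trainingData = positiveExamples.union(negativeExamples).cache()
val lrLearner = new LogisticRegressionWithSGD()
// This is the step that never finishes ("count at DataValidators.scala:38").
val model = lrLearner.run(trainingData)
```

The count in DataValidators is MLlib validating the labels of trainingData before training, so the freeze happens on the first real action over the input RDDs.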
Cluster information:
Spark version: 1.2.0-SNAPSHOT (in my older cluster it is 1.2.0)
Java version: 1.7.0_25
Scala version: 2.10.4
Hadoop version: 2.5.0-cdh5.3.3 (older cluster was 5.3.0)

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <groupId>edu.berkely</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>http://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>
  <build>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <id>compile</id>
            <goals><goal>compile</goal></goals>
            <phase>compile</phase>
          </execution>
          <execution>
            <id>test-compile</id>
            <goals><goal>testCompile</goal></goals>
            <phase>test-compile</phase>
          </execution>
          <execution>
            <phase>process-resources</phase>
            <goals><goal>compile</goal></goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0-cdh5.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.5.0-mr1-cdh5.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.4</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-compiler</artifactId>
      <version>2.10.4</version>
    </dependency>
    <dependency>
      <groupId>com.101tec</groupId>
      <artifactId>zkclient</artifactId>
      <version>0.3</version>
    </dependency>
    <dependency>
      <groupId>com.yammer.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
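One thing worth checking, since the app is MLlib code: the dependency list above has spark-core but no MLlib artifact, at least in the portion quoted before the message is cut off. If the local run fails on missing MLlib classes, adding the matching artifact might help; this is only a guess, and the version below simply mirrors the spark-core line:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <version>1.2.0-cdh5.3.0</version>
</dependency>
```

On a CDH cluster the installed Spark assembly supplies the MLlib classes at runtime, which could also explain a job that runs on yarn-cluster but not locally from a thin jar.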