Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Alonso Isidoro Roman
Hi, just to update the thread: I have just submitted a simple word count job on YARN with this command: [cloudera@quickstart simple-word-count]$ spark-submit --class com.example.Hello --master yarn --deploy-mode cluster --driver-memory 1024Mb --executor-memory 1G --executor-cores 1 ...
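
For reference, a minimal word count along the lines of the com.example.Hello class above might look like this sketch; the actual class is not shown in the thread, so the input and output paths are assumptions. Note that no master is set in code, so the --master flag from spark-submit takes effect:

package com.example

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the word-count class submitted above; no
// .setMaster here, so spark-submit's --master yarn decides where it runs.
object Hello {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hello word count")
    val sc = new SparkContext(conf)

    // Input/output paths are assumptions, not taken from the thread.
    sc.textFile("hdfs:///user/cloudera/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///user/cloudera/wordcount-output")

    sc.stop()
  }
}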

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Mich Talebzadeh
Have you tried master local? That should work. This works as a test: ${SPARK_HOME}/bin/spark-submit \ --driver-memory 2G \ --num-executors 1 \ --executor-memory 2G \ --master local[2] \ --executor-cores 2 \ ...
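
The same quick test can also be expressed directly in code rather than on the command line; a minimal sketch, assuming a placeholder app name:

import org.apache.spark.{SparkConf, SparkContext}

// In-code equivalent of --master local[2]: two worker threads, no cluster.
// "LocalTest" is a placeholder app name, not from the thread.
val conf = new SparkConf().setAppName("LocalTest").setMaster("local[2]")
val sc = new SparkContext(conf)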

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Alonso Isidoro Roman
Hi guys, I finally understand that I cannot use sbt-pack to run the spark-streaming job programmatically as a unix command; I have to use YARN or Mesos in order to run the jobs. I still have some doubts: if I run the spark streaming jobs in yarn-client mode, I am receiving this exception: ...

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-04 Thread Mich Talebzadeh
Hi, Spark works in local, standalone and yarn-client mode. Start with master = local; that is the simplest model. You do NOT need to start $SPARK_HOME/sbin/start-master.sh and $SPARK_HOME/sbin/start-slaves.sh. Also, you do not need to specify all that in spark-submit. In the Scala code you can do: val ...
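
The message is cut off at the val; presumably it was along these lines, with the master string choosing among the three modes listed (the app name and standalone host are assumptions):

// Simplest model: local mode, using all available cores.
val sparkConf = new SparkConf()
  .setAppName("MyApp")     // placeholder name
  .setMaster("local[*]")
// For the other modes mentioned (Spark 1.x syntax):
//   standalone:  .setMaster("spark://quickstart.cloudera:7077")  // host assumed
//   yarn-client: .setMaster("yarn-client")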

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-04 Thread Alonso Isidoro Roman
Hi David, but removing the setMaster line provokes this error: org.apache.spark.SparkException: A master URL must be set in your configuration at org.apache.spark.SparkContext.<init>(SparkContext.scala:402) at example.spark.AmazonKafkaConnector$.main(AmazonKafkaConnectorWithMongo.scala:93) at ...
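
The exception makes sense if the job is launched directly (for example via the sbt-pack scripts) rather than through spark-submit, since then nothing supplies a master. One possible compromise, a sketch not taken from the thread, is to fall back to local mode only when the launcher did not provide one:

import org.apache.spark.{SparkConf, SparkContext}

// App name borrowed from the stack trace above; the fallback logic is an
// assumption. spark-submit --master yarn still wins, because it sets
// spark.master before this code runs.
val conf = new SparkConf().setAppName("AmazonKafkaConnector")
if (!conf.contains("spark.master")) {
  conf.setMaster("local[*]") // direct `java -cp ...` runs still work
}
val sc = new SparkContext(conf)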

RE: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread David Newberger
Alonso, I could totally be misunderstanding something or missing a piece of the puzzle; however, remove .setMaster. If you do that, it will run with whatever the CDH VM is set up for, which in the out-of-the-box default case is YARN and client mode. val sparkConf = new SparkConf().setAppName(“Some App ...
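
David's snippet is cut off; presumably it continued roughly as follows (the app name and the absence of further settings are assumptions):

import org.apache.spark.{SparkConf, SparkContext}

// App name only, no .setMaster: the CDH VM's defaults (YARN, client
// deploy mode) then decide where the job runs.
val sparkConf = new SparkConf().setAppName("Some App Name")
val sc = new SparkContext(sparkConf)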

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread Alonso Isidoro Roman
Thank you David. So I would have to change the way I am creating the SparkConf object, wouldn't I? I can see in this link that the way to run a spark job using YARN is using this ...

RE: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread David Newberger
Alonso, the CDH VM uses YARN and the default deploy mode is client. I’ve been able to use the CDH VM for many learning scenarios. http://www.cloudera.com/documentation/enterprise/latest.html http://www.cloudera.com/documentation/enterprise/latest/topics/spark.html David Newberger

About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread Alonso
Hi, I am developing a project that needs to use Kafka, spark-streaming and spark-mllib; this is the github project. I am using a VMware cdh-5.7.0 image with 4 cores and 8 GB of RAM, and the file that I want to use is only 16 ...
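
CDH 5.7 ships Spark 1.6, so the Kafka side of a project like this would presumably use the direct-stream API of that era; a minimal sketch, in which the broker address and topic name are placeholders rather than values from the project:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Spark 1.6-style Kafka direct stream; broker and topic are assumptions.
object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "quickstart.cloudera:9092")
    val topics = Set("ratings") // placeholder topic name

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print() // print message values each batch

    ssc.start()
    ssc.awaitTermination()
  }
}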