Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-09 Thread vs
The Hortonworks Tech Preview of Spark is for Spark on YARN. It does not
require Spark to be installed on every node manually. The Spark assembly jar
you submit contains all of Spark's dependencies; YARN instantiates the Spark
Application Master and its containers from that jar.
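
A minimal sketch of what that submission looks like in practice (my assumptions: the stock Spark 1.0 binary layout, and that spark-submit may upload the local assembly jar to the HDFS staging directory when SPARK_JAR is not set):

  cd spark-1.0.0-bin-hadoop2
  # point the client at the cluster's YARN/HDFS configuration
  export HADOOP_CONF_DIR=/etc/hadoop/conf
  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    lib/spark-examples-1.0.0-hadoop2.2.0.jar 10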





Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-08 Thread Sean Owen
On Tue, Jul 8, 2014 at 2:01 AM, DB Tsai dbt...@dbtsai.com wrote:

 Actually, the mode that requires installing the jar on each individual node is
 standalone mode, which works with both MR1 and MR2. Cloudera and
 Hortonworks currently support Spark in this way, as far as I know.


(CDH5 uses Spark on YARN.)


Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
guys, I'm not talking about running Spark on a VM, I don't have a problem with that.

What confuses me is this:
1) Hortonworks describes the installation process as RPMs on each node
2) the Spark home page says that all I need is YARN

And I'm stuck on understanding what I need to do to run Spark on YARN
(do I need the RPM installations, or only a Spark build on the edge node?)


Thank you,
Konstantin Kudryavtsev



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Krishna Sankar
Konstantin,

   1. You need to install the Hadoop RPMs on all nodes. For Hadoop 2,
   that gives the nodes HDFS and YARN.
   2. Then you need to install Spark on all nodes. I haven't had experience
   with HDP, but the tech preview might have installed Spark as well.
   3. In the end, one should have HDFS, YARN, and Spark installed on all the
   nodes.
   4. After installation, check the web console to make sure HDFS, YARN, and
   Spark are running (see the quick checks sketched below).
   5. Then you are ready to start experimenting with/developing Spark
   applications.
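
For step 4, a few command-line checks as well (a sketch only; these are stock Hadoop 2 commands and default web-console ports, which may differ on your distribution):

  # are the daemons up on this node?
  jps | egrep 'NameNode|DataNode|ResourceManager|NodeManager'
  # HDFS capacity and live datanodes
  hdfs dfsadmin -report
  # NodeManagers registered with the ResourceManager
  yarn node -list
  # web consoles (defaults): NameNode http://namenode:50070, ResourceManager http://rm:8088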

HTH.
Cheers
k/



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
thank you Krishna!

Could you please explain why I need to install Spark on each node, if the Spark
official site says: "If you have a Hadoop 2 cluster, you can run Spark
without any installation needed"?

I have HDP 2 (YARN), and that's why I hope I don't need to install Spark on
each node.

Thank you,
Konstantin Kudryavtsev



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Chester @work
In YARN cluster mode, you can either have Spark on all the cluster nodes or
supply the Spark jar yourself. In the second case, you don't need to install Spark on
the cluster at all, since you supply the Spark assembly as well as your app jar
together.

I hope this makes it clear

Chester

Sent from my iPhone


Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread Konstantin Kudryavtsev
Hi Chester,

Thank you very much, it is clear now - just two different ways to support
Spark on a cluster.

Thank you,
Konstantin Kudryavtsev



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-07 Thread DB Tsai
Actually, the mode that requires installing the jar on each individual node is
standalone mode, which works with both MR1 and MR2. Cloudera and
Hortonworks currently support Spark in this way, as far as I know.

In both yarn-cluster and yarn-client mode, Spark distributes the jars
through the distributed cache, and each executor can find the jars there.
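
For reference, a minimal sketch of the two submission modes, reusing the SparkPi jar path already quoted in this thread (the trailing argument is the number of partitions):

  # yarn-cluster: the driver runs inside the YARN Application Master
  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10

  # yarn-client: the driver runs locally; only the executors run in YARN containers
  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10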


Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread vs
Konstantin,

HWRK provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you can try
from
http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf

Let me know if you see issues with the tech preview.

spark PI example on HDP 2.0

I downloaded the Spark 1.0 pre-built package from http://spark.apache.org/downloads.html
(for HDP2)
and ran the example from the Spark web site:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g
--executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2

I got error:
Application application_1404470405736_0044 failed 3 times due to AM
Container for appattempt_1404470405736_0044_03 exited with exitCode: 1
due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.

Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1,
--num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options] 
Options:
  --jar JAR_PATH   Path to your application's JAR file (required)
  --class CLASS_NAME   Name of your application's main class (required)
...bla-bla-bla
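
One possible reading of the "Unknown/unsupported param" message (an assumption, not confirmed in this thread): YARN may be launching an older Spark ApplicationMaster, e.g. the 0.9.1 assembly from the tech preview, which predates the --executor-* option names used by the 1.0 client. A sketch of forcing the matching 1.0 assembly via the SPARK_JAR environment variable, with a hypothetical HDFS path:

  hdfs dfs -put lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/
  export SPARK_JAR=hdfs:///user/spark/spark-assembly-1.0.0-hadoop2.2.0.jar
  ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
    --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 \
    ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2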






Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
Hello, thanks for your message... I'm confused: Hortonworks suggests installing
the Spark RPM on each node, but the Spark main page says that YARN is enough and I
don't need to install it... What is the difference?

sent from my HTC



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Marco Shaw
Can you provide links to the sections that are confusing?

My understanding is that the HDP1 binaries do not need YARN, while the HDP2
binaries do.

Now, you can also install the Hortonworks Spark RPM...

For production, in my opinion, RPMs are better for manageability.



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Konstantin Kudryavtsev
Marco,

Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you
can try from
http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
HDP 2.1 means YARN, yet at the same time they propose to install RPMs.

On the other hand, http://spark.apache.org/ says:
"Integrated with Hadoop

Spark can run on Hadoop 2's YARN cluster manager, and can read any existing
Hadoop data.
If you have a Hadoop 2 cluster, you can run Spark without any installation
needed."

And this is confusing for me... do I need the RPM installation or not?


Thank you,
Konstantin Kudryavtsev






Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Marco Shaw
That is confusing based on the context you provided. 

This might take more time than I can spare to try to understand. 

For sure, you need to add Spark to run it in/on the HDP 2.1 express VM. 

Cloudera's CDH 5 express VM includes Spark, but the service isn't running by 
default. 

I can't remember for MapR...

Marco



Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Robert James
I can say from my experience that getting Spark to work with Hadoop 2
is not for the beginner; after solving one problem after another
(dependencies, scripts, etc.), I went back to Hadoop 1.

Spark's Maven build, EC2 scripts, and others all default to Hadoop 1 - not sure
why, but, given that, Hadoop 2 has too many bumps.
