You may have to do "sudo jps", because it should definitely list your processes.
On Mon, Jun 2, 2014 at 11:32 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
> If it matters, I have servers running at http://hivecluster2:4040/stages/
> and http://hivecluster2:4041/stages/
>
> When I run rdd.first, I see an item at http://hivecluster2:4041/stages/
> but no tasks are running. Stage ID 1, first at <console>:46, Tasks:
> Succeeded/Total 0/16.
>
> On Mon, Jun 2, 2014 at 10:09 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
> > Looks like just worker and master processes are running:
> >
> > [hivedata@hivecluster2 ~]$ jps
> > 10425 Jps
> >
> > [hivedata@hivecluster2 ~]$ ps aux|grep spark
> > hivedata 10424 0.0 0.0 103248 820 pts/3 S+ 10:05 0:00 grep spark
> >
> > root 10918 0.5 1.4 4752880 230512 ? Sl May27 41:43 java -cp :/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/conf:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/jline.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path=/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip hivecluster2 --port 7077 --webui-port 18080
> >
> > root 12715 0.0 0.0 148028 656 ? S May27 0:00 sudo /opt/cloudera/parcels/SPARK/lib/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://hivecluster2:7077
> >
> > root 12716 0.3 1.1 4155884 191340 ? Sl May27 30:21 java -cp :/opt/cloudera/parcels/SPARK/lib/spark/conf:/opt/cloudera/parcels/SPARK/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/jline.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path=/opt/cloudera/parcels/SPARK/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://hivecluster2:7077
> > On Sun, Jun 1, 2014 at 7:41 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> >> Sounds like you have two shells running, and the first one is taking all
> >> your resources. Do a "jps" and kill the other guy, then try again.
> >>
> >> By the way, you can look at http://localhost:8080 (replace localhost with
> >> the server your Spark Master is running on) to see what applications are
> >> currently started, and what resource allocations they have.
> >>
> >> On Sun, Jun 1, 2014 at 6:47 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >>> Thanks again. Run results here:
> >>> https://gist.github.com/rjurney/dc0efae486ba7d55b7d5
> >>>
> >>> This time I get a port already in use exception on 4040, but it isn't
> >>> fatal. Then when I run rdd.first, I get this over and over:
> >>>
> >>> 14/06/01 18:35:40 WARN scheduler.TaskSchedulerImpl: Initial job has not
> >>> accepted any resources; check your cluster UI to ensure that workers are
> >>> registered and have sufficient memory
> >>>
> >>> On Sun, Jun 1, 2014 at 3:09 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> >>>> You can avoid that by using the constructor that takes a SparkConf, a la
> >>>>
> >>>> val conf = new SparkConf()
> >>>> conf.setJars(Seq("avro.jar", ...))
> >>>> val sc = new SparkContext(conf)
> >>>>
> >>>> On Sun, Jun 1, 2014 at 2:32 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >>>>> Followup question: the docs to make a new SparkContext require that I
> >>>>> know where $SPARK_HOME is. However, I have no idea. Any idea where that
> >>>>> might be?
> >>>>>
> >>>>> On Sun, Jun 1, 2014 at 10:28 AM, Aaron Davidson <ilike...@gmail.com> wrote:
> >>>>>> Gotcha. The easiest way to get your dependencies to your Executors
> >>>>>> would probably be to construct your SparkContext with all necessary
> >>>>>> jars passed in (as the "jars" parameter), or inside a SparkConf with
> >>>>>> setJars(). Avro is a "necessary jar", but it's possible your
> >>>>>> application also needs to distribute other ones to the cluster.
> >>>>>>
> >>>>>> An easy way to make sure all your dependencies get shipped to the
> >>>>>> cluster is to create an assembly jar of your application; then you
> >>>>>> just need to tell Spark about that jar, which includes all your
> >>>>>> application's transitive dependencies. Maven and sbt both have pretty
> >>>>>> straightforward ways of producing assembly jars.
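> >>>>>>
> >>>>>> For instance, with the sbt-assembly plugin (a sketch, assuming an
> >>>>>> sbt-assembly 0.11.x setup; the plugin version, project name, and jar
> >>>>>> name are illustrative):
> >>>>>>
> >>>>>>   // project/plugins.sbt
> >>>>>>   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
> >>>>>>
> >>>>>>   // build.sbt
> >>>>>>   import AssemblyKeys._
> >>>>>>
> >>>>>>   assemblySettings
> >>>>>>
> >>>>>>   name := "my-spark-app"
> >>>>>>
> >>>>>>   jarName in assembly := "my-spark-app-assembly.jar"
> >>>>>>
> >>>>>> Running "sbt assembly" then produces a single jar containing the
> >>>>>> application and its transitive dependencies, which can be passed to
> >>>>>> setJars() as above.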
> >>>>>> On Sat, May 31, 2014 at 11:23 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >>>>>>> Thanks for the fast reply.
> >>>>>>>
> >>>>>>> I am running CDH 4.4 with the Cloudera Parcel of Spark 0.9.0, in
> >>>>>>> standalone mode.
> >>>>>>>
> >>>>>>> On Saturday, May 31, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
> >>>>>>>> First issue was because your cluster was configured incorrectly. You
> >>>>>>>> could probably read 1 file because that was done on the driver node,
> >>>>>>>> but when it tried to run a job on the cluster, it failed.
> >>>>>>>>
> >>>>>>>> Second issue, it seems that the jar containing avro is not getting
> >>>>>>>> propagated to the Executors. What version of Spark are you running
> >>>>>>>> on? What deployment mode (YARN, standalone, Mesos)?
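> >>>>>>>>
> >>>>>>>> In Spark 0.9 the jars can also be passed straight to the
> >>>>>>>> SparkContext constructor (a sketch; the Spark home below is the CDH
> >>>>>>>> parcel path from the ps output in this thread, and the jar paths
> >>>>>>>> are illustrative):
> >>>>>>>>
> >>>>>>>>   import org.apache.spark.SparkContext
> >>>>>>>>
> >>>>>>>>   // master URL, app name, Spark home, and jars to ship to executors
> >>>>>>>>   val sc = new SparkContext(
> >>>>>>>>     "spark://hivecluster2:7077",
> >>>>>>>>     "avro-reader",                            // illustrative app name
> >>>>>>>>     "/opt/cloudera/parcels/SPARK/lib/spark",  // SPARK_HOME on CDH parcels
> >>>>>>>>     Seq("/path/to/avro.jar"))                 // illustrative jar path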
> >>>>>>>> On Sat, May 31, 2014 at 9:37 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >>>>>>>> Now I get this:
> >>>>>>>>
> >>>>>>>> scala> rdd.first
> >>>>>>>>
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 4 (first at <console>:41) with 1 output partitions (allowLocal=true)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 4 (first at <console>:41)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Computing the requested partition locally
> >>>>>>>> 14/05/31 21:36:28 INFO rdd.HadoopRDD: Input split: hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Job finished: first at <console>:41, took 0.037371256 s
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 5 (first at <console>:41) with 16 output partitions (allowLocal=true)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 5 (first at <console>:41)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37), which has no missing parents
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting 16 missing tasks from Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 16 tasks
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as TID 92 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as 1294 bytes in 1 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as TID 93 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as TID 94 on executor 4: hivecluster4 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as 1294 bytes in 1 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as TID 95 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as TID 96 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as TID 97 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as TID 98 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as TID 99 on executor 4: hivecluster4 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as TID 100 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as TID 101 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as TID 102 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as TID 103 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as TID 104 on executor 4: hivecluster4 (N
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com