Set Job Descriptions for Scala application

2015-08-05 Thread Rares Vernica
Hello, My Spark application is written in Scala and submitted to a Spark cluster in standalone mode. The Spark Jobs for my application are listed in the Spark UI like this:

Job Id  Description
...
6       saveAsTextFile at Foo.scala:202
5       saveAsTextFile at Foo.scala:201
4
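A custom description can be attached to jobs via the SparkContext before the action that triggers them. A minimal sketch in spark-shell style (the RDD names and output paths below are illustrative, not from the thread):

```scala
// setJobDescription sets the text shown in the UI's "Description" column
// for jobs started afterwards on the same thread.
sc.setJobDescription("Writing cleaned Foo records")
cleaned.saveAsTextFile("hdfs:///out/foo")   // UI shows the custom description

// setJobGroup additionally tags jobs with a group id, so they can be
// cancelled together with sc.cancelJobGroup("foo-export").
sc.setJobGroup("foo-export", "Export phase of the Foo pipeline")
records.saveAsTextFile("hdfs:///out/records")
```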

Driver ID from spark-submit

2015-04-27 Thread Rares Vernica
Hello, I am trying to use the default Spark cluster manager in a production environment. I will be submitting jobs with spark-submit. I wonder if the following is possible: 1. Get the Driver ID from spark-submit. We will use this ID to keep track of the job and kill it if necessary. 2. Whether
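For Spark 1.x standalone mode, submitting in cluster deploy mode makes the master assign a driver ID, which spark-submit prints to stdout. A sketch (master URL, class name, jar, and the driver ID shown are illustrative):

```shell
# Submit in cluster mode; the output includes a line with the assigned
# driver ID, e.g. "driver-20150427123456-0001".
./bin/spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --class com.example.Foo \
  foo-assembly.jar

# That ID can later be used to kill the driver (Spark 1.x standalone):
./bin/spark-class org.apache.spark.deploy.Client kill \
  spark://master:7077 driver-20150427123456-0001
```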

2 input paths generate 3 partitions

2015-03-27 Thread Rares Vernica
Hello, I am using the Spark shell in Scala on localhost. I am using sc.textFile to read a directory. The directory looks like this (generated by another Spark script):

part-0
part-1
_SUCCESS

part-0 has four short lines of text, while part-1 has two short lines of text.

Re: 2 input paths generate 3 partitions

2015-03-27 Thread Rares Vernica
The number of partitions is controlled by the HDFS input format, and one file may have multiple partitions if it consists of multiple blocks. In your case, I think there is one file with 2 splits. Thanks. Zhan Zhang On Mar 27, 2015, at 3:12 PM, Rares Vernica rvern...@gmail.com wrote: Hello
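The splitting behavior described in the reply can be checked directly in spark-shell. A sketch (directory name illustrative):

```scala
// sc.textFile partitions follow Hadoop input splits: one partition per
// split, and a single file spanning multiple blocks yields multiple splits,
// which is why 2 files can produce 3 partitions.
val rdd = sc.textFile("out-dir")   // directory holding part-0, part-1, _SUCCESS
rdd.partitions.length              // inspect how many partitions were created

// A minimum partition count can also be requested explicitly; Hadoop may
// split files further to reach it, but never merges below one per file.
val rdd2 = sc.textFile("out-dir", 2)
```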

Set spark.fileserver.uri on private cluster

2015-03-17 Thread Rares Vernica
Hi, I have a private cluster with private IPs, 192.168.*.*, and a gateway node with both a private IP, 192.168.*.*, and a public internet IP. I set up the Spark master on the gateway node and set SPARK_MASTER_IP to the private IP. I start Spark workers on the private nodes. It works fine. The
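On a multi-homed node like this gateway, the driver can advertise the wrong interface to the workers. A sketch of pinning the driver to the private network in Spark 1.x (property names are from that era and the IP/port values are illustrative; verify against your version's configuration docs):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.driver.host controls the address the driver advertises;
// spark.fileserver.uri is the driver's HTTP file server endpoint that
// workers fetch jars/files from.
val conf = new SparkConf()
  .setAppName("private-cluster-app")
  .set("spark.driver.host", "192.168.1.10")                  // gateway's private IP
  .set("spark.fileserver.uri", "http://192.168.1.10:50000")  // illustrative endpoint
val sc = new SparkContext(conf)
```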

takeSample triggers 2 jobs

2015-03-06 Thread Rares Vernica
Hello, I am using takeSample from the Scala Spark 1.2.1 shell:

scala> sc.textFile("README.md").takeSample(false, 3)

and I notice that two jobs are generated on the Spark Jobs page:

Job Id  Description
1       takeSample at <console>:13
0       takeSample at <console>:13

Any ideas why the two jobs are needed?
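A likely explanation: takeSample needs the RDD's total size before it can choose a sampling fraction, so it first runs a count() (one job) and then a second job to collect the sampled elements. A rough sketch of the two-step behavior in spark-shell (simplified; the real implementation may retry if the sample comes back too small):

```scala
val lines = sc.textFile("README.md")
lines.takeSample(false, 3)             // shows up as two jobs in the UI

// Roughly equivalent two-step view of what happens internally:
val total = lines.count()              // first job: learn the RDD size
val sample = lines                     // second job: draw and collect the sample
  .sample(false, 3.0 / total)
  .take(3)
```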