Hello,
My Spark application is written in Scala and submitted to a Spark cluster
in standalone mode. The Spark Jobs for my application are listed in the
Spark UI like this:
Job Id Description ...
6 saveAsTextFile at Foo.scala:202
5 saveAsTextFile at Foo.scala:201
4
Hello,
I am trying to use the default Spark cluster manager in a production
environment. I will be submitting jobs with spark-submit. I wonder if the
following is possible:
1. Get the Driver ID from spark-submit. We will use this ID to keep track
of the job and kill it if necessary.
2. Whether
Hello,
I am using the Spark shell in Scala on localhost. I am using sc.textFile
to read a directory. The directory looks like this (generated by another
Spark script):
part-0
part-1
_SUCCESS
part-0 contains four short lines of text, while part-1 contains two short
lines of text.
The number of partitions is controlled by the HDFS input format, and one file
may have multiple partitions if it consists of multiple blocks. In your case,
I think there is one file with 2 splits.
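As a rough sketch of the split arithmetic described above: each file gets at least one split, plus one more for every additional block it spans. This is a hypothetical simplification (the real Hadoop FileInputFormat also applies a split-slop factor and a configurable minimum split size), and the part-file lengths below are made-up values for illustration:

```scala
// Simplified model of how an HDFS-style input format assigns splits:
// one split per block of each file, with at least one split per file.
def numSplits(fileLength: Long, blockSize: Long): Long =
  if (fileLength == 0) 1L else (fileLength + blockSize - 1) / blockSize

// Hypothetical part files with their lengths in bytes.
val files = Seq("part-0" -> 120L, "part-1" -> 60L)
val blockSize = 128L * 1024 * 1024 // default HDFS block size, 128 MB

// Small files each fit in a single block, so each contributes one split.
val totalSplits = files.map { case (_, len) => numSplits(len, blockSize) }.sum
println(totalSplits) // prints 2
```

A file only produces more than one split once its length exceeds the block size, which is why a directory of small part files typically yields exactly one partition per file.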
Thanks.
Zhan Zhang
On Mar 27, 2015, at 3:12 PM, Rares Vernica rvern...@gmail.com wrote:
Hello
Hi,
I have a private cluster with private IPs, 192.168.*.*, and a gateway node
with both private IP, 192.168.*.*, and public internet IP.
I setup the Spark master on the gateway node and set the SPARK_MASTER_IP to
the private IP. I start Spark workers on the private nodes. It works fine.
The
Hello,
I am using takeSample from the Scala Spark 1.2.1 shell:
scala> sc.textFile("README.md").takeSample(false, 3)
and I notice that two jobs are generated on the Spark Jobs page:
Job Id Description
1 takeSample at <console>:13
0 takeSample at <console>:13
Any ideas why the two jobs are needed?