No. I am not setting the number of executors anywhere (neither in the env file
nor in the program).

Is it due to the large number of small files?
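
In case it helps anyone following the thread, below is a minimal, untested
sketch (written against Spark 1.3-era APIs) of the two things in question:
requesting executors explicitly and spreading the many small files over more
partitions. The app name, HDFS path, and numbers are placeholders, and
spark.executor.instances only applies on YARN; the exact knob depends on the
cluster manager.

import org.apache.spark.{SparkConf, SparkContext}

object SmallFilesSketch {
  def main(args: Array[String]): Unit = {
    // Request executors explicitly instead of relying on the defaults; the
    // same effect can be had with spark-submit --num-executors/--executor-cores.
    val conf = new SparkConf()
      .setAppName("small-files-sketch")
      .set("spark.executor.instances", "4") // placeholder: one executor per node
      .set("spark.executor.cores", "4")

    val sc = new SparkContext(conf)

    // Read the many small files with a minimum-partition hint, then repartition
    // so tasks are spread across the cluster rather than piling up on one node.
    val raw = sc.textFile("hdfs:///data/json/*", 64) // illustrative path
    val spread = raw.repartition(64)

    println("record count: " + spread.count())
    sc.stop()
  }
}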

On Wed, May 20, 2015 at 5:11 PM, ayan guha <guha.a...@gmail.com> wrote:

> What does your spark-env file say? Are you setting the number of executors
> in the Spark context?
> On 20 May 2015 13:16, "Shailesh Birari" <sbirar...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
>> of RAM.
>> I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB
>> in size. The total data is around 16 GB, and the Hadoop block size is 256 MB.
>> My application reads these files with the sc.textFile() (or sc.jsonFile(),
>> tried both) API. But all the files are getting read by only one node (4
>> executors). The Spark UI shows all 600K+ tasks on one node and 0 on the
>> other nodes.
>>
>> I confirmed that all files are accessible from all nodes. Another
>> application, which uses big files, uses all nodes on the same cluster.
>>
>> Can you please let me know why it is behaving this way?
>>
>> Thanks,
>>   Shailesh
>>
>>
>>
>>
