Hi there,

We use something like:


/*
 * Force Spark to initialise defaultParallelism by executing a dummy parallel
 * operation, then return the resulting defaultParallelism.
 */
private int getWorkerCount(JavaSparkContext sparkContext) {
    sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();
    return sparkContext.defaultParallelism();
}


It's useful for setting certain pool sizes dynamically, such as:


sparkContext.hadoopConfiguration().set("fs.s3a.connection.maximum",
        Integer.toString(workerCount * 2));

This works in our Spark 3.0.1 code; we're just migrating to 3.2.1 now.
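
For completeness, here's roughly how the two pieces fit together when starting from a SparkSession. This is a minimal sketch only; the class name, app name and variable names are illustrative, not our production code:

import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class ConnectionPoolSizing {

    // Run a dummy job so defaultParallelism reflects the live executors,
    // then return it.
    private static int getWorkerCount(JavaSparkContext sparkContext) {
        sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();
        return sparkContext.defaultParallelism();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("connection-pool-sizing")  // illustrative name
                .getOrCreate();

        // Wrap the underlying SparkContext to get the Java-friendly API.
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        int workerCount = getWorkerCount(jsc);

        // Size the S3A connection pool relative to the available parallelism.
        jsc.hadoopConfiguration().set(
                "fs.s3a.connection.maximum",
                Integer.toString(workerCount * 2));

        spark.stop();
    }
}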

Cheers,

Steve C

On 8 Jun 2022, at 4:28 pm, Poorna Murali <poornamur...@gmail.com> wrote:

Hi,

I would like to know if it is possible to get the count of live master and
worker Spark nodes running in a system.

Please help clarify this.

Thanks & Regards,
Poorna

