Hi,

I have a 2-node YARN cluster and I am using Spark 1.1.0 to submit my tasks.

As per the Spark documentation, the number of cores used is the maximum number of cores 
available. Does that mean each node creates one thread per core to process the part of 
the job assigned to that node?
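
Is the relevant setting here spark.executor.cores? I do not set it explicitly today, so my 
understanding (which may be wrong) is that it would be configured roughly like this, with 
the value just illustrative:

        // Just a guess on my part: request 2 cores per executor on YARN.
        SparkConf conf = new SparkConf().set("spark.executor.cores", "2");
        JavaSparkContext sc = new JavaSparkContext(conf);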

For example:

        // Build a list of the integers 0..999 on the driver.
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++)
               data.add(i);

        // sc is the JavaSparkContext created earlier in the program.
        JavaRDD<Integer> distData = sc.parallelize(data);

        // Square every element of the RDD.
        distData = distData.map(new Function<Integer, Integer>() {
               @Override
               public Integer call(Integer arg0) throws Exception {
                      return arg0 * arg0;
               }
        });

        distData.count();

The above program divides my RDD into 2 partitions of 500 elements each and submits them 
to the executors.
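
To confirm the split I printed the partition count (I am assuming partitions() is the 
right call in the Java API):

        // Prints the number of partitions of the RDD -- 2 in my case.
        System.out.println("partitions = " + distData.partitions().size());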


1)  So each executor will use all the cores of the node's CPU to process its 500-element 
partition, am I right?


2)  If so, does that mean each executor uses multithreading? Is the execution parallel or 
sequential on a node?


3)  How can I check how many cores an executor is using to process my jobs?


4)  Is there any way to control how the data is divided into batches across the nodes? 
(See my guess below.)
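
For question 4, would passing an explicit slice count to parallelize be the right way to 
do this? Something like the following (just my guess):

        // Ask Spark for 4 partitions instead of the default 2.
        JavaRDD<Integer> distData = sc.parallelize(data, 4);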





Please give some clarity on the above.

Thanks & Regards,
Naveen
