Hi, I have a 2-node YARN cluster and I am using Spark 1.1.0 to submit my jobs.
As per the Spark documentation, the default number of cores is the maximum available on the node. Does that mean each node creates as many threads as it has cores to process the tasks assigned to it? For example:

    List<Integer> data = new ArrayList<Integer>();
    for (int i = 0; i < 1000; i++) {
        data.add(i);
    }
    JavaRDD<Integer> distData = sc.parallelize(data);
    distData = distData.map(new Function<Integer, Integer>() {
        @Override
        public Integer call(Integer arg0) throws Exception {
            return arg0 * arg0;
        }
    });
    distData.count();

The program above divides my RDD into 2 partitions of 500 elements each and submits them to the executors.

1) So each executor will use all the cores of the node's CPU to process its 500-element partition, am I right?
2) If so, does that mean each executor uses multithreading? Is the execution on a node parallel or sequential?
3) How can I check how many cores an executor is using to process my jobs?
4) Is there any way to control how the data is divided into partitions across the nodes?

Please give some clarity on the above.

Thanks & Regards,
Naveen
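On question 2, here is what I imagine is happening inside one executor, written as plain Java so I can describe it concretely. This is only my mental model, not Spark's actual scheduler: a pool with one worker thread per core (2 here, as a stand-in for 2 executor cores) squaring the 500 elements of a partition concurrently. Please correct me if the real execution is different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorThreadsSketch {
    // Square every element using a fixed pool of `threads` worker threads,
    // the way I *assume* an executor with that many cores runs tasks in
    // parallel. Plain Java illustration only, not Spark internals.
    static List<Integer> squareAll(List<Integer> data, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
        for (final Integer x : data) {
            futures.add(pool.submit(new Callable<Integer>() {
                public Integer call() {
                    return x * x;
                }
            }));
        }
        // Collect results in submission order.
        List<Integer> out = new ArrayList<Integer>();
        for (Future<Integer> f : futures) {
            out.add(f.get());
        }
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 500; i++) data.add(i);
        List<Integer> squares = squareAll(data, 2);
        System.out.println(squares.get(499)); // prints 249001
    }
}
```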
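On question 4, I noticed that parallelize also takes a partition count, i.e. sc.parallelize(data, 4), which I believe would split my 1000 elements into 4 partitions instead of 2. Below is a plain-Java sketch (my assumption, without Spark on the classpath) of the even contiguous split I expect such a call to produce:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // My guess at how sc.parallelize(data, numSlices) divides a collection:
    // slice i covers the index range [i*len/numSlices, (i+1)*len/numSlices).
    static <T> List<List<T>> split(List<T> data, int numSlices) {
        List<List<T>> slices = new ArrayList<List<T>>();
        int len = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * len / numSlices);
            int end = (int) ((long) (i + 1) * len / numSlices);
            slices.add(new ArrayList<T>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++) data.add(i);
        List<List<Integer>> slices = split(data, 4);
        System.out.println(slices.size());        // prints 4
        System.out.println(slices.get(0).size()); // prints 250
    }
}
```

Is this the right way to control the batch division, or is there a recommended setting for it?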