[ https://issues.apache.org/jira/browse/PIG-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689310#comment-15689310 ]
liyunzhang_intel commented on PIG-4952:
---------------------------------------

[~nkollar]: thanks for taking on this big piece of work. The goal of this JIRA is to calculate a proper value of parallelism according to the CPU or memory resources available on the cluster. The article [How-to: Tune Your Apache Spark Jobs (Part 2)|http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/] says:
{quote}
The primary concern is that the number of tasks will be too small. If there are fewer tasks than slots available to run them in, the stage won’t be taking advantage of all the CPU available.
{quote}
Currently I am not very clear on how to calculate the best value of parallelism from the CPU or memory resources. Do you have any thoughts? (A minimal sketch of one possible heuristic follows below, after the issue details.)

> Calculate the value of parallelism for spark mode
> -------------------------------------------------
>
>                 Key: PIG-4952
>                 URL: https://issues.apache.org/jira/browse/PIG-4952
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-4952_1.patch
>
>
> Calculate the value of parallelism for spark mode like what
> org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter
> does.
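For discussion, here is a minimal sketch of a CPU-based heuristic in the spirit of the quoted article: make the task count at least match the executor slots the application was granted. The class name ParallelismEstimator and the 2x over-subscription factor are illustrative assumptions, not part of PIG-4952_1.patch; only the standard Spark settings spark.executor.instances and spark.executor.cores are assumed.
{code:java}
import org.apache.spark.SparkConf;

// Illustrative sketch only, not the PIG-4952 patch: derive a parallelism
// hint from the CPU resources granted to the Spark application.
public class ParallelismEstimator {

    public static int estimateParallelism(SparkConf conf) {
        // Standard Spark settings; the fallback defaults here are arbitrary.
        int executors = conf.getInt("spark.executor.instances", 2);
        int coresPerExecutor = conf.getInt("spark.executor.cores", 1);
        int totalSlots = executors * coresPerExecutor;

        // Per the quoted article, fewer tasks than slots leaves CPUs idle,
        // so the slot count is a floor. The 2x factor is a common rule of
        // thumb (an assumption) to smooth over task-duration skew.
        return 2 * totalSlots;
    }
}
{code}
A memory-based estimate would additionally need input-size information, similar to what the Tez ParallelismSetter works from, so it is left out of this sketch.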