Let us take just one script, L9, for analysis.

- What was the failure error/stack trace? We run PigMix with just 1 GB of heap, so it cannot be going out of memory.

- Where were the 6 hours spent? Can you give a breakdown? Are all the reducer tasks being launched in parallel? For example, if a reducer normally takes 30 minutes but the reducers are launched in 6 waves, the job takes 3 hours. If that is the case, try lowering the reducer memory from -Xmx3276m to -Xmx2048m or -Xmx1638m (see the sketch below).
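For instance, a minimal sketch of the mapred-site.xml change — this assumes mapreduce.reduce.memory.mb is lowered along with the heap (keeping roughly the same heap-to-container ratio the map settings already use), since YARN packs reducers by container size, not by -Xmx:

<!-- Sketch only: with yarn.nodemanager.resource.memory-mb = 57344, a
     4096 MB reducer container allows 57344/4096 = 14 reducers per node
     (112 across 8 nodes); a 2048 MB container allows 28 per node
     (224 cluster-wide), halving the number of reducer waves. -->
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>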
On Tue, Jul 26, 2016 at 12:18 AM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

> Hi all:
>
> I am using PigMix to test the performance of Pig on Spark (PIG-4937
> <https://issues.apache.org/jira/browse/PIG-4937>). The test data is 1 TB.
> After generating all the test data, I ran the first round of tests in MR
> mode.
>
> The cluster has 8 nodes. Each node has 40 cores and 60 GB of memory, of
> which 28 cores and 56 GB are assigned to the NodeManager. The total for
> the cluster is therefore 224 cores and 448 GB of memory.
>
> The snippet of yarn-site.xml:
>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>57344</value>
>   <description>the amount of memory on the NodeManager in MB</description>
> </property>
> <property>
>   <name>yarn.nodemanager.resource.cpu-vcores</name>
>   <value>28</value>
> </property>
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>2048</value>
> </property>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>57344</value>
> </property>
> <property>
>   <name>yarn.nodemanager.vmem-check-enabled</name>
>   <value>false</value>
>   <description>Whether virtual memory limits will be enforced for containers</description>
> </property>
> <property>
>   <name>yarn.nodemanager.vmem-pmem-ratio</name>
>   <value>4</value>
>   <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
> </property>
>
> The snippet of mapred-site.xml:
>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <value>-Xmx1638m</value>
> </property>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Xmx3276m</value>
> </property>
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>2048</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>4096</value>
> </property>
> <property>
>   <name>mapreduce.task.io.sort.mb</name>
>   <value>820</value>
> </property>
> <property>
>   <name>mapred.task.timeout</name>
>   <value>1200000</value>
> </property>
>
> The snippet of hdfs-site.xml:
>
> <property>
>   <name>dfs.blocksize</name>
>   <value>1124217344</value>
> </property>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
> <property>
>   <name>dfs.socket.timeout</name>
>   <value>1200000</value>
> </property>
> <property>
>   <name>dfs.datanode.socket.write.timeout</name>
>   <value>1200000</value>
> </property>
>
> The results of the last run of PigMix in MR mode are below (L9, L10, L13,
> L14, and L17 fail). They show that the average time spent on one script is
> nearly 6 hours. I don't know whether L1-L17 really need so much time to
> run. Can anyone who has experience with PigMix share his/her configuration
> and expected results with me?
>
> Script   MR (sec)
> L1       21544
> L2       20482
> L3       21629
> L4       20905
> L5       20738
> L6       24131
> L7       21983
> L8       24549
> L9        6585 (Fail)
> L10      22286 (Fail)
> L11      21849
> L12      21266
> L13      11099 (Fail)
> L14         43 (Fail)
> L15      23808
> L16      42889
> L17         10 (Fail)
>
> Kelly Zhang / Zhang, Liyun
> Best Regards
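As a rough concurrency estimate from the settings quoted above (illustrative only; it assumes ~1 TB of input split at the 1,124,217,344-byte block size and no other load on the cluster):

  maps per node    = 57344 MB / 2048 MB = 28  ->  8 x 28 = 224 concurrent map tasks
  reduces per node = 57344 MB / 4096 MB = 14  ->  8 x 14 = 112 concurrent reduce tasks
  map splits       ~ 1 TB / ~1072 MB per block ~ 900-980  ->  roughly 4-5 map waves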