Let us just take one script, L9, for analysis.
    - What was the failure error/stack trace? We run PigMix with just 1G of
heap, so it should not be going out of memory.
    - Where were the 6 hours spent? Can you give a breakdown? Are all the
reducer tasks being launched in parallel? For example, if a reducer normally
takes 30 minutes and the reducers run in 6 waves, the reduce phase alone can
take 3 hours. If that is the case, try lowering the reducer heap from
-Xmx3276m to -Xmx2048m or -Xmx1638m, as sketched below.
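A minimal sketch of that wave arithmetic and the suggested change, using the
numbers from your configuration below (the mapreduce.reduce.memory.mb value of
2048 is an assumption, added only so the container size matches the smaller
heap):

  <!-- Each NodeManager offers 57344 MB. With mapreduce.reduce.memory.mb =
       4096 that is 57344 / 4096 = 14 reducer containers per node, or about
       112 across the 8 nodes per wave. If a job needs 6 such waves and a
       reducer runs ~30 minutes, the reduce phase alone takes ~3 hours.
       Halving the container roughly doubles the reducers per wave. -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <!-- assumed companion change so the container matches the smaller heap -->
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>

The job history UI should show whether the reducers actually ran in multiple
waves, so please check that before changing anything.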



On Tue, Jul 26, 2016 at 12:18 AM, Zhang, Liyun <liyun.zh...@intel.com>
wrote:

> Hi all:
>
>   Now I’m using PigMix to test the performance of Pig on Spark (PIG-4937
> <https://issues.apache.org/jira/browse/PIG-4937>). The test data is 1TB.
> After generating all the test data, I ran the first round of tests in MR
> mode.
>
> The cluster has 8 nodes (each node has 40 cores and 60g of memory, of which
> 28 cores and 56g are assigned to the NodeManager). In total the cluster has
> 224 cores and 448g of memory.
>
>
>
> The snippet of yarn-site.xml:
>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>57344</value>
>   <description>the amount of memory on the NodeManager in MB</description>
> </property>
> <property>
>   <name>yarn.nodemanager.resource.cpu-vcores</name>
>   <value>28</value>
> </property>
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>2048</value>
> </property>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>57344</value>
> </property>
> <property>
>   <name>yarn.nodemanager.vmem-check-enabled</name>
>   <value>false</value>
>   <description>Whether virtual memory limits will be enforced for
> containers</description>
> </property>
> <property>
>   <name>yarn.nodemanager.vmem-pmem-ratio</name>
>   <value>4</value>
>   <description>Ratio between virtual memory to physical memory when
> setting memory limits for containers</description>
> </property>
>
>
>
> The snippet of mapred-site.xml is:
>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <value>-Xmx1638m</value>
> </property>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Xmx3276m</value>
> </property>
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>2048</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>4096</value>
> </property>
> <property>
>   <name>mapreduce.task.io.sort.mb</name>
>   <value>820</value>
> </property>
> <property>
>   <name>mapred.task.timeout</name>
>   <value>1200000</value>
> </property>
>
>
>
> The snippet of hdfs-site.xml is:
>
> <property>
>   <name>dfs.blocksize</name>
>   <value>1124217344</value>
> </property>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
> <property>
>   <name>dfs.socket.timeout</name>
>   <value>1200000</value>
> </property>
> <property>
>   <name>dfs.datanode.socket.write.timeout</name>
>   <value>1200000</value>
> </property>
>
>
>
> The results of the last run of PigMix in MR mode are below (L9, L10, L13,
> L14 and L17 fail). They show that the average time spent on one script is
> nearly *6* hours. I don’t know whether it really needs so *much* time to
> run L1~L17. Can anyone who has experience with PigMix share his/her
> configuration and expected results with me?
>
>
>
>
>
> Script   MR (sec)
> L1       21544
> L2       20482
> L3       21629
> L4       20905
> L5       20738
> L6       24131
> L7       21983
> L8       24549
> L9       6585  (Fail)
> L10      22286 (Fail)
> L11      21849
> L12      21266
> L13      11099 (Fail)
> L14      43    (Fail)
> L15      23808
> L16      42889
> L17      10    (Fail)
>
>
>
>
>
>
>
> Kelly Zhang/Zhang,Liyun
>
> Best Regards
>
>
>
