[ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-7.patch

Hi, Daniel, Rohini,

I implemented the required optimization, which ensures that the local and HDFS directories are created only when needed (on demand). These changes are in the newly attached "PIG-2898-trunk-7.patch".

The idea of the fix is that we split the methods #globalSetup() and #globalCleanup() into two parts each, introducing the new methods #globalSetup2() and #globalCleanup2(). The method #globalSetup2() is invoked only if there is at least one test to execute, and #globalCleanup2() is invoked only if #globalSetup2() was invoked.

Also, in this patch I reverted one of the previous changes, which replaced IPC::Run::run('mkdir' ...) with the Perl "mkpath" call, because "mkpath" appears to have (at least on my Perl 5.14.2) a rather strange feature: it returns a non-zero exit status with a "No such file or directory" message if the directory we are attempting to create already exists. This behavior is unexpected and confusing because it contradicts the behavior of native "mkdir -p" and java.io.File#mkdirs(). So, despite the fact that IPC::Run::run is slower, I prefer to use it to spare developers the trouble.

> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch,
> PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The
> bottleneck here is the client side, and per our observations it can help a
> lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable
> number of threads.
> Preliminary results show more than 6x reduction in
> execution time when using a small 3-nodes M/R cluster with modest
> configuration. Going to share a patch shortly.
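
For readers who want to see the on-demand setup idea from the comment above in concrete form, here is a minimal sketch. It is written in Python purely for illustration and is not the harness code (the real e2e harness is Perl). The class name LazyHarness, the log list, and the run() method are hypothetical, and the order in which the two cleanup steps run is an assumption that the comment does not state.

```python
class LazyHarness:
    """Hypothetical stand-in for the e2e test driver, not the real Perl harness."""

    def __init__(self):
        self.log = []              # records call order, for illustration only
        self._setup2_done = False  # remembers whether the costly setup ran

    def global_setup(self):
        # Cheap part: nothing is created on local disk or HDFS here.
        self.log.append("globalSetup")

    def global_setup2(self):
        # Costly part: the local and HDFS directories would be created here.
        self.log.append("globalSetup2")
        self._setup2_done = True

    def global_cleanup2(self):
        # Undoes global_setup2; only meaningful if that step actually ran.
        self.log.append("globalCleanup2")

    def global_cleanup(self):
        self.log.append("globalCleanup")

    def run(self, tests):
        self.global_setup()
        try:
            for test in tests:
                if not self._setup2_done:
                    # The first test to execute triggers the costly setup.
                    self.global_setup2()
                self.log.append(f"run:{test}")
        finally:
            if self._setup2_done:
                self.global_cleanup2()
            # Assumption: the cheap cleanup always runs, and runs last.
            self.global_cleanup()
        return self.log
```

With an empty test list, run() calls only global_setup() and global_cleanup(), so no directories would be created. With one or more tests, the costly setup runs once before the first test and its matching cleanup runs once at the end.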