[ 
https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-7.patch

Hi, Daniel, Rohini,
I implemented the required optimization which ensures that the local and HDFS 
directories are created only when needed (on demand).
These changes are in the newly attached "PIG-2898-trunk-7.patch".

The idea of the fix is that we split each of the methods #globalSetup() and 
#globalCleanup() into 2 parts by introducing the new methods #globalSetup2() 
and #globalCleanup2(). The method #globalSetup2() is invoked only if there is 
some test to execute, and #globalCleanup2() is invoked only if #globalSetup2() 
was invoked.
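
A minimal sketch of this on-demand pattern, written in Python for illustration 
only (the harness itself is Perl). The class name, the "calls" log and the 
assumption that the first-stage hooks stay unconditional are hypothetical and 
are not taken from the patch:

```python
class LazyHarness:
    """Hypothetical stand-in for the test driver; not the harness's real API."""

    def __init__(self):
        self.setup2_done = False
        self.calls = []  # records the order in which hooks are invoked

    def global_setup(self):
        # First stage: assumed to run always and to create no directories.
        self.calls.append("globalSetup")

    def global_setup2(self):
        # Second stage: this is where the local and HDFS directories
        # would be created in the real harness.
        self.calls.append("globalSetup2")
        self.setup2_done = True

    def global_cleanup2(self):
        self.calls.append("globalCleanup2")

    def global_cleanup(self):
        self.calls.append("globalCleanup")

    def run(self, tests):
        self.global_setup()
        try:
            for test in tests:
                # Pay for the expensive setup only once a test will really run.
                if not self.setup2_done:
                    self.global_setup2()
                self.calls.append("run:" + test)
        finally:
            # Undo the second stage only if it actually happened.
            if self.setup2_done:
                self.global_cleanup2()
            self.global_cleanup()
        return self.calls
```

With an empty test list, run() records only the first-stage hooks, so no 
directories would be created or removed.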

Also, in this patch I reverted one of the previous changes, which replaced 
IPC::Run::run('mkdir' ...) with the Perl "mkpath" call, because "mkpath" 
appears to have (at least on my Perl implementation, 5.14.2) a rather strange 
feature: it returns a non-zero exit status with a "No such file or directory" 
message if the directory we're attempting to create already exists. This 
behavior is unexpected and confusing because it contradicts the behavior of 
native "mkdir -p" and java.io.File#mkdirs(). So, despite the fact that 
IPC::Run::run is slower, I prefer to use it to spare developers the trouble.
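
For reference, the "mkdir -p" behavior relied on here can be checked with a 
short script. This is only a sketch in Python (not the harness's Perl), it 
assumes a POSIX "mkdir" is available on the PATH, and the helper name mkdir_p 
is hypothetical:

```python
import os
import subprocess
import tempfile


def mkdir_p(path):
    """Run the external `mkdir -p` and return its exit status."""
    return subprocess.run(["mkdir", "-p", path]).returncode


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "a", "b")
    first = mkdir_p(target)   # creates a/ and a/b
    second = mkdir_p(target)  # the directory already exists at this point
    created = os.path.isdir(target)
```

Both calls are expected to exit with status 0, since "mkdir -p" does not treat 
an already existing directory as an error.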
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, 
> PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The 
> bottleneck here is the client side, and per our observations it would help a 
> lot if the e2e harness were able to run tests in parallel threads.
> We prototyped changes in the e2e harness that allow tests to run in a 
> configurable number of threads. Preliminary results show more than a 6x 
> reduction in execution time when using a small 3-node M/R cluster with a 
> modest configuration. Going to share a patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
