[jira] [Commented] (PIG-5071) MapReduce concurrency Could Be Better

Daniel Dai (JIRA) Wed, 07 Dec 2016 21:41:33 -0800

    [ https://issues.apache.org/jira/browse/PIG-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731184#comment-15731184 ]


Daniel Dai commented on PIG-5071:
---------------------------------

Yes, that's the best solution if it works for you.

> MapReduce concurrency Could Be Better
> -------------------------------------
>
>                 Key: PIG-5071
>                 URL: https://issues.apache.org/jira/browse/PIG-5071
>             Project: Pig
>          Issue Type: Wish
>            Reporter: William Watson
>
> We have a job that launches, after optimization, about 20 MapReduce jobs. 
> Some of these are quite long-running, and while Pig does an okay job of 
> running jobs concurrently, it could do better, at least in this very specific 
> case.
> The Pig job can be divided into 4 major sections like so:
> A1 -> A2 -> A3 -> A4 -> A
> B1 -> B2 -> B
> C1 -> C2 -> C3 -> C
> D1 -> D2 -> D3 -> D4 -> D
> and the sections are joined at the end:
> A + B -> AB
> AB + C -> ABC
> ABC + D -> ABCD
> In short, if C2 finishes very quickly, C3 won't be started until A2, B2, and 
> D2 are all also complete. This is a problem if, say, D2 takes an hour and 
> there are unused cluster resources that could be made available to C3 (and by 
> extension A3 and B, if their prerequisites also finish before D2).
> One possible workaround is to scale D2 better, but that's beside the point. 
> I think Pig is capable of knowing that the prerequisites are done for certain 
> jobs, but since it only kicks off jobs in "phases", it won't kick off jobs as 
> soon as possible.
> I've taken a look at the code and I'm having a hard time working out where 
> the issue is, or else I would be glad to contribute a patch. 
> Is this a desirable feature, and is this directly controlled by Pig? If so, 
> could someone help point me in the right direction so I can contribute a 
> patch?
> Note: We can change this from a "wish" to an "improvement" if this feature is 
> desired...
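
To make the scheduling difference described above concrete, here is a rough
sketch in Python. It is not Pig's actual scheduler. It only models two
policies over the job graph from the description: starting a whole "phase"
once the previous phase has finished, versus starting each job as soon as its
own prerequisites are done. The durations (5 minutes per job, 60 for D2), the
function names, and the assumption of unlimited cluster capacity are all
hypothetical and chosen purely for illustration.

```python
# Job graph taken from the issue description; each job lists its prerequisites.
DEPS = {
    "A1": [], "A2": ["A1"], "A3": ["A2"], "A4": ["A3"], "A": ["A4"],
    "B1": [], "B2": ["B1"], "B": ["B2"],
    "C1": [], "C2": ["C1"], "C3": ["C2"], "C": ["C3"],
    "D1": [], "D2": ["D1"], "D3": ["D2"], "D4": ["D3"], "D": ["D4"],
    "AB": ["A", "B"], "ABC": ["AB", "C"], "ABCD": ["ABC", "D"],
}

# Hypothetical durations in minutes: every job takes 5, except a slow D2.
DURATION = {job: 5 for job in DEPS}
DURATION["D2"] = 60


def eager_start_times(deps, duration):
    """Start each job as soon as all of its own prerequisites have finished."""
    start = {}

    def finish(job):
        if job not in start:
            start[job] = max((finish(d) for d in deps[job]), default=0)
        return start[job] + duration[job]

    for job in deps:
        finish(job)
    return start


def phased_start_times(deps, duration):
    """Start jobs phase by phase; a phase begins only when the previous one is done.

    A job's phase is its depth in the graph (an assumption of this sketch).
    """
    level = {}

    def depth(job):
        if job not in level:
            level[job] = 1 + max((depth(d) for d in deps[job]), default=-1)
        return level[job]

    for job in deps:
        depth(job)

    start, clock = {}, 0
    for phase in range(max(level.values()) + 1):
        jobs = [j for j in deps if level[j] == phase]
        for j in jobs:
            start[j] = clock
        # The phase ends when its slowest job ends.
        clock += max(duration[j] for j in jobs)
    return start


if __name__ == "__main__":
    eager = eager_start_times(DEPS, DURATION)
    phased = phased_start_times(DEPS, DURATION)
    print("C3 starts at minute", eager["C3"], "(eager) vs", phased["C3"], "(phased)")
```

Under these made-up durations, C3 waits for D2 in the phased model even though
C2 finished long before, which is the behaviour the description complains
about.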



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
