[ 
https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771579#action_12771579
 ] 

Sriranjan Manjunath commented on PIG-1048:
------------------------------------------

This issue occurred because, for a key with 1 value, we were allocating 
2 reducers. The culprit was:

                        // number of reducers
-                       Integer cnt = 0;
-                       if (minIndex < maxIndex) {
-                               cnt = maxIndex - minIndex;
-                       } else {
-                               cnt = totalReducers[0] + maxIndex - minIndex;
-                       }
-

If maxIndex = minIndex = 0 and totalReducers was 1, the else branch was taken 
and cnt would get a value of 1 instead of 0!

cnt is 0-based, whereas totalReducers is 1-based. This resulted in 
POPartitionRearrange distributing the tuple to one more reducer than 
required.

The fix is to always set the value of cnt to "maxIndex - minIndex". The code 
guarantees that maxIndex is never less than minIndex.
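
To make the off-by-one concrete, here is a minimal standalone sketch comparing 
the removed logic with the fix described above. The class and method names 
(ReducerCountSketch, oldCount, newCount) are made up for illustration and are 
not part of the Pig code base; totalReducers is passed as a plain int rather 
than the totalReducers[0] array element used in the patch.

```java
// Hypothetical illustration only; not the actual Pig source.
class ReducerCountSketch {

    // Pre-fix logic, mirroring the removed lines in the diff above.
    static int oldCount(int minIndex, int maxIndex, int totalReducers) {
        if (minIndex < maxIndex) {
            return maxIndex - minIndex;
        }
        // Also taken when minIndex == maxIndex, which adds a full
        // totalReducers to a count that should be 0.
        return totalReducers + maxIndex - minIndex;
    }

    // Post-fix logic as described: always maxIndex - minIndex.
    // Assumes maxIndex >= minIndex, as the comment states the code guarantees.
    static int newCount(int minIndex, int maxIndex) {
        return maxIndex - minIndex;
    }

    public static void main(String[] args) {
        // A key with a single value: minIndex = maxIndex = 0, one reducer.
        System.out.println("old cnt = " + oldCount(0, 0, 1)); // old cnt = 1
        System.out.println("new cnt = " + newCount(0, 0));    // new cnt = 0
    }
}
```

Since cnt is 0-based, a value of 0 means the tuple goes to exactly one 
reducer, while the old value of 1 sent it to two. When minIndex < maxIndex 
both versions agree, so only the equal-index case changes.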

> inner join using 'skewed' produces multiple rows for keys with single row in 
> both input relations
> -------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1048
>                 URL: https://issues.apache.org/jira/browse/PIG-1048
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Sriranjan Manjunath
>
> {code}
> grunt> cat students.txt
> asdfxc  M       23      12.44
> qwer    F       21      14.44
> uhsdf   M       34      12.11
> zxldf   M       21      12.56
> qwer    F       23      145.5
> oiue    M       54      23.33
>  l1 = load 'students.txt';
> l2 = load 'students.txt';
> j = join l1 by $0, l2 by $0 ;
> store j into 'tmp.txt'
> grunt> cat tmp.txt
> oiue    M       54      23.33   oiue    M       54      23.33
> oiue    M       54      23.33   oiue    M       54      23.33
> qwer    F       21      14.44   qwer    F       21      14.44
> qwer    F       21      14.44   qwer    F       23      145.5
> qwer    F       23      145.5   qwer    F       21      14.44
> qwer    F       23      145.5   qwer    F       23      145.5
> uhsdf   M       34      12.11   uhsdf   M       34      12.11
> uhsdf   M       34      12.11   uhsdf   M       34      12.11
> zxldf   M       21      12.56   zxldf   M       21      12.56
> zxldf   M       21      12.56   zxldf   M       21      12.56
> asdfxc  M       23      12.44   asdfxc  M       23      12.44
> asdfxc  M       23      12.44   asdfxc  M       23      12.44$
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.