[ 
https://issues.apache.org/jira/browse/PIG-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5355:
-----------------------------------------
    Description: 
The logic for padding the current row does not take the already padded row 
into account in the second comparison, so the padded row can end up with a 
different length than expected. This results in a negative value for 
{{processed}}.

{code}
            byte[] lastPadded = currRow_;
            if (currRow_.length < endRow_.length) {
                lastPadded = Bytes.padTail(currRow_, endRow_.length - currRow_.length);
            }
            if (currRow_.length < startRow_.length) {
                lastPadded = Bytes.padTail(currRow_, startRow_.length - currRow_.length);
            }

            byte [] prependHeader = {1, 0};
            BigInteger bigLastRow = new BigInteger(Bytes.add(prependHeader, lastPadded));
            if (bigLastRow.compareTo(bigEnd_) > 0) {
                return progressSoFar_;
            }
            BigDecimal processed = new BigDecimal(bigLastRow.subtract(bigStart_));
{code}
The fix is to use {{lastPadded}} instead of {{currRow_}} in the second {{if}} 
comparison and in the {{Bytes.padTail}} call inside that {{if}}.
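
A minimal, self-contained sketch of the difference between the two variants follows. It is not the actual patch. The class {{PaddedRowSketch}}, the helpers {{padBuggy}}, {{padFixed}}, {{toBigInteger}} and {{processed}}, and the sample row keys are hypothetical; {{padTail}} is a local stand-in for HBase's {{Bytes.padTail}} so that the example runs without HBase on the classpath. It also assumes that {{bigStart_}} is built from the start row padded to the longer of the two boundary lengths.

```java
import java.math.BigInteger;
import java.util.Arrays;

class PaddedRowSketch {

    // Local stand-in for HBase's Bytes.padTail: appends `length` zero bytes.
    static byte[] padTail(byte[] a, int length) {
        return Arrays.copyOf(a, a.length + length);
    }

    // Padding as quoted in the description: the second check still looks at
    // currRow, so it can overwrite the longer result of the first check.
    static byte[] padBuggy(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (currRow.length < startRow.length) {
            lastPadded = padTail(currRow, startRow.length - currRow.length);
        }
        return lastPadded;
    }

    // Padding with the described fix: the second check uses lastPadded.
    static byte[] padFixed(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (lastPadded.length < startRow.length) {
            lastPadded = padTail(lastPadded, startRow.length - lastPadded.length);
        }
        return lastPadded;
    }

    // Prepends the {1, 0} header so the value is always positive.
    static BigInteger toBigInteger(byte[] row) {
        byte[] withHeader = new byte[row.length + 2];
        withHeader[0] = 1;
        System.arraycopy(row, 0, withHeader, 2, row.length);
        return new BigInteger(withHeader);
    }

    // Assumption: the start row is padded to max(start, end) length before
    // conversion, mirroring how bigStart_ is presumed to be computed.
    static BigInteger processed(byte[] lastPadded, byte[] startRow, byte[] endRow) {
        int width = Math.max(startRow.length, endRow.length);
        byte[] startPadded = padTail(startRow, width - startRow.length);
        return toBigInteger(lastPadded).subtract(toBigInteger(startPadded));
    }

    public static void main(String[] args) {
        // Hypothetical keys: current row shorter than both boundaries,
        // start row shorter than end row.
        byte[] start = {0x10, 0x00};
        byte[] end = {0x20, 0x00, 0x00};
        byte[] curr = {0x15};

        System.out.println("buggy: " + processed(padBuggy(curr, start, end), start, end));
        System.out.println("fixed: " + processed(padFixed(curr, start, end), start, end));
    }
}
```

With these sample keys the buggy variant pads the current row to the start row's length only, so the resulting number has fewer bytes than the start value and the difference is negative. The fixed variant keeps the row at the end row's length and the difference stays non-negative.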

PIG-4700 added progress reporting. This enabled ProgressHelper in Tez, which 
calls {{getProgress}} [here 
|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/common/ProgressHelper.java#L50]
 on {{PigRecordReader}} 
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L159
 . Because Pig reports negative progress, the job is killed by the AM.
 

 

 

 


> Negative progress report by HBaseTableRecordReader
> --------------------------------------------------
>
>                 Key: PIG-5355
>                 URL: https://issues.apache.org/jira/browse/PIG-5355
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
