[jira] [Commented] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories

Mona Chitnis (JIRA) Wed, 23 Apr 2014 13:57:13 -0700

    [ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978886#comment-13978886 ]


Mona Chitnis commented on PIG-3891:
-----------------------------------

The above observation was due to a misconfigured single-node cluster. I am now 
able to reproduce the issue: the problem lies only in the output records and 
bytes written.

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mona Chitnis
>
> FileBasedOutputSizeReader only includes files in the top-level output 
> directory. So if files are stored under subdirectories (e.g., with 
> MultiStorage), it does not report the bytes written correctly. 
> 0.11 shows the correct number of total bytes written, so this is a 
> regression. A quick look at the code shows that 
> JobStats.addOneOutputStats() in 0.11 also does not iterate recursively, and 
> its code is the same as in FileBasedOutputSizeReader. Need to investigate 
> where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
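
For illustration only, here is a minimal sketch of the difference between a
top-level-only size calculation and a recursive one. It is not Pig's code: it
uses java.nio.file on a local directory rather than the Hadoop FileSystem API,
and the class and method names (OutputSizeSketch, topLevelSize, recursiveSize)
are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration; not the actual FileBasedOutputSizeReader.
final class OutputSizeSketch {

    // Sums only the regular files directly under dir. Files inside
    // subdirectories are skipped, which is the behaviour the report describes.
    static long topLevelSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Walks the whole tree under dir and sums every regular file,
    // so output written under subdirectories is counted as well.
    static long recursiveSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an output directory holding one file at the top level and one under a
subdirectory, topLevelSize counts only the first file while recursiveSize
counts both.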



--
This message was sent by Atlassian JIRA
(v6.2#6252)
