[jira] [Commented] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories

Rohini Palaniswamy (JIRA) Thu, 24 Apr 2014 12:05:38 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980106#comment-13980106
 ]


Rohini Palaniswamy commented on PIG-3891:
-----------------------------------------

bq. 0.11 shows the correct number of total bytes written and this is a 
regression. A quick look at the code shows that the 
JobStats.addOneOutputStats() in 0.11 also does not recursively iterate and code 
is same as FileBasedOutputSizeReader. Need to investigate where the correct 
value comes from in 0.11 and fix it in 0.12.1/0.13

  It is not a regression. I was comparing the single-store case (in 0.11) with
the multi-store case (in trunk). A single store always uses the MapReduce
counter to get the HDFS bytes written and does not use
FileBasedOutputSizeReader.

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Mona Chitnis
>
> FileBasedOutputSizeReader only includes files in the top-level output 
> directory. So if files are stored under subdirectories (e.g. with 
> MultiStorage), it does not report the bytes written correctly. 
> 0.11 shows the correct number of total bytes written and this is a 
> regression. A quick look at the code shows that the 
> JobStats.addOneOutputStats() in 0.11 also does not recursively iterate and 
> code is same as  FileBasedOutputSizeReader. Need to investigate where the 
> correct value comes from in 0.11 and fix it in 0.12.1/0.13.
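
A minimal sketch of the difference described in the issue, assuming a local
directory stands in for the HDFS output location. It uses java.nio.file rather
than Hadoop's FileSystem API, and the class and method names
(OutputSizeSketch, topLevelSize, recursiveSize) are hypothetical; this is not
Pig's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Pig.
final class OutputSizeSketch {

    // Sums regular files directly under dir and ignores subdirectories,
    // which is the behaviour the issue attributes to the reader.
    static long topLevelSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Walks the whole tree, so files written under subdirectories
    // (as a MultiStorage-style layout would produce) are counted too.
    static long recursiveSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::sizeOf)
                          .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an output directory whose files all sit under per-key subdirectories,
topLevelSize would return 0 while recursiveSize would return the full total.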



--
This message was sent by Atlassian JIRA
(v6.2#6252)
