[
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712581#comment-15712581
]
Rohini Palaniswamy commented on PIG-3891:
-----------------------------------------
Comments:
- CHANGES.txt will be updated at commit time, so there is no need to change
it as part of the patch.
- Please revert the changes to ExecType and TezMiniCluster. We can't change
public static members to package-private because users already depend on
them. Once PIG-4923 goes in, we can add TEZ and SPARK there.
- In TestMRJobStats, can you change "The returned output size is expected to
be the same as the file size" to "The returned output size is expected to be
the sum of file sizes in the sub-directories"?
- We try to avoid if (Tez) else (MR) conditions in tests as much as possible.
For the testOutputStats test in TestMultiStorage, can we just use the
following asserts with hardcoded values instead of reading the values from
the MR and Tez counters? That makes the test more solid. Also, please add a
FILTER statement for out2 that drops a couple of records, so that its bytes
and records are not the same as out1's.
{code}
Map<String, Long> multiStoreCounters = dagStats.getMultiStoreCounters();
+ PigStats stats = job.getStatistics();
+ assertEquals(HardCodedValueHere, stats.getBytesWritten());
+ List<OutputStats> outputStats = SimplePigStats.get().getOutputStats();
+ assertEquals(2, outputStats.size()); // 2 split conditions
+ assertEquals(HardCodedValueHere, outputStats.get(0).getBytes());
+ assertEquals(HardCodedValueHere, outputStats.get(1).getBytes());
+ assertEquals(HardCodedValueHere, outputStats.get(0).getRecords());
+ assertEquals(HardCodedValueHere, outputStats.get(1).getRecords());
+ assertEquals(9L, multiStoreCounters.get("Output records in _1_out2").longValue());
+ assertEquals(9L, multiStoreCounters.get("Output records in _0_out1").longValue());
{code}
> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
> Key: PIG-3891
> URL: https://issues.apache.org/jira/browse/PIG-3891
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Rohini Palaniswamy
> Assignee: Nandor Kollar
> Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch,
> PIG-3891-4.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output
> directory. So if files are stored under subdirectories (e.g. with
> MultiStorage), it does not report the bytes written correctly.
> 0.11 shows the correct number of total bytes written, so this is a
> regression. A quick look at the code shows that
> JobStats.addOneOutputStats() in 0.11 also does not iterate recursively, and
> its code is the same as FileBasedOutputSizeReader. Need to investigate where
> the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
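
For illustration only, the difference between a top-level listing and a
recursive one can be sketched as below. This is not the patch and not Pig's
code: it uses java.nio.file on the local filesystem as a stand-in for the
Hadoop FileSystem API, and the class name, method names, and sample layout
(two sub-directories holding 5 and 7 bytes) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-in; not part of Pig and not the Hadoop FileSystem API.
public class RecursiveOutputSize {

    /** Sums only regular files directly under dir (the behavior reported in the issue). */
    public static long topLevelBytes(Path dir) {
        try (Stream<Path> children = Files.list(dir)) {
            return children.filter(Files::isRegularFile)
                           .mapToLong(RecursiveOutputSize::size)
                           .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums regular files at any depth under dir. */
    public static long recursiveBytes(Path dir) {
        try (Stream<Path> all = Files.walk(dir)) {
            return all.filter(Files::isRegularFile)
                      .mapToLong(RecursiveOutputSize::size)
                      .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an assumed MultiStorage-like layout: a/part-0 (5 bytes), b/part-0 (7 bytes). */
    public static Path buildSampleOutput() {
        try {
            Path root = Files.createTempDirectory("multistorage-out");
            Path a = Files.createDirectories(root.resolve("a"));
            Path b = Files.createDirectories(root.resolve("b"));
            Files.write(a.resolve("part-0"), new byte[5]);
            Files.write(b.resolve("part-0"), new byte[7]);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = buildSampleOutput();
        System.out.println("top-level only: " + topLevelBytes(root));
        System.out.println("recursive:      " + recursiveBytes(root));
    }
}
```

With this sample layout the top-level sum is 0 bytes and the recursive sum is
12 bytes, which mirrors the kind of undercount described above when all
output files live in sub-directories.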