[ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902065#action_12902065 ]
Thejas M Nair commented on PIG-1501:
------------------------------------

Comments on the patch -

TFileStorage.java
- The getSchema() code that determines the schema from the data is the same in TFileStorage and InterStorage. The code in BinStorage is also the same, except that it uses some deprecated functions. This code can be moved to a common util class. (Yes, I should have moved it to a util class when I created InterStorage.)

TestTmpFileCompression.java
- Both tests check whether TFile is getting used. I think one test can be changed to check that InterStorage gets used when compression is not turned on, or a check can be added to any other existing test case that runs an MR job, to see if InterStorage gets used there.
- The log setup code is duplicated between setup and resetLog(). It can be moved to a common function.

SampleOptimizer.java
- The following comment can be updated -
// check that it is using BinaryStorage.
to
// check that it is using the temp file storage format.

TFileRecordWriter.java
- The comment in the following section does not seem to be valid anymore -
{code}
public TFileRecordWriter(Path file, String codec, Configuration conf)
+ throws IOException {
+ // hardcoded to use gzip and 1M as block size: may wish to be made configurable
{code}

> need to investigate the impact of compression on pig performance
> ----------------------------------------------------------------
>
>                 Key: PIG-1501
>                 URL: https://issues.apache.org/jira/browse/PIG-1501
>             Project: Pig
>          Issue Type: Test
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: compress_perf_data.txt, compress_perf_data_2.txt, PIG-1501.patch, PIG-1501.patch
>
>
> We would like to understand how compressing map results as well as
> reducer output in a chain of MR jobs impacts performance. We can use PigMix
> queries for this investigation.