[ 
https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1501:
--------------------------

    Attachment: compress_perf_data_2.txt

The data sets in the last tests were so small that the performance difference 
was lost in background noise.  This test case generates more temporary data.

In summary, lzo yields a compression ratio of about 3% and runs about 4x 
faster than uncompressed;  gzip yields a compression ratio of less than 1%, 
but runs 1%-2% slower than uncompressed. This is in line with the general 
observation that gzip compresses better but performs worse.
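
For readers who want to reproduce the arithmetic behind these figures, here is 
a minimal sketch. It assumes that "compression ratio" means compressed size 
divided by uncompressed size, and that speed is compared by wall-clock time. 
The function names and the sample numbers are hypothetical and are not taken 
from the attachment.

```python
def compression_ratio(compressed_bytes, uncompressed_bytes):
    """Compressed size as a fraction of uncompressed size (smaller is better)."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed_bytes must be positive")
    return compressed_bytes / uncompressed_bytes


def speedup(baseline_secs, compressed_secs):
    """How many times faster the compressed run is than the baseline run."""
    if compressed_secs <= 0:
        raise ValueError("compressed_secs must be positive")
    return baseline_secs / compressed_secs


def relative_slowdown(baseline_secs, compressed_secs):
    """Fractional change in run time; positive means slower than baseline."""
    if baseline_secs <= 0:
        raise ValueError("baseline_secs must be positive")
    return (compressed_secs - baseline_secs) / baseline_secs


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the formulas.
    print(f"ratio:    {compression_ratio(30, 1000):.1%}")
    print(f"speedup:  {speedup(400.0, 100.0):.1f}x")
    print(f"slowdown: {relative_slowdown(100.0, 102.0):.1%}")
```

Under this reading, a lower ratio means stronger compression, which is why a 
ratio below 1% for gzip counts as compressing better than about 3% for lzo.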

> need to investigate the impact of compression on pig performance
> ----------------------------------------------------------------
>
>                 Key: PIG-1501
>                 URL: https://issues.apache.org/jira/browse/PIG-1501
>             Project: Pig
>          Issue Type: Test
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: compress_perf_data.txt, compress_perf_data_2.txt
>
>
> We would like to understand how compressing map results as well as 
> reducer output in a chain of MR jobs impacts performance. We can use PigMix 
> queries for this investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
