[
https://issues.apache.org/jira/browse/HCATALOG-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated HCATALOG-448:
--------------------------------------
Attachment: hcatalog-448-for-0.4-codebase.patch
Attaching a patch that was tested against the 0.4 codebase.
Sample script used for testing:
================================
a = load 'hit_data' using org.apache.hcatalog.pig.HCatLoader();
b = filter a by (load_date == '20120515');
store b into 'duplicate_hit_data' using org.apache.hcatalog.pig.HCatStorer();
Results:
========
1. Without patch:
-----------------
The job took 27 minutes to process 2,551,157 records and write 3,993,443,494
bytes to HDFS.
2. With patch:
--------------
The job took 2 minutes to process 2,551,157 records and write 3,993,443,494
bytes to HDFS.
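For reference, the two timings above work out to a 13.5x speedup for the same input. A minimal Python sketch of that arithmetic follows. It assumes the reported 27 and 2 minutes are wall-clock job times rounded to the minute. The function names are illustrative only and are not part of the patch.

```python
# Figures taken from the results above; durations are assumed to be
# wall-clock job times rounded to the minute.
RECORDS = 2_551_157
BYTES_WRITTEN = 3_993_443_494


def throughput(minutes):
    """Return (records per second, bytes per second) for a job duration."""
    seconds = minutes * 60
    return RECORDS / seconds, BYTES_WRITTEN / seconds


def speedup(before_minutes, after_minutes):
    """Ratio of the unpatched duration to the patched duration."""
    return before_minutes / after_minutes


if __name__ == "__main__":
    for label, minutes in (("without patch", 27), ("with patch", 2)):
        rec_s, bytes_s = throughput(minutes)
        print(f"{label}: {rec_s:,.0f} records/s, {bytes_s / 1e6:.1f} MB/s")
    print(f"speedup: {speedup(27, 2):.1f}x")
```

Because the durations are only given to the minute, the derived throughput numbers are approximate.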
> HCatStorer performance is 4x slower in HCat 0.4 than HCat 0.2
> -------------------------------------------------------------
>
> Key: HCATALOG-448
> URL: https://issues.apache.org/jira/browse/HCATALOG-448
> Project: HCatalog
> Issue Type: Bug
> Affects Versions: 0.4.1
> Reporter: Rohini Palaniswamy
> Assignee: Mithun Radhakrishnan
> Priority: Critical
> Attachments: hcatalog-448-for-0.4-codebase.patch
>
>