[ https://issues.apache.org/jira/browse/PIG-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaurav Jain reassigned PIG-1411:
--------------------------------

    Assignee: Gaurav Jain

> [Zebra] Can Zebra use HAR to reduce file/block count for namenode
> -----------------------------------------------------------------
>
>                 Key: PIG-1411
>                 URL: https://issues.apache.org/jira/browse/PIG-1411
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Gaurav Jain
>            Assignee: Gaurav Jain
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> Due to its column group structure, Zebra can create extra files that the
> namenode must track, so the namenode uses more memory for Zebra-related files.
> The goal is to reduce the number of files/blocks.
> Among the various options, the idea is to use HAR (Hadoop Archive). A Hadoop
> Archive reduces the block and file count by copying data from small files
> (1 MB, 2 MB, ...) into full-size HDFS blocks, thus reducing the total number
> of blocks and files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
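As a rough illustration of the file/block reduction described in the issue, here is a minimal Python sketch that counts the namespace objects (files plus blocks) the namenode would track before and after archiving. It is not Zebra or Hadoop code, and the function names are hypothetical. The 64 MB block size, the single part file, and the two index files (`_index`, `_masterindex`), each assumed to fit in one block, are simplifying assumptions; directories and the per-object memory cost are ignored.

```python
# Hypothetical model: NOT part of Zebra or Hadoop. Counts the namespace
# objects (files + blocks) the namenode must track.

# Assumption: 64 MB HDFS block size (the Hadoop 0.20-era default).
BLOCK_SIZE = 64 * 1024 * 1024


def blocks_for(size_bytes, block_size=BLOCK_SIZE):
    """HDFS blocks used by one file; an empty file uses none (integer ceil)."""
    return (size_bytes + block_size - 1) // block_size


def namenode_objects_plain(sizes, block_size=BLOCK_SIZE):
    """Files + blocks when every small file is stored on its own."""
    return len(sizes) + sum(blocks_for(s, block_size) for s in sizes)


def namenode_objects_har(sizes, index_files=2, block_size=BLOCK_SIZE):
    """Files + blocks for a HAR, assuming all data is concatenated into ONE
    part file plus `index_files` small index files (_index, _masterindex),
    each assumed to fit in a single block. Directories are ignored."""
    part_blocks = blocks_for(sum(sizes), block_size)
    files = 1 + index_files
    blocks = part_blocks + index_files
    return files + blocks


if __name__ == "__main__":
    MB = 1024 * 1024
    column_group_files = [2 * MB] * 100  # e.g. 100 small 2 MB files
    print(namenode_objects_plain(column_group_files))  # 200
    print(namenode_objects_har(column_group_files))    # 9
```

Under these assumptions, 100 column-group files of 2 MB each cost 200 namenode objects when stored individually, versus 9 once their 200 MB of data is packed into a 4-block part file plus two index files. A real archive may be written as more than one part file, so treat the numbers as an order-of-magnitude estimate.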