[jira] Updated: (HIVE-1332) Archiving partitions

Paul Yang (JIRA) Mon, 07 Jun 2010 14:04:11 -0700

     [ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Paul Yang updated HIVE-1332:
----------------------------

    Status: Patch Available  (was: Open)

> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
> HIVE-1332.4.patch, HIVE-1332.5.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. As the 
> number of files increases, so do the memory and load requirements on the 
> namenode. Partitions in bucketed tables are a particular problem because they 
> consist of many files, one for each bucket.
> One way to drastically reduce the number of files is to use Hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION 
> <spec> statement that automatically puts the files for the partition into a 
> HAR file. There would also be an UNARCHIVE option to convert the archived 
> partition back to the original files. Archived partitions would be slower to 
> access, but they would keep the same functionality while drastically reducing 
> the number of files. Typically, only rarely accessed partitions would be 
> archived.
> Hadoop archives are still somewhat new, so we'll only support the latest 
> released major version (0.20). Relevant Hadoop bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
> potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585
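
To make the proposed statements in the description above concrete, here is a minimal Python sketch that renders them as strings. The table name `page_views` and the partition values are hypothetical, and both the parenthesized `key='value'` spec form and the exact UNARCHIVE syntax are assumptions, since the description only names the ARCHIVE statement and says an UNARCHIVE option would exist.

```python
def partition_spec(spec):
    """Render a partition spec dict as (key='value', ...).

    Assumption: the <spec> placeholder takes this parenthesized form.
    """
    parts = ", ".join("{}='{}'".format(k, v) for k, v in spec.items())
    return "(" + parts + ")"


def archive_statement(table, spec, unarchive=False):
    """Build the proposed ALTER TABLE ... ARCHIVE PARTITION statement.

    With unarchive=True, build the assumed UNARCHIVE counterpart instead.
    """
    action = "UNARCHIVE" if unarchive else "ARCHIVE"
    return "ALTER TABLE {} {} PARTITION {}".format(
        table, action, partition_spec(spec)
    )


# Hypothetical table and partition values, for illustration only.
spec = {"ds": "2010-06-01", "hr": "12"}
print(archive_statement("page_views", spec))
print(archive_statement("page_views", spec, unarchive=True))
```

The helper only builds statement text; it does not talk to Hive or create a HAR file.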

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
