Archiving partitions
--------------------

                 Key: HIVE-1332
                 URL: https://issues.apache.org/jira/browse/HIVE-1332
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Metastore
    Affects Versions: 0.6.0
            Reporter: Paul Yang
            Assignee: Paul Yang


Partitions and tables in Hive typically consist of many files on HDFS. As the 
number of files increases, so do the memory and load requirements on the 
namenode. Partitions in bucketed tables are a particular problem because each 
one consists of many files, one per bucket.

One way to drastically reduce the number of files is to use Hadoop archives:
http://hadoop.apache.org/common/docs/current/hadoop_archives.html

This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION 
<spec> statement that automatically puts the files for the partition into a 
HAR file. A corresponding UNARCHIVE option would convert the partition back to 
its original files. Archived partitions would be slower to access, but they 
would offer the same functionality while using far fewer files. Typically, 
only seldom-accessed partitions would be archived.
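
As a rough illustration of the proposed statements, usage might look like the 
sketch below. The table name (page_views) and partition column (ds) are 
hypothetical, and the final grammar may differ from what is shown here.

```sql
-- Sketch only: page_views and ds are made-up names for illustration.
-- Pack all files of one seldom-accessed partition into a single HAR file.
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-01-01');

-- Restore the partition to its original, unarchived files.
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-01-01');
```

Queries against the archived partition would be written the same way as 
before; only the access speed and the file count on HDFS would change.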

Hadoop archives are still somewhat new, so we'll only support the latest 
released major version (0.20). Some relevant bug fixes:

https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
potentially cause data loss without this fix)
https://issues.apache.org/jira/browse/HADOOP-6645
https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
