[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1332:
----------------------------
Status: Patch Available (was: Open)
> Archiving partitions
> --------------------
>
> Key: HIVE-1332
> URL: https://issues.apache.org/jira/browse/HIVE-1332
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 0.6.0
> Reporter: Paul Yang
> Assignee: Paul Yang
> Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch,
> HIVE-1332.4.patch, HIVE-1332.5.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. As the
> number of files increases, so do the memory and load requirements on the
> namenode. Partitions in bucketed tables are a particular problem because they
> consist of many files, one for each of the buckets.
> One way to drastically reduce the number of files is to use Hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION
> <spec> statement that automatically puts the files for the partition into a
> HAR file. A corresponding UNARCHIVE option would convert the archived
> partition back to the original files. Archived partitions would be slower to
> access, but they would keep the same functionality while drastically
> reducing the number of files. Typically, only seldom-accessed partitions
> would be archived.
> Hadoop archives are still somewhat new, so we'll only support the latest
> released major version (0.20). Here are some relevant bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (important: without this
> fix, data loss is possible)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585
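For illustration, the statements proposed in the description above might look roughly like the following. This is only a sketch: the table name `page_views` and the partition column `ds` are hypothetical, the partition spec is assumed to use the usual PARTITION (col='value') form, and the UNARCHIVE statement is assumed to mirror the ARCHIVE one. The syntax that is finally committed may differ.

```sql
-- Hypothetical table and partition, used only to illustrate the proposal.
-- Pack all files of one seldom-accessed partition into a single HAR file.
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-04-01');

-- Assumed counterpart: restore the partition to its original files.
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-04-01');
```

Queries against the archived partition would be written exactly as before; only the storage layout on HDFS, and with it the access speed, would change.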