[jira] Updated: (HIVE-1332) Archiving partitions
[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1332:
Fix Version/s: 0.6.0
Affects Version/s: (was: 0.6.0)

Archiving partitions
Key: HIVE-1332
URL: https://issues.apache.org/jira/browse/HIVE-1332
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore
Reporter: Paul Yang
Assignee: Paul Yang
Fix For: 0.6.0
Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, HIVE-1332.4.patch, HIVE-1332.5.patch, HIVE-1332.6.patch

Partitions and tables in Hive typically consist of many files on HDFS. The problem is that as the number of files increases, so do the memory and load requirements on the namenode. Partitions in bucketed tables are a particular problem because they consist of many files, one for each bucket.

One way to drastically reduce the number of files is to use Hadoop archives:
http://hadoop.apache.org/common/docs/current/hadoop_archives.html

This feature would introduce an ALTER TABLE table_name ARCHIVE PARTITION partition_spec command that would automatically put the files for the partition into a HAR file. We would also have an UNARCHIVE option to convert the files in the partition back to the original files. Archived partitions would be slower to access, but they would have the same functionality and drastically decrease the number of files. Typically, only seldom-accessed partitions would be archived.

Hadoop archives are still somewhat new, so we'll only put in support for the latest released major version (0.20). Here are some relevant bug fixes:
https://issues.apache.org/jira/browse/HADOOP-6591 (Important: could potentially cause data loss without this fix)
https://issues.apache.org/jira/browse/HADOOP-6645
https://issues.apache.org/jira/browse/MAPREDUCE-1585

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
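For illustration, here is a minimal Python sketch that renders the kind of statement the proposal describes. The helper name partition_archive_sql and the example table srcpart with partition columns ds and hr are hypothetical; the sketch only assumes the ALTER TABLE table_name [UN]ARCHIVE PARTITION (col='value', ...) shape, with each value quoted as a Hive string literal.

```python
from typing import Mapping


def partition_archive_sql(table: str, spec: Mapping[str, str], unarchive: bool = False) -> str:
    """Render an ALTER TABLE ... ARCHIVE/UNARCHIVE PARTITION statement.

    Hypothetical helper for illustration; it builds a string and does not talk to Hive.
    """
    if not spec:
        raise ValueError("a partition spec needs at least one column")
    parts = []
    for col, val in spec.items():
        # Quote the value as a string literal, escaping backslashes first, then single quotes.
        escaped = val.replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"{col}='{escaped}'")
    action = "UNARCHIVE" if unarchive else "ARCHIVE"
    # Columns keep the order of the mapping (dicts preserve insertion order).
    return f"ALTER TABLE {table} {action} PARTITION ({', '.join(parts)})"
```

Running print(partition_archive_sql("srcpart", {"ds": "2008-04-08", "hr": "12"})) prints ALTER TABLE srcpart ARCHIVE PARTITION (ds='2008-04-08', hr='12'), and passing unarchive=True produces the matching UNARCHIVE statement. Escaping happens inside the helper rather than in callers so that a partition value containing a quote cannot break the statement.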
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Attachment: HIVE-1332.6.patch

This should fix the test issues. I'm re-running the test suite now, but let me know if you see anything.
[jira] Updated: (HIVE-1332) Archiving partitions
Namit Jain updated HIVE-1332:
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

Committed. Thanks Paul
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Attachment: HIVE-1332.5.patch

Updated to current trunk.
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Status: Open (was: Patch Available)
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Attachment: HIVE-1332.4.patch

This doesn't incorporate the Hadoop-version-aware test framework, but here's another version that has some additional fixes/tests.
* Handled table renaming
* Added a check when creating partitions to catch reserved values
* Additional tests for the above
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Attachment: HIVE-1332.3.patch

* Moved some constants to thrift definition
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1332) Archiving partitions
Paul Yang updated HIVE-1332:
Attachment: HIVE-1332.1.patch