[jira] Updated: (HIVE-1332) Archiving partitions

2010-07-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1332:
-

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.6.0)

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0

 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch, HIVE-1332.5.patch, HIVE-1332.6.patch


 Partitions and tables in Hive typically consist of many files on HDFS. As the 
 number of files increases, so do the memory and load requirements on the 
 namenode. Partitions in bucketed tables are a particular problem because they 
 consist of many files, one for each bucket.
 One way to drastically reduce the number of files is to use Hadoop archives:
 http://hadoop.apache.org/common/docs/current/hadoop_archives.html
 This feature would introduce an ALTER TABLE table_name ARCHIVE PARTITION 
 spec statement that would automatically put the files for the partition into 
 a HAR file. We would also have an UNARCHIVE option to convert the archived 
 partition back to the original files. Archived partitions would be slower to 
 access, but they would have the same functionality and use drastically fewer 
 files. Typically, only seldom-accessed partitions would be archived.
 Hadoop archives are still somewhat new, so we'll only put in support for the 
 latest released major version (0.20). The following Hadoop bug fixes are 
 relevant:
 https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
 potentially cause data loss without this fix)
 https://issues.apache.org/jira/browse/HADOOP-6645
 https://issues.apache.org/jira/browse/MAPREDUCE-1585
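
As a rough illustration of the proposed syntax, the sketch below builds the ARCHIVE and UNARCHIVE statements for a single partition. The helper names, the table page_views, and the partition column ds are hypothetical and are not taken from the patch; values are inserted without any escaping, so this is only meant to show the shape of the statements.

```python
def partition_clause(spec):
    """Render a partition spec dict as a HiveQL PARTITION (...) clause.

    Sketch only: values are quoted as strings and not escaped.
    """
    parts = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return f"PARTITION ({parts})"


def archive_statement(table, spec, unarchive=False):
    """Build the proposed ALTER TABLE ... ARCHIVE/UNARCHIVE statement."""
    verb = "UNARCHIVE" if unarchive else "ARCHIVE"
    return f"ALTER TABLE {table} {verb} {partition_clause(spec)}"


if __name__ == "__main__":
    # Hypothetical table and partition, used for illustration only.
    spec = {"ds": "2010-04-01"}
    print(archive_statement("page_views", spec))
    print(archive_statement("page_views", spec, unarchive=True))
```

The first statement would pack the partition's files into a HAR file, and the second would restore the original files.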

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-08 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.6.patch

This should fix the test issues - I'm re-running the test suite now but let me 
know if you see anything.

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch, HIVE-1332.5.patch, HIVE-1332.6.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-08 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1332:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Paul

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch, HIVE-1332.5.patch, HIVE-1332.6.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-07 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.5.patch

Updated to current trunk.

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch, HIVE-1332.5.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-06-07 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Status: Patch Available  (was: Open)

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch, HIVE-1332.5.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-05-28 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Status: Open  (was: Patch Available)

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-05-26 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.4.patch

This doesn't incorporate the hadoop-version aware test framework, but here's 
another version that has some additional fixes/tests.

* Handled table renaming
* Added a check when creating partitions to catch reserved values
* Additional tests for above

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch, 
 HIVE-1332.4.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-05-20 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.3.patch

* Moved some constants to thrift definition

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-05-20 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Status: Patch Available  (was: Open)

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch, HIVE-1332.2.patch, HIVE-1332.3.patch





[jira] Updated: (HIVE-1332) Archiving partitions

2010-04-30 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1332:


Attachment: HIVE-1332.1.patch

 Archiving partitions
 

 Key: HIVE-1332
 URL: https://issues.apache.org/jira/browse/HIVE-1332
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1332.1.patch

