[https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863416#action_12863416]
Namit Jain commented on HIVE-1332:
----------------------------------
DDLSemanticAnalyzer.java

    private void analyzeAlterTableArchive(CommonTree ast, boolean isUnArchive)
        throws SemanticException {

      if (!conf.getBoolVar(HiveConf.ConfVars.HIVEARCHIVEENABLED)) {
        throw new SemanticException("Archiving methods are currently disabled. " +
            "Please see the Hive wiki for more information about enabling archiving.");
      }
      String tblName = unescapeIdentifier(ast.getChild(0).getText());
      // partition name to value
      List<Map<String, String>> partSpecs = getPartitionSpecs(ast);
      if (partSpecs.size() > 1) {
        throw new SemanticException((isUnArchive ? "UNARCHIVE" : "ARCHIVE") +
            " can only be run on a single partition");
      }
      if (partSpecs.size() == 0) {
        throw new SemanticException("ARCHIVE can only be run on partitions");

Note: without the parentheses around the ternary, string concatenation binds first and the UNARCHIVE branch loses its message suffix.
Add the error messages in ErrorMsg.java, and add negative tests for all of them.
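A sketch of what those entries could look like, following the enum pattern of Hive's ErrorMsg class (the entry names here are illustrative assumptions; the message strings come from the quoted patch):

```java
// Hypothetical sketch only: the real class is
// org.apache.hadoop.hive.ql.parse.ErrorMsg. Entry names are invented for
// illustration; messages are taken from the quoted patch code.
public enum ArchiveErrorMsg {
  ARCHIVE_METHODS_DISABLED("Archiving methods are currently disabled. " +
      "Please see the Hive wiki for more information about enabling archiving."),
  ARCHIVE_ON_MULTI_PARTS("ARCHIVE can only be run on a single partition"),
  ARCHIVE_ON_TABLE("ARCHIVE can only be run on partitions");

  private final String msg;

  ArchiveErrorMsg(String msg) {
    this.msg = msg;
  }

  public String getMsg() {
    return msg;
  }

  public static void main(String[] args) {
    // Negative tests can then assert on a single stable message string.
    System.out.println(ARCHIVE_ON_TABLE.getMsg());
  }
}
```

Centralizing the strings this way lets each negative test compare against one constant instead of a literal duplicated across analyzer and task code.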
DDLTask.java

    // Means user specified a table
    if (simpleDesc.getPartSpec() == null) {
      throw new HiveException("ARCHIVE is for partitions only");
    }
Shouldn't this be checked in DDLSemanticAnalyzer instead?
Same as above:

    if (tbl.getTableType() != TableType.MANAGED_TABLE) {
      throw new HiveException("ARCHIVE can only be performed on managed tables");
    }
and:

    if (isArchived(p)) {
      throw new HiveException("Specified partition is already archived");
    }
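If those three runtime checks were hoisted into the analyzer as suggested, a minimal sketch could look like the following (class, method, and parameter names are assumptions for illustration, not the patch's actual code):

```java
import java.util.Map;

// Hypothetical sketch: the three DDLTask checks quoted above, hoisted so
// they fail at compile time with a SemanticException instead of at
// execution time. All names here are illustrative assumptions.
public class ArchiveValidation {

  static class SemanticException extends Exception {
    SemanticException(String msg) {
      super(msg);
    }
  }

  static void validateArchiveTarget(Map<String, String> partSpec,
      boolean isManagedTable, boolean alreadyArchived) throws SemanticException {
    if (partSpec == null) {
      // User specified a table, not a partition
      throw new SemanticException("ARCHIVE is for partitions only");
    }
    if (!isManagedTable) {
      throw new SemanticException("ARCHIVE can only be performed on managed tables");
    }
    if (alreadyArchived) {
      throw new SemanticException("Specified partition is already archived");
    }
  }

  public static void main(String[] args) {
    try {
      validateArchiveTarget(null, true, false);
    } catch (SemanticException e) {
      System.out.println(e.getMessage());
    }
  }
}
```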
One check that seems to be missing:
If we have multiple partition columns, say ds and hr, and the user tries to archive by specifying only ds, should that be allowed?
I don't think it will work - are you checking for that?
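One way to express that missing check, as a hedged sketch (the helper name and signature are assumptions, not existing Hive code): a spec names exactly one partition only when it supplies a value for every partition column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the missing check described above: with partition
// columns (ds, hr), a spec giving only ds does not identify a single
// partition and should be rejected before archiving is attempted.
public class PartialSpecCheck {

  static boolean specCoversAllPartitionColumns(List<String> partCols,
      Map<String, String> partSpec) {
    // The spec must contain every partition column, and nothing extra.
    return partSpec.keySet().containsAll(partCols)
        && partSpec.size() == partCols.size();
  }

  public static void main(String[] args) {
    List<String> partCols = Arrays.asList("ds", "hr");

    Map<String, String> full = new LinkedHashMap<String, String>();
    full.put("ds", "2010-05-01");
    full.put("hr", "12");

    Map<String, String> partial = new LinkedHashMap<String, String>();
    partial.put("ds", "2010-05-01");

    System.out.println(specCoversAllPartitionColumns(partCols, full));    // true
    System.out.println(specCoversAllPartitionColumns(partCols, partial)); // false
  }
}
```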
> Archiving partitions
> --------------------
>
> Key: HIVE-1332
> URL: https://issues.apache.org/jira/browse/HIVE-1332
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 0.6.0
> Reporter: Paul Yang
> Assignee: Paul Yang
> Attachments: HIVE-1332.1.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An
> issue is that as the number of files increase, there will be higher
> memory/load requirements on the namenode. Partitions in bucketed tables are a
> particular problem because they consist of many files, one for each of the
> buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION
> <spec> that would automatically put the files for the partition into a HAR
> file. We would also have an UNARCHIVE option to convert the files in the
> partition back to the original files. Archived partitions would be slower to
> access, but they would have the same functionality and decrease the number of
> files drastically. Typically, only seldom accessed partitions would be
> archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the
> latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could
> potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585
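Based on the syntax described in the issue above, usage might look like the following (table name and partition value are illustrative, not from the patch):

```sql
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-05-01');
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-05-01');
```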