[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863416#action_12863416 ]
Namit Jain commented on HIVE-1332:
----------------------------------

DDLSemanticAnalyzer.java

622   private void analyzeAlterTableArchive(CommonTree ast, boolean isUnArchive)
623       throws SemanticException {
624
625     if (!conf.getBoolVar(HiveConf.ConfVars.HIVEARCHIVEENABLED)) {
626       throw new SemanticException("Archiving methods are currently disabled. " +
627           "Please see the Hive wiki for more information about enabling archiving.");
628
629     }
630     String tblName = unescapeIdentifier(ast.getChild(0).getText());
631     // partition name to value
632     List<Map<String, String>> partSpecs = getPartitionSpecs(ast);
633     if (partSpecs.size() > 1) {
634       throw new SemanticException((isUnArchive ? "UNARCHIVE" : "ARCHIVE") +
635           " can only be run on a single partition");
636     }
637     if (partSpecs.size() == 0) {
638       throw new SemanticException("ARCHIVE can only be run on partitions");

Add the error messages in ErrorMsg.java, and add negative tests for all of them.

DDLTask.java

413     // Means user specified a table
414     if (simpleDesc.getPartSpec() == null) {
415       throw new HiveException("ARCHIVE is for partitions only");
416     }

Shouldn't this be checked in DDLSemanticAnalyzer instead?

Same as above:

421     if (tbl.getTableType() != TableType.MANAGED_TABLE) {
422       throw new HiveException("ARCHIVE can only be performed on managed tables");
423     }

and:

429     if (isArchived(p)) {
430       throw new HiveException("Specified partition is already archived");
431     }

One check that seems to be missing: if we have multiple partition columns, say ds and hr, and the user tries to archive just by specifying ds, should that be allowed? I don't think it will work - are you checking that?
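The missing validation described above, rejecting an ARCHIVE spec that names only some of the table's partition columns (e.g. ds but not hr), could be sketched roughly as below. The class and method names here are illustrative only, not the actual Hive API; the real check would run in DDLSemanticAnalyzer against the table's partition keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpecCheck {
    // Hypothetical sketch: return true only if the user's partition spec
    // supplies a value for every partition column of the table. A partial
    // spec such as (ds) on a table partitioned by (ds, hr) is rejected.
    public static boolean coversAllPartitionColumns(
            List<String> partCols, Map<String, String> partSpec) {
        for (String col : partCols) {
            if (!partSpec.containsKey(col)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("ds", "hr");

        Map<String, String> partial = new HashMap<>();
        partial.put("ds", "2010-05-01");
        // Only ds is given; hr is missing, so this spec should be rejected.
        System.out.println(coversAllPartitionColumns(cols, partial));

        partial.put("hr", "12");
        // Now both columns are specified, so the spec is acceptable.
        System.out.println(coversAllPartitionColumns(cols, partial));
    }
}
```

On failure, the semantic analyzer would raise a SemanticException with a message registered in ErrorMsg.java, so the negative test can match it.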
> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch
>
> Partitions and tables in Hive typically consist of many files on HDFS. An issue is that as the number of files increases, there will be higher memory/load requirements on the namenode. Partitions in bucketed tables are a particular problem because they consist of many files, one for each of the buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec> command that would automatically put the files for the partition into a HAR file. We would also have an UNARCHIVE option to convert the files in the partition back to the original files. Archived partitions would be slower to access, but they would have the same functionality and decrease the number of files drastically. Typically, only seldom-accessed partitions would be archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
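As background on why archived partitions stay readable but slower: Hadoop exposes the contents of a HAR through a layered har:// filesystem URI of the form har://scheme-host:port/path-to-archive/file-in-archive, per the Hadoop archives documentation. A rough sketch of that mapping follows; the warehouse path and archive name are made up for illustration, not taken from the patch.

```java
public class HarUriSketch {
    // Build the har:// URI that Hadoop's HarFileSystem layer understands
    // for a file stored inside an archive:
    //   har://<underlying-scheme>-<host>:<port><path-to-archive>/<file>
    public static String harUri(String scheme, String hostPort,
                                String archivePath, String fileInArchive) {
        return "har://" + scheme + "-" + hostPort + archivePath + "/" + fileInArchive;
    }

    public static void main(String[] args) {
        // Illustrative paths only: a bucketed partition whose files were
        // packed into data.har would be read back through a URI like this.
        System.out.println(harUri("hdfs", "namenode:8020",
            "/user/hive/warehouse/tbl/ds=2010-05-01/data.har", "part-00000"));
        // har://hdfs-namenode:8020/user/hive/warehouse/tbl/ds=2010-05-01/data.har/part-00000
    }
}
```

The extra layer of index lookups inside the archive is what makes access slower, while the namenode sees only the handful of files making up the HAR instead of one file per bucket.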