[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863416#action_12863416 ]
Namit Jain commented on HIVE-1332:
----------------------------------

DDLSemanticAnalyzer.java

622   private void analyzeAlterTableArchive(CommonTree ast, boolean isUnArchive)
623       throws SemanticException {
624
625     if (!conf.getBoolVar(HiveConf.ConfVars.HIVEARCHIVEENABLED)) {
626       throw new SemanticException("Archiving methods are currently disabled. " +
627           "Please see the Hive wiki for more information about enabling archiving.");
628
629     }
630     String tblName = unescapeIdentifier(ast.getChild(0).getText());
631     // partition name to value
632     List<Map<String, String>> partSpecs = getPartitionSpecs(ast);
633     if (partSpecs.size() > 1) {
634       throw new SemanticException((isUnArchive ? "UNARCHIVE" : "ARCHIVE") +
635           " can only be run on a single partition");
636     }
637     if (partSpecs.size() == 0) {
638       throw new SemanticException("ARCHIVE can only be run on partitions");

Add the error messages in ErrorMsg.java, and add negative tests for all of them.

DDLTask.java

413     // Means user specified a table
414     if (simpleDesc.getPartSpec() == null) {
415       throw new HiveException("ARCHIVE is for partitions only");
416     }

Shouldn't this be checked in DDLSemanticAnalyzer instead?

Same as above:

421     if (tbl.getTableType() != TableType.MANAGED_TABLE) {
422       throw new HiveException("ARCHIVE can only be performed on managed tables");
423     }

and:

429     if (isArchived(p)) {
430       throw new HiveException("Specified partition is already archived");
431     }

One check that seems to be missing: if we have multiple partition columns, say ds and hr, and the user tries to archive just by specifying ds, should that be allowed? I don't think it will work - are you checking that?
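The missing validation described above, rejecting an ARCHIVE spec that names only some of the table's partition columns (e.g. ds but not hr), could be sketched roughly as below. The class and method names here are illustrative only, not the actual Hive API; the real check would run in DDLSemanticAnalyzer against the table's partition keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpecCheck {
    // Hypothetical sketch: return true only if the user's partition spec
    // supplies a value for every partition column of the table. A partial
    // spec such as (ds) on a table partitioned by (ds, hr) is rejected.
    public static boolean coversAllPartitionColumns(
            List<String> partCols, Map<String, String> partSpec) {
        for (String col : partCols) {
            if (!partSpec.containsKey(col)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("ds", "hr");

        Map<String, String> partial = new HashMap<>();
        partial.put("ds", "2010-05-01");
        // Only ds is given; hr is missing, so this spec should be rejected.
        System.out.println(coversAllPartitionColumns(cols, partial));

        partial.put("hr", "12");
        // Now both columns are specified, so the spec is acceptable.
        System.out.println(coversAllPartitionColumns(cols, partial));
    }
}
```

On failure, the semantic analyzer would raise a SemanticException with a message registered in ErrorMsg.java, so the negative test can match it.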
> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch
>
> Partitions and tables in Hive typically consist of many files on HDFS. An issue is that as the number of files increases, there will be higher memory/load requirements on the namenode. Partitions in bucketed tables are a particular problem because they consist of many files, one for each of the buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec> command that would automatically put the files for the partition into a HAR file. We would also have an UNARCHIVE option to convert the files in the partition back to the original files. Archived partitions would be slower to access, but they would have the same functionality and decrease the number of files drastically. Typically, only seldom-accessed partitions would be archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
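As background on why archived partitions stay readable but slower: Hadoop exposes the contents of a HAR through a layered har:// filesystem URI of the form har://scheme-host:port/path-to-archive/file-in-archive, per the Hadoop archives documentation. A rough sketch of that mapping follows; the warehouse path and archive name are made up for illustration, not taken from the patch.

```java
public class HarUriSketch {
    // Build the har:// URI that Hadoop's HarFileSystem layer understands
    // for a file stored inside an archive:
    //   har://<underlying-scheme>-<host>:<port><path-to-archive>/<file>
    public static String harUri(String scheme, String hostPort,
                                String archivePath, String fileInArchive) {
        return "har://" + scheme + "-" + hostPort + archivePath + "/" + fileInArchive;
    }

    public static void main(String[] args) {
        // Illustrative paths only: a bucketed partition whose files were
        // packed into data.har would be read back through a URI like this.
        System.out.println(harUri("hdfs", "namenode:8020",
            "/user/hive/warehouse/tbl/ds=2010-05-01/data.har", "part-00000"));
        // har://hdfs-namenode:8020/user/hive/warehouse/tbl/ds=2010-05-01/data.har/part-00000
    }
}
```

The extra layer of index lookups inside the archive is what makes access slower, while the namenode sees only the handful of files making up the HAR instead of one file per bucket.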