[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863525#action_12863525 ]

Paul Yang commented on HIVE-1332:
---------------------------------

Yeah, the way the patch is now, concurrent operations are not supported, as it 
was assumed these commands would be run via a single cron job. But that is 
probably not a good assumption to make. The priority was to order the 
operations so that a failure at any point would not cause data loss. The 
reason the (un)archive operation is tricky to do concurrently is that there is 
no way to lock a partition/table (HIVE-1293) and no way to make a filesystem 
change and a metadata change atomically. But there are ways of addressing 
these concurrency issues while still preserving data in failure scenarios:

Archiving a partition using a conservative approach would involve something 
like the following (as discussed with Namit; a rough Java sketch follows the 
list):

1. Create a copy of the partition, call it ds=1.copy
2. Alter the partition metadata's location to point to ds=1.copy
-- At this point, failures are okay as the copy is not touched
3. Make the archive of the partition directory in a tmp directory
4. Remove the directory ds=1
5. Move the tmp directory to ds=1
6. Alter the partition metadata's location to point to har:/...ds=1
7. Delete ds=1.copy
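
To make the ordering concrete, here is a rough sketch against the Hadoop 
FileSystem API. alterPartitionLocation() and createHar() are hypothetical 
stand-ins for the metastore alter call and a run of the Hadoop archive tool 
-- they are not names from the patch, and the paths are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class ConservativeArchive {

      // Hypothetical stand-ins; not from the patch.
      static void alterPartitionLocation(String loc) {
        // issue a metastore alter_partition pointing at loc
      }
      static void createHar(Path srcDir, Path tmpDir) {
        // run the Hadoop archive tool over srcDir into tmpDir
      }

      public static void archive(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path part = new Path("/warehouse/tbl/ds=1");      // example paths
        Path copy = new Path("/warehouse/tbl/ds=1.copy");
        Path tmp  = new Path("/warehouse/tbl/.archive-tmp");

        // 1. Copy the partition directory.
        FileUtil.copy(fs, part, fs, copy, false /* deleteSource */, conf);

        // 2. Point the metadata at the copy. From here on a failure is
        //    safe: none of the later steps touch ds=1.copy.
        alterPartitionLocation(copy.toString());

        // 3. Build the archive in the tmp directory.
        createHar(part, tmp);

        // 4./5. Replace the original directory with the archive.
        fs.delete(part, true /* recursive */);
        fs.rename(tmp, part);

        // 6. Point the metadata at the archive.
        alterPartitionLocation("har:/...ds=1"); // elided HAR URI, as above

        // 7. Nothing references the copy any more; delete it.
        fs.delete(copy, true);
      }
    }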

This set of steps would ensure that no matter when a failure occurs, 
subsequent queries on the partition will continue to succeed. However, this 
approach incurs the overhead of making a full copy of the partition, which can 
be significant. Another approach (sketched below) is to:

1. Make the archive of the partition in a tmp directory
2. Move the archive folder to ds=1.copy
3. Move ds=1 to ds=1.old
4. Move ds=1.copy to ds=1
5. Alter the metadata to change the location to har:/...ds=1
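
A sketch of this cheaper swap, reusing the same hypothetical 
alterPartitionLocation() and createHar() helpers and example paths as the 
previous sketch:

    // Same class and hypothetical helpers as the sketch above.
    public static void archiveBySwap(Configuration conf) throws Exception {
      FileSystem fs = FileSystem.get(conf);
      Path part = new Path("/warehouse/tbl/ds=1");
      Path copy = new Path("/warehouse/tbl/ds=1.copy");
      Path old  = new Path("/warehouse/tbl/ds=1.old");
      Path tmp  = new Path("/warehouse/tbl/.archive-tmp");

      // 1. Build the archive in a tmp directory.
      createHar(part, tmp);

      // 2. Stage the archive next to the partition.
      fs.rename(tmp, copy);

      // 3./4. The swap: set the original aside, drop the archive into
      // its place. A crash between these renames leaves the metadata
      // pointing at a ds=1 that does not exist until the command is
      // re-run; ds=1.old still holds the original files, so no data
      // is lost.
      fs.rename(part, old);
      fs.rename(copy, part);

      // 5. Point the metadata at the archive.
      alterPartitionLocation("har:/...ds=1"); // elided HAR URI, as above
    }

Note that the only filesystem work here is renames; there is no full copy of 
the partition data.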

The drawback to this approach is that if a failure occurs anywhere between 
step 3 and step 5, subsequent queries will not be able to properly access the 
data: ds=1 is either missing or holds an archive that the metadata does not 
yet describe. However, the archive command can be run again to recover from 
that state.

Also, since FileSystem.rename() does not throw an error when the destination 
directory already exists, there is a small window for data duplication. 
However, this issue is already present in INSERT OVERWRITE... These issues 
will be addressed with lock support.
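
For illustration, a guarded rename along these lines narrows the window but 
cannot close it without a lock. This is only a sketch of the check, assuming 
plain FileSystem semantics (rename() reports most failures by returning 
false rather than throwing, and moves src into an existing destination 
directory instead of failing, which is where duplicates come from):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Check-then-rename: another client can still slip in between
    // exists() and rename(), so this only shrinks the duplication
    // window. A real partition/table lock (HIVE-1293) is what closes it.
    static void renameIfAbsent(FileSystem fs, Path src, Path dst)
        throws IOException {
      if (fs.exists(dst)) {
        throw new IOException("destination already exists: " + dst);
      }
      if (!fs.rename(src, dst)) { // rename() signals failure via false
        throw new IOException("rename failed: " + src + " -> " + dst);
      }
    }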


> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An 
> issue is that as the number of files increases, there will be higher 
> memory/load requirements on the namenode. Partitions in bucketed tables are a 
> particular problem because they consist of many files, one for each of the 
> buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION 
> <spec> command that would automatically put the files for the partition into 
> a HAR file. We would also have an UNARCHIVE option to convert the files in the 
> partition back to the original files. Archived partitions would be slower to 
> access, but they would have the same functionality and decrease the number of 
> files drastically. Typically, only seldom accessed partitions would be 
> archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the 
> latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could 
> potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
