[ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863525#action_12863525 ]
Paul Yang commented on HIVE-1332:
---------------------------------

Yeah, as the patch stands now, concurrent operations are not supported: it was assumed these commands would be run via a single cron job. That is probably not a good assumption to make. The priority was to order the operations so as to prevent data loss in case of any failure.

The reason the (un)archive operation is tricky to do concurrently is that there is no way to lock a partition/table (HIVE-1293) and no way to make a filesystem change and a metadata change atomically. But there are ways of addressing these concurrency issues while still preserving data in failure scenarios.

Archiving a partition using a conservative approach would involve something like (as discussed with Namit):

1. Create a copy of the partition; call it ds=1.copy
2. Alter the metadata's location to point to ds=1.copy -- from this point on, failures are okay because the copy is not touched
3. Make the archive of the partition directory in a tmp directory
4. Remove the directory ds=1
5. Move the tmp directory to ds=1
6. Alter the metadata's location to point to har:/...ds=1
7. Delete ds=1.copy

This set of steps ensures that no matter when a failure occurs, subsequent queries on the partition will continue to succeed. However, this approach incurs the overhead of making a copy of the partition, which can be significant.

Another approach is to:

1. Make the archive of the partition in a tmp directory
2. Move the archive folder to ds=1.copy
3. Move ds=1 to ds=1.old
4. Move ds=1.copy to ds=1
5. Alter the metadata to change the location to har:/...ds=1

The drawback of this approach is that if a failure occurs, subsequent queries will not be able to properly access the data. However, the archive command can be run again to recover from the situation. Also, since the semantics of FileSystem.rename() do not throw an error if the destination directory already exists, there is a small window for data duplication.
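The two step sequences above can be sketched on a local filesystem. This is only an illustrative simulation, not Hive code: the `metadata` dict stands in for the metastore's partition location, `shutil.copytree` stands in for running `hadoop archive`, and all paths and function names are hypothetical. The `har:/...ds=1` string is kept verbatim from the comment as a placeholder.

```python
import shutil
from pathlib import Path


def archive_conservative(warehouse: Path, metadata: dict) -> None:
    """Sketch of the conservative 7-step sequence (hypothetical helper)."""
    part = warehouse / "ds=1"
    copy = warehouse / "ds=1.copy"
    # 1. Create a copy of the partition.
    shutil.copytree(part, copy)
    # 2. Point the metastore at the copy -- failures after this point are
    #    okay because the copy is not touched again until step 7.
    metadata["ds=1"] = str(copy)
    # 3. Build the archive in a tmp directory (stand-in for `hadoop archive`).
    tmp = warehouse / "_tmp"
    shutil.copytree(part, tmp)
    # 4. Remove the original directory.
    shutil.rmtree(part)
    # 5. Move the tmp directory into place.
    tmp.rename(part)
    # 6. Point the metastore at the archived location (placeholder string).
    metadata["ds=1"] = "har:/...ds=1"
    # 7. Delete the copy.
    shutil.rmtree(copy)


def archive_rename(warehouse: Path, metadata: dict) -> None:
    """Sketch of the cheaper rename-based sequence (hypothetical helper)."""
    part = warehouse / "ds=1"
    # 1. Build the archive in a tmp directory (stand-in for `hadoop archive`).
    tmp = warehouse / "_tmp"
    shutil.copytree(part, tmp / "ds=1")
    # 2. Move the archive folder to ds=1.copy.
    (tmp / "ds=1").rename(warehouse / "ds=1.copy")
    # 3. Move ds=1 to ds=1.old -- between here and step 5, queries against
    #    the recorded location cannot properly access the data.
    part.rename(warehouse / "ds=1.old")
    # 4. Move ds=1.copy to ds=1.
    (warehouse / "ds=1.copy").rename(part)
    # 5. Point the metastore at the archived location (placeholder string).
    metadata["ds=1"] = "har:/...ds=1"
    # Cleanup (not part of the numbered steps in the comment).
    shutil.rmtree(warehouse / "ds=1.old")
    shutil.rmtree(tmp)
```

Note that both sketches assume single-writer access; as the comment says, without partition locks a concurrent writer could still interleave with these steps.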
However, this issue is already present in INSERT OVERWRITE... These will be addressed with lock support.

> Archiving partitions
> --------------------
>
> Key: HIVE-1332
> URL: https://issues.apache.org/jira/browse/HIVE-1332
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 0.6.0
> Reporter: Paul Yang
> Assignee: Paul Yang
> Attachments: HIVE-1332.1.patch
>
> Partitions and tables in Hive typically consist of many files on HDFS. An issue is that as the number of files increases, there are higher memory/load requirements on the namenode. Partitions in bucketed tables are a particular problem because they consist of many files, one for each of the buckets.
> One way to drastically reduce the number of files is to use Hadoop archives: http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec> command that would automatically put the files for the partition into a HAR file. We would also have an UNARCHIVE option to convert the files in the partition back to the original files. Archived partitions would be slower to access, but they would have the same functionality and drastically decrease the number of files. Typically, only seldom-accessed partitions would be archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585
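Based on the syntax proposed in the issue description, usage of the new commands would look something like the following; the table name and partition spec here are made-up examples, not from the issue.

```sql
-- Hypothetical usage of the proposed commands (table/partition are examples).
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-05-01');
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-05-01');
```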