[ 
https://issues.apache.org/jira/browse/HIVE-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293059#comment-13293059
 ] 

Kevin Wilfong commented on HIVE-3106:
-------------------------------------

Per Carl's comments, I explicitly stated the advantages/disadvantages, removed 
"atomic" from the name of the configuration variable, since the behavior is 
not truly atomic, and removed references to "outputs" in the description of 
the config.

Also fixed an issue where, if a file took a long time to produce, there 
would still be a long gap between when the tables/partitions are produced and 
when the locks on them are released. Now, when the option is set, the 
DependencyCollection task depends on the dependencies of the move tasks for 
files, but the move tasks for files do not depend on the DependencyCollection 
task: there are no locks on these files, so delaying them would bring no 
advantage.
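
To make the wiring above concrete, here is a minimal, self-contained sketch 
of the dependency graph. It is not code from the patch: the class 
MultiInsertDagSketch, its Task type, and the wire/waitsFor helpers are 
hypothetical stand-ins, and only the shape of the edges follows the 
description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiInsertDagSketch {

    /** Hypothetical stand-in for a query-plan task; NOT Hive's own Task class. */
    public static final class Task {
        public final String name;
        public final List<Task> parents = new ArrayList<>();

        public Task(String name) {
            this.name = name;
        }

        /** Records that this task may only start after {@code parent} finishes. */
        public void addParent(Task parent) {
            if (!parents.contains(parent)) {
                parents.add(parent);
            }
        }
    }

    /** True if {@code task} must wait, directly or transitively, for {@code ancestor}. */
    public static boolean waitsFor(Task task, Task ancestor) {
        for (Task parent : task.parents) {
            if (parent == ancestor || waitsFor(parent, ancestor)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Wires the graph as described for the case where the option is set.
     * fileJobs.get(i) is assumed to be the task that produces the data
     * moved by fileMoves.get(i).
     */
    public static Task wire(List<Task> tableJobs, List<Task> tableMoves,
                            List<Task> fileJobs, List<Task> fileMoves) {
        Task collector = new Task("DependencyCollection");

        // The collector waits for everything that feeds any move task,
        // including the tasks that produce file outputs.
        for (Task job : tableJobs) {
            collector.addParent(job);
        }
        for (Task job : fileJobs) {
            collector.addParent(job);
        }

        // Table/partition moves are held back until the collector is done,
        // so they happen close to the end of the query (and the lock release).
        for (Task move : tableMoves) {
            move.addParent(collector);
        }

        // File moves take no locks, so each one waits only for its own producer.
        for (int i = 0; i < fileMoves.size(); i++) {
            fileMoves.get(i).addParent(fileJobs.get(i));
        }
        return collector;
    }

    public static void main(String[] args) {
        Task tableJob = new Task("produce table output");
        Task slowFileJob = new Task("produce file output (slow)");
        Task tableMove = new Task("move to table");
        Task fileMove = new Task("move to file");

        Task collector = wire(Arrays.asList(tableJob), Arrays.asList(tableMove),
                Arrays.asList(slowFileJob), Arrays.asList(fileMove));

        System.out.println(waitsFor(tableMove, slowFileJob)); // true
        System.out.println(waitsFor(fileMove, collector));    // false
    }
}
```

In this sketch the table move waits for the slow file job, which keeps table 
creation close to the end of the query, while the file move is free to run as 
soon as its own producer finishes.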

Added a new test case for this additional functionality.
                
> Add option to make multi inserts more atomic
> --------------------------------------------
>
>                 Key: HIVE-3106
>                 URL: https://issues.apache.org/jira/browse/HIVE-3106
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3106.1.patch.txt
>
>
> Currently, with multi-insert queries, as soon as the output of one of the 
> inserts is ready, the move task associated with that insert is run, creating 
> the table/partition.  However, if concurrency is enabled, the lock on this 
> table/partition is not released until the entire query finishes, which can be 
> much later.
> This causes issues if, for example, a user is waiting for an output of the 
> multi-insert query which is created long before the other outputs, and is 
> checking for its existence using the metastore's Thrift methods 
> (get_table/get_partition).  In that case, the user will run their query 
> which uses the output, and it will time out trying to acquire the 
> lock on the table/partition.
> If all the move tasks depend on the parents of all other move tasks, the 
> output creation will be much closer to atomic, relieving this problem.
