[ https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248800#comment-15248800 ]

Sushanth Sowmyan commented on HIVE-6476:
----------------------------------------

Oh dear. I'm not certain I can list out all the corner cases that I
considered back when implementing the append functionality. However, from the
code, I do see one issue that I think is the primary blocker for this. If it
can be resolved, I think we can move forward on this.

Consider the following dynamic-partition write.

Say our current write goes to partitions p=3, p=4 and p=5, and before the write
happened, the table already had data for partitions p=1, p=2 and p=3. Our new
write then does not affect p=1 and p=2, does an "append" to p=3, and creates
new partitions p=4 and p=5.

If p=3 were not in the picture (that is, if the write only created new
partitions), HCat would make the add_partitions call from the OutputCommitter
as a single atomic call to the metastore, which avoids complicated rollback
logic against the metastore, and only if add_partitions succeeds would it
proceed to move the data into the appropriate directories. With p=3 in the
picture, add_partitions fails because p=3 already exists.

Thus, we needed to support an additional add_partitions_if_not_exist
behaviour, which didn't exist then (but does now), and we weren't sure whether
it was the right call to add it (that question is now moot).
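To make the add_partitions problem concrete, here is a toy in-memory model of
the two behaviours for the p=1..p=5 example above. It is only a sketch, not
HCatalog or metastore code: ToyMetastore, PartitionExistsError and both method
bodies are hypothetical. The all-or-nothing add_partitions rejects the whole
batch because p=3 exists, while add_partitions_if_not_exist skips p=3 and adds
only the missing partitions.

```python
class PartitionExistsError(Exception):
    """Raised by the toy add_partitions when any requested partition exists."""


class ToyMetastore:
    """Hypothetical in-memory stand-in for the metastore; not the Hive API."""

    def __init__(self, existing):
        self.partitions = set(existing)

    def add_partitions(self, specs):
        # All-or-nothing: one existing partition rejects the whole batch,
        # so nothing needs to be rolled back on failure.
        clashes = sorted(set(specs) & self.partitions)
        if clashes:
            raise PartitionExistsError(f"already exist: {clashes}")
        self.partitions.update(specs)
        return sorted(specs)

    def add_partitions_if_not_exist(self, specs):
        # Skip partitions that already exist and add only the missing ones.
        missing = sorted(set(specs) - self.partitions)
        self.partitions.update(missing)
        return missing


ms = ToyMetastore({"p=1", "p=2", "p=3"})
try:
    ms.add_partitions(["p=3", "p=4", "p=5"])
except PartitionExistsError as err:
    print(err)  # already exist: ['p=3']
print(sorted(ms.partitions))  # ['p=1', 'p=2', 'p=3']
print(ms.add_partitions_if_not_exist(["p=3", "p=4", "p=5"]))  # ['p=4', 'p=5']
```

Note that the failed add_partitions call leaves the toy metastore untouched,
which is exactly the property the atomic call is relied on for.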

The next aspect to consider is this:

In the case of a non-dynamic-partition append, we move the new files in, adding
_N suffixes when files of the same name already exist, up to a configurable
maximum. If this file-movement phase fails, or we determine that we can't copy
one or more of the files needed, the append can roll back trivially because
there is no metadata change to undo. No metadata update is needed, so this
phase cannot fail because of some other partition, or because some other
metadata that needed updating couldn't be updated.
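A minimal sketch of the non-dynamic append file-naming step described above.
It assumes the _N suffix is inserted before any file extension and uses a
max_suffix parameter to stand in for the configurable limit; the placement and
the names pick_append_name and plan_append_moves are hypothetical, not
HCatalog's actual code. All destinations are planned before anything moves, so
running out of names fails with nothing to roll back.

```python
import os


def pick_append_name(taken, filename, max_suffix):
    """Return filename if free, else the first free filename_N (N = 1..max_suffix).

    Hypothetical: the suffix goes before the extension, e.g. a.txt -> a_1.txt.
    """
    if filename not in taken:
        return filename
    stem, ext = os.path.splitext(filename)
    for n in range(1, max_suffix + 1):
        candidate = f"{stem}_{n}{ext}"
        if candidate not in taken:
            return candidate
    raise FileExistsError(f"no free name for {filename} within {max_suffix} suffixes")


def plan_append_moves(existing_names, new_files, max_suffix):
    """Map each new file to a destination name; raises before any file is moved."""
    taken = set(existing_names)
    plan = {}
    for name in new_files:
        dest = pick_append_name(taken, name, max_suffix)
        taken.add(dest)  # later files in this batch must not reuse dest
        plan[name] = dest
    return plan


print(plan_append_moves({"part-m-00000", "part-m-00000_1"},
                        ["part-m-00000", "part-m-00001"], max_suffix=3))
# {'part-m-00000': 'part-m-00000_2', 'part-m-00001': 'part-m-00001'}
```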

If we allow dynamic-partition appends and have to roll back for any reason,
then whether the rollback involves the metadata, the filesystem, or both, we
need to make sure the resulting state matches the state before the operation
started.
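One way to keep the mixed case restorable is an undo log: record an inverse
action for every metadata or filesystem change as it succeeds, and replay them
in reverse order on failure. This is a toy model only: a set stands in for the
metastore, a dict of sets stands in for partition directories, and UndoLog,
commit_dynamic_append and fail_on are hypothetical names. It assumes file names
were already de-conflicted (for example by _N suffixing) and does not address a
crash in the middle of a rollback.

```python
import copy


class UndoLog:
    """Records inverse actions; rollback() runs them newest-first."""

    def __init__(self):
        self._undo = []

    def record(self, description, undo_fn):
        self._undo.append((description, undo_fn))

    def rollback(self):
        failures = []
        while self._undo:
            description, undo_fn = self._undo.pop()
            try:
                undo_fn()
            except Exception as exc:  # keep undoing the rest, report at the end
                failures.append((description, exc))
        return failures


def commit_dynamic_append(metastore, fs, writes, fail_on=None):
    """Toy commit. metastore: set of partition names; fs: partition -> set of files;
    writes: partition -> list of (already de-conflicted) file names;
    fail_on: hypothetical hook that simulates an I/O error for that file."""
    log = UndoLog()
    try:
        # 1. Metadata: add only the partitions that don't exist yet (p=3 is skipped).
        for part in writes:
            if part not in metastore:
                metastore.add(part)
                log.record(f"drop partition {part}", lambda p=part: metastore.discard(p))
        # 2. Filesystem: create missing partition dirs, then move files in.
        for part, names in writes.items():
            if part not in fs:
                fs[part] = set()
                log.record(f"remove dir {part}", lambda p=part: fs.pop(p))
            for name in names:
                if name == fail_on:
                    raise OSError(f"simulated failure moving {part}/{name}")
                if name in fs[part]:
                    raise FileExistsError(f"{part}/{name}")
                fs[part].add(name)
                log.record(f"delete {part}/{name}", lambda p=part, n=name: fs[p].discard(n))
    except Exception:
        failures = log.rollback()
        if failures:
            raise RuntimeError(f"rollback incomplete, manual repair needed: {failures}")
        raise  # rollback succeeded: re-raise the original error


metastore = {"p=1", "p=2", "p=3"}
fs = {"p=1": {"a"}, "p=2": {"b"}, "p=3": {"c"}}
before = (set(metastore), copy.deepcopy(fs))
try:
    commit_dynamic_append(metastore, fs, {"p=3": ["c_1"], "p=4": ["d"], "p=5": ["e"]},
                          fail_on="e")
except OSError as err:
    print("rolled back after:", err)
print((metastore, fs) == before)  # True
```

The point of the ordering is that undo actions for the p=3 append and for the
new p=4/p=5 partitions all go through the same log, so a failure anywhere
restores both the metadata and the files.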

While these are gotchas, they are not insurmountable, and as long as we plan
out how to handle the mix of new and existing partitions, this should be doable.

> Support Append with Dynamic Partitioning
> ----------------------------------------
>
>                 Key: HIVE-6476
>                 URL: https://issues.apache.org/jira/browse/HIVE-6476
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HCatalog, Metastore, Query Processor, Thrift API
>            Reporter: Sushanth Sowmyan
>
> Currently, we do not support mixing dynamic partitioning and append in the 
> same job. One reason is that we need exhaustive testing of corner cases for 
> that, and a second reason is the behaviour of add_partitions. To support 
> dynamic partitioning with append, we'd have to have an
> add_partitions_if_not_exist call, rather than an add_partitions call.
> Thus, the current implementation in HIVE-6475 assumes immutability for all
> dynamic partitioning jobs, irrespective of whether the table is marked
> as mutable or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
