[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2017-08-19 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134270#comment-16134270
 ] 

Mariappan Asokan commented on HIVE-6476:


Sushanth,
Here are the changes I made in the uploaded patch:
* Moved some common code to new private methods ({{moveFiles()}} and 
{{isReservedName()}})
* When dynamic partitioning is used, the table can be mutable
* For dynamic partitioning, when a partition does not exist, the optimization 
of moving the entire directory to the target location is still in effect.  
However, when a partition already exists, the newly added files are moved into 
the existing target directory one at a time (see the sketch after this list)
* The unique name generation logic is applied to only non-directory files
* In addition to deleting the newly created partitions in the metadata server, 
all newly added files will be deleted when a commit fails
* Fixed a minor bug: when there is no new file to move ({{firstChild}} == 
{{null}} in {{moveTaskOutputs()}}), no action is taken, which avoids a null 
pointer dereference.
* Created tests that cover the different cases: appending new records such that 
all records go to existing partitions, some records go to existing partitions 
and others to new partitions, and all records go to new partitions
* Deleted an existing test that tested the failure of dynamic partitioning and 
append
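
To make the per-partition move concrete, the idea is roughly the following.  
This is only a minimal sketch; the method and variable names are illustrative, 
not the exact code in the patch:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionMoveSketch {
  /**
   * Move the output of one dynamic partition into its final location.
   * If the partition directory does not exist yet, the whole source
   * directory is renamed in one shot; otherwise each file is moved
   * individually so that the existing data is preserved (append).
   */
  static void movePartitionOutput(FileSystem fs, Path src, Path dest)
      throws IOException {
    if (!fs.exists(dest)) {
      // New partition: keep the cheap whole-directory rename.
      if (!fs.rename(src, dest)) {
        throw new IOException("Failed to move " + src + " to " + dest);
      }
      return;
    }
    // Existing partition: move the newly produced files one at a time.
    for (FileStatus stat : fs.listStatus(src)) {
      if (stat.isDirectory()) {
        continue; // only non-directory files get the per-file treatment
      }
      Path target = new Path(dest, stat.getPath().getName());
      if (!fs.rename(stat.getPath(), target)) {
        throw new IOException("Failed to move " + stat.getPath() + " to " + target);
      }
    }
  }
}
{code}

The whole-directory rename keeps the cheap path for brand-new partitions; only 
the append-to-an-existing-partition case pays the per-file cost.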

Please provide your feedback.  Thanks.

> Support Append with Dynamic Partitioning
> 
>
> Key: HIVE-6476
> URL: https://issues.apache.org/jira/browse/HIVE-6476
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore, Query Processor, Thrift API
>Reporter: Sushanth Sowmyan
>Assignee: Mariappan Asokan
> Attachments: HIVE-6476.1.patch
>
>
> Currently, we do not support mixing dynamic partitioning and append in the 
> same job. One reason is that we need exhaustive testing of corner cases for 
> that, and a second reason is the behaviour of add_partitions. To support 
> dynamic partitioning with append, we'd have to have an 
> add_partitions_if_not_exist call, rather than an add_partitions call.
> Thus, the current implementation in HIVE-6475 assumes immutability for all 
> dynamic partitioning jobs, irrespective of whether the table is marked as 
> mutable.





[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2016-04-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252812#comment-15252812
 ] 

Sushanth Sowmyan commented on HIVE-6476:


Yup, that would be the perfect place to start. And yes, I'll assign this jira 
to you. Thanks for working on it! :)



[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2016-04-20 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250277#comment-15250277
 ] 

Mariappan Asokan commented on HIVE-6476:


Sushanth, thank you.  This is very helpful.  I will dig into the code, and if I 
have any questions I will let you know.  Is FileOutputCommitterContainer.java a 
good place to start?  Can I assign this Jira to myself?





[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2016-04-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248800#comment-15248800
 ] 

Sushanth Sowmyan commented on HIVE-6476:


Oh dear. I'm not certain I am able to list out all the corner cases that I 
considered back when implementing the append functionality. However, from the 
code, I do see one issue that I think is the primary blocker. If that can be 
resolved, I think we can move forward on this.

Consider the following dynamic-partition write.

Say our current write spills to partitions p=3, p=4 & p=5. Before the write 
happened, say the table had data for partitions p=1, p=2 & p=3. So our new 
write does not affect p=1 & p=2, does an "append" for p=3, and creates new 
partitions for p=4 & p=5.

If we never had to consider p=3 in this case, HCat would make the 
add_partitions call on the OutputCommitter side as a single atomic call to the 
metastore, to prevent complicated rollback logic with the MS, and only if the 
add_partitions call succeeds would it proceed to move data into the appropriate 
directories. Now that p=3 is also in the picture, add_partitions will fail 
since p=3 already exists.

Thus, we needed to support an additional add_partitions_if_not_exist 
behaviour, which didn't exist then (but does now), and we weren't sure it was 
the right call to add (but now, that is moot).
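
To make that concrete, the commit step effectively needs something like the 
following.  This is only a rough sketch; {{MetastoreClient}} below is a 
stand-in interface, not the real Thrift/metastore API:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class AddPartitionsSketch {
  /** Stand-in for the metastore calls used here; not the real Thrift API. */
  interface MetastoreClient {
    boolean partitionExists(String table, List<String> partValues);
    void addPartitions(String table, List<List<String>> newPartValues);
  }

  /**
   * Split the partitions touched by the job (e.g. p=3, p=4, p=5) into
   * those that already exist (p=3) and those that do not (p=4, p=5),
   * and register only the latter. This is the effect that an
   * add_partitions_if_not_exist call gives in one atomic step.
   */
  static List<List<String>> registerNewPartitions(MetastoreClient client,
      String table, List<List<String>> touched) {
    List<List<String>> toAdd = new ArrayList<>();
    for (List<String> partValues : touched) {
      if (!client.partitionExists(table, partValues)) {
        toAdd.add(partValues);
      }
    }
    if (!toAdd.isEmpty()) {
      client.addPartitions(table, toAdd); // single call keeps rollback simple
    }
    return toAdd; // remember what was added so a failed commit can drop them
  }
}
{code}

Only after this metadata step succeeds would the file movement begin.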

The next aspect to consider is this:

In the case of a non-dynamic-ptn append, we try to move the new append files in 
with _N-style suffixes if files of the same name already exist, and do so till 
we hit a configurable maximum. If we fail in this file-movement phase, or 
determine we can't copy any of the items needed, append can roll back trivially 
because there is no metadata change to roll back: no metadata update is needed, 
and this part can't fail because of some other partition or because some other 
piece of metadata couldn't be updated.
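
Roughly, that unique-name selection looks like this (a sketch only; the exact 
suffix format and the way the attempt limit is configured are illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UniqueNameSketch {
  /**
   * Pick a destination path that does not collide with an existing file,
   * appending _1, _2, ... up to a configurable limit. For example,
   * "part-r-00000" becomes "part-r-00000_1" on the first collision.
   */
  static Path chooseAppendTarget(FileSystem fs, Path destDir, String fileName,
      int maxAppendAttempts) throws IOException {
    Path candidate = new Path(destDir, fileName);
    for (int n = 1; fs.exists(candidate); n++) {
      if (n > maxAppendAttempts) {
        throw new IOException("Exceeded " + maxAppendAttempts
            + " attempts to find a free name for " + fileName + " in " + destDir);
      }
      candidate = new Path(destDir, fileName + "_" + n);
    }
    return candidate;
  }
}
{code}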

If we allow dynamic-ptn appends, then whenever a rollback is needed, whether it 
involves a metadata rollback or an fs rollback, we need to make sure that the 
resulting state is compatible with how it was before the operation started.
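
In other words, a rollback in the mixed case has to undo both kinds of state.  
Something along these lines (again only a sketch, with {{MetastoreClient}} as a 
stand-in and the two bookkeeping lists assumed to be tracked during the commit):

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollbackSketch {
  /** Stand-in for dropping a partition in the metastore; not the real API. */
  interface MetastoreClient {
    void dropPartition(String table, List<String> partValues);
  }

  /**
   * Undo a failed commit: files that were moved into existing partitions
   * are deleted, and partitions that were newly registered are dropped,
   * so the table looks exactly as it did before the job started.
   */
  static void rollback(FileSystem fs, MetastoreClient client, String table,
      List<Path> movedFiles, List<List<String>> addedPartitions)
      throws IOException {
    for (Path moved : movedFiles) {
      fs.delete(moved, false); // remove only the files this job added
    }
    for (List<String> partValues : addedPartitions) {
      client.dropPartition(table, partValues); // metadata back to pre-job state
    }
  }
}
{code}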

While these are gotchas, they are not insurmountable, and as long as we are 
able to plan out how we handle the mix, this should be doable.



[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2016-04-15 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243005#comment-15243005
 ] 

Mariappan Asokan commented on HIVE-6476:


I have the same question: what are the corner cases that need to be tested?  
Dynamic partitioning with append is a very common use case.  Sushanth, if you 
can elaborate on the "corner cases" and give some pointers, I can pick up this 
Jira and work on it.  Thanks.

