[ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731226#comment-13731226
 ] 

Sushanth Sowmyan commented on HIVE-5011:
----------------------------------------

Basic bug synopsis:

Say a table t1 partitioned by key dayofweek:string is present in location 
"hdfs://blah/foo/t1/".

Ordinarily, if we try to write to it specifying that we're writing a partition 
dayofweek="sunday", then the location it'll write to is 
"hdfs://blah/foo/t1/dayofweek=sunday/".
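
A minimal sketch of that default layout, for illustration only. The helper name `default_partition_location` is hypothetical and is not an HCatalog API, and the sketch ignores any escaping of special characters in partition values:

```python
def default_partition_location(table_location, partition_spec):
    """Build a hive-style partition path: <table>/<key>=<value>/ per key.

    Hypothetical helper for illustration; not part of HCatalog.
    """
    base = table_location.rstrip("/")
    # One "key=value" directory level per partition key, in the given order.
    parts = [f"{key}={value}" for key, value in partition_spec.items()]
    return "/".join([base] + parts) + "/"


print(default_partition_location("hdfs://blah/foo/t1/", {"dayofweek": "sunday"}))
# prints hdfs://blah/foo/t1/dayofweek=sunday/
```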

This location is known before the MR jobs start and is set as the output 
location, so all is good. If the table is an external table and the user wants 
a custom location for the partition, say "hdfs://blah/foo/t1/sunday/", then 
HCatStorer currently allows them to specify that, and it is honoured too.

That was the intent of HCATALOG-500, and the way it works for static 
partitioning.
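
The static case can be sketched roughly as follows. This is an assumption-laden model, not HCatalog code: the function name `static_partition_location` and its `custom_location` parameter are hypothetical stand-ins for whatever HCatStorer passes through.

```python
def static_partition_location(table_location, partition_spec, custom_location=None):
    """Resolve the output location for a static partition write.

    Hypothetical model: every partition value is known up front, so either
    the user-supplied location or the hive-style default can be fixed
    before the job starts.
    """
    if custom_location is not None:
        # User override for an external table, e.g. ".../t1/sunday/".
        return custom_location
    base = table_location.rstrip("/")
    parts = [f"{key}={value}" for key, value in partition_spec.items()]
    return "/".join([base] + parts) + "/"


spec = {"dayofweek": "sunday"}
print(static_partition_location("hdfs://blah/foo/t1/", spec))
print(static_partition_location("hdfs://blah/foo/t1/", spec,
                                custom_location="hdfs://blah/foo/t1/sunday/"))
```

Because exactly one partition is written, a single fixed location is enough in either branch.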

With dynamic partitioning on external tables, however, this is what winds up 
happening after HCATALOG-500.

All the partitions being written to wind up having their location set to 
"hdfs://blah/foo/t1/dayofweek=__DEFAULT_HIVE_PARTITION__" if no override is 
provided, or to "hdfs://blah/foo/t1/whatever" if that location was provided as 
an override.

As a result, the first drone to write a partition writes to this location, and 
all other drones are unable to open it for writing: they stall, get retried, 
and the job fails. In theory, if the job had only one reducer and all the data 
fell into a single partition, the job might not fail, but that is such a highly 
constrained mode of writing that it makes the dynamic partitioning feature 
itself meaningless.
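
To make the collision concrete, here is a small model of the symptom. It is not the actual HCatalog code path; `buggy_dynamic_location` is a hypothetical function that only mimics the behaviour described above, where the location is fixed before any partition value is known:

```python
DEFAULT_PARTITION_NAME = "__DEFAULT_HIVE_PARTITION__"


def buggy_dynamic_location(table_location, partition_key, custom_location=None):
    """Model of the broken behaviour: one location for every dynamic partition.

    Hypothetical illustration; the partition value is not an input because
    it is not known when the location is chosen.
    """
    if custom_location is not None:
        return custom_location
    return f"{table_location.rstrip('/')}/{partition_key}={DEFAULT_PARTITION_NAME}"


values = ["sunday", "monday", "tuesday"]
locations = {v: buggy_dynamic_location("hdfs://blah/foo/t1/", "dayofweek")
             for v in values}
distinct = set(locations.values())
print(len(values), "partitions ->", len(distinct), "distinct location")
# prints 3 partitions -> 1 distinct location
```

Every partition value maps to the same path, which is why only the first writer succeeds.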
                
> Dynamic partitioning in HCatalog broken on external tables
> ----------------------------------------------------------
>
>                 Key: HIVE-5011
>                 URL: https://issues.apache.org/jira/browse/HIVE-5011
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Dynamic partitioning with HCatalog has been broken as a result of 
> HCATALOG-500 trying to support user-set paths for external tables.
> The goal there was to be able to support other custom destinations apart from 
> the normal "hive-style" partitions. However, it is not currently possible for 
> users to set paths for dynamic ptn writes, since we don't support any way for 
> users to specify "patterns" (like, say, "${rootdir}/$v1.$v2/") into which 
> writes happen, only "locations", and the values for dyn. partitions are not 
> known ahead of time. Also, specifying a custom path messes with the way 
> dynamic ptn. code tries to determine what was written to where from the 
> output committer, which means that even if we supported patterned-writes 
> instead of location-writes, we still have to do some more deep diving into 
> the output committer code to support it.
> Thus, my current proposal is that we honour writes to user-specified paths 
> for external tables *ONLY* for static partition writes - i.e., if we can 
> determine that the write is a dyn. ptn. write, we will ignore the user 
> specification. (Note that this does not mean we ignore the table's external 
> location - we honour that - we just don't honour any HCatStorer/etc provided 
> additional location - we stick to what metadata tells us the root location is.)
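
The proposed rule could be sketched along these lines. All names here (`resolve_output_location`, `static_values`, `custom_location`) are hypothetical, and the sketch assumes a write counts as dynamic whenever at least one partition key has no value specified up front:

```python
def resolve_output_location(table_location, partition_keys, static_values,
                            custom_location=None):
    """Pick the output location under the proposed rule.

    Hypothetical sketch, not HCatalog code. A user-specified location is
    honoured only for fully static partition writes; for dynamic writes the
    table's own (possibly external) root location from metadata is used.
    """
    root = table_location.rstrip("/")
    # Assumption: any partition key without a value makes the write dynamic.
    is_dynamic = any(key not in static_values for key in partition_keys)
    if is_dynamic:
        # Ignore the user override; partition directories are decided later.
        return root + "/"
    if custom_location is not None:
        return custom_location
    parts = [f"{key}={static_values[key]}" for key in partition_keys]
    return "/".join([root] + parts) + "/"


keys = ["dayofweek"]
custom = "hdfs://blah/foo/t1/sunday/"
print(resolve_output_location("hdfs://blah/foo/t1/", keys, {"dayofweek": "sunday"}, custom))
print(resolve_output_location("hdfs://blah/foo/t1/", keys, {}, custom))
```

In the second call the override is dropped because no value for `dayofweek` is known ahead of time, so the write falls back to the table's root location.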
