getPartitionDescFromPath failing from CombineHiveInputFormat
------------------------------------------------------------
Key: HIVE-1006
URL: https://issues.apache.org/jira/browse/HIVE-1006
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.4.1
Reporter: Dave Lerman

When HiveInputFormat.getPartitionDescFromPath is called from
CombineHiveInputFormat, it sometimes fails to return a matching partitionDesc,
which later causes an exception because the split has no
inputFormatClassName.

The issue is that the path format used as the key in pathToPartitionInfo varies
between stages. In the first stage it's the complete path as returned from the
table definitions (e.g. hdfs://server/path); in subsequent stages it's the
complete path with port (e.g. hdfs://server:8020/path) of the result of the
previous stage. This isn't a problem in HiveInputFormat, since the directory
you're looking up always uses the same format as the keys. In
CombineHiveInputFormat, however, we take that path and look up its children in
the file system to get all the block information, and then use one of the
returned paths to get the partition info, and that returned path does not
include the port. So, in any stage after the first, we are looking for a path
without the port, but all the keys in the map contain a port, so we don't find
a match.
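
The mismatch can be reproduced outside Hive with a plain map keyed by path
strings. The following is only an illustrative sketch: the class name, the
helper method and the String values standing in for partitionDesc objects are
made up for this example, and only the two path forms come from the
description above.

```java
import java.util.HashMap;
import java.util.Map;

class PathKeyMismatch {
    // Hypothetical stand-in for pathToPartitionInfo in a later stage:
    // every key carries the port.
    static Map<String, String> laterStageKeys() {
        Map<String, String> pathToPartitionInfo = new HashMap<String, String>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc for /path");
        return pathToPartitionInfo;
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = laterStageKeys();

        // Path form without the port, as described for the combined-split case.
        String lookupPath = "hdfs://server/path";

        // The hash lookup compares the strings exactly, so this misses.
        System.out.println(pathToPartitionInfo.get(lookupPath));                // null
        // The same directory written with the port is found.
        System.out.println(pathToPartitionInfo.get("hdfs://server:8020/path")); // partitionDesc for /path
    }
}
```

Both strings name the same directory, but as map keys they are unrelated,
which is why the lookup only breaks once the keys start carrying a port.
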
The attached patch may not be ideal: it doesn't fix the underlying problem of
inconsistent path formats in pathToPartitionInfo. It just works around it by
walking through the map and looking for a matching path rather than doing a
hash lookup.
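
A workaround of this kind might look roughly like the sketch below. This is
not the attached patch: the class and method names are hypothetical, String
values stand in for partitionDesc objects, and the matching rule (same scheme,
host and path, port ignored) is an assumption chosen for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class PartitionLookupSketch {
    // Null-safe equality, since a URI component may be absent.
    static boolean same(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }

    // Assumed matching rule: scheme, host and path must agree; the port is ignored.
    static boolean samePathIgnoringPort(String left, String right) {
        URI l = URI.create(left);
        URI r = URI.create(right);
        return same(l.getScheme(), r.getScheme())
            && same(l.getHost(), r.getHost())
            && same(l.getPath(), r.getPath());
    }

    // Walks every entry instead of hashing on the exact key string.
    static <V> V findByPath(Map<String, V> pathToPartitionInfo, String dir) {
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            if (samePathIgnoringPort(entry.getKey(), dir)) {
                return entry.getValue();
            }
        }
        return null; // no key refers to this directory
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<String, String>();
        pathToPartitionInfo.put("hdfs://server:8020/path", "partitionDesc for /path");

        // The port-less path now finds the entry that get() would miss.
        System.out.println(findByPath(pathToPartitionInfo, "hdfs://server/path"));
    }
}
```

The scan costs time linear in the number of map entries per lookup, which is
the price of leaving the key formats inconsistent.
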