[ https://issues.apache.org/jira/browse/HIVE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011946#comment-14011946 ]
Daniel Dai commented on HIVE-7072:
----------------------------------
+1
> HCatLoader only loads first region of hbase table
> -------------------------------------------------
>
> Key: HIVE-7072
> URL: https://issues.apache.org/jira/browse/HIVE-7072
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-7072.2.patch
>
>
> Pig needs the config parameter 'pig.noSplitCombination' set to 'true' to be
> able to read HBaseStorageHandler-based tables.
> HBaseLoader does this at getSplits time, but HCatLoader does not, which
> results in only a partial data load.
>
> Thus, we need one more special-case definition in HCat that sets this
> parameter in the job properties when we detect that we are loading an
> HBaseStorageHandler-based table. (Note also that we should not depend
> directly on the HBaseStorageHandler class, but on the name of the class:
> we do not want HCatalog core to need a mvn dependency on hive-hbase-handler
> in order to compile, since it is conceivable that at some point there might
> be a reverse dependency.)
>
> The primary issue is where this code should go. It doesn't belong in Pig
> (Pig does not know what loader behaviour should be, and this parameter is
> its interface to a loader), and it doesn't belong in the HBaseStorageHandler
> either, since that implements a HiveStorageHandler and connects the two.
> Thus, it belongs in HCatLoader. Setting this parameter across the board
> results in poor performance for HCatLoader, so it must be set only when
> loading from HBase. It therefore goes in the SpecialCases definition, which
> was created specifically for these kinds of odd cases and can be called from
> within HCatLoader.
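
For readers who want to see the shape of the special case described above, here is a minimal sketch. It is not the code from HIVE-7072.2.patch. The class name SpecialCasesSketch, the method addSpecialCasesForLoader, and the use of java.util.Properties as a stand-in for the job properties are all assumptions made for illustration. Only the parameter name 'pig.noSplitCombination' and the storage handler's fully-qualified class name come from Pig and Hive.

```java
import java.util.Properties;

// Illustrative sketch only; names other than the two string constants are hypothetical.
final class SpecialCasesSketch {

    // Matched by name, so no compile-time dependency on hive-hbase-handler is needed.
    static final String HBASE_STORAGE_HANDLER =
            "org.apache.hadoop.hive.hbase.HBaseStorageHandler";

    static final String PIG_NO_SPLIT_COMBINATION = "pig.noSplitCombination";

    // Hypothetical helper: sets the Pig parameter only for HBase-backed tables,
    // leaving split combination enabled for every other storage handler.
    static void addSpecialCasesForLoader(Properties jobProperties,
                                         String storageHandlerClassName) {
        if (HBASE_STORAGE_HANDLER.equals(storageHandlerClassName)) {
            jobProperties.setProperty(PIG_NO_SPLIT_COMBINATION, "true");
        }
    }

    public static void main(String[] args) {
        Properties hbaseJob = new Properties();
        addSpecialCasesForLoader(hbaseJob, HBASE_STORAGE_HANDLER);
        System.out.println("HBase table: " + hbaseJob.getProperty(PIG_NO_SPLIT_COMBINATION));

        Properties otherJob = new Properties();
        addSpecialCasesForLoader(otherJob, "some.other.StorageHandler"); // placeholder name
        System.out.println("Other table: " + otherJob.getProperty(PIG_NO_SPLIT_COMBINATION));
    }
}
```

Comparing against the class name as a string keeps the check free of any import from hive-hbase-handler, and guarding the assignment with the name check leaves the parameter unset for non-HBase tables, so they keep the default split-combination behaviour.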