[ https://issues.apache.org/jira/browse/HIVE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sushanth Sowmyan updated HIVE-7072:
-----------------------------------
    Status: Patch Available  (was: Open)

> HCatLoader only loads first region of hbase table
> -------------------------------------------------
>
>                 Key: HIVE-7072
>                 URL: https://issues.apache.org/jira/browse/HIVE-7072
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-7072.2.patch
>
>
> Pig needs the config parameter 'pig.noSplitCombination' set to 'true' to be
> able to read HBaseStorageHandler-based tables.
>
> HBaseLoader sets this at getSplits time, but HCatLoader does not, which
> results in only a partial data load.
>
> Thus, we need one more special-case definition in HCat that sets this
> parameter in the job properties when we detect that we are loading an
> HBaseStorageHandler-based table. The primary issue is where this code should
> go. It doesn't belong in Pig (Pig does not know what the loader's behaviour
> should be, and this parameter is its interface to a loader), and it doesn't
> belong in the HBaseStorageHandler either, since that implements a
> HiveStorageHandler and only connects the two. Thus, it should belong to
> HCatLoader. Setting this parameter across the board results in poor
> performance for HCatLoader, so it must only be set when loading from HBase.
> It therefore belongs in the SpecialCases definition, which was created
> specifically for these kinds of odd cases and can be called from within
> HCatLoader.
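
For illustration only, the special case described above could be sketched roughly as follows. This is not the attached patch: the class name, the method name and signature, and the use of a plain java.util.Properties for the job properties are all assumptions made for the example. Only the parameter name 'pig.noSplitCombination', its value 'true', and the restriction to HBaseStorageHandler-based tables come from the description. The fully qualified handler class name used for detection is likewise assumed.

```java
import java.util.Properties;

// Hypothetical sketch; names here are illustrative and not taken from the patch.
class NoSplitCombinationSketch {

    // Parameter name taken from the issue description.
    static final String PIG_NO_SPLIT_COMBINATION = "pig.noSplitCombination";

    // Assumed fully qualified name of the HBase storage handler class.
    static final String HBASE_STORAGE_HANDLER =
        "org.apache.hadoop.hive.hbase.HBaseStorageHandler";

    // Sets the Pig parameter only for HBase-backed tables, because setting it
    // for every table would hurt HCatLoader performance.
    static void addSpecialCaseProperties(Properties jobProperties,
                                         String storageHandlerClass) {
        if (HBASE_STORAGE_HANDLER.equals(storageHandlerClass)) {
            jobProperties.setProperty(PIG_NO_SPLIT_COMBINATION, "true");
        }
    }

    public static void main(String[] args) {
        Properties hbaseJob = new Properties();
        addSpecialCaseProperties(hbaseJob, HBASE_STORAGE_HANDLER);
        System.out.println(hbaseJob.getProperty(PIG_NO_SPLIT_COMBINATION));

        // A table without a storage handler leaves the properties untouched.
        Properties plainJob = new Properties();
        addSpecialCaseProperties(plainJob, null);
        System.out.println(plainJob.containsKey(PIG_NO_SPLIT_COMBINATION));
    }
}
```

Keeping the check in one small method mirrors the reasoning in the description: the loader calls a single hook, and the HBase-specific knowledge stays out of both Pig and the storage handler.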