[ https://issues.apache.org/jira/browse/HIVE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051947#comment-14051947 ]
Sushanth Sowmyan commented on HIVE-7072:
----------------------------------------

[~daijy], could you please review/commit the latest version of this patch?

> HCatLoader only loads first region of hbase table
> -------------------------------------------------
>
>                 Key: HIVE-7072
>                 URL: https://issues.apache.org/jira/browse/HIVE-7072
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-7072.2.patch, HIVE-7072.3.patch
>
>
> Pig needs the config parameter 'pig.noSplitCombination' set to 'true' to be
> able to read HBaseStorageHandler-based tables. HBaseLoader sets it at
> getSplits time, but HCatLoader does not, which results in only a partial
> data load.
>
> Thus, we need one more special-case definition in HCat that sets this
> parameter in the job properties when we detect that we are loading an
> HBaseStorageHandler-based table. (Note also that we should not depend
> directly on the HBaseStorageHandler class, and should instead depend on the
> name of the class: we do not want to need a Maven dependency on
> hive-hbase-handler to compile HCatalog core, since it is conceivable that
> there might at some point be a reverse dependency.)
>
> The primary issue is where this code should go. It does not belong in Pig
> (Pig does not know what loader behaviour should be, and this parameter is
> its interface to a loader), and it does not belong in the
> HBaseStorageHandler either, since that implements a HiveStorageHandler and
> connects the two. Thus, it should belong to HCatLoader.
>
> Setting this parameter across the board results in poor performance for
> HCatLoader, so it must be set only when HCatLoader is used with HBase.
> Thus, it belongs in the SpecialCases definition, which was created
> specifically for these kinds of odd cases and can be called from within
> HCatLoader.
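
For illustration, the special case described above could be sketched roughly as follows. This is not the code from the attached patches. The class name HBaseSplitSpecialCase, the method name disableSplitCombinationForHBase, and the use of a plain Map for the job properties are assumptions made for this sketch. Only the parameter name 'pig.noSplitCombination' and the idea of matching the storage handler by class name come from the description.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the special case described in this issue.
 * Class and method names are illustrative, not taken from the patch.
 */
final class HBaseSplitSpecialCase {

    // Pig parameter named in the issue description.
    static final String PIG_NO_SPLIT_COMBINATION = "pig.noSplitCombination";

    // The handler is referenced by name only, so this code compiles
    // without hive-hbase-handler on the classpath.
    static final String HBASE_STORAGE_HANDLER_CLASS =
            "org.apache.hadoop.hive.hbase.HBaseStorageHandler";

    private HBaseSplitSpecialCase() {
    }

    /**
     * Sets pig.noSplitCombination=true only when the table's storage
     * handler is the HBase one. Other tables are left untouched, so
     * split combination stays enabled for them.
     */
    static void disableSplitCombinationForHBase(
            String storageHandlerClassName, Map<String, String> jobProperties) {
        // Constant-first equals() is also safe when the table has no
        // storage handler (null class name).
        if (HBASE_STORAGE_HANDLER_CLASS.equals(storageHandlerClassName)) {
            jobProperties.put(PIG_NO_SPLIT_COMBINATION, "true");
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobProperties = new HashMap<>();
        disableSplitCombinationForHBase(HBASE_STORAGE_HANDLER_CLASS, jobProperties);
        System.out.println(jobProperties);
    }
}
```

Comparing class names as strings means a subclass of the HBase handler would not be detected. That is the cost of avoiding the compile-time dependency discussed above.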