[ 
https://issues.apache.org/jira/browse/HIVE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-7072:
-----------------------------------

    Status: Patch Available  (was: Open)

> HCatLoader only loads first region of hbase table
> -------------------------------------------------
>
>                 Key: HIVE-7072
>                 URL: https://issues.apache.org/jira/browse/HIVE-7072
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-7072.2.patch
>
>
> Pig needs the config parameter 'pig.noSplitCombination' set to 'true' to be
> able to read HBaseStorageHandler-based tables.
> HBaseLoader sets this at getSplits time, but HCatLoader does not, which
> results in only a partial data load.
> We therefore need one more special-case definition in HCat that sets this
> parameter in the job properties when we detect that we are loading an
> HBaseStorageHandler-based table. The primary issue is where this code
> should go. It doesn't belong in Pig (Pig does not know what loader
> behaviour should be, and this parameter is its interface to a loader), and
> it doesn't belong in the HBaseStorageHandler either, since that implements a
> HiveStorageHandler and connects up the two. So it should belong to
> HCatLoader. Setting this parameter across the board results in poor
> performance for HCatLoader, so it must be set only when loading from HBase.
> It therefore belongs in the SpecialCases definition, which was created
> specifically for these kinds of odd cases and can be called from within
> HCatLoader.
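
The conditional described above can be sketched roughly as follows. This is
only an illustration, not the contents of the attached patch: the class name,
method name, and the use of java.util.Properties are hypothetical, and the
storage handler class name is an assumption. Only the property name
'pig.noSplitCombination' and the value 'true' come from the description.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual HCatalog SpecialCases class.
class HBaseSplitSpecialCase {
    // Pig property named in the issue description.
    static final String PIG_NO_SPLIT_COMBINATION = "pig.noSplitCombination";

    // Assumed fully qualified name of the HBase storage handler.
    static final String HBASE_STORAGE_HANDLER =
            "org.apache.hadoop.hive.hbase.HBaseStorageHandler";

    // Sets the Pig property only when the table is backed by the HBase
    // storage handler; all other tables keep split combination enabled.
    static void addSpecialCaseProperties(Properties jobProperties,
                                         String storageHandlerClass) {
        if (HBASE_STORAGE_HANDLER.equals(storageHandlerClass)) {
            jobProperties.setProperty(PIG_NO_SPLIT_COMBINATION, "true");
        }
    }

    public static void main(String[] args) {
        Properties hbaseJob = new Properties();
        addSpecialCaseProperties(hbaseJob, HBASE_STORAGE_HANDLER);
        System.out.println(hbaseJob.getProperty(PIG_NO_SPLIT_COMBINATION));

        Properties otherJob = new Properties();
        addSpecialCaseProperties(otherJob, null);
        System.out.println(otherJob.getProperty(PIG_NO_SPLIT_COMBINATION));
    }
}
```

Keeping the check in one small method means the loader can call it
unconditionally, and non-HBase tables are left untouched, which avoids the
across-the-board performance cost mentioned in the description.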



--
This message was sent by Atlassian JIRA
(v6.2#6252)
