[ 
https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988022#action_12988022
 ] 

Ashutosh Chauhan commented on PIG-1828:
---------------------------------------

I don't think we have sufficient evidence yet to blame split combination for 
this bug. In theory, combining multiple TableSplits into one split within Pig 
should not cause any problem, provided the InputFormat honors the semantics 
imposed by the MR framework: each split is stateless, and one TableSplit knows 
nothing about another. I don't know enough about TableSplit, but I would 
assume they are indeed stateless. 
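
To illustrate the statelessness argument, here is a toy model in Python. It is 
only a sketch: RangeSplit, read_split and read_combined are hypothetical names 
and do not correspond to Pig or HBase classes. It assumes each split is a 
half-open key range and that reading a split depends on nothing but that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSplit:
    """Hypothetical stand-in for a TableSplit: a half-open key range."""
    start: int
    stop: int


def read_split(table, split):
    # Reading depends only on the split's own range, never on other splits.
    return [(k, v) for k, v in sorted(table.items())
            if split.start <= k < split.stop]


def read_combined(table, splits):
    # A combined split simply reads its member splits one after another.
    rows = []
    for s in splits:
        rows.extend(read_split(table, s))
    return rows


table = {k: f"row{k}" for k in range(10)}
splits = [RangeSplit(0, 4), RangeSplit(4, 7), RangeSplit(7, 10)]

# One "mapper" per split versus two "mappers" over combined splits.
one_mapper_per_split = [row for s in splits for row in read_split(table, s)]
combined = read_combined(table, splits[:2]) + read_combined(table, splits[2:])
```

Under these assumptions both strategies return the same rows, so a missing-row 
bug would have to come from somewhere other than the act of combining.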

OrderedLoadFunc adds a restriction by defining an order on splits: it dictates 
that all keys in one split are smaller than all keys in the next. Thus, ideally 
Pig should *not* combine splits for loaders that implement it. But for reasons 
discussed in PIG-1518, it was eventually decided that, for the feature to be 
useful, Pig would skip combination for OrderedLoadFunc loaders *only* if the 
loader is also used for a MergeJoin or map-side cogroup in the script. So 
adding OLF won't turn off combination in all cases. If you suspect combination 
is causing the bug (potentially because TableSplits are stateful w.r.t. each 
other), then only setting pig.splitCombination to false will guarantee no 
combination. But I doubt that TableSplits have state and that split 
combination is causing the bug. Ian, Lukas, can you confirm whether setting 
pig.splitCombination to false makes the bug go away? 
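
The rule described above can be written down as a small decision function. 
This is a sketch of the behavior as stated in this comment, not Pig's actual 
code; the function and parameter names are made up for illustration.

```python
def will_combine_splits(split_combination_enabled,
                        loader_is_ordered,
                        used_in_merge_join_or_cogroup):
    """Model of when splits get combined, per the description in this comment."""
    # pig.splitCombination=false switches combination off unconditionally.
    if not split_combination_enabled:
        return False
    # An OrderedLoadFunc loader is exempt only when the script also uses it
    # for a merge join or a map-side cogroup.
    if loader_is_ordered and used_in_merge_join_or_cogroup:
        return False
    return True
```

In this model a plain LOAD through an OrderedLoadFunc loader still gets its 
splits combined, which is why implementing OLF alone is not a reliable way to 
rule combination in or out.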


> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
>                 Key: PIG-1828
>                 URL: https://issues.apache.org/jira/browse/PIG-1828
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
>            Reporter: Lukas
>
> As brought up on the Pig user mailing list 
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig 
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just 
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up in the mailing list, attached a script 
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case the problem occurred only after the table was split 
> into more than one region.

