[ https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988159#action_12988159 ]

Ashutosh Chauhan commented on PIG-1828:
---------------------------------------

Thanks, Lukas, for checking. This indicates that TableSplits are not combinable. Thinking more about it, I think Pig's basic assumption is not correct: that splits can be combined in general, and that only in special cases (which Pig checks itself) we won't combine them. Whether splits can be combined should really be asked of the loader, not assumed. Also, this OLF thing is too complicated. The condition imposed by OLF is one possibility, but I assume there are other scenarios where a loader is not an OLF but is still not combinable. I propose adding a new method to LoadFunc, asking the loader directly, and dropping all the logic that determines whether splits are combinable.

{java}
// By default, splits generated by a loader are considered combinable to preserve current behavior
public boolean isCombinable() {
    return true;
}
{java}

The good thing is that LoadFunc is an abstract class, so this won't break backward compatibility.

@Dmitriy, as I pointed out above, adding OLF to HBaseStorage will not help, though it won't hurt either. A quick fix for the HBaseStorage loader for now is to set the key to false somewhere early. I think setLocation() or setSchema() is one of the first methods called on a LoadFunc, and since the checks for determining combination happen much later, the loader's setting of that key to false will be seen and combination won't happen. That avoids the need to tell users of HBaseStorage to set the key themselves.
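
To make the proposal above concrete, here is a minimal, self-contained sketch of how asking the loader could look. It does not use Pig's real classes: SketchLoadFunc, SketchTextLoader, SketchHBaseLoader, shouldCombine() and planSplits() are hypothetical names invented for illustration, and the grouping logic is deliberately simplified (real split combination would also take split sizes into account). Only the isCombinable() method and its default of true come from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinableSketch {

    /** Hypothetical stand-in for Pig's LoadFunc; only the proposed method is modeled. */
    static abstract class SketchLoadFunc {
        // By default, splits are considered combinable to preserve current behavior.
        public boolean isCombinable() {
            return true;
        }
    }

    /** Hypothetical loader that inherits the default and stays combinable. */
    static class SketchTextLoader extends SketchLoadFunc {
    }

    /** Hypothetical HBase-style loader that opts out: one split per region must survive. */
    static class SketchHBaseLoader extends SketchLoadFunc {
        @Override
        public boolean isCombinable() {
            return false;
        }
    }

    /** Combine only if the user has combination enabled AND the loader agrees. */
    static boolean shouldCombine(SketchLoadFunc loader, boolean combinationEnabled) {
        return combinationEnabled && loader.isCombinable();
    }

    /**
     * Simplified planning step (an assumption, not Pig's algorithm): combinable
     * splits collapse into a single group, otherwise each split keeps its own group.
     */
    static List<List<String>> planSplits(SketchLoadFunc loader,
                                         List<String> splits,
                                         boolean combinationEnabled) {
        List<List<String>> groups = new ArrayList<>();
        if (splits.isEmpty()) {
            return groups;
        }
        if (shouldCombine(loader, combinationEnabled)) {
            groups.add(new ArrayList<>(splits));
        } else {
            for (String split : splits) {
                List<String> single = new ArrayList<>();
                single.add(split);
                groups.add(single);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-1", "region-2", "region-3");
        System.out.println("text loader groups:  "
                + planSplits(new SketchTextLoader(), regions, true).size());
        System.out.println("hbase loader groups: "
                + planSplits(new SketchHBaseLoader(), regions, true).size());
    }
}
```

In this sketch the planner never needs to know which kind of loader it is dealing with, which is the point of the proposal: the special-case checks move out of Pig and into the loader that actually knows whether its splits can be merged.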

> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
>                 Key: PIG-1828
>                 URL: https://issues.apache.org/jira/browse/PIG-1828
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Hadoop 0.20.2, HBase 0.20.6, Distributed mode
>            Reporter: Lukas
>
> As brought up on the Pig user mailing list
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up on the mailing list, attached a script
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split
> into more than one region.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.