[
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899722#action_12899722
]
Dmitriy V. Ryaboy commented on PIG-1205:
----------------------------------------
bq. 1. Is it possible to specify min_row_key and max_row_key in parameters?
Even better than that -- you can specify lt, lte, gt, and gte. It's true that,
as written, splits will be created for the whole table, but the filters will
cause most of those splits to exit immediately. Not creating the splits is on
my todo list (I already do this in the elephantbird version for 0.6).
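
To illustrate what "not creating the splits" could look like, here is a minimal sketch in Python. It is a toy model, not the HBaseStorage or elephantbird code: the `Region` tuple, `region_overlaps`, and `prune_regions` are hypothetical names, and only the gte and lt bounds are handled. It assumes one split per region and that an empty key marks an unbounded region edge.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a region is (start_key, end_key), start inclusive,
# end exclusive. An empty key means "unbounded" on that side.
Region = Tuple[bytes, bytes]


def region_overlaps(region: Region, gte: Optional[bytes], lt: Optional[bytes]) -> bool:
    """Return True if the region can contain a row key k with gte <= k < lt."""
    start, end = region
    # The whole region sits below the lower bound.
    if gte is not None and end != b"" and end <= gte:
        return False
    # The whole region sits at or above the exclusive upper bound.
    if lt is not None and start != b"" and start >= lt:
        return False
    return True


def prune_regions(regions: List[Region], gte: Optional[bytes], lt: Optional[bytes]) -> List[Region]:
    """Keep only the regions that would need a split for the given bounds."""
    return [r for r in regions if region_overlaps(r, gte, lt)]


# Four made-up regions covering the whole key space.
regions = [(b"", b"g"), (b"g", b"n"), (b"n", b"t"), (b"t", b"")]
needed = prune_regions(regions, gte=b"h", lt=b"m")
```

With these made-up bounds only the second region survives, so one split would be created instead of four.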
bq. 2. One small suggestion: move line 206 into the if block (setting it once
is enough)
Good idea.
bq. 3. It's better to add a warning log in HBaseBinaryConverter when bytes are
cut off during type conversion
Will do.
bq. 4. The parameter "Per-region limit" is a bit confusing to me. I think
users would like to set the limit on the whole table, not per region. What
do you think?
Trouble is, you can't enforce a total limit without post-processing. In
practice, I use -limit when I am experimenting and want to get just a few rows
from HBase; if I want a specific number of rows, I use both -limit (to speed up
the tasks, since the scanners will exit early), and Pig's LIMIT operator (to
get the exact number of rows I need).
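
A toy model of why the two limits are combined, sketched in Python under stated assumptions: each region is scanned independently and stops after at most `per_region_limit` rows (the role of -limit), and a final truncation plays the role of Pig's LIMIT operator. The function names and the sample data are hypothetical; this is not how HBaseStorage is implemented.

```python
from itertools import islice
from typing import Dict, List


def scan_region(rows: List[str], per_region_limit: int) -> List[str]:
    """Simulate a scanner that exits early after per_region_limit rows."""
    return list(islice(rows, per_region_limit))


def load_with_limit(regions: Dict[str, List[str]], per_region_limit: int) -> List[str]:
    """Scan every region independently; no scanner knows about the others."""
    loaded: List[str] = []
    for name in sorted(regions):
        loaded.extend(scan_region(regions[name], per_region_limit))
    return loaded


# Made-up table with three regions of different sizes.
regions = {
    "r1": ["a1", "a2", "a3"],
    "r2": ["b1"],
    "r3": ["c1", "c2", "c3", "c4"],
}

# Per-region limit of 2: up to 2 rows per region, so up to 6 rows in total.
loaded = load_with_limit(regions, per_region_limit=2)

# Post-processing step standing in for Pig's LIMIT: trim to the exact count.
exact = loaded[:3]
```

The per-region limit only bounds the total by (limit x number of regions); getting an exact count takes the second, global step.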
> Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
> ------------------------------------------------------------------------------
>
> Key: PIG-1205
> URL: https://issues.apache.org/jira/browse/PIG-1205
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.7.0
> Reporter: Jeff Zhang
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.8.0
>
> Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch,
> PIG_1205_4.patch, PIG_1205_5.path
>
>