[ 
https://issues.apache.org/jira/browse/HBASE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765000#action_12765000
 ] 

anty.rao commented on HBASE-1901:
---------------------------------

Hi stack,
I have run some tests and found that we need to change the code of
TestHFileOutputFormat a little, or the test won't work.
" int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT;"
should be 
" int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT+2;"
As you said, the end key needs to be exclusive, i.e. one larger than the
biggest key in your key space.
However, the key range of TestHFileOutputFormat is
1 to conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT + 1, so rows (the end
key) must be one more than that, hence the +2.
Apart from that, everything looks right: the STARTKEY and ENDKEY of each
region are correct.
The precondition is that we know the startKey and endKey in advance. Now that
you have written the partitioner, could we write an MR job to calculate the
startKey and endKey?
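
To illustrate the off-by-one above, here is a minimal sketch. It assumes the
partitioner maps a key to a reducer by its proportional position in
[start, end); that formula, the class name, and the small values for the map
task count and rows per split are hypothetical, chosen only to keep the
arithmetic easy to follow. They are not taken from the patch or the test.

```java
final class ExclusiveEndKeySketch {
    // Map a key in [start, end) to a reducer index in [0, reducers).
    // Assumed formula for illustration; not the code from 1901.patch.
    static int partition(long key, long start, long end, int reducers) {
        return (int) ((key - start) * reducers / (end - start));
    }

    public static void main(String[] args) {
        int mapTasks = 2;      // hypothetical value of mapred.map.tasks
        int rowsPerSplit = 5;  // hypothetical, not the test's real ROWSPERSPLIT
        long start = 1;        // assumes the key space starts at 1
        long biggestKey = (long) mapTasks * rowsPerSplit + 1;  // 11
        long exclusiveEnd = biggestKey + 1;                    // 12, i.e. mapTasks * rowsPerSplit + 2

        // With the exclusive end key, the biggest key lands on the last reducer.
        System.out.println(partition(biggestKey, start, exclusiveEnd, mapTasks)); // 1

        // With end = mapTasks * rowsPerSplit, the biggest key falls out of range.
        System.out.println(partition(biggestKey, start, mapTasks * rowsPerSplit, mapTasks)); // 2
    }
}
```

With two reducers the only valid indexes are 0 and 1, so the second call shows
why the end key has to sit one past the biggest key.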

> "General" partitioner for "hbase-48" bulk (behind the api, write hfiles 
> direct) uploader
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-1901
>                 URL: https://issues.apache.org/jira/browse/HBASE-1901
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: stack
>         Attachments: 1901.patch
>
>
> For users to bulk upload by writing hfiles directly to the filesystem, they 
> currently need to write a partitioner that is intimate with how their key 
> schema works.  This issue is about providing a general partitioner, one that 
> could never be as fair as a custom-written partitioner but that might just 
> work for many cases.  The idea is that a user would supply the first and last 
> keys in their dataset to upload.  We'd then do bigdecimal on the range 
> between start and end rowids dividing it by the number of reducers to come up 
> with key ranges per reducer.
> (I thought jgray had done some BigDecimal work dividing keys already but I 
> can't find it)
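
A rough sketch of the range-splitting idea described in the issue, under some
loud assumptions: row keys are treated as unsigned big-endian numbers,
shorter keys are right-padded with zero bytes, the end key is exclusive, and
BigInteger is used in place of the BigDecimal mentioned above because integer
division is enough here. The class and method names are hypothetical and this
is not the code in 1901.patch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RangePartitionSketch {
    // Helper for the demo: ASCII row key bytes.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Read a row key as an unsigned number, right-padded with zero bytes to `width`.
    static BigInteger toNumber(byte[] key, int width) {
        byte[] padded = Arrays.copyOf(key, width); // trailing bytes are zero-filled
        return new BigInteger(1, padded);          // signum 1 means "treat as unsigned"
    }

    // Map a key in [start, end) to a reducer index in [0, reducers).
    static int getPartition(byte[] key, byte[] start, byte[] end, int reducers) {
        int width = Math.max(key.length, Math.max(start.length, end.length));
        BigInteger k = toNumber(key, width);
        BigInteger s = toNumber(start, width);
        BigInteger e = toNumber(end, width);
        if (k.compareTo(s) < 0 || k.compareTo(e) >= 0) {
            throw new IllegalArgumentException("key is outside [start, end)");
        }
        BigInteger range = e.subtract(s);
        // Proportional position of the key in the range, scaled to the reducer count.
        return k.subtract(s)
                .multiply(BigInteger.valueOf(reducers))
                .divide(range)
                .intValue();
    }

    public static void main(String[] args) {
        byte[] start = bytes("aaa");
        byte[] end = bytes("zzz"); // exclusive: must sort after the biggest real key
        for (String row : new String[] {"aaa", "mmm", "zzy"}) {
            System.out.println(row + " -> reducer " + getPartition(bytes(row), start, end, 4));
        }
    }
}
```

As the issue says, a split like this is only as fair as the key distribution
is uniform; skewed keys would still overload some reducers.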

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
