Hi stack,
I did some tests on the bulk load tools from HBASE-48.
I took the files made by the TestHFileOutputFormat test and passed them to the
script you wrote. It did work, but something looks unusual. For each
region, the STARTKEY and ENDKEY are nearly the same: the ENDKEY is bigger than
the STARTKEY by only about 1, e.g.
  STARTKEY=>'0000009447',ENDKEY=>'0000009448';
  STARTKEY=>'0000020476',ENDKEY=>'0000020477';
...

I also have some doubts about TestHFileOutputFormat. The default
partitioner is the hash partitioner, but the hash partitioner can't meet the
requirements of TestHFileOutputFormat: as you said, we need to ensure a
total ordering of all keys, so we need to supply a partitioner that does
total ordering (but you didn't add a new partitioner in
TestHFileOutputFormat).
So I think TestHFileOutputFormat uses the hash partitioner, which does not
do total ordering. Different regions would then have overlapping row ranges,
which is not correct for HBase. And I found that the firstKey and lastKey of
the files made by TestHFileOutputFormat do indeed overlap.
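
To illustrate what I mean, here is a small standalone sketch in Python. It is
not the Hadoop or HBase code: the function names, the 3-partition setup, the
split points and the use of crc32 as the hash are all my own assumptions, just
to show the effect. With hash partitioning each output gets keys from the whole
key space, so the firstKey/lastKey ranges overlap; with a range partitioner (in
the spirit of Hadoop's TotalOrderPartitioner) they do not.

```python
import bisect
import zlib


def hash_partition(key, num_partitions):
    # Stand-in for a hash partitioner: hash(key) modulo the partition count.
    # crc32 is only used to get a deterministic hash for this illustration.
    return zlib.crc32(key.encode()) % num_partitions


def range_partition(key, split_points):
    # Total-order partitioning: partition i holds the keys that sort
    # before split_points[i]; the last partition holds the rest.
    return bisect.bisect_right(split_points, key)


def key_ranges(keys, assign, num_partitions):
    # Return (firstKey, lastKey) for every non-empty partition.
    parts = [[] for _ in range(num_partitions)]
    for k in keys:
        parts[assign(k)].append(k)
    return [(min(p), max(p)) for p in parts if p]


def ranges_overlap(ranges):
    # True if any two (firstKey, lastKey) ranges intersect.
    ordered = sorted(ranges)
    return any(prev[1] >= cur[0] for prev, cur in zip(ordered, ordered[1:]))


# Zero-padded row keys, similar in shape to the ones in the test output.
keys = ["%010d" % i for i in range(0, 30000, 7)]
splits = ["0000010000", "0000020000"]  # hypothetical split points

hashed = key_ranges(keys, lambda k: hash_partition(k, 3), 3)
ranged = key_ranges(keys, lambda k: range_partition(k, splits), 3)

print("hash partitioner overlaps: ", ranges_overlap(hashed))
print("range partitioner overlaps:", ranges_overlap(ranged))
```

If the output files are to map one-to-one onto regions, I think only the second
kind of assignment gives usable files.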
Are the bulk load tools just a first cut that still needs further improvement?
I think they are very useful.




-- 
Anty Rao
