On Sat, Oct 10, 2009 at 10:54 PM, Anty <[email protected]> wrote:

> Hi stack,
> I did some tests on the bulk load tools of HBASE-48.

Thanks for trying it out.

> I took the files made by the TestHFileOutputFormat test and passed them to the
> script you wrote. It did work, but something seems unusual. For each
> region, the STARTKEY and ENDKEY are nearly the same; the ENDKEY is bigger than
> the STARTKEY by nearly 1, e.g.
> STARTKEY=>'0000009447',ENDKEY=>'0000009448';
> STARTKEY=>'0000020476',ENDKEY=>'0000020477';
> ...

Did you write your own partitioner or just use the default hash partitioner?

> I also have some doubts about TestHFileOutputFormat. The default
> partitioner is the hash partitioner; however, the hash partitioner can't meet
> the requirements of TestHFileOutputFormat. Just as you said, we need to ensure a
> total ordering of all keys, and we need to supply a partitioner that does
> total ordering (but you didn't add a new partitioner in
> TestHFileOutputFormat).

This is broken then, as you point out. Should we make something like what is
described in https://issues.apache.org/jira/browse/HBASE-1901 for
TestHFileOutputFormat?

> So I think TestHFileOutputFormat uses the hash partitioner, which does not
> do total ordering. Different regions would have overlapping row ranges, which is
> not correct for HBase. And I found that the firstKey and lastKey of the files made by
> TestHFileOutputFormat do indeed overlap.
> Are the bulk tools just a beginning that needs further improvement? I
> think the bulk tools are very useful.

Can you help us improve it? What do you think we need to do next (hbase-901?)

Thanks for writing, Anty Rao.
St.Ack
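
For reference, the total-ordering problem discussed in this thread can be illustrated with a small standalone sketch. It is not code from HBase or from TestHFileOutputFormat: the class name `RangePartitionSketch`, the method names, and the split points are all hypothetical, and plain `String` keys stand in for the real row keys. The hash method uses the same formula as Hadoop's default `HashPartitioner`, which scatters neighbouring keys across partitions; the range method sends each key to the partition whose key range contains it, so the output files cover non-overlapping ranges.

```java
import java.util.Arrays;

public class RangePartitionSketch {

    // Hash partitioning: same formula as Hadoop's default HashPartitioner,
    // applied here to a String key for illustration only.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Range partitioning: splitPoints must be sorted ascending.
    // Partition 0 gets keys below splitPoints[0]; partition i gets keys in
    // [splitPoints[i-1], splitPoints[i]); the last partition gets the rest.
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key equals a split point and starts partition i+1.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical split points, chosen only for this example.
        String[] splits = {"0000010000", "0000020000"};
        String[] keys = {"0000009447", "0000009448", "0000020476", "0000020477"};
        for (String k : keys) {
            System.out.println(k
                + " hash=" + hashPartition(k, splits.length + 1)
                + " range=" + rangePartition(k, splits));
        }
    }
}
```

With range partitioning, every key in partition i sorts before every key in partition i+1, which is the property the bulk load needs for regions with non-overlapping row ranges. In a real job the split points would have to come from sampling or prior knowledge of the key distribution.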
