Thanks, J-D. Especially for the incredibly quick response!

Rod

On Tuesday, January 26, 2010 11:32 AM, "Jean-Daniel Cryans" <[email protected]> wrote:
> Rod,
>
> This is a known issue. It relates to HBASE-29 but IIRC there was a
> more specific jira about it.
>
> Basically the workaround is to not set the timestamp in a way that the
> same one could come twice like you described.
>
> J-D
>
> On Tue, Jan 26, 2010 at 9:36 AM, Rod Cope <[email protected]> wrote:
>> Hi,
>>
>> I'm seeing behavior on 0.20.2 and 0.20.3 that doesn't seem quite right and
>> would like to know if this is by design, a bug, or something I'm doing
>> wrong.
>>
>> Background:
>>
>> When I do a put that includes a timestamp like this (conceptually I know
>> this is not the actual API), it works just fine.
>> put "table", "family", "column", "bbb", 12345
>>
>> Then, if I do another put in the same client code using the same timestamp
>> like this...
>> put "table", "family", "column", "aaa", 12345
>>
>> ...and I create a scanner, grab a Result, and iterate over all values using
>> list(), I get this...
>> "table", "family", "column", "aaa", 12345
>>
>> So far, so good. Now, if I truncate the table from the shell and run a new
>> program that does a flush() on the table between the two puts, but does it
>> in the same client program back-to-back, I also get the same results from
>> list().
>>
>> -----
>>
>> Problem:
>>
>> Here's where the trouble starts. I truncate the table and run a new program
>> that puts "bbb", flushes the table, and quits. Here's what I get from
>> list():
>> "table", "family", "column", "bbb", 12345
>>
>> Then I run another program that puts "aaa", flushes, and quits. Here's what
>> I get from list():
>> "table", "family", "column", "aaa", 12345
>> "table", "family", "column", "bbb", 12345
>>
>> And if I then run a third program that puts "ccc", flushes, and quits, I get
>> this from list():
>> "table", "family", "column", "ccc", 12345
>> "table", "family", "column", "bbb", 12345
>> "table", "family", "column", "aaa", 12345
>>
>> I'm getting three different values for identical
>> table/family/qualifier/timestamp tuples. Does this seem right? There also
>> doesn't seem to be a defined sort order, probably because the timestamps are
>> identical.
>>
>> Also, if instead of using list(), I use getMap(), then I always only get a
>> single result. The single result is always the last item in the lists above
>> (i.e., "bbb" then "bbb" then "aaa"). I get identical results from using
>> getNoVersionMap().
>>
>> I suspect that this same behavior could occur when HBase decides to flush on
>> its own, but I could be wrong. As you can imagine, this can cause problems
>> because clients can't know from the results of calling list() which value is
>> "right" or "newest". They also can't rely on getMap() or getNoVersionMap()
>> because the single result that gets returned is not necessarily "right" or
>> "newest".
>>
>> I've reproduced everything above in a stand-alone installation and also with
>> a 7 regionserver cluster with the final 0.20.3. I started down this
>> debugging path originally because I ran into this problem on the 7
>> regionserver cluster with one table of 100+ regions. I was flushing
>> programmatically at the end of some large imports because I'm doing
>> setWriteToWAL(false) for load performance.
>>
>> Am I doing something wrong? Did I miss an HBase assumption about flushing
>> and/or identical timestamps?
>>
>> Any help would be much appreciated.
>>
>> Thanks,
>> Rod
>>
>> --
>>
>> Rod Cope
>> CTO & Founder
>> OpenLogic, Inc.

--
Rod Cope | CTO and Founder
[email protected]
Follow me on Twitter @RodCope
720 240 4501 | phone
720 240 4557 | fax
1 888 OpenLogic | toll free
www.openlogic.com
Follow OpenLogic on Twitter @openlogic
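
J-D's suggested workaround is to avoid assigning timestamps in a way that lets the same value come up twice. Below is a minimal sketch of one way to do that inside a single client process, assuming the client sets explicit timestamps on its puts. The `UniqueTimestamps` class is hypothetical and not part of HBase: it returns the current time in milliseconds, bumped to one past the previously issued value whenever the clock has not advanced, so no two calls in the same JVM return the same timestamp.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of the HBase API: issues strictly increasing
// millisecond timestamps so a single client process never reuses one.
public final class UniqueTimestamps {
    // Last timestamp handed out; starts below any real clock value.
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns max(current time in ms, previous value + 1), updated atomically
    // so concurrent callers in the same JVM also get distinct values.
    public long next() {
        return last.updateAndGet(prev -> Math.max(System.currentTimeMillis(), prev + 1));
    }

    public static void main(String[] args) {
        UniqueTimestamps ts = new UniqueTimestamps();
        long first = ts.next();
        long second = ts.next();
        // Even if both calls land in the same millisecond, second > first.
        System.out.println(second > first); // prints "true"
    }
}
```

The values it returns would be passed as the explicit timestamp on each put. This sketch only guarantees uniqueness inside one JVM: separate programs (like the put-flush-quit runs above) or writers on different machines still depend on the wall clock having advanced between writes, and can collide if it has not.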
