[ https://issues.apache.org/jira/browse/HADOOP-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Duxbury updated HADOOP-2513:
----------------------------------

    Status: Patch Available  (was: In Progress)

Trying hudson.

> [hbase] HStore#get and HStore#getFull may not return expected values by
> timestamp when there is more than one MapFile
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2513
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>             Fix For: 0.16.0
>
>         Attachments: 2512-v2.patch, 2513.patch
>
>
> Ok, this one is a little tricky. Let's say that you write a row with some
> value and no timestamp, which means "right now". Then, the memcache gets
> flushed out to a MapFile. Then, you write another value to the same row,
> this time with a timestamp that is in the past, i.e., before the "now"
> timestamp of the first put.
> Some time later, but before there is a compaction, if you do a get for this
> row and only ask for a single version, you will logically be expecting the
> latest version of the cell, which you would assume to be the one written at
> "now" time. Instead, you will get the value written into the "past" cell,
> because even though it is tagged as having happened in the past, it actually
> *was written* after the "now" cell, and thus when #get searches for
> satisfying values, it runs into the most recently written one first.
> The result of this problem is inconsistent results. Note that the problem
> only exists while the HStore is uncompacted, because during compaction these
> cells all get sorted into the correct order by timestamp. In a way, this
> actually makes the problem worse, because you could easily get inconsistent
> results from HBase about the same (unchanged) row depending on whether
> there's been a flush/compaction.
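The scenario described so far can be reproduced with a small toy model. This is only an illustrative sketch in Python, not the HBase code: the dictionaries standing in for the memcache and a MapFile, the function name `naive_get`, and the timestamps 1000 and 500 are all assumptions made up for the example.

```python
# Toy model, not HBase code: each store maps row -> [(timestamp, value), ...],
# with the cells inside one store sorted newest timestamp first.

def naive_get(stores, row, num_versions=1):
    """Walk the stores from most recently written to oldest and stop as soon
    as enough versions have been found (the short-circuit behaviour)."""
    results = []
    for store in stores:  # stores[0] is the memcache, then the newest MapFile
        for cell in store.get(row, []):
            results.append(cell)
            if len(results) == num_versions:
                return results
    return results


# 1. Put without a timestamp (stamped "now" = 1000), then flush to a MapFile.
mapfile = {"row1": [(1000, "written-now")]}
# 2. Put with an explicit timestamp in the past (500); it sits in the memcache.
memcache = {"row1": [(500, "written-past")]}

stores = [memcache, mapfile]
print(naive_get(stores, "row1"))  # [(500, 'written-past')]
```

Asking for one version returns the cell stamped 500, even though a cell stamped 1000 exists in the older store.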
> The only solution I can think of for this problem at the moment is to scan
> all the MapFiles and the memcache for possible results, sort them, and then
> select the desired number of versions off the top. This is unfortunate
> because it means you never get the snazzy short-circuit logic except within
> a single MapFile or the memcache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
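The scan, sort, and select approach proposed in the description could look roughly like the following. It is a minimal sketch in Python over a toy model, not the attached patch: the dictionaries standing in for the memcache and MapFiles, the name `merged_get`, and the sample timestamps are assumptions for illustration only.

```python
# Toy model, not HBase code: each store maps row -> [(timestamp, value), ...].

def merged_get(stores, row, num_versions=1):
    """Collect candidate cells from every store, sort them by timestamp
    (newest first), and take the requested number of versions off the top."""
    candidates = []
    for store in stores:  # every store is consulted; no early exit across stores
        candidates.extend(store.get(row, []))
    # list.sort is stable, so cells with equal timestamps keep store order.
    candidates.sort(key=lambda cell: cell[0], reverse=True)
    return candidates[:num_versions]


# Uncompacted: the "past" cell is in the memcache, the "now" cell in a MapFile.
uncompacted = [
    {"row1": [(500, "written-past")]},   # memcache
    {"row1": [(1000, "written-now")]},   # flushed MapFile
]
# After compaction the same cells live in one store, ordered by timestamp.
compacted = [{"row1": [(1000, "written-now"), (500, "written-past")]}]

print(merged_get(uncompacted, "row1"))  # [(1000, 'written-now')]
print(merged_get(compacted, "row1"))    # [(1000, 'written-now')]
```

In this model the answer no longer depends on whether a flush or compaction has happened, at the cost of visiting every store on each get.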