[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

HBase Review Board (JIRA) Fri, 26 Nov 2010 15:51:44 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936131#action_12936131
 ]

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Pranav Khaitan" <pranavkhai...@gmail.com>

bq.  On 2010-11-26 14:54:45, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1373
bq.  > <http://review.cloudera.org/r/1252/diff/1/?file=17712#file17712line1373>
bq.  >
bq.  >     What are all the consequences of not sorting by type when using 
KVComparator?  Does this mean we might create HFiles that are not sorted 
properly, because the HFile comparator uses the KeyComparator directly with 
ignoreType = false?
bq.  >     
bq.  >     While in memstore we can rely on memstoreTS to roughly order by 
insertion time, and the Put/Delete should probably work in that situation, you 
are talking about modifying a pretty core and important concept in how we sort 
things.
bq.  >     
bq.  >     There are other ways to reconcile bugs like this; one of them is to 
extend the memstoreTS concept into the HFile and use that to reconcile during 
reads.  There is another JIRA where I proposed this.  
bq.  >     
bq.  >     If we are talking about 0.92 and beyond I'd prefer building a solid 
base rather than dangerous hacks like this.  Our unit tests are not extremely 
extensive, so while they might pass, that doesn't guarantee lack of bad 
behaviour later on.
bq.  >
bq.  
bq.  Pranav Khaitan wrote:
bq.      Agree. As I mentioned, this is a major change and more thought needs 
to be given to it.
bq.      
bq.      However, to resolve issues like HBASE-3276, we need either such a 
change or an extension of the memstoreTS concept to HFile, as you mentioned.
bq.      
bq.      About consequences, I don't see anything negative here. This change 
only affects the sorting of keys having the same row, col, timestamp. After this 
change, all keys with the same row, col, ts will be sorted purely based on the 
order in which they were inserted. When a memstore is flushed to HFile, the 
memstoreTS takes care of ordering. During compactions, the KeyValueHeap breaks 
ties by using the sequence ids of storefiles.
bq.  
bq.  Ryan Rawson wrote:
bq.      The problem is you are now changing how things are ordered sometimes 
but not all the time.  HFile directly uses the rawcomparator, instantiating it 
directly rather than getting it via the code path you changed.  So now you 
create a memstore in this order:
bq.      
bq.      row,col,100,Put  (memstoreTS=1)
bq.      row,col,100,Delete (memstoreTS=2)
bq.      row,col,100,Put (memstoreTS=3)
bq.      
bq.      But the HFile comparator will consider this out of order since it 
doesn't know about memstoreTS and it still expects things to be in a certain 
order.
bq.      
bq.      I'm a little wary of having implicit ordering in the HFiles... in your 
new scheme, Put,Delete,Put are in that order 'just because they are', and the 
comparator cannot put them back in order, and must rely on scanner order.  
During compactions we would place keys in order based on which files they came 
from, but they wouldn't themselves have an order.  Basically we should get rid 
of 'type sorting' and use memstoreTS sorting in memory and implicit sorting in 
the HFiles.  
bq.      
bq. 

Right. I see that HFile does an extra check and throws an IOException when it 
gets data out of order. So if we go forward with this change, we will have to 
ensure that the comparator used by HFile knows about it. This can be achieved 
in two ways: either by making ignoreType = true the default, or by having 
HFile explicitly set ignoreType = true.

- Pranav
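
To make the ordering concern concrete, here is a minimal sketch in plain Java. 
It is a toy model, not HBase code: the class IgnoreTypeSketch, its Key type, 
keyComparator() and isAcceptedInOrder() are hypothetical names made up for this 
illustration, and the rule that deletes sort ahead of puts at the same 
row/col/timestamp is an assumption of the sketch. It feeds the three keys from 
Ryan's example, in the order listed, to a writer-style "is each key >= the 
previous one" check.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IgnoreTypeSketch {
    // Toy type codes (assumption): the larger code sorts first when types are compared.
    static final int PUT = 0;
    static final int DELETE = 1;

    /** Toy key: row, column, timestamp, type, plus the in-memory memstoreTS. */
    static final class Key {
        final String row;
        final String col;
        final long ts;
        final int type;
        final long memstoreTS;

        Key(String row, String col, long ts, int type, long memstoreTS) {
            this.row = row;
            this.col = col;
            this.ts = ts;
            this.type = type;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Hypothetical stand-in for a key comparator with an ignoreType switch. */
    static Comparator<Key> keyComparator(final boolean ignoreType) {
        return (a, b) -> {
            int c = a.row.compareTo(b.row);
            if (c != 0) return c;
            c = a.col.compareTo(b.col);
            if (c != 0) return c;
            c = Long.compare(b.ts, a.ts);            // newer timestamps first
            if (c != 0 || ignoreType) return c;      // ties stay ties when type is ignored
            return Integer.compare(b.type, a.type);  // otherwise deletes before puts
        };
    }

    /** Mimics a writer-side sanity check: no key may sort before its predecessor. */
    static boolean isAcceptedInOrder(List<Key> keys, Comparator<Key> cmp) {
        for (int i = 1; i < keys.size(); i++) {
            if (cmp.compare(keys.get(i - 1), keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /** The three keys from the review comment, in the order listed there. */
    static List<Key> ryansExample() {
        return Arrays.asList(
            new Key("row", "col", 100, PUT, 1),
            new Key("row", "col", 100, DELETE, 2),
            new Key("row", "col", 100, PUT, 3));
    }

    public static void main(String[] args) {
        List<Key> keys = ryansExample();
        System.out.println(isAcceptedInOrder(keys, keyComparator(false))); // false
        System.out.println(isAcceptedInOrder(keys, keyComparator(true)));  // true
    }
}
```

With the type-aware comparator the Put followed by a Delete is flagged as out 
of order; with ignoreType = true the three keys compare as equal and the check 
passes, which is why their relative order would then depend entirely on how 
they were written.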

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/#review1993
-----------------------------------------------------------

> delete followed by a put with the same timestamp
> ------------------------------------------------
>
>                 Key: HBASE-3276
>                 URL: https://issues.apache.org/jira/browse/HBASE-3276
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> [Note: This issue is relevant only for cases that don't use the default 
> "time" based versions, but provide/manage versions explicitly.]
> The fix for HBASE-1485 ensures that if there are multiple puts with the same 
> timestamp the later one wins.
> However, if there is a delete for a specific timestamp, then the later put 
> doesn't win. 
> Say for example the following is the sequence of operations:
> put                         row/col/v1 - value1
> deleteColumn     row/col/v1
> put                         row/col/v1 - value2
> Without the deleteColumn(), HBASE-1485 ensures that "value2" is the winner.
> However, with the deleteColumn() thrown into the mix, the delete wins, and 
> one cannot insert a new value at that version. [The only, unsatisfactory, 
> workaround at this point seems to be to trigger a major compaction. The major 
> compaction would clear the delete marker, and allow new cells to be created 
> with that version again.] 
> ---
> Seems like it might not be too complicated to extend the fix for HBASE-1485 
> to also respect ordering between delete/put operations. I'll look into this 
> further.
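
The put/deleteColumn/put sequence in the description can be modelled with a 
small toy in plain Java. This is only a sketch of the two read semantics being 
contrasted, not the HBase client API: SameTimestampSketch, Op, readDeleteWins() 
and readLatestWins() are hypothetical names, and all operations are assumed to 
target the same row/col/version.

```java
import java.util.Arrays;
import java.util.List;

class SameTimestampSketch {
    /** One operation on a single row/col/version; value is null for a delete. */
    static final class Op {
        final boolean isDelete;
        final String value;

        Op(boolean isDelete, String value) {
            this.isDelete = isDelete;
            this.value = value;
        }

        static Op put(String value) { return new Op(false, value); }
        static Op delete() { return new Op(true, null); }
    }

    /** Reported behaviour: a delete marker at this version masks every put, even later ones. */
    static String readDeleteWins(List<Op> opsInInsertionOrder) {
        String latest = null;
        for (Op op : opsInInsertionOrder) {
            if (op.isDelete) {
                return null;       // the delete wins regardless of insertion order
            }
            latest = op.value;     // among puts only, the later one wins
        }
        return latest;
    }

    /** Requested behaviour: the most recently inserted operation wins. */
    static String readLatestWins(List<Op> opsInInsertionOrder) {
        if (opsInInsertionOrder.isEmpty()) {
            return null;
        }
        Op last = opsInInsertionOrder.get(opsInInsertionOrder.size() - 1);
        return last.isDelete ? null : last.value;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(Op.put("value1"), Op.delete(), Op.put("value2"));
        System.out.println(readDeleteWins(ops)); // null
        System.out.println(readLatestWins(ops)); // value2
    }
}
```

Without the delete in the list, both methods return "value2", which matches 
the behaviour attributed to HBASE-1485 in the description.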

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
