[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989486#comment-12989486
 ] 

stack commented on HBASE-2856:
------------------------------

Chatting with Ryan:

+ To be solved is how rwcc read/write points are respected on hfile flush: how 
do we avoid pulling from an hfile items that are in the future as far as the 
current rwcc read point is concerned (especially when cf7 of a ten-cf row 
flushes mid-read)?
+ Solution is that we'll move the read point forward up in the region scanner 
on each next invocation, i.e. on entrance into a new row.  We'll also only swap 
in new hfiles on next up in the region scanner (rather than down in the store 
scanner as is currently done).  So, for a row of 100 cfs and 1M columns, we'll 
hold the rwcc read point for as long as we're reading the row.  cf 48 and cf 59 
might flush but we'll not swap in their new store files until we get to the end 
of the row (so we'll be holding on to the snapshots for a little longer than we 
do now).
+ On next up in the region scanner, we also need to reseek on each row even 
though this could be a perf killer.  Our current notion of an end-of-row marker 
is the first kv whose row does not match the row we are currently in.  Let's 
call this next-row kv kvnext.  We park here in between next invocations.  But 
what if, in between region next invocations, there is a big pause and a bunch 
of puts come in, and those puts have the same row as kvnext AND happen to sort 
before the kvnext at which we are currently parked?  We have to reseek.  (It 
could be worse: a new row could have been inserted in between next invocations, 
between kvbefore and kvnext.  If parked at kvnext we're not going to see it, 
not unless we do HBASE-3498.)
+ We do not think we need to add a sequence number to KV, one that is persisted 
out to HFile.
+ It looks like we do not need to use hlog's sequence number all over; we can 
keep RWCC's little incrementing value.  We can also keep its scope limited to 
the memstore -- as opposed to what was being discussed above, where we were 
going to broaden the scope to cover WAL writing.
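
To make the second point concrete, here is a rough sketch of why sampling the 
read point once on entrance into a row gives a consistent view across column 
families.  It is a toy model, not HBase code: ToyRegion, Cell, putRow and 
readRowWithInterleavedWrite are all made-up names, the "read point" is just a 
counter, and flushes and hfile swapping are not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class and method name here is hypothetical, not HBase API.
class RowReadPointSketch {

    /** A cell tagged with the number of the write that produced it (stand-in for memstoreTS). */
    static final class Cell {
        final String row, family, value;
        final long writeNumber;
        Cell(String row, String family, String value, long writeNumber) {
            this.row = row; this.family = family; this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    /** Toy region: a flat cell list plus a small incrementing counter. */
    static final class ToyRegion {
        private final List<Cell> cells = new ArrayList<>();
        private long readPoint = 0;

        /** Writes all families of a row under one write number, then advances the read point. */
        void putRow(String row, String value, String... families) {
            long writeNumber = readPoint + 1;
            for (String family : families) {
                cells.add(new Cell(row, family, value, writeNumber));
            }
            readPoint = writeNumber;
        }

        long currentReadPoint() { return readPoint; }

        /** Newest value for row/family visible at the given read point, or null. */
        String get(String row, String family, long atReadPoint) {
            Cell best = null;
            for (Cell c : cells) {
                if (c.row.equals(row) && c.family.equals(family)
                        && c.writeNumber <= atReadPoint
                        && (best == null || c.writeNumber > best.writeNumber)) {
                    best = c;
                }
            }
            return best == null ? null : best.value;
        }
    }

    /**
     * Reads one row family by family, with a competing write landing after the first family.
     * holdReadPoint=true samples the read point once on entrance into the row;
     * holdReadPoint=false re-samples it per family, which mixes old and new values.
     */
    static List<String> readRowWithInterleavedWrite(boolean holdReadPoint) {
        String[] families = {"cf1", "cf2", "cf3"};
        ToyRegion region = new ToyRegion();
        region.putRow("r1", "v1", families);

        List<String> seen = new ArrayList<>();
        long rowReadPoint = region.currentReadPoint(); // taken on entrance into the row
        for (int i = 0; i < families.length; i++) {
            long readPoint = holdReadPoint ? rowReadPoint : region.currentReadPoint();
            seen.add(region.get("r1", families[i], readPoint));
            if (i == 0) {
                region.putRow("r1", "v2", families); // simulated concurrent write mid-row
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("held:       " + readRowWithInterleavedWrite(true));
        System.out.println("re-sampled: " + readRowWithInterleavedWrite(false));
    }
}
```

With the read point held, the reader sees v1 in all three families; when it is 
re-sampled per family, the reader sees v1 in cf1 and v2 in cf2 and cf3, which 
is the kind of torn row TestAcidGuarantee complains about.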

> TestAcidGuarantee broken on trunk 
> ----------------------------------
>
>                 Key: HBASE-2856
>                 URL: https://issues.apache.org/jira/browse/HBASE-2856
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100621
>            Reporter: ryan rawson
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 
> acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
