[ 
https://issues.apache.org/jira/browse/HBASE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710127#action_12710127
 ] 

Jonathan Gray commented on HBASE-1304:
--------------------------------------

Dropped some thoughts on IRC, figured I'd post here:

[10:42am] jgray2: dj_ryan: i don't think v7 patch contains changes to 
compactions yet... not following your questions exactly but compactions need to 
be merged with scan code
[10:43am] jgray2: gets can be redone as scans
[10:43am] jgray2: and that's probably the direction we'll need to go
[10:43am] jgray2: if millions of columns in a single row
[10:44am] jgray2: you basically need to scan them, even within the row
[10:44am] jgray2: QueryMatcher makes the decision about what to do with a KV 
given the parameters of the query
[10:45am] jgray2: the two complex bits of it are a DeleteTracker and the 
ColumnTracker
[10:45am] jgray2: two implementations of each
[10:46am] jgray2: ScanDT and GetDT are different because, right now, a Get is 
not a low-level KV merge like a Scan is
[10:46am] jgray2: so when you're scanning (or compacting) you actually look at 
a Store's keys in strict sorted order
[10:46am] jgray2: merging all storefiles + memcache
[10:46am] jgray2: so when tracking deletes
[10:46am] jgray2: you need to track very little
[10:47am] jgray2: in a Get, you grab all keys from each storefile, starting at 
memcache, then going through them newest to oldest
[10:47am] jgray2: so deletes you read in one storefile will apply to any 
storefiles that are older
[10:47am] jgray2: so GetDT is quite a bit more complex
[10:47am] jgray2: we need to benchmark and see if scans are gooder
[10:47am] jgray2: because they are much more "correct"
[10:47am] jgray2: if you do manual timestamp setting, gets can give you 
indeterminate results
[10:48am] jgray2: but scans are always strictly sorted
[10:48am] jgray2: ColumnTracker is implemented as either ExplicitCT or 
WildcardCT
[10:48am] jgray2: explicit is when qualifiers are given, wildcard if all in a 
family
[10:48am] jgray2: so it tracks that, and then max versions for each
[10:49am] jgray2: honestly i've not looked at compactions since i wrote 
scanners but have had it in mind
[10:50am] jgray2: it will use QueryMatcher and CT/DT directly
[10:50am] jgray2: wildcardCT where maxVersions = family setting
[10:50am] jgray2: ScanDT
[10:50am] jgray2: QueryMatcher already does TTL enforcement and such
[10:51am] jgray2: the only difference is in a minor compaction you still need 
to output deletes
[10:51am] jgray2: that are not fully enforced or overridden
[10:51am] jgray2: so then we'll probably have a CompactDT
[10:52am] jgray2: might need a slight modification here and there, i don't 
think QM is written to ever permit deletes out to the result
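
To make the scan-side argument above concrete, here is a rough sketch of why delete tracking is cheap when all sources are merged in strict sorted order. It is not HBase code: Cell, ScanMergeSketch, scan() and the keepDeletes flag are hypothetical names made up for illustration. It assumes a single row and family, and it models only one kind of delete marker (one that covers its column at that timestamp and older). The keepDeletes flag stands in for the minor-compaction case, simplified here to passing every delete marker through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Comparator;

// Hypothetical sketch, not the HBase API: one row, one family.
class ScanMergeSketch {

    /** Simplified stand-in for a KeyValue. */
    static final class Cell {
        final String qualifier;
        final long ts;
        final boolean delete;   // assumed marker type: deletes this column at ts and older
        final String value;

        Cell(String qualifier, long ts, boolean delete, String value) {
            this.qualifier = qualifier;
            this.ts = ts;
            this.delete = delete;
            this.value = value;
        }

        static Cell put(String q, long ts, String v) { return new Cell(q, ts, false, v); }
        static Cell del(String q, long ts) { return new Cell(q, ts, true, null); }

        @Override
        public String toString() {
            return (delete ? "DEL " : "PUT ") + qualifier + "@" + ts;
        }
    }

    // Sort order: qualifier ascending, newest timestamp first,
    // delete markers ahead of puts at the same timestamp.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);
        if (c != 0) return c;
        return Boolean.compare(b.delete, a.delete);
    };

    /**
     * Merges already-sorted sources (memcache plus storefiles) and applies
     * delete tracking and a max-versions limit in a single pass.
     */
    static List<Cell> scan(List<List<Cell>> sources, int maxVersions, boolean keepDeletes) {
        // Heap entries are {source index, position within that source}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> ORDER.compare(sources.get(x[0]).get(x[1]),
                                    sources.get(y[0]).get(y[1])));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }

        List<Cell> out = new ArrayList<>();
        String currentQualifier = null;
        boolean sawDelete = false;   // the only delete state needed per column
        long deleteTs = 0L;
        int versionsEmitted = 0;

        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            Cell cell = sources.get(top[0]).get(top[1]);
            if (top[1] + 1 < sources.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }

            // New column: earlier delete state can never apply again.
            if (!cell.qualifier.equals(currentQualifier)) {
                currentQualifier = cell.qualifier;
                sawDelete = false;
                versionsEmitted = 0;
            }

            if (cell.delete) {
                if (!sawDelete || cell.ts > deleteTs) deleteTs = cell.ts;
                sawDelete = true;
                if (keepDeletes) out.add(cell);   // simplified minor-compaction behaviour
                continue;
            }
            if (sawDelete && cell.ts <= deleteTs) continue;   // masked by a delete
            if (versionsEmitted < maxVersions) {
                out.add(cell);
                versionsEmitted++;
            }
        }
        return out;
    }
}
```

Because the merged stream is sorted, a put written with a manually set timestamp newer than a delete is seen before that delete regardless of which source it came from, so the result does not depend on file order. A Get-style walk that reads storefiles newest to oldest would have to carry delete state across files instead of resetting it per column.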

> New client server implementation of how gets and puts are handled. 
> -------------------------------------------------------------------
>
>                 Key: HBASE-1304
>                 URL: https://issues.apache.org/jira/browse/HBASE-1304
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Erik Holstad
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1304-v1.patch, HBASE-1304-v2.patch, 
> HBASE-1304-v3.patch, HBASE-1304-v4.patch, HBASE-1304-v5.patch, 
> HBASE-1304-v6.patch, HBASE-1304-v7.patch
>
>
> Creating an issue where the implementation of the new client and server will 
> go. Leaving HBASE-1249 as a discussion forum and will put code and patches 
> here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
