Todd Lipcon has posted comments on this change.

Change subject: Initial scan tokens design doc
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/2443/3//COMMIT_MSG
Commit Message:

Line 7: Initial scan tokens design doc
mind throwing the JIRA number in here?


http://gerrit.cloudera.org:8080/#/c/2443/3/docs/design-docs/scan-tokens.md
File docs/design-docs/scan-tokens.md:

Line 20: split Kudu tables into logical sections, so that computation can be 
distributed
physical sections, not logical, right?


Line 36: defined serialization format so that tokens may be serialized and 
deserialized
well defined (but opaque to the caller)


Line 43:    location hint, or a hint for every replica?
I think multiple hints, but with a preference for the current leader? It 
probably depends on the consistency mode.


Line 47: 2) How should scan tokens handle going stale WRT tablet location 
changes and
Perhaps we can provide an API on a scanner like 'IsLocal()'? That's also useful 
for metrics (e.g. Impala and MR like to expose counters of how many bytes were 
read locally vs. remotely). The API might be slightly subtle since the answer 
could change as the scanner moves across tablets, but I think that's the best 
we can do.

Another thought: we could offer a 'refresh' API or a 'check current' type API 
which would re-contact the master and verify that things haven't changed. 
I still think some indication of "isLocal" is useful, though.
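
To illustrate the 'IsLocal()' idea, here is a minimal sketch in Python. Every 
name in it (Tablet, SketchScanner, is_local, count_bytes) is hypothetical and 
not part of any Kudu client API. It assumes a non-empty tablet list and that 
"local" means the scanning host holds a replica of the current tablet. It shows 
why the answer can change as the scanner crosses tablet boundaries, and how a 
local/remote byte counter could be built on top of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    """Hypothetical tablet descriptor: id, replica hosts, and data size."""
    tablet_id: str
    replica_hosts: frozenset
    size_bytes: int


class SketchScanner:
    """Hypothetical scanner that visits a non-empty list of tablets in order."""

    def __init__(self, tablets, local_host):
        self._tablets = list(tablets)
        self._local_host = local_host
        self._index = 0

    def current_tablet(self):
        return self._tablets[self._index]

    def is_local(self):
        # Only valid for the tablet currently being scanned; the answer
        # may differ after next_tablet() moves the scanner along.
        return self._local_host in self.current_tablet().replica_hosts

    def next_tablet(self):
        """Advance to the next tablet; return False when none are left."""
        if self._index + 1 >= len(self._tablets):
            return False
        self._index += 1
        return True


def count_bytes(scanner):
    """Return (local_bytes, remote_bytes), as a metrics counter might."""
    local = remote = 0
    while True:
        size = scanner.current_tablet().size_bytes
        if scanner.is_local():
            local += size
        else:
            remote += size
        if not scanner.next_tablet():
            break
    return local, remote
```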


Line 54:    point, but it will be an important consideration once that feature 
lands.
I vote for the partition key range, since that will support splits at some 
point in the future without any changes
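
A minimal Python sketch of why a partition key range survives splits where a 
tablet ID would not. The names (KeyRange, overlaps, tablets_for_token) and the 
convention that an empty byte string means "unbounded" are assumptions made for 
illustration, not the design doc's format. The token's range is resolved 
against whatever tablets exist when the scan is opened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    """Hypothetical half-open range of encoded partition keys."""
    start: bytes  # inclusive; b"" means unbounded below
    end: bytes    # exclusive; b"" means unbounded above


def overlaps(a, b):
    """True if the two half-open ranges share at least one key."""
    a_before_b = a.end != b"" and a.end <= b.start
    b_before_a = b.end != b"" and b.end <= a.start
    return not (a_before_b or b_before_a)


def tablets_for_token(token_range, tablets):
    """Resolve a token's key range against the current tablet layout.

    `tablets` maps a tablet id to its KeyRange. Because the lookup is by
    range, a token created before a split still finds the child tablets.
    """
    return [tid for tid, rng in tablets.items() if overlaps(token_range, rng)]
```

For example, a token covering [b"a", b"m") resolves to a single tablet before 
a split of that tablet and to both children afterwards, with no change to the 
token itself.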


Line 60:    client could.
yes I think this is a very useful API -- right now we ask people to split into 
many tablets per TS to get scan parallelism, but if we could subdivide our scan 
ranges in Impala/Spark/etc, then this wouldn't be nearly as important.

I don't think we should implement it right off the bat, but working it into our 
thinking is a good idea.

As for whether Kudu can do better than the client -- yes, I think the tablet 
server has enough data to suggest subdivisions. For example, it can look at the 
current rowset min/max boundaries and sizes to get a reasonable estimate of the 
PK distribution.
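
As a rough sketch of how rowset boundaries and sizes could yield subdivision 
points: the Python below is illustrative only and is not how the tablet server 
works. It assumes integer primary keys, rowsets given as (min_key, max_key, 
size_bytes) tuples, and data spread uniformly within each rowset's key range. 
The function names are hypothetical.

```python
def estimated_bytes_below(key, rowsets):
    """Estimate how many bytes hold keys smaller than `key`.

    Each rowset contributes all of its size if it lies entirely below
    `key`, nothing if entirely above, and a linear share otherwise.
    """
    total = 0.0
    for lo, hi, size in rowsets:
        if key <= lo:
            continue
        if key >= hi:
            total += size
        else:
            total += size * (key - lo) / (hi - lo)
    return total


def suggest_splits(rowsets, pieces):
    """Pick up to `pieces - 1` rowset boundaries as split points.

    For each target quantile, choose the boundary whose estimated
    cumulative size is closest to that quantile of the total.
    """
    if not rowsets:
        return []
    total = sum(size for _, _, size in rowsets)
    candidates = sorted({k for lo, hi, _ in rowsets for k in (lo, hi)})
    splits = []
    for i in range(1, pieces):
        target = total * i / pieces
        best = min(
            candidates,
            key=lambda k: abs(estimated_bytes_below(k, rowsets) - target),
        )
        if best not in splits:
            splits.append(best)
    return splits
```

Restricting candidates to existing rowset boundaries keeps the estimate cheap, 
at the cost of uneven pieces when boundaries are sparse.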


-- 
To view, visit http://gerrit.cloudera.org:8080/2443
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id208cecababf15e1671a01a219d4599adfcd4163
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
