Adar Dembo has posted comments on this change.

Change subject: Non-covering Range Partitions design doc
......................................................................
Patch Set 1: (6 comments)

http://gerrit.cloudera.org:8080/#/c/2772/1/docs/design-docs/non-covering-range-partitions.md
File docs/design-docs/non-covering-range-partitions.md:

Line 69: but schema designers may find it useful to be able to use
       : both options
> What I'm concerned about is the cognitive overhead of having two ways to ac
If you're concerned about the cognitive overhead of having both range bounds
and split rows, why not just deprecate split rows entirely? Range bounds are
net more expressive, right?

Line 87: RANGE BOUND (("North America"), ("North America\0")),
       : RANGE BOUND (("Europe"), ("Europe\0")),
       : RANGE BOUND (("Asia"), ("Asia\0"));
> Having Kudu automatically create partitions is beyond the scope of this des
Agreed that an inclusive upper bound would remove some of the pain (the NUL
terminators in string "point" ranges), leaving just the verbosity behind.

Line 104: If
        : the client limits the scan to a non-existent range partition through either
        : predicates or primary key bounds no results will be returned at all.
> Right, no error in that case. But I'm more thinking about the case where th
I'm coming around to the always-no-error perspective. See Todd's argument
below: tablets are an implementation detail, and so from the client's
perspective, an in-range scan without data is semantically equivalent to an
out-of-range scan without data.

Line 107:
> Currently, the meta cache for both clients is implemented as a sorted (tree
Right, you've answered my first question, but not the second. Or do you
expect the number of tablets per table to remain roughly the same?

Line 121: only
        : recontacting the master after a configurable timeout.
> @adar: when the application attempts to write into a range that the meta ca
I see, so the configurable timeout you wrote about is for "negative" lookup
results. I didn't understand that in my first read-through; could you clarify
it in the doc? In any case, I think a negative cache makes sense, provided
it's reasonably smart.
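To make "reasonably smart" concrete, here's a rough sketch of a negative cache that tracks non-covered key ranges as merged intervals. This is purely hypothetical illustration on my part (Python, integer keys, invented names); it is not the actual client implementation:

```python
import bisect


class NegativeRangeCache:
    """Hypothetical sketch of a client-side negative cache: remembers
    half-open key ranges [lo, hi) known to have no covering tablet, so
    repeated out-of-range writes can fail locally without a master lookup."""

    def __init__(self):
        # Disjoint, sorted list of (lo, hi) gaps with no covering tablet.
        self._gaps = []

    def add_gap(self, lo, hi):
        """Record a non-covered range, e.g. bounds the server might
        piggyback on a TabletNotFound response. Overlapping or adjacent
        gaps are merged so lookups stay cheap."""
        self._gaps.append((lo, hi))
        self._gaps.sort()
        merged = []
        for gap in self._gaps:
            if merged and gap[0] <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], gap[1]))
            else:
                merged.append(gap)
        self._gaps = merged

    def known_missing(self, key):
        """True if 'key' is known to fall in uncovered space; the client
        can then reject the write locally instead of asking the master."""
        i = bisect.bisect_right(self._gaps, (key, float("inf"))) - 1
        return i >= 0 and self._gaps[i][0] <= key < self._gaps[i][1]
```

With range entries rather than point entries, inserting N distinct keys that all land in the same uncovered gap costs one master round trip instead of N.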
For example, could it track the upper and lower bounds of negative space? Or
merely certain points (i.e. individual rows) where no range existed? I think
the answer largely depends on how much additional information the server
provides with TabletNotFound: if you try to write row x, the server could say
"TabletNotFound; the upper bound of the previous tablet is x-10 and the lower
bound of the next tablet is x+3". Obviously, tracking missing space as a set
of ranges rather than points is advantageous: it means an attempt to insert N
rows with different keys, all outside any covered range, won't result in N
lookups.

I suggested the following pathological case on Slack as motivation for a
negative cache: one "bad" client repeatedly tries to insert rows into
non-covered ranges, and as a result places a lot of load on the master, which
could affect other "good" clients. This won't be as much of an issue in the
future, when clients can go to any master for read operations, but it could
be an issue now.

Line 131: Unlike the add range partition case in which a
        : client can not know whether a new range partition has been added since the last
        : master lookup, during a drop range partition the client will be able to
        : recognize a dropped tablet when trying to insert or scan the tablet
> 1. is correct
We discussed this on Slack and now I understand what you mean. To summarize:

1. Without a negative cache, ADD and DROP are symmetric, because all
   operations will end up going to a server: either the tserver (meta cache
   hit for a scan/write) or the master (meta cache miss).
2. With a negative cache, ADD becomes more problematic, because the meta
   cache can now "hit, but fail locally" an operation that would have
   succeeded had it been allowed to go to the server (i.e. an operation on a
   range that was just added).
This is asymmetric with respect to DROP because the regular existence-cache
behavior is to allow the operation to proceed; a dropped range would yield a
server response that could be used to invalidate the existence cache.

--
To view, visit http://gerrit.cloudera.org:8080/2772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3e530eda60c00faf066c41b6bdb2b37f6d96a5dc
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Binglin Chang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
