Adar Dembo has posted comments on this change.

Change subject: Non-covering Range Partitions design doc
......................................................................
Patch Set 1: (6 comments)

http://gerrit.cloudera.org:8080/#/c/2772/1/docs/design-docs/non-covering-range-partitions.md
File docs/design-docs/non-covering-range-partitions.md:

Line 69: but schema designers may find it useful to be able to use
       : both options
> What I'm concerned about is the cognitive overhead of having two ways to ac
If you're concerned about the cognitive overhead of having both range bounds
and split rows, why not just deprecate split rows entirely? Range bounds are
net more expressive, right?

Line 87: RANGE BOUND (("North America"), ("North America\0")),
       : RANGE BOUND (("Europe"), ("Europe\0")),
       : RANGE BOUND (("Asia"), ("Asia\0"));
> Having Kudu automatically create partitions is beyond the scope of this des
Agreed that an inclusive upper bound would remove some of the pain (the NUL
terminators in string "point" ranges), leaving just the verbosity behind.

Line 104: If
        : the client limits the scan to a non-existent range partition through either
        : predicates or primary key bounds no results will be returned at all.
> Right, no error in that case. But I'm more thinking about the case where th
I'm coming around to the always-no-error perspective. See Todd's argument
below: tablets are an implementation detail, and so from the client's
perspective, an in-range scan without data is semantically equivalent to an
out-of-range scan without data.

Line 107:
> Currently, the meta cache for both clients is implemented as a sorted (tree
Right, you've answered my first question, but not the second. Or do you
expect the number of tablets per table to remain roughly the same?

Line 121: only
        : recontacting the master after a configurable timeout.
> @adar: when the application attempts to write into a range that the meta ca
I see, so the configurable timeout you wrote about is for "negative" lookup
results. I didn't understand that in my first read-through; could you clarify
it in the doc? In any case, I think a negative cache makes sense, provided
it's reasonably smart.
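To make "reasonably smart" concrete, here's a rough sketch of a negative cache that tracks non-covered key ranges as merged intervals. This is purely hypothetical illustration on my part (Python, integer keys, invented names); it is not the actual client implementation:

```python
import bisect


class NegativeRangeCache:
    """Hypothetical sketch of a client-side negative cache: remembers
    half-open key ranges [lo, hi) known to have no covering tablet, so
    repeated out-of-range writes can fail locally without a master lookup."""

    def __init__(self):
        # Disjoint, sorted list of (lo, hi) gaps with no covering tablet.
        self._gaps = []

    def add_gap(self, lo, hi):
        """Record a non-covered range, e.g. bounds the server might
        piggyback on a TabletNotFound response. Overlapping or adjacent
        gaps are merged so lookups stay cheap."""
        self._gaps.append((lo, hi))
        self._gaps.sort()
        merged = []
        for gap in self._gaps:
            if merged and gap[0] <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], gap[1]))
            else:
                merged.append(gap)
        self._gaps = merged

    def known_missing(self, key):
        """True if 'key' is known to fall in uncovered space; the client
        can then reject the write locally instead of asking the master."""
        i = bisect.bisect_right(self._gaps, (key, float("inf"))) - 1
        return i >= 0 and self._gaps[i][0] <= key < self._gaps[i][1]
```

With range entries rather than point entries, inserting N distinct keys that all land in the same uncovered gap costs one master round trip instead of N.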
For example, could it track the upper and lower bounds of negative space? Or
merely certain points (i.e. individual rows) where no range existed? I think
the answer largely depends on how much additional information the server
provides with TabletNotFound: if you try to write row x, the server could say
"TabletNotFound; the upper bound of the previous tablet is x-10 and the lower
bound of the next tablet is x+3". Obviously, tracking missing space as a set
of ranges rather than points is advantageous: it means an attempt to insert N
rows with different keys, all outside any covered range, won't result in N
lookups.

I suggested the following pathological case on Slack as motivation for a
negative cache: one "bad" client repeatedly tries to insert rows into
non-covered ranges, and as a result places a lot of load on the master, which
could affect other "good" clients. This won't be as much of an issue in the
future, when clients can go to any master for read operations, but it could
be an issue now.

Line 131: Unlike the add range partition case in which a
        : client can not know whether a new range partition has been added since the last
        : master lookup, during a drop range partition the client will be able to
        : recognize a dropped tablet when trying to insert or scan the tablet
> 1. is correct
We discussed this on Slack and now I understand what you mean. To summarize:

1. Without a negative cache, ADD and DROP are symmetric, because all
   operations will end up going to a server: either the tserver (meta cache
   hit for a scan/write) or the master (meta cache miss).
2. With a negative cache, ADD becomes more problematic, because the meta
   cache can now "hit, but fail locally" an operation that would have
   succeeded had it been allowed to go to the server (i.e. an operation on a
   range that was just added).
This is asymmetric with respect to DROP because the regular existence-cache
behavior is to allow the operation to proceed; a dropped range would yield a
server response that could be used to invalidate the existence cache.

--
To view, visit http://gerrit.cloudera.org:8080/2772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3e530eda60c00faf066c41b6bdb2b37f6d96a5dc
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Binglin Chang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
