Hi everyone, I've updated the KIP, applying Matthias' feedback regarding interface hierarchy.
Also, following the last email, I think we can consider reverse operations on KeyValue range as well, as implementation supports lexicographic order. I considered different naming between Key-based ranges and Time-based ranges, and mitigate confusion when fetching keys and time ranges as WindowStore does: Key-based ranges: reverseRange(), reverseAll() Time-based ranges: backwardFetch() Then, key-based changes apply to KeyValueStore, and time-based changes to Window and Session stores. Let me know if you have any questions. Thanks, Jorge. On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Hi everyone, sorry for the late reply. > > Thanks Matthias for your feedback. I think it makes sense to reconsider > the current design based on your input. > > After digging deeper into the current implementation, I'd like to bring my > current understanding to be double-checked as it might be redefining the > KIP's scope: > > 1. There are 2 ranges been exposed by different stores: > > a. Key Range > b. Timestamp Range > > So far, we have discussed covering both. > > 2. Key Range functions do not provide ordering guarantees by design: > > ```ReadOnlyKeyValueStore.java > /** > * Get an iterator over a given range of keys. This iterator must be > closed after use. > * The returned iterator must be safe from {@link > java.util.ConcurrentModificationException}s > * and must not return null values. No ordering guarantees are > provided. > * ... > */ > KeyValueIterator<K, V> range(K from, K to); > ``` > > Therefore, I'd propose removing Key range operations from the scope. > > 3. Timestamp Range operations happen at the SegmentsStore level (internal) > API > > AFAICT, Segments wrappers handle all Timestamp ranges queries. > > I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)` > with a flag for backwards operations. > > As segments returned will be processed backwards, I'm not extending > KeyValueStores to query each segment backwards as previous point 2. > > 4. Extend WindowStores implementations with a new > WindowBackwardStore/ReadOnlyBackwardStore: > > ```java > public interface ReadOnlyWindowBackwardStore<K, V> { > WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant to) > throws IllegalArgumentException; > > KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to, Instant > fromTime, Instant toTime) > throws IllegalArgumentException; > > KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from, > Instant to) throws IllegalArgumentException; > ``` > > 5. SessionStore is a bit different as it has fetch/find sessions spread > between SessionStore and ReadOnlySessionStore. > > I'd propose a new interface `SessionBackwardStore` to expose backward find > operations: > > ```java > public interface SessionBackwardStore<K, AGG> { > KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K key, > final long earliestSessionEndTime, final long latestSessionStartTime); > > KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K > keyFrom, final K keyTo, final long earliestSessionEndTime, final long > latestSessionStartTime); > } > ``` > > If this understanding is correct I'll proceed to update the KIP based on > this. > > Looking forward to your feedback. > > Thanks, > Jorge. > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mj...@apache.org> wrote: > >> Hey, >> >> Sorry that I am late to the game. I am not 100% convinced about the >> current proposal. Using a new config as feature flag seems to be rather >> "nasty" to me, and flipping from/to is a little bit too fancy for my >> personal taste. >> >> I agree, that the original proposal using a "ReadDirection" enum is not >> ideal either. >> >> Thus, I would like to put out a new idea: We could add a new interface >> that offers new methods that return revers iterators. >> >> The KIP already proposes to add `reverseAll()` and it seems backward >> incompatible to just add this method to `ReadOnlyKeyValueStore` and >> `ReadOnlyWindowStore`. I don't think we could provide a useful default >> implementation for custom stores and thus either break compatibility or >> need add a default that just throws an exception. Neither seems to be a >> good option. >> >> Using a new interface avoid this issue and allows users implementing >> custom stores to opt-in by adding the interface to their stores. >> Furthermore, we don't need any config. In the end, we encapsulte the >> change into the store, and our runtime is agnostic to it (as it should >> be). >> >> The hierarchy becomes a little complex (but uses would not really see >> the complexity): >> >> // exsiting >> ReadOnlyKeyValueStore >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore >> >> >> // helper interface; users don't care >> // need similar ones for other stores >> ReverseReadOnlyKeyValueStore { >> KeyValueIterator<K, V> reverseRange(K from, K to); >> KeyValueIterator<K, V> reverseAll(); >> } >> >> >> // two new user facing interfaces for kv-store >> // need similar ones for other stores >> ReadOnlyKeyValueStoreWithReverseIterators extends ReadOnlyKeyValueStore, >> ReverseReadOnlyKeyValueStore >> >> KeyValueStoreWithReverseIterators extends KeyValueStore, >> ReverseReadOnlyKeyValueStore >> >> >> // updated (also internal) >> // also need to update other built-in stores >> RocksDB implements KeyValueStoreWithReverseIterators, BulkLoadingStore >> >> >> In the end, user would only care about the two (for kv-case) new >> interface that offer revers iterator (read only and regular) and can >> cast stores accordingly in their Processors/Transformers or via IQ. >> >> >> Btw: if we add revers iterator for KeyValue and Window store, should we >> do the same for Session store? >> >> >> >> This might be more code to write, but I believe it provides the better >> user experience. Thoughts? >> >> >> >> -Matthias >> >> >> >> >> On 5/26/20 8:47 PM, John Roesler wrote: >> > Sorry for my silence, Jorge, >> > >> > I've just taken another look, and I'm personally happy with the KIP. >> > >> > Thanks, >> > -John >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote: >> >> If no additional comments, I will proceed to start the a vote thread. >> >> >> >> Thanks a lot for your feedback! >> >> >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya < >> >> quilcate.jo...@gmail.com> wrote: >> >> >> >>> Thanks Sophie. I like the `reverseAll()` idea. >> >>> >> >>> I updated the KIP with your feedback. >> >>> >> >>> >> >>> >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman < >> sop...@confluent.io> >> >>> wrote: >> >>> >> >>>> Hm, the case of `all()` does seem to present a dilemma in the case of >> >>>> variable-length keys. >> >>>> >> >>>> In the case of fixed-length keys, you can just compute the keys that >> >>>> correspond >> >>>> to the maximum and minimum serialized bytes, then perform a `range()` >> >>>> query >> >>>> instead of an `all()`. If your keys don't have a well-defined >> ordering >> >>>> such >> >>>> that >> >>>> you can't determine the MAX_KEY, then you probably don't care about >> the >> >>>> iterator order anyway. >> >>>> >> >>>> But with variable-length keys, there is no MAX_KEY. If all your >> keys were >> >>>> just >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only way >> to >> >>>> figure out the >> >>>> maximum key in the store is by using `all()` -- and without a reverse >> >>>> iterator, you're >> >>>> doomed to iterate through every single key just to answer that simple >> >>>> question. >> >>>> >> >>>> That said, I still think determining the iterator order based on the >> >>>> to/from bytes >> >>>> makes a lot of intuitive sense and gives the API a nice symmetry. >> What if >> >>>> we >> >>>> solved the `all()` problem by just giving `all()` a reverse form to >> >>>> complement it? >> >>>> Ie we would have `all()` and `reverseAll()`, or something to that >> effect. >> >>>> >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya < >> >>>> quilcate.jo...@gmail.com> wrote: >> >>>> >> >>>>> Thanks John. >> >>>>> >> >>>>> Agree. I like the first approach as well, with StreamsConfig flag >> >>>> passing >> >>>>> by via ProcessorContext. >> >>>>> >> >>>>> Another positive effect with "reverse parameters" is that in the >> case of >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide _which_ >> pair >> >>>> to >> >>>>> flip, whether with `ReadDirection` enum it apply to both. >> >>>>> >> >>>>> The only issue I've found while reviewing the KIP is that `all()` >> won't >> >>>> fit >> >>>>> within this approach. >> >>>>> >> >>>>> We could remove it from the KIP and argue that for WindowStore, >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse >> order, >> >>>> and >> >>>>> for KeyValueStore, no ordering guarantees are provided. >> >>>>> >> >>>>> If there is consensus with this changes, I will go and update the >> KIP. >> >>>>> >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <vvcep...@apache.org> >> >>>> wrote: >> >>>>> >> >>>>>> Hi Jorge, >> >>>>>> >> >>>>>> Thanks for that idea. I agree, a feature flag would protect anyone >> >>>>>> who may be depending on the current behavior. >> >>>>>> >> >>>>>> It seems better to locate the feature flag in the initialization >> >>>> logic of >> >>>>>> the store, rather than have a method on the "live" store that >> changes >> >>>>>> its behavior on the fly. >> >>>>>> >> >>>>>> It seems like there are two options here, one is to add a new >> config: >> >>>>>> >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION = >> >>>>>> "enable.backwards.iteration >> >>>>>> >> >>>>>> Or we can add a feature flag in Materialized, like >> >>>>>> >> >>>>>> Materialized.enableBackwardsIteration() >> >>>>>> >> >>>>>> I think I'd personally lean toward the config, for the following >> >>>> reason. >> >>>>>> The concern that Sophie raised is that someone's program may depend >> >>>>>> on the existing contract of getting an empty iterator. We don't >> want >> >>>> to >> >>>>>> switch behavior when they aren't expecting it, so we provide them a >> >>>>>> config to assert that they _are_ expecting the new behavior, which >> >>>>>> means they take responsibility for updating their code to expect >> the >> >>>> new >> >>>>>> behavior. >> >>>>>> >> >>>>>> There doesn't seem to be a reason to offer a choice of behaviors >> on a >> >>>>>> per-query, or per-store basis. We just want people to be not >> surprised >> >>>>>> by this change in general. >> >>>>>> >> >>>>>> What do you think? >> >>>>>> Thanks, >> >>>>>> -John >> >>>>>> >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote: >> >>>>>>> Thank you both for the great feedback. >> >>>>>>> >> >>>>>>> I like the "fancy" proposal :), and how it removes the need for >> >>>>>>> additional API methods. And with a feature flag on `StateStore`, >> >>>>>>> disabled by default, should no break current users. >> >>>>>>> >> >>>>>>> The only side-effect I can think of is that: by moving the flag >> >>>>> upwards, >> >>>>>>> all later operations become affected; which might be ok for most >> >>>> (all?) >> >>>>>>> cases. I can't think of an scenario where this would be an issue, >> >>>> just >> >>>>>>> want to point this out. >> >>>>>>> >> >>>>>>> If moving to this approach, I'd like to check if I got this right >> >>>>> before >> >>>>>>> updating the KIP: >> >>>>>>> >> >>>>>>> - only `StateStore` will change by having a new method: >> >>>>>>> `backwardIteration()`, `false` by default to keep things >> compatible. >> >>>>>>> - then all `*Stores` will have to update their implementation >> based >> >>>> on >> >>>>>>> this flag. >> >>>>>>> >> >>>>>>> >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote: >> >>>>>>>>> There's no possibility that someone could be relying >> >>>>>>>>> on iterating over that range in increasing order, because that's >> >>>> not >> >>>>>> what >> >>>>>>>>> happens. However, they could indeed be relying on getting an >> >>>> empty >> >>>>>>>> iterator >> >>>>>>>> >> >>>>>>>> I just meant that they might be relying on the assumption that >> the >> >>>>>> range >> >>>>>>>> query >> >>>>>>>> will never return results with decreasing keys. The empty >> iterator >> >>>>>> wouldn't >> >>>>>>>> break that contract, but of course a surprise reverse iterator >> >>>> would. >> >>>>>>>> >> >>>>>>>> FWIW I actually am in favor of automatically converting to a >> >>>> reverse >> >>>>>>>> iterator, >> >>>>>>>> I just thought we should consider whether this should be off by >> >>>>>> default or >> >>>>>>>> even possible to disable at all. >> >>>>>>>> >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler < >> vvcep...@apache.org >> >>>>> >> >>>>>> wrote: >> >>>>>>>> >> >>>>>>>>> Thanks for the response, Sophie, >> >>>>>>>>> >> >>>>>>>>> I wholeheartedly agree we should take as much into account as >> >>>>> possible >> >>>>>>>>> up front, rather than regretting our decisions later. I actually >> >>>> do >> >>>>>> share >> >>>>>>>>> your vague sense of worry, which was what led me to say >> initially >> >>>>>> that I >> >>>>>>>>> thought my counterproposal might be "too fancy". Sometimes, it's >> >>>>>> better >> >>>>>>>>> to be explicit instead of "elegant", if we think more people >> >>>> will be >> >>>>>>>>> confused >> >>>>>>>>> than not. >> >>>>>>>>> >> >>>>>>>>> I really don't think that there's any danger of "relying on a >> >>>> bug" >> >>>>>> here, >> >>>>>>>>> although >> >>>>>>>>> people certainly could be relying on current behavior. One thing >> >>>> to >> >>>>> be >> >>>>>>>>> clear >> >>>>>>>>> about (which I just left a more detailed comment in KAFKA-8159 >> >>>>> about) >> >>>>>> is >> >>>>>>>>> that >> >>>>>>>>> when we say something like key1 > key2, this ordering is defined >> >>>> by >> >>>>>> the >> >>>>>>>>> serde's output and nothing else. >> >>>>>>>>> >> >>>>>>>>> Currently, thanks to your fix in >> >>>>>> https://github.com/apache/kafka/pull/6521 >> >>>>>>>>> , >> >>>>>>>>> the store contract is that for range scans, if from > to, then >> >>>> the >> >>>>>> store >> >>>>>>>>> must >> >>>>>>>>> return an empty iterator. There's no possibility that someone >> >>>> could >> >>>>> be >> >>>>>>>>> relying >> >>>>>>>>> on iterating over that range in increasing order, because that's >> >>>> not >> >>>>>> what >> >>>>>>>>> happens. However, they could indeed be relying on getting an >> >>>> empty >> >>>>>>>>> iterator. >> >>>>>>>>> >> >>>>>>>>> My counterproposal was to actually change this contract to say >> >>>> that >> >>>>>> the >> >>>>>>>>> store >> >>>>>>>>> must return an iterator over the keys in that range, but in the >> >>>>>> reverse >> >>>>>>>>> order. >> >>>>>>>>> So, in addition to considering whether this idea is "too fancy" >> >>>> (aka >> >>>>>>>>> confusing), >> >>>>>>>>> we should also consider the likelihood of breaking an existing >> >>>>>> program with >> >>>>>>>>> this behavior/contract change. >> >>>>>>>>> >> >>>>>>>>> To echo your clarification, I'm also not advocating strongly in >> >>>>> favor >> >>>>>> of my >> >>>>>>>>> proposal. I just wanted to present it for consideration >> alongside >> >>>>>> Jorge's >> >>>>>>>>> original one. >> >>>>>>>>> >> >>>>>>>>> Thanks for raising these very good points, >> >>>>>>>>> -John >> >>>>>>>>> >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote: >> >>>>>>>>>>> Rather than working around it, I think we should just fix it >> >>>>>>>>>> Now *that's* a "fancy" idea :P >> >>>>>>>>>> >> >>>>>>>>>> That was my primary concern, although I do have a vague sense >> of >> >>>>>> worry >> >>>>>>>>>> that we might be allowing users to get into trouble without >> >>>>>> realizing it. >> >>>>>>>>>> For example if their custom serdes suffer a similar bug as the >> >>>>> above, >> >>>>>>>>>> and/or >> >>>>>>>>>> they rely on getting results in increasing order (of the keys) >> >>>> even >> >>>>>> when >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the range >> >>>> query >> >>>>>> returns >> >>>>>>>>>> nothing in that case. >> >>>>>>>>>> >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not, but in >> >>>> that >> >>>>>> past >> >>>>>>>>>> we've >> >>>>>>>>>> taken the stance that we should not break compatibility even if >> >>>> the >> >>>>>> user >> >>>>>>>>>> was relying on bugs or unintentional behavior. >> >>>>>>>>>> >> >>>>>>>>>> Just to clarify I'm not advocating strongly against this >> >>>> proposal, >> >>>>>> just >> >>>>>>>>>> laying >> >>>>>>>>>> out some considerations we should take into account. At the end >> >>>> of >> >>>>>> the >> >>>>>>>>> day >> >>>>>>>>>> we should do what's right rather than maintain compatibility >> >>>> with >> >>>>>>>>> existing >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground. >> >>>>>>>>>> >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler < >> >>>> vvcep...@apache.org> >> >>>>>>>>> wrote: >> >>>>>>>>>>> Thanks Sophie, >> >>>>>>>>>>> >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it, I >> >>>> think >> >>>>> we >> >>>>>>>>> should >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira. >> >>>>>>>>>>> >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern that some >> >>>>>> serdes >> >>>>>>>>>>> might be incorrectly written. >> >>>>>>>>>>> >> >>>>>>>>>>> Were there other practical concerns that you had in mind? >> >>>>>>>>>>> >> >>>>>>>>>>> Thanks, >> >>>>>>>>>>> John >> >>>>>>>>>>> >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote: >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes but I >> >>>>> think >> >>>>>>>>> there >> >>>>>>>>>>>> are some practical limitations to implementing this. In >> >>>>> particular >> >>>>>>>>>>>> I'm thinking about this issue >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the >> >>>>>> built-in >> >>>>>>>>>>> signed >> >>>>>>>>>>>> number serdes. >> >>>>>>>>>>>> >> >>>>>>>>>>>> This trick would actually fix the problem for >> >>>> negative-negative >> >>>>>>>>> queries >> >>>>>>>>>>>> (ie where to & from are negative) but would cause >> undetectable >> >>>>>>>>>>>> incorrect results for negative-positive queries. For example, >> >>>> say >> >>>>>> you >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short >> serdes. >> >>>>> The >> >>>>>>>>>>>> serialized bytes for that are >> >>>>>>>>>>>> >> >>>>>>>>>>>> from = 1111111111111111 >> >>>>>>>>>>>> to = 0000000000000001 >> >>>>>>>>>>>> >> >>>>>>>>>>>> so we would end up flipping those and iterating over all keys >> >>>>> from >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in >> >>>>> lexicographical >> >>>>>>>>>>>> order means we would iterate over every key in the space >> >>>> *except* >> >>>>>> for >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be >> >>>> included >> >>>>>> in >> >>>>>>>>> the >> >>>>>>>>>>>> range query. >> >>>>>>>>>>>> >> >>>>>>>>>>>> Currently we just log a warning and return an empty iterator >> >>>> when >> >>>>>>>>>>>> to < from, which is obviously also incorrect but feels >> >>>> slightly >> >>>>>> more >> >>>>>>>>>>>> palatable. If we start automatically converting to reverse >> >>>>> queries >> >>>>>> we >> >>>>>>>>>>>> can't even log a warning in this case unless we wanted to log >> >>>> a >> >>>>>>>>> warning >> >>>>>>>>>>>> every time, which would be weird to do for a valid usage of a >> >>>> new >> >>>>>>>>>>>> feature. >> >>>>>>>>>>>> >> >>>>>>>>>>>> All that said, I still like the idea overall. Off the top of >> >>>> my >> >>>>>> head >> >>>>>>>>> I >> >>>>>>>>>>> guess >> >>>>>>>>>>>> we could add a store config to enable/disable automatic >> >>>> reverse >> >>>>>>>>>>> iteration, >> >>>>>>>>>>>> which is off by default? >> >>>>>>>>>>>> >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition >> >>>>>>>>>>>> >> >>>>>>>>>>>> Sophie >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler < >> >>>>> vvcep...@apache.org> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>>>> Hi there Jorge, >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Thanks for the KIP! >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I think this feature sounds very reasonable. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do you >> >>>> think >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people to flip >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" >> >>>>> would >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a backward >> >>>> one? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Thanks, >> >>>>>>>>>>>>> -John >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote: >> >>>>>>>>>>>>>> Hi everyone, >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617: >> >>>>>>>>>>>>>> >> >>>>>>>>> >> >>>>>> >> >>>>> >> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards >> >>>>>>>>>>>>>> Looking forward to your feedback. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Thanks! >> >>>>>>>>>>>>>> Jorge. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Attachments: >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc >> >>>>>>> >> >>>>>>> >> >>>>>>> Attachments: >> >>>>>>> * 0x5F2C6E22064982DF.asc >> >>>>>> >> >>>>> >> >>>> >> >>> >> >> >> >>