Hi everyone, sorry for the late reply. Thanks Matthias for your feedback. I think it makes sense to reconsider the current design based on your input.
After digging deeper into the current implementation, I'd like to bring my current understanding to be double-checked as it might be redefining the KIP's scope: 1. There are 2 ranges been exposed by different stores: a. Key Range b. Timestamp Range So far, we have discussed covering both. 2. Key Range functions do not provide ordering guarantees by design: ```ReadOnlyKeyValueStore.java /** * Get an iterator over a given range of keys. This iterator must be closed after use. * The returned iterator must be safe from {@link java.util.ConcurrentModificationException}s * and must not return null values. No ordering guarantees are provided. * ... */ KeyValueIterator<K, V> range(K from, K to); ``` Therefore, I'd propose removing Key range operations from the scope. 3. Timestamp Range operations happen at the SegmentsStore level (internal) API AFAICT, Segments wrappers handle all Timestamp ranges queries. I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)` with a flag for backwards operations. As segments returned will be processed backwards, I'm not extending KeyValueStores to query each segment backwards as previous point 2. 4. Extend WindowStores implementations with a new WindowBackwardStore/ReadOnlyBackwardStore: ```java public interface ReadOnlyWindowBackwardStore<K, V> { WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant to) throws IllegalArgumentException; KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to, Instant fromTime, Instant toTime) throws IllegalArgumentException; KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from, Instant to) throws IllegalArgumentException; ``` 5. SessionStore is a bit different as it has fetch/find sessions spread between SessionStore and ReadOnlySessionStore. I'd propose a new interface `SessionBackwardStore` to expose backward find operations: ```java public interface SessionBackwardStore<K, AGG> { KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K key, final long earliestSessionEndTime, final long latestSessionStartTime); KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K keyFrom, final K keyTo, final long earliestSessionEndTime, final long latestSessionStartTime); } ``` If this understanding is correct I'll proceed to update the KIP based on this. Looking forward to your feedback. Thanks, Jorge. On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mj...@apache.org> wrote: > Hey, > > Sorry that I am late to the game. I am not 100% convinced about the > current proposal. Using a new config as feature flag seems to be rather > "nasty" to me, and flipping from/to is a little bit too fancy for my > personal taste. > > I agree, that the original proposal using a "ReadDirection" enum is not > ideal either. > > Thus, I would like to put out a new idea: We could add a new interface > that offers new methods that return revers iterators. > > The KIP already proposes to add `reverseAll()` and it seems backward > incompatible to just add this method to `ReadOnlyKeyValueStore` and > `ReadOnlyWindowStore`. I don't think we could provide a useful default > implementation for custom stores and thus either break compatibility or > need add a default that just throws an exception. Neither seems to be a > good option. > > Using a new interface avoid this issue and allows users implementing > custom stores to opt-in by adding the interface to their stores. > Furthermore, we don't need any config. In the end, we encapsulte the > change into the store, and our runtime is agnostic to it (as it should be). > > The hierarchy becomes a little complex (but uses would not really see > the complexity): > > // exsiting > ReadOnlyKeyValueStore > KeyValueStore extend StateStore, ReadOnlyKeyValueStore > > > // helper interface; users don't care > // need similar ones for other stores > ReverseReadOnlyKeyValueStore { > KeyValueIterator<K, V> reverseRange(K from, K to); > KeyValueIterator<K, V> reverseAll(); > } > > > // two new user facing interfaces for kv-store > // need similar ones for other stores > ReadOnlyKeyValueStoreWithReverseIterators extends ReadOnlyKeyValueStore, > ReverseReadOnlyKeyValueStore > > KeyValueStoreWithReverseIterators extends KeyValueStore, > ReverseReadOnlyKeyValueStore > > > // updated (also internal) > // also need to update other built-in stores > RocksDB implements KeyValueStoreWithReverseIterators, BulkLoadingStore > > > In the end, user would only care about the two (for kv-case) new > interface that offer revers iterator (read only and regular) and can > cast stores accordingly in their Processors/Transformers or via IQ. > > > Btw: if we add revers iterator for KeyValue and Window store, should we > do the same for Session store? > > > > This might be more code to write, but I believe it provides the better > user experience. Thoughts? > > > > -Matthias > > > > > On 5/26/20 8:47 PM, John Roesler wrote: > > Sorry for my silence, Jorge, > > > > I've just taken another look, and I'm personally happy with the KIP. > > > > Thanks, > > -John > > > > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote: > >> If no additional comments, I will proceed to start the a vote thread. > >> > >> Thanks a lot for your feedback! > >> > >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya < > >> quilcate.jo...@gmail.com> wrote: > >> > >>> Thanks Sophie. I like the `reverseAll()` idea. > >>> > >>> I updated the KIP with your feedback. > >>> > >>> > >>> > >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman < > sop...@confluent.io> > >>> wrote: > >>> > >>>> Hm, the case of `all()` does seem to present a dilemma in the case of > >>>> variable-length keys. > >>>> > >>>> In the case of fixed-length keys, you can just compute the keys that > >>>> correspond > >>>> to the maximum and minimum serialized bytes, then perform a `range()` > >>>> query > >>>> instead of an `all()`. If your keys don't have a well-defined ordering > >>>> such > >>>> that > >>>> you can't determine the MAX_KEY, then you probably don't care about > the > >>>> iterator order anyway. > >>>> > >>>> But with variable-length keys, there is no MAX_KEY. If all your keys > were > >>>> just > >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only way to > >>>> figure out the > >>>> maximum key in the store is by using `all()` -- and without a reverse > >>>> iterator, you're > >>>> doomed to iterate through every single key just to answer that simple > >>>> question. > >>>> > >>>> That said, I still think determining the iterator order based on the > >>>> to/from bytes > >>>> makes a lot of intuitive sense and gives the API a nice symmetry. > What if > >>>> we > >>>> solved the `all()` problem by just giving `all()` a reverse form to > >>>> complement it? > >>>> Ie we would have `all()` and `reverseAll()`, or something to that > effect. > >>>> > >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya < > >>>> quilcate.jo...@gmail.com> wrote: > >>>> > >>>>> Thanks John. > >>>>> > >>>>> Agree. I like the first approach as well, with StreamsConfig flag > >>>> passing > >>>>> by via ProcessorContext. > >>>>> > >>>>> Another positive effect with "reverse parameters" is that in the > case of > >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide _which_ > pair > >>>> to > >>>>> flip, whether with `ReadDirection` enum it apply to both. > >>>>> > >>>>> The only issue I've found while reviewing the KIP is that `all()` > won't > >>>> fit > >>>>> within this approach. > >>>>> > >>>>> We could remove it from the KIP and argue that for WindowStore, > >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse > order, > >>>> and > >>>>> for KeyValueStore, no ordering guarantees are provided. > >>>>> > >>>>> If there is consensus with this changes, I will go and update the > KIP. > >>>>> > >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <vvcep...@apache.org> > >>>> wrote: > >>>>> > >>>>>> Hi Jorge, > >>>>>> > >>>>>> Thanks for that idea. I agree, a feature flag would protect anyone > >>>>>> who may be depending on the current behavior. > >>>>>> > >>>>>> It seems better to locate the feature flag in the initialization > >>>> logic of > >>>>>> the store, rather than have a method on the "live" store that > changes > >>>>>> its behavior on the fly. > >>>>>> > >>>>>> It seems like there are two options here, one is to add a new > config: > >>>>>> > >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION = > >>>>>> "enable.backwards.iteration > >>>>>> > >>>>>> Or we can add a feature flag in Materialized, like > >>>>>> > >>>>>> Materialized.enableBackwardsIteration() > >>>>>> > >>>>>> I think I'd personally lean toward the config, for the following > >>>> reason. > >>>>>> The concern that Sophie raised is that someone's program may depend > >>>>>> on the existing contract of getting an empty iterator. We don't want > >>>> to > >>>>>> switch behavior when they aren't expecting it, so we provide them a > >>>>>> config to assert that they _are_ expecting the new behavior, which > >>>>>> means they take responsibility for updating their code to expect the > >>>> new > >>>>>> behavior. > >>>>>> > >>>>>> There doesn't seem to be a reason to offer a choice of behaviors on > a > >>>>>> per-query, or per-store basis. We just want people to be not > surprised > >>>>>> by this change in general. > >>>>>> > >>>>>> What do you think? > >>>>>> Thanks, > >>>>>> -John > >>>>>> > >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote: > >>>>>>> Thank you both for the great feedback. > >>>>>>> > >>>>>>> I like the "fancy" proposal :), and how it removes the need for > >>>>>>> additional API methods. And with a feature flag on `StateStore`, > >>>>>>> disabled by default, should no break current users. > >>>>>>> > >>>>>>> The only side-effect I can think of is that: by moving the flag > >>>>> upwards, > >>>>>>> all later operations become affected; which might be ok for most > >>>> (all?) > >>>>>>> cases. I can't think of an scenario where this would be an issue, > >>>> just > >>>>>>> want to point this out. > >>>>>>> > >>>>>>> If moving to this approach, I'd like to check if I got this right > >>>>> before > >>>>>>> updating the KIP: > >>>>>>> > >>>>>>> - only `StateStore` will change by having a new method: > >>>>>>> `backwardIteration()`, `false` by default to keep things > compatible. > >>>>>>> - then all `*Stores` will have to update their implementation based > >>>> on > >>>>>>> this flag. > >>>>>>> > >>>>>>> > >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote: > >>>>>>>>> There's no possibility that someone could be relying > >>>>>>>>> on iterating over that range in increasing order, because that's > >>>> not > >>>>>> what > >>>>>>>>> happens. However, they could indeed be relying on getting an > >>>> empty > >>>>>>>> iterator > >>>>>>>> > >>>>>>>> I just meant that they might be relying on the assumption that the > >>>>>> range > >>>>>>>> query > >>>>>>>> will never return results with decreasing keys. The empty iterator > >>>>>> wouldn't > >>>>>>>> break that contract, but of course a surprise reverse iterator > >>>> would. > >>>>>>>> > >>>>>>>> FWIW I actually am in favor of automatically converting to a > >>>> reverse > >>>>>>>> iterator, > >>>>>>>> I just thought we should consider whether this should be off by > >>>>>> default or > >>>>>>>> even possible to disable at all. > >>>>>>>> > >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org > >>>>> > >>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks for the response, Sophie, > >>>>>>>>> > >>>>>>>>> I wholeheartedly agree we should take as much into account as > >>>>> possible > >>>>>>>>> up front, rather than regretting our decisions later. I actually > >>>> do > >>>>>> share > >>>>>>>>> your vague sense of worry, which was what led me to say initially > >>>>>> that I > >>>>>>>>> thought my counterproposal might be "too fancy". Sometimes, it's > >>>>>> better > >>>>>>>>> to be explicit instead of "elegant", if we think more people > >>>> will be > >>>>>>>>> confused > >>>>>>>>> than not. > >>>>>>>>> > >>>>>>>>> I really don't think that there's any danger of "relying on a > >>>> bug" > >>>>>> here, > >>>>>>>>> although > >>>>>>>>> people certainly could be relying on current behavior. One thing > >>>> to > >>>>> be > >>>>>>>>> clear > >>>>>>>>> about (which I just left a more detailed comment in KAFKA-8159 > >>>>> about) > >>>>>> is > >>>>>>>>> that > >>>>>>>>> when we say something like key1 > key2, this ordering is defined > >>>> by > >>>>>> the > >>>>>>>>> serde's output and nothing else. > >>>>>>>>> > >>>>>>>>> Currently, thanks to your fix in > >>>>>> https://github.com/apache/kafka/pull/6521 > >>>>>>>>> , > >>>>>>>>> the store contract is that for range scans, if from > to, then > >>>> the > >>>>>> store > >>>>>>>>> must > >>>>>>>>> return an empty iterator. There's no possibility that someone > >>>> could > >>>>> be > >>>>>>>>> relying > >>>>>>>>> on iterating over that range in increasing order, because that's > >>>> not > >>>>>> what > >>>>>>>>> happens. However, they could indeed be relying on getting an > >>>> empty > >>>>>>>>> iterator. > >>>>>>>>> > >>>>>>>>> My counterproposal was to actually change this contract to say > >>>> that > >>>>>> the > >>>>>>>>> store > >>>>>>>>> must return an iterator over the keys in that range, but in the > >>>>>> reverse > >>>>>>>>> order. > >>>>>>>>> So, in addition to considering whether this idea is "too fancy" > >>>> (aka > >>>>>>>>> confusing), > >>>>>>>>> we should also consider the likelihood of breaking an existing > >>>>>> program with > >>>>>>>>> this behavior/contract change. > >>>>>>>>> > >>>>>>>>> To echo your clarification, I'm also not advocating strongly in > >>>>> favor > >>>>>> of my > >>>>>>>>> proposal. I just wanted to present it for consideration alongside > >>>>>> Jorge's > >>>>>>>>> original one. > >>>>>>>>> > >>>>>>>>> Thanks for raising these very good points, > >>>>>>>>> -John > >>>>>>>>> > >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote: > >>>>>>>>>>> Rather than working around it, I think we should just fix it > >>>>>>>>>> Now *that's* a "fancy" idea :P > >>>>>>>>>> > >>>>>>>>>> That was my primary concern, although I do have a vague sense of > >>>>>> worry > >>>>>>>>>> that we might be allowing users to get into trouble without > >>>>>> realizing it. > >>>>>>>>>> For example if their custom serdes suffer a similar bug as the > >>>>> above, > >>>>>>>>>> and/or > >>>>>>>>>> they rely on getting results in increasing order (of the keys) > >>>> even > >>>>>> when > >>>>>>>>>> to < from. Maybe they're relying on the fact that the range > >>>> query > >>>>>> returns > >>>>>>>>>> nothing in that case. > >>>>>>>>>> > >>>>>>>>>> Not sure if that qualifies as relying on a bug or not, but in > >>>> that > >>>>>> past > >>>>>>>>>> we've > >>>>>>>>>> taken the stance that we should not break compatibility even if > >>>> the > >>>>>> user > >>>>>>>>>> was relying on bugs or unintentional behavior. > >>>>>>>>>> > >>>>>>>>>> Just to clarify I'm not advocating strongly against this > >>>> proposal, > >>>>>> just > >>>>>>>>>> laying > >>>>>>>>>> out some considerations we should take into account. At the end > >>>> of > >>>>>> the > >>>>>>>>> day > >>>>>>>>>> we should do what's right rather than maintain compatibility > >>>> with > >>>>>>>>> existing > >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground. > >>>>>>>>>> > >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler < > >>>> vvcep...@apache.org> > >>>>>>>>> wrote: > >>>>>>>>>>> Thanks Sophie, > >>>>>>>>>>> > >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it, I > >>>> think > >>>>> we > >>>>>>>>> should > >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira. > >>>>>>>>>>> > >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern that some > >>>>>> serdes > >>>>>>>>>>> might be incorrectly written. > >>>>>>>>>>> > >>>>>>>>>>> Were there other practical concerns that you had in mind? > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> John > >>>>>>>>>>> > >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote: > >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes but I > >>>>> think > >>>>>>>>> there > >>>>>>>>>>>> are some practical limitations to implementing this. In > >>>>> particular > >>>>>>>>>>>> I'm thinking about this issue > >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the > >>>>>> built-in > >>>>>>>>>>> signed > >>>>>>>>>>>> number serdes. > >>>>>>>>>>>> > >>>>>>>>>>>> This trick would actually fix the problem for > >>>> negative-negative > >>>>>>>>> queries > >>>>>>>>>>>> (ie where to & from are negative) but would cause undetectable > >>>>>>>>>>>> incorrect results for negative-positive queries. For example, > >>>> say > >>>>>> you > >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short serdes. > >>>>> The > >>>>>>>>>>>> serialized bytes for that are > >>>>>>>>>>>> > >>>>>>>>>>>> from = 1111111111111111 > >>>>>>>>>>>> to = 0000000000000001 > >>>>>>>>>>>> > >>>>>>>>>>>> so we would end up flipping those and iterating over all keys > >>>>> from > >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in > >>>>> lexicographical > >>>>>>>>>>>> order means we would iterate over every key in the space > >>>> *except* > >>>>>> for > >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be > >>>> included > >>>>>> in > >>>>>>>>> the > >>>>>>>>>>>> range query. > >>>>>>>>>>>> > >>>>>>>>>>>> Currently we just log a warning and return an empty iterator > >>>> when > >>>>>>>>>>>> to < from, which is obviously also incorrect but feels > >>>> slightly > >>>>>> more > >>>>>>>>>>>> palatable. If we start automatically converting to reverse > >>>>> queries > >>>>>> we > >>>>>>>>>>>> can't even log a warning in this case unless we wanted to log > >>>> a > >>>>>>>>> warning > >>>>>>>>>>>> every time, which would be weird to do for a valid usage of a > >>>> new > >>>>>>>>>>>> feature. > >>>>>>>>>>>> > >>>>>>>>>>>> All that said, I still like the idea overall. Off the top of > >>>> my > >>>>>> head > >>>>>>>>> I > >>>>>>>>>>> guess > >>>>>>>>>>>> we could add a store config to enable/disable automatic > >>>> reverse > >>>>>>>>>>> iteration, > >>>>>>>>>>>> which is off by default? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition > >>>>>>>>>>>> > >>>>>>>>>>>> Sophie > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler < > >>>>> vvcep...@apache.org> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>> Hi there Jorge, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks for the KIP! > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think this feature sounds very reasonable. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do you > >>>> think > >>>>>>>>>>>>> about avoiding the enum by instead allowing people to flip > >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" > >>>>> would > >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a backward > >>>> one? > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> -John > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote: > >>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I would like to start the discussion for KIP-617: > >>>>>>>>>>>>>> > >>>>>>>>> > >>>>>> > >>>>> > >>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards > >>>>>>>>>>>>>> Looking forward to your feedback. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks! > >>>>>>>>>>>>>> Jorge. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Attachments: > >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc > >>>>>>> > >>>>>>> > >>>>>>> Attachments: > >>>>>>> * 0x5F2C6E22064982DF.asc > >>>>>> > >>>>> > >>>> > >>> > >> > >