Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Jorge Esteban Quilcate Otoya Wed, 01 Jul 2020 15:02:08 -0700

Quick update: KIP is updated with latest changes now.
Will leave it open this week while working on the PR.


Hope to open a new vote thread over the next few days if no additional
feedback is provided.

Cheers,
Jorge.

On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
[email protected]> wrote:

> Thanks, John!
>
> Make sense to reconsider the current approach. I was heading in a similar
> direction while drafting the implementation. Metered, Caching, and other
> layers will also have to get duplicated to build up new methods in `Stores`
> factory, and class casting issues would appear on stores created by DSL.
>
> I will draft a proposal with new methods (move methods from proposed
> interfaces to existing ones) with default implementation in a KIP update
> and wait for Matthias to chime in to validate this approach.
>
> Jorge.
>
>
> On Sat, Jun 27, 2020 at 4:01 PM John Roesler <[email protected]> wrote:
>
>> Hi Jorge,
>>
>> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1 releases.
>>
>> The idea to separate the new methods into "mixin" interfaces seems
>> like a good one, but as we've discovered in KIP-614, it doesn't work
>> out that way in practice. The problem is that the store implementations
>> are just the base layer that get composed with other layers in Streams
>> before they can be accessed in the DSL. This is extremely subtle, so
>> I'm going to put everyone to sleep with a detailed explanation:
>>
>> For example, this is the mechanism by which all KeyValueStore
>> implementations get added to Streams:
>> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
>> return new MeteredKeyValueStore<>(
>>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
>>   storeSupplier.metricsScope(),
>>   time,
>>   keySerde,
>>   valueSerde
>> );
>>
>> In the DSL, the store that a processor gets from the context would be
>> the result of this composition. So even if the storeSupplier.get() returns
>> a store that implements the "reverse" interface, when you try to use it
>> from a processor like:
>> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
>> ReadOnlyBackwardWindowStore<K, V> store =
>>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
>>
>> You'd just get a ClassCastException because it's actually a
>> MeteredKeyValueStore, which doesn't implement
>> ReadOnlyBackwardWindowStore.
>>
>> The only way to make this work would be to make the Metered,
>> Caching, and Logging layers also implement the new interfaces,
>> but this effectively forces implementations to also implement
>> the interface. Otherwise, the intermediate layers would have to
>> cast the store in each method, like this:
>> MeteredWindowStore#backwardFetch {
>>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
>> }
>>
>> And then if the implementation doesn't "opt in" by implementing
>> the interface, you'd get a ClassCastException, not when you get the
>> store, but when you try to use it.
>>
>> The fact that we get ClassCastExceptions no matter which way we
>> turn here indicates that we're really not getting any benefit from the
>> type system, which makes the extra interfaces seem not worth all the
>> code involved.
>>
>> Where we landed in KIP-614 is that, unless we want to completely
>> revamp the way that StateStores work in the DSL, you might as
>> well just add the new methods to the existing interfaces. To prevent
>> compilation errors, we can add default implementations that throw
>> UnsupportedOperationException. If a store doesn't opt in by
>> implementing the methods, you'd get an UnsupportedOperationException,
>> which seems no worse, and maybe better, than the ClassCastException
>> you'd get if we go with the "mixin interface" approach.
>>
>> A quick note: This entire discussion focuses on the DSL. If you're just
>> using the Processor API by directly adding the a custom store to the
>> Topology:
>> org.apache.kafka.streams.Topology#addStateStore
>> and then retrieving it in the processor via:
>> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
>> in
>> org.apache.kafka.streams.processor.Processor#init
>>
>> Then, you can both register and retrieve _any_ StateStore implementation.
>> There's no need to use KeyValueStore or any other built-in interface.
>> In other words, KeyValueStore and company are only part of the DSL,
>> not the PAPI. So, discussions about the build-in store interfaces are only
>> really relevant in the context of the DSL, Transformers, and Materialized.
>>
>> So, in conclusion, I'd really recommend just adding any new methods to
>> the existing store interfaces. We might be able to revamp the API in the
>> future to support mixins, but it's a much larger scope of work than this
>> KIP.
>> A more minor comment is that we don't need to add Deprecated variants
>> of new methods.
>>
>> Thanks again, and once again, I'm sorry I tuned out and didn't offer this
>> feedback before you revised the KIP.
>> -John
>>
>>
>>
>>
>> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
>> > Hi everyone,
>> >
>> > I've updated the KIP, applying Matthias' feedback regarding interface
>> > hierarchy.
>> >
>> > Also, following the last email, I think we can consider reverse
>> operations
>> > on KeyValue range as well, as implementation supports lexicographic
>> order.
>> >
>> > I considered different naming between Key-based ranges and Time-based
>> > ranges, and mitigate confusion when fetching keys and time ranges as
>> > WindowStore does:
>> >
>> > Key-based ranges: reverseRange(), reverseAll()
>> > Time-based ranges: backwardFetch()
>> >
>> > Then, key-based changes apply to KeyValueStore, and time-based changes
>> to
>> > Window and Session stores.
>> >
>> > Let me know if you have any questions.
>> >
>> > Thanks,
>> > Jorge.
>> >
>> >
>> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
>> > [email protected]> wrote:
>> >
>> > > Hi everyone, sorry for the late reply.
>> > >
>> > > Thanks Matthias for your feedback. I think it makes sense to
>> reconsider
>> > > the current design based on your input.
>> > >
>> > > After digging deeper into the current implementation, I'd like to
>> bring my
>> > > current understanding to be double-checked as it might be redefining
>> the
>> > > KIP's scope:
>> > >
>> > > 1. There are 2 ranges been exposed by different stores:
>> > >
>> > > a. Key Range
>> > > b. Timestamp Range
>> > >
>> > > So far, we have discussed covering both.
>> > >
>> > > 2. Key Range functions do not provide ordering guarantees by design:
>> > >
>> > > ```ReadOnlyKeyValueStore.java
>> > >     /**
>> > >      * Get an iterator over a given range of keys. This iterator must
>> be
>> > > closed after use.
>> > >      * The returned iterator must be safe from {@link
>> > > java.util.ConcurrentModificationException}s
>> > >      * and must not return null values. No ordering guarantees are
>> > > provided.
>> > >      * ...
>> > >      */
>> > >      KeyValueIterator<K, V> range(K from, K to);
>> > > ```
>> > >
>> > > Therefore, I'd propose removing Key range operations from the scope.
>> > >
>> > > 3. Timestamp Range operations happen at the SegmentsStore level
>> (internal)
>> > > API
>> > >
>> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
>> > >
>> > > I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)`
>> > > with a flag for backwards operations.
>> > >
>> > > As segments returned will be processed backwards, I'm not extending
>> > > KeyValueStores to query each segment backwards as previous point 2.
>> > >
>> > > 4. Extend WindowStores implementations with a new
>> > > WindowBackwardStore/ReadOnlyBackwardStore:
>> > >
>> > > ```java
>> > > public interface ReadOnlyWindowBackwardStore<K, V> {
>> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant
>> to)
>> > > throws IllegalArgumentException;
>> > >
>> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
>> Instant
>> > > fromTime, Instant toTime)
>> > >         throws IllegalArgumentException;
>> > >
>> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from,
>> > > Instant to) throws IllegalArgumentException;
>> > > ```
>> > >
>> > > 5. SessionStore is a bit different as it has fetch/find sessions
>> spread
>> > > between SessionStore and ReadOnlySessionStore.
>> > >
>> > > I'd propose a new interface `SessionBackwardStore` to expose backward
>> find
>> > > operations:
>> > >
>> > > ```java
>> > > public interface SessionBackwardStore<K, AGG> {
>> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
>> key,
>> > > final long earliestSessionEndTime, final long latestSessionStartTime);
>> > >
>> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
>> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final long
>> > > latestSessionStartTime);
>> > > }
>> > > ```
>> > >
>> > > If this understanding is correct I'll proceed to update the KIP based
>> on
>> > > this.
>> > >
>> > > Looking forward to your feedback.
>> > >
>> > > Thanks,
>> > > Jorge.
>> > >
>> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <[email protected]>
>> wrote:
>> > >
>> > >> Hey,
>> > >>
>> > >> Sorry that I am late to the game. I am not 100% convinced about the
>> > >> current proposal. Using a new config as feature flag seems to be
>> rather
>> > >> "nasty" to me, and flipping from/to is a little bit too fancy for my
>> > >> personal taste.
>> > >>
>> > >> I agree, that the original proposal using a "ReadDirection" enum is
>> not
>> > >> ideal either.
>> > >>
>> > >> Thus, I would like to put out a new idea: We could add a new
>> interface
>> > >> that offers new methods that return revers iterators.
>> > >>
>> > >> The KIP already proposes to add `reverseAll()` and it seems backward
>> > >> incompatible to just add this method to `ReadOnlyKeyValueStore` and
>> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
>> default
>> > >> implementation for custom stores and thus either break compatibility
>> or
>> > >> need add a default that just throws an exception. Neither seems to
>> be a
>> > >> good option.
>> > >>
>> > >> Using a new interface avoid this issue and allows users implementing
>> > >> custom stores to opt-in by adding the interface to their stores.
>> > >> Furthermore, we don't need any config. In the end, we encapsulte the
>> > >> change into the store, and our runtime is agnostic to it (as it
>> should
>> > >> be).
>> > >>
>> > >> The hierarchy becomes a little complex (but uses would not really see
>> > >> the complexity):
>> > >>
>> > >> // exsiting
>> > >> ReadOnlyKeyValueStore
>> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
>> > >>
>> > >>
>> > >> // helper interface; users don't care
>> > >> // need similar ones for other stores
>> > >> ReverseReadOnlyKeyValueStore {
>> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
>> > >>     KeyValueIterator<K, V> reverseAll();
>> > >> }
>> > >>
>> > >>
>> > >> // two new user facing interfaces for kv-store
>> > >> // need similar ones for other stores
>> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
>> ReadOnlyKeyValueStore,
>> > >> ReverseReadOnlyKeyValueStore
>> > >>
>> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
>> > >> ReverseReadOnlyKeyValueStore
>> > >>
>> > >>
>> > >> // updated (also internal)
>> > >> // also need to update other built-in stores
>> > >> RocksDB implements KeyValueStoreWithReverseIterators,
>> BulkLoadingStore
>> > >>
>> > >>
>> > >> In the end, user would only care about the two (for kv-case) new
>> > >> interface that offer revers iterator (read only and regular) and can
>> > >> cast stores accordingly in their Processors/Transformers or via IQ.
>> > >>
>> > >>
>> > >> Btw: if we add revers iterator for KeyValue and Window store, should
>> we
>> > >> do the same for Session store?
>> > >>
>> > >>
>> > >>
>> > >> This might be more code to write, but I believe it provides the
>> better
>> > >> user experience. Thoughts?
>> > >>
>> > >>
>> > >>
>> > >> -Matthias
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On 5/26/20 8:47 PM, John Roesler wrote:
>> > >> > Sorry for my silence, Jorge,
>> > >> >
>> > >> > I've just taken another look, and I'm personally happy with the
>> KIP.
>> > >> >
>> > >> > Thanks,
>> > >> > -John
>> > >> >
>> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote:
>> > >> >> If no additional comments, I will proceed to start the a vote
>> thread.
>> > >> >>
>> > >> >> Thanks a lot for your feedback!
>> > >> >>
>> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
>> > >> >> [email protected]> wrote:
>> > >> >>
>> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
>> > >> >>>
>> > >> >>> I updated the KIP with your feedback.
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
>> > >> [email protected]>
>> > >> >>> wrote:
>> > >> >>>
>> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in the
>> case of
>> > >> >>>> variable-length keys.
>> > >> >>>>
>> > >> >>>> In the case of fixed-length keys, you can just compute the keys
>> that
>> > >> >>>> correspond
>> > >> >>>> to the maximum and minimum serialized bytes, then perform a
>> `range()`
>> > >> >>>> query
>> > >> >>>> instead of an `all()`. If your keys don't have a well-defined
>> > >> ordering
>> > >> >>>> such
>> > >> >>>> that
>> > >> >>>> you can't determine the MAX_KEY, then you probably don't care
>> about
>> > >> the
>> > >> >>>> iterator order anyway.
>> > >> >>>>
>> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If all your
>> > >> keys were
>> > >> >>>> just
>> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only
>> way
>> > >> to
>> > >> >>>> figure out the
>> > >> >>>> maximum key in the store is by using `all()` -- and without a
>> reverse
>> > >> >>>> iterator, you're
>> > >> >>>> doomed to iterate through every single key just to answer that
>> simple
>> > >> >>>> question.
>> > >> >>>>
>> > >> >>>> That said, I still think determining the iterator order based
>> on the
>> > >> >>>> to/from bytes
>> > >> >>>> makes a lot of intuitive sense and gives the API a nice
>> symmetry.
>> > >> What if
>> > >> >>>> we
>> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
>> form to
>> > >> >>>> complement it?
>> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to
>> that
>> > >> effect.
>> > >> >>>>
>> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
>> > >> >>>> [email protected]> wrote:
>> > >> >>>>
>> > >> >>>>> Thanks John.
>> > >> >>>>>
>> > >> >>>>> Agree. I like the first approach as well, with StreamsConfig
>> flag
>> > >> >>>> passing
>> > >> >>>>> by via ProcessorContext.
>> > >> >>>>>
>> > >> >>>>> Another positive effect with "reverse parameters" is that in
>> the
>> > >> case of
>> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
>> _which_
>> > >> pair
>> > >> >>>> to
>> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
>> > >> >>>>>
>> > >> >>>>> The only issue I've found while reviewing the KIP is that
>> `all()`
>> > >> won't
>> > >> >>>> fit
>> > >> >>>>> within this approach.
>> > >> >>>>>
>> > >> >>>>> We could remove it from the KIP and argue that for WindowStore,
>> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse
>> > >> order,
>> > >> >>>> and
>> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
>> > >> >>>>>
>> > >> >>>>> If there is consensus with this changes, I will go and update
>> the
>> > >> KIP.
>> > >> >>>>>
>> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
>> [email protected]>
>> > >> >>>> wrote:
>> > >> >>>>>
>> > >> >>>>>> Hi Jorge,
>> > >> >>>>>>
>> > >> >>>>>> Thanks for that idea. I agree, a feature flag would protect
>> anyone
>> > >> >>>>>> who may be depending on the current behavior.
>> > >> >>>>>>
>> > >> >>>>>> It seems better to locate the feature flag in the
>> initialization
>> > >> >>>> logic of
>> > >> >>>>>> the store, rather than have a method on the "live" store that
>> > >> changes
>> > >> >>>>>> its behavior on the fly.
>> > >> >>>>>>
>> > >> >>>>>> It seems like there are two options here, one is to add a new
>> > >> config:
>> > >> >>>>>>
>> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>> > >> >>>>>>   "enable.backwards.iteration
>> > >> >>>>>>
>> > >> >>>>>> Or we can add a feature flag in Materialized, like
>> > >> >>>>>>
>> > >> >>>>>> Materialized.enableBackwardsIteration()
>> > >> >>>>>>
>> > >> >>>>>> I think I'd personally lean toward the config, for the
>> following
>> > >> >>>> reason.
>> > >> >>>>>> The concern that Sophie raised is that someone's program may
>> depend
>> > >> >>>>>> on the existing contract of getting an empty iterator. We
>> don't
>> > >> want
>> > >> >>>> to
>> > >> >>>>>> switch behavior when they aren't expecting it, so we provide
>> them a
>> > >> >>>>>> config to assert that they _are_ expecting the new behavior,
>> which
>> > >> >>>>>> means they take responsibility for updating their code to
>> expect
>> > >> the
>> > >> >>>> new
>> > >> >>>>>> behavior.
>> > >> >>>>>>
>> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
>> behaviors
>> > >> on a
>> > >> >>>>>> per-query, or per-store basis. We just want people to be not
>> > >> surprised
>> > >> >>>>>> by this change in general.
>> > >> >>>>>>
>> > >> >>>>>> What do you think?
>> > >> >>>>>> Thanks,
>> > >> >>>>>> -John
>> > >> >>>>>>
>> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>> > >> >>>>>>> Thank you both for the great feedback.
>> > >> >>>>>>>
>> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the need
>> for
>> > >> >>>>>>> additional API methods. And with a feature flag on
>> `StateStore`,
>> > >> >>>>>>> disabled by default, should no break current users.
>> > >> >>>>>>>
>> > >> >>>>>>> The only side-effect I can think of is that: by moving the
>> flag
>> > >> >>>>> upwards,
>> > >> >>>>>>> all later operations become affected; which might be ok for
>> most
>> > >> >>>> (all?)
>> > >> >>>>>>> cases. I can't think of an scenario where this would be an
>> issue,
>> > >> >>>> just
>> > >> >>>>>>> want to point this out.
>> > >> >>>>>>>
>> > >> >>>>>>> If moving to this approach, I'd like to check if I got this
>> right
>> > >> >>>>> before
>> > >> >>>>>>> updating the KIP:
>> > >> >>>>>>>
>> > >> >>>>>>> - only `StateStore` will change by having a new method:
>> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
>> > >> compatible.
>> > >> >>>>>>> - then all `*Stores` will have to update their implementation
>> > >> based
>> > >> >>>> on
>> > >> >>>>>>> this flag.
>> > >> >>>>>>>
>> > >> >>>>>>>
>> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>> > >> >>>>>>>>> There's no possibility that someone could be relying
>> > >> >>>>>>>>> on iterating over that range in increasing order, because
>> that's
>> > >> >>>> not
>> > >> >>>>>> what
>> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
>> an
>> > >> >>>> empty
>> > >> >>>>>>>> iterator
>> > >> >>>>>>>>
>> > >> >>>>>>>> I just meant that they might be relying on the assumption
>> that
>> > >> the
>> > >> >>>>>> range
>> > >> >>>>>>>> query
>> > >> >>>>>>>> will never return results with decreasing keys. The empty
>> > >> iterator
>> > >> >>>>>> wouldn't
>> > >> >>>>>>>> break that contract, but of course a surprise reverse
>> iterator
>> > >> >>>> would.
>> > >> >>>>>>>>
>> > >> >>>>>>>> FWIW I actually am in favor of automatically converting to a
>> > >> >>>> reverse
>> > >> >>>>>>>> iterator,
>> > >> >>>>>>>> I just thought we should consider whether this should be
>> off by
>> > >> >>>>>> default or
>> > >> >>>>>>>> even possible to disable at all.
>> > >> >>>>>>>>
>> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
>> > >> [email protected]
>> > >> >>>>>
>> > >> >>>>>> wrote:
>> > >> >>>>>>>>
>> > >> >>>>>>>>> Thanks for the response, Sophie,
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> I wholeheartedly agree we should take as much into account
>> as
>> > >> >>>>> possible
>> > >> >>>>>>>>> up front, rather than regretting our decisions later. I
>> actually
>> > >> >>>> do
>> > >> >>>>>> share
>> > >> >>>>>>>>> your vague sense of worry, which was what led me to say
>> > >> initially
>> > >> >>>>>> that I
>> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
>> Sometimes, it's
>> > >> >>>>>> better
>> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
>> people
>> > >> >>>> will be
>> > >> >>>>>>>>> confused
>> > >> >>>>>>>>> than not.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> I really don't think that there's any danger of "relying
>> on a
>> > >> >>>> bug"
>> > >> >>>>>> here,
>> > >> >>>>>>>>> although
>> > >> >>>>>>>>> people certainly could be relying on current behavior. One
>> thing
>> > >> >>>> to
>> > >> >>>>> be
>> > >> >>>>>>>>> clear
>> > >> >>>>>>>>> about (which I just left a more detailed comment in
>> KAFKA-8159
>> > >> >>>>> about)
>> > >> >>>>>> is
>> > >> >>>>>>>>> that
>> > >> >>>>>>>>> when we say something like key1 > key2, this ordering is
>> defined
>> > >> >>>> by
>> > >> >>>>>> the
>> > >> >>>>>>>>> serde's output and nothing else.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Currently, thanks to your fix in
>> > >> >>>>>> https://github.com/apache/kafka/pull/6521
>> > >> >>>>>>>>> ,
>> > >> >>>>>>>>> the store contract is that for range scans, if from > to,
>> then
>> > >> >>>> the
>> > >> >>>>>> store
>> > >> >>>>>>>>> must
>> > >> >>>>>>>>> return an empty iterator. There's no possibility that
>> someone
>> > >> >>>> could
>> > >> >>>>> be
>> > >> >>>>>>>>> relying
>> > >> >>>>>>>>> on iterating over that range in increasing order, because
>> that's
>> > >> >>>> not
>> > >> >>>>>> what
>> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
>> an
>> > >> >>>> empty
>> > >> >>>>>>>>> iterator.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> My counterproposal was to actually change this contract to
>> say
>> > >> >>>> that
>> > >> >>>>>> the
>> > >> >>>>>>>>> store
>> > >> >>>>>>>>> must return an iterator over the keys in that range, but
>> in the
>> > >> >>>>>> reverse
>> > >> >>>>>>>>> order.
>> > >> >>>>>>>>> So, in addition to considering whether this idea is "too
>> fancy"
>> > >> >>>> (aka
>> > >> >>>>>>>>> confusing),
>> > >> >>>>>>>>> we should also consider the likelihood of breaking an
>> existing
>> > >> >>>>>> program with
>> > >> >>>>>>>>> this behavior/contract change.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
>> strongly in
>> > >> >>>>> favor
>> > >> >>>>>> of my
>> > >> >>>>>>>>> proposal. I just wanted to present it for consideration
>> > >> alongside
>> > >> >>>>>> Jorge's
>> > >> >>>>>>>>> original one.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Thanks for raising these very good points,
>> > >> >>>>>>>>> -John
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
>> > >> >>>>>>>>>>> Rather than working around it, I think we should just
>> fix it
>> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> That was my primary concern, although I do have a vague
>> sense
>> > >> of
>> > >> >>>>>> worry
>> > >> >>>>>>>>>> that we might be allowing users to get into trouble
>> without
>> > >> >>>>>> realizing it.
>> > >> >>>>>>>>>> For example if their custom serdes suffer a similar bug
>> as the
>> > >> >>>>> above,
>> > >> >>>>>>>>>> and/or
>> > >> >>>>>>>>>> they rely on getting results in increasing order (of the
>> keys)
>> > >> >>>> even
>> > >> >>>>>> when
>> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
>> range
>> > >> >>>> query
>> > >> >>>>>> returns
>> > >> >>>>>>>>>> nothing in that case.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not,
>> but in
>> > >> >>>> that
>> > >> >>>>>> past
>> > >> >>>>>>>>>> we've
>> > >> >>>>>>>>>> taken the stance that we should not break compatibility
>> even if
>> > >> >>>> the
>> > >> >>>>>> user
>> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against this
>> > >> >>>> proposal,
>> > >> >>>>>> just
>> > >> >>>>>>>>>> laying
>> > >> >>>>>>>>>> out some considerations we should take into account. At
>> the end
>> > >> >>>> of
>> > >> >>>>>> the
>> > >> >>>>>>>>> day
>> > >> >>>>>>>>>> we should do what's right rather than maintain
>> compatibility
>> > >> >>>> with
>> > >> >>>>>>>>> existing
>> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>> > >> >>>> [email protected]>
>> > >> >>>>>>>>> wrote:
>> > >> >>>>>>>>>>> Thanks Sophie,
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it,
>> I
>> > >> >>>> think
>> > >> >>>>> we
>> > >> >>>>>>>>> should
>> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
>> that some
>> > >> >>>>>> serdes
>> > >> >>>>>>>>>>> might be incorrectly written.
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Were there other practical concerns that you had in mind?
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Thanks,
>> > >> >>>>>>>>>>> John
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
>> wrote:
>> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes
>> but I
>> > >> >>>>> think
>> > >> >>>>>>>>> there
>> > >> >>>>>>>>>>>> are some practical limitations to implementing this. In
>> > >> >>>>> particular
>> > >> >>>>>>>>>>>> I'm thinking about this issue
>> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
>> with the
>> > >> >>>>>> built-in
>> > >> >>>>>>>>>>> signed
>> > >> >>>>>>>>>>>> number serdes.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> This trick would actually fix the problem for
>> > >> >>>> negative-negative
>> > >> >>>>>>>>> queries
>> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
>> > >> undetectable
>> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For
>> example,
>> > >> >>>> say
>> > >> >>>>>> you
>> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short
>> > >> serdes.
>> > >> >>>>> The
>> > >> >>>>>>>>>>>> serialized bytes for that are
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> from = 1111111111111111
>> > >> >>>>>>>>>>>> to = 0000000000000001
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over
>> all keys
>> > >> >>>>> from
>> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
>> > >> >>>>> lexicographical
>> > >> >>>>>>>>>>>> order means we would iterate over every key in the space
>> > >> >>>> *except*
>> > >> >>>>>> for
>> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be
>> > >> >>>> included
>> > >> >>>>>> in
>> > >> >>>>>>>>> the
>> > >> >>>>>>>>>>>> range query.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
>> iterator
>> > >> >>>> when
>> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but feels
>> > >> >>>> slightly
>> > >> >>>>>> more
>> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
>> reverse
>> > >> >>>>> queries
>> > >> >>>>>> we
>> > >> >>>>>>>>>>>> can't even log a warning in this case unless we wanted
>> to log
>> > >> >>>> a
>> > >> >>>>>>>>> warning
>> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
>> usage of a
>> > >> >>>> new
>> > >> >>>>>>>>>>>> feature.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off the
>> top of
>> > >> >>>> my
>> > >> >>>>>> head
>> > >> >>>>>>>>> I
>> > >> >>>>>>>>>>> guess
>> > >> >>>>>>>>>>>> we could add a store config to enable/disable automatic
>> > >> >>>> reverse
>> > >> >>>>>>>>>>> iteration,
>> > >> >>>>>>>>>>>> which is off by default?
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Sophie
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>> > >> >>>>> [email protected]>
>> > >> >>>>>>>>>>> wrote:
>> > >> >>>>>>>>>>>>> Hi there Jorge,
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> Thanks for the KIP!
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do
>> you
>> > >> >>>> think
>> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people to
>> flip
>> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A"
>> to "Z"
>> > >> >>>>> would
>> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
>> backward
>> > >> >>>> one?
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> Thanks,
>> > >> >>>>>>>>>>>>> -John
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
>> > >> >>>>>>>>>>>>>> Hi everyone,
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>
>> > >> >>>>>>
>> > >> >>>>>
>> > >> >>>>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> Thanks!
>> > >> >>>>>>>>>>>>>> Jorge.
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> Attachments:
>> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> >>>>>>>
>> > >> >>>>>>>
>> > >> >>>>>>> Attachments:
>> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> >>>>>>
>> > >> >>>>>
>> > >> >>>>
>> > >> >>>
>> > >> >>
>> > >>
>> > >>
>> >
>>
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Reply via email to