If no additional comments, I will proceed to start the a vote thread.

Thanks a lot for your feedback!

On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Sophie. I like the `reverseAll()` idea.
>
> I updated the KIP with your feedback.
>
>
>
> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
>
>> Hm, the case of `all()` does seem to present a dilemma in the case of
>> variable-length keys.
>>
>> In the case of fixed-length keys, you can just compute the keys that
>> correspond
>> to the maximum and minimum serialized bytes, then perform a `range()`
>> query
>> instead of an `all()`. If your keys don't have a well-defined ordering
>> such
>> that
>> you can't determine the MAX_KEY, then you probably don't care about the
>> iterator order anyway.
>>
>>  But with variable-length keys, there is no MAX_KEY. If all your keys were
>> just
>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only way to
>> figure out the
>> maximum key in the store is by using `all()` -- and without a reverse
>> iterator, you're
>> doomed to iterate through every single key just to answer that simple
>> question.
>>
>> That said, I still think determining the iterator order based on the
>> to/from bytes
>> makes a lot of intuitive sense and gives the API a nice symmetry. What if
>> we
>> solved the `all()` problem by just giving `all()` a reverse form to
>> complement it?
>> Ie we would have `all()` and `reverseAll()`, or something to that effect.
>>
>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
>> quilcate.jo...@gmail.com> wrote:
>>
>> > Thanks John.
>> >
>> > Agree. I like the first approach as well, with StreamsConfig flag
>> passing
>> > by via ProcessorContext.
>> >
>> > Another positive effect with "reverse parameters" is that in the case of
>> > `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide _which_ pair
>> to
>> > flip, whether with `ReadDirection` enum it apply to both.
>> >
>> > The only issue I've found while reviewing the KIP is that `all()` won't
>> fit
>> > within this approach.
>> >
>> > We could remove it from the KIP and argue that for WindowStore,
>> > `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse order,
>> and
>> > for KeyValueStore, no ordering guarantees are provided.
>> >
>> > If there is consensus with this changes, I will go and update the KIP.
>> >
>> > On Thu, May 21, 2020 at 3:33 PM John Roesler <vvcep...@apache.org>
>> wrote:
>> >
>> > > Hi Jorge,
>> > >
>> > > Thanks for that idea. I agree, a feature flag would protect anyone
>> > > who may be depending on the current behavior.
>> > >
>> > > It seems better to locate the feature flag in the initialization
>> logic of
>> > > the store, rather than have a method on the "live" store that changes
>> > > its behavior on the fly.
>> > >
>> > > It seems like there are two options here, one is to add a new config:
>> > >
>> > > StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>> > >   "enable.backwards.iteration
>> > >
>> > > Or we can add a feature flag in Materialized, like
>> > >
>> > > Materialized.enableBackwardsIteration()
>> > >
>> > > I think I'd personally lean toward the config, for the following
>> reason.
>> > > The concern that Sophie raised is that someone's program may depend
>> > > on the existing contract of getting an empty iterator. We don't want
>> to
>> > > switch behavior when they aren't expecting it, so we provide them a
>> > > config to assert that they _are_ expecting the new behavior, which
>> > > means they take responsibility for updating their code to expect the
>> new
>> > > behavior.
>> > >
>> > > There doesn't seem to be a reason to offer a choice of behaviors on a
>> > > per-query, or per-store basis. We just want people to be not surprised
>> > > by this change in general.
>> > >
>> > > What do you think?
>> > > Thanks,
>> > > -John
>> > >
>> > > On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>> > > > Thank you both for the great feedback.
>> > > >
>> > > > I like the "fancy" proposal :), and how it removes the need for
>> > > > additional API methods. And with a feature flag on `StateStore`,
>> > > > disabled by default, should no break current users.
>> > > >
>> > > > The only side-effect I can think of is that: by moving the flag
>> > upwards,
>> > > > all later operations become affected; which might be ok for most
>> (all?)
>> > > > cases. I can't think of an scenario where this would be an issue,
>> just
>> > > > want to point this out.
>> > > >
>> > > > If moving to this approach, I'd like to check if I got this right
>> > before
>> > > > updating the KIP:
>> > > >
>> > > > - only `StateStore` will change by having a new method:
>> > > > `backwardIteration()`, `false` by default to keep things compatible.
>> > > > - then all `*Stores` will have to update their implementation based
>> on
>> > > > this flag.
>> > > >
>> > > >
>> > > > On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>> > > > >> There's no possibility that someone could be relying
>> > > > >> on iterating over that range in increasing order, because that's
>> not
>> > > what
>> > > > >> happens. However, they could indeed be relying on getting an
>> empty
>> > > > > iterator
>> > > > >
>> > > > > I just meant that they might be relying on the assumption that the
>> > > range
>> > > > > query
>> > > > > will never return results with decreasing keys. The empty iterator
>> > > wouldn't
>> > > > > break that contract, but of course a surprise reverse iterator
>> would.
>> > > > >
>> > > > > FWIW I actually am in favor of automatically converting to a
>> reverse
>> > > > > iterator,
>> > > > > I just thought we should consider whether this should be off by
>> > > default or
>> > > > > even possible to disable at all.
>> > > > >
>> > > > > On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org
>> >
>> > > wrote:
>> > > > >
>> > > > >> Thanks for the response, Sophie,
>> > > > >>
>> > > > >> I wholeheartedly agree we should take as much into account as
>> > possible
>> > > > >> up front, rather than regretting our decisions later. I actually
>> do
>> > > share
>> > > > >> your vague sense of worry, which was what led me to say initially
>> > > that I
>> > > > >> thought my counterproposal might be "too fancy". Sometimes, it's
>> > > better
>> > > > >> to be explicit instead of "elegant", if we think more people
>> will be
>> > > > >> confused
>> > > > >> than not.
>> > > > >>
>> > > > >> I really don't think that there's any danger of "relying on a
>> bug"
>> > > here,
>> > > > >> although
>> > > > >> people certainly could be relying on current behavior. One thing
>> to
>> > be
>> > > > >> clear
>> > > > >> about (which I just left a more detailed comment in KAFKA-8159
>> > about)
>> > > is
>> > > > >> that
>> > > > >> when we say something like key1 > key2, this ordering is defined
>> by
>> > > the
>> > > > >> serde's output and nothing else.
>> > > > >>
>> > > > >> Currently, thanks to your fix in
>> > > https://github.com/apache/kafka/pull/6521
>> > > > >> ,
>> > > > >> the store contract is that for range scans, if from > to, then
>> the
>> > > store
>> > > > >> must
>> > > > >> return an empty iterator. There's no possibility that someone
>> could
>> > be
>> > > > >> relying
>> > > > >> on iterating over that range in increasing order, because that's
>> not
>> > > what
>> > > > >> happens. However, they could indeed be relying on getting an
>> empty
>> > > > >> iterator.
>> > > > >>
>> > > > >> My counterproposal was to actually change this contract to say
>> that
>> > > the
>> > > > >> store
>> > > > >> must return an iterator over the keys in that range, but in the
>> > > reverse
>> > > > >> order.
>> > > > >> So, in addition to considering whether this idea is "too fancy"
>> (aka
>> > > > >> confusing),
>> > > > >> we should also consider the likelihood of breaking an existing
>> > > program with
>> > > > >> this behavior/contract change.
>> > > > >>
>> > > > >> To echo your clarification, I'm also not advocating strongly in
>> > favor
>> > > of my
>> > > > >> proposal. I just wanted to present it for consideration alongside
>> > > Jorge's
>> > > > >> original one.
>> > > > >>
>> > > > >> Thanks for raising these very good points,
>> > > > >> -John
>> > > > >>
>> > > > >> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
>> > > > >>>> Rather than working around it, I think we should just fix it
>> > > > >>> Now *that's* a "fancy" idea :P
>> > > > >>>
>> > > > >>> That was my primary concern, although I do have a vague sense of
>> > > worry
>> > > > >>> that we might be allowing users to get into trouble without
>> > > realizing it.
>> > > > >>> For example if their custom serdes suffer a similar bug as the
>> > above,
>> > > > >>> and/or
>> > > > >>> they rely on getting results in increasing order (of the keys)
>> even
>> > > when
>> > > > >>> to < from. Maybe they're relying on the fact that the range
>> query
>> > > returns
>> > > > >>> nothing in that case.
>> > > > >>>
>> > > > >>> Not sure if that qualifies as relying on a bug or not, but in
>> that
>> > > past
>> > > > >>> we've
>> > > > >>> taken the stance that we should not break compatibility even if
>> the
>> > > user
>> > > > >>> was relying on bugs or unintentional behavior.
>> > > > >>>
>> > > > >>> Just to clarify I'm not advocating strongly against this
>> proposal,
>> > > just
>> > > > >>> laying
>> > > > >>> out some considerations we should take into account. At the end
>> of
>> > > the
>> > > > >> day
>> > > > >>> we should do what's right rather than maintain compatibility
>> with
>> > > > >> existing
>> > > > >>> bugs, but sometimes there's a reasonable middle ground.
>> > > > >>>
>> > > > >>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>> vvcep...@apache.org>
>> > > > >> wrote:
>> > > > >>>> Thanks Sophie,
>> > > > >>>>
>> > > > >>>> Woah, that’s a nasty bug. Rather than working around it, I
>> think
>> > we
>> > > > >> should
>> > > > >>>> just fix it. I’ll leave some comments on the Jira.
>> > > > >>>>
>> > > > >>>> It doesn’t seem like it should be this KIP’s concern that some
>> > > serdes
>> > > > >>>> might be incorrectly written.
>> > > > >>>>
>> > > > >>>> Were there other practical concerns that you had in mind?
>> > > > >>>>
>> > > > >>>> Thanks,
>> > > > >>>> John
>> > > > >>>>
>> > > > >>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote:
>> > > > >>>>> I like this "fancy idea" to just flip the to/from bytes but I
>> > think
>> > > > >> there
>> > > > >>>>> are some practical limitations to implementing this. In
>> > particular
>> > > > >>>>> I'm thinking about this issue
>> > > > >>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the
>> > > built-in
>> > > > >>>> signed
>> > > > >>>>> number serdes.
>> > > > >>>>>
>> > > > >>>>> This trick would actually fix the problem for
>> negative-negative
>> > > > >> queries
>> > > > >>>>> (ie where to & from are negative) but would cause undetectable
>> > > > >>>>> incorrect results for negative-positive queries. For example,
>> say
>> > > you
>> > > > >>>>> call #range with from = -1 and to = 1, using the Short serdes.
>> > The
>> > > > >>>>> serialized bytes for that are
>> > > > >>>>>
>> > > > >>>>> from = 1111111111111111
>> > > > >>>>> to = 0000000000000001
>> > > > >>>>>
>> > > > >>>>> so we would end up flipping those and iterating over all keys
>> > from
>> > > > >>>>> 0000000000000001 to 1111111111111111. Iterating in
>> > lexicographical
>> > > > >>>>> order means we would iterate over every key in the space
>> *except*
>> > > for
>> > > > >>>>> 0, but 0 is actually the *only* other key we meant to be
>> included
>> > > in
>> > > > >> the
>> > > > >>>>> range query.
>> > > > >>>>>
>> > > > >>>>> Currently we just log a warning and return an empty iterator
>> when
>> > > > >>>>> to < from, which is obviously also incorrect but feels
>> slightly
>> > > more
>> > > > >>>>> palatable. If we start automatically converting to reverse
>> > queries
>> > > we
>> > > > >>>>> can't even log a warning in this case unless we wanted to log
>> a
>> > > > >> warning
>> > > > >>>>> every time, which would be weird to do for a valid usage of a
>> new
>> > > > >>>>> feature.
>> > > > >>>>>
>> > > > >>>>> All that said, I still like the idea overall. Off the top of
>> my
>> > > head
>> > > > >> I
>> > > > >>>> guess
>> > > > >>>>> we could add a store config to enable/disable automatic
>> reverse
>> > > > >>>> iteration,
>> > > > >>>>> which is off by default?
>> > > > >>>>>
>> > > > >>>>> Thanks for the KIP! This will be a nice addition
>> > > > >>>>>
>> > > > >>>>> Sophie
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>> > vvcep...@apache.org>
>> > > > >>>> wrote:
>> > > > >>>>>> Hi there Jorge,
>> > > > >>>>>>
>> > > > >>>>>> Thanks for the KIP!
>> > > > >>>>>>
>> > > > >>>>>> I think this feature sounds very reasonable.
>> > > > >>>>>>
>> > > > >>>>>> I'm not 100% sure if this is "too fancy", but what do you
>> think
>> > > > >>>>>> about avoiding the enum by instead allowing people to flip
>> > > > >>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z"
>> > would
>> > > > >>>>>> be a forward scan, and from "Z" to "A" would be a backward
>> one?
>> > > > >>>>>>
>> > > > >>>>>> Thanks,
>> > > > >>>>>> -John
>> > > > >>>>>>
>> > > > >>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
>> > > > >>>>>>> Hi everyone,
>> > > > >>>>>>>
>> > > > >>>>>>> I would like to start the discussion for KIP-617:
>> > > > >>>>>>>
>> > > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>> > > > >>>>>>> Looking forward to your feedback.
>> > > > >>>>>>>
>> > > > >>>>>>> Thanks!
>> > > > >>>>>>> Jorge.
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> Attachments:
>> > > > >>>>>>> * 0x5F2C6E22064982DF.asc
>> > > >
>> > > >
>> > > > Attachments:
>> > > > * 0x5F2C6E22064982DF.asc
>> > >
>> >
>>
>

Reply via email to