Hi John,

Thank you! Question on the comment, where should I add the default
implementation? I guess that needs to be added in the Proposal Section of
the kIP.

Thanks!
Sagar.

On Sat, Jul 4, 2020 at 11:46 PM John Roesler <vvcep...@apache.org> wrote:

> Thanks Sagar,
>
> That looks good to me! The only minor comment I’d make is that I think the
> method declaration should have a default implementation that throws an
> UnsupportedOperationException, for source compatibility with existing state
> stores.
>
> Otherwise, as far as I’m concerned, I’m ready to vote.
>
> Thanks,
> John
>
> On Sat, Jul 4, 2020, at 12:19, Sagar wrote:
> > Hi John,
> >
> > I have updated the KIP with all the new changes we discussed in this
> > discussion thread.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-614%3A+Add+Prefix+Scan+support+for+State+Stores
> >
> > Request you to go through the same.
> >
> > Thanks!
> > Sagar.
> >
> > On Tue, Jun 30, 2020 at 8:09 AM John Roesler <vvcep...@apache.org>
> wrote:
> >
> > > Hi Sagar,
> > >
> > > That’s a good observation; yes, it should go in the
> ReadOnlyKeyValueStore
> > > interface.
> > >
> > > Thanks again for the great work,
> > > John
> > >
> > > On Sun, Jun 28, 2020, at 23:54, Sagar wrote:
> > > > Hi John,
> > > >
> > > > Thank you for the positive feedback! The meaningful discussions we
> had on
> > > > the mailing list helped me understand what needed to be done.
> > > >
> > > > I am definitely open to any further suggestions on this.
> > > >
> > > > Before I updated the KIP, I just had one question, is it fine to
> have it
> > > > for KeyValueStore or should I move it to ReadOnlyKeyValueStore where
> even
> > > > the range query resides?
> > > >
> > > > Regarding the 2 notes on UnsupportedOperationException and changing
> the
> > > > name to prefixScan, i will incorporate both of them into the KIP.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Sun, Jun 28, 2020 at 11:55 PM John Roesler <vvcep...@apache.org>
> > > wrote:
> > > >
> > > > > Woah, this is great, Sagar!
> > > > >
> > > > > I think this API looks really good. I'm curious if anyone else has
> > > > > any concern. For my part, I think this will work just fine. People
> > > > > might face tricky bugs getting their key serde and their prefix
> > > > > serde "aligned", but I think the API makes it pretty obvious what
> > > > > has to happen to make this work. As long as the API isn't going
> > > > > to "trick" anyone by trying to abstract away things that can't be
> > > > > abstracted, this is the best we can do. In other words, I think
> > > > > your approach is ideal here.
> > > > >
> > > > > I also really appreciate that you took the time to do a full POC
> > > > > with end-to-end tests to show that the proposal is actually
> > > > > going to work.
> > > > >
> > > > > A couple of notes as you update the KIP:
> > > > >
> > > > > 1. I think that for "optional" state store features like this, we
> > > > > should add a default implementation to the interface that
> > > > > throws UnsupportedOperationException. That way,
> > > > > any existing store implementations won't fail to compile
> > > > > on the new version. And any store that just can't support
> > > > > a prefix scan would simply not override the method.
> > > > >
> > > > > 2. I think you meant "prefixScan", not "prefixSeek", since
> > > > > we're actually getting an iterator that only returns prefix-
> > > > > matching keys, as opposed to just seeking to that prefix.
> > > > >
> > > > > Thanks again for the work you've put into this. I look
> > > > > forward to reviewing the updated KIP.
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > >
> > > > > On Sun, Jun 28, 2020, at 12:17, Sagar wrote:
> > > > > > Hi John,
> > > > > >
> > > > > > I took some time out and as we discussed, looked to implement
> these
> > > > > > changes. Most of these changes are for demonstrative purposes
> but I
> > > > > thought
> > > > > > I will share.
> > > > > >
> > > > > > I added the new prefixSeek method at the KeyValueStore interface
> > > level:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/confluentinc/kafka/pull/242/files#diff-5e92747b506c868db3948323478e1b07R74-R83
> > > > > >
> > > > > > As you had pointed out, the prefix type can be different from
> the key
> > > > > type.
> > > > > > That's why this method takes 2 parameters. the key type and it's
> > > > > serializer.
> > > > > >
> > > > > > Then I added the implementation of this method in a couple of
> Stores.
> > > > > > RocksDBStore:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/confluentinc/kafka/pull/242/commits#diff-046ca566243518c88e007b7499ec9f51R308-R320
> > > > > > and
> > > > > > InMemoryKVStore:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/confluentinc/kafka/pull/242/commits#diff-4c685a32e765eab60bcb60097768104eR108-R120
> > > > > >
> > > > > > I modified the older test cases for RocksDBStore. You can find
> them
> > > here:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/confluentinc/kafka/pull/242/commits#diff-051439f56f0d6a12334d7e8cc4f66bf8R304-R415
> > > > > >
> > > > > >
> > > > > > I have added a test case where the keys are of type UUID while
> the
> > > prefix
> > > > > > is of type string. This seems to be working because the code is
> able
> > > to
> > > > > > pull in UUIDs with the provided prefix, even though their types
> are
> > > > > > different.
> > > > > >
> > > > > > To address one of the gaps from my previous implementation, I
> have
> > > also
> > > > > > added a test case for the end to end flow using the state store
> > > supplier.
> > > > > > you can find it here:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/confluentinc/kafka/pull/242/commits#diff-a94de5b2ec72d09ebac7183c31d7c906R269-R305
> > > > > >
> > > > > > Note that for this to work, i needed to update MeteredKVstore and
> > > > > > ChangeLoggingKVStore.
> > > > > >
> > > > > > Lastly, barring the 4 stores mentioned above, rest of the
> > > implementers of
> > > > > > KVStore have the prefixSeek override as null. As I said, this is
> > > mainly
> > > > > for
> > > > > > demonstrative purposes and hence done this way.
> > > > > > If you get the chance, it would be great if you can provide some
> > > feedback
> > > > > > on this.
> > > > > >
> > > > > > Thanks!
> > > > > > Sagar.
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 10, 2020 at 9:21 AM Sagar <sagarmeansoc...@gmail.com
> >
> > > wrote:
> > > > > >
> > > > > > > Hi John,
> > > > > > >
> > > > > > > You rightly pointed out, the devil is in the detail :). I will
> > > start
> > > > > with
> > > > > > > the implementation to get a sense.
> > > > > > >
> > > > > > > Here are my thoughts on the core challenge that you pointed
> out.
> > > The
> > > > > key
> > > > > > > value store abstractions that have been exposed via the state
> > > store DSL
> > > > > > > APIs, make it possible for the end user to define generic key
> > > types.
> > > > > > > However, the Serdes are the one which convert those generic
> > > keys/values
> > > > > > > into the format in which the state store stores them- which
> for all
> > > > > > > practical purposes are byte-arrays. I think with the prefix
> type
> > > > > serde, if
> > > > > > > it converts the prefix to the same internal storage type (byte
> > > array)
> > > > > as
> > > > > > > that of the Keys, then we should be able to do a prefix scan.
> > > > > > >
> > > > > > > Regarding other databases, I have worked a bit with Redis which
> > > also
> > > > > > > provides a scan operator using the glob style pattern
> match(it's
> > > more
> > > > > > > evolved than prefix scan but can be converted easily):
> > > > > > >
> > > > > > > https://redis.io/commands/scan#the-match-option
> > > > > > >
> > > > > > > Typically Redis works with Binary Safe strings so the prefix
> key
> > > type
> > > > > and
> > > > > > > the actual keys are of the same type.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sagar.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 10, 2020 at 1:41 AM John Roesler <
> vvcep...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi Sagar,
> > > > > > >>
> > > > > > >> Thanks for the reply. I agree that your UUID example
> illustrates
> > > the
> > > > > > >> problem I was pointing out.
> > > > > > >>
> > > > > > >> Yes, I think that experimenting with the API in the PR is
> > > probably the
> > > > > > >> best way to make progress (as opposed to just thinking in
> terms of
> > > > > > >> design on the wiki) because with this kind of thing, the
> devil is
> > > > > often
> > > > > > >> in the details.
> > > > > > >>
> > > > > > >> To clarify what I meant by that last statement, I see the core
> > > > > challenge
> > > > > > >> here as deriving from the fact that we have a key/value store
> with
> > > > > > >> generically typed keys, with a separate component (the serde)
> that
> > > > > > >> turns those typed keys into bytes for storage. In contrast,
> > > RocksDB
> > > > > > >> can easily offer a "prefix scan" operation because they key
> type
> > > is
> > > > > > >> always just a byte array, so "prefix" is a very natural
> concept to
> > > > > offer
> > > > > > >> in the API. Other key/value stores force the keys to always be
> > > > > strings,
> > > > > > >> which also makes it easy to define a prefix scan.
> > > > > > >>
> > > > > > >> My question is whether there are other databases that offer
> both:
> > > > > > >> 1. generically typed keys (as opposed to just bytes, just
> strings,
> > > > > etc)
> > > > > > >> 2. prefix scans
> > > > > > >> And, if so, what the API looks like.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> -John
> > > > > > >>
> > > > > > >> On Tue, Jun 9, 2020, at 11:51, Sagar wrote:
> > > > > > >> > Hi John,
> > > > > > >> >
> > > > > > >> > Thanks for the response. For starters, for our use case, we
> > > tweaked
> > > > > our
> > > > > > >> > keys etc to avoid prefix scans. So, we are good there.
> > > > > > >> >
> > > > > > >> > Regarding the KIP, I see what you mean when you say that the
> > > same
> > > > > key
> > > > > > >> type
> > > > > > >> > for prefix won't work. For example, continuing with the UUID
> > > example
> > > > > > >> that
> > > > > > >> > you gave, let's say one of the UUIDs
> > > > > > >> > is 123e4567-e89b-12d3-a456-426614174000, and with a prefix
> scan
> > > we
> > > > > want
> > > > > > >> to
> > > > > > >> > fetch all keys starting with 123. There's already a
> UUIDSerde
> > > so if
> > > > > the
> > > > > > >> > keys have been stored with that, then using UUIDSerde for
> > > prefixes
> > > > > won't
> > > > > > >> > help- I am not sure if the UUIDSerializer would even work
> for
> > > 123.
> > > > > > >> >
> > > > > > >> > So, that indicates that we will need to provide a new
> prefix key
> > > > > type
> > > > > > >> > serializer. Having said that, how it will be stitched
> together
> > > and
> > > > > > >> finally
> > > > > > >> > exposed using the APIs is something that is up for
> questioning.
> > > > > This is
> > > > > > >> > something you have also brought up in the earlier emails.
> If it
> > > > > > >> > makes sense, I can modify my PR to go along these lines.
> Please
> > > let
> > > > > me
> > > > > > >> know
> > > > > > >> > what you think.
> > > > > > >> >
> > > > > > >> > Lastly, I didn't understand this line of yours: *It might
> help
> > > if
> > > > > there
> > > > > > >> are
> > > > > > >> > other typed key/value stores to compare APIs with.*
> > > > > > >> >
> > > > > > >> > Thanks!
> > > > > > >> > Sagar.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Thu, Jun 4, 2020 at 6:03 AM John Roesler <
> > > vvcep...@apache.org>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Hi Sagar,
> > > > > > >> > >
> > > > > > >> > > Thanks for the question, and sorry for muddying the water.
> > > > > > >> > >
> > > > > > >> > > I meant the Bytes/byte[] thing as advice for how you all
> can
> > > solve
> > > > > > >> your
> > > > > > >> > > problem in the mean time, while we work through this KIP.
> I
> > > don’t
> > > > > > >> think
> > > > > > >> > > it’s relevant for the KIP itself.
> > > > > > >> > >
> > > > > > >> > > I think the big issue here is what the type of the prefix
> > > should
> > > > > be
> > > > > > >> in the
> > > > > > >> > > method signature. Using the same type as the key makes
> sense
> > > some
> > > > > > >> times,
> > > > > > >> > > but not other times. I’m not sure what the best way around
> > > this
> > > > > might
> > > > > > >> be.
> > > > > > >> > > It might help if there are other typed key/value stores to
> > > compare
> > > > > > >> APIs
> > > > > > >> > > with.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > John
> > > > > > >> > >
> > > > > > >> > > On Mon, Jun 1, 2020, at 09:58, Sagar wrote:
> > > > > > >> > > > Hi John,
> > > > > > >> > > >
> > > > > > >> > > > Just to add to my previous email(and sorry for the
> spam),
> > > if we
> > > > > > >> consider
> > > > > > >> > > > using Bytes/byte[] and manually invoke the serdes, if
> you
> > > could
> > > > > > >> provide
> > > > > > >> > > > examples where the same Serde for keys won't work for
> the
> > > prefix
> > > > > > >> types.
> > > > > > >> > > As
> > > > > > >> > > > far as my understanding goes, the prefix seek would
> depend
> > > upon
> > > > > > >> ordering
> > > > > > >> > > of
> > > > > > >> > > > the keys like lexicographic. As long as the binary
> format is
> > > > > > >> consistent
> > > > > > >> > > for
> > > > > > >> > > > both the keys and the prefixes would it not ensure the
> > > ability
> > > > > to
> > > > > > >> search
> > > > > > >> > > in
> > > > > > >> > > > that same ordering space? This is from my limited
> > > understanding
> > > > > so
> > > > > > >> any
> > > > > > >> > > > concrete examples would be helpful...
> > > > > > >> > > >
> > > > > > >> > > > Also, you mentioned about the creation of dummy values
> to
> > > > > indicate
> > > > > > >> prefix
> > > > > > >> > > > values, do you mean this line:
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >>
> > > > >
> > >
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/ForeignJoinSubscriptionProcessorSupplier.java#L91
> > > > > > >> > > > This
> > > > > > >> > > > is where the prefix key is built and used for searching
> .
> > > > > > >> > > >
> > > > > > >> > > > Thanks!
> > > > > > >> > > > Sagar.
> > > > > > >> > > >
> > > > > > >> > > > On Mon, Jun 1, 2020 at 11:42 AM Sagar <
> > > > > sagarmeansoc...@gmail.com>
> > > > > > >> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi John,
> > > > > > >> > > > >
> > > > > > >> > > > > Thank you. I think it makes sense to modify the KIP to
> > > add the
> > > > > > >> > > > > prefixScan() as part of the existing interfaces and
> add
> > > the
> > > > > new
> > > > > > >> mixin
> > > > > > >> > > > > behaviour as Rejected alternatives. I am not very
> aware of
> > > > > other
> > > > > > >> stores
> > > > > > >> > > > > apart from keyValueStore so is it fine if I keep it
> there
> > > for
> > > > > now?
> > > > > > >> > > > >
> > > > > > >> > > > > Regarding the type definition of types I will try and
> > > think
> > > > > about
> > > > > > >> some
> > > > > > >> > > > > alternatives and share if I get any.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks!
> > > > > > >> > > > > Sagar.
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Sun, May 31, 2020 at 1:55 AM John Roesler <
> > > > > vvcep...@apache.org
> > > > > > >> >
> > > > > > >> > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >> Hi Sagar,
> > > > > > >> > > > >>
> > > > > > >> > > > >> Thanks for the response. Your use case makes sense to
> > > me; I
> > > > > > >> figured it
> > > > > > >> > > > >> must be something like that.
> > > > > > >> > > > >>
> > > > > > >> > > > >> On a pragmatic level, in the near term, you might
> > > consider
> > > > > > >> basically
> > > > > > >> > > > >> doing the same thing we did in KIP-213. If you swap
> out
> > > the
> > > > > store
> > > > > > >> > > types for
> > > > > > >> > > > >> Byte/byte[] and “manually” invoke the serdes in your
> own
> > > > > logic,
> > > > > > >> then
> > > > > > >> > > you
> > > > > > >> > > > >> can use the same algorithm we did to derive the range
> > > scan
> > > > > > >> boundaries
> > > > > > >> > > from
> > > > > > >> > > > >> your desired prefix.
> > > > > > >> > > > >>
> > > > > > >> > > > >> For the actual KIP, it seems like we would need
> > > significant
> > > > > > >> design
> > > > > > >> > > > >> improvements to be able to do any mixins, so I think
> we
> > > > > should
> > > > > > >> favor
> > > > > > >> > > > >> proposing either to just add to the existing
> interfaces
> > > or to
> > > > > > >> create
> > > > > > >> > > brand
> > > > > > >> > > > >> new interfaces, as appropriate, for now. Given that
> > > prefix
> > > > > can be
> > > > > > >> > > converted
> > > > > > >> > > > >> to a range query at a low level, I think we can
> probably
> > > > > explore
> > > > > > >> > > adding
> > > > > > >> > > > >> prefix to the existing interfaces with a default
> > > > > implementation.
> > > > > > >> > > > >>
> > > > > > >> > > > >> It seems like that just leaves the question of how to
> > > define
> > > > > the
> > > > > > >> type
> > > > > > >> > > of
> > > > > > >> > > > >> the prefix. To be honest, I don’t have any great
> ideas
> > > here.
> > > > > Are
> > > > > > >> you
> > > > > > >> > > able
> > > > > > >> > > > >> to generate some creative solutions, Sagar?
> > > > > > >> > > > >>
> > > > > > >> > > > >> Thanks,
> > > > > > >> > > > >> John
> > > > > > >> > > > >>
> > > > > > >> > > > >> On Tue, May 26, 2020, at 06:42, Sagar wrote:
> > > > > > >> > > > >> > Hi John,
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Thanks for the detailed reply. I was a bit crammed
> with
> > > > > work
> > > > > > >> last
> > > > > > >> > > week
> > > > > > >> > > > >> so
> > > > > > >> > > > >> > couldn't respond earlier so apologies for that.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > First of all, thanks for the context that both you
> and
> > > Adam
> > > > > > >> have
> > > > > > >> > > > >> > provided me on the issues faced previously. As I
> can
> > > > > clearly
> > > > > > >> see,
> > > > > > >> > > while
> > > > > > >> > > > >> I
> > > > > > >> > > > >> > was able to cut some corners while writing some
> test
> > > cases
> > > > > or
> > > > > > >> > > > >> benchmarks,
> > > > > > >> > > > >> > to be able to stitch together a store with prefix
> scan
> > > > > into an
> > > > > > >> > > actual
> > > > > > >> > > > >> > topology needs more work. I am sorry for the half
> baked
> > > > > tests
> > > > > > >> that I
> > > > > > >> > > > >> wrote
> > > > > > >> > > > >> > without realising and you have rightly put it when
> you
> > > said
> > > > > > >> these
> > > > > > >> > > > >> > challenges aren't obvious up front.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Now, coming back to the other points, I spent some
> time
> > > > > going
> > > > > > >> > > through
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > KIP-213 and also some of the code snippets that are
> > > talked
> > > > > > >> about in
> > > > > > >> > > that
> > > > > > >> > > > >> > KIP. With the detailed explanation that you
> provided,
> > > it
> > > > > is now
> > > > > > >> > > obvious
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > me that keeping a generic type for keys like K
> won't
> > > work
> > > > > oob
> > > > > > >> and
> > > > > > >> > > hence
> > > > > > >> > > > >> a
> > > > > > >> > > > >> > decision was made to use Bytes as the key type.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > I just had another thought on this though. I
> looked at
> > > the
> > > > > > >> range
> > > > > > >> > > > >> function
> > > > > > >> > > > >> > that was added in the ReadOnlyKeyValueStore. While
> the
> > > Key
> > > > > and
> > > > > > >> the
> > > > > > >> > > Value
> > > > > > >> > > > >> > mentioned in that method is generic, internally
> almost
> > > all
> > > > > > >> queries
> > > > > > >> > > end
> > > > > > >> > > > >> up
> > > > > > >> > > > >> > querying using Bytes in some or the other form. I
> > > looked at
> > > > > > >> not just
> > > > > > >> > > > >> > RocksDb Store but other stores like InMemory store
> or
> > > > > > >> MemoryLRU and
> > > > > > >> > > this
> > > > > > >> > > > >> > seems to be the pattern. I think this stems from
> the
> > > fact
> > > > > that
> > > > > > >> these
> > > > > > >> > > > >> stores
> > > > > > >> > > > >> > while implementing KeyValueStore pass Bytes,
> byte[] as
> > > the
> > > > > K
> > > > > > >> and V
> > > > > > >> > > > >> values.
> > > > > > >> > > > >> > Classes like MeteredKeyValueStore which don't do
> this,
> > > > > still
> > > > > > >> use
> > > > > > >> > > > >> Bytes.wrap
> > > > > > >> > > > >> > to wrap the passed keys and values and invoke the
> range
> > > > > method.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > So, the point I am trying to make is, with the same
> > > > > behaviour
> > > > > > >> - and
> > > > > > >> > > > >> > ignoring for a moment that it's a separate
> interface
> > > which
> > > > > I am
> > > > > > >> > > trying
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > "mix-in"- the issues with the key types could be
> > > resolved.
> > > > > I
> > > > > > >> may be
> > > > > > >> > > > >> wrong
> > > > > > >> > > > >> > though so would like to know your thoughts on this.
> > > Infact
> > > > > > >> > > unknowingly
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > interface implementation of PrefixSeekableType in
> > > > > > >> RockDBStateStore
> > > > > > >> > > also
> > > > > > >> > > > >> > passes Bytes and bytes[] as K and V.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > The second part of exposing it via the publically
> > > > > accessible
> > > > > > >> > > interfaces
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > which we downcast while building the topology (like
> > > > > > >> KeyValueStore),
> > > > > > >> > > I
> > > > > > >> > > > >> can
> > > > > > >> > > > >> > clearly see now that mixing-in the way I tried to
> won't
> > > > > work.
> > > > > > >> My
> > > > > > >> > > > >> intention
> > > > > > >> > > > >> > all along was not to hamper the flow of those
> stores
> > > which
> > > > > > >> don't
> > > > > > >> > > support
> > > > > > >> > > > >> > prefix scan as yet and hence the separate
> interface.
> > > But, I
> > > > > > >> agree
> > > > > > >> > > that
> > > > > > >> > > > >> for
> > > > > > >> > > > >> > this to work, it needs to be part of some
> pre-defined
> > > store
> > > > > > >> types
> > > > > > >> > > like
> > > > > > >> > > > >> > KVStore etc. Right now, I don't have an answer to
> this
> > > but
> > > > > > >> mostly it
> > > > > > >> > > > >> would
> > > > > > >> > > > >> > have to be moved there and implemented across all
> > > > > stores(if we
> > > > > > >> see
> > > > > > >> > > the
> > > > > > >> > > > >> > worth in prefix scans :) )
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Regarding the motivation, I am sorry if I wasn't
> clear.
> > > > > This
> > > > > > >> > > originated
> > > > > > >> > > > >> > from one of my own use cases with kafka streams
> where i
> > > > > needed
> > > > > > >> to
> > > > > > >> > > find
> > > > > > >> > > > >> some
> > > > > > >> > > > >> > keys based upon certain prefix. Infact it's
> similar to
> > > the
> > > > > > >> > > > >> > RangeScanCombinedKeyUsage diagram in KIP-213 where
> the
> > > > > > >> otherTable
> > > > > > >> > > tries
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > find entries in the state store based upon the FK.
> I
> > > was
> > > > > using
> > > > > > >> > > > >> > KevValueStore to be precise. I also remember
> having a
> > > slack
> > > > > > >> > > > >> conversation on
> > > > > > >> > > > >> > this, and I was told that this isn't supported
> right
> > > now,
> > > > > but
> > > > > > >> some
> > > > > > >> > > other
> > > > > > >> > > > >> > users shared their experiences on how with some
> hacks
> > > they
> > > > > are
> > > > > > >> able
> > > > > > >> > > to
> > > > > > >> > > > >> > perform prefix scans even though their use case
> fits
> > > the
> > > > > bill
> > > > > > >> for a
> > > > > > >> > > > >> prefix
> > > > > > >> > > > >> > scan. That kind of motivated me to take a stab at
> it.
> > > > > > >> > > Unfortunately, I
> > > > > > >> > > > >> have
> > > > > > >> > > > >> > lost the slack chat because of some cleanup at the
> > > slack
> > > > > > >> channel
> > > > > > >> > > level.
> > > > > > >> > > > >> I
> > > > > > >> > > > >> > will try and update the ambiguous motivation
> statement
> > > in
> > > > > the
> > > > > > >> near
> > > > > > >> > > > >> future.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Lastly, I would like to point out, that your
> response
> > > was
> > > > > not
> > > > > > >> at all
> > > > > > >> > > > >> > discouraging. On the contrary it was really
> insightful
> > > and
> > > > > it's
> > > > > > >> > > always
> > > > > > >> > > > >> good
> > > > > > >> > > > >> > to learn/discover new things :)
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Thanks!
> > > > > > >> > > > >> > Sagar.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > On Fri, May 15, 2020 at 7:37 AM John Roesler <
> > > > > > >> vvcep...@apache.org>
> > > > > > >> > > > >> wrote:
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > > Hi, Sagar!
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Thanks for this KIP. I'm sorry it took me so
> long to
> > > > > reply.
> > > > > > >> I'll
> > > > > > >> > > > >> number my
> > > > > > >> > > > >> > > points differently to avoid confusion.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > I can provide some additional context on the
> > > > > difficulties we
> > > > > > >> > > > >> previously
> > > > > > >> > > > >> > > faced in KIP-213 (which you and Adam have already
> > > > > discussed).
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > J1) In your KIP, you propose the following
> interface:
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > public interface PrefixSeekableStore<K, V> {
> > > > > > >> > > > >> > >     KeyValueIterator<K, V> prefixSeek(K prefix);
> > > > > > >> > > > >> > > }
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > This is roughly the same thing that Adam and I
> were
> > > > > > >> considering
> > > > > > >> > > > >> > > before. It has a hidden problem, that it assumes
> that
> > > > > > >> prefixes of
> > > > > > >> > > > >> > > keys in the key space are also in the key space.
> In
> > > other
> > > > > > >> words,
> > > > > > >> > > this
> > > > > > >> > > > >> > > is a store with key type K, and the API assumes
> that
> > > > > > >> prefixes are
> > > > > > >> > > also
> > > > > > >> > > > >> > > of type K. This is true for some key types, like
> > > String
> > > > > or
> > > > > > >> Bytes,
> > > > > > >> > > but
> > > > > > >> > > > >> not
> > > > > > >> > > > >> > > for others.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > For example, if the keys are UUIDs, then no
> prefix is
> > > > > also a
> > > > > > >> > > UUID. If
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > > key is a complex data type, like Windowed<K> in
> our
> > > own
> > > > > DSL,
> > > > > > >> then
> > > > > > >> > > > >> > > we would absolutely want to query all keys with
> the
> > > same
> > > > > > >> record
> > > > > > >> > > key
> > > > > > >> > > > >> > > (the K part), or the same window start time, but
> in
> > > > > neither
> > > > > > >> case
> > > > > > >> > > is
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > > prefix actually a Windowed<K>.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > You can skirt the issue by defining a third type
> > > > > parameter,
> > > > > > >> maybe
> > > > > > >> > > KP,
> > > > > > >> > > > >> that
> > > > > > >> > > > >> > > is the "prefix" type, but this would also be
> awkward
> > > for
> > > > > many
> > > > > > >> > > usages.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > J2) There is a related problem with
> serialization.
> > > > > Whether
> > > > > > >> > > something
> > > > > > >> > > > >> > > is a prefix or not depends not on the Java key
> (K),
> > > but
> > > > > on
> > > > > > >> the
> > > > > > >> > > binary
> > > > > > >> > > > >> > > format that is produced when you use a serde on
> the
> > > key.
> > > > > > >> Whether
> > > > > > >> > > > >> > > we say that the prefix must also be a K or
> whether it
> > > > > gets
> > > > > > >> its own
> > > > > > >> > > > >> type,
> > > > > > >> > > > >> > > KP, there are problems.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > In the latter case, we must additionally require
> a
> > > second
> > > > > > >> set of
> > > > > > >> > > > >> serdes
> > > > > > >> > > > >> > > for the prefixes, but there's no obvious way to
> > > > > incorporate
> > > > > > >> this
> > > > > > >> > > in
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > > API, especially not in the DSL.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > In either case, for the API to actually work, we
> > > need to
> > > > > know
> > > > > > >> > > ahead
> > > > > > >> > > > >> > > of time that the Serde will produce a binary key
> that
> > > > > starts
> > > > > > >> with
> > > > > > >> > > the
> > > > > > >> > > > >> > > part that we wish to use as a prefix. For
> example,
> > > what
> > > > > we
> > > > > > >> were
> > > > > > >> > > doing
> > > > > > >> > > > >> > > briefly in KIP-213 (where we had complex keys,
> > > similar to
> > > > > > >> > > Windowed<K>)
> > > > > > >> > > > >> > > was to define "dummy" values that indicate that a
> > > > > > >> Windowed<K> is
> > > > > > >> > > > >> actually
> > > > > > >> > > > >> > > just a prefix key, not a real key. Maybe the
> window
> > > start
> > > > > > >> time
> > > > > > >> > > would
> > > > > > >> > > > >> be
> > > > > > >> > > > >> > > null or the key part would be null. But we also
> had
> > > to
> > > > > > >> define a
> > > > > > >> > > serde
> > > > > > >> > > > >> > > that would very specifically anticipate which
> > > component
> > > > > of
> > > > > > >> the
> > > > > > >> > > complex
> > > > > > >> > > > >> > > key would need to be used in a prefix key.
> Having to
> > > > > bring
> > > > > > >> all
> > > > > > >> > > these
> > > > > > >> > > > >> > > parts together in a reliable, easy-to-debug,
> fashion
> > > > > gives
> > > > > > >> me some
> > > > > > >> > > > >> doubt
> > > > > > >> > > > >> > > that people would actually be able to use this
> > > feature in
> > > > > > >> > > complicated
> > > > > > >> > > > >> > > programs without driving themselves crazy.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > J3) Thanks so much for including benchmarks and
> > > tests!
> > > > > > >> > > Unfortunately,
> > > > > > >> > > > >> > > these don't include everything you need to really
> > > plug
> > > > > into
> > > > > > >> the
> > > > > > >> > > > >> Streams
> > > > > > >> > > > >> > > API. I think when you push it a little farther,
> > > you'll
> > > > > > >> realize
> > > > > > >> > > what
> > > > > > >> > > > >> Adam
> > > > > > >> > > > >> > > was talking about wrt the interface difficulties.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > In your benchmark and tests, you directly
> construct
> > > the
> > > > > > >> store and
> > > > > > >> > > then
> > > > > > >> > > > >> > > use it, but in a real Streams application, you
> can
> > > only
> > > > > > >> provide
> > > > > > >> > > your
> > > > > > >> > > > >> > > implementation in a StoreSupplier, for example
> via
> > > the
> > > > > > >> > > Materialized
> > > > > > >> > > > >> > > parameter. Then, to use the store from inside a
> > > > > Processor,
> > > > > > >> you'd
> > > > > > >> > > have
> > > > > > >> > > > >> > > to get it by name from the ProcessorContext, and
> then
> > > > > cast
> > > > > > >> it to
> > > > > > >> > > one
> > > > > > >> > > > >> of
> > > > > > >> > > > >> > > the pre-defined store types, KeyValueStore,
> > > > > WindowedStore, or
> > > > > > >> > > > >> > > SessionStore. It won't work to "mix in" your
> > > interface
> > > > > > >> because the
> > > > > > >> > > > >> > > processor gets a store that's wrapped in layers
> that
> > > > > handle
> > > > > > >> > > > >> serialization,
> > > > > > >> > > > >> > > change-logging, recording metrics, and caching.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > To use the store through IQ, you have to provide
> a
> > > > > > >> > > QueriableStoreType
> > > > > > >> > > > >> > > to KafkaStreams#store, and you get back a
> similarly
> > > > > wrapped
> > > > > > >> store.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > I think our only choices to add an interface like
> > > yours
> > > > > is
> > > > > > >> either
> > > > > > >> > > to
> > > > > > >> > > > >> add
> > > > > > >> > > > >> > > it to one of the existing store types, like
> > > > > KeyValueStore or
> > > > > > >> > > > >> > > WindowedStore, or to define a completely new
> store
> > > > > hierarchy,
> > > > > > >> > > meaning
> > > > > > >> > > > >> > > you have to duplicate all the "wrapper" layers in
> > > > > Streams.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > I think if you write an "end-to-end" test, where
> you
> > > > > write a
> > > > > > >> > > Streams
> > > > > > >> > > > >> app,
> > > > > > >> > > > >> > > provide your store, and then use it in a
> Processor
> > > and
> > > > > > >> through IQ,
> > > > > > >> > > > >> > > you'll see what I'm talking about.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > IIRC, those three points were the ones that
> > > ultimately
> > > > > led
> > > > > > >> us to
> > > > > > >> > > > >> abandon
> > > > > > >> > > > >> > > the whole idea last time and just register the
> stores
> > > > > with
> > > > > > >> key
> > > > > > >> > > type
> > > > > > >> > > > >> Bytes.
> > > > > > >> > > > >> > > I think some creative solutions may yet be
> possible,
> > > but
> > > > > > >> it'll
> > > > > > >> > > take
> > > > > > >> > > > >> some
> > > > > > >> > > > >> > > more design work to get there.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Can I ask what your motivation is, exactly, for
> > > proposing
> > > > > > >> this
> > > > > > >> > > > >> feature?
> > > > > > >> > > > >> > > The motivation just says "some users may want to
> do
> > > it",
> > > > > > >> which has
> > > > > > >> > > > >> > > the advantage that it's impossible to disagree
> with,
> > > but
> > > > > > >> doesn't
> > > > > > >> > > > >> provide
> > > > > > >> > > > >> > > a lot of concrete detail ;)
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Specifically, what I'm wondering is whether you
> > > wanted
> > > > > to use
> > > > > > >> > > this as
> > > > > > >> > > > >> > > part of a KayValue store, which might be a
> > > challenge, or
> > > > > > >> whether
> > > > > > >> > > you
> > > > > > >> > > > >> > > wanted to use it for more efficient scans in a
> > > > > > >> WindowedStore, like
> > > > > > >> > > > >> > > Guozhang.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Thanks again for the KIP! I hope my response
> isn't
> > > too
> > > > > > >> > > discouraging;
> > > > > > >> > > > >> > > I just wanted to convey the challenges we faced
> last
> > > > > time,
> > > > > > >> since
> > > > > > >> > > they
> > > > > > >> > > > >> > > are all not obvious up front.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Best regards,
> > > > > > >> > > > >> > > -John
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > On Thu, May 14, 2020, at 16:17, Sophie
> Blee-Goldman
> > > > > wrote:
> > > > > > >> > > > >> > > > Whoops, I guess I didn't finish reading the KIP
> > > all the
> > > > > > >> way to
> > > > > > >> > > the
> > > > > > >> > > > >> end
> > > > > > >> > > > >> > > > earlier. Thanks
> > > > > > >> > > > >> > > > for including the link to the RocksDB PR in the
> > > KIP!
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > I have one additional question about the
> proposal:
> > > do
> > > > > you
> > > > > > >> plan
> > > > > > >> > > to
> > > > > > >> > > > >> also
> > > > > > >> > > > >> > > add
> > > > > > >> > > > >> > > > this
> > > > > > >> > > > >> > > > prefix seek API to the dual column family
> > > iterators?
> > > > > These
> > > > > > >> are
> > > > > > >> > > used
> > > > > > >> > > > >> by
> > > > > > >> > > > >> > > > RocksDBTimestampedStore (which extends
> > > RocksDBStore),
> > > > > for
> > > > > > >> > > example
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > > > *RocksDBDualCFRangeIterator*
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > Thanks for the KIP!
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > On Thu, May 14, 2020 at 10:50 AM Sagar <
> > > > > > >> > > sagarmeansoc...@gmail.com>
> > > > > > >> > > > >> > > wrote:
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > > Hey @Adam,
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > Thanks for sharing your experience with using
> > > prefix
> > > > > > >> seek. I
> > > > > > >> > > did
> > > > > > >> > > > >> look
> > > > > > >> > > > >> > > at
> > > > > > >> > > > >> > > > > your code for RocksDBPrefixIterator, infact I
> > > have
> > > > > > >> repurposed
> > > > > > >> > > that
> > > > > > >> > > > >> > > class
> > > > > > >> > > > >> > > > > itself since it wasn't being used else where.
> > > > > Regarding
> > > > > > >> how I
> > > > > > >> > > > >> plan to
> > > > > > >> > > > >> > > > > expose them through-out the state stores,
> what I
> > > have
> > > > > > >> tried
> > > > > > >> > > to do
> > > > > > >> > > > >> is
> > > > > > >> > > > >> > > add it
> > > > > > >> > > > >> > > > > as a separate interface. So, basically, it is
> > > not at
> > > > > the
> > > > > > >> same
> > > > > > >> > > > >> level as
> > > > > > >> > > > >> > > the
> > > > > > >> > > > >> > > > > *range function so to speak. The reason I did
> > > that is
> > > > > > >> > > currently I
> > > > > > >> > > > >> feel
> > > > > > >> > > > >> > > not
> > > > > > >> > > > >> > > > > all state stores are a natural fit for prefix
> > > seek.
> > > > > As I
> > > > > > >> > > > >> mentioned in
> > > > > > >> > > > >> > > the
> > > > > > >> > > > >> > > > > KIP as well, the current equivalent to it
> could
> > > be
> > > > > > >> > > > >> > > BulkLoadingStore(not in
> > > > > > >> > > > >> > > > > terms of functionality but in terms of how
> it is
> > > > > also not
> > > > > > >> > > > >> implemented
> > > > > > >> > > > >> > > by
> > > > > > >> > > > >> > > > > all of them). So, that ways I am not needing
> to
> > > stub
> > > > > them
> > > > > > >> > > across
> > > > > > >> > > > >> all
> > > > > > >> > > > >> > > the
> > > > > > >> > > > >> > > > > state-stores and we can implement it only
> where
> > > > > needed.
> > > > > > >> For
> > > > > > >> > > > >> example,
> > > > > > >> > > > >> > > in the
> > > > > > >> > > > >> > > > > PR that I have put for reference in the KIP,
> you
> > > can
> > > > > see
> > > > > > >> that
> > > > > > >> > > I
> > > > > > >> > > > >> have it
> > > > > > >> > > > >> > > > > implemented only for RocksDB.
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > @Guozhang,
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > Thanks for the feedback. Those are very
> > > interesting
> > > > > > >> questions
> > > > > > >> > > and
> > > > > > >> > > > >> I
> > > > > > >> > > > >> > > will
> > > > > > >> > > > >> > > > > try my best to answer based upon whatever
> limited
> > > > > > >> > > understanding I
> > > > > > >> > > > >> have
> > > > > > >> > > > >> > > > > developed so far :)
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > 1) Regarding the usage of
> > > > > useFixedLengthPrefixExtractor,
> > > > > > >> > > > >> honestly, I
> > > > > > >> > > > >> > > hadn't
> > > > > > >> > > > >> > > > > looked at that config. I did look it up
> after you
> > > > > > >> pointed it
> > > > > > >> > > out
> > > > > > >> > > > >> and
> > > > > > >> > > > >> > > seems
> > > > > > >> > > > >> > > > > it's more for hash-based memtables? I may be
> > > wrong
> > > > > > >> though. But
> > > > > > >> > > > >> what I
> > > > > > >> > > > >> > > would
> > > > > > >> > > > >> > > > > say is that, the changes I had made were not
> > > exactly
> > > > > > >> from a
> > > > > > >> > > > >> correctness
> > > > > > >> > > > >> > > > > stand point but more from trying to showcase
> how
> > > we
> > > > > can
> > > > > > >> > > implement
> > > > > > >> > > > >> these
> > > > > > >> > > > >> > > > > changes. The idea was that once we see the
> merit
> > > in
> > > > > this
> > > > > > >> > > approach
> > > > > > >> > > > >> then
> > > > > > >> > > > >> > > we
> > > > > > >> > > > >> > > > > can add some of the tunings( and I would need
> > > your
> > > > > team's
> > > > > > >> > > > >> assistance
> > > > > > >> > > > >> > > there
> > > > > > >> > > > >> > > > > :D).
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > 2) Regarding the similarity of
> > > > > `RocksDBPrefixIterator`
> > > > > > >> and
> > > > > > >> > > > >> > > > > `RocksDBRangeIterator`, yes the
> implementations
> > > look
> > > > > > >> more or
> > > > > > >> > > less
> > > > > > >> > > > >> > > similar.
> > > > > > >> > > > >> > > > > So, in terms of performance, they might be
> > > similar.
> > > > > But
> > > > > > >> > > > >> semantically,
> > > > > > >> > > > >> > > they
> > > > > > >> > > > >> > > > > can solve 2 different use-cases. The range
> seek
> > > is
> > > > > useful
> > > > > > >> > > when we
> > > > > > >> > > > >> know
> > > > > > >> > > > >> > > both
> > > > > > >> > > > >> > > > > from and to. But if we consider use-cases
> where
> > > we
> > > > > want
> > > > > > >> to
> > > > > > >> > > find
> > > > > > >> > > > >> keys
> > > > > > >> > > > >> > > with a
> > > > > > >> > > > >> > > > > certain prefix, but we don't know if what
> it's
> > > start
> > > > > and
> > > > > > >> end
> > > > > > >> > > is,
> > > > > > >> > > > >> then
> > > > > > >> > > > >> > > > > prefix seek would come in more handy. The
> point
> > > that
> > > > > I am
> > > > > > >> > > trying
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > > make is
> > > > > > >> > > > >> > > > > that it can extend the scope of state stores
> from
> > > > > just
> > > > > > >> point
> > > > > > >> > > > >> lookups to
> > > > > > >> > > > >> > > > > somewhat being able to speculative queries
> where
> > > by
> > > > > > >> users can
> > > > > > >> > > > >> search
> > > > > > >> > > > >> > > if a
> > > > > > >> > > > >> > > > > certain pattern exists. I can vouch for this
> > > > > personally
> > > > > > >> > > because I
> > > > > > >> > > > >> > > wanted to
> > > > > > >> > > > >> > > > > use state stores for one such use case and
> since
> > > this
> > > > > > >> option
> > > > > > >> > > > >> wasn't
> > > > > > >> > > > >> > > there,
> > > > > > >> > > > >> > > > > I had to do some other things. An equivalent
> to
> > > this
> > > > > > >> could be
> > > > > > >> > > SCAN
> > > > > > >> > > > >> > > operator
> > > > > > >> > > > >> > > > > in Redis. (Not trying to compare the Redis
> and
> > > state
> > > > > > >> stores
> > > > > > >> > > but
> > > > > > >> > > > >> trying
> > > > > > >> > > > >> > > to
> > > > > > >> > > > >> > > > > give some context).
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > Regarding the point on bloom filter, I think
> > > there
> > > > > are
> > > > > > >> certain
> > > > > > >> > > > >> > > > > optimisations that are being talked about in
> > > case of
> > > > > > >> prefix
> > > > > > >> > > seek
> > > > > > >> > > > >> here:
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >>
> > > > > > >> > >
> > > > > > >>
> > > > >
> > >
> https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#prefix-vs-whole-key
> > > > > > >> > > > >> > > > > Again
> > > > > > >> > > > >> > > > > this isn't something that I have explored
> fully.
> > > > > Also,
> > > > > > >> on the
> > > > > > >> > > > >> prefix
> > > > > > >> > > > >> > > seek
> > > > > > >> > > > >> > > > > page on RocksDB they mention that there's a
> > > prefix
> > > > > > >> iterating
> > > > > > >> > > > >> technique
> > > > > > >> > > > >> > > > > called Prefix Bloom Filter.
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > 3) Regarding the question on length of bytes
> for
> > > > > seek v/s
> > > > > > >> > > prefix
> > > > > > >> > > > >> seek,
> > > > > > >> > > > >> > > I am
> > > > > > >> > > > >> > > > > not entirely sure about that scenario. What I
> > > have
> > > > > > >> understood
> > > > > > >> > > is
> > > > > > >> > > > >> that
> > > > > > >> > > > >> > > > > at-least for Rocks DB, it is more performant
> for
> > > > > short
> > > > > > >> > > iterator
> > > > > > >> > > > >> queries
> > > > > > >> > > > >> > > > > that longer ones.
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > 4) Regarding the last question on placing it
> > > within
> > > > > > >> Segment,
> > > > > > >> > > the
> > > > > > >> > > > >> > > reason I
> > > > > > >> > > > >> > > > > didn't do that way, is that I thought we
> > > shouldn't
> > > > > tie
> > > > > > >> this
> > > > > > >> > > > >> feature
> > > > > > >> > > > >> > > only to
> > > > > > >> > > > >> > > > > RocksDB. I agree that I got this idea while
> > > > > > >> looking/reading
> > > > > > >> > > about
> > > > > > >> > > > >> > > RocksDB
> > > > > > >> > > > >> > > > > but if we keep it outside the purview of
> RocksDB
> > > and
> > > > > > >> keep it
> > > > > > >> > > as a
> > > > > > >> > > > >> > > pluggable
> > > > > > >> > > > >> > > > > entity, then a) it remains generic by not
> being
> > > tied
> > > > > to
> > > > > > >> any
> > > > > > >> > > > >> specific
> > > > > > >> > > > >> > > store
> > > > > > >> > > > >> > > > > and b) no change is needed at all for any of
> the
> > > > > other
> > > > > > >> stores
> > > > > > >> > > > >> which
> > > > > > >> > > > >> > > haven't
> > > > > > >> > > > >> > > > > implemented it.
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > I am not sure of any of the above points make
> > > sense
> > > > > but
> > > > > > >> as I
> > > > > > >> > > said,
> > > > > > >> > > > >> > > this is
> > > > > > >> > > > >> > > > > based out of my limited understanding of the
> > > > > codebase.
> > > > > > >> So,
> > > > > > >> > > pardon
> > > > > > >> > > > >> any
> > > > > > >> > > > >> > > > > incorrect/illogical statements plz!
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > @Sophie,
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > Thanks for bringing that point up! I have
> > > mentioned
> > > > > about
> > > > > > >> > > that PR
> > > > > > >> > > > >> in
> > > > > > >> > > > >> > > the
> > > > > > >> > > > >> > > > > KIP under a section called Other
> considerations.
> > > > > > >> Nonetheless,
> > > > > > >> > > > >> thanks
> > > > > > >> > > > >> > > for
> > > > > > >> > > > >> > > > > pointing it out!
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > Thanks!
> > > > > > >> > > > >> > > > > Sagar.
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > On Thu, May 14, 2020 at 5:17 AM Sophie
> > > Blee-Goldman <
> > > > > > >> > > > >> > > sop...@confluent.io>
> > > > > > >> > > > >> > > > > wrote:
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > > Not to derail this KIP discussion, but to
> > > leave a
> > > > > few
> > > > > > >> notes
> > > > > > >> > > on
> > > > > > >> > > > >> some
> > > > > > >> > > > >> > > of
> > > > > > >> > > > >> > > > > the
> > > > > > >> > > > >> > > > > > RocksDB points that have come up:
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > Someone actually merged some long overdue
> > > > > performance
> > > > > > >> > > > >> improvements to
> > > > > > >> > > > >> > > > > > the RocksJava implementation (the PR was
> opened
> > > > > back in
> > > > > > >> > > 2017!
> > > > > > >> > > > >> yikes).
> > > > > > >> > > > >> > > > > > I haven't looked into the prefix seek API
> > > closely
> > > > > > >> enough to
> > > > > > >> > > > >> know how
> > > > > > >> > > > >> > > > > > relevant
> > > > > > >> > > > >> > > > > > this particular change is, and they are
> still
> > > > > improving
> > > > > > >> > > things,
> > > > > > >> > > > >> but
> > > > > > >> > > > >> > > it
> > > > > > >> > > > >> > > > > > gives me some
> > > > > > >> > > > >> > > > > > faith.
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > There are some pretty promising results
> > > reported on
> > > > > > >> the PR:
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >>
> > > > > > >>
> > > https://github.com/facebook/rocksdb/pull/2283#issuecomment-561563037
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > Regarding the custom comparator, they also
> > > recently
> > > > > > >> merged
> > > > > > >> > > this
> > > > > > >> > > > >> > > > > performance
> > > > > > >> > > > >> > > > > > <
> https://github.com/facebook/rocksdb/pull/6252
> > > >
> > > > > > >> > > > >> > > > > > improvement <
> > > > > > >> https://github.com/facebook/rocksdb/pull/6252
> > > > > > >> > > >.
> > > > > > >> > > > >> The
> > > > > > >> > > > >> > > tl;dr
> > > > > > >> > > > >> > > > > is
> > > > > > >> > > > >> > > > > > they reduced the slowdown of a custom
> > > comparator in
> > > > > > >> Java
> > > > > > >> > > > >> > > > > > (relative to the native C++) from ~7x to
> ~5.2x
> > > at
> > > > > best.
> > > > > > >> > > Which is
> > > > > > >> > > > >> > > still
> > > > > > >> > > > >> > > > > not
> > > > > > >> > > > >> > > > > > great, but it
> > > > > > >> > > > >> > > > > > would be interesting to run our own
> benchmarks
> > > and
> > > > > see
> > > > > > >> how
> > > > > > >> > > this
> > > > > > >> > > > >> > > stacks
> > > > > > >> > > > >> > > > > up.
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > Of course, these are all new changes and as
> > > such
> > > > > will
> > > > > > >> > > require
> > > > > > >> > > > >> us to
> > > > > > >> > > > >> > > > > upgrade
> > > > > > >> > > > >> > > > > > rocks to 6.x which means they have to wait
> for
> > > us
> > > > > to
> > > > > > >> > > release a
> > > > > > >> > > > >> 3.0.
> > > > > > >> > > > >> > > But
> > > > > > >> > > > >> > > > > > there's
> > > > > > >> > > > >> > > > > > some talk about 3.0 coming in the next few
> > > > > releases so
> > > > > > >> > > consider
> > > > > > >> > > > >> it
> > > > > > >> > > > >> > > food
> > > > > > >> > > > >> > > > > for
> > > > > > >> > > > >> > > > > > not-so-future thought
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > On Tue, May 12, 2020 at 5:02 PM Adam
> Bellemare
> > > <
> > > > > > >> > > > >> > > adam.bellem...@gmail.com
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > wrote:
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > > Hi Guozhang
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > For clarity, the issues I was running
> into
> > > was
> > > > > not
> > > > > > >> about
> > > > > > >> > > the
> > > > > > >> > > > >> actual
> > > > > > >> > > > >> > > > > > > *prefixSeek* function itself, but about
> > > exposing
> > > > > it
> > > > > > >> to the
> > > > > > >> > > > >> same
> > > > > > >> > > > >> > > level
> > > > > > >> > > > >> > > > > of
> > > > > > >> > > > >> > > > > > > access as the *range* function throughout
> > > Kafka
> > > > > > >> Streams.
> > > > > > >> > > It
> > > > > > >> > > > >> > > required a
> > > > > > >> > > > >> > > > > > lot
> > > > > > >> > > > >> > > > > > > of changes, and also required that most
> state
> > > > > stores
> > > > > > >> stub
> > > > > > >> > > it
> > > > > > >> > > > >> out
> > > > > > >> > > > >> > > since
> > > > > > >> > > > >> > > > > it
> > > > > > >> > > > >> > > > > > > wasn't clear how they would implement
> it. It
> > > was
> > > > > > >> > > basically an
> > > > > > >> > > > >> > > > > > overreaching
> > > > > > >> > > > >> > > > > > > API change that was easily solved (for
> the
> > > > > specific
> > > > > > >> > > > >> prefix-scan in
> > > > > > >> > > > >> > > FKJ)
> > > > > > >> > > > >> > > > > > by
> > > > > > >> > > > >> > > > > > > simply using *range*. So to be clear, the
> > > > > blockers
> > > > > > >> were
> > > > > > >> > > > >> > > predominantly
> > > > > > >> > > > >> > > > > > > around correctly handling the API
> changes,
> > > > > nothing
> > > > > > >> to do
> > > > > > >> > > with
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > > > > > > mechanisms of the RocksDB prefix
> scanning.
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > As for KAFKA-5285 I'll look into it more
> to
> > > see
> > > > > if I
> > > > > > >> can
> > > > > > >> > > get a
> > > > > > >> > > > >> > > better
> > > > > > >> > > > >> > > > > > > handle on the problem!
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > Hope this helps clear it up.
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > Adam
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > On Tue, May 12, 2020 at 7:16 PM Guozhang
> > > Wang <
> > > > > > >> > > > >> wangg...@gmail.com>
> > > > > > >> > > > >> > > > > > wrote:
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > > Hello Adam,
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > I'm wondering if you can provide a bit
> more
> > > > > > >> context on
> > > > > > >> > > the
> > > > > > >> > > > >> > > blockers
> > > > > > >> > > > >> > > > > of
> > > > > > >> > > > >> > > > > > > > using prefixSeek of RocksDB (I saw you
> > > have a
> > > > > > >> > > > >> > > RocksDBPrefixIterator
> > > > > > >> > > > >> > > > > > class
> > > > > > >> > > > >> > > > > > > > but not used anywhere yet)? I'm
> currently
> > > > > looking
> > > > > > >> at
> > > > > > >> > > ways to
> > > > > > >> > > > >> > > allow
> > > > > > >> > > > >> > > > > some
> > > > > > >> > > > >> > > > > > > > secondary indices with rocksDB
> following
> > > some
> > > > > > >> existing
> > > > > > >> > > > >> approaches
> > > > > > >> > > > >> > > > > > > > from CockroachDB etc so I'm very
> curious to
> > > > > learn
> > > > > > >> your
> > > > > > >> > > > >> > > experience.
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > 1) Before considering any secondary
> > > indices, a
> > > > > > >> quick
> > > > > > >> > > > >> thought is
> > > > > > >> > > > >> > > that
> > > > > > >> > > > >> > > > > > for
> > > > > > >> > > > >> > > > > > > > (key, timeFrom, timeTo) queries, we can
> > > easily
> > > > > > >> replace
> > > > > > >> > > the
> > > > > > >> > > > >> > > current
> > > > > > >> > > > >> > > > > > > > `range()` impl with a `prefixRange()`
> impl
> > > via
> > > > > a
> > > > > > >> prefix
> > > > > > >> > > > >> iterator;
> > > > > > >> > > > >> > > > > > though
> > > > > > >> > > > >> > > > > > > > for (keyFrom, keyTo, timeFrom, timeTo)
> it
> > > is
> > > > > much
> > > > > > >> more
> > > > > > >> > > > >> > > complicated
> > > > > > >> > > > >> > > > > > indeed
> > > > > > >> > > > >> > > > > > > > and hence existing `range()` impl may
> > > still be
> > > > > > >> used.
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > 2) Another related issue I've been
> > > pondering
> > > > > for a
> > > > > > >> > > while is
> > > > > > >> > > > >> > > > > > > > around KAFKA-5285: with the default
> > > > > lexicograpic
> > > > > > >> byte
> > > > > > >> > > > >> comparator,
> > > > > > >> > > > >> > > > > since
> > > > > > >> > > > >> > > > > > > the
> > > > > > >> > > > >> > > > > > > > key length varies, the combo (key,
> window)
> > > > > would
> > > > > > >> have
> > > > > > >> > > > >> > > interleaving
> > > > > > >> > > > >> > > > > byte
> > > > > > >> > > > >> > > > > > > > layouts like:
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > AAA0001          (key AAA, timestamp
> 0001)
> > > > > > >> > > > >> > > > > > > > AAA00011        (key AAA0, timestamp
> 0011)
> > > > > > >> > > > >> > > > > > > > AAA0002          (key AAA, timestamp
> 0002)
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > which is challenging for prefix seeks
> to
> > > work
> > > > > > >> > > efficiently.
> > > > > > >> > > > >> > > Although
> > > > > > >> > > > >> > > > > we
> > > > > > >> > > > >> > > > > > > can
> > > > > > >> > > > >> > > > > > > > overwrite the byte-comparator in JNI
> it is
> > > very
> > > > > > >> > > expensive
> > > > > > >> > > > >> and the
> > > > > > >> > > > >> > > > > cost
> > > > > > >> > > > >> > > > > > of
> > > > > > >> > > > >> > > > > > > > JNI overwhelms its benefits. If you've
> got
> > > some
> > > > > > >> ideas
> > > > > > >> > > > >> around it
> > > > > > >> > > > >> > > > > please
> > > > > > >> > > > >> > > > > > > lmk
> > > > > > >> > > > >> > > > > > > > as well.
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > Guozhang
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > On Tue, May 12, 2020 at 6:26 AM Adam
> > > Bellemare
> > > > > <
> > > > > > >> > > > >> > > > > > adam.bellem...@gmail.com
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > wrote:
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > > Hi Sagar
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > I implemented a very similar
> interface
> > > for
> > > > > > >> KIP-213,
> > > > > > >> > > the
> > > > > > >> > > > >> > > foreign-key
> > > > > > >> > > > >> > > > > > > > joiner.
> > > > > > >> > > > >> > > > > > > > > We pulled it out of the final
> > > implementation
> > > > > and
> > > > > > >> > > instead
> > > > > > >> > > > >> used
> > > > > > >> > > > >> > > > > RocksDB
> > > > > > >> > > > >> > > > > > > > range
> > > > > > >> > > > >> > > > > > > > > instead. You can see the particular
> code
> > > > > where
> > > > > > >> we use
> > > > > > >> > > > >> > > > > > > RocksDB.range(...)
> > > > > > >> > > > >> > > > > > > > to
> > > > > > >> > > > >> > > > > > > > > get the same iterator result.
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >>
> > > > > > >> > >
> > > > > > >>
> > > > >
> > >
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/ForeignJoinSubscriptionProcessorSupplier.java#L95
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > We pulled it out because there were
> > > numerous
> > > > > > >> awkward
> > > > > > >> > > > >> > > acrobatics to
> > > > > > >> > > > >> > > > > > > > > integrate *prefixSeek()* function
> into
> > > the
> > > > > Kafka
> > > > > > >> > > Streams
> > > > > > >> > > > >> code.
> > > > > > >> > > > >> > > > > > > > Basically, I
> > > > > > >> > > > >> > > > > > > > > wanted to be able to access
> > > *prefixSeek()*
> > > > > the
> > > > > > >> same
> > > > > > >> > > way I
> > > > > > >> > > > >> can
> > > > > > >> > > > >> > > > > access
> > > > > > >> > > > >> > > > > > > > > *range()* for any state store, and in
> > > > > particular
> > > > > > >> use
> > > > > > >> > > it
> > > > > > >> > > > >> for
> > > > > > >> > > > >> > > storing
> > > > > > >> > > > >> > > > > > > data
> > > > > > >> > > > >> > > > > > > > > with a particular foreign key (as
> per the
> > > > > > >> previous
> > > > > > >> > > URL).
> > > > > > >> > > > >> > > However, I
> > > > > > >> > > > >> > > > > > > found
> > > > > > >> > > > >> > > > > > > > > out that it required way too many
> > > changes to
> > > > > > >> expose
> > > > > > >> > > the
> > > > > > >> > > > >> > > > > > *prefixSeek()*
> > > > > > >> > > > >> > > > > > > > > functionality while still being able
> to
> > > > > leverage
> > > > > > >> all
> > > > > > >> > > the
> > > > > > >> > > > >> nice
> > > > > > >> > > > >> > > Kafka
> > > > > > >> > > > >> > > > > > > > Streams
> > > > > > >> > > > >> > > > > > > > > state management + supplier
> > > functionality,
> > > > > so we
> > > > > > >> made
> > > > > > >> > > a
> > > > > > >> > > > >> > > decision
> > > > > > >> > > > >> > > > > just
> > > > > > >> > > > >> > > > > > > to
> > > > > > >> > > > >> > > > > > > > > stick with *range()* and pull
> everything
> > > else
> > > > > > >> out.
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > I guess my question here is, how do
> you
> > > > > > >> anticipate
> > > > > > >> > > using
> > > > > > >> > > > >> > > > > > *prefixSeek()*
> > > > > > >> > > > >> > > > > > > > > within the framework of Kafka
> Streams,
> > > or the
> > > > > > >> > > Processor
> > > > > > >> > > > >> API?
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > Adam
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > On Tue, May 12, 2020 at 2:52 AM
> Sagar <
> > > > > > >> > > > >> > > sagarmeansoc...@gmail.com>
> > > > > > >> > > > >> > > > > > > wrote:
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > > Hi All,
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > > I would like to start a discussion
> on
> > > the
> > > > > KIP
> > > > > > >> that I
> > > > > > >> > > > >> created
> > > > > > >> > > > >> > > > > below
> > > > > > >> > > > >> > > > > > to
> > > > > > >> > > > >> > > > > > > > add
> > > > > > >> > > > >> > > > > > > > > > prefix scan support in State
> Stores:
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >>
> > > > > > >> > >
> > > > > > >>
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-614%3A+Add+Prefix+Scan+support+for+State+Stores
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > > > Thanks!
> > > > > > >> > > > >> > > > > > > > > > Sagar.
> > > > > > >> > > > >> > > > > > > > > >
> > > > > > >> > > > >> > > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > > > --
> > > > > > >> > > > >> > > > > > > > -- Guozhang
> > > > > > >> > > > >> > > > > > > >
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> >
> > > > > > >> > > > >>
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to