Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Richard Yu Mon, 16 Oct 2017 19:38:03 -0700

As Guozhang Wang mentioned earlier, we want to mirror the structure of
similar Store class (namely KTable). The WindowedStore class might be
unique in itself as it uses fetch() methods, but in my opinion, uniformity
should be better suited for simplicity.


On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <[email protected]> wrote:

> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
> keyword for when we query a specific range of keys?
>
> Xavier
>
> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <[email protected]>
> wrote:
>
> > Thanks for the clarifications, Xavier.
> > I have removed most of the methods except for keys() and all() which has
> > been renamed to Guozhang Wang's suggestions.
> >
> > Hope this helps.
> >
> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <[email protected]>
> > wrote:
> >
> > > Thanks for the KIP Richard, this is a very useful addition!
> > >
> > > As far as the API changes, I just have a few comments on the methods
> that
> > > don't seem directly related to the KIP title, and naming of course :).
> > > On the implementation, see my notes further down that will hopefully
> > > clarify a few things.
> > >
> > > Regarding the "bonus" methods:
> > > I agree with Guozhang that the KIP lacks proper motivation for adding
> the
> > > min, max, and allLatest methods.
> > > It is also not clear to me what min and max would really mean, what
> > > ordering do we refer to here? Are we first ordering by time, then key,
> or
> > > first by key, then time?
> > > The allLatest method might be useful, but I don't really see how it
> would
> > > be used in practice if we have to scan the entire range of keys for all
> > the
> > > state stores, every single time.
> > >
> > > Maybe we could flesh the motivation behind those extra methods, but in
> > the
> > > interest of time, and moving the KIP forward it might make sense to
> file
> > a
> > > follow-up once we have more concrete use-cases.
> > >
> > > On naming:
> > > I also agree with Guozhang that "keys()" should be renamed. It feels a
> > bit
> > > of a misnomer, since it not only returns keys, but also the values.
> > >
> > > As far as what to rename it to, I would argue we already have some
> > > discrepancy between key-value stores using range() vs. window stores
> > using
> > > fetch().
> > > I assume we called the window method "fetch" instead of "get" because
> you
> > > might get back more than one window for the requested key.
> > >
> > > If we wanted to make things consistent with both existing key-value
> store
> > > naming and window store naming, we could do the following:
> > > Decide that "all" always refers to the entire range of keys,
> independent
> > of
> > > the window and similarly "range" always refers to a particular range of
> > > keys, irrespective of the window.
> > > We can then prefix methods with "fetch" to indicate that more than one
> > > window may be returned for each key in the range.
> > >
> > > This would give us:
> > > - a new fetchAll() method for all the keys, which makes it clear that
> you
> > > might get back the same key in different windows
> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
> given
> > > time range, again with possibly more than one window per key
> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
> long,
> > > long)  and deprecate the old one to indicate a range of keys
> > >
> > > One inconsistency I noted: the "Proposed Changes" section in your KIP
> > talks
> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer to
> > the
> > > all(from, to) method, but I'm sure you'll fix that once we decide on
> > > naming.
> > >
> > > On the implementation side:
> > > You mentioned that caching and rocksdb store have very different
> > key/value
> > > structures, and while it appears to be that way on the surface, the
> > > structure between the two is actually very similar. Keys in the cache
> are
> > > prefixed with a segment ID to ensure the ordering in the cache stays
> > > consistent with the rocksdb implementation, which maintains multiple
> > > rocksdb instances, one for each segment. So we just "artificially"
> mirror
> > > the segment structure in the cache.
> > >
> > > The reason for keeping the ordering consistent is pretty simple: keep
> in
> > > mind that when we query a cached window store we are effectively
> querying
> > > both the cache and the persistent rocksdb store at the same time,
> merging
> > > results from both. To make that merge as painless as possible, we
> ensure
> > > the ordering is consistent when querying a range of keys in both
> stores.
> > >
> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
> > window
> > > stores within a topology.
> > >
> > > Hope this clarifies some of the less trivial parts of caching window
> > store.
> > >
> > > Cheers,
> > > Xavier
> > >
> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <[email protected]>
> wrote:
> > >
> > > > Richard, Matthias:
> > > >
> > > > 0. Could you describe a bit what are the possible use cases of
> > > `allLatest`,
> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
> minimum
> > > > necessary amount, to avoid a swamp of new APIs that no one would
> really
> > > use
> > > > but just complicated the internal code base.
> > > >
> > > > 1. One minor comment on the other two new APIs: could we rename
> `keys`
> > to
> > > > `all` and `all` to `range` to be consistent with the other store's
> > APIs?
> > > >
> > > > 2. One meta comment on the implementation details: since both `keys`
> > and
> > > > `all` would likely touch multiple segments, we may need to use the
> > > internal
> > > > `SegmentIterator` class, but currently it always requires a Bytes
> from
> > > and
> > > > Bytes to for its constructor. What changes we need to make for that
> > > class?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > We should split KAFKA-4499 into several sub-issues with 4499 being
> > the
> > > > > parent issue.
> > > > > Adding the implementation to CachingWindowStore,
> RocksDBWindowStore,
> > > etc
> > > > > will each require the addition of a test and implementing the
> methods
> > > > which
> > > > > is not trivial.
> > > > > This way, it should be easier to manage the progress of the KIP.
> > > > >
> > > > >
> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
> > [email protected]
> > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for driving this and sorry for late response. With release
> > > > > > deadline it was pretty busy lately.
> > > > > >
> > > > > > Can you please add a description for the suggested method, what
> > they
> > > > are
> > > > > > going to return? It's a little unclear to me atm.
> > > > > >
> > > > > > It would also be helpful to discuss, for which use case each
> method
> > > is
> > > > > > useful. This might also help to identify potential gaps for which
> > > > > > another API might be more helpful.
> > > > > >
> > > > > > Also, we should talk about provided guarantees when using those
> > APIs
> > > > > > with regard to consistency -- not saying that we need to provide
> > > strong
> > > > > > guarantees, but he KIP should describe what user can expect.
> > > > > >
> > > > > >
> > > > > > -Matthias
> > > > > >
> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > > > > Hello, I would like to solicit review and comment on this issue
> > > (link
> > > > > > > below):
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Reply via email to