Thanks Richard for updating the wiki.

Made another pass over it, overall it LGTM. Just a few minor comments:

1) Could you update the title to reflect the function names accordingly?
Generally I think having `all / range` is better in terms of consistency
with key-value windows. I.e. queries with key are named as `get / fetch`
for kv / window stores, and queries without key are named as `range / all`.

2) In the Javadoc, "@throws NullPointerException if null is used for any key"
is not needed as the API does not have any key parameters.


Please feel free to start the voting thread, also at the same time start
implementing the PR in case you may find some impl corner cases that have
not been thought about in the wiki. That would help us discover any
potential blockers earlier.


Guozhang


On Wed, Oct 18, 2017 at 8:42 PM, Richard Yu <yohan.richard...@gmail.com>
wrote:

> Soliciting more feedback before vote.
>
> On Wed, Oct 18, 2017 at 8:26 PM, Richard Yu <yohan.richard...@gmail.com>
> wrote:
>
> > Is this KIP close to completion? Because we could start working on the
> > code itself now. (Its at about this stage).
> >
> > On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yohan.richard...@gmail.com>
> > wrote:
> >
> >> As Guozhang Wang mentioned earlier, we want to mirror the structure of
> >> similar Store class (namely KTable). The WindowedStore class might be
> >> unique in itself as it uses fetch() methods, but in my opinion,
> uniformity
> >> should be better suited for simplicity.
> >>
> >> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xav...@confluent.io>
> >> wrote:
> >>
> >>> Thank you Richard! Do you or Guozhang have any thoughts on my
> suggestions
> >>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the
> "range"
> >>> keyword for when we query a specific range of keys?
> >>>
> >>> Xavier
> >>>
> >>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yohan.richard...@gmail.com
> >
> >>> wrote:
> >>>
> >>> > Thanks for the clarifications, Xavier.
> >>> > I have removed most of the methods except for keys() and all() which
> >>> has
> >>> > been renamed to Guozhang Wang's suggestions.
> >>> >
> >>> > Hope this helps.
> >>> >
> >>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xav...@confluent.io>
> >>> > wrote:
> >>> >
> >>> > > Thanks for the KIP Richard, this is a very useful addition!
> >>> > >
> >>> > > As far as the API changes, I just have a few comments on the
> methods
> >>> that
> >>> > > don't seem directly related to the KIP title, and naming of course
> >>> :).
> >>> > > On the implementation, see my notes further down that will
> hopefully
> >>> > > clarify a few things.
> >>> > >
> >>> > > Regarding the "bonus" methods:
> >>> > > I agree with Guozhang that the KIP lacks proper motivation for
> >>> adding the
> >>> > > min, max, and allLatest methods.
> >>> > > It is also not clear to me what min and max would really mean, what
> >>> > > ordering do we refer to here? Are we first ordering by time, then
> >>> key, or
> >>> > > first by key, then time?
> >>> > > The allLatest method might be useful, but I don't really see how it
> >>> would
> >>> > > be used in practice if we have to scan the entire range of keys for
> >>> all
> >>> > the
> >>> > > state stores, every single time.
> >>> > >
> >>> > > Maybe we could flesh the motivation behind those extra methods, but
> >>> in
> >>> > the
> >>> > > interest of time, and moving the KIP forward it might make sense to
> >>> file
> >>> > a
> >>> > > follow-up once we have more concrete use-cases.
> >>> > >
> >>> > > On naming:
> >>> > > I also agree with Guozhang that "keys()" should be renamed. It
> feels
> >>> a
> >>> > bit
> >>> > > of a misnomer, since it not only returns keys, but also the values.
> >>> > >
> >>> > > As far as what to rename it to, I would argue we already have some
> >>> > > discrepancy between key-value stores using range() vs. window
> stores
> >>> > using
> >>> > > fetch().
> >>> > > I assume we called the window method "fetch" instead of "get"
> >>> because you
> >>> > > might get back more than one window for the requested key.
> >>> > >
> >>> > > If we wanted to make things consistent with both existing key-value
> >>> store
> >>> > > naming and window store naming, we could do the following:
> >>> > > Decide that "all" always refers to the entire range of keys,
> >>> independent
> >>> > of
> >>> > > the window and similarly "range" always refers to a particular
> range
> >>> of
> >>> > > keys, irrespective of the window.
> >>> > > We can then prefix methods with "fetch" to indicate that more than
> >>> one
> >>> > > window may be returned for each key in the range.
> >>> > >
> >>> > > This would give us:
> >>> > > - a new fetchAll() method for all the keys, which makes it clear
> >>> that you
> >>> > > might get back the same key in different windows
> >>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
> >>> given
> >>> > > time range, again with possibly more than one window per key
> >>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
> >>> long,
> >>> > > long)  and deprecate the old one to indicate a range of keys
> >>> > >
> >>> > > One inconsistency I noted: the "Proposed Changes" section in your
> KIP
> >>> > talks
> >>> > > about a "range(timeFrom, timeTo)" method, I think you meant to
> refer
> >>> to
> >>> > the
> >>> > > all(from, to) method, but I'm sure you'll fix that once we decide
> on
> >>> > > naming.
> >>> > >
> >>> > > On the implementation side:
> >>> > > You mentioned that caching and rocksdb store have very different
> >>> > key/value
> >>> > > structures, and while it appears to be that way on the surface, the
> >>> > > structure between the two is actually very similar. Keys in the
> >>> cache are
> >>> > > prefixed with a segment ID to ensure the ordering in the cache
> stays
> >>> > > consistent with the rocksdb implementation, which maintains
> multiple
> >>> > > rocksdb instances, one for each segment. So we just "artificially"
> >>> mirror
> >>> > > the segment structure in the cache.
> >>> > >
> >>> > > The reason for keeping the ordering consistent is pretty simple:
> >>> keep in
> >>> > > mind that when we query a cached window store we are effectively
> >>> querying
> >>> > > both the cache and the persistent rocksdb store at the same time,
> >>> merging
> >>> > > results from both. To make that merge as painless as possible, we
> >>> ensure
> >>> > > the ordering is consistent when querying a range of keys in both
> >>> stores.
> >>> > >
> >>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps
> multiple
> >>> > window
> >>> > > stores within a topology.
> >>> > >
> >>> > > Hope this clarifies some of the less trivial parts of caching
> window
> >>> > store.
> >>> > >
> >>> > > Cheers,
> >>> > > Xavier
> >>> > >
> >>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>> > >
> >>> > > > Richard, Matthias:
> >>> > > >
> >>> > > > 0. Could you describe a bit what are the possible use cases of
> >>> > > `allLatest`,
> >>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
> >>> minimum
> >>> > > > necessary amount, to avoid a swamp of new APIs that no one would
> >>> really
> >>> > > use
> >>> > > > but just complicated the internal code base.
> >>> > > >
> >>> > > > 1. One minor comment on the other two new APIs: could we rename
> >>> `keys`
> >>> > to
> >>> > > > `all` and `all` to `range` to be consistent with the other
> store's
> >>> > APIs?
> >>> > > >
> >>> > > > 2. One meta comment on the implementation details: since both
> >>> `keys`
> >>> > and
> >>> > > > `all` would likely touch multiple segments, we may need to use
> the
> >>> > > internal
> >>> > > > `SegmentIterator` class, but currently it always requires a Bytes
> >>> from
> >>> > > and
> >>> > > > Bytes to for its constructor. What changes we need to make for
> that
> >>> > > class?
> >>> > > >
> >>> > > >
> >>> > > > Guozhang
> >>> > > >
> >>> > > >
> >>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
> >>> yohan.richard...@gmail.com
> >>> > >
> >>> > > > wrote:
> >>> > > >
> >>> > > > > We should split KAFKA-4499 into several sub-issues with 4499
> >>> being
> >>> > the
> >>> > > > > parent issue.
> >>> > > > > Adding the implementation to CachingWindowStore,
> >>> RocksDBWindowStore,
> >>> > > etc
> >>> > > > > will each require the addition of a test and implementing the
> >>> methods
> >>> > > > which
> >>> > > > > is not trivial.
> >>> > > > > This way, it should be easier to manage the progress of the
> KIP.
> >>> > > > >
> >>> > > > >
> >>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
> >>> > matth...@confluent.io
> >>> > > >
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > > > Thanks for driving this and sorry for late response. With
> >>> release
> >>> > > > > > deadline it was pretty busy lately.
> >>> > > > > >
> >>> > > > > > Can you please add a description for the suggested method,
> what
> >>> > they
> >>> > > > are
> >>> > > > > > going to return? It's a little unclear to me atm.
> >>> > > > > >
> >>> > > > > > It would also be helpful to discuss, for which use case each
> >>> method
> >>> > > is
> >>> > > > > > useful. This might also help to identify potential gaps for
> >>> which
> >>> > > > > > another API might be more helpful.
> >>> > > > > >
> >>> > > > > > Also, we should talk about provided guarantees when using
> those
> >>> > APIs
> >>> > > > > > with regard to consistency -- not saying that we need to
> >>> provide
> >>> > > strong
> >>> > > > > > guarantees, but he KIP should describe what user can expect.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > -Matthias
> >>> > > > > >
> >>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> >>> > > > > > > Hello, I would like to solicit review and comment on this
> >>> issue
> >>> > > (link
> >>> > > > > > > below):
> >>> > > > > > >
> >>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> >>> > > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > -- Guozhang
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>



-- 
-- Guozhang

Reply via email to