Conceptually, do the position methods only apply to topics you've subscribed to, or do they apply to all topics in the cluster?
E.g., could I retrieve or set the committed position of any partition? The positive use case for having access to all partition information would be to setup an active monitoring system (that can feed the positions to a pretty GUI, for instance). A downside is that you could have invalid partition offsets committed (perhaps being reset to 0 by an overzealous client). --Tom On Thu, Feb 13, 2014 at 5:15 PM, Pradeep Gollakota <pradeep...@gmail.com>wrote: > Hi Neha, > > 6. It seems like #4 can be avoided by using Map<TopicPartition, > >> Long> or Map<TopicPartition, TopicPartitionOffset> as the argument type. > > > > How? lastCommittedOffsets() is independent of positions(). I'm not sure I > > understood your suggestion. > > I think of subscription as you're subscribing to a Set of TopicPartitions. > Because the argument to positions() is TopicPartitionOffset ... it's > conceivable that the method can be called with two offsets for the same > TopicPartition. One way to handle this, is to accept either the first or > the last offset for a TopicPartition. However, if the argument type is > changed to Map<TopicPartition, Long> it precludes the possibility of > getting duplicate offsets of the same TopicPartition. > > 7. To address #3, maybe we can return List<TopicPartitionOffset> that > are > >> invalid. > > > > I don't particularly see the advantage of returning a list of invalid > > partitions from position(). It seems a bit awkward to return a list to > > indicate what is obviously a bug. Prefer throwing an error since the user > > should just fix that logic. > > I'm not sure if an Exception is needed or desirable here. I don't see this > as a catastrophic failure or a non-recoverable failure. Even if we just > write the bad offsets to a log file and call it a day, I'm ok with that. > But my main goal is to communicate to the API users somehow that they've > provided bad offests which are simply being ignored. > > Hi Jay, > > I would also like to shorten the name TopicOffsetPosition. Offset and > > Position are duplicative of each other. So perhaps we could call it a > > PartitionOffset or a TopicPosition or something like that. In general > class > > names that are just a concatenation of the fields (e.g. > > TopicAndPartitionAndOffset) seem kind of lazy to me since the name > doesn't > > really describe it just enumerates. But that is more of a nit pick. > > > 1. Did you mean to say TopicPartitionOffset instead of > TopicOffsetPosition? > 2. +1 on PartitionOffset > > The lastCommittedPosition is particular bothersome because: > > 1. The name is weird and long > > 2. It returns a list of results. But how can you use the list? The only > way > > to use the list is to make a map of tp=>offset and then look up results > in > > this map (or do a for loop over the list for the partition you want). > > This is sort of what I was talking about in my previous email. My > suggestion was to change the return type to Map<TopicPartition, Long>. > > What if we made it: > > long position(TopicPartition tp) > > void seek(TopicOffsetPosition p) > > long committed(TopicPartition tp) > > void commit(TopicOffsetPosition...); > > > 1. Absolutely love the idea of position(TopicPartition tp). > 2. I think we also need to provide a method for accessing all positions > positions() which maybe returns a Map<TopicPartition, Long>? > 3. What is the difference between position(TopicPartition tp) and > committed(TopicPartition > tp)? > 4. +1 on commit(PartitionOffset...) > 5. +1 on seek(PartitionOffset p) > 6. We should also provide a seek(PartitionOffset... offsets) > > Finally, in all the methods where we're using varargs, we should use an > appropriate Collection data structure. For example, for the > subscribe(TopicPartition... > partitions) method, I think a more accurate API would be > subscribe(Set<TopicPartition> > partitions). This allows for the code to be self-documenting. >