re: Purge vs PurgeMessages/Records - I think we agree that PurgeRequest is
more consistent with current requests while PurgeRecords is more explicit
by itself. The question is whether it is more important to be consistent or
explicit. My opinion is that consistency is more important here. In general
user knows ProduceRequset and FetchRequest before any other request, which
means they will know that such request applies to "Records" if not
explicitly specified. Thus PurgeRequest seems a reasonable choice. Also, it
is not a problem if we have something else to purge in the future - we can
simply name the new request as PurgeXXXRequest similar to
DeleteTopicsRequest. Does this make sense?



On Fri, Mar 3, 2017 at 11:35 AM, Jeff Widman <j...@netskope.com> wrote:

> re: Purge vs PurgeMessages/Records - I also prefer that it be more explicit
> about what is being purged. Despite the inconsistency with Fetch/Produce,
> because it's explicit about what's being purged there shouldn't be
> additional confusion. Who knows what in the future might need purging?
> Adding the extra word provides optionality in the future with very little
> cost.
>
> re: Records vs Messages... it would be nice if it was consistent across
> Kafka. Messages has a sense of expiring once received which seems like a
> better fit for a system like Kafka that has things flowing through it and
> then deleted, whereas records has a connotation of being kept for a
> unspecified period of time, such as a a more typical database scenario. But
> that's a minor point, really I just prefer it's consistent.
>
> On Fri, Mar 3, 2017 at 6:34 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > First of all, sorry to arrive late on this.
> >
> > Jun, do you have a reference that states that "purge" means to remove a
> > portion? If I do "define: purge" on Google, one of the definitions is
> > "physically remove (something) completely."
> >
> > In the PR, I was asking about the reasoning more than suggesting a
> change.
> > But let me clarify my thoughts. There are 2 separate things to think
> about:
> >
> > 1. The protocol change.
> >
> > It's currently called Purge with no mention of what it's purging. This is
> > consistent with Fetch and Produce and it makes sense if we reserve the
> word
> > "Purge" for dealing with records/messages. Having said that, I don't
> think
> > this is particularly intuitive for people who are not familiar with Kafka
> > and its history. The number of APIs in the protocol keeps growing and it
> > would be better to be explicit about what is being purged/deleted, in my
> > opinion. If we are explicit, then we need to decide what to call it,
> since
> > there is no precedent. A few options: PurgeRecords, PurgeMessages,
> > PurgeData, DeleteRecords, DeleteMessages, DeleteData (I personally don't
> > like the Data suffix as it's not used anywhere else).
> >
> > 2. The AdminClient change.
> >
> > Regarding the name of the method, I'd prefer to avoid the `Data` suffix
> > because I don't think we use that anywhere else (please correct me if I'm
> > wrong). In the Producer, we have `send(ProduceRecord)` and in the
> consumer
> > we have `ConsumerRecords poll(...)`. So maybe, the suffix should be
> > `Records`? Like in the protocol, we still need to decide if we want to
> use
> > `purge` or `delete`. We seem to use `delete` for all the other methods in
> > the AdminClient, so unless we have a reason to use a different name, it
> > seems like we should be consistent.
> >
> > The proposed method signature is `Future<Map<TopicPartition,
> > PurgeDataResult>> purgeDataBefore(Map<TopicPartition, Long>
> > offsetForPartition)`. In the AdminClient KIP (KIP-117), we are using
> > classes to encapsulate the parameters and result. We should probably do
> the
> > same in this KIP for consistency. Once we do that, we should also
> consider
> > if `Before` should be in the method name or should be in the parameter
> > class. Just an example to describe what I mean, one could say
> > `deleteRecords(DeleteRecordsParams.before(offsetsForPartition)`. That
> way,
> > we could provide a different way of deleting by simply updating the
> > parameters class.
> >
> > Some food for thought. :)
> >
> > Ismael
> >
> >
> >
> > On Thu, Mar 2, 2017 at 5:46 PM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hey Jun,
> > >
> > > Thanks for this information. I am not aware of this difference between
> > the
> > > purge and delete. Given this difference, I will prefer to the existing
> > name
> > > of the purge.
> > >
> > > Ismael, please let me if you are strong about using delete.
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Thu, Mar 2, 2017 at 9:40 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Dong,
> > > >
> > > > It seems that delete means removing everything while purge means
> > > removing a
> > > > portion. So, it seems that it's better to be able to distinguish the
> > two?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Mar 1, 2017 at 1:57 PM, Dong Lin <lindon...@gmail.com>
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have updated the KIP to include a script that allows user to
> purge
> > > data
> > > > > by providing a map from partition to offset. I think this script
> may
> > be
> > > > > convenience and useful, e.g., if user simply wants to purge all
> data
> > of
> > > > > given partitions from command line. I am wondering if anyone object
> > > this
> > > > > script or has suggestions on the interface.
> > > > >
> > > > > Besides, Ismael commented in the pull request that it may be better
> > to
> > > > > rename PurgeDataBefore() to DeleteDataBefore() and rename
> > PurgeRequest
> > > to
> > > > > DeleteRequest. I think it may be a good idea because
> kafka-topics.sh
> > > > > already use "delete" as an option. Personally I don't have strong
> > > > > preference between "purge" and "delete". I am wondering if anyone
> > > object
> > > > to
> > > > > this change.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 1, 2017 at 9:46 AM, Dong Lin <lindon...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Ismael,
> > > > > >
> > > > > > I actually mean log_start_offset. I realized that it is a better
> > name
> > > > > > after I start implementation because "logStartOffset" is already
> > used
> > > > in
> > > > > > Log.scala and LogCleanerManager.scala. So I changed it from
> > > > > > log_begin_offset to log_start_offset in the patch. But I forgot
> to
> > > > update
> > > > > > the KIP and specify it in the mailing thread.
> > > > > >
> > > > > > Thanks for catching this. Let me update the KIP to reflect this
> > > change.
> > > > > >
> > > > > > Dong
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 1, 2017 at 6:15 AM, Ismael Juma <ism...@juma.me.uk>
> > > wrote:
> > > > > >
> > > > > >> Hi Dong,
> > > > > >>
> > > > > >> When you say "logStartOffset", do you mean "log_begin_offset "?
> I
> > > > could
> > > > > >> only find the latter in the KIP. If so, would log_start_offset
> be
> > a
> > > > > better
> > > > > >> name?
> > > > > >>
> > > > > >> Ismael
> > > > > >>
> > > > > >> On Tue, Feb 28, 2017 at 4:26 AM, Dong Lin <lindon...@gmail.com>
> > > > wrote:
> > > > > >>
> > > > > >> > Hi Jun and everyone,
> > > > > >> >
> > > > > >> > I would like to change the KIP in the following way.
> Currently,
> > if
> > > > any
> > > > > >> > replica if offline, the purge result for a partition will
> > > > > >> > be NotEnoughReplicasException and its low_watermark will be 0.
> > The
> > > > > >> > motivation for this approach is that we want to guarantee that
> > the
> > > > > data
> > > > > >> > before purgedOffset has been deleted on all replicas of this
> > > > partition
> > > > > >> if
> > > > > >> > purge result indicates success.
> > > > > >> >
> > > > > >> > But this approach seems too conservative. It should be
> > sufficient
> > > in
> > > > > >> most
> > > > > >> > cases to just tell user success and set low_watermark to
> minimum
> > > > > >> > logStartOffset of all live replicas in the PurgeResponse if
> > > > > >> logStartOffset
> > > > > >> > of all live replicas have reached purgedOffset. This is
> because
> > > for
> > > > an
> > > > > >> > offline replicas to become online and be elected leader, it
> > should
> > > > > have
> > > > > >> > received one FetchReponse from the current leader which should
> > > tell
> > > > it
> > > > > >> to
> > > > > >> > purge beyond purgedOffset. The benefit of doing this change is
> > > that
> > > > we
> > > > > >> can
> > > > > >> > allow purge operation to succeed when some replica is offline.
> > > > > >> >
> > > > > >> > Are you OK with this change? If so, I will go ahead to update
> > the
> > > > KIP
> > > > > >> and
> > > > > >> > implement this behavior.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Dong
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin <
> lindon...@gmail.com
> > >
> > > > > wrote:
> > > > > >> >
> > > > > >> > > Hey Jun,
> > > > > >> > >
> > > > > >> > > Do you have time to review the KIP again or vote for it?
> > > > > >> > >
> > > > > >> > > Hey Ewen,
> > > > > >> > >
> > > > > >> > > Can you also review the KIP again or vote for it? I have
> > > discussed
> > > > > >> with
> > > > > >> > > Radai and Becket regarding your concern. We still think
> > putting
> > > it
> > > > > in
> > > > > >> > Admin
> > > > > >> > > Client seems more intuitive because there is use-case where
> > > > > >> application
> > > > > >> > > which manages topic or produces data may also want to purge
> > > data.
> > > > It
> > > > > >> > seems
> > > > > >> > > weird if they need to create a consumer to do this.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Dong
> > > > > >> > >
> > > > > >> > > On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
> > > > > >> > > gharatmayures...@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > >> +1 (non-binding)
> > > > > >> > >>
> > > > > >> > >> Thanks,
> > > > > >> > >>
> > > > > >> > >> Mayuresh
> > > > > >> > >>
> > > > > >> > >> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin <
> > lindon...@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >> > >>
> > > > > >> > >> > Sorry for the duplicated email. It seems that gmail will
> > put
> > > > the
> > > > > >> > voting
> > > > > >> > >> > email in this thread if I simply replace DISCUSS with
> VOTE
> > in
> > > > the
> > > > > >> > >> subject.
> > > > > >> > >> >
> > > > > >> > >> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin <
> > > > lindon...@gmail.com>
> > > > > >> > wrote:
> > > > > >> > >> >
> > > > > >> > >> > > Hi all,
> > > > > >> > >> > >
> > > > > >> > >> > > It seems that there is no further concern with the
> > KIP-107.
> > > > At
> > > > > >> this
> > > > > >> > >> point
> > > > > >> > >> > > we would like to start the voting process. The KIP can
> be
> > > > found
> > > > > >> at
> > > > > >> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 107
> > > > > >> > >> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
> > > > > >> > >> > >
> > > > > >> > >> > > Thanks,
> > > > > >> > >> > > Dong
> > > > > >> > >> > >
> > > > > >> > >> >
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >> --
> > > > > >> > >> -Regards,
> > > > > >> > >> Mayuresh R. Gharat
> > > > > >> > >> (862) 250-7125
> > > > > >> > >>
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to