Re: Possible implementation for KAFKA-560
Hi,

Thanks for the answer. Looking at the high water mark, the logic would then be to flag the partitions that have high_watermark == log_start_offset.

In addition, I'm thinking that having the leader fulfill that criterion is enough to flag a partition; the replicas would be checked only if the user requests it.

On Fri, 21 Jun 2019 at 23:35, Colin McCabe wrote:

> I don't think this requires a change in the protocol. It seems like you
> should be able to use the high water mark to figure something out here?
>
> best,
> Colin
>
>
> On Fri, Jun 21, 2019, at 04:56, Carlos Manuel Duclos-Vergara wrote:
> > Hi,
> >
> > This is an ancient task, but I feel it is still current today (specially
> > since as somebody that deals with a Kafka cluster I know that this happens
> > more often than not).
> >
> > The task is about garbage collection of topics in a sort of automated way.
> > After some consideration I started a prototype implementation based on a
> > manual process:
> >
> > 1. Using the cli, I can use the --describe-topic to get a list of topics
> > that have size 0
> > 2. Massage that list into something that can be then fed into the cli and
> > remove the topics that have size 0.
> >
> > The guiding principle here is the assumption that abandoned topics will
> > eventually have size 0, because all records will expire. This is not true
> > for all topics, but it covers a large portion of them and having something
> > like this would help admins to find "suspicious" topics at least.
> >
> > I started implementing this change and I realized that it would require a
> > change in the protocol, because the sizes are never sent over the wire.
> > Funny enough we collect the sizes of the log files, but we do not send them.
> >
> > I think this kind of changes will require a KIP, but I wanted to ask what
> > others think about this.
> >
> > The in-progress implementation of this can be found here:
> > https://github.com/carlosduclos/kafka/commit/0dffe5e131c3bd32b77f56b9be8eded89a96df54
> >
> > Comments?
> >
> > --
> > Carlos Manuel Duclos Vergara
> > Backend Software Developer
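
The criterion proposed in this reply (flag a partition when high_watermark == log_start_offset on the leader) could be sketched roughly as follows. This is only an illustration of the check, not the code from the linked commit: the `PartitionOffsets` record and both helper functions are hypothetical names, and the offsets are assumed to have already been fetched from each partition leader by some other means.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PartitionOffsets:
    """Hypothetical snapshot of one partition's offsets, as seen by its leader."""
    topic: str
    partition: int
    log_start_offset: int
    high_watermark: int


def is_empty(p: PartitionOffsets) -> bool:
    # No readable records remain when the high water mark equals the log start offset.
    return p.high_watermark == p.log_start_offset


def flagged_partitions(offsets: Iterable[PartitionOffsets]) -> List[PartitionOffsets]:
    """Return the partitions that satisfy the criterion from the thread."""
    return [p for p in offsets if is_empty(p)]


def candidate_topics(offsets: Iterable[PartitionOffsets]) -> List[str]:
    """Return topics whose partitions are ALL empty (assumption: a topic is
    only 'suspicious' when every one of its partitions is flagged)."""
    by_topic: Dict[str, List[PartitionOffsets]] = {}
    for p in offsets:
        by_topic.setdefault(p.topic, []).append(p)
    return sorted(t for t, parts in by_topic.items() if all(is_empty(p) for p in parts))


# Made-up sample data for illustration only.
snapshot = [
    PartitionOffsets("orders", 0, 120, 120),
    PartitionOffsets("orders", 1, 80, 95),
    PartitionOffsets("legacy-events", 0, 42, 42),
    PartitionOffsets("legacy-events", 1, 0, 0),
]

if __name__ == "__main__":
    for p in flagged_partitions(snapshot):
        print(f"empty partition: {p.topic}-{p.partition}")
    print("candidate topics:", candidate_topics(snapshot))
```

Requiring every partition to be empty before flagging a topic is a conservative choice made for this sketch; the thread itself only settles on the per-partition check against the leader.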
Possible implementation for KAFKA-560
Hi,

This is an ancient task, but I feel it is still relevant today (especially since, as somebody who deals with a Kafka cluster, I know that this happens more often than not).

The task is about garbage collection of topics in a sort of automated way. After some consideration I started a prototype implementation based on a manual process:

1. Using the cli, I can use the --describe-topic to get a list of topics that have size 0.
2. Massage that list into something that can then be fed into the cli to remove the topics that have size 0.

The guiding principle here is the assumption that abandoned topics will eventually have size 0, because all records will expire. This is not true for all topics, but it covers a large portion of them, and having something like this would at least help admins find "suspicious" topics.

I started implementing this change and realized that it would require a change in the protocol, because the sizes are never sent over the wire. Funnily enough, we collect the sizes of the log files, but we do not send them.

I think this kind of change will require a KIP, but I wanted to ask what others think about this.

The in-progress implementation of this can be found here:
https://github.com/carlosduclos/kafka/commit/0dffe5e131c3bd32b77f56b9be8eded89a96df54

Comments?

--
Carlos Manuel Duclos Vergara
Backend Software Developer
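
The two manual steps described above might be scripted along these lines. This is a rough sketch under stated assumptions, not the prototype from the linked commit: the per-partition sizes are assumed to have been collected already (how they are obtained is exactly the open question in this message), the function names are hypothetical, and the bootstrap address is a placeholder. The script only prints the delete commands so that an admin can review them before running anything.

```python
import shlex
from typing import Dict, List


def zero_size_topics(partition_sizes: Dict[str, List[int]]) -> List[str]:
    """Step 1: keep topics whose partitions all report a size of 0 bytes.

    partition_sizes maps a topic name to the byte size of each of its
    partitions (assumed input; topics with no size data are skipped).
    """
    return sorted(
        topic
        for topic, sizes in partition_sizes.items()
        if sizes and all(size == 0 for size in sizes)
    )


def delete_commands(topics: List[str], bootstrap_server: str = "localhost:9092") -> List[str]:
    """Step 2: turn the list into kafka-topics.sh delete invocations (not executed)."""
    return [
        "kafka-topics.sh --bootstrap-server {} --delete --topic {}".format(
            shlex.quote(bootstrap_server), shlex.quote(topic)
        )
        for topic in topics
    ]


# Made-up sizes for illustration only.
sizes = {
    "orders": [1024, 0],
    "stale-clicks": [0, 0, 0],
    "old-audit": [0],
    "no-data-yet": [],
}

if __name__ == "__main__":
    for command in delete_commands(zero_size_topics(sizes)):
        print(command)
```

Printing instead of executing keeps the step that destroys data under human control, which fits the stated goal of helping admins find "suspicious" topics rather than removing them blindly.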
Re: Contributor permissions
Hi,

Yes, I read the guidelines. I forgot to mention the JIRA ID. I already pushed a PR from my fork.

My Jira-ID is carlos.duclos.

Regards

On Fri, 7 Jun 2019 at 12:03, Bruno Cadonna wrote:

> Hi Carlos,
>
> It's great that you want to contribute to Apache Kafka.
>
> Have you already read the instructions on how to contribute to Kafka?
>
> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
> https://kafka.apache.org/contributing.html
>
> To assign Jira tickets to yourself, you need to be added to the list
> of contributors. For that you need to sign-up to the Apache Jira
> (https://issues.apache.org/jira/projects/KAFKA) and choose a Jira-ID
> for yourself. Then you send an e-mail that contains the Jira-ID and a
> request to be added to the list of contributors to this mailing list
> (you can find examples in the list archives). Once a project committer
> has added you to the list of contributors, you can assign tickets to
> yourself.
>
> Best,
> Bruno
>
>
> On Fri, Jun 7, 2019 at 11:18 AM Dulvin Witharane wrote:
> >
> > Hi,
> >
> > I think for the first JIRA ticket you have to request to be added as the
> > assignee in a new thread. Afterwards you can assign them yourself and keep
> > working.
> >
> > On Fri, Jun 7, 2019 at 12:28 PM Carlos Manuel Duclos-Vergara <
> > carlos.duc...@schibsted.com> wrote:
> >
> > > Hi,
> > >
> > > I'd like to start working on some of the newbie tasks (I got permission
> > > from my employer to use up to 10% of my time on this project). I have
> > > already found a couple of tasks that I'd like to work on, so I'd like to be
> > > able to assign those tasks to me. What is the procedure?
> > >
> > > Regards
> > >
> > > --
> > > Carlos Manuel Duclos Vergara
> > > Backend Software Developer
> > >
> > --
> > Witharane, DRH
> > R & D Engineer
> > Synopsys Lanka (Pvt) Ltd.
> > Borella, Sri Lanka
> > 0776746781
> >
> > Sent from my iPhone

--
Carlos Manuel Duclos Vergara
Backend Software Developer
Contributor permissions
Hi,

I'd like to start working on some of the newbie tasks (I got permission from my employer to use up to 10% of my time on this project). I have already found a couple of tasks that I'd like to work on, so I'd like to be able to assign those tasks to myself. What is the procedure?

Regards

--
Carlos Manuel Duclos Vergara
Backend Software Developer