Re: First time patch submitter advice

2020-06-14 Thread Gwen Shapira
Hi,

1. Unfortunately, you need to get a committer to approve running the tests.
I just gave the green-light on your PR.
2. You can hope that committers will see your PR, but sometimes things get
lost. If you know someone who is familiar with that area of the code, it is
a good idea to ping them.
3. We do have some flaky tests. You can see that Jenkins will run 3
parallel builds; if some of them pass and the committer confirms that the
failures are not related to your code, we are OK to merge. Obviously, if
you end up tracking the flaky tests down and fixing them, everyone will be very grateful.

Hope this helps,

Gwen

On Sun, Jun 14, 2020 at 5:52 PM Michael Carter <
michael.car...@instaclustr.com> wrote:

> Hi all,
>
> I’ve submitted a patch for the first time
> (https://github.com/apache/kafka/pull/8844), and I have a couple of
> questions that I’m hoping someone can help me answer.
>
> I’m a little unclear what happens after that patch has been submitted. The
> coding guidelines say Jenkins will run tests automatically, but I don’t see
> any results anywhere. Have I misunderstood what should happen, or do I just
> not know where to look?
> Should I be attempting to find reviewers for the change myself, or is that
> done independently of the patch submitter?
>
> Also, in resolving a couple of conflicts that have arisen after the patch
> was first submitted, I noticed that there are now failing unit tests that
> have nothing to do with my change. Is there a convention on how to deal
> with these? Should it be something that I try to fix on my branch?
>
> Any thoughts are appreciated.
>
> Thanks,
> Michael



-- 
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


KAFKA-8362: Partition snapshot still exist in cleaner-offset-checkpoint after the intra-disk move

2020-06-14 Thread Ming Liu
Hi Kafka,
In Kafka 2.5, it seems KAFKA-8362 still exists. The problem is that, for a
compacted topic, after an intra-broker disk re-assignment the
cleaner-offset-checkpoint files on both disks contain an entry for that
partition. The checkpoint file on the previous disk holds the stale entry
and should have it removed instead.
This causes problems during the log-cleaning phase.
I wonder whether anybody in the community is looking at or fixing this problem?
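For illustration only (hypothetical topic, partition, and offset; the checkpoint file format, as I understand it, is a version line, an entry count, and one "topic partition offset" line per partition), the stale file left on the old disk might look like:
```
0
1
my-compacted-topic 3 4500
```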

Thanks!
Ming


[jira] [Created] (KAFKA-10159) MirrorSourceConnector don`t work on connect-distributed.sh

2020-06-14 Thread cosmozhu (Jira)
cosmozhu created KAFKA-10159:


 Summary: MirrorSourceConnector don`t work on connect-distributed.sh
 Key: KAFKA-10159
 URL: https://issues.apache.org/jira/browse/KAFKA-10159
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 2.4.1
 Environment: centos7
Reporter: cosmozhu
 Fix For: 2.4.1
 Attachments: connectDistributed.out

Hi,
I want to run a MirrorSourceConnector with connect-distributed.
The connector config looks like this:
```
{
  "name": "cosmo-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "cosmo",
    "target.cluster.alias": "nana",
    "source.cluster.bootstrap.servers": "192.168.4.42:9092,192.168.4.42:9093,192.168.4.42:9094",
    "topics": ".*"
  }
}
```

When I post the REST request, it returns:
```
{
  "name": "cosmo-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "target.cluster.alias": "nana",
    "topics": ".*",
    "source.cluster.alias": "cosmo",
    "name": "cosmo-source",
    "source.cluster.bootstrap.servers": "192.168.4.42:9092,192.168.4.42:9093,192.168.4.42:9094"
  },
  "tasks": [],
  "type": "source"
}
```
The tasks array is empty.

It's obvious that something's wrong here.

In connectDistributed.out:
```
org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
```
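For what it's worth, one possible cause of this error - an assumption, not confirmed from this report alone - is that only the source cluster's bootstrap servers are configured, while MirrorSourceConnector also needs target.cluster.bootstrap.servers to create its target-side clients. A config sketch with that property added (the target address below is hypothetical) would look like:
```
{
  "name": "cosmo-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "cosmo",
    "target.cluster.alias": "nana",
    "source.cluster.bootstrap.servers": "192.168.4.42:9092,192.168.4.42:9093,192.168.4.42:9094",
    "target.cluster.bootstrap.servers": "192.168.4.52:9092",
    "topics": ".*"
  }
}
```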

Full logs are in the attachment.

Thanks for any help.


 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


First time patch submitter advice

2020-06-14 Thread Michael Carter
Hi all,

I’ve submitted a patch for the first time
(https://github.com/apache/kafka/pull/8844), and I have a couple of questions
that I’m hoping someone can help me answer.

I’m a little unclear what happens after that patch has been submitted. The 
coding guidelines say Jenkins will run tests automatically, but I don’t see any 
results anywhere. Have I misunderstood what should happen, or do I just not 
know where to look?
Should I be attempting to find reviewers for the change myself, or is that done 
independently of the patch submitter?

Also, in resolving a couple of conflicts that have arisen after the patch was 
first submitted, I noticed that there are now failing unit tests that have 
nothing to do with my change. Is there a convention on how to deal with these? 
Should it be something that I try to fix on my branch?

Any thoughts are appreciated.

Thanks,
Michael

Re: Broker side round robin on topic partitions when receiving messages

2020-06-14 Thread M. Manna
Vinicius,

If you have an idea of what you want, and how it is going to improve the
current Kafka offering, perhaps you could submit a KIP?

The guidelines are provided on Apache Kafka Wiki (Confluence).

Thanks,


On Sun, 14 Jun 2020 at 22:26, Vinicius Scheidegger <
vinicius.scheideg...@gmail.com> wrote:

> Hi Colin,
>
> Thanks for the reply. Actually, the RoundRobinPartitioner won't do an equal
> distribution when working with multiple producers. One producer does not
> know about the others. If you consider that producers are producing messages
> at random times, in the worst-case scenario all producers can be in sync and
> a single partition could receive as many messages as there are producers.
> It's easy to generate evidence of this.
>
> I have asked this question on the users mail list too (and on Slack and on
> Stackoverflow).
>
> Kafka currently has no means of doing round robin across multiple
> producers or on the broker side.
>
> This means there is currently NO GUARANTEE of equal distribution across
> partitions, as partition selection is decided by the producer.
>
> The result is unbalanced consumption when working with consumer groups,
> and the options are: creating a custom shared partitioner, relying on Kafka's
> random partitioning, or introducing a middleman between topics (all of them
> having big cons).
>
> I thought of asking here to see whether this is a topic that could concern
> other developers (and maybe understand whether this could be a KIP
> discussion)
>
> Maybe I'm missing something... I would like to know.
>
> According to my interpretation of the code (I have just read through some
> classes), there is currently no way to do partition balancing on the
> broker - the producer sends messages directly to partition leaders, so the
> partition currently needs to be decided on the producer.
>
> I understand that in order to perform round robin across partitions of a
> topic when working with multiple producers, some development needs to be
> done. Am I right?
>
>
> Thanks
>
>
> On Fri, Jun 12, 2020, 10:57 PM Colin McCabe  wrote:
>
> > Hi Vinicius,
> >
> > This question seems like a better fit for the user mailing list rather
> > than the developer mailing list.
> >
> > Anyway, if I understand correctly, you are asking if the producer can
> > choose to assign partitions in a round-robin fashion rather than based on
> > the key.  The answer is, you can, by using RoundRobinPartitioner. (again,
> > if I'm understanding the question correctly).
> >
> > best,
> > Colin
> >
> > On Tue, Jun 9, 2020, at 00:48, Vinicius Scheidegger wrote:
> > > Anyone?
> > >
> > > On Fri, Jun 5, 2020 at 2:42 PM Vinicius Scheidegger <
> > > vinicius.scheideg...@gmail.com> wrote:
> > >
> > > > Does anyone know how could I perform a load balance to distribute
> > equally
> > > > the messages to all consumers within the same consumer group having
> > > > multiple producers?
> > > >
> > > > Is this a conceptual flaw on Kafka, wasn't it thought for equal
> > > > distribution with multiple producers or am I missing something?
> > > > I've asked on Stack Overflow, on Kafka users mailing group, here (on
> > Kafka
> > > > Devs) and on Slack - and still have no definitive answer (actually
> > most of
> > > > the time I got no answer at all)
> > > >
> > > > Would something like this even be possible in the way Kafka is
> > currently
> > > > designed?
> > > > How does proposing for a KIP work?
> > > >
> > > > Thanks,
> > > >
> > > >
> > > >
> > > > On Thu, May 28, 2020, 3:44 PM Vinicius Scheidegger <
> > > > vinicius.scheideg...@gmail.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I'm trying to understand a little bit more about how Kafka works.
> > > >> I have a design with multiple producers writing to a single topic
> and
> > > >> multiple consumers in a single Consumer Group consuming message from
> > this
> > > >> topic.
> > > >>
> > > >> My idea is to distribute the messages from all producers equally.
> From
> > > >> reading the documentation I understood that the partition is always
> > > >> selected by the producer. Is that correct?
> > > >>
> > > >> I'd also like to know if there is an out of the box option to assign
> > the
> > > >> partition via a round robin *on the broker side *to guarantee equal
> > > >> distribution of the load - if possible to each consumer, but if not
> > > >> possible, at least to each partition.
> > > >>
> > > >> If my understanding is correct, it looks like in a multiple producer
> > > >> scenario there is lack of support from Kafka regarding load
> balancing
> > and
> > > >> customers have to either stick to the hash of the key (random
> > distribution,
> > > >> although it would guarantee same key goes to the same partition) or
> > they
> > > >> have to create their own logic on the producer side (i.e. by sharing
> > memory)
> > > >>
> > > >> Am I missing something?
> > > >>
> > > >> Thank you,
> > > >>
> > > >> Vinicius Scheidegger
> > > >>
> > > >
> > >
> >
>


Re: Kafka - replicas do not heal themselves by default

2020-06-14 Thread Israel Ekpo
It is always good to have context. It would be helpful to state the edition
of the book, the version of Kafka, the deployment architecture, and other
environment details.

What edition of the book are you referring to?

What version of Kafka is used in the book?

How the producers and consumers interact with the brokers influences what
happens and what the impact is.

The project is rapidly evolving, and if you are running on Kubernetes, the
self-healing aspect happens automatically in my experience for most use
cases.

Could you share more details about the actual scenarios you are working on
outside the book?



On Sun, Jun 14, 2020 at 1:56 PM Nag Y  wrote:

> I am going through Kafka in Action and came across the following
> phrase:
>
> *One of the things to note with Kafka is that replicas do not heal
> themselves by default. If you lose a broker on which one of your copies of
> a partition existed, Kafka does not currently create a new copy. I mention
> this since some users are used to filesystems like HDFS that will maintain
> that replication number if a block is seen as corrupted or failed. So an
> important item to look at with monitoring the health of your system might
> be how many of your ISRs are indeed matching your intended number.*
>
>
> It is interesting, as in most distributed systems the system will
> try to create additional replicas if replicas are not available. I found it
> strange. Any reason to do so?
>


Re: Broker side round robin on topic partitions when receiving messages

2020-06-14 Thread Vinicius Scheidegger
Hi Colin,

Thanks for the reply. Actually, the RoundRobinPartitioner won't do an equal
distribution when working with multiple producers. One producer does not
know about the others. If you consider that producers are producing messages
at random times, in the worst-case scenario all producers can be in sync and
a single partition could receive as many messages as there are producers.
It's easy to generate evidence of this.

I have asked this question on the users mail list too (and on Slack and on
Stackoverflow).

Kafka currently has no means of doing round robin across multiple
producers or on the broker side.

This means there is currently NO GUARANTEE of equal distribution across
partitions, as partition selection is decided by the producer.

The result is unbalanced consumption when working with consumer groups,
and the options are: creating a custom shared partitioner, relying on Kafka's
random partitioning, or introducing a middleman between topics (all of them
having big cons).

I thought of asking here to see whether this is a topic that could concern
other developers (and maybe understand whether this could be a KIP
discussion)

Maybe I'm missing something... I would like to know.

According to my interpretation of the code (I have just read through some
classes), there is currently no way to do partition balancing on the
broker - the producer sends messages directly to partition leaders, so the
partition currently needs to be decided on the producer.

I understand that in order to perform round robin across partitions of a
topic when working with multiple producers, some development needs to be
done. Am I right?


Thanks


On Fri, Jun 12, 2020, 10:57 PM Colin McCabe  wrote:

> Hi Vinicius,
>
> This question seems like a better fit for the user mailing list rather
> than the developer mailing list.
>
> Anyway, if I understand correctly, you are asking if the producer can
> choose to assign partitions in a round-robin fashion rather than based on
> the key.  The answer is, you can, by using RoundRobinPartitioner. (again,
> if I'm understanding the question correctly).
>
> best,
> Colin
>
> On Tue, Jun 9, 2020, at 00:48, Vinicius Scheidegger wrote:
> > Anyone?
> >
> > On Fri, Jun 5, 2020 at 2:42 PM Vinicius Scheidegger <
> > vinicius.scheideg...@gmail.com> wrote:
> >
> > > Does anyone know how could I perform a load balance to distribute
> equally
> > > the messages to all consumers within the same consumer group having
> > > multiple producers?
> > >
> > > Is this a conceptual flaw on Kafka, wasn't it thought for equal
> > > distribution with multiple producers or am I missing something?
> > > I've asked on Stack Overflow, on Kafka users mailing group, here (on
> Kafka
> > > Devs) and on Slack - and still have no definitive answer (actually
> most of
> > > the time I got no answer at all)
> > >
> > > Would something like this even be possible in the way Kafka is
> currently
> > > designed?
> > > How does proposing for a KIP work?
> > >
> > > Thanks,
> > >
> > >
> > >
> > > On Thu, May 28, 2020, 3:44 PM Vinicius Scheidegger <
> > > vinicius.scheideg...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm trying to understand a little bit more about how Kafka works.
> > >> I have a design with multiple producers writing to a single topic and
> > >> multiple consumers in a single Consumer Group consuming message from
> this
> > >> topic.
> > >>
> > >> My idea is to distribute the messages from all producers equally. From
> > >> reading the documentation I understood that the partition is always
> > >> selected by the producer. Is that correct?
> > >>
> > >> I'd also like to know if there is an out of the box option to assign
> the
> > >> partition via a round robin *on the broker side *to guarantee equal
> > >> distribution of the load - if possible to each consumer, but if not
> > >> possible, at least to each partition.
> > >>
> > >> If my understanding is correct, it looks like in a multiple producer
> > >> scenario there is lack of support from Kafka regarding load balancing
> and
> > >> customers have to either stick to the hash of the key (random
> distribution,
> > >> although it would guarantee same key goes to the same partition) or
> they
> > >> have to create their own logic on the producer side (i.e. by sharing
> memory)
> > >>
> > >> Am I missing something?
> > >>
> > >> Thank you,
> > >>
> > >> Vinicius Scheidegger
> > >>
> > >
> >
>
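For readers following the thread, here is a minimal producer sketch using the RoundRobinPartitioner that Colin mentions (the broker address and topic name below are hypothetical). Note that the partitioner only rotates partitions within a single producer instance, which is exactly the limitation Vinicius describes above:
```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

public class RoundRobinProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Round-robin assignment is applied per producer instance; several
        // independent producers each cycle through the partitions on their own.
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Records without a key are spread across partitions by the partitioner.
                producer.send(new ProducerRecord<>("my-topic", "message-" + i)); // hypothetical topic
            }
        }
    }
}
```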


Kafka - replicas do not heal themselves by default

2020-06-14 Thread Nag Y
I am going through Kafka in Action and came across the following phrase:

*One of the things to note with Kafka is that replicas do not heal
themselves by default. If you lose a broker on which one of your copies of
a partition existed, Kafka does not currently create a new copy. I mention
this since some users are used to filesystems like HDFS that will maintain
that replication number if a block is seen as corrupted or failed. So an
important item to look at with monitoring the health of your system might
be how many of your ISRs are indeed matching your intended number.*


It is interesting, as in most distributed systems the system will
try to create additional replicas if replicas are not available. I found it
strange. Any reason to do so?
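As a rough illustration of the monitoring advice in the quoted passage, the sketch below (assuming a recent Java AdminClient and a hypothetical broker address) lists partitions whose ISR has shrunk below the intended replica count:
```
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class UnderReplicatedPartitionsCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address

        try (Admin admin = Admin.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            for (TopicDescription description : admin.describeTopics(topics).all().get().values()) {
                for (TopicPartitionInfo partition : description.partitions()) {
                    // A partition whose ISR is smaller than its replica list is under-replicated,
                    // and Kafka will not create a replacement replica on its own.
                    if (partition.isr().size() < partition.replicas().size()) {
                        System.out.printf("Under-replicated: %s-%d (isr=%d, replicas=%d)%n",
                                description.name(), partition.partition(),
                                partition.isr().size(), partition.replicas().size());
                    }
                }
            }
        }
    }
}
```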


Kafka - how to know whether it is broker property or topic property or producer property

2020-06-14 Thread Nag Y
I am going through the documentation, and oftentimes it is either not
clear, or one needs to look in multiple places, to see which entity a
particular property belongs to and whether it is specific to that entity, etc.

To give an example, consider *"min.insync.replicas"* - this is just an
example. In the Apache documentation it is mentioned under
https://kafka.apache.org/documentation/#brokerconfigs , while in the Confluent
documentation it is mentioned under
https://docs.confluent.io/current/installation/configuration/topic-configs.html
.
Later, I came to know that this property is available at both levels and
follows inheritance based on where it is configured. It took looking in
multiple places to understand more about this property and see where it
belongs.
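For what it's worth, here is a small sketch of that inheritance in practice (assuming a recent Java AdminClient, a hypothetical broker address, and a hypothetical topic name): min.insync.replicas set this way applies only to the named topic and overrides the broker-level default of the same name.
```
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MinInsyncReplicasOverride {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address

        try (Admin admin = Admin.create(props)) {
            // Topic-level override: this value wins over the broker-level
            // min.insync.replicas default for this topic only.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // hypothetical topic
            AlterConfigOp setMinIsr =
                new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                Collections.singletonMap(topic, Collections.singletonList(setMinIsr));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```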

But isn't there documentation about where each property belongs and
whether it is inherited or not?

I do not think the answer needs to be complex, like looking into the source
code; it should be simple enough - perhaps I am missing something.


Also posted here
https://stackoverflow.com/questions/62369238/kafka-how-to-know-whether-it-is-broker-property-or-topic-property-or-producer