How to configure the servers when the Kafka cluster is behind a NAT?

2018-09-17 Thread XinYi Long
Hello, Guys,

I have a Kafka cluster, which is behind a NAT.

I have some devices on the internet that act as consumers and producers.
I also have some applications in the same LAN as the Kafka cluster, which
also act as consumers and producers.

I have changed "advertised.listeners" to "PLAINTEXT://{NAT IP}:{NAT port}"
and added some routes on the servers, because I found that Kafka brokers
also use "advertised.listeners" to talk to each other. Am I right?

When I start a consumer in the same LAN, I find that it receives metadata
correctly, but it can't consume any messages from a producer in the same LAN.

Did I miss anything, and how can I make it work?


Thank you very much!


lxyscls


Re: Best way to read all messages and then close

2018-09-17 Thread David Espinosa
Thank you all for your responses!
I also asked this on the Confluent Slack channel (
https://confluentcommunity.slack.com) and I got this approach:

   1. Query the partitions' high watermark offset
   2. Set the consumer to consume from the beginning
   3. Break out when you've reached the high offset

I still have some doubts regarding the implementation, but it seems a good
approach (I'm using a single partition, so a single loop per topic would be
enough).
What do you think?
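
Concretely, something like this is what I have in mind with the plain Java
consumer (topic name and bootstrap address are placeholders, single partition
assumed, untested sketch):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReadAllAndClose {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));

            // 1. query the partition's high watermark up front
            long highWatermark =
                    consumer.endOffsets(Collections.singletonList(tp)).get(tp);

            // 2. start consuming from the beginning
            consumer.seekToBeginning(Collections.singletonList(tp));

            // 3. poll until the position reaches the snapshotted high watermark
            while (consumer.position(tp) < highWatermark) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofMillis(500))) {
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }  // consumer closes here; anything appended later is deliberately ignored
    }
}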

On Sat, Sep 15, 2018 at 0:30, John Roesler () wrote:

> Specifically, you can monitor the "records-lag-max" (
> https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics)
> metric (or the more granular one per partition).
>
> Once this metric goes to 0, you know that you've caught up with the tail of
> the log.
>
> Hope this helps,
> -John
>
> On Fri, Sep 14, 2018 at 2:02 PM Matthias J. Sax 
> wrote:
>
> > Using Kafka Streams this is a little tricky.
> >
> > The API itself has no built-in mechanism to do this. You would need to
> > monitor the lag of the application, and if the lag is zero (assuming you
> > don't write new data into the topic in parallel), terminate the
> > application.
> >
> >
> > -Matthias
> >
> > On 9/14/18 4:19 AM, Henning Røigaard-Petersen wrote:
> > > Spin up a consumer, subscribe to EOF events, assign all partitions from
> > > the beginning, and keep polling until all partitions have reached EOF.
> > > Though, if you have concurrent writers, new messages may be appended
> > > after you observe EOF on a partition, so you are never guaranteed to have
> > > read all messages at the time you choose to close the consumer.
> > >
> > > /Henning Røigaard-Petersen
> > >
> > > -Original Message-
> > > From: David Espinosa 
> > > Sent: 14 September 2018 09:46
> > > To: users@kafka.apache.org
> > > Subject: Best way to read all messages and then close
> > >
> > > Hi all,
> > >
> > > Although the usage of Kafka is stream oriented, for a concrete use case
> > > I need to read all the messages existing in a topic and, once they have
> > > all been read, close the consumer.
> > >
> > > What's the best way or framework for doing this?
> > >
> > > Thanks in advance,
> > > David,
> > >
> >
> >
>


Re: How to configure the servers when the Kafka cluster is behind a NAT?

2018-09-17 Thread Robin Moffatt
This should help: https://rmoff.net/2018/08/02/kafka-listeners-explained/
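
The short version: give the brokers one listener for the LAN and a second one
for the NAT'd traffic, and advertise each listener with the address that that
group of clients can actually reach. A sketch of the relevant server.properties
entries (listener names, addresses and ports are placeholders):

listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:19092
advertised.listeners=INTERNAL://broker1.lan:9092,EXTERNAL://{NAT IP}:{NAT port}
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL

That way the brokers keep talking to each other over the LAN, LAN clients get
LAN addresses back in the metadata, and only the internet-facing devices are
handed the NAT address.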


-- 

Robin Moffatt | Developer Advocate | ro...@confluent.io | @rmoff


On Mon, 17 Sep 2018 at 12:18, XinYi Long  wrote:

> Hello, Guys,
>
> I have a Kafka cluster, which is behind a NAT.
>
> I have some devices on the internet that act as consumers and
> producers. I also have some applications in the same LAN as the Kafka
> cluster, which also act as consumers and producers.
>
> I have changed "advertised.listeners" to "PLAINTEXT://{NAT
> IP}:{NAT port}" and added some routes on the servers, because I found that
> Kafka brokers also use "advertised.listeners" to talk to each other.
> Am I right?
>
> When I start a consumer in the same LAN, I find that it receives
> metadata correctly, but it can't consume any messages from a producer in
> the same LAN.
>
> Did I miss anything, and how can I make it work?
>
>
> Thank you very much!
>
>
> lxyscls
>


Re: Best way to read all messages and then close

2018-09-17 Thread John Roesler
Yep, that should also work!
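
If you end up wanting the metrics route instead, here's a rough sketch of
reading "records-lag-max" from a plain Java consumer (untested, and the
helper name is made up):

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class LagCheck {
    // Returns the consumer's current "records-lag-max" metric value,
    // or NaN if the metric hasn't been recorded yet.
    static double recordsLagMax(KafkaConsumer<?, ?> consumer) {
        for (Map.Entry<MetricName, ? extends Metric> e : consumer.metrics().entrySet()) {
            if (e.getKey().name().equals("records-lag-max")) {
                return ((Number) e.getValue().metricValue()).doubleValue();
            }
        }
        return Double.NaN;
    }
}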
-John

On Mon, Sep 17, 2018 at 8:36 AM David Espinosa  wrote:

> Thank you all for your responses!
> I also asked this on the Confluent Slack channel (
> https://confluentcommunity.slack.com) and I got this approach:
>
>1. Query the partitions' high watermark offset
>    2. Set the consumer to consume from the beginning
>3. Break out when you've reached the high offset
>
> I still have some doubts regarding the implementation, but it seems a good
> approach (I'm using a single partition, so a single loop per topic would be
> enough).
> What do you think?
>
> On Sat, Sep 15, 2018 at 0:30, John Roesler () wrote:
>
> > Specifically, you can monitor the "records-lag-max" (
> > https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics)
> > metric (or the more granular one per partition).
> >
> > Once this metric goes to 0, you know that you've caught up with the tail
> > of the log.
> >
> > Hope this helps,
> > -John
> >
> > On Fri, Sep 14, 2018 at 2:02 PM Matthias J. Sax 
> > wrote:
> >
> > > Using Kafka Streams this is a little tricky.
> > >
> > > The API itself has no built-in mechanism to do this. You would need to
> > > monitor the lag of the application, and if the lag is zero (assuming you
> > > don't write new data into the topic in parallel), terminate the
> > > application.
> > >
> > >
> > > -Matthias
> > >
> > > On 9/14/18 4:19 AM, Henning Røigaard-Petersen wrote:
> > > > Spin up a consumer, subscribe to EOF events, assign all partitions
> > > > from the beginning, and keep polling until all partitions have reached
> > > > EOF.
> > > > Though, if you have concurrent writers, new messages may be appended
> > > > after you observe EOF on a partition, so you are never guaranteed to
> > > > have read all messages at the time you choose to close the consumer.
> > > >
> > > > /Henning Røigaard-Petersen
> > > >
> > > > -Original Message-
> > > > From: David Espinosa 
> > > > Sent: 14 September 2018 09:46
> > > > To: users@kafka.apache.org
> > > > Subject: Best way to read all messages and then close
> > > >
> > > > Hi all,
> > > >
> > > > Although the usage of Kafka is stream oriented, for a concrete use case
> > > > I need to read all the messages existing in a topic and, once they have
> > > > all been read, close the consumer.
> > > >
> > > > What's the best way or framework for doing this?
> > > >
> > > > Thanks in advance,
> > > > David,
> > > >
> > >
> > >
> >
>


Re: Low-level Kafka consumer API to Kafka Streams app

2018-09-17 Thread John Roesler
Hey Praveen,

I also suspect that you can get away with far fewer threads. Here's the
general starting point I recommend:

* start with just a little over 1 thread per hardware thread (accounting
for cores and hyperthreading). For example, on my machine, I have 4 cores
with 2 threads of execution each, so I would configure the application with
8 or maybe 9 threads. Much more than that introduces a *lot* of CPU/memory
overhead in exchange for not much gain (if any).
* choose a number of partitions that would allow you to scale up to a
reasonable number of machines, with respect to the numbers you get above.

From there, take a close look at all your important machine metrics (CPU,
memory, disk, network) as well as processing metrics: task throughput (how
long your application code takes) and end-to-end throughput (how long the
full processing lifecycle takes, including the broker round trips).

If there's any resource not saturated, you can tweak various configurations
to try to saturate it. I would think that stuff like buffer size and batch
size would be more helpful, with less overhead, than adding more threads.
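
As a concrete starting point, that configuration might look something like
this (all values illustrative, not recommendations for your workload):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StartingPointConfig {
    static Properties startingPointConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
        // ~1 stream thread per hardware thread as a starting point
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 8);
        // batching knobs on the embedded producer, passed through via the prefix
        props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 64 * 1024);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), 50);
        return props;
    }
}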

But keep a close eye on your throughput each time you make a change, to be
sure you're not locally optimizing at the expense of global performance.

I hope this helps!
-John

On Thu, Sep 13, 2018 at 4:53 PM Svante Karlsson 
wrote:

> You are doing something wrong if you need 10k threads to produce 800k
> messages per second. It feels like you are a factor of 1000 off. What size
> are your messages?
>
> On Thu, Sep 13, 2018, 21:04 Praveen  wrote:
>
> > Hi there,
> >
> > I have a Kafka application that uses the low-level Kafka consumer API to
> > help us process data from a single partition concurrently. Our use case is
> > to send out 800k messages per sec. We are able to do that with 4 boxes
> > using 10k threads per box, with each request taking 50ms in a thread
> > (1000/50 * 10k * 4 = 800k).
> >
> > I understand that Kafka in general uses partitions as its parallelism
> > model. It is my understanding that if I want the exact same behavior with
> > Kafka Streams, I'd need to create 40k partitions for this topic. Is that
> > right?
> >
> > What is the overhead on creating thousands of partitions? If we end up
> > wanting to send out millions of messages per second, is increasing the
> > partitions the only way?
> >
> > Best,
> > Praveen
> >
>


Completely clear out Kafka brokers & ZooKeeper?

2018-09-17 Thread Dylan Martin
I have some boxes that I'm using to test Kafka configurations (and ZooKeeper).
What's the recommended procedure for restoring them to a clean state, so I can
re-install Kafka and ZooKeeper without worrying about old data or configuration
getting in the way?


-Dylan




Reduce number of brokers?

2018-09-17 Thread Dylan Martin
I have a cluster with 4 brokers and I want to reduce it to 2 brokers.  I cannot 
re-assign __consumer_offsets because it wants at least 3 brokers.


Is there a way to do this?  Or am I going to have to trash my cluster and start 
over?


-Dylan




Re: Reduce number of brokers?

2018-09-17 Thread Brett Rann
You need to do a partition reassignment to increase or decrease the
replication factor. It's tediously manual, but it's just JSON, so it's trivial
to manipulate, which is probably why it's still tediously manual.

There's a guide here although it's ageing a little:
http://www.allprogrammingtutorials.com/tutorials/changing-replication-factor-of-topic-in-kafka.php
and here:
https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor

You can use some built-in tooling to help:

Create a topics.json file:

{"topics":
  [{"topic": "__consumer_offsets"}],
  "version":1
}

Assuming you currently have brokers 1,2,3,4 and you want to end up with 1,2:

bin/kafka-reassign-partitions.sh --zookeeper your_zk_string --generate
--topics-to-move-json-file topics.json --broker-list 1,2,3

From that proposal you want to remove broker 3, turning each 3-element replica
array into a 2-element one. If you're on a newer version of Kafka you also get
a log_dirs array in the output. For me that's always "any", so I just take it
out; if they're different for you, you'll have more work to do. If you don't
want to edit the file manually, you could use some jq:

bin/kafka-reassign-partitions.sh --zookeeper your_zk_string --generate
--topics-to-move-json-file topics.json --broker-list 1,2,3 | sed
-e '1,/Proposed/d' | jq 'del(.partitions[].replicas[] | select(. == 3)) |
del(.partitions[].log_dirs)' > reassignment.json
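
For illustration, the trimmed reassignment.json should end up looking
something like this (made-up layout, only the first two partitions shown):

{"version":1,
 "partitions":[
   {"topic":"__consumer_offsets","partition":0,"replicas":[1,2]},
   {"topic":"__consumer_offsets","partition":1,"replicas":[2,1]}
 ]
}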

If you're happy with the reassignment file, execute it; afterwards, run with
--verify instead of --execute to see when it's done.

bin/kafka-reassign-partitions.sh --zookeeper
your_zk_string --reassignment-json-file reassignment.json --execute

And then double check with:

bin/kafka-topics.sh --zookeeper your_zk_string --describe --topic
__consumer_offsets




On Tue, Sep 18, 2018 at 9:34 AM Dylan Martin 
wrote:

> I have a cluster with 4 brokers and I want to reduce it to 2 brokers.  I
> cannot re-assign __consumer_offsets because it wants at least 3 brokers.
>
>
> Is there a way to do this?  Or am I going to have to trash my cluster and
> start over?
>
>
> -Dylan
>