Re: Kafka topic partition directory

2020-03-28 Thread Peter Bukowinski
Kafka doesn’t monitor the contents of its log data directories unless it 
created the file or directory itself. If it didn’t create the directory/file, it 
will ignore it.

-- Peter

> On Mar 28, 2020, at 4:17 PM, anila devi  
> wrote:
> 
> Hi Users, 
> If I create a directory or a file in the same directory where Kafka creates 
> topic partitions, the Kafka broker node does not restart. Is that expected? 
> Thanks, Dhiman
> 


Kafka topic partition directory

2020-03-28 Thread anila devi
Hi Users, 
If I create a directory or a file in the same directory where Kafka creates 
topic partitions, the Kafka broker node does not restart. Is that expected? 
Thanks, Dhiman



Re: Newbie Question

2020-03-28 Thread Colin Ross
Thanks Hans - this makes sense, except that the debug messages give me
exactly what I need without having to instrument any clients. It should be
noted that for now I am running a single server, so perhaps the messages
change when I cluster?
I may have caused confusion by mentioning that I want to know where the
messages go - that is not quite precise from an individual-message
perspective, but it is right enough for what I want to achieve (for now ;-)
). I just want a record of each IP address and which topic (or something
that can be traced back to a topic) each client is connected to, from a high
level, without having to instrument the clients (which can number upwards of
10,000, and which I have no control over or access to).
Currently, as I mentioned, the debug messages have exactly what I need for
this phase:
[2020-03-28 20:32:23,901] DEBUG Principal = User:ANONYMOUS is Allowed
Operation = Read from host = x.x.x.x on resource = Topic:LITERAL:
(kafka.authorizer.logger)
I'm just figuring there must be a better way of getting this info than
turning on debug.
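
For reference, here is a minimal sketch of one way to narrow that down,
assuming the stock config/log4j.properties that ships with the broker (the
authorizerAppender name comes from that default file and may differ in your
install): raise only the authorizer logger to DEBUG instead of turning on
DEBUG broker-wide.

  # log4j.properties (sketch) - scope DEBUG to the authorizer log only
  log4j.logger.kafka.authorizer.logger=DEBUG, authorizerAppender
  # keep these entries out of the main server.log
  log4j.additivity.kafka.authorizer.logger=false

That keeps the allow/deny lines shown above while leaving the rest of the
broker logging at its normal level.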

On Sat, Mar 28, 2020 at 4:15 PM Hans Jespersen  wrote:

> I can tell from the terminology you use that you are familiar with
> traditional message queue products. Kafka is very different. That's what
> makes it so interesting and revolutionary, in my opinion.
>
> Clients do not connect to topics, because Kafka is a distributed and
> clustered system where topics are sharded into pieces called partitions and
> the topic partitions are spread out across all the Kafka brokers in the
> cluster (and also replicated several more times across the cluster for
> fault tolerance). When a client logically connects to a topic, it's actually
> making many connections to many nodes in the Kafka cluster, which enables
> both parallel processing and fault tolerance.
>
> Also, when a client consumes a message, the message is not removed from a
> queue; it remains in Kafka for many days (sometimes months or years). It is
> not “taken off the queue”; it is rather “copied from the commit log”. It can
> be consumed again and again if needed, because it is an immutable record of
> an event that happened.
>
> Now getting back to your question of how to see where messages get
> consumed (copied). The reality is that they go many places and can be
> consumed many times. This makes tracing and tracking message delivery more
> difficult but not impossible. There are many tools, both open source and
> commercial, that can track data from producer to Kafka (with replication) to
> multiple consumers. They typically involve taking telemetry from both
> clients (producers and consumers) and brokers (all of them, as they act as a
> cluster) and aggregating all the data to see the full flow of messages in the
> system. That's why the logs may seem overwhelming and you need to look at the
> logs of all the brokers (and perhaps all the clients as well) to get the
> full picture.
>
> -hans
>
> > On Mar 28, 2020, at 4:50 PM, Colin Ross  wrote:
> >
> > Hi All - just started to use Kafka. Just one thing driving me nuts. I want
> > to get logs of each time a publisher or subscriber connects. I am trying to
> > just get the IP that they connected from and the topic to which they
> > connected. I have managed to do this by enabling debug in the
> > kafka-authorizer; however, the number of logs is overwhelming, as is the
> > update rate (looks like 2 per second per client).
> >
> > What I am actually trying to achieve is to understand where messages go, so
> > I would be more than happy to just see notifications when messages are
> > actually sent and actually taken off the queue.
> >
> > Is there a more efficient way of achieving my goal than turning on debug?
> >
> > Cheers
> > Rossi
>


Re: Newbie Question

2020-03-28 Thread Hans Jespersen
I can tell from the terminology you use that you are familiar with traditional 
message queue products. Kafka is very different. That's what makes it so 
interesting and revolutionary, in my opinion.

Clients do not connect to topics, because Kafka is a distributed and clustered 
system where topics are sharded into pieces called partitions and the topic 
partitions are spread out across all the Kafka brokers in the cluster (and also 
replicated several more times across the cluster for fault tolerance). When a 
client logically connects to a topic, it's actually making many connections to 
many nodes in the Kafka cluster, which enables both parallel processing and 
fault tolerance.
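
As a rough illustration (the broker addresses, topic name, and group id below
are made up), the client code only ever names the topic; the consumer then
discovers the partition leaders from the bootstrap brokers and connects to
each of them:

  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class TopicPartitionDemo {
      public static void main(String[] args) {
          Properties props = new Properties();
          // The bootstrap list is only a starting point; the client learns the
          // full cluster metadata and opens connections to partition leaders.
          props.put("bootstrap.servers", "broker1:9092,broker2:9092");
          props.put("group.id", "demo-group");
          props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("my-topic"));
              ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
              for (ConsumerRecord<String, String> record : records) {
                  // Each record reports the partition (shard) it was read from,
                  // i.e. which piece of the topic actually served it.
                  System.out.printf("partition=%d offset=%d value=%s%n",
                          record.partition(), record.offset(), record.value());
              }
          }
      }
  }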

Also, when a client consumes a message, the message is not removed from a queue; 
it remains in Kafka for many days (sometimes months or years). It is not “taken 
off the queue”; it is rather “copied from the commit log”. It can be consumed 
again and again if needed, because it is an immutable record of an event that 
happened.
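
Retention is just a per-topic setting. For example (the topic name and value
are placeholders, and on newer brokers --bootstrap-server replaces
--zookeeper), this keeps records around for roughly 30 days no matter how
often they are read:

  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name my-topic \
    --add-config retention.ms=2592000000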

Now getting back to your question of how to see where messages get consumed 
(copied). The reality is that they go many places and can be consumed many 
times. This makes tracing and tracking message delivery more difficult but not 
impossible. There are many tools, both open source and commercial, that can track 
data from producer to Kafka (with replication) to multiple consumers. They 
typically involve taking telemetry from both clients (producers and consumers) 
and brokers (all of them, as they act as a cluster) and aggregating all the data 
to see the full flow of messages in the system. That's why the logs may seem 
overwhelming and you need to look at the logs of all the brokers (and perhaps all 
the clients as well) to get the full picture.
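
One client-side building block for that kind of telemetry (a sketch only; the
class name, and where you send the output, are entirely up to you) is a
ConsumerInterceptor, which sees every batch the consumer actually receives
without touching the application logic:

  import java.util.Map;
  import org.apache.kafka.clients.consumer.ConsumerInterceptor;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;

  // Sketch: log which topic-partitions records were actually consumed from.
  // Enable it on a consumer with interceptor.classes=com.example.ConsumptionLogger
  // (the package/class name here is hypothetical).
  public class ConsumptionLogger implements ConsumerInterceptor<String, String> {

      @Override
      public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
          for (TopicPartition tp : records.partitions()) {
              System.out.printf("consumed %d record(s) from %s%n",
                      records.records(tp).size(), tp);
          }
          return records; // pass the batch through unchanged
      }

      @Override
      public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
          // Called on offset commit; could be forwarded to a metrics system instead.
      }

      @Override
      public void close() {}

      @Override
      public void configure(Map<String, ?> configs) {}
  }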

-hans 

> On Mar 28, 2020, at 4:50 PM, Colin Ross  wrote:
> 
> Hi All - just started to use Kafka. Just one thing driving me nuts. I want
> to get logs of each time a publisher or subscriber connects. I am trying to
> just get the IP that they connected from and the topic to which they
> connected. I have managed to do this by enabling debug in the
> kafka-authorizer; however, the number of logs is overwhelming, as is the
> update rate (looks like 2 per second per client).
> 
> What I am actually trying to achieve is to understand where messages go, so
> I would be more than happy to just see notifications when messages are
> actually sent and actually taken off the queue.
> 
> Is there a more efficient way of achieving my goal than turning on debug?
> 
> Cheers
> Rossi


Newbie Question

2020-03-28 Thread Colin Ross
Hi All - just started to use Kafka. Just one thing driving me nuts. I want
to get logs of each time a publisher or subscriber connects. I am trying to
just get the IP that they connected from and the topic to which they
connected. I have managed to do this by enabling debug in the
kafka-authorizer; however, the number of logs is overwhelming, as is the
update rate (looks like 2 per second per client).

What I am actually trying to achieve is to understand where messages go, so
I would be more than happy to just see notifications when messages are
actually sent and actually taken off the queue.

Is there a more efficient way of achieving my goal than turning on debug?

Cheers
Rossi


Re: Kafka with RAID 5 on a busy cluster.

2020-03-28 Thread Hans Jespersen
RAID 5 is typically slower because Kafka is a very write-heavy workload, and that 
creates a bottleneck: every write to a data disk also requires a parity write on 
the other disks.
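
By way of comparison, here is a minimal sketch of the JBOD-style setup (the
mount points below are hypothetical), where each disk is an independent log
directory and durability comes from Kafka-level replication rather than RAID
parity:

  # server.properties (sketch)
  # One entry per physical disk; Kafka spreads partitions across them.
  log.dirs=/data/disk1/kafka,/data/disk2/kafka,/data/disk3/kafka,/data/disk4/kafka,/data/disk5/kafka,/data/disk6/kafka

  # Lean on replication for durability instead of parity.
  default.replication.factor=3
  min.insync.replicas=2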

-hans

> On Mar 28, 2020, at 2:55 PM, Vishal Santoshi  
> wrote:
> 
> Anyone? We're doing a series of tests to be confident, but if any folks who
> have run RAID 5 with Kafka have data to share, please do.
> 
> Regards.
> 
>> On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi 
>> wrote:
>> 
>> << In RAID 5 one can loose more than only one disk RAID here will be data
>> corruption.
>> >> In RAID 5, if one loses more than one disk, there will be data
>> corruption.
>> 
>> On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>> 
>>> One obvious issue is disk failure tolerance. With RF=3 on normal
>>> JBOD, the disk failure tolerance is 2. With RAID 5, losing more than one
>>> disk means data corruption, effectively making the broker
>>> unusable, thus reducing our drive failure tolerance to 2 drives on 2
>>> different brokers, with the added caveat that we lose the whole broker as
>>> well?
>>> 
>>> 
>>> On Mon, Mar 23, 2020 at 10:42 PM Vishal Santoshi <
>>> vishal.santo...@gmail.com> wrote:
>>> 
>>>> We have a pretty busy Kafka cluster with SSDs and plain JBOD. We are
>>>> planning, or thinking of, using RAID 5 (hardware RAID on 6-drive SSD
>>>> brokers) instead of JBOD for various reasons. Has someone used RAID 5 (we
>>>> know that there is a write overhead for parity bits on blocks and for
>>>> recreating a dead drive) and can share their experience with it? Confluent
>>>> advises against it, but there is obvious ease one gets with RAID (RAID 10
>>>> is too expensive space-wise). Any advice/comments etc. will be highly
>>>> appreciated.
>>>> 
>>>> Regards.
 
 


Re: Kafka with RAID 5 on a busy cluster.

2020-03-28 Thread Vishal Santoshi
Anyone? We're doing a series of tests to be confident, but if any folks who
have run RAID 5 with Kafka have data to share, please do.

Regards.

On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi 
wrote:

> << In RAID 5 one can loose more than only one disk RAID here will be data
> corruption.
> >> In RAID 5, if one loses more than one disk, there will be data
> corruption.
>
> On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> One obvious issue is disk failure tolerance. With RF=3 on normal
>> JBOD, the disk failure tolerance is 2. With RAID 5, losing more than one
>> disk means data corruption, effectively making the broker
>> unusable, thus reducing our drive failure tolerance to 2 drives on 2
>> different brokers, with the added caveat that we lose the whole broker as
>> well?
>>
>>
>> On Mon, Mar 23, 2020 at 10:42 PM Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> We have a pretty busy Kafka cluster with SSDs and plain JBOD. We are
>>> planning, or thinking of, using RAID 5 (hardware RAID on 6-drive SSD
>>> brokers) instead of JBOD for various reasons. Has someone used RAID 5 (we
>>> know that there is a write overhead for parity bits on blocks and for
>>> recreating a dead drive) and can share their experience with it? Confluent
>>> advises against it, but there is obvious ease one gets with RAID (RAID 10
>>> is too expensive space-wise). Any advice/comments etc. will be highly
>>> appreciated.
>>>
>>> Regards.
>>>
>>>