Re: Kafka topic partition directory
Kafka doesn’t monitor the contents of the log data directories unless it created the file or directory. If it didn’t create the directory/file, it will ignore it.

-- Peter

> On Mar 28, 2020, at 4:17 PM, anila devi wrote:
>
> Hi Users,
>
> If I create a directory or a file in the same directory where Kafka creates
> topic partitions, the Kafka broker node does not restart. Is that expected?
>
> Thanks,
> Dhiman
Kafka topic partition directory
Hi Users,

If I create a directory or a file in the same directory where Kafka creates topic partitions, the Kafka broker node does not restart. Is that expected?

Thanks,
Dhiman
Re: Newbie Question
Thanks Hans - this makes sense, except that the debug messages give me exactly what I need without having to instrument any clients. It should be noted that for now I am running a single server, so perhaps the messages change when I cluster? I may have caused confusion by mentioning that I want to know where the messages go - that is not quite precise from an individual-message perspective, but it is close enough for what I want to achieve (for now ;-) ). I just want a record of each IP address and which topic (or something that can be traced back to a topic) it is connected to, at a high level, without having to instrument the clients (which can number upwards of 10,000, and which I have no control over or access to). Currently, as I mentioned, the debug messages have exactly what I need for this phase:

[2020-03-28 20:32:23,901] DEBUG Principal = User:ANONYMOUS is Allowed Operation = Read from host = x.x.x.x on resource = Topic:LITERAL: (kafka.authorizer.logger)

I'm just figuring there must be a better way of getting this info than turning on debug.

On Sat, Mar 28, 2020 at 4:15 PM Hans Jespersen wrote:

> I can tell from the terminology you use that you are familiar with
> traditional message queue products. Kafka is very different. That's what
> makes it so interesting and revolutionary, in my opinion.
>
> Clients do not connect to topics, because Kafka is a distributed,
> clustered system where topics are sharded into pieces called partitions,
> and the topic partitions are spread out across all the Kafka brokers in the
> cluster (and also replicated several more times across the cluster for
> fault tolerance). When a client logically connects to a topic, it is actually
> making many connections to many nodes in the Kafka cluster, which enables
> both parallel processing and fault tolerance.
>
> Also, when a client consumes a message, the message is not removed from a
> queue; it remains in Kafka for many days (sometimes months or years). It is
> not “taken off the queue”; it is rather “copied from the commit log”. It can
> be consumed again and again if needed, because it is an immutable record of
> an event that happened.
>
> Now, getting back to your question of how to see where messages get
> consumed (copied): the reality is that they go many places and can be
> consumed many times. This makes tracing and tracking message delivery more
> difficult, but not impossible. There are many tools, both open source and
> commercial, that can track data from producer to Kafka (with replication) to
> multiple consumers. They typically involve taking telemetry from both
> clients (producers and consumers) and brokers (all of them, as they act as a
> cluster) and aggregating all the data to see the full flow of messages in the
> system. That's why the logs may seem overwhelming and you need to look at the
> logs of all the brokers (and perhaps all the clients as well) to get the
> full picture.
>
> -hans
>
>> On Mar 28, 2020, at 4:50 PM, Colin Ross wrote:
>>
>> Hi All - I just started to use Kafka. Just one thing is driving me nuts. I
>> want to get logs of each time a publisher or subscriber connects. I am
>> trying to just get the IP that they connected from and the topic to which
>> they connected. I have managed to do this by enabling debug in the
>> kafka-authorizer; however, the number of logs is overwhelming, as is the
>> update rate (looks like 2 per second per client).
>>
>> What I am actually trying to achieve is to understand where messages go,
>> so I would be more than happy to just see notifications when messages are
>> actually sent and actually taken off the queue.
>>
>> Is there a more efficient way of achieving my goal than turning on debug?
>>
>> Cheers
>> Rossi
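Since the authorizer DEBUG lines above already contain the host and topic, one low-effort option is to post-process them rather than instrument clients. The sketch below is an illustration, not a supported Kafka tool: it assumes the `kafka.authorizer.logger` line format shown in this thread and collapses the repeated entries down to the first sighting of each (host, topic) pair.

```python
import re

# Pattern matching the sample authorizer DEBUG line quoted in this thread;
# adjust if your broker's log format differs.
LINE_RE = re.compile(
    r"Principal = (?P<principal>\S+) is (?P<decision>\w+) "
    r"Operation = (?P<op>\w+) from host = (?P<host>\S+) "
    r"on resource = Topic:LITERAL:(?P<topic>\S*)"
)

def connection_map(log_lines):
    """Return the set of (host, topic) pairs seen in authorizer debug output,
    deduplicated so the 2-per-second-per-client flood collapses to one entry."""
    seen = set()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            seen.add((m.group("host"), m.group("topic")))
    return seen
```

Feeding the raw log through this gives a small host-to-topic map, which sounds like the end goal here, without touching any of the 10,000 clients.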
Re: Newbie Question
I can tell from the terminology you use that you are familiar with traditional message queue products. Kafka is very different. That's what makes it so interesting and revolutionary, in my opinion.

Clients do not connect to topics, because Kafka is a distributed, clustered system where topics are sharded into pieces called partitions, and the topic partitions are spread out across all the Kafka brokers in the cluster (and also replicated several more times across the cluster for fault tolerance). When a client logically connects to a topic, it is actually making many connections to many nodes in the Kafka cluster, which enables both parallel processing and fault tolerance.

Also, when a client consumes a message, the message is not removed from a queue; it remains in Kafka for many days (sometimes months or years). It is not “taken off the queue”; it is rather “copied from the commit log”. It can be consumed again and again if needed, because it is an immutable record of an event that happened.

Now, getting back to your question of how to see where messages get consumed (copied): the reality is that they go many places and can be consumed many times. This makes tracing and tracking message delivery more difficult, but not impossible. There are many tools, both open source and commercial, that can track data from producer to Kafka (with replication) to multiple consumers. They typically involve taking telemetry from both clients (producers and consumers) and brokers (all of them, as they act as a cluster) and aggregating all the data to see the full flow of messages in the system. That's why the logs may seem overwhelming and you need to look at the logs of all the brokers (and perhaps all the clients as well) to get the full picture.

-hans

> On Mar 28, 2020, at 4:50 PM, Colin Ross wrote:
>
> Hi All - I just started to use Kafka. Just one thing is driving me nuts. I
> want to get logs of each time a publisher or subscriber connects. I am
> trying to just get the IP that they connected from and the topic to which
> they connected. I have managed to do this by enabling debug in the
> kafka-authorizer; however, the number of logs is overwhelming, as is the
> update rate (looks like 2 per second per client).
>
> What I am actually trying to achieve is to understand where messages go,
> so I would be more than happy to just see notifications when messages are
> actually sent and actually taken off the queue.
>
> Is there a more efficient way of achieving my goal than turning on debug?
>
> Cheers
> Rossi
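The “copied from the commit log, not taken off a queue” model above can be made concrete with a toy sketch. This is an illustration of the consumption model only, not Kafka's actual implementation: an append-only log, where each consumer tracks its own offset, so reading copies records and never removes them.

```python
# Toy model of a commit log: reads are copies, the log itself is immutable
# and append-only, and each consumer owns nothing but an offset into it.

class CommitLog:
    def __init__(self):
        self._records = []              # append-only; never truncated here

    def append(self, record):
        self._records.append(record)

    def read(self, offset, max_records=10):
        """Copy records starting at `offset`; the log is left untouched."""
        return self._records[offset:offset + max_records]


class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0                 # each consumer tracks its own position

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)       # "committing" just advances the offset
        return batch

    def seek(self, offset):
        self.offset = offset            # rewind to re-consume old records
```

Two consumers can poll the same log and each sees every record, and `seek(0)` replays history: nothing one consumer reads is ever hidden from another.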
Newbie Question
Hi All - I just started to use Kafka. Just one thing is driving me nuts. I want to get logs of each time a publisher or subscriber connects. I am trying to just get the IP that they connected from and the topic to which they connected. I have managed to do this by enabling debug in the kafka-authorizer; however, the number of logs is overwhelming, as is the update rate (looks like 2 per second per client).

What I am actually trying to achieve is to understand where messages go, so I would be more than happy to just see notifications when messages are actually sent and actually taken off the queue.

Is there a more efficient way of achieving my goal than turning on debug?

Cheers
Rossi
Re: Kafka with RAID 5 on a busy cluster
RAID 5 is typically slower because Kafka has a very write-heavy load, and that creates a bottleneck: writes to any disk require parity writes on the other disks.

-hans

> On Mar 28, 2020, at 2:55 PM, Vishal Santoshi wrote:
>
> Anyone? We are doing a series of tests to be confident, but if there is
> some data that folks who have run RAID 5 under Kafka can share, please do.
>
> Regards.
>
>> On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi wrote:
>>
>> Correction: << In RAID 5, if one loses more than one disk, there will be
>> data corruption.
>>
>> On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> One obvious issue is disk-failure toleration. With RF = 3 on normal
>>> JBOD, disk-failure toleration is 2. In RAID 5, losing more than one disk
>>> causes data corruption, effectively making the broker unusable, thus
>>> reducing our drive-failure toleration to 2 drives on 2 different brokers,
>>> with the added caveat that we lose the whole broker as well?
>>>
>>> On Mon, Mar 23, 2020 at 10:42 PM Vishal Santoshi <
>>> vishal.santo...@gmail.com> wrote:
>>>
>>> We have a pretty busy Kafka cluster with SSDs and plain JBOD. We are
>>> planning, or thinking of, using RAID 5 (hardware RAID on 6-drive SSD
>>> brokers) instead of JBOD, for various reasons. Has someone used RAID 5
>>> (we know that there is a write overhead for parity on blocks and for
>>> recreating a dead drive) and can share their experience with it?
>>> Confluent advises against it, but there is obvious ease one gets with
>>> RAID (RAID 10 is too expensive space-wise). Any advice/comments etc.
>>> will be highly appreciated. Regards.
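The parity bottleneck above can be put in rough numbers. This is a hedged back-of-the-envelope sketch, not a benchmark: for small random updates, RAID 5's read-modify-write cycle turns one logical write into two reads plus two writes (read old data and old parity, write new data and new parity), a 4x I/O penalty versus a single JBOD disk. Large sequential full-stripe writes, which Kafka's log appends can approach, reduce this to just the extra parity stripe.

```python
# Back-of-the-envelope RAID 5 small-write penalty vs JBOD.
# Numbers are illustrative assumptions, not measurements.

def raid5_small_write_ios(logical_writes):
    """Disk I/Os for small random writes on RAID 5 (read-modify-write):
    read old data, read old parity, write new data, write new parity."""
    return logical_writes * 4

def jbod_write_ios(logical_writes):
    """JBOD: one logical write is one disk write on one disk."""
    return logical_writes

def effective_write_iops(disk_iops, n_disks, io_penalty):
    """Aggregate random-write IOPS an array can sustain, given the
    per-logical-write I/O multiplier."""
    return disk_iops * n_disks / io_penalty
```

For example, with a hypothetical 6-drive broker where each drive sustains 10,000 write IOPS, JBOD offers 60,000 aggregate write IOPS while RAID 5 under read-modify-write offers roughly a quarter of that, which is the slowdown being described.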
Re: Kafka with RAID 5 on a busy cluster
Anyone? We are doing a series of tests to be confident, but if there is some data that folks who have run RAID 5 under Kafka can share, please do.

Regards.

On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi wrote:

> Correction: << In RAID 5, if one loses more than one disk, there will be
> data corruption.
>
> On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> One obvious issue is disk-failure toleration. With RF = 3 on normal
>> JBOD, disk-failure toleration is 2. In RAID 5, losing more than one disk
>> causes data corruption, effectively making the broker unusable, thus
>> reducing our drive-failure toleration to 2 drives on 2 different brokers,
>> with the added caveat that we lose the whole broker as well?
>>
>> On Mon, Mar 23, 2020 at 10:42 PM Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> We have a pretty busy Kafka cluster with SSDs and plain JBOD. We are
>>> planning, or thinking of, using RAID 5 (hardware RAID on 6-drive SSD
>>> brokers) instead of JBOD, for various reasons. Has someone used RAID 5
>>> (we know that there is a write overhead for parity on blocks and for
>>> recreating a dead drive) and can share their experience with it?
>>> Confluent advises against it, but there is obvious ease one gets with
>>> RAID (RAID 10 is too expensive space-wise). Any advice/comments etc.
>>> will be highly appreciated.
>>>
>>> Regards.