Re: Problems trying to make kafka 'rack-aware'
Hi Eno,

Many thanks for trying that. That is very helpful for me.

That basic check didn't work for me, but I have since discovered what my issue was. Despite using a version of kafka that supports rack-awareness, we have been deliberately setting 'inter.broker.protocol.version' to an older version (due to various issues with some of our consumers). When I update this parameter to use a later version, I can see 'rack' being written to zookeeper.

For now I need to turn my attention to resolving the issues with my consumers.

Thanks again for helping out.

Bryan

On 21/09/2018 14:52, Eno Thereska wrote:
> Hi Bryan,
>
> I did a simple check by starting a broker with no rack id and then
> restarting it with a rack id, and I can confirm I could get the rack id
> from zookeeper after the restart. This was on trunk.
>
> Does that basic check work for you (i.e., without reassigning partitions)?
>
> Thanks
> Eno
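The root cause described in this message (a 'broker.rack' setting combined with an older 'inter.broker.protocol.version') could be caught with a small pre-flight check on the properties file. The following is only a sketch: the 0.10.0 threshold is an assumption based on the schema note elsewhere in this thread ("from version 0.10 onwards"), the function names are hypothetical, and the sample property values are invented for illustration, since the thread does not say which older version was pinned.

```python
import re

# Assumption: rack information is only registered when the inter-broker
# protocol version is at least 0.10.0 (the thread's schema note says the
# 'rack' field exists "from version 0.10 onwards").
RACK_AWARE_SINCE = (0, 10, 0)


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def version_tuple(version):
    """Turn '0.9.0', '0.10.2-IV0' or '1.1' into a comparable 3-tuple."""
    numbers = [int(n) for n in re.findall(r"\d+", version.split("-")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def rack_config_warning(properties_text):
    """Return a warning string if broker.rack is set but the pinned
    protocol version is older than the assumed threshold, else None."""
    props = parse_properties(properties_text)
    rack = props.get("broker.rack")
    protocol = props.get("inter.broker.protocol.version")
    if rack and protocol and version_tuple(protocol) < RACK_AWARE_SINCE:
        return ("broker.rack=%s is set, but inter.broker.protocol.version=%s "
                "predates rack awareness" % (rack, protocol))
    return None


# Hypothetical properties file contents, for illustration only.
sample = """
broker.id=1234567
broker.rack=rack1
inter.broker.protocol.version=0.9.0
"""
print(rack_config_warning(sample))
```

If 'inter.broker.protocol.version' is not set at all, the check stays silent, on the assumption that the broker then uses the protocol version matching its own release.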
Re: Problems trying to make kafka 'rack-aware'
Hi Bryan,

I did a simple check by starting a broker with no rack id and then restarting it with a rack id, and I can confirm I could get the rack id from zookeeper after the restart. This was on trunk.

Does that basic check work for you (i.e., without reassigning partitions)?

Thanks
Eno

On Fri, Sep 21, 2018 at 2:07 PM, Bryan Duggan wrote:
> [...]
> Can someone please confirm to me, if I have rack awareness, should I
> expect to see a value for 'rack' in zookeeper? If so, do I need to do
> something else on the broker side to get it to include it as part of the
> meta-data it writes (as far as I can see it writes the metadata each time
> kafka is restarted).
> [...]
Re: Problems trying to make kafka 'rack-aware'
I didn't get a response to this, but I've been investigating more and can now frame the problem slightly differently (hopefully, more accurately).

According to this document

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper

which defines broker data structures in zookeeper, the following is the broker schema (from version 0.10 onwards - I am using version 0.11):

    { "fields":
        [ {"name": "version", "type": "int", "doc": "version id"},
          {"name": "host", "type": "string", "doc": "ip address or host name of the broker"},
          {"name": "port", "type": "int", "doc": "port of the broker"},
          {"name": "jmx_port", "type": "int", "doc": "port for jmx"},
          {"name": "endpoints", "type": "array", "items": "string", "doc": "endpoints supported by the broker"},
          {"name": "rack", "type": "string", "doc": "Rack of the broker. Optional. This will be used in rack aware replication assignment for fault tolerance."}
        ]
    }

When I check my broker data in zookeeper (for a broker which has a non-null broker.rack setting in the properties file), I have the following:

    {"endpoints":["PLAINTEXT://x.x.x.x.abcd:9092"],"jmx_port":-1,"host":"x.x.x.x.abc","timestamp":"1537527988341","port":9092,"version":2}

There is no 'rack'.

In the server.log file on my kafka broker I see:

    [2018-09-21 13:00:40,227] INFO KafkaConfig values:
        advertised.host.name = null
        .
        .
        broker.id = 1234567
        broker.rack = rack1
        compression.type = producer
        .
        -

so it looks fine from the broker side. However, when I restart kafka on the host, it doesn't load any rack information into zookeeper.

Can someone please confirm: if I have rack awareness, should I expect to see a value for 'rack' in zookeeper? If so, do I need to do something else on the broker side to get it to include the rack as part of the metadata it writes? (As far as I can see, it writes the metadata each time kafka is restarted.)

thanks
Bryan

On 20/09/2018 11:31, Bryan Duggan wrote:
> Hi,
>
> I have a kafka cluster consisting of 3 brokers across 3 different AWS
> availability zones. It hosts several topics, each of which has a
> replication factor of 3. The cluster is currently not 'rack-aware'.
> [...]
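The check described in this message (looking for a 'rack' field in the broker's registration data) can be written as a few lines of Python. This is a minimal sketch that works on the JSON string once it has been read out of zookeeper by whatever means; fetching it is not shown. The function name `broker_rack` is hypothetical, and the second sample string is an invented illustration of a registration that does carry a rack.

```python
import json


def broker_rack(registration_json):
    """Return the 'rack' value from a broker registration, or None if absent."""
    data = json.loads(registration_json)
    return data.get("rack")


# Registration data exactly as quoted in the message above.
observed = (
    '{"endpoints":["PLAINTEXT://x.x.x.x.abcd:9092"],"jmx_port":-1,'
    '"host":"x.x.x.x.abc","timestamp":"1537527988341","port":9092,"version":2}'
)
print(broker_rack(observed))  # prints None: no 'rack' field was registered

# Hypothetical registration that includes a rack, for comparison.
with_rack = '{"host":"x.x.x.x.abc","port":9092,"rack":"rack1"}'
print(broker_rack(with_rack))  # prints rack1
```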
Problems trying to make kafka 'rack-aware'
Hi,

I have a kafka cluster consisting of 3 brokers across 3 different AWS availability zones. It hosts several topics, each of which has a replication factor of 3. The cluster is currently not 'rack-aware'.

I am trying to do the following:

- add 3 additional brokers (one in each of the 3 AZs)
- make the cluster 'rack-aware' (i.e. create 3 racks on a per-AZ basis, each containing 2 brokers)
- reassign the topics with the intention of having 1 replica in each of the 3 racks.

To achieve this I've added 'broker.rack' to the properties file for each broker. The rack name is the same as the name of the AZ each broker is in. I've restarted kafka on all brokers (in case that's required for rack-awareness to take effect).

Following the restart I've attempted to reassign topics across all 6 brokers by running the following:

    ./kafka-reassign-partitions.sh --zookeeper $ZK --topics-to-move-json-file topics-to-move.json --broker-list '1,2,3,4,5,6'

(where topics-to-move.json is a simple json file containing the topics to reassign)

The problem I am having is that, after running 'kafka-reassign-partitions.sh' with 6 brokers listed in the broker-list, it doesn't honour rack-awareness, and instead assigns 2 replicas of a partition to brokers in a single rack, with the 3rd being assigned elsewhere.

The version of kafka I am using is 2.11-1.1.1.

Any documentation I've read suggests the above should have achieved what I want. However, it is not working as expected.

Has anyone else made their kafka cluster 'rack-aware'? If so, did you experience any issues doing so?

Or can anyone tell me if there's some step I'm missing to make this work?

TIA

Bryan
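The symptom described above (two replicas of a partition landing in the same rack) can be checked mechanically against a proposed assignment. The sketch below assumes the proposal is available as JSON with a "partitions" list of topic/partition/replicas entries, and that a broker-to-rack mapping is supplied by hand. The function name, topic name, rack names and replica lists are all hypothetical, chosen to mirror the 6-broker, 3-AZ layout in the message.

```python
import json


def rack_violations(assignment_json, broker_racks):
    """Return (topic, partition, racks) for every partition whose replicas
    do not all sit in distinct racks."""
    assignment = json.loads(assignment_json)
    violations = []
    for entry in assignment["partitions"]:
        racks = [broker_racks[broker] for broker in entry["replicas"]]
        if len(set(racks)) < len(racks):
            violations.append((entry["topic"], entry["partition"], racks))
    return violations


# Hypothetical layout: 6 brokers, 2 per availability zone.
broker_racks = {1: "az-a", 2: "az-b", 3: "az-c", 4: "az-a", 5: "az-b", 6: "az-c"}

# Hypothetical proposed assignment for a topic with replication factor 3.
proposed = json.dumps({
    "version": 1,
    "partitions": [
        {"topic": "example-topic", "partition": 0, "replicas": [1, 4, 2]},
        {"topic": "example-topic", "partition": 1, "replicas": [2, 3, 4]},
    ],
})

for topic, partition, racks in rack_violations(proposed, broker_racks):
    print("%s-%d has replicas sharing a rack: %s" % (topic, partition, racks))
```

With a replication factor of 3 and 3 racks, "all replicas in distinct racks" is the same condition as "one replica in each rack", which is the goal stated in the message.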