Re: Consumer that consumes only local partition?
Hi Robert,

Here is the Kafka benchmark for your reference. If you put Flink, Storm, Samza, or Spark on top, the performance will go down: 821,557 records/sec (78.3 MB/sec).
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Best regards,
Hawin

On Tue, Aug 4, 2015 at 11:57 AM, Robert Metzger rmetz...@apache.org wrote:

Sorry for the very late reply ... The performance issue was not caused by network latency. I had a job like this: FlinkKafkaConsumer -> someSimpleOperation -> FlinkKafkaProducer. I thought that our FlinkKafkaConsumer was slow, but actually our FlinkKafkaProducer was using the old producer API of Kafka. Switching to the new producer API greatly improved our write performance to Kafka; the slow producer had been back-pressuring the KafkaConsumer inside Flink.

Since we are already talking about performance, let me ask you the following question: I am using Kafka and Flink on an HDP 2.2 cluster (40 machines). What would you consider good read/write performance for 8-byte messages on the following setup?
- 40 brokers
- topic with 120 partitions
- 120 reading threads (on 30 machines)
- 120 writing threads (on 30 machines)

I'm getting a write throughput of ~75k elements/core/second and a read throughput of ~50k el/c/s. When I stop the writers, the read throughput goes up to ~130k. I would expect higher throughput than (8 * 75000) / 1024 = 585.9 KB/sec per partition ... or are the messages too small, so that the per-message overhead is very high? Which system out there would you recommend for getting reference performance numbers? Samza, Spark, Storm?

On Wed, Jul 15, 2015 at 7:20 PM, Gwen Shapira gshap...@cloudera.com wrote:

This is not something you can do easily with the consumer API (consumers have no notion of locality). I can imagine using Kafka's low-level API calls to get the list of partitions and the lead replica of each, figuring out which are local, and consuming those - but that sounds painful.
Are you 100% sure the performance issue is due to network latency? If not, you may want to start optimizing somewhere more productive :) Both Kafka brokers and clients expose metrics that may help you track down where the performance issues are coming from.

Gwen

On Wed, Jul 15, 2015 at 9:24 AM, Robert Metzger rmetz...@apache.org wrote:

Hi Shef, did you resolve this issue? I'm facing some performance issues and I was wondering whether reading locally would resolve them.

On Mon, Jun 22, 2015 at 11:43 PM, Shef she...@yahoo.com wrote:

Noob question here. I want to have a single consumer for each partition that consumes only the messages that have been written locally. In other words, I want the consumer to access the local disk and not pull anything across the network. Is that possible? How can I discover which partitions are local?
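Along the lines Gwen describes, the discovery step itself is just metadata filtering. A minimal sketch in Python, assuming you have already fetched topic metadata out of band (e.g. via a TopicMetadataRequest with the low-level API, or partitionsFor() in the new Java client) into a partition-to-leader mapping; the `leaders` dict shape and the `local_partitions` helper are hypothetical, for illustration only:

```python
import socket

def local_partitions(leaders, local_host=None):
    """Return the partition ids whose lead replica runs on this machine.

    leaders: dict mapping partition id -> leader broker hostname,
             built from topic metadata fetched out of band.
    """
    if local_host is None:
        local_host = socket.getfqdn()
    return sorted(p for p, host in leaders.items() if host == local_host)

# Example: metadata for a 6-partition topic spread over 3 brokers.
leaders = {0: "broker-a", 1: "broker-b", 2: "broker-c",
           3: "broker-a", 4: "broker-b", 5: "broker-c"}
print(local_partitions(leaders, local_host="broker-b"))  # -> [1, 4]
```

Two caveats: leadership migrates on broker failure, so the mapping has to be refreshed, and even a "local" consumer still fetches through the broker's network stack rather than reading the log files on disk directly.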
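On Robert's arithmetic: 585.9 KB/sec counts payload bytes only, and with 8-byte values the fixed per-message framing dominates. A back-of-the-envelope sketch, assuming the 0.8-era message format with a null key (offset 8 + size 4 + CRC 4 + magic 1 + attributes 1 + key length 4 + value length 4, i.e. roughly 26 bytes of framing per message); the exact overhead depends on the Kafka version and on batching:

```python
PAYLOAD = 8       # bytes per message (Robert's 8-byte elements)
RATE = 75_000     # messages/sec per core on the write path

# Assumed per-message framing for the 0.8-era log format, null key:
# offset(8) + size(4) + crc(4) + magic(1) + attributes(1)
# + key length(4) + value length(4) = 26 bytes.
FRAMING = 26

payload_kb = PAYLOAD * RATE / 1024
actual_kb = (PAYLOAD + FRAMING) * RATE / 1024

print(f"payload-only: {payload_kb:.1f} KB/sec")  # 585.9, Robert's figure
print(f"with framing: {actual_kb:.1f} KB/sec")   # ~2490.2 actually moved
print(f"payload efficiency: {PAYLOAD / (PAYLOAD + FRAMING):.0%}")  # ~24%
```

Under that assumption, roughly three quarters of every byte written is framing rather than payload, which supports the "messages too small" theory; larger messages, batching, and compression are the usual levers.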