Re: Performance Flink streaming kafka consumer sink to s3
Hi,

Do you think Flink's performance could be an issue with record payloads of 400 KB up to 1 MB? My Spark Streaming job seems to be doing better. Are there any recommended configurations, or would increasing parallelism help, to improve Flink streaming with the Flink Kafka connector?

Regards,
Vijay

On Fri, Aug 14, 2020 at 2:04 PM Vijayendra Yadav wrote:
> Hi Robert,
>
> Thanks for information. payloads so far are 400KB (each record).
> To achieve high parallelism at the downstream operator do I rebalance the
> kafka stream ? Could you give me an example please.
>
> Regards,
> Vijay
Re: Performance Flink streaming kafka consumer sink to s3
Hi Robert,

Thanks for the information. Payloads so far are 400 KB per record. To achieve high parallelism at the downstream operator, do I rebalance the Kafka stream? Could you give me an example, please?

Regards,
Vijay

On Fri, Aug 14, 2020 at 12:50 PM Robert Metzger wrote:
> Hi,
>
>> Also, can we increase parallel processing, beyond the number of
>> kafka partitions that we have, without causing any overhead ?
>
> Yes, the Kafka sources produce a tiny bit of overhead, but the potential
> benefit of having downstream operators at a high parallelism might be much
> bigger.
>
> How large is a large payload in your case?
>
> Best practices:
> Try to understand what's causing the performance slowdown: Kafka or S3 ?
> You can do a test where you read from kafka, and write it into a
> discarding sink.
> Likewise, use a datagenerator source, and write into S3.
>
> Do the math on your job: What's the theoretical limits of your job:
> https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
>
> Hope this helps,
> Robert
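
On the rebalance question: in the Flink DataStream API, `rebalance()` distributes records round-robin over the subtasks of the next operator, so a downstream operator can run at a higher parallelism than the number of Kafka partitions. The sketch below is plain Python, not Flink code, and only illustrates the distribution pattern. The partition count, the downstream parallelism, and the record names are hypothetical, and it uses a single global round-robin, whereas in Flink each source subtask cycles through the downstream subtasks independently.

```python
from collections import Counter
from itertools import cycle


def rebalance(records, downstream_parallelism):
    """Pair each record with a downstream subtask index in round-robin order."""
    targets = cycle(range(downstream_parallelism))
    return [(next(targets), record) for record in records]


# Hypothetical setup: 4 Kafka partitions feeding 16 downstream subtasks.
NUM_PARTITIONS = 4
DOWNSTREAM_PARALLELISM = 16

# 8 records from each partition, 32 records in total.
records = [f"p{p}-r{i}" for i in range(8) for p in range(NUM_PARTITIONS)]

assignments = rebalance(records, DOWNSTREAM_PARALLELISM)

# Count how many records each downstream subtask received.
load = Counter(subtask for subtask, _ in assignments)
for subtask, count in sorted(load.items()):
    print(f"subtask {subtask}: {count} records")
```

With these numbers every one of the 16 subtasks receives work, even though only 4 partitions supply data. The price is that each record is serialized and shipped between subtasks, which is worth keeping in mind for records of several hundred KB.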
Re: Performance Flink streaming kafka consumer sink to s3
Hi,

> Also, can we increase parallel processing, beyond the number of
> kafka partitions that we have, without causing any overhead ?

Yes. The Kafka sources produce a tiny bit of overhead, but the potential benefit of having downstream operators at a high parallelism might be much bigger.

How large is a large payload in your case?

Best practices:

Try to understand what's causing the performance slowdown: Kafka or S3? You can run a test where you read from Kafka and write into a discarding sink. Likewise, use a data generator source and write into S3.

Do the math on your job. What are the theoretical limits of your job?
https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines

Hope this helps,
Robert

On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav wrote:
> Hi Team,
>
> I am trying to increase throughput of my flink stream job streaming from
> kafka source and sink to s3. Currently it is running fine for small events
> records. But records with large payloads are running extremely slow like at
> rate 2 TPS.
>
> Could you provide some best practices to tune?
> Also, can we increase parallel processing, beyond the number of
> kafka partitions that we have, without causing any overhead ?
>
> Regards,
> Vijay
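
The "do the math" step can be sketched as a back-of-the-envelope calculation. The 2 records/s rate and the 400 KB and 1 MB record sizes come from this thread. The 100 MB/s bandwidth figure is purely an assumption for illustration and should be replaced with the measured network or S3 throughput of the actual cluster.

```python
KB_PER_MB = 1024


def throughput_mb_per_s(records_per_s, record_kb):
    """Data rate in MB/s for a given record rate and record size."""
    return records_per_s * record_kb / KB_PER_MB


def max_records_per_s(bandwidth_mb_per_s, record_kb):
    """Upper bound on records/s that the given bandwidth can carry."""
    return bandwidth_mb_per_s * KB_PER_MB / record_kb


# From the thread: about 2 records/s at 400 KB each.
observed = throughput_mb_per_s(2, 400)

# Assumption for illustration only: 100 MB/s of usable bandwidth.
ASSUMED_BANDWIDTH_MB_PER_S = 100
limit_400kb = max_records_per_s(ASSUMED_BANDWIDTH_MB_PER_S, 400)
limit_1mb = max_records_per_s(ASSUMED_BANDWIDTH_MB_PER_S, 1024)

print(f"observed data rate: {observed:.2f} MB/s")
print(f"bound at 400 KB records: {limit_400kb:.0f} records/s")
print(f"bound at 1 MB records: {limit_1mb:.0f} records/s")
```

If the observed data rate sits far below such a bound, raw bandwidth is unlikely to be the limit, and the isolation tests (Kafka to a discarding sink, a generator source to S3) are the way to find which side is slow.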
Performance Flink streaming kafka consumer sink to s3
Hi Team,

I am trying to increase the throughput of my Flink streaming job, which reads from a Kafka source and sinks to S3. It currently runs fine for small event records, but records with large payloads are processed extremely slowly, at a rate of about 2 TPS.

Could you suggest some best practices for tuning? Also, can we increase parallel processing beyond the number of Kafka partitions that we have without causing any overhead?

Regards,
Vijay