Re: Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-22 Thread Karthick
Hi All,

It would be helpful if we could get any pointers on the problem described below.

Thanks
Karthick.

On Wed, Sep 20, 2023 at 3:03 PM Gowtham S wrote:

> Hi Spark Community,
>
> Thank you for bringing up this issue. We've also encountered the same
> challenge and are actively working on finding a solution. It's reassuring
> to know that we're not alone in this.
>
> If you have any insights or suggestions regarding how to address this
> problem, please feel free to share them.
>
> Looking forward to hearing from others who might have encountered similar
> issues.
>
>
> Thanks and regards,
> Gowtham S
>
>
> On Tue, 19 Sept 2023 at 17:23, Karthick wrote:
>
>> Subject: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem
>>
>> Dear Spark Community,
>>
>> I recently reached out to the Apache Flink community for assistance with
>> a critical issue we are facing in our IoT platform, which relies on Apache
>> Kafka and real-time data processing. We received some valuable insights and
>> suggestions from the Apache Flink community, and now, we would like to seek
>> your expertise and guidance on the same problem.
>>
>> In our IoT ecosystem, we are dealing with data streams from thousands of
>> devices, each uniquely identified. To maintain data integrity and ordering,
>> we have configured a Kafka topic with ten partitions, ensuring that each
>> device's data is directed to its respective partition based on its unique
>> identifier. While this architectural choice has been effective in
>> maintaining data order, it has unveiled a significant challenge:
>>
>> *Slow Consumer and Data Skew Problem:* When a single device experiences
>> processing delays, it acts as a bottleneck within the Kafka partition,
>> leading to delays in processing data from other devices sharing the same
>> partition. This issue severely affects the efficiency and scalability of
>> our entire data processing pipeline.
>>
>> Here are some key details:
>>
>> - Number of Devices: 1000 (with potential growth)
>> - Target Message Rate: 1000 messages per second (with expected growth)
>> - Kafka Partitions: 10 (some partitions are overloaded)
>> - We are planning to migrate from Apache Storm to Apache Flink/Spark.
>>
>> We are actively seeking guidance on the following aspects:
>>
>> *1. Independent Device Data Processing*: We require a strategy that
>> guarantees one device's processing speed does not affect other devices in
>> the same Kafka partition. In other words, we need a solution that ensures
>> the independent processing of each device's data.
>>
>> *2. Custom Partitioning Strategy:* We are looking for a custom
>> partitioning strategy to distribute the load evenly across Kafka
>> partitions. Currently, we are using Murmur hashing with the device's unique
>> identifier, but we are open to exploring alternative partitioning
>> strategies.
>>
>> *3. Determining Kafka Partition Count:* We seek guidance on how to
>> determine the optimal number of Kafka partitions to handle the target
>> message rate efficiently.
>>
>> *4. Handling Data Skew:* Strategies or techniques for handling data skew
>> within Apache Flink or Spark.
>>
>> We believe that many in your community may have faced similar challenges
>> or possess valuable insights into addressing them. Your expertise and
>> experiences can greatly benefit our team and the broader community dealing
>> with real-time data processing.
>>
>> If you have any knowledge, solutions, or references to open-source
>> projects, libraries, or community-contributed solutions that align with our
>> requirements, we would be immensely grateful for your input.
>>
>> We appreciate your prompt attention to this matter and eagerly await your
>> responses and insights. Your support will be invaluable in helping us
>> overcome this critical challenge.
>>
>> Thank you for your time and consideration.
>>
>> Thanks & regards,
>> Karthick.
>>
>
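
On point 2 of the quoted message (custom partitioning): hashing the device identifier tends to even out the number of devices per partition, but not necessarily the message load, because a few busy devices can land together. The sketch below is only an illustration under stated assumptions. `hash_partition` uses MD5 as a stand-in and is not Kafka's Murmur-based partitioner, the per-device rates are made up, and `greedy_assignment` is a hypothetical helper that places the heaviest device first on the currently lightest partition. Each device still maps to exactly one partition, so per-device ordering is kept.

```python
import hashlib
import heapq

NUM_PARTITIONS = 10  # partition count taken from the thread


def hash_partition(device_id, num_partitions=NUM_PARTITIONS):
    """Key-hash placement. MD5 is a stand-in here, NOT Kafka's murmur2."""
    digest = hashlib.md5(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def load_per_partition(assignment, rates, num_partitions=NUM_PARTITIONS):
    """Sum the message rate (msg/s) of all devices placed on each partition."""
    load = [0.0] * num_partitions
    for device, partition in assignment.items():
        load[partition] += rates[device]
    return load


def greedy_assignment(rates, num_partitions=NUM_PARTITIONS):
    """Hypothetical load-aware table: heaviest device first, lightest partition first."""
    heap = [(0.0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    for device, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        load, partition = heapq.heappop(heap)
        assignment[device] = partition
        heapq.heappush(heap, (load + rate, partition))
    return assignment


# Assumed workload: 1000 devices, 20 of them "hot" (25 msg/s), the rest 0.5 msg/s.
rates = {f"device-{i}": (25.0 if i < 20 else 0.5) for i in range(1000)}

hash_assignment = {device: hash_partition(device) for device in rates}
hash_load = load_per_partition(hash_assignment, rates)

table_assignment = greedy_assignment(rates)
greedy_load = load_per_partition(table_assignment, rates)

print("hash   max/min load:", max(hash_load), min(hash_load))
print("greedy max/min load:", max(greedy_load), min(greedy_load))
```

A table like this would have to be applied on the producer side, for example by setting the partition explicitly per record, and it needs to be recomputed carefully when rates change, since moving a device between partitions can reorder its in-flight data.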

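On point 3 of the quoted message (partition count): a commonly cited rule of thumb is to measure the throughput a single partition can sustain on the producer side and on the consumer side, and to take the larger of the two ratios against the target rate. The sketch below assumes that rule. The function name `estimate_partitions`, the `headroom` factor, and all the numbers are hypothetical and would need to be replaced with measurements from the actual pipeline.

```python
import math


def estimate_partitions(target_rate, producer_rate_per_partition,
                        consumer_rate_per_partition, headroom=1.0):
    """Rule-of-thumb partition count: max(T / P, T / C), scaled by headroom.

    All rates are in messages per second. `headroom` is an assumed
    safety factor for growth (1.0 means no extra margin).
    """
    if min(target_rate, producer_rate_per_partition,
           consumer_rate_per_partition, headroom) <= 0:
        raise ValueError("rates and headroom must be positive")
    needed = max(target_rate / producer_rate_per_partition,
                 target_rate / consumer_rate_per_partition) * headroom
    return max(1, math.ceil(needed))


# Assumed figures: the 1000 msg/s target from the thread, a consumer that
# handles 50 msg/s per partition, and 2x headroom for growth.
partitions = estimate_partitions(
    target_rate=1000,
    producer_rate_per_partition=5000,
    consumer_rate_per_partition=50,
    headroom=2.0,
)
print(partitions)
```

The consumer-side figure usually dominates when per-message processing is slow, which is the situation described in the thread, so measuring it per device class is likely more useful than tuning the producer.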

Spark Connect Multi-tenant Support

2023-09-22 Thread Kezhi Xiong
Hi,

The image on Spark Connect's official site mentions a "Multi-tenant
Application Gateway" on the driver. Is there any further documentation
about it? How can users make use of this feature?

Thanks,
Kezhi