In my experience, you need another component to consume data from the Kafka topics, buffer it, and write to the table (either directly or via Impala JDBC). Apache NiFi with ConsumeKafka <https://nifi.apache.org/nifi-docs/getting-started.html#data-ingestion> -> PutSQL <https://nifi.apache.org/nifi-docs/getting-started.html#data-egress-sending-data> is one option, though there are likely simpler setups as well.
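To make the pattern concrete, here is a rough consume/buffer/batch-insert sketch in plain Java. The topic, table, schema, batch size, and Impala JDBC URL are all placeholders for whatever you actually use, and error handling / exactly-once delivery are left out; this is only meant to show the shape of the component, not a production implementation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToImpalaSink {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");   // placeholder broker
        props.put("group.id", "impala-sink");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");        // commit only after a successful write

        // Placeholder URL; the exact format depends on the Impala JDBC driver you use.
        String jdbcUrl = "jdbc:impala://impala-host:21050/default";

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(jdbcUrl)) {
            consumer.subscribe(List.of("events"));        // placeholder topic
            PreparedStatement stmt =
                conn.prepareStatement("INSERT INTO events_iceberg VALUES (?)"); // placeholder table/schema

            int buffered = 0;
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    stmt.setString(1, rec.value());
                    stmt.addBatch();                      // buffer rows client-side
                    buffered++;
                }
                // Flush large batches so the table sees a few big inserts
                // instead of one tiny file per message.
                if (buffered >= 100_000) {
                    stmt.executeBatch();
                    consumer.commitSync();                // at-least-once: commit offsets after the write
                    buffered = 0;
                }
            }
        }
    }
}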
Something like https://github.com/confluentinc/kafka-connect-jdbc could also work. https://docs.confluent.io/kafka-connectors/jdbc/current/sink-connector/sink_config_options.html documents a batch.size option for controlling how many records are written per batch. (A minimal sink-config sketch is appended below the quoted message.)

On Tue, Jan 13, 2026 at 12:49 AM 汲广熙 <[email protected]> wrote:

> We look forward to receiving your suggestions
>
> Original Message
>
> From: 汲广熙 <[email protected]>
> Sent: January 12, 2026, 14:29
> To: dev <[email protected]>
> Subject: Architecture Advice: Best practices for consuming Kafka streams into Impala at 500k QPS
>
> Hi Impala Devs,
>
> I am planning a production pipeline to ingest data from Kafka into Impala with a high throughput of approximately 500,000 QPS. Given the metadata overhead and file management constraints in Impala, I would like to get your recommendations on the most robust architecture.
>
> My Current Environment:
>
> Impala Version: 4.5.0
>
> Storage: Tencent Cloud COS (Object Storage)
>
> Table Format: Apache Iceberg
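Appending the sketch mentioned above: a minimal standalone-mode JDBC sink connector config. The connection URL, topic, table name, batch size, and converter settings are placeholders, it assumes an Impala JDBC driver is on the Connect worker's classpath, and I have not verified how well the connector's generic JDBC dialect behaves against Impala, so treat this only as a starting point.

name=impala-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=4
topics=events
# Placeholder URL; use whatever your Impala JDBC driver expects.
connection.url=jdbc:impala://impala-host:21050/default
insert.mode=insert
table.name.format=events_iceberg
# How many records the connector tries to batch together per insert.
batch.size=10000
# Converter choice depends on your message format; these are placeholders.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter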
