In my experience, you need another component to consume data from the Kafka topics, buffer it, and write to the table (either directly or via Impala JDBC). Apache NiFi with ConsumeKafka <https://nifi.apache.org/nifi-docs/getting-started.html#data-ingestion> -> PutSQL <https://nifi.apache.org/nifi-docs/getting-started.html#data-egress-sending-data> is one option, though there are likely simpler setups as well.
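To make the pattern concrete, here is a rough consume/buffer/batch-insert sketch in plain Java. The topic, table, schema, batch size, and Impala JDBC URL are all placeholders for whatever you actually use, and error handling / exactly-once delivery are left out; this is only meant to show the shape of the component, not a production implementation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToImpalaSink {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");   // placeholder broker
        props.put("group.id", "impala-sink");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");        // commit only after a successful write

        // Placeholder URL; the exact format depends on the Impala JDBC driver you use.
        String jdbcUrl = "jdbc:impala://impala-host:21050/default";

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(jdbcUrl)) {
            consumer.subscribe(List.of("events"));        // placeholder topic
            PreparedStatement stmt =
                conn.prepareStatement("INSERT INTO events_iceberg VALUES (?)"); // placeholder table/schema

            int buffered = 0;
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    stmt.setString(1, rec.value());
                    stmt.addBatch();                      // buffer rows client-side
                    buffered++;
                }
                // Flush large batches so the table sees a few big inserts
                // instead of one tiny file per message.
                if (buffered >= 100_000) {
                    stmt.executeBatch();
                    consumer.commitSync();                // at-least-once: commit offsets after the write
                    buffered = 0;
                }
            }
        }
    }
}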
Something like https://github.com/confluentinc/kafka-connect-jdbc could also work. https://docs.confluent.io/kafka-connectors/jdbc/current/sink-connector/sink_config_options.html documents a batch.size option for controlling how many records are written per batch. (A minimal sink-config sketch is appended below the quoted message.)

On Tue, Jan 13, 2026 at 12:49 AM 汲广熙 <[email protected]> wrote:

> We look forward to receiving your suggestions
>
> Original Message
>
> From: 汲广熙 <[email protected]>
> Sent: January 12, 2026, 14:29
> To: dev <[email protected]>
> Subject: Architecture Advice: Best practices for consuming Kafka streams into Impala at 500k QPS
>
> Hi Impala Devs,
>
> I am planning a production pipeline to ingest data from Kafka into Impala with a high throughput of approximately 500,000 QPS. Given the metadata overhead and file management constraints in Impala, I would like to get your recommendations on the most robust architecture.
>
> My Current Environment:
>
> Impala Version: 4.5.0
>
> Storage: Tencent Cloud COS (Object Storage)
>
> Table Format: Apache Iceberg
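Appending the sketch mentioned above: a minimal standalone-mode JDBC sink connector config. The connection URL, topic, table name, batch size, and converter settings are placeholders, it assumes an Impala JDBC driver is on the Connect worker's classpath, and I have not verified how well the connector's generic JDBC dialect behaves against Impala, so treat this only as a starting point.

name=impala-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=4
topics=events
# Placeholder URL; use whatever your Impala JDBC driver expects.
connection.url=jdbc:impala://impala-host:21050/default
insert.mode=insert
table.name.format=events_iceberg
# How many records the connector tries to batch together per insert.
batch.size=10000
# Converter choice depends on your message format; these are placeholders.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter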
