Hi,

Just a few tips off the top of my head:
 - use dedicated coordinators and executors. A rule of thumb for the
coordinator:executor ratio is 1:50, though for HA you probably want more
than one coordinator.
 - use local catalog mode (aka On-demand metadata):
https://impala.apache.org/docs/build/html/topics/impala_metadata.html
 - enable the remote data cache (on SSDs); it is essential in a
compute-storage separated setup:
https://impala.apache.org/docs/build/html/topics/impala_data_cache.html
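
To make the sizing rule concrete, here is a rough Python sketch (not an
official tool) that turns the 1:50 rule of thumb into a coordinator count
with a floor of two for HA. The function name, the HA floor of 2, the cache
path and the 500GB quota are my own illustrative assumptions. The startup
flags listed are the ones I remember from the Impala docs linked above, so
please verify them against the documentation of your Impala version.

```python
import math


def coordinators_needed(executors, ratio=50, min_for_ha=2):
    """Coordinator count from the 1:50 rule of thumb, with an HA floor.

    `ratio` and `min_for_ha` are assumptions to tune, not Impala settings.
    """
    if executors < 1:
        raise ValueError("need at least one executor")
    return max(math.ceil(executors / ratio), min_for_ha)


# Startup flags per role (check against the docs for your version).
# The cache directory and quota below are placeholders.
COORDINATOR_FLAGS = [
    "--is_coordinator=true",
    "--is_executor=false",
    "--use_local_catalog=true",   # local catalog / on-demand metadata
]
EXECUTOR_FLAGS = [
    "--is_coordinator=false",
    "--is_executor=true",
    "--data_cache=/mnt/ssd0/impala-cache:500GB",  # hypothetical SSD path
]
CATALOGD_FLAGS = [
    "--catalog_topic_mode=minimal",  # pairs with --use_local_catalog
]

if __name__ == "__main__":
    for n in (10, 100, 120):
        print(n, "executors ->", coordinators_needed(n), "coordinators")
```

With these assumptions a 10-executor cluster still gets two coordinators
(the HA floor), and the ratio only starts to matter above 100 executors.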

What table format / file format are you planning to use?
If table format is Iceberg, make sure you use the latest Impala as we
continuously improve Impala's performance on Iceberg.
File format: Impala works most efficiently on Parquet files.

Avoid small file issues:
 - choose proper partitioning for your data, i.e. avoid partitioning that is
too coarse-grained or too fine-grained. As a rule of thumb, you probably
want more than 200 MB of data per partition, but less than 20 GB.
 - compact your tables regularly. For Iceberg tables, Impala has the
OPTIMIZE statement:
https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
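
As a quick way to check a table against the 200 MB / 20 GB guideline, a
small Python sketch like the following could flag partitions that fall
outside the range. The function names and the sample partition sizes are
made up for illustration; in practice you would feed in per-partition sizes
collected from your own tables.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def classify_partition(size_bytes, low=200 * MB, high=20 * GB):
    """Classify one partition against the rule-of-thumb size range."""
    if size_bytes < low:
        return "too_small"   # partitioning is likely too fine-grained
    if size_bytes > high:
        return "too_large"   # partitioning is likely too coarse-grained
    return "ok"


def summarize(partition_sizes):
    """Group partition names by classification.

    `partition_sizes` maps a partition name to its size in bytes.
    """
    report = {"too_small": [], "ok": [], "too_large": []}
    for name, size in partition_sizes.items():
        report[classify_partition(size)].append(name)
    return report


if __name__ == "__main__":
    # Hypothetical sizes, not measurements from a real table.
    sample = {
        "day=2025-12-01": 50 * MB,
        "day=2025-12-02": 3 * GB,
        "day=2025-12-03": 45 * GB,
    }
    for verdict, names in summarize(sample).items():
        print(verdict, names)
```

Many partitions in the "too_small" bucket usually point to a partitioning
scheme that is too fine-grained, or to tables that need compaction.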

I hope others chime in as well.

We would love to hear back about your experiences, and feel free to open
tickets for Impala if you run into any issues:
https://issues.apache.org/jira/projects/IMPALA/issues

Cheers,
    Zoltan

On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:

> Dear Impala Team,
>
> I hope this message finds you well.
>
> I am currently planning to build a compute-storage separated architecture
> based on Apache Impala. In this setup:
>
>  - Compute layer: Apache Impala will be used for SQL query execution.
>  - Storage layer: Tencent Cloud Object Storage (COS) will serve as the
> backend storage.
>  - Data ingestion: Kafka will be used to stream data into the system.
>  - Monitoring & visualization: Grafana will be used to display
> operational and performance metrics.
>
> Could you please provide some recommendations and key considerations for
> such an architecture? Specifically, I would appreciate guidance on:
>
>  - Best practices for integrating Impala with cloud object storage (COS).
>  - Performance tuning tips for Impala in a disaggregated environment.
>  - Any known limitations or compatibility issues when using COS as storage.
>  - Recommended configurations for Kafka-to-Impala data pipelines.
>  - Monitoring strategies for tracking query performance and resource usage in
> Grafana.
>
> Thank you very much for your support and advice. I look forward to your
> reply.
>
> Best regards
