Hi,

Just a few tips off the top of my head:

- Use dedicated coordinators and executors. A rule of thumb for the coordinator:executor ratio is 1:50, though for HA you probably want more than one coordinator.
- Use local catalog mode (aka on-demand metadata): https://impala.apache.org/docs/build/html/topics/impala_metadata.html
- Enabling the remote data cache (with SSD disks) is essential in a compute-storage separated setup: https://impala.apache.org/docs/build/html/topics/impala_data_cache.html

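To make the ratio rule above concrete, here is a minimal Python sketch for sizing the coordinator pool. The function name, the default floor of two coordinators (one reading of "more than one for HA"), and the rounding-up behaviour are illustrative assumptions, not an official Impala sizing formula.

```python
import math


def coordinators_needed(num_executors: int,
                        executors_per_coordinator: int = 50,
                        min_coordinators: int = 2) -> int:
    """Estimate how many dedicated coordinators to run.

    Assumptions (not from the Impala docs): round up on the 1:50 rule of
    thumb, and keep at least `min_coordinators` for high availability.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    if executors_per_coordinator < 1 or min_coordinators < 1:
        raise ValueError("ratio and minimum must be at least 1")
    # Coordinators implied by the ratio alone, rounded up.
    by_ratio = math.ceil(num_executors / executors_per_coordinator)
    # The HA floor wins for small clusters.
    return max(by_ratio, min_coordinators)


if __name__ == "__main__":
    for executors in (10, 100, 120, 400):
        print(executors, "executors ->", coordinators_needed(executors), "coordinators")
```

With these defaults, small clusters are dominated by the HA floor and the 1:50 ratio only starts to matter above 100 executors. The roles themselves are usually assigned with the impalad startup flags `is_coordinator` and `is_executor`; please check the documentation for your Impala version before relying on that.
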
What table format / file format are you planning to use?

Table format: if you use Iceberg, make sure you run the latest Impala, as we continuously improve Impala's performance on Iceberg.

File format: Impala works most efficiently on Parquet files.

Avoid small file issues:

- Choose proper partitioning for your data, i.e. avoid partitioning that is too coarse-grained or too fine-grained. You probably want more than 200 MB of data per partition, but less than 20 GB.
- Compact your tables regularly. For Iceberg tables, Impala has the OPTIMIZE statement: https://impala.apache.org/docs/build/html/topics/impala_iceberg.html

I hope others chime in as well. We would love to hear back about your experiences, and feel free to open tickets for Impala if you run into any issues: https://issues.apache.org/jira/projects/IMPALA/issues

Cheers,
Zoltan

On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:

> Dear Impala Team,
>
> I hope this message finds you well.
>
> I am currently planning to build a compute-storage separated architecture
> based on Apache Impala. In this setup:
>
> - Compute layer: Apache Impala will be used for SQL query execution.
> - Storage layer: Tencent Cloud Object Storage (COS) will serve as the
>   backend storage.
> - Data ingestion: Kafka will be used to stream data into the system.
> - Monitoring & visualization: Grafana will be used to display
>   operational and performance metrics.
>
> Could you please provide some recommendations and key considerations for
> such an architecture? Specifically, I would appreciate guidance on:
>
> - Best practices for integrating Impala with cloud object storage (COS).
> - Performance tuning tips for Impala in a disaggregated environment.
> - Any known limitations or compatibility issues when using COS as storage.
> - Recommended configurations for Kafka-to-Impala data pipelines.
> - Monitoring strategies for tracking query performance and resource usage in
>   Grafana.
>
> Thank you very much for your support and advice. I look forward to your
> reply.
>
> Best regards
