pandalanax opened a new issue, #2814: URL: https://github.com/apache/drill/issues/2814
**Describe the bug**

We have a 20GB Parquet data set that we store in HDFS and currently regenerate monthly. It is stored in 128MB parts, so roughly 160 .parquet files. We call this our `history_data`. We UNION these files with .json files that hold newer data. For example, the Parquet files range from 2020-01-31 to 2023-06-29, and the JSON files range from 2023-07-01 to the current date.

When issuing a query like:

```sql
SELECT *
FROM dfs.co2meter.sensor_data_v3
order by `timestamp`
limit 10
```

we observed that Major Fragment 03-xx-xx, which contains:

- JSON_SUB_SCAN
- UNION_ALL
- PROJECT
- PROJECT
- PARQUET_ROW_GROUP_SCAN

only runs with min(# parquet files, # json files) threads/processes. E.g.:

- 160 parquet files & 9 JSON files == 9/9 Minor Fragments Reporting (super slow)
- 160 parquet files & 160 JSON files == 160/160 Minor Fragments Reporting (fastest)
- 160 parquet files & 320 JSON files == 160/160 Minor Fragments Reporting (also reasonably fast)

Is there a config for this?

**Drill version**

1.17.0

**Additional context**

4 Drillbits.

Screenshots (images not preserved): "160/160 Minor Fragments Reporting" and "9/9 Minor Fragments Reporting".