pandalanax commented on issue #2814: URL: https://github.com/apache/drill/issues/2814#issuecomment-1621306114
To summarize: if `planner.width.max_per_query` is greater than `$(actual # of files for one part of union)`, then `planner.width.max_per_query` shrinks to `$(actual # of files for one part of union)`. This could make sense, now that I think about it, since only a limited amount of parallel reading is possible per file.

However: the files are replicated throughout the HDFS cluster (also 4 nodes, replication factor 4). In my understanding it should therefore be possible to read one file with more than one thread. Am I wrong here?

Another idea: would it be possible to split `JSON_SUB_SCAN` and `PARQUET_ROW_GROUP_SCAN` into their own major fragments when doing `UNION ALL`? (Same with `KAFKA_SUB_SCAN`, by the way.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
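The width-shrinking behavior described in the comment can be sketched as follows. This is a hedged illustration of the *observed* capping, not Drill's actual planner code: the effective scan parallelism for one input of the union appears to be the smaller of the configured `planner.width.max_per_query` and the number of readable units (files for JSON, row groups for Parquet). The function name and structure are invented for illustration.

```python
# Illustrative sketch (NOT Drill source code): the observed behavior is that
# the effective scan width for one side of a UNION ALL is capped by the
# number of readable units, regardless of planner.width.max_per_query.
def effective_scan_width(max_per_query: int, num_units: int) -> int:
    """Hypothetical model of the observed parallelism cap.

    num_units: number of files (JSON) or row groups (Parquet) in that scan.
    """
    return min(max_per_query, num_units)

# Example: max_per_query = 16, but only 3 JSON files in one union branch.
print(effective_scan_width(16, 3))  # width shrinks to 3, as observed
```

Under this model, replication (4 copies of each file across the 4-node HDFS cluster) does not increase `num_units`, which matches the question above: replication lets different nodes read a file, but the planner still seems to assign at most one reader per file or row group.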