pandalanax commented on issue #2814: URL: https://github.com/apache/drill/issues/2814#issuecomment-1621306114
To summarize: if `planner.width.max_per_query` is greater than `$(actual # of files for one part of union)`, then `planner.width.max_per_query` shrinks to `$(actual # of files for one part of union)`. This could make sense, now that I think about it, since only a limited amount of parallel reading is possible per file.

However: the files are replicated throughout the HDFS cluster (also 4 nodes, replication factor 4). In my understanding it should therefore be possible to read one file with more than one thread. Am I wrong here?

Another idea: would it be possible to split `JSON_SUB_SCAN` and `PARQUET_ROW_GROUP_SCAN` into their own major fragments when doing `UNION ALL`? (Same with `KAFKA_SUB_SCAN`, by the way.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
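The width-shrinking behavior described in the comment can be sketched as follows. This is a hedged illustration of the *observed* capping, not Drill's actual planner code: the effective scan parallelism for one input of the union appears to be the smaller of the configured `planner.width.max_per_query` and the number of readable units (files for JSON, row groups for Parquet). The function name and structure are invented for illustration.

```python
# Illustrative sketch (NOT Drill source code): the observed behavior is that
# the effective scan width for one side of a UNION ALL is capped by the
# number of readable units, regardless of planner.width.max_per_query.
def effective_scan_width(max_per_query: int, num_units: int) -> int:
    """Hypothetical model of the observed parallelism cap.

    num_units: number of files (JSON) or row groups (Parquet) in that scan.
    """
    return min(max_per_query, num_units)

# Example: max_per_query = 16, but only 3 JSON files in one union branch.
print(effective_scan_width(16, 3))  # width shrinks to 3, as observed
```

Under this model, replication (4 copies of each file across the 4-node HDFS cluster) does not increase `num_units`, which matches the question above: replication lets different nodes read a file, but the planner still seems to assign at most one reader per file or row group.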