Re: [PR] DRILL-8507, DRILL-8508 Better handling of partially missing parquet columns (drill)

via GitHub Fri, 30 Aug 2024 22:45:12 -0700


paul-rogers commented on PR #2937:
URL: https://github.com/apache/drill/pull/2937#issuecomment-2322784235


   > @paul-rogers I'm not aware of any recent significant work on the Parquet
   > reader. I know @jnturton did some work regarding adding new compression
   > capabilities and there have been a few bug fixes here and there, but nothing
   > major as I recall. So I don't think we've added any Parquet "prescan" that I am
   > aware of.
   
   Ah. I'm misunderstanding something. I outlined a number of the common 
problems we encounter when we try to figure out schema dynamically. A response 
suggested that this pull request solves this because it has access to the full 
Parquet schema in the planner. The only ways to get that schema are to use the
old Parquet metadata cache or to run something that scans all the Parquet files
in the planner. I thought I saw a statement that such a scan was being done.
   
   To prevent further confusion, what is the source of the Parquet schema in 
this fix?


