alamb commented on issue #8840:
URL: https://github.com/apache/arrow-rs/issues/8840#issuecomment-3536458954

   So the topic (and pain) of schema merging comes up a bunch in DataFusion (and perhaps elsewhere).
   
   For example, here is the code in DataFusion that handles schema merging for Parquet:
   
   
https://github.com/apache/datafusion/blob/6ab4d216b768c9327982e59376a62a29c69ca436/datafusion/datasource-parquet/src/file_format.rs#L406-L421
   
   We also have a similar challenge when comparing schemas (in some cases, a field that is less nullable than another should be considered compatible with it).
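
To make the nullability point concrete, here is a minimal sketch of a field-level compatibility check. It uses a simplified, hypothetical `SimpleField` type (name, data type as a string, nullable flag) rather than arrow-rs's `Field`, and assumes the rule that data from a non-nullable field can always be stored in a nullable one, but not the reverse:

```rust
/// Simplified, hypothetical stand-in for an Arrow field (not arrow-rs's `Field`).
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleField {
    pub name: String,
    pub data_type: String, // data type as a plain string, e.g. "Int64"
    pub nullable: bool,
}

impl SimpleField {
    pub fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type: data_type.to_string(), nullable }
    }
}

/// Returns true if values described by `actual` can be stored in a column
/// declared as `expected`. Name and type must match exactly; nullability may
/// differ in one direction only: a non-nullable `actual` fits a nullable
/// `expected`, but a nullable `actual` does not fit a non-nullable `expected`.
pub fn is_compatible(actual: &SimpleField, expected: &SimpleField) -> bool {
    actual.name == expected.name
        && actual.data_type == expected.data_type
        && (expected.nullable || !actual.nullable)
}
```

A strict equality check (`actual == expected`) would reject a non-nullable `id` against a nullable `id`, even though no data in the first could violate the second.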
   
   I am not quite sure which merging logic belongs in which crate (e.g. I don't have a sense of whether there is a broadly agreed-upon definition of what schema merging means outside the context of DataFusion's schema evolution).
   
   Thus, what I suggest is:
   1. We start by moving the schema merging logic into DataFusion and iterate there until we get it right.
   2. Then we consider whether the logic belongs upstream in arrow-rs, where we can commit to an API.
   
   Given all the various potential options, the API I would suggest is some sort of `SchemaMerger` structure, something like:
   
   ```rust
   let mut schema_merger = SchemaMerger::new()
     .with_preserve_nulls(true); // set various options, builder style
   
   // try to merge the schemas
   schema_merger.try_merge(schemas)?;
   
   // get the built schema
   let merged_schema = schema_merger.build()?;
   ```
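
One way the proposed `SchemaMerger` could behave is sketched below. Everything here is an assumption for illustration, not an existing arrow-rs or DataFusion API: `SimpleField` is a hypothetical stand-in for arrow-rs's `Field`, errors are plain `String`s, fields are matched by name and kept in first-seen order, and `with_preserve_nulls(true)` is taken to mean "a field is nullable in the output if it is nullable in any input", while `false` treats a nullability mismatch as a conflict.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for an Arrow field (not arrow-rs's `Field`).
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleField {
    pub name: String,
    pub data_type: String, // data type as a plain string, e.g. "Int64"
    pub nullable: bool,
}

impl SimpleField {
    pub fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type: data_type.to_string(), nullable }
    }
}

/// Hypothetical builder-style merger mirroring the API proposed above.
#[derive(Debug, Default)]
pub struct SchemaMerger {
    preserve_nulls: bool,
    fields: Vec<SimpleField>,      // merged fields, in first-seen order
    index: HashMap<String, usize>, // field name -> position in `fields`
}

impl SchemaMerger {
    pub fn new() -> Self {
        Self::default()
    }

    /// Assumed semantics: `true` widens nullability (nullable anywhere means
    /// nullable in the output); `false` rejects differing nullability.
    pub fn with_preserve_nulls(mut self, preserve_nulls: bool) -> Self {
        self.preserve_nulls = preserve_nulls;
        self
    }

    /// Folds each schema into the merged result, matching fields by name.
    /// Conflicting data types are always an error.
    pub fn try_merge<I>(&mut self, schemas: I) -> Result<(), String>
    where
        I: IntoIterator<Item = Vec<SimpleField>>,
    {
        for schema in schemas {
            for field in schema {
                match self.index.get(&field.name).copied() {
                    Some(i) => {
                        let existing = &mut self.fields[i];
                        if existing.data_type != field.data_type {
                            return Err(format!(
                                "field '{}' has conflicting types {} and {}",
                                field.name, existing.data_type, field.data_type
                            ));
                        }
                        if existing.nullable != field.nullable {
                            if self.preserve_nulls {
                                existing.nullable = true; // widen to nullable
                            } else {
                                return Err(format!(
                                    "field '{}' has conflicting nullability",
                                    field.name
                                ));
                            }
                        }
                    }
                    None => {
                        // First time we see this name: append it.
                        self.index.insert(field.name.clone(), self.fields.len());
                        self.fields.push(field);
                    }
                }
            }
        }
        Ok(())
    }

    /// Returns the merged schema. It returns `Result` only to match the
    /// proposed `build()?` call; this sketch never fails here.
    pub fn build(self) -> Result<Vec<SimpleField>, String> {
        Ok(self.fields)
    }
}
```

For example, merging `[id: Int64 NOT NULL]` with `[id: Int64 NULL, name: Utf8 NULL]` under `with_preserve_nulls(true)` yields a nullable `id` followed by `name`, while the same inputs under `with_preserve_nulls(false)` return an error. In this sketch a failed `try_merge` leaves the merger partially updated; merging into a scratch copy and committing only on success would avoid that.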
   
   

