tustvold commented on code in PR #8524:
URL: https://github.com/apache/arrow-rs/pull/8524#discussion_r2655844977


##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -490,6 +498,18 @@ impl ArrowWriterOptions {
             ..self
         }
     }
+
+    /// Explicitly specify the Parquet schema to be used

Review Comment:
   This API ends up being a bit problematic, because the type inference and 
coercion machinery are supposed to mirror each other.
   
   With this change:
   
   * You can write files that won't roundtrip correctly, as the reader doesn't 
understand the types in the arrow schema (and will just ignore them)
   * You can end up with incorrect type coercion, e.g. unsigned types 
not being handled correctly
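   
   As a rough illustration of the second point (an assumption about the kind 
of mismatch meant here, not something taken from the PR): Parquet stores 32-bit 
unsigned integers in the signed INT32 physical type and relies on a logical-type 
annotation so readers know to reinterpret the bits. The sketch below is plain 
Rust with no arrow or parquet calls, and all function names are hypothetical.

```rust
/// Hypothetical writer step: a u32 is stored in a signed 32-bit physical slot.
/// The cast preserves the bit pattern; it does not clamp or fail.
fn store_as_int32(v: u32) -> i32 {
    v as i32
}

/// Hypothetical reader that sees an unsigned annotation and reinterprets the bits.
fn read_as_unsigned(physical: i32) -> u32 {
    physical as u32
}

/// Hypothetical reader that sees no unsigned annotation and keeps the signed value.
fn read_as_signed(physical: i32) -> i32 {
    physical
}

fn main() {
    let original: u32 = 4_000_000_000;
    let physical = store_as_int32(original);

    // With the annotation, the value roundtrips.
    assert_eq!(read_as_unsigned(physical), original);

    // Without it, the same bits come back as a negative number.
    assert_eq!(read_as_signed(physical), -294_967_296);
}
```

   If a user-supplied Parquet schema omits or changes that annotation, the 
writer and reader no longer agree on how to interpret the stored bits.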
   
   Further, this interferes with removing arrow_cast as a dependency: 
https://github.com/apache/arrow-rs/pull/9077
   
   I'm not sure what the intention of this API is. Why can't the arrays just be 
cast before being written? Why does this logic need to live within the parquet 
writer itself?


