Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

via GitHub Fri, 29 Aug 2025 11:33:19 -0700


rok commented on code in PR #16738:
URL: https://github.com/apache/datafusion/pull/16738#discussion_r2310832534



##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1654,7 +1636,8 @@ async fn output_single_parquet_file_parallelized(
     object_store_writer: Box<dyn AsyncWrite + Send + Unpin>,
     data: Receiver<RecordBatch>,
     output_schema: Arc<Schema>,
-    parquet_props: &WriterProperties,
+    writer_properties: &WriterProperties,
+    skip_arrow_metadata: bool,

Review Comment:
   Previously we always set `allow_single_file_parallelism` to `false`. Now
   that we allow `true`, the `WriterProperties` go through [another
   path](https://github.com/apache/datafusion/blob/25acb643585fe4460199a8731fc94c24e79466ef/datafusion/datasource-parquet/src/file_format.rs#L1127-L1134)
   for creating the schema. We fix this
   [here](https://github.com/apache/datafusion/pull/16738/files#diff-a8919cf6209fb777550056cdd7decca3e6ed94370a2821a9395763fdd6271967R1652)
   by building the writer options from both the properties and the
   `skip_arrow_metadata` flag:
   ```rust
   let options = ArrowWriterOptions::new()
       .with_properties(writer_properties.clone())
       .with_skip_arrow_metadata(skip_arrow_metadata);
   ```

   I'm honestly not sure this is a good idea, but it was the simplest way I
   could find to fix the schema mismatch that occurred without this change.
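   A minimal sketch of what the two options control on a plain `ArrowWriter`, outside DataFusion. It assumes the `parquet` crate (with its `arrow` feature), `arrow-array` and `arrow-schema` as dependencies; the helpers `example_batch` and `write_and_list_metadata_keys` and the temp file names are hypothetical and not part of the PR. It does not reproduce the parallelized writer, only the option construction quoted above.

   ```rust
   use std::fs::File;
   use std::path::Path;
   use std::sync::Arc;

   use arrow_array::{ArrayRef, Int32Array, RecordBatch};
   use arrow_schema::{DataType, Field, Schema};
   use parquet::arrow::arrow_writer::{ArrowWriter, ArrowWriterOptions};
   use parquet::errors::Result;
   use parquet::file::properties::WriterProperties;
   use parquet::file::reader::{FileReader, SerializedFileReader};

   /// Hypothetical helper: a one-column batch to write.
   fn example_batch() -> RecordBatch {
       let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
       let ids: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
       RecordBatch::try_new(schema, vec![ids]).expect("column matches schema")
   }

   /// Hypothetical helper: writes `batch` to `path` with the same option
   /// construction as in the review comment, then returns the keys found in
   /// the file footer's key-value metadata.
   fn write_and_list_metadata_keys(
       path: &Path,
       batch: &RecordBatch,
       writer_properties: &WriterProperties,
       skip_arrow_metadata: bool,
   ) -> Result<Vec<String>> {
       // Properties and the skip flag travel together in one options value.
       let options = ArrowWriterOptions::new()
           .with_properties(writer_properties.clone())
           .with_skip_arrow_metadata(skip_arrow_metadata);

       let mut writer =
           ArrowWriter::try_new_with_options(File::create(path)?, batch.schema(), options)?;
       writer.write(batch)?;
       writer.close()?; // flushes the row group and writes the footer

       // Read the footer back and collect its key-value metadata keys, if any.
       let reader = SerializedFileReader::new(File::open(path)?)?;
       let keys: Vec<String> = reader
           .metadata()
           .file_metadata()
           .key_value_metadata()
           .map(|kvs| kvs.iter().map(|kv| kv.key.clone()).collect())
           .unwrap_or_default();
       Ok(keys)
   }

   fn main() -> Result<()> {
       let batch = example_batch();
       let props = WriterProperties::builder().build();
       let dir = std::env::temp_dir();

       // skip_arrow_metadata = false: the encoded Arrow schema is stored under "ARROW:schema".
       let kept = write_and_list_metadata_keys(&dir.join("arrow_schema_kept.parquet"), &batch, &props, false)?;
       // skip_arrow_metadata = true: no Arrow schema is embedded in the footer.
       let skipped = write_and_list_metadata_keys(&dir.join("arrow_schema_skipped.parquet"), &batch, &props, true)?;

       println!("kept: {kept:?}, skipped: {skipped:?}");
       Ok(())
   }
   ```

   In this sketch the two files differ only in the `ARROW:schema` footer entry. The comment does not say which part of the schema mismatched, so treat this as an illustration of why a write path that ignores `skip_arrow_metadata` can disagree with one that honors it, not as a diagnosis of the PR.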


