devinjdangelo commented on code in PR #7435:
URL: https://github.com/apache/arrow-datafusion/pull/7435#discussion_r1312317591


##########
datafusion/core/src/dataframe.rs:
##########
@@ -2292,4 +2365,41 @@ mod tests {
 
         Ok(())
     }
+
+    #[tokio::test]
+    async fn write_parquet_with_compression() -> Result<()> {
+        let test_df = test_table().await?;
+        let tmp_dir = TempDir::new()?;
+        let local = Arc::new(LocalFileSystem::new_with_prefix(&tmp_dir)?);
+        let local_url = Url::parse("file://local").unwrap();
+        let ctx = &test_df.session_state;
+        ctx.runtime_env().register_object_store(&local_url, local);
+
+        let output_path = "file://local/test.parquet";
+        test_df
+            .write_parquet(
+                output_path,
+                DataFrameWriteOptions::new().with_single_file_output(true),
+                Some(
+                    WriterProperties::builder()
+                        .set_compression(parquet::basic::Compression::SNAPPY)

Review Comment:
   Relevant lines in the parquet crate:

   https://github.com/apache/arrow-rs/blob/eeba0a3792a2774dee1d10a25340b2741cf95c9e/parquet/src/file/metadata.rs#L640

   https://github.com/apache/arrow-rs/blob/eeba0a3792a2774dee1d10a25340b2741cf95c9e/parquet/src/format.rs#L3495

   I suppose that, outside of testing, there is no compelling reason for the parquet file to store the compression level that was used for each column chunk.
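   To make the round-trip behavior concrete, here is a hedged sketch using the `parquet` crate directly (the scratch file path, schema, and array contents are illustrative, not from the PR): the column-chunk metadata records which codec was used, but no compression level is written to the file.

   ```rust
   use std::fs::File;
   use std::sync::Arc;

   use arrow::array::Int32Array;
   use arrow::datatypes::{DataType, Field, Schema};
   use arrow::record_batch::RecordBatch;
   use parquet::arrow::ArrowWriter;
   use parquet::basic::Compression;
   use parquet::file::properties::WriterProperties;
   use parquet::file::reader::{FileReader, SerializedFileReader};

   fn main() -> Result<(), Box<dyn std::error::Error>> {
       // Illustrative one-column schema and batch.
       let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)]));
       let batch = RecordBatch::try_new(
           schema.clone(),
           vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
       )?;

       // Request SNAPPY compression via WriterProperties, as in the test above.
       let props = WriterProperties::builder()
           .set_compression(Compression::SNAPPY)
           .build();

       let path = "/tmp/compression_check.parquet"; // hypothetical scratch path
       let mut writer = ArrowWriter::try_new(File::create(path)?, schema, Some(props))?;
       writer.write(&batch)?;
       writer.close()?;

       // Read the file metadata back: the codec is recorded per column chunk.
       // For level-parameterized codecs (e.g. ZSTD), the read-back value would
       // carry a default level, since the level itself is not stored in the file.
       let reader = SerializedFileReader::new(File::open(path)?)?;
       let codec = reader.metadata().row_group(0).column(0).compression();
       assert_eq!(codec, Compression::SNAPPY);
       println!("column chunk codec: {:?}", codec);
       Ok(())
   }
   ```

   This is why a test can assert on the codec of the written file but not on the exact compression level that produced it.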



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
