Re: [PR] Document Arrow <--> Parquet schema conversion better [arrow-rs]

via GitHub Wed, 07 May 2025 11:11:54 -0700


alamb commented on code in PR #7479:
URL: https://github.com/apache/arrow-rs/pull/7479#discussion_r2078221299



##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -314,17 +317,26 @@ impl ArrowReaderOptions {
         }
     }
 
-    /// Provide a schema to use when reading the parquet file. If provided it
-    /// takes precedence over the schema inferred from the file or the schema defined
-    /// in the file's metadata. If the schema is not compatible with the file's
-    /// schema an error will be returned when constructing the builder.
+    /// Provide a schema to use when reading the Parquet file.
+    ///
+    /// If provided, this schema takes precedence over any schema defined in the
+    /// file's schema hint in the metadata (see the [`arrow`] documentation for more details).
+    /// If the provided schema is not compatible with the data stored in the
+    /// parquet file schema, an error will be returned when constructing the
+    /// builder.
+    ///
+    /// This option is only required if you want to explicitly control the
+    /// conversion of Parquet types to Arrow types, such as casting a column to
+    /// a different type. For example, if you wanted to read an Int64 in
+    /// a Parquet file to a [`TimestampMicrosecondArray`] in the Arrow schema.

Review Comment:
   It does error; see the tests in
   - https://github.com/apache/arrow-rs/pull/7481#


