tustvold commented on code in PR #7479:
URL: https://github.com/apache/arrow-rs/pull/7479#discussion_r2078252057


##########
parquet/src/arrow/mod.rs:
##########
@@ -15,13 +15,42 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! API for reading/writing
-//! Arrow [RecordBatch](arrow_array::RecordBatch)es and
-//! [Array](arrow_array::Array)s to/from Parquet Files.
+//! API for reading/writing Arrow [`RecordBatch`]es and [`Array`]s to/from
+//! Parquet Files.
 //!
-//! See the [crate-level documentation](crate) for more details.
+//! See the [crate-level documentation](crate) for more details on other APIs.
 //!
-//! # Example of writing Arrow record batch to Parquet file
+//! # Schema Conversion
+//!
+//! These APIs ensure that data in Arrow [`RecordBatch`]es written to Parquet are
+//! read back as [`RecordBatch`]es with the exact same types and values.
+//!
+//! Parquet and Arrow have different type systems, and there is not
+//! always a one-to-one mapping between the systems. For example, data
+//! stored as a Parquet [`BYTE_ARRAY`] can be read as either an Arrow
+//! [`BinaryViewArray`] or [`BinaryArray`].
+//!
+//! To recover the original Arrow types, the writers in this module add a "hint" to
+//! the metadata in the [`ARROW_SCHEMA_META_KEY`] key which records the original Arrow
+//! schema. The metadata hint follows the same convention as arrow-cpp based
+//! implementations such as `pyarrow`. The reader looks for the schema hint in the
+//! metadata to determine Arrow types, and if it is not present, uses reasonable defaults.
+//! You can also control the type conversion process in more detail using:
+//!

Review Comment:
   ```suggestion
   //! implementations such as `pyarrow`. The reader looks for the schema hint in the
   //! metadata to determine Arrow types, and if it is not present, infers the Arrow schema
   //! from the Parquet schema.
   //!
   //! In situations where the embedded Arrow schema is not compatible with the Parquet
   //! schema, the Parquet schema takes precedence - see [#1663](https://github.com/apache/arrow-rs/issues/1663).
   //!
   //! You can also control the type conversion process in more detail using:
   //!
   ```
   Perhaps
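The precedence rule in the suggested wording (use the embedded hint when it is compatible, otherwise fall back to what the Parquet schema implies) can be modeled with a small std-only sketch. This is not the parquet crate's implementation: the real hint stored under `ARROW_SCHEMA_META_KEY` is an encoded Arrow IPC schema covering every field, whereas here a plain type name for a single column stands in for it. The enums and the `parse_hint`, `is_compatible`, `default_arrow_type`, and `resolve_arrow_type` functions are hypothetical, and the key's literal value and the `Binary` default for an unannotated `BYTE_ARRAY` are assumptions of the sketch.

```rust
use std::collections::HashMap;

// Assumed literal value of the metadata key (named ARROW_SCHEMA_META_KEY in the PR text).
const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";

/// Hypothetical subset of Parquet physical types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParquetPhysical {
    ByteArray,
    Int32,
}

/// Hypothetical subset of Arrow data types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowType {
    Binary,
    BinaryView,
    Int32,
}

/// Hypothetical hint parser: a plain type name stands in for the encoded schema.
fn parse_hint(s: &str) -> Option<ArrowType> {
    match s {
        "Binary" => Some(ArrowType::Binary),
        "BinaryView" => Some(ArrowType::BinaryView),
        "Int32" => Some(ArrowType::Int32),
        _ => None,
    }
}

/// Type inferred from the Parquet schema alone when no usable hint exists (assumed defaults).
fn default_arrow_type(p: ParquetPhysical) -> ArrowType {
    match p {
        ParquetPhysical::ByteArray => ArrowType::Binary,
        ParquetPhysical::Int32 => ArrowType::Int32,
    }
}

/// Whether the hinted Arrow type can represent data stored with this Parquet type.
fn is_compatible(p: ParquetPhysical, a: ArrowType) -> bool {
    matches!(
        (p, a),
        (ParquetPhysical::ByteArray, ArrowType::Binary | ArrowType::BinaryView)
            | (ParquetPhysical::Int32, ArrowType::Int32)
    )
}

/// Use the hint if present and compatible; otherwise the Parquet schema takes precedence.
fn resolve_arrow_type(p: ParquetPhysical, metadata: &HashMap<String, String>) -> ArrowType {
    match metadata.get(ARROW_SCHEMA_META_KEY).and_then(|s| parse_hint(s)) {
        Some(hint) if is_compatible(p, hint) => hint,
        _ => default_arrow_type(p),
    }
}

fn main() {
    let mut metadata: HashMap<String, String> = HashMap::new();
    // No hint: infer from the Parquet type.
    println!("{:?}", resolve_arrow_type(ParquetPhysical::ByteArray, &metadata)); // Binary

    metadata.insert(ARROW_SCHEMA_META_KEY.to_string(), "BinaryView".to_string());
    // Compatible hint: the original Arrow type is recovered.
    println!("{:?}", resolve_arrow_type(ParquetPhysical::ByteArray, &metadata)); // BinaryView
    // Incompatible hint: the Parquet schema wins.
    println!("{:?}", resolve_arrow_type(ParquetPhysical::Int32, &metadata)); // Int32
}
```

The same `BYTE_ARRAY` column resolves to `Binary` or `BinaryView` depending only on the hint, which is the ambiguity the hint exists to resolve, while an incompatible hint is ignored rather than causing an error.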


