nevi-me commented on pull request #261:
URL: https://github.com/apache/arrow-rs/pull/261#issuecomment-834079037
> Does this make sense?
It makes sense, thanks.
In this case, I want to expose `parquet_format::FileMetaData`, but making it
public potentially also exposes everything below, because `FileMetaData` is a
root type that uses all the other types in one way or another.
* SchemaElement
* > Type
* > FieldRepetitionType
* > ConvertedType
* > LogicalType (this carries all the types)
* RowGroup (this is also useful to expose)
* > ColumnChunk
* >> ColumnMetaData
* >>> Encoding
* >>> CompressionCodec
* >>> Statistics (this is one of the structs that we wanted to expose by returning `FileMetaData`)
* > SortingColumn
* KeyValue
* ColumnOrder
* > TypeDefinedOrder
I paused when I got to something like this:
```rust
// lib.rs
pub mod format;
// format.rs
pub(crate) use parquet_format::*;
/// Re-export parquet_format as `parquet::format`.
///
/// Users are encouraged to use this, to avoid format mismatches.
pub use parquet_format::{
    BsonType, ColumnOrder, DateType, DecimalType, EnumType, FileMetaData,
    IntType, JsonType, ListType, MapType, NullType, RowGroup, SchemaElement,
    StringType, TimeType, TimeUnit, TimestampType, TypeDefinedOrder, UUIDType,
};
```
I was trying to avoid forcing a user to do this
(https://github.com/delta-io/kafka-delta-ingest/blob/587a7ca5429f985876d3f6c4492519341f141e97/Cargo.toml#L19)
just to get at the `parquet_format` structs (e.g. to use them in function
signatures). In that crate, the user needs the column stats so they can write
them to a metadata file.
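To make the goal concrete, here is a minimal single-file sketch of what the re-export buys a downstream user. Everything in it is a stand-in: `parquet_format_stub` plays the role of the thrift-generated crate, `format` plays the role of a `parquet::format` module, and the two structs are trimmed to one or two fields each, so this is not the real `parquet_format` API. `rows_per_group` is a hypothetical user function.

```rust
// Stand-in for the thrift-generated `parquet_format` crate (hypothetical, trimmed).
pub mod parquet_format_stub {
    #[derive(Debug, Clone, PartialEq)]
    pub struct RowGroup {
        pub num_rows: i64,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMetaData {
        pub num_rows: i64,
        pub row_groups: Vec<RowGroup>,
    }
}

// Stand-in for `parquet::format`: a plain re-export, so the paths name the
// very same types and no conversion is involved.
pub mod format {
    pub use super::parquet_format_stub::{FileMetaData, RowGroup};
}

// Hypothetical downstream function: its signature only mentions the
// re-exported path, so the user needs no direct dependency on the format crate.
pub fn rows_per_group(md: &format::FileMetaData) -> Vec<i64> {
    md.row_groups.iter().map(|rg| rg.num_rows).collect()
}
```

Because a re-export does not create a new type, a value built through either path is accepted by `rows_per_group`, and the version of the format types is always the one our crate was compiled against.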
We could return `parquet::file::metadata::ParquetMetaData`
(https://docs.rs/parquet/4.0.0/parquet/file/metadata/struct.ParquetMetaData.html#method.file_metadata),
in which case we wouldn't need to expose `parquet_format::FileMetaData`.
The downside is that we then need to convert from
`parquet_format::FileMetaData` to `parquet::file::metadata::ParquetMetaData`.
Maybe a solution there is to `impl From<parquet_format::FileMetaData> for
parquet::file::metadata::FileMetaData`, but that requires a bunch of other
conversions for almost all the structs in the laundry list above.
What options can you suggest going forward? I see:
1. We do nothing. The onus is on users to check that they're using the same
`parquet-format` version as our crate (if they want to use
`parquet_format::FileMetaData` or its descendants).
2. We implement `From` for a bunch of structs, and convert to
`parquet::file::metadata::{ParquetMetaData|FileMetaData}`, then expose the
latter.
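
For a sense of the shape of option 2, here is a rough sketch using stand-in types. `thrift_format` and `metadata` are hypothetical modules standing in for `parquet_format` and `parquet::file::metadata`; the structs are cut down to a few fields, and the choice to flatten the optional key/value list into a plain `Vec` is an assumption for illustration, not what the crate actually does.

```rust
// Stand-in for the thrift-generated types (hypothetical, heavily trimmed).
pub mod thrift_format {
    #[derive(Debug, Clone, PartialEq)]
    pub struct KeyValue {
        pub key: String,
        pub value: Option<String>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMetaData {
        pub version: i32,
        pub num_rows: i64,
        pub created_by: Option<String>,
        pub key_value_metadata: Option<Vec<KeyValue>>,
    }
}

// Stand-in for the crate-owned metadata types that would be exposed instead.
pub mod metadata {
    use super::thrift_format;

    #[derive(Debug, Clone, PartialEq)]
    pub struct KeyValue {
        pub key: String,
        pub value: Option<String>,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMetaData {
        pub version: i32,
        pub num_rows: i64,
        pub created_by: Option<String>,
        pub key_value_metadata: Vec<KeyValue>,
    }

    // Every nested struct needs its own conversion...
    impl From<thrift_format::KeyValue> for KeyValue {
        fn from(kv: thrift_format::KeyValue) -> Self {
            KeyValue { key: kv.key, value: kv.value }
        }
    }

    // ...so that the root conversion can delegate to them.
    impl From<thrift_format::FileMetaData> for FileMetaData {
        fn from(md: thrift_format::FileMetaData) -> Self {
            FileMetaData {
                version: md.version,
                num_rows: md.num_rows,
                created_by: md.created_by,
                // Assumption: a missing list is treated as an empty one.
                key_value_metadata: md
                    .key_value_metadata
                    .unwrap_or_default()
                    .into_iter()
                    .map(KeyValue::from)
                    .collect(),
            }
        }
    }
}

// Small usage example: build a thrift-side value and convert it.
pub fn example_conversion() -> metadata::FileMetaData {
    let raw = thrift_format::FileMetaData {
        version: 1,
        num_rows: 3,
        created_by: None,
        key_value_metadata: Some(vec![thrift_format::KeyValue {
            key: "writer".to_string(),
            value: Some("sketch".to_string()),
        }]),
    };
    raw.into()
}
```

Even with two structs the pattern is visible: each type in the list above would need a mirrored definition plus a `From` impl, which is where the cost of this option comes from.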