HaoYang670 commented on issue #1799:
URL: https://github.com/apache/arrow-rs/issues/1799#issuecomment-1147434811
The reason I prefer removing `ArrayData::data_type` is that keeping it
makes it possible for `ArrayData::data_type` and `ArrayData::layout` to be
inconsistent with each other. That could increase the workload of
`ArrayData::validate` (lots of pattern matching ...).
> You still need the DataType to roundtrip the actual type, e.g. int32 vs
> uint32, the Field for nested types, etc...
My first thought was that we could inject `dataType` into
`ArrayDataLayout`. For example:
```rust
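pub enum ArrayDataLayout {
    // ...
    // Struct-like variants; `type` is a reserved word, so the field is `data_type`.
    Primitive { data_type: PrimitiveType, values: Buffer },
    Binary { is_large: bool, values: Buffer, /* ... */ },
    // ...
}

pub enum PrimitiveType {
    Int32,
    Int64,
    // ...
}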
```
But this cannot support nested types well.
My second thought is that we could refactor `DataType` like this:
```rust
enum DataType {
    // Tuple variants wrap a per-category type enum.
    Primitive(PrimitiveType),
    List(ListType),
    // ...
}

enum PrimitiveType {
    Int32,
    Int64,
    // ...
}

enum ListType {
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
    LargeList(Box<Field>),
}
```
I guess this could decrease the workload of `ArrayData::validate`.
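To illustrate what I mean, here is a rough sketch of how a consistency check might look once `DataType` is grouped by category. Everything here is a simplified stand-in, not the actual arrow-rs definitions: `Field` is reduced to a name and a type, and `LayoutKind` and `is_consistent` are hypothetical names that I made up for this example.

```rust
// Simplified stand-ins for illustration only; not the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PrimitiveType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ListType {
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
    LargeList(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Primitive(PrimitiveType),
    List(ListType),
}

/// Hypothetical: only the shape of a layout, without any buffers.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LayoutKind {
    Primitive,
    List,
}

/// Hypothetical check: one match arm per category instead of one per
/// concrete type (Int32, Int64, List, LargeList, ...).
pub fn is_consistent(data_type: &DataType, layout: LayoutKind) -> bool {
    matches!(
        (data_type, layout),
        (DataType::Primitive(_), LayoutKind::Primitive)
            | (DataType::List(_), LayoutKind::List)
    )
}
```

With this grouping, adding a new primitive or list variant would not require touching the category-level match, which is where I would expect the savings in `ArrayData::validate` to come from.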