zeroshade commented on code in PR #746:
URL: https://github.com/apache/arrow-go/pull/746#discussion_r3081235750
##########
parquet/pqarrow/file_writer.go:
##########
@@ -33,6 +33,48 @@ import (
"golang.org/x/xerrors"
)
+// normalizeFieldForParquet recursively normalizes an Arrow field so that its
+// type matches the Parquet column structure that fieldToNode would produce.
+// Specifically, list element field names are set to "element" because
+// ListOfWithName (used by fieldToNode) always names the Parquet element group
+// "element", regardless of the original Arrow element field name.
+func normalizeFieldForParquet(f arrow.Field) arrow.Field {
Review Comment:
Run-end-encoded arrays and list-views are not handled by `fieldToNode` as
group nodes; they fall through to `getParquetType`, which returns
`ErrNotImplemented` for them. They therefore can't produce Parquet group
structures whose element names would need normalizing.
As for `NestedType`: it gives us `Fields()`/`NumFields()` for recursion but no
generic reconstruction path. We'd still need type-specific cases to rebuild
each type (e.g. `ListOfField` vs `FixedSizeListOfField` vs `StructOf`). Given
that the normalization applies only to the handful of types `fieldToNode`
actually recurses into, I think the explicit switch is clearer about intent:
it documents exactly which types go through `ListOfWithName` and get the
rename. Using `NestedType` for detection while still doing type-specific
reconstruction would add indirection without reducing the case count.
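
To make the trade-off concrete, here is a minimal, self-contained sketch of
the explicit-switch shape. It deliberately does not use the real `arrow`
package: `Field`, `DataType`, `List`, `FixedSizeList`, `Struct`, `Int32`, and
`normalize` are hypothetical stand-ins invented for illustration, and the
actual `normalizeFieldForParquet` in the PR may differ in its details.

```go
package main

import (
	"fmt"
	"strings"
)

// DataType and Field are toy stand-ins for arrow.DataType and arrow.Field.
type DataType interface{ String() string }

type Field struct {
	Name string
	Type DataType
}

type Int32 struct{}
type List struct{ Elem Field }
type FixedSizeList struct {
	N    int
	Elem Field
}
type Struct struct{ Fields []Field }

func (Int32) String() string { return "int32" }
func (l List) String() string {
	return "list<" + l.Elem.Name + ": " + l.Elem.Type.String() + ">"
}
func (l FixedSizeList) String() string {
	return fmt.Sprintf("fixed_size_list<%s: %s>[%d]", l.Elem.Name, l.Elem.Type.String(), l.N)
}
func (s Struct) String() string {
	parts := make([]string, len(s.Fields))
	for i, f := range s.Fields {
		parts[i] = f.Name + ": " + f.Type.String()
	}
	return "struct<" + strings.Join(parts, ", ") + ">"
}

// normalize recursively renames list element fields to "element".
// Each nested kind needs its own case because each one is rebuilt with a
// different constructor; anything not listed is returned unchanged.
func normalize(f Field) Field {
	switch t := f.Type.(type) {
	case List:
		e := normalize(t.Elem)
		e.Name = "element"
		f.Type = List{Elem: e}
	case FixedSizeList:
		e := normalize(t.Elem)
		e.Name = "element"
		f.Type = FixedSizeList{N: t.N, Elem: e}
	case Struct:
		// Struct children keep their names; only their types are normalized.
		fields := make([]Field, len(t.Fields))
		for i, c := range t.Fields {
			fields[i] = normalize(c)
		}
		f.Type = Struct{Fields: fields}
	}
	return f
}

func main() {
	col := Field{Name: "col", Type: Struct{Fields: []Field{
		{Name: "xs", Type: List{Elem: Field{Name: "item", Type: Int32{}}}},
	}}}
	fmt.Println(normalize(col).Type) // struct<xs: list<element: int32>>
}
```

Even in this toy version, a generic "has children" interface would only help
with finding the children; the rebuild step still needs one branch per type.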