XuQianJin-Stars opened a new pull request, #2578:
URL: https://github.com/apache/fluss/pull/2578
### Purpose
Linked issue: close #2404
This PR adds NestedRow (Struct) type support for Lance lake storage,
building on the existing Array type support.
### Brief change log
- **LanceArrowUtils.java**:
  - Extended `toArrowField()` to handle `RowType` by recursively creating
    child fields for nested struct types
  - Extended `toArrowType()` to map Fluss `RowType` to Arrow
    `Struct.INSTANCE`
- **ArrowDataConverter.java**:
  - Added `copyStructVectorData()` method to recursively copy data from
    shaded `StructVector` to non-shaded `StructVector`
  - Updated `copyVectorData()` to detect and delegate to struct-specific
    copy logic
- **ShadedArrowBatchWriter.java**:
  - Extended `initFieldVector()` to properly allocate and initialize
    `StructVector` and its child vectors
- **FlinkLanceTieringTestBase.java**:
  - Added `createLogTableWithNestedRowType()` helper method for creating
    tables with nested Row columns
  - Added `createLogTableWithArrayOfRowType()` helper method for creating
    tables with `Array<Row>` columns
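
The recursive field mapping described for `toArrowField()` can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `SrcType`, `TargetField`, and `toField` are hypothetical stand-ins for the Fluss and Arrow classes, the type tags are plain strings, and the `element` child name for lists is an assumption made for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only: none of these classes are the real Fluss or Arrow types. */
final class NestedTypeMappingSketch {

    /** Hypothetical source-side type: a primitive, an array, or a row of named fields. */
    static final class SrcType {
        final String kind;            // "INT", "STRING", "ARRAY" or "ROW"
        final List<String> names;     // field names (ROW only)
        final List<SrcType> children; // element type for ARRAY, field types for ROW

        SrcType(String kind, List<String> names, List<SrcType> children) {
            this.kind = kind;
            this.names = names;
            this.children = children;
        }

        static SrcType primitive(String kind) {
            return new SrcType(kind, Collections.emptyList(), Collections.emptyList());
        }

        static SrcType array(SrcType element) {
            return new SrcType("ARRAY", Collections.emptyList(), Collections.singletonList(element));
        }

        static SrcType row(List<String> names, List<SrcType> types) {
            return new SrcType("ROW", names, types);
        }
    }

    /** Hypothetical target-side field: a name, a type tag, and child fields. */
    static final class TargetField {
        final String name;
        final String typeTag;         // "Int", "Utf8", "List" or "Struct"
        final List<TargetField> children;

        TargetField(String name, String typeTag, List<TargetField> children) {
            this.name = name;
            this.typeTag = typeTag;
            this.children = children;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? name + ":" + typeTag : name + ":" + typeTag + children;
        }
    }

    /** Maps a source type to a target field, recursing into ARRAY and ROW children. */
    static TargetField toField(String name, SrcType type) {
        List<TargetField> children = new ArrayList<>();
        switch (type.kind) {
            case "ROW":
                // One child field per row field; nested rows recurse naturally.
                for (int i = 0; i < type.children.size(); i++) {
                    children.add(toField(type.names.get(i), type.children.get(i)));
                }
                return new TargetField(name, "Struct", children);
            case "ARRAY":
                // A list has exactly one child describing its element type.
                children.add(toField("element", type.children.get(0)));
                return new TargetField(name, "List", children);
            case "INT":
                return new TargetField(name, "Int", children);
            case "STRING":
                return new TargetField(name, "Utf8", children);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type.kind);
        }
    }

    public static void main(String[] args) {
        SrcType address = SrcType.row(
                Arrays.asList("city", "zip"),
                Arrays.asList(SrcType.primitive("STRING"), SrcType.primitive("INT")));
        SrcType user = SrcType.row(
                Arrays.asList("id", "address", "tags"),
                Arrays.asList(SrcType.primitive("INT"), address,
                        SrcType.array(SrcType.primitive("STRING"))));
        System.out.println(toField("user", user));
    }
}
```

Because the row and array cases both call `toField` on their children, the same function covers a Row inside a Row, an Array of Rows, and a Row containing an Array without any special-casing, which matches the combinations exercised by the tests in this PR.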
### Tests
**Unit Tests (LanceArrowUtilsTest.java)**:
- `testToArrowSchemaWithNestedRowType`: Verifies simple nested Row type
conversion to Arrow Struct
- `testToArrowSchemaWithDeeplyNestedRowType`: Verifies deeply nested Row
type conversion
- `testToArrowSchemaWithArrayOfRowType`: Verifies `Array<Row>` type conversion
- `testToArrowSchemaWithRowContainingArray`: Verifies conversion of a Row
containing an Array field
**Unit Tests (LanceTieringTest.java)**:
- `testTieringWriteTableWithNestedRowType`: Verifies writing and reading
tables with nested Row type
**Integration Tests (LanceTieringITCase.java)**:
- `testTieringWithNestedRowType`: End-to-end test for tiering with nested
Row type
- `testTieringWithArrayOfRowType`: End-to-end test for tiering with
`Array<Row>` type
### API and Format
No API changes. This change extends the internal Lance lake storage format
to support Struct types, which is backward compatible.
### Documentation
No documentation changes needed. This is an internal enhancement to support
additional data types in Lance lake storage.