callmepandey opened a new issue, #502:
URL: https://github.com/apache/iceberg-cpp/issues/502

   ## Summary
   
   The `ProjectRecordBatch` function in `parquet_data_util.cc` supports 
`::arrow::ListArray` (32-bit offsets) but not `::arrow::LargeListArray` (64-bit 
offsets). A FIXME comment at line 151 marks this limitation.
   
   ## Problem
   
   Arrow's `LargeListArray` uses 64-bit offsets instead of 32-bit, so it can 
hold more than 2^31-1 total child elements. Currently, attempting to project 
a `LargeListArray` would fail with an error like:
   ```
   Expected list type, got: large_list<...>
   ```
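
To make the offset limit concrete, here is a minimal sketch of the bound that separates the two array types. `FitsInt32Offsets` is a hypothetical helper written for illustration; it is not part of Arrow or iceberg-cpp.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: reports whether a list array with this many total
// child elements can still be addressed with 32-bit offsets (ListArray).
// Anything larger needs 64-bit offsets (LargeListArray).
constexpr bool FitsInt32Offsets(int64_t total_child_elements) {
  return total_child_elements >= 0 &&
         total_child_elements <= std::numeric_limits<int32_t>::max();
}
```

A child count of 2^31-1 still fits in a `ListArray`; one more element does not.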
   
   ## Proposed Solution
   
   1. **Add templated `ProjectListArrayImpl<>` function** - Generic 
implementation that works with both `ListArray` and `LargeListArray`
   
   2. **Add `ProjectLargeListArray` wrapper** - Calls the template with 
`LargeListArray` and `LargeListType`
   
   3. **Update `ProjectNestedArray`** - Handle both `::arrow::Type::LIST` and 
`::arrow::Type::LARGE_LIST` in the `TypeId::kList` case
   
   4. **Add test case** - Verify `LargeListArray` projection works correctly
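
The first three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual iceberg-cpp code: `SketchListArray` is a simplified stand-in for Arrow's list arrays, the child projection is assumed to be a plain int32-to-int64 widening, and a `std::variant` stands in for the switch on `::arrow::Type`. Only the names `ProjectListArrayImpl`, `ProjectLargeListArray`, and `ProjectNestedArray` come from the proposal; their signatures here are hypothetical.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// Simplified stand-in for Arrow's list arrays (NOT the real Arrow classes).
// OffsetT = int32_t models ListArray; OffsetT = int64_t models LargeListArray.
template <typename OffsetT, typename ValueT>
struct SketchListArray {
  std::vector<OffsetT> offsets;  // one start offset per list, plus a final end
  std::vector<ValueT> values;    // flattened child elements
};

using SketchList = SketchListArray<int32_t, int32_t>;
using SketchLargeList = SketchListArray<int64_t, int32_t>;
using ProjectedList = SketchListArray<int32_t, int64_t>;
using ProjectedLargeList = SketchListArray<int64_t, int64_t>;

// Step 1: one generic implementation. The child values are projected (here,
// as an assumption, widened from int32 to int64) and the offsets are copied
// unchanged, so the offset width of the input is preserved.
template <typename OffsetT>
SketchListArray<OffsetT, int64_t> ProjectListArrayImpl(
    const SketchListArray<OffsetT, int32_t>& input) {
  SketchListArray<OffsetT, int64_t> output;
  output.offsets = input.offsets;
  output.values.assign(input.values.begin(), input.values.end());
  return output;
}

// Step 2: thin wrappers that pin the template to one offset width.
inline ProjectedList ProjectListArray(const SketchList& input) {
  return ProjectListArrayImpl<int32_t>(input);
}

inline ProjectedLargeList ProjectLargeListArray(const SketchLargeList& input) {
  return ProjectListArrayImpl<int64_t>(input);
}

// Step 3: dispatch on the concrete input type. The variant alternatives play
// the role of ::arrow::Type::LIST and ::arrow::Type::LARGE_LIST.
using InputArray = std::variant<SketchList, SketchLargeList>;
using OutputArray = std::variant<ProjectedList, ProjectedLargeList>;

inline OutputArray ProjectNestedArray(const InputArray& input) {
  if (const auto* list = std::get_if<SketchList>(&input)) {
    return ProjectListArray(*list);
  }
  return ProjectLargeListArray(std::get<SketchLargeList>(input));
}
```

Keeping the offset type as a template parameter means the projection logic is written once, and the output keeps the same offset width as the input rather than being converted between the two.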
   
   ## Files to Change
   
   - `src/iceberg/parquet/parquet_data_util.cc`
   - `src/iceberg/test/parquet_data_test.cc`
   
   ## References
   
   - FIXME comment: `src/iceberg/parquet/parquet_data_util.cc:151`
   - Arrow LargeListArray docs: https://arrow.apache.org/docs/cpp/api/array.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

