Karakatiza666 opened a new pull request, #438:
URL: https://github.com/apache/arrow-js/pull/438

   This PR was co-authored with [Claude Code](https://claude.com/claude-code).
   
   ---
   
   ## Summary
   
   This PR builds on the unresolved https://github.com/apache/arrow-js/pull/299 
to implement full support for the `LargeList` data type in the Apache Arrow 
JavaScript bindings. `LargeList` uses 64-bit offsets (`BigInt64Array`) instead of 
32-bit offsets, so a list's child array can hold more than 2^31 - 1 elements.
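   
   As a rough illustration of the layout (plain typed arrays, not the arrow-js 
API), list `i` spans child indices `offsets[i]` to `offsets[i + 1]`, and with 
64-bit offsets those bounds are `bigint` values. The `listAt` helper is 
hypothetical and exists only for this sketch:
   
   ```typescript
   // Hypothetical helper, not part of arrow-js: reads list `i` out of a
   // flat child array using 64-bit offsets.
   function listAt(offsets: BigInt64Array, child: Int32Array, i: number): number[] {
     // Narrow to number only where we index into the child array.
     const begin = Number(offsets[i]);
     const end = Number(offsets[i + 1]);
     return Array.from(child.subarray(begin, end));
   }

   // Three lists laid out back to back: [1, 2], [], [3, 4, 5]
   const offsets = BigInt64Array.from([0, 2, 2, 5].map((n) => BigInt(n)));
   const child = Int32Array.from([1, 2, 3, 4, 5]);
   const middle = listAt(offsets, child, 1); // the empty list
   const last = listAt(offsets, child, 2);
   ```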
   
   Where possible, code size was reduced by factoring out helpers shared by 
`List` and `LargeList`.
   
   ## Related Issues
   
   Closes #70
   
   ## Implementation Details
   
   ### Core Type System
   
   - Added `Type.LargeList = 21` enum value
   - Implemented `LargeList<T>` class with `BigInt64Array` offset support
   - Added `DataType.isLargeList()` type guard
   - Added `LargeListDataProps` interface and `MakeDataVisitor.visitLargeList` 
(widens 32-bit offsets via `toBigInt64Array`)
   - Mapped `LargeList` and `LargeListBuilder` into `TypeToDataType`, 
`TypeToBuilder`, and `DataTypeToBuilder` in `interfaces.ts`
   
   ### Visitor Pattern Implementation
   
   Wired `visitLargeList()` across every visitor, factoring shared helpers 
where the offset width was the only difference:
   - `GetVisitor` / `SetVisitor`: merged `getList` / `setList` into single 
helpers using `bigIntToNumber` at the offset boundary — one implementation 
covers both List and LargeList
   - `IteratorVisitor`, `IndexOfVisitor`: register `visitLargeList` (the 
generic implementations are offset-width agnostic)
   - `TypeComparator`: widened `compareList` to `List | LargeList` (structural 
comparison only)
   - `VectorAssembler`: generalized `assembleListVector` to coerce begin/end 
via `bigIntToNumber`; registers `visitLargeList`
   - `VectorLoader`: `visitLargeList` mirrors `visitList`; base `readOffsets` 
already honors `OffsetArrayType` (`BigInt64Array`)
   - `JSONVectorAssembler`: emits `OFFSET` via `bigNumsToStrings`, matching the 
`LargeUtf8` / `LargeBinary` pattern
   - `TypeAssembler` / `JSONTypeAssembler`: `FlatBuffers` + JSON type 
serialization
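   
   The merged `get` path might look roughly like the sketch below. It is not 
the PR's code: `toIndex` is a local stand-in for `bigIntToNumber`, and 
`getListSlice` and `ListOffsets` are hypothetical names. The point is only that 
coercing at the offset boundary lets one body serve both offset widths.
   
   ```typescript
   type ListOffsets = Int32Array | BigInt64Array;

   // Stand-in for `bigIntToNumber`: refuses values that cannot be represented
   // exactly as a JS number instead of silently losing precision.
   function toIndex(value: number | bigint): number {
     if (typeof value === "number") return value;
     if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
       throw new RangeError(`${value} is not a safe integer`);
     }
     return Number(value);
   }

   // One implementation for both 32-bit and 64-bit offsets.
   function getListSlice<T>(listOffsets: ListOffsets, values: readonly T[], index: number): T[] {
     const begin = toIndex(listOffsets[index]);
     const end = toIndex(listOffsets[index + 1]);
     return values.slice(begin, end);
   }

   const letters = ["a", "b", "c", "d"];
   const small = Int32Array.from([0, 1, 4]);
   const large = BigInt64Array.from([0, 1, 4].map((n) => BigInt(n)));
   const fromSmall = getListSlice(small, letters, 1);
   const fromLarge = getListSlice(large, letters, 1);
   ```
   
   Both calls yield the same slice, which is why a separate 64-bit code path is 
unnecessary for reads.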
   
   ### IPC Support
   
   - `ipc/metadata/message.ts`: `decodeFieldType` handles `Type.LargeList`
   - Read and write paths both round-trip via the assembler/loader 
registrations above
   
   ### Latent Bug Fix
   
   - `util/buffer.ts`: `rebaseValueOffsets` now coerces its `number` offset to 
`BigInt` when the offsets array is a `BigInt64Array`. Previously, a non-zero 
offset on a 64-bit offsets array would throw a `TypeError` on 
`bigint += number`. The fix is required for `LargeList` IPC writes on sliced 
data, and it also resolves the same latent issue for `LargeUtf8` / `LargeBinary`.
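   
   The failure mode can be reproduced without arrow-js. `rebaseOffsets` below 
is a simplified, hypothetical stand-in for `rebaseValueOffsets` (it is not the 
actual source); it shows the delta being widened once when the offsets are 
64-bit:
   
   ```typescript
   type AnyOffsets = Int32Array | BigInt64Array;

   // Hypothetical stand-in: shifts every offset by `delta` in place.
   function rebaseOffsets(delta: number, target: AnyOffsets): AnyOffsets {
     if (target instanceof BigInt64Array) {
       // Mixing bigint and number in arithmetic throws a TypeError,
       // so convert the delta to bigint up front.
       const wide = BigInt(delta);
       for (let i = 0; i < target.length; i++) target[i] += wide;
     } else {
       for (let i = 0; i < target.length; i++) target[i] += delta;
     }
     return target;
   }

   // Offsets of a sliced array start at 3; rebase them to start at zero.
   const sliced64 = BigInt64Array.from([3, 5, 9].map((n) => BigInt(n)));
   rebaseOffsets(-3, sliced64);
   const sliced32 = Int32Array.from([3, 5, 9]);
   rebaseOffsets(-3, sliced32);
   ```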
   
   ### Builders
   
   - New `src/builder/largelist.ts` (`LargeListBuilder`), mirroring 
`ListBuilder` with `BigInt()` for offset accumulation and `Number()` coercion 
when passing the start index to `child.set`
   - Widened `VariableWidthBuilder` bound to include `LargeList` in `builder.ts`
   - `GetBuilderCtor.visitLargeList` returns `LargeListBuilder`
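   
   The offset bookkeeping can be pictured with a toy accumulator. 
`ToyLargeListOffsets` is hypothetical and far simpler than `LargeListBuilder`; 
it only shows `BigInt()` used when accumulating offsets and `Number()` used when 
handing a start index to code that expects a `number`.
   
   ```typescript
   class ToyLargeListOffsets {
     private pending: bigint[] = [BigInt(0)];

     // Records a list of `length` child values and returns the numeric
     // start index a child writer would need.
     append(length: number): number {
       const start = this.pending[this.pending.length - 1];
       this.pending.push(start + BigInt(length));
       return Number(start);
     }

     // Materializes the accumulated offsets as a 64-bit typed array.
     finish(): BigInt64Array {
       return BigInt64Array.from(this.pending);
     }
   }

   const acc = new ToyLargeListOffsets();
   const starts = [acc.append(2), acc.append(0), acc.append(3)];
   const built = acc.finish();
   ```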
   
   ### Testing
   
   - `test/generate-test-data.ts`:
     - Factored a shared `generateListLike` helper used by both `generateList` 
(`Int32`) and `generateLargeList` (`BigInt64`)
     - Added `createVariableWidthOffsets64`, which truncates `min` / `max` on 
entry so that a fractional stride from `childVec.length / (length - nullCount)` 
does not throw a `RangeError` in `BigInt()`
   - `test/unit/generated-data-tests.ts`: `LargeList` added to the matrix
   - `test/unit/builders/builder-tests.ts`: `LargeListBuilder` entry added 
alongside `ListBuilder` / `FixedSizeListBuilder` / `MapBuilder`
   - `test/unit/visitor-tests.ts`: `visitLargeList` added to `BasicVisitor` / 
`FeatureVisitor` and to both describe matrices
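   
   The `RangeError` mentioned for `createVariableWidthOffsets64` comes from 
`BigInt()` rejecting non-integer numbers. A minimal sketch, with a hypothetical 
`toOffset` helper that truncates before converting:
   
   ```typescript
   // BigInt() only accepts integral numbers, so truncate first.
   function toOffset(value: number): bigint {
     return BigInt(Math.trunc(value));
   }

   // Converting a fractional stride directly fails.
   let threwRangeError = false;
   try {
     BigInt(2.5);
   } catch (err) {
     threwRangeError = err instanceof RangeError;
   }

   const safe = toOffset(2.5);
   ```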
   
   ### Public API
   
   - Exported `LargeList` and `LargeListBuilder` from `src/Arrow.ts` and 
`src/Arrow.dom.ts`
   
   ## Test Plan
   
   All existing tests continue to pass, and the `LargeList` path is exercised 
by:
   - ✅ Generated-data matrix: `get` / `set` / `iterator` / `indexOf` / `slice` 
/ `concat` / IPC round-trip
   - ✅ Builder matrix: no-nulls / with-nulls / length=518
   - ✅ Visitor dispatch (`BasicVisitor` + `FeatureVisitor`)
   - ✅ IPC stream round-trip (16 IPC suites green, including JSON form via 
`JSONVectorAssembler` / `JSONVectorLoader`)
   
   All tests across 45 suites pass.
   
   The tests were run with:
   ```bash
   npx jest --config jestconfigs/jest.src.config.js
   ```
   
   ## Checklist
   
   - [x] Implementation follows existing code patterns
   - [x] All visitor methods implemented (`get` / `set` / `iterator` / 
`indexOf` / `TypeComparator` / `VectorAssembler` / `VectorLoader` / 
`JSONVectorAssembler` / `TypeAssembler` / `JSONTypeAssembler`)
   - [x] IPC serialization/deserialization support added (binary + JSON form)
   - [x] `LargeListBuilder` added and wired through `GetBuilderCtor` + 
`interfaces.ts`
   - [x] Latent `rebaseValueOffsets` bigint bug fixed
   - [x] Comprehensive tests added using existing test framework
   - [x] All tests passing
   - [x] Public API exports added
   - [x] No breaking changes
   
   ## Notes
   
   - This implementation provides full `LargeList` support: IPC read/write 
(binary + JSON form), in-memory access and mutation, type comparison, and 
construction via `LargeListBuilder` — parallel to the existing List type, just 
with 64-bit offsets.
   - Storage and wire format are genuinely 64-bit (`BigInt64Array` end-to-end). 
The only narrowing happens at JS-runtime boundaries where `Data.slice` accepts 
a `number`, identical to the `LargeUtf8` / `LargeBinary` policy upstream.
   - Helpers were merged across `List` / `LargeList` only where the offset width 
was the sole difference and `bigIntToNumber` coercion at the boundary kept the 
merged code readable. `LargeListBuilder` stays separate because the `BigInt()` / 
`Number()` coercions in `_flushPending` would obscure a merged version.

