Re: [PR] [MSE] Introduce Apache Arrow as a columnar block format for MseBlock.Data [pinot]

via GitHub Mon, 11 May 2026 00:55:47 -0700


gortiz commented on PR #18262:
URL: https://github.com/apache/pinot/pull/18262#issuecomment-4418591435


   Thanks for the contribution Praveen — the four-class layering is clean and 
the test coverage for null/non-null round-trips is thorough. Before this merges 
I'd like to discuss two things: the overall merge strategy, and the ownership 
model for off-heap memory.
   
   **Merge strategy**
   
   Small, reviewable PRs are the right way to develop this incrementally, and 
we're happy to review them that way. However, we'd prefer not to merge partial 
Arrow infrastructure into `apache/pinot` master until the design is complete 
and proven end-to-end. Dead code in master (classes that exist but nothing 
uses) creates maintenance burden and makes it harder to reason about the 
codebase.
   
   A better workflow would be:
   
   - Create a long-lived integration branch — either in `apache/pinot` or your 
own fork
   - Stack your incremental PRs against that branch so each one is reviewable 
in isolation
   - Open a single draft PR from that branch into `apache/pinot` master, so we 
have full visibility into the end state
   
   This way we get the review ergonomics of small PRs without shipping partial 
infrastructure to master prematurely.
   
   **Ownership model: off-heap memory requires explicit release**
   
   With heap-backed blocks (`RowHeapDataBlock`, `SerializedDataBlock`) the GC 
handles reclamation automatically. Arrow blocks are different — the off-heap 
memory is invisible to the GC and must be explicitly freed. This means the 
design needs a clear answer to: who releases each block, and when?
   
   **Why a simple single-owner model is insufficient**
   
   Most of the operator pipeline is fine with single-owner linear transfer — a 
block is consumed, rows extracted, and then released. The case that breaks this 
is the **local broadcast / singleton exchange**: the same block is enqueued 
into multiple local mailbox queues without copying. The first consumer to 
release the block would free memory that other consumers are still reading.
   
   Note: for hashing distribution this is not a problem — new disjoint blocks 
are created per bucket, each with a single owner.
   
   **Proposed ownership model: retain / release**
   
   The right primitive here is reference counting:
   
   - Initial refcount = 1 (the producer holds the first reference)
   - `retain()` — increments refcount; throws if already 0 (use-after-free 
guard)
   - `release()` — decrements; frees the underlying `VectorSchemaRoot` only 
when it reaches 0
   
   A few notes on API design:
   
   - The primary API should be `retain()` / `release()`, not `close()`. 
`close()` implies "destroy unconditionally", which conflicts with ref counting 
semantics.
   - I'd recommend **not** implementing `AutoCloseable` on `ArrowBlock`. 
`try-with-resources` calls `close()` unconditionally regardless of refcount, 
and callers that share a block would silently release it early. Making the 
release explicit forces the caller to reason about ownership rather than 
relying on scope.
   - `VectorSchemaRoot` in Arrow Java doesn't have block-level ref counting 
built in; you'd implement it with an `AtomicInteger` on `ArrowBlock`, similar 
to Netty's `ReferenceCounted` pattern.
   
   **Mailbox implications**
   
   For the local mailbox send path, the sender would call `block.retain()` once 
per local recipient before enqueuing, then `release()` its own producer 
reference after all enqueues complete. The remote path serializes to bytes (a 
copy), so no retain is needed — the block can be released after serialization. 
Each consumer calls `release()` when done.
   
   This is worth designing explicitly before the operator and transport PRs are 
written, so the contract is clear from the start.
   
   Happy to discuss — wanted to raise this while the API surface is still small.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [MSE] Introduce Apache Arrow as a columnar block format for MseBlock.Data [pinot]

Reply via email to