praveenc7 commented on PR #15350:
URL: https://github.com/apache/pinot/pull/15350#issuecomment-2844116537
> Is there a design doc for this? I’m thinking of solving this for the
> general case—whenever a newly-added column is accessed, we could inject a
> virtual column on the fly whose values are null (default null value, with the
> null vector set for all docs).

Thanks for looking into this, @Jackie-Jiang. We did explore the “on-the-fly
virtual column” idea, but ultimately chose to skip projecting columns until
every segment has fully loaded them. The two main reasons are:
1. Schema visibility on offline servers (immutable segments)
   - Offline servers do not immediately receive the latest table schema.
   - Without the correct data type, a virtual column cannot be injected
     safely. We would need a HelixRefreshMessage (or equivalent controller
     event) to guarantee that all offline hosts refresh their local schemas
     before they start producing default values.
2. Inconsistent defaults across mixed segment states
   - During a reload, a broker/server often contains a mix of segments: some
     already include the new column, while others do not.
   - If we unconditionally materialise a “default-null” virtual column,
     brokers/servers must reconcile two different views:
     - Segments that carry real data for the new column.
     - Segments that carry a synthetic, all-null representation.
   - During the broker reduce phase and server-side merging, we would still
     need logic to guarantee consistency. The same problem reappears at the
     broker layer if different servers refresh at different times.
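
To make the second point concrete, here is a minimal sketch of how a merged result can shift while a reload is in progress. It is not Pinot code: `MixedSegmentSketch`, `countNonNull`, `syntheticNulls`, and `merge` are hypothetical names, and a non-null count stands in for a query such as `COUNT(newCol)` with null handling enabled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in Pinot.
final class MixedSegmentSketch {

    // Per-segment partial result: number of non-null values in the new column.
    static long countNonNull(Integer[] columnValues) {
        return Arrays.stream(columnValues).filter(Objects::nonNull).count();
    }

    // Synthetic all-null column for a segment that has not loaded the new column yet.
    static Integer[] syntheticNulls(int numDocs) {
        return new Integer[numDocs]; // object arrays are initialised to null
    }

    // Merge step: sum the per-segment partial counts.
    static long merge(List<Integer[]> segments) {
        return segments.stream().mapToLong(MixedSegmentSketch::countNonNull).sum();
    }

    public static void main(String[] args) {
        Integer[] seg0 = {1, 2, 3};
        Integer[] seg1 = {4, 5, 6};

        // Both segments have loaded the new column.
        System.out.println(merge(List.of(seg0, seg1))); // prints 6

        // Mid-reload: seg1 is still represented by synthetic nulls.
        System.out.println(merge(List.of(seg0, syntheticNulls(3)))); // prints 3
    }
}
```

The same query returns 3 or 6 depending on how far the reload has progressed, and nothing in the merged result tells the caller which segments contributed synthetic values.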
Given these trade-offs, we concluded it is safer, and clearer for users, to
withhold the column entirely until the load is 100% complete. That contract
avoids incorrect or partially correct results and eliminates the need for extra
reconciliation logic in the query path.
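
As a rough illustration of that contract (a sketch, not the code in this PR), the check reduces to "every segment has loaded the column". `ColumnVisibilitySketch`, `isQueryable`, and the map from segment name to loaded columns are hypothetical stand-ins for whatever segment metadata the server actually consults.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Pinot.
final class ColumnVisibilitySketch {

    // A column is exposed to queries only when every segment has loaded it.
    // Note: with no segments at all, allMatch returns true (vacuously).
    static boolean isQueryable(String column, Map<String, Set<String>> loadedColumnsBySegment) {
        return loadedColumnsBySegment.values().stream()
                .allMatch(columns -> columns.contains(column));
    }

    public static void main(String[] args) {
        // Mid-reload: seg_1 has not picked up newCol yet, so the column is withheld.
        Map<String, Set<String>> midReload = Map.of(
                "seg_0", Set.of("id", "newCol"),
                "seg_1", Set.of("id"));
        System.out.println(isQueryable("newCol", midReload)); // prints false

        // Reload complete: every segment carries newCol, so it becomes visible.
        Map<String, Set<String>> reloaded = Map.of(
                "seg_0", Set.of("id", "newCol"),
                "seg_1", Set.of("id", "newCol"));
        System.out.println(isQueryable("newCol", reloaded)); // prints true
    }
}
```

Because the check is all-or-nothing, the query path never has to mix real and synthetic values for the same column.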
We can certainly document this behaviour more explicitly in a short design
note or page if that helps.
cc: @vvivekiyer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]