bobhan1 opened a new pull request, #61044:
URL: https://github.com/apache/doris/pull/61044

   ## Summary
   - Register `column_data_sizes` in 
`BackendPartitionedSchemaScanNode.BACKEND_TABLE` so that queries against 
`information_schema.column_data_sizes` are distributed to all BEs instead of a 
single, randomly selected BE.
   - Previously, `column_data_sizes` was treated as a regular `SchemaScanNode`, 
which creates only one scan range targeting a single BE. Because each BE 
collects data only for its local tablets, the query result was incomplete and 
varied from one execution to the next.
   
   ## Root Cause
   `column_data_sizes` was not registered in 
`BackendPartitionedSchemaScanNode.BACKEND_TABLE`. As a result, the FE planner 
used the base `SchemaScanNode`, which creates a single scan range sent to one 
randomly chosen BE. Each BE scans only its local tablets, so only that BE's 
subset of the data was returned.
   
   ## Fix
   Add `column_data_sizes` to the `BACKEND_TABLE` set. The table already has a 
`BACKEND_ID` column, whose lowercase form `backend_id` matches the existing 
entry in `BEACKEND_ID_COLUMN_SET`, so no additional changes are needed.
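
   The routing difference described above can be illustrated with a small, self-contained model. This is only a sketch and not the Doris planner code: the class `SchemaScanRoutingSketch`, the method `targetBackends`, and the BE IDs are hypothetical, and the set here merely stands in for `BackendPartitionedSchemaScanNode.BACKEND_TABLE`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

/** Illustrative model only; names and structure are assumptions, not Doris code. */
public class SchemaScanRoutingSketch {
    // Stand-in for BackendPartitionedSchemaScanNode.BACKEND_TABLE.
    static final Set<String> BACKEND_TABLE = new HashSet<>();

    /** Returns the BE IDs that would receive a scan range for the given schema table. */
    static List<Long> targetBackends(String table, List<Long> aliveBackends, Random rng) {
        if (aliveBackends.isEmpty()) {
            return Collections.emptyList();
        }
        if (BACKEND_TABLE.contains(table.toLowerCase(Locale.ROOT))) {
            // Registered table: one scan range per BE, so every BE reports its local tablets.
            return new ArrayList<>(aliveBackends);
        }
        // Unregistered table: a single scan range sent to one randomly chosen BE.
        Long chosen = aliveBackends.get(rng.nextInt(aliveBackends.size()));
        return Collections.singletonList(chosen);
    }

    public static void main(String[] args) {
        List<Long> backends = Arrays.asList(10001L, 10002L, 10003L); // hypothetical BE IDs
        Random rng = new Random();

        // Before the fix: the table is not registered, so only one BE is scanned.
        System.out.println(targetBackends("column_data_sizes", backends, rng).size()); // prints 1

        // After the fix: registering the table fans the scan out to all BEs.
        BACKEND_TABLE.add("column_data_sizes");
        System.out.println(targetBackends("column_data_sizes", backends, rng).size()); // prints 3
    }
}
```

   In this model the fix is the single `BACKEND_TABLE.add(...)` call, which mirrors the one-line nature of the change in this PR.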
   
   ## Test plan
   - [ ] Query `select * from information_schema.column_data_sizes where 
table_id=xxx` on a multi-BE cluster and verify that data from every BE is returned
   - [ ] Verify that the `BACKEND_ID` column shows different BE IDs in the results
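
   The second check can be made mechanical by comparing the `BACKEND_ID` values in the result against the BEs expected to report. The helper below is a hypothetical sketch: `BackendCoverageCheck`, `missingBackends`, and the sample IDs are assumptions for illustration, and fetching the rows from the cluster is left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for the test plan; not part of Doris. */
public class BackendCoverageCheck {
    /** Returns the expected BE IDs that never appear in the observed BACKEND_ID values. */
    static Set<Long> missingBackends(Collection<Long> expectedBackendIds,
                                     Collection<Long> observedBackendIds) {
        Set<Long> missing = new TreeSet<>(expectedBackendIds);
        missing.removeAll(observedBackendIds);
        return missing;
    }

    public static void main(String[] args) {
        List<Long> expected = Arrays.asList(10001L, 10002L, 10003L); // hypothetical BE IDs
        // BACKEND_ID values as they might appear in the query result, one per row.
        List<Long> observed = Arrays.asList(10001L, 10001L, 10002L);
        System.out.println(missingBackends(expected, observed)); // prints [10003]
    }
}
```

   An empty result means every expected BE contributed rows. A BE that hosts no tablets of the queried table would presumably contribute none, so the expected list should contain only BEs known to hold that table's tablets.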
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

