zclllyybb commented on issue #64006:
URL: https://github.com/apache/doris/issues/64006#issuecomment-4599813365
Initial triage: this looks like a real Nereids/external-view schema-drift
bug on current master, not just a reporting artifact.
I re-read the issue and checked the referenced upstream tree
(`aa9162840f1`). The reported failure path matches the code:
- For HMS external views, `BindRelation` reads
`HMSExternalTable.getViewText()`, analyzes that SQL in the external
catalog/database context, and wraps the analyzed plan in `LogicalView(new
ExternalView(table, ddlSql), ...)`.
- `ExternalView.getFullSchema()` delegates to the external view table's own
`getFullSchema()`, i.e. the schema cached/loaded for the HMS view object.
- `REFRESH TABLE <base_table>` only invalidates the cache of the selected
external table, through
`RefreshManager.refreshTableInternal(...)` -> `invalidateTableCache(table)`. It
does not refresh the separate view table object.
- `LogicalView.computeOutput()` then iterates over `child().getOutput()` but
indexes `view.getFullSchema().get(i)` for every child slot. If the view SQL is
re-analyzed after the base table refresh and `SELECT *` now expands to 4 child
slots while the view object's stored schema still has 3 columns, the call
`get(3)` on the 3-element schema throws `Index 3 out of bounds for length 3`.
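The mismatch described in the bullets above can be modeled without any Doris classes. The following is a minimal sketch, assuming plain string lists stand in for the child slots and the stored view columns; `SchemaDriftRepro` and `lookupAll` are hypothetical names used only for illustration and are not part of the Doris code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaDriftRepro {
    // Hypothetical stand-in for the loop shape described above:
    // one stored view column is looked up for every child slot.
    static List<String> lookupAll(List<String> childOutput, List<String> storedViewSchema) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            // Throws once i reaches storedViewSchema.size().
            result.add(storedViewSchema.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // After the base table refresh, SELECT * expands to 4 slots (assumed column names).
        List<String> childOutput = new ArrayList<>(Arrays.asList("id", "name", "ts", "new_col"));
        // The view object still carries the 3 columns it was loaded with.
        List<String> storedViewSchema = new ArrayList<>(Arrays.asList("id", "name", "ts"));
        try {
            lookupAll(childOutput, storedViewSchema);
            System.out.println("no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

With equal-length lists the loop completes normally, which is why the problem only appears after the base table and the view object have drifted apart.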
So the issue's proposed diagnosis is consistent with the current code chain.
The existing `CollectionUtils.isEmpty(view.getFullSchema())` guard only
protects null/empty schemas; it does not protect a non-empty but shorter stored
view schema.
The suggested local fix is reasonable as a minimal crash fix:
```java
List<Column> fullSchema = view.getFullSchema();
if (CollectionUtils.isEmpty(fullSchema) || i >= fullSchema.size()) {
    qualified = originSlot.withQualifier(fullQualifiers);
} else {
    qualified = originSlot.withOneLevelTableAndColumnAndQualifier(view,
            fullSchema.get(i), fullQualifiers);
}
```
```
One semantic point should be made explicit before merging: if Doris intends
HMS external views to follow the re-analyzed view SQL, then falling back to
`withQualifier()` for newly expanded slots is consistent with the current
analyzer behavior and avoids losing the new column. If Doris intends the stored
HMS view schema to be the authoritative output contract until `REFRESH TABLE
<view>`, then the fix should instead reconcile/cap the child output to the
stored view schema rather than only guarding the index. Either way, the current
uncaught `IndexOutOfBoundsException` is a bug.
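To make the two candidate contracts concrete, here is a small sketch that contrasts them on plain string lists. It is an illustration under stated assumptions, not Doris code: `ViewOutputContracts`, `followReanalyzedSql`, and `capToStoredSchema` are hypothetical names, and the `"view."` prefix merely marks a slot that was bound to a stored view column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ViewOutputContracts {
    // Contract A: follow the re-analyzed view SQL. Slots beyond the stored
    // schema are kept and fall back to the child slot as-is (analogous to the
    // withQualifier() fallback in the suggested fix).
    static List<String> followReanalyzedSql(List<String> childOutput, List<String> storedSchema) {
        List<String> output = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            if (i < storedSchema.size()) {
                output.add("view." + storedSchema.get(i));
            } else {
                output.add(childOutput.get(i));
            }
        }
        return output;
    }

    // Contract B: the stored view schema is authoritative. The output is
    // capped to the stored column count until the view itself is refreshed.
    static List<String> capToStoredSchema(List<String> childOutput, List<String> storedSchema) {
        int limit = Math.min(childOutput.size(), storedSchema.size());
        List<String> output = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            output.add("view." + storedSchema.get(i));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> childOutput = Arrays.asList("id", "name", "ts", "new_col");
        List<String> storedSchema = Arrays.asList("id", "name", "ts");
        System.out.println(followReanalyzedSql(childOutput, storedSchema));
        // prints [view.id, view.name, view.ts, new_col]
        System.out.println(capToStoredSchema(childOutput, storedSchema));
        // prints [view.id, view.name, view.ts]
    }
}
```

Neither variant throws on the drifted input. They differ only in whether the new base column becomes visible through the view before `REFRESH TABLE <view>`.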
Recommended next steps:
1. Add a regression test for an HMS external view whose stored view schema
has fewer columns than the re-analyzed view body output. The important
assertion is that `LogicalView.computeOutput()` does not throw when
`childOutput.size() > view.getFullSchema().size()`.
2. Reuse `view.getFullSchema()` in a local variable inside `computeOutput()`
so the code does not repeatedly fetch a potentially cache-backed schema during
the loop.
3. Confirm the intended result-column contract for `SELECT *` external views
after base-table schema drift: expose the newly added base column after base
refresh, or keep the old view schema until the view itself is refreshed.
4. Keep the documented workaround for affected users: run `REFRESH TABLE
<view>` or recreate the view after changing the underlying Hive table schema.
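Steps 1 and 2 can be prototyped outside the Doris test framework before writing the real regression case. The sketch below is a plain-Java model, assuming a `Supplier` stands in for a cache-backed `getFullSchema()`; `ComputeOutputRegressionSketch` and its `computeOutput` helper are hypothetical and only mirror the shape of the guarded loop, not the actual `LogicalView` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ComputeOutputRegressionSketch {
    // Hypothetical model of the guarded loop: the schema is fetched once into
    // a local variable, and out-of-range indexes fall back to the child slot.
    static List<String> computeOutput(List<String> childOutput,
                                      Supplier<List<String>> fullSchemaSupplier) {
        List<String> fullSchema = fullSchemaSupplier.get();
        List<String> output = new ArrayList<>(childOutput.size());
        for (int i = 0; i < childOutput.size(); i++) {
            if (fullSchema == null || fullSchema.isEmpty() || i >= fullSchema.size()) {
                output.add(childOutput.get(i));
            } else {
                output.add(fullSchema.get(i));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        // Stale 3-column schema; the counter records how often it is fetched.
        Supplier<List<String>> staleSchema = () -> {
            fetches.incrementAndGet();
            return Arrays.asList("id", "name", "ts");
        };
        List<String> output =
                computeOutput(Arrays.asList("id", "name", "ts", "new_col"), staleSchema);
        // Key assertions: no exception, all child slots kept, schema fetched once.
        System.out.println(output.size() + " slots, " + fetches.get() + " schema fetch(es)");
        // prints 4 slots, 1 schema fetch(es)
    }
}
```

The real regression test would replace the string lists with an HMS external view fixture. Whether the fourth slot should be kept depends on the contract chosen in step 3.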
The issue currently has no labels; this should probably be routed to the
Nereids + external catalog/HMS view owners.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]