morningman commented on PR #64035:
URL: https://github.com/apache/doris/pull/64035#issuecomment-4610649668
## 1. Avoid warming *latest* schema/partitions for tables referenced only
via non-latest relations
For a table referenced only by non-latest relations (time travel / branch /
tag, i.e. `hasNonLatestRelation == true`):
- The latest-snapshot preload is correctly skipped.
- But `getBaseSchema()` (latest schema) and
`initSelectedPartitions(getSnapshot(table))` still run. Here
`getSnapshot(table)` returns `Optional.empty()`, and for Iceberg
`IcebergUtils.getSnapshotCacheValue(empty, table)` falls through to
`getLatestSnapshotCacheValue(...)` — so it warms the **latest** version's
schema/partitions, while the query actually targets a specific historical
snapshot/branch/tag.
This is **not a correctness bug**: the `initSelectedPartitions` return value
is discarded in preload (it only warms caches), and the real `LogicalFileScan`
recomputes with the correct snapshot. But it is **wasted work on exactly the
slow metadata path this PR is trying to optimize**, adding needless latency to
time-travel-style mixed queries. Consider gating the partition preload (and
possibly the whole preload) on the latest-only condition:
```java
boolean preloadPartition = table.supportInternalPartitionPruned() &&
preloadLatestSnapshot;
```
## 2. Mark the regression suite `nonConcurrent`, and reconsider the
`Suite.groovy` relaxation
`test_jdbc_preload_profile.groovy` declares only the `external` group. Per
`RegressionTest.getGroupExecType`, that maps to **NORMAL (concurrent)**
execution, yet the test sets a **global** debug point via
`enableDebugPointForAllFEs("PluginDrivenExternalTable.initSchema.sleep", ...)`.
While that debug point is active, any concurrently running external suite that
triggers a JDBC `initSchema` cache miss is forced to sleep 10s — a recipe for
**cross-test flakiness / timeouts**.
The cleaner fix is to mark the suite `nonConcurrent`:
```groovy
suite("test_jdbc_preload_profile", "external,nonConcurrent") {
```
`getGroupExecType` checks `nonConcurrent` first and returns `SINGLE`, so the
suite runs isolated on a single-threaded executor **and** satisfies the
*original* debug-point guard (`SINGLE` was already allowed). In other words,
once `nonConcurrent` is added, this PR's change to `Suite.groovy` is **no
longer necessary**.
The `Suite.groovy` change (allowing debug points in the `external` group)
**permanently weakens a guardrail**: future authors could then use a global
debug point inside a *concurrently executed* external suite and silently
introduce cross-test interference. Prefer the `nonConcurrent` approach and
consider reverting the framework change. **Why it matters**: a debug point is
process-global mutable state; suites that use one must run in isolation,
otherwise you get hard-to-diagnose CI flakiness.
## 3. Clarify profile time attribution to avoid double counting
`NEREIDS_LOCK_TABLE_TIME = nereidsLockTableFinishTime − parseSqlFinishTime`,
and `nereidsLockTableFinishTime` is set *after* `lock()`. Since preload runs
inside that window, the preload duration is counted **both** in the new
`Nereids Preload External Metadata Time` counter **and** in the parent `Nereids
Lock Table Time`. Both are rendered at the same indentation level (siblings),
so a user summing the sub-stages will double-count preload. (Note: `collect()`
time was already inside this window pre-PR; this change just adds preload to
it.)
Consider either documenting that "Lock Table Time" already includes Collect
+ Preload + Lock (so the sub-stages shouldn't be naively summed), or narrowing
the `Nereids Lock Table Time` computation to cover only the actual locking. Low
priority — observability only.
## 4. (Minor) Use `TimeUtils.getElapsedTimeMs` instead of manual subtraction
`NereidsPlanner` and `StatementContext` compute elapsed time as
`TimeUtils.getStartTimeMs() - startTime` in several places. `TimeUtils` already
provides `getElapsedTimeMs(long startTime)`, which reads more clearly.
(`getStartTimeMs()` is a slightly misleading name for "read the current
instant," though its implementation is just `System.currentTimeMillis()`, so
the result is correct.) Purely a readability cleanup.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]