github-actions[bot] commented on code in PR #63298:
URL: https://github.com/apache/doris/pull/63298#discussion_r3290506272
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/MaterializedIndex.java:
##########
@@ -109,23 +112,54 @@ public Tablet getTablet(long tabletId) {
return idToTablets.get(tabletId);
}
- public void clearTabletsForRestore() {
+ public synchronized void clearTabletsForRestore() {
idToTablets.clear();
- tablets.clear();
+ tablets = new ArrayList<>(); // volatile write
}
- public void addTablet(Tablet tablet, TabletMeta tabletMeta) {
+ public synchronized void addTablet(Tablet tablet, TabletMeta tabletMeta) {
addTablet(tablet, tabletMeta, false);
}
- public void addTablet(Tablet tablet, TabletMeta tabletMeta, boolean
isRestore) {
- idToTablets.put(tablet.getId(), tablet);
- tablets.add(tablet);
+ // Writers are synchronized on this index to prevent concurrent
lost-update:
+ // some callers (e.g. InternalCatalog.createTablets) do NOT hold the
OlapTable
+ // write lock when adding tablets.
+ // Copy-on-write keeps readers CME-safe without locking; for bulk creation
use
+ // appendTablets(...) so the per-index tablets list is copied once per
batch
+ // instead of once per tablet.
+ public synchronized void addTablet(Tablet tablet, TabletMeta tabletMeta,
boolean isRestore) {
+ appendTabletsInternal(Collections.singletonList(tablet));
if (!isRestore) {
Env.getCurrentInvertedIndex().addTablet(tablet.getId(),
tabletMeta);
}
}
+ // Bulk-publish: append the given tablets to this index's tablets list in a
+ // single copy-on-write (O(existing + batch) instead of O(n^2) over n
+ // single-tablet adds inside a synchronized block).
+ //
+ // Does NOT touch TabletInvertedIndex. Bulk-creation callers register
tablets
+ // in TabletInvertedIndex eagerly inside their per-tablet loop because
+ // Tablet.addReplica(...) (non-restore) requires the tablet to already be
+ // present in the inverted index; only the per-index list copy is expensive
+ // enough to be worth batching.
+ public synchronized void appendTablets(Collection<Tablet> newTablets) {
+ appendTabletsInternal(newTablets);
+ }
+
+ private void appendTabletsInternal(Collection<Tablet> newTablets) {
+ if (newTablets.isEmpty()) {
+ return;
+ }
+ List<Tablet> next = new ArrayList<>(tablets.size() +
newTablets.size());
+ next.addAll(tablets);
+ for (Tablet tablet : newTablets) {
+ idToTablets.put(tablet.getId(), tablet);
+ next.add(tablet);
Review Comment:
This still mutates `idToTablets` in place while `getTablet()` reads the same
plain `HashMap` without synchronization. The PR makes the ordered `tablets`
list a volatile copy-on-write snapshot, but lookup readers such as
scheduler/report/proc paths use `getTablet()` and do not get that protection;
at the same time the new comments explicitly document writers that do not
necessarily hold the `OlapTable` write lock. A concurrent
`appendTablets()`/`clearTabletsForRestore()` can therefore race with
`getTablet()` on `HashMap`, which is exactly the kind of metadata reader race
this change is trying to remove. Please publish `idToTablets` with the same
snapshot discipline (for example a volatile copied map updated together with
`tablets`) or otherwise make all reads/writes consistently
synchronized/concurrent.
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/LocalTablet.java:
##########
@@ -31,16 +31,16 @@
import org.apache.logging.log4j.Logger;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.Comparator;
-import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;
public class LocalTablet extends Tablet {
private static final Logger LOG = LogManager.getLogger(LocalTablet.class);
@SerializedName(value = "rs", alternate = {"replicas"})
- private List<Replica> replicas;
+ private volatile List<Replica> replicas;
@SerializedName(value = "lastCheckTime")
Review Comment:
Making `replicas` volatile is not enough while methods still read the field
directly more than once instead of capturing one snapshot. For example,
`getRemoteDataSize()` iterates `replicas` looking for `cooldownReplicaId`, then
rereads `replicas` for `stream().max(...).get()`; a concurrent
`deleteReplica*()` can publish an empty/different list between those reads and
make this throw or compute from a different snapshot. Other direct-field
readers such as `getReplicaByBackendId()`, `equals()`, and
`readyToBeRepaired()` also bypass the `getReplicas()` snapshot convention.
Please update these readers to use a single local `List<Replica> snapshot =
replicas` (or `getReplicas()`) per method so the copy-on-write guarantee
actually covers all replica readers, not only external iteration.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]