github-actions[bot] commented on code in PR #60543:
URL: https://github.com/apache/doris/pull/60543#discussion_r2945810367
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/CloudTabletStatMgr.java:
##########
@@ -338,24 +463,53 @@ private void updateStatInfo(List<Long> dbIds) {
(System.currentTimeMillis() - start));
}
- private void updateTabletStat(GetTabletStatsResponse response) {
+ private void updateTabletStat(GetTabletStatsResponse response, boolean
activeUpdate) {
TabletInvertedIndex invertedIndex = Env.getCurrentInvertedIndex();
for (TabletStatsPB stat : response.getTabletStatsList()) {
List<Replica> replicas =
invertedIndex.getReplicasByTabletId(stat.getIdx().getTabletId());
if (replicas == null || replicas.isEmpty() || replicas.get(0) ==
null) {
continue;
}
Replica replica = replicas.get(0);
+ boolean statsChanged = replica.getDataSize() != stat.getDataSize()
+ || replica.getRowsetCount() != stat.getNumRowsets()
+ || replica.getSegmentCount() != stat.getNumSegments()
+ || replica.getRowCount() != stat.getNumRows()
+ || replica.getLocalInvertedIndexSize() !=
stat.getIndexSize()
+ || replica.getLocalSegmentSize() != stat.getSegmentSize();
replica.setDataSize(stat.getDataSize());
replica.setRowsetCount(stat.getNumRowsets());
replica.setSegmentCount(stat.getNumSegments());
replica.setRowCount(stat.getNumRows());
replica.setLocalInvertedIndexSize(stat.getIndexSize());
replica.setLocalSegmentSize(stat.getSegmentSize());
+
+ CloudReplica cloudReplica = (CloudReplica) replica;
+ cloudReplica.setLastGetTabletStatsTime(System.currentTimeMillis());
+ int statsIntervalIndex = cloudReplica.getStatsIntervalIndex();
Review Comment:
**[Low - Concurrency]** The read-modify-write pattern on
`statsIntervalIndex` is not atomic:
```java
int statsIntervalIndex = cloudReplica.getStatsIntervalIndex(); // READ
// ... compute new value ...
cloudReplica.setStatsIntervalIndex(statsIntervalIndex); // WRITE
```
On a follower, if the daemon's `getAllTabletStats` interval path (which
calls `updateTabletStat(resp, false)`) and the `syncTabletStats` push path
(which calls `updateTabletStat(resp, true)`) process the same tablet
concurrently, the daemon's increment could overwrite the push's reset-to-0.
The practical impact is low (at worst, a tablet stays at a higher interval
for one extra cycle), but making the fields `volatile` and documenting this
accepted race would be cleaner.
##########
fe/fe-core/src/main/java/org/apache/doris/cloud/transaction/CloudGlobalTransactionMgr.java:
##########
@@ -1636,10 +1647,14 @@ private void commitTransactionWithSubTxns(long dbId,
List<Table> tableList, long
.map(c -> new
TabletCommitInfo(c.getTabletId(), c.getBackendId()))
.collect(Collectors.toList())))
.build());
+ for (TTabletCommitInfo tabletCommitInfo :
subTransactionState.getTabletCommitInfos()) {
+ tabletIds.add(tabletCommitInfo.getTabletId());
+ }
}
final CommitTxnRequest commitTxnRequest = builder.build();
- executeCommitTxnRequest(commitTxnRequest, transactionId, false, null);
+ executeCommitTxnRequest(commitTxnRequest, transactionId, false, null,
+ tabletIds.stream().collect(Collectors.toList()));
}
Review Comment:
**[Nit - Performance]** `tabletIds.stream().collect(Collectors.toList())`
can be simplified to `new ArrayList<>(tabletIds)` since `tabletIds` is already
a `Set<Long>`. The stream is unnecessary overhead.
##########
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:
##########
@@ -67,9 +69,23 @@ public class CloudReplica extends Replica implements
GsonPostProcessable {
private long indexId = -1;
@SerializedName(value = "idx")
private long idx = -1;
+ // last time to get tablet stats
+ @Getter
+ @Setter
+ long lastGetTabletStatsTime = 0;
+ /**
Review Comment:
**[Medium - Concurrency]** Both `lastGetTabletStatsTime` and
`statsIntervalIndex` are read by the daemon thread in
`CloudTabletStatMgr.runAfterCatalogReady()` and written by
`GET_TABLET_STATS_THREAD_POOL` / `SYNC_TABLET_STATS_THREAD_POOL` threads in
`updateTabletStat()`. Since these are plain fields (not `volatile`), the JMM
does not guarantee visibility across threads.
Concrete scenario: On a follower FE, the master pushes tablet stats via
`syncTabletStats` -> `updateTabletStat(response, true)`, which sets
`statsIntervalIndex = 0` and `lastGetTabletStatsTime = now`. The daemon thread
running `runAfterCatalogReady()` may read stale values (old
`statsIntervalIndex` and old `lastGetTabletStatsTime`), causing it to fetch the
same tablet stats again from meta_service — exactly the redundancy this PR aims
to eliminate.
Recommend: Mark both fields as `volatile`:
```java
volatile long lastGetTabletStatsTime = 0;
volatile int statsIntervalIndex = 0;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]