dataroaring opened a new pull request, #61292: URL: https://github.com/apache/doris/pull/61292
## Summary

Replace `HashMap<Long, V>`, `ConcurrentHashMap<Long, V>`, `HashSet<Long>` and similar boxed collections with fastutil primitive-type-specialized collections (`Long2ObjectOpenHashMap`, `Long2LongOpenHashMap`, `LongOpenHashSet`, etc.) across FE hot paths to reduce memory footprint and eliminate autoboxing overhead.

### Key changes:

- **New classes**: `ConcurrentLong2ObjectHashMap<V>` and `ConcurrentLong2LongHashMap`, thread-safe wrappers over fastutil maps using segment-based locking, replacing `ConcurrentHashMap<Long, V>` where concurrent access is needed
- **Gson TypeAdapters**: Added serialization/deserialization support for `Long2ObjectOpenHashMap`, `Long2LongOpenHashMap`, `LongOpenHashSet`, and the concurrent variants. Wire format is backward-compatible with `HashMap<Long, V>` (string-keyed JSON objects) for rolling upgrade safety
- **27 files converted** across catalog, transaction, statistics, alter, clone, cloud, and load subsystems

### Memory savings per collection type:

| Pattern | Before (per entry) | After (per entry) | Savings |
|---------|-------------------|-------------------|---------|
| `HashMap<Long, Long>` | ~64 bytes | ~16 bytes | **4x** |
| `HashMap<Long, Object>` | ~48 bytes | ~16 bytes | **3x** |
| `HashSet<Long>` | ~48 bytes | ~8 bytes | **6x** |
| `ConcurrentHashMap<Long, V>` | ~64 bytes | ~16 bytes | **4x** |

### Scope of changes:

**Priority 1 (tablet/replica scale, millions of entries):**

- `MaterializedIndex.idToTablets`
- `DeleteBitmapUpdateLockContext`: 4× Long2Long maps + nested maps
- `TransactionState`: deltaRows, loadedTblIndexes, errorReplicas
- `PublishVersionTask` / `PublishVersionDaemon` fields
- `ReportHandler.ReportTask` fields
- `DeleteJob` tablet sets

**Priority 2 (per-partition scale, thousands+):**

- `PartitionInfo`: 7 maps keyed by partition_id
- `OlapTable.idToPartition`
- `TableCommitInfo.idToPartitionCommitInfo`
- `DatabaseTransactionMgr`: running/final transaction maps, subTxnIdToTxnId
- `TableStatsMeta` / `AnalysisInfo` / `ColStatsMeta`: partition update rows, indexes row count

**Priority 3-4 (alter, cloud, load):**

- `SchemaChangeJobV2` / `RollupJobV2` tablet maps
- `CloudGlobalTransactionMgr` fields
- `RoutineLoadManager` fields
- `TabletScheduler` fields

**Estimated total heap savings for a cluster with 1.3M tablets: 350-700MB.**

## Test plan

- [x] Unit tests for `ConcurrentLong2ObjectHashMap` (432 lines, covers CRUD, concurrency, iteration)
- [x] Unit tests for `ConcurrentLong2LongHashMap` (455 lines, covers CRUD, concurrency, default values)
- [ ] Existing FE unit test suite passes
- [ ] Verify Gson serialization/deserialization backward compatibility (rolling upgrade from old format)
- [ ] Cluster-level validation with large tablet count

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
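
### Illustration: segment-based locking over primitive keys

For reviewers unfamiliar with the pattern named under "Key changes", the following is a minimal, stdlib-only sketch of segment-based locking over a primitive `long`-to-`long` table. It is not the code in this PR: `StripedLongLongMap`, its fixed 16 segments, the hash mixer, and the `addTo` method are all hypothetical, and the sketch uses hand-rolled open addressing in place of fastutil's `Long2LongOpenHashMap`. Removal is omitted to keep the probing logic short.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch only; not the PR's ConcurrentLong2LongHashMap. */
final class StripedLongLongMap {
    /** One lock-guarded open-addressing table storing unboxed long keys and values. */
    private static final class Segment {
        final ReentrantLock lock = new ReentrantLock();
        long[] keys = new long[16];
        long[] vals = new long[16];
        boolean[] used = new boolean[16];
        int size;

        // Linear probe: returns the slot holding key, or the first free slot.
        int slot(long key) {
            int mask = keys.length - 1;
            int i = (int) mix(key) & mask;
            while (used[i] && keys[i] != key) {
                i = (i + 1) & mask;
            }
            return i;
        }

        void put(long key, long value) {
            if ((size + 1) * 4 > keys.length * 3) {
                grow(); // keep load factor <= 0.75 so a free slot always exists
            }
            int i = slot(key);
            if (!used[i]) {
                used[i] = true;
                keys[i] = key;
                size++;
            }
            vals[i] = value;
        }

        void grow() {
            long[] oldKeys = keys;
            long[] oldVals = vals;
            boolean[] oldUsed = used;
            keys = new long[oldKeys.length * 2];
            vals = new long[oldKeys.length * 2];
            used = new boolean[oldKeys.length * 2];
            size = 0;
            for (int i = 0; i < oldKeys.length; i++) {
                if (oldUsed[i]) {
                    put(oldKeys[i], oldVals[i]);
                }
            }
        }
    }

    private final Segment[] segments = new Segment[16];
    private final long defaultValue;

    StripedLongLongMap(long defaultValue) {
        this.defaultValue = defaultValue;
        for (int i = 0; i < segments.length; i++) {
            segments[i] = new Segment();
        }
    }

    private static long mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    // Top 4 hash bits pick one of 16 segments; low bits pick the slot inside it.
    private Segment segmentFor(long key) {
        return segments[(int) (mix(key) >>> 60)];
    }

    public void put(long key, long value) {
        Segment s = segmentFor(key);
        s.lock.lock();
        try {
            s.put(key, value);
        } finally {
            s.lock.unlock();
        }
    }

    public long get(long key) {
        Segment s = segmentFor(key);
        s.lock.lock();
        try {
            int i = s.slot(key);
            return s.used[i] ? s.vals[i] : defaultValue;
        } finally {
            s.lock.unlock();
        }
    }

    /** Read-modify-write under the segment lock, so concurrent increments are not lost. */
    public long addTo(long key, long delta) {
        Segment s = segmentFor(key);
        s.lock.lock();
        try {
            int i = s.slot(key);
            long next = (s.used[i] ? s.vals[i] : defaultValue) + delta;
            s.put(key, next);
            return next;
        } finally {
            s.lock.unlock();
        }
    }

    public int size() {
        int total = 0;
        for (Segment s : segments) {
            s.lock.lock();
            try {
                total += s.size;
            } finally {
                s.lock.unlock();
            }
        }
        return total;
    }
}
```

Threads touching keys in different segments do not contend, and no `Long` objects are allocated on `put` or `get`. Note that `size()` in this sketch locks segments one at a time, so under concurrent writes it is only a moment-by-moment approximation.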
