aiden-sun opened a new issue, #18331:
URL: https://github.com/apache/druid/issues/18331

   ### Affected Version
   
   apache-druid-31.0.1
   
   ### Environment
   - 30 Historical nodes
   - More than 50,000 segments
   - 5 Historical nodes per 256-core server
   - Coordinator balancer strategy: cost
   - Segment size: 10–500 MB per segment
   - Deep storage: HDFS
   - Historical cache: nodes in the hot1 tier use tmpfs (memory-backed volumes), while nodes in the hot2 tier use NVMe SSDs
   - Cluster has been running for multiple months
   - Data update pattern: irregular daily full updates (all data) plus hourly full updates (critical data, last 30 days)
   
   ### Problem Description
   Segment cache usage varies significantly across Historical nodes in the same tier, ranging from 38% to 90%.
   
   - Some nodes show sharp increases in cache usage, for example from 60% to 85% within one hour.
   - In some cases, nodes above 80% cache utilization keep loading new segments that are not associated with any ongoing batch (offline) or real-time ingestion task.
   - Nodes using tmpfs (memory-backed volumes) show more severe skew and higher volatility in cache usage than nodes using NVMe SSDs.
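
To put a number on the per-tier skew, each Historical's cache usage can be computed as `curr_size / max_size` and grouped by tier. This is a minimal sketch, assuming the rows come from a Druid SQL query such as `SELECT server, tier, curr_size, max_size FROM sys.servers WHERE server_type = 'historical'`; the function name `usage_spread_by_tier` and the sample rows are hypothetical and for illustration only.

```python
from collections import defaultdict


def usage_spread_by_tier(rows):
    """Return {tier: (min_pct, max_pct, spread_pct)} of segment-cache usage.

    Each row is a dict with 'server', 'tier', 'curr_size' and 'max_size'
    (bytes), mirroring columns of Druid's sys.servers table (assumed).
    """
    usage = defaultdict(list)
    for row in rows:
        if row["max_size"] <= 0:
            continue  # skip servers that report no capacity
        usage[row["tier"]].append(100.0 * row["curr_size"] / row["max_size"])
    return {
        tier: (min(pcts), max(pcts), max(pcts) - min(pcts))
        for tier, pcts in usage.items()
    }


if __name__ == "__main__":
    # Hypothetical sample rows, not measurements from this cluster.
    sample = [
        {"server": "h1:8083", "tier": "hot1", "curr_size": 38, "max_size": 100},
        {"server": "h2:8083", "tier": "hot1", "curr_size": 90, "max_size": 100},
        {"server": "h3:8083", "tier": "hot2", "curr_size": 60, "max_size": 100},
    ]
    for tier, (low, high, spread) in sorted(usage_spread_by_tier(sample).items()):
        print(f"{tier}: min={low:.1f}% max={high:.1f}% spread={spread:.1f}%")
```

Tracking the spread per tier over time would show whether the tmpfs tier (hot1) is consistently more skewed than the NVMe tier (hot2).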
   
   We have tried the following tuning options, with no meaningful improvement:
   - Disabling smart segment loading (`smartSegmentLoading`)
   - Varying `maxSegmentsToMove` between 100 and 1000
   - Testing `useRoundRobinSegmentAssignment` both enabled and disabled
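
To make the tuning combinations above reproducible, they can be applied through the Coordinator dynamic configuration API (`GET` and `POST` on `/druid/coordinator/v1/config`). This is a minimal sketch using only the Python standard library. The Coordinator URL is a placeholder assumption, and the helper names (`with_overrides`, `build_post_request`, `apply_overrides`) are hypothetical. The sketch reads the current config and posts back a full copy with only the selected fields changed, so it does not rely on partial-update semantics.

```python
import json
import urllib.request

# Placeholder Coordinator address (assumption); 8081 is Druid's default Coordinator port.
COORDINATOR = "http://coordinator.example:8081"
CONFIG_PATH = "/druid/coordinator/v1/config"


def with_overrides(current, **overrides):
    """Return a copy of the current dynamic config with selected fields replaced."""
    updated = dict(current)
    updated.update(overrides)
    return updated


def build_post_request(config, base=COORDINATOR):
    """Build (but do not send) a POST request carrying the full dynamic config."""
    return urllib.request.Request(
        base + CONFIG_PATH,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_overrides(base=COORDINATOR, **overrides):
    """Fetch the current config, apply overrides, post it back; return the HTTP status."""
    with urllib.request.urlopen(base + CONFIG_PATH) as resp:
        current = json.load(resp)
    request = build_post_request(with_overrides(current, **overrides), base)
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

For example, `apply_overrides(maxSegmentsToMove=1000, useRoundRobinSegmentAssignment=True)` would try one of the combinations listed above against the placeholder Coordinator.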
   
   By contrast, in a Druid 25 test cluster with append-only datasources and a stable count of more than 1,000 segments (each roughly 300–500 MB), segments are distributed evenly across Historical nodes.
   
   We suspect the issue stems from the segment balancing mechanism and hope that adjusting the relevant configuration can make cache usage more even across nodes.
   
   Current Coordinator dynamic configuration:
   
   ```json
   {
        "millisToWaitBeforeDeleting": 900000,
        "mergeBytesLimit": 524288000,
        "mergeSegmentsLimit": 200,
        "maxSegmentsToMove": 200,
        "replicantLifetime": 15,
        "replicationThrottleLimit": 500,
        "balancerComputeThreads": 50,
        "killDataSourceWhitelist": [],
        "killTaskSlotRatio": 0.1,
        "maxKillTaskSlots": 2147483647,
        "killPendingSegmentsSkipList": [],
        "maxSegmentsInNodeLoadingQueue": 1000,
        "decommissioningNodes": [],
        "pauseCoordination": false,
        "replicateAfterLoadTimeout": false,
        "useRoundRobinSegmentAssignment": false,
        "smartSegmentLoading": false,
        "debugDimensions": null
   }
   ```
   
   