[ 
https://issues.apache.org/jira/browse/HDDS-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-15335:
------------------------------
    Description: 
NSSummaryTask is a ReconOmTask to which the dispatcher fans out every batch of 
OM RocksDB updates that Recon ingests.

Inside its process() method, three sub-tasks (FSO / Legacy / OBS) ran 
sequentially even though they operate on disjoint slices of the event stream 
(filtered by table and bucket layout) and write to disjoint NSSummary entries. 
The Legacy and OBS sub-tasks were also each individually slower than necessary 
because every event triggered a fresh RocksDB point read of the corresponding 
OmBucketInfo from Recon's local OM snapshot DB (via 
{{getBucketTable().getSkipCache(...)}}), even though bucket layout and objectID 
never change once a bucket exists.

Changes proposed:

  1. NSSummaryTaskDbEventHandler caches OmBucketInfo lookups in a
     field-level Map keyed by the bucket DB key. Bucket layout and objectID
     are immutable for an existing bucket, so an unbounded cache is safe;
     the cluster's bucket count is bounded, so memory is not a concern. After
     the first event for a given bucket, the cost drops from a RocksDB
     point read to a HashMap.get().

  2. NSSummaryTask.process() submits each of the three sub-tasks to its
     own thread in a 3-thread pool and joins on all three. The threads
     do not partition events — all three see the same event list and
     each independently iterates it, processing only the events whose
     (table, bucket layout) matches its target:

       - FSO thread: events on fileTable / dirTable / deletedDirTable.
       - Legacy thread: keyTable events whose bucket layout is LEGACY.
       - OBS thread: keyTable events whose bucket layout is OBJECT_STORE.

     Events that don't match a thread's target are skipped (table-name
     check, or bucket-layout check after a now-cached bucket lookup
     from change 1). Each sub-task already maintains its own per-call
     NSSummary accumulation map and writes to ReconNamespaceSummaryManager
     only at flush time via an atomic RDBBatchOperation; the target
     NSSummary entries are disjoint between FSO and Legacy/OBS (FSO has
     its own namespace tree) and between Legacy and OBS (a bucket has
     exactly one layout), so no synchronization is needed across
     threads. Per-sub-task seek positions and per-task failure flags
     are preserved — same TaskResult contract as before.

  3. In the OBS UPDATE path, drop the redundant getKeyParentID(oldKeyInfo)
     call. The parent of an OBS key is the bucket, and a key cannot move
     between buckets via an UPDATE event (that would be a DELETE+PUT), so
     the parent objectID computed for the new key value is identical to
     the parent objectID for the old key value.
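
Changes 1 and 2 can be illustrated with a minimal, self-contained sketch. It is not the Recon code: Event, BucketLayoutCache, NsSummaryFanOutSketch and the counting helpers are hypothetical stand-ins, the RocksDB point read is replaced by a plain function, and each sub-task only counts the events it would process instead of updating NSSummary entries. Because this sketch shares one cache instance between threads, it uses a ConcurrentHashMap as a conservative assumption; the description above only says "a field-level Map".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class NsSummaryFanOutSketch {
  enum Layout { LEGACY, OBJECT_STORE }

  /** Hypothetical stand-in for one OM update event. */
  static final class Event {
    final String table;
    final String bucketKey;
    Event(String table, String bucketKey) {
      this.table = table;
      this.bucketKey = bucketKey;
    }
  }

  /** Change 1: memoize the per-bucket lookup so only the first event per bucket reads the DB. */
  static final class BucketLayoutCache {
    private final Map<String, Layout> cache = new ConcurrentHashMap<>();
    private final Function<String, Layout> dbLookup; // stands in for the RocksDB point read
    final AtomicInteger dbReads = new AtomicInteger(); // counts simulated DB reads

    BucketLayoutCache(Function<String, Layout> dbLookup) {
      this.dbLookup = dbLookup;
    }

    Layout get(String bucketKey) {
      return cache.computeIfAbsent(bucketKey, key -> {
        dbReads.incrementAndGet();
        return dbLookup.apply(key);
      });
    }
  }

  static final List<String> FSO_TABLES =
      Arrays.asList("fileTable", "dirTable", "deletedDirTable");

  /** FSO sub-task: filters by table name only. */
  static long countFso(List<Event> events) {
    return events.stream().filter(e -> FSO_TABLES.contains(e.table)).count();
  }

  /** Legacy/OBS sub-task: keyTable events whose (cached) bucket layout matches the target. */
  static long countKeyTable(List<Event> events, BucketLayoutCache cache, Layout target) {
    return events.stream()
        .filter(e -> "keyTable".equals(e.table))
        .filter(e -> cache.get(e.bucketKey) == target)
        .count();
  }

  /** Change 2: every sub-task scans the full event list on its own thread; returns {fso, legacy, obs}. */
  static long[] process(List<Event> events, BucketLayoutCache cache) {
    Callable<Long> fso = () -> countFso(events);
    Callable<Long> legacy = () -> countKeyTable(events, cache, Layout.LEGACY);
    Callable<Long> obs = () -> countKeyTable(events, cache, Layout.OBJECT_STORE);
    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      // invokeAll blocks until all three sub-tasks have completed.
      List<Future<Long>> futures = pool.invokeAll(Arrays.asList(fso, legacy, obs));
      long[] counts = new long[futures.size()];
      for (int i = 0; i < counts.length; i++) {
        counts[i] = futures.get(i).get();
      }
      return counts;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while joining sub-tasks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("sub-task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Map<String, Layout> fakeDb = new HashMap<>();
    fakeDb.put("/vol/legacyBucket", Layout.LEGACY);
    fakeDb.put("/vol/obsBucket", Layout.OBJECT_STORE);
    BucketLayoutCache cache = new BucketLayoutCache(fakeDb::get);
    List<Event> events = Arrays.asList(
        new Event("fileTable", "/vol/fsoBucket"),
        new Event("keyTable", "/vol/legacyBucket"),
        new Event("keyTable", "/vol/obsBucket"),
        new Event("keyTable", "/vol/obsBucket"));
    long[] counts = process(events, cache);
    System.out.println(Arrays.toString(counts) + " dbReads=" + cache.dbReads.get());
  }
}
```

In this sketch the number of simulated DB reads equals the number of distinct buckets seen on keyTable events, regardless of how many events arrive or how many threads ask, which is the effect change 1 is after.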

  was:
NSSummaryTask is a ReconOmTask that the dispatcher fans out on every batch of 
OM RocksDB updates Recon ingests.

Inside its process() method, three sub-tasks (FSO / Legacy / OBS) ran 
sequentially even though they operate on disjoint slices of the event stream 
(filtered by table and bucket layout) and write to disjoint NSSummary entries. 
The Legacy and OBS sub-tasks were also each individually slower than necessary 
because every event triggered a fresh RocksDB point read of the corresponding 
OmBucketInfo from Recon's local OM snapshot DB (via 
{{getBucketTable().getSkipCache(...)}}), even though bucket layout and objectID 
never change once a bucket exists.


> Recon: parallelize NSSummaryTask sub-tasks and cache OmBucketInfo lookups
> -------------------------------------------------------------------------
>
>                 Key: HDDS-15335
>                 URL: https://issues.apache.org/jira/browse/HDDS-15335
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Recon
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

