[ 
https://issues.apache.org/jira/browse/HDDS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-5384:
----------------------------------
    Description: 
The OM's refreshPipeline (used by listStatus) iterates over the result of 
OmKeyLocationInfoGroup.getLocationList(), which is a very expensive call. It 
iterates over a collection of lists of objects, allocates a new list, and 
performs operations on each element. In short, it is an O(n) method in terms 
of both space and time complexity.

There are many places in the Ozone code that use this method. Most of them 
only iterate over the generated list, without modifying it. We should instead 
return the existing collection of lists, which is O(1).

To examine the effect, I ran a client that issues many listStatus calls. 
Before the change, refreshPipeline accounted for 8.65% of heap usage; after 
the change, 1.95%. CPU cost dropped from 8.18% to 4.22%.

We should refrain from invoking getLocationList() as much as possible. But 
given how widely it is used in the code, I elected not to remove every usage, 
to avoid destabilizing the code. Instead, I changed only the usage in 
refreshPipeline to demonstrate the impact.
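
The difference between the two access patterns can be sketched roughly as 
follows. This is a minimal illustration, not the actual Ozone code: the class 
LocationGroupSketch, the String element type, and the getLocationLists() and 
countLocations() names are all hypothetical stand-ins chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a group of per-version location lists.
// It is NOT the real OmKeyLocationInfoGroup; names and types are assumptions.
final class LocationGroupSketch {
  private final List<List<String>> locationsByVersion = new ArrayList<>();

  void addVersion(List<String> locations) {
    locationsByVersion.add(locations);
  }

  // Flattening accessor: allocates a new list and copies every element,
  // so each call costs O(n) time and O(n) extra space.
  List<String> getLocationList() {
    List<String> flat = new ArrayList<>();
    for (List<String> perVersion : locationsByVersion) {
      flat.addAll(perVersion);
    }
    return flat;
  }

  // View accessor (hypothetical name): wraps the existing collection of
  // lists in a read-only view without copying, so each call is O(1).
  Collection<List<String>> getLocationLists() {
    return Collections.unmodifiableCollection(locationsByVersion);
  }

  // A read-only caller can walk the nested lists directly instead of
  // asking for a flattened copy first.
  static int countLocations(LocationGroupSketch group) {
    int count = 0;
    for (List<String> perVersion : group.getLocationLists()) {
      for (String ignored : perVersion) {
        count++;
      }
    }
    return count;
  }
}
```

A caller that only reads the locations still visits every element either 
way, but with the view accessor it no longer pays for the intermediate list 
on every call.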


> OM refreshPipeline should not invoke the expensive 
> OmKeyLocationInfoGroup.getLocationList()
> -------------------------------------------------------------------------------------------
>
>                 Key: HDDS-5384
>                 URL: https://issues.apache.org/jira/browse/HDDS-5384
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: OM
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: om_listatus_alloc_after.svg, 
> om_liststatus_alloc_before.svg, om_liststatus_cpu_after.svg, 
> om_liststatus_cpu_before.svg


