[ https://issues.apache.org/jira/browse/SAMZA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726758#comment-17726758 ]

Andy Sautins commented on SAMZA-2783:
-------------------------------------

PR: https://github.com/apache/samza/pull/1669

> Memoize DirDiffUtil to avoid repeated calls to areSameFile
> ----------------------------------------------------------
>
>                 Key: SAMZA-2783
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2783
>             Project: Samza
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Andy Sautins
>            Priority: Minor
>
> While profiling a Samza job, I noticed that, for this particular job, ~38% of
> the time was spent in
> org.apache.samza.storage.blobstore.util.DirDiffUtil.getDirDiff, and that
> areSameFile was the primary contributor.
>  
> The code contains the following comment:
> DirDiffUtil.java:271
> {code:java}
>   // TODO MED shesharm: this compares each file in directory 3 times. Categorize files in one traversal instead.
> {code}
>  
> While restructuring the code is an option, a quick win would be to memoize
> the results of areSameFile. Restructuring could, however, result in a lower
> memory footprint, since memoized results are kept in memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
