[ 
https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678009#comment-17678009
 ] 

ASF GitHub Bot commented on HDFS-16891:
---------------------------------------

sodonnel commented on PR #5300:
URL: https://github.com/apache/hadoop/pull/5300#issuecomment-1386198922

   I don't recall my reason for using a copy-on-write list, but the list is only 
written to in the case of an exception, which will result in the image failing to 
load and the NN aborting, so it's an exception that we really don't expect to 
happen. Therefore, as it stands, the copy-on-write list has basically zero 
overhead. Even if there are exceptions, the total number of entries is equal to 
the number of parallel loading threads, so low tens of entries at the most.
   
   Using `Collections.synchronizedList` does seem simpler. Alternatively, 
synchronizing on the exceptions object itself, rather than having a separate 
lock object, probably makes sense to simplify this change further.
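
   As a rough sketch of the two alternatives mentioned above (not the actual 
patch): the class name `ExceptionListSketch`, its methods, and the simulated 
failing workers are all hypothetical and only illustrate the locking pattern. 
Every worker "fails" once, so the list ends up with one entry per thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical illustration only; not the NameNode image-loading code.
public class ExceptionListSketch {

  // Option 1: wrap a plain ArrayList. Each add() takes the wrapper's lock,
  // and nothing is copied when an element is appended.
  static List<IOException> collectWithSynchronizedList(int threads)
      throws InterruptedException {
    List<IOException> exceptions =
        Collections.synchronizedList(new ArrayList<>());
    runWorkers(threads,
        id -> exceptions.add(new IOException("sub-section " + id)));
    return exceptions;
  }

  // Option 2: plain ArrayList, synchronizing on the list object itself
  // instead of introducing a separate lock object.
  static List<IOException> collectWithSynchronizedBlock(int threads)
      throws InterruptedException {
    List<IOException> exceptions = new ArrayList<>();
    runWorkers(threads, id -> {
      synchronized (exceptions) {
        exceptions.add(new IOException("sub-section " + id));
      }
    });
    // Read under the same lock so the snapshot is safely published.
    synchronized (exceptions) {
      return new ArrayList<>(exceptions);
    }
  }

  // Simulates the parallel loaders: each task records one failure.
  private static void runWorkers(int threads, IntConsumer failingTask)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      final int id = i;
      pool.execute(() -> failingTask.accept(id));
    }
    pool.shutdown();
    if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
      throw new IllegalStateException("workers did not finish in time");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(collectWithSynchronizedList(8).size());
    System.out.println(collectWithSynchronizedBlock(8).size());
  }
}
```

   Either variant avoids copying the backing array on each add. With only a 
handful of entries expected, the choice is mostly about code simplicity rather 
than measurable performance.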




> Avoid the overhead of copy-on-write exception list while loading inodes sub 
> sections in parallel
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16891
>                 URL: https://issues.apache.org/jira/browse/HDFS-16891
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> If we enable parallel loading and persisting of inodes from/to the fs image, we 
> get the benefit of improved performance. However, while loading sub-sections 
> INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a copy-on-write 
> list to maintain the list of exceptions. Since our use case is not to iterate 
> over this list while executor threads are adding new elements to the list, 
> using copy-on-write is a bit of an overhead for this use case.
> It would be better to synchronize adding new elements to the list rather than 
> having the list copy all elements over every time a new element is added to the 
> list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
