[ 
https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678031#comment-17678031
 ] 

ASF GitHub Bot commented on HDFS-16891:
---------------------------------------

virajjasani commented on PR #5300:
URL: https://github.com/apache/hadoop/pull/5300#issuecomment-1386270795

   Thanks for the reviews @cnauroth @sodonnel.
   
   > which will result in the image failing to load and the NN aborting, so it's an exception that we really don't expect to happen.
   
   That is correct. In that case the image load is going to fail eventually 
anyway. I only came across this some time back while profiling a test that 
deliberately asserts on failure. We would like to use parallel inode loading 
once we upgrade to Hadoop 3 (the majority of our clusters still run Hadoop 2), 
hence we are running some tests around this.
   
   
   > Can the code be simplified to this?
   > final List<IOException> exceptions = Collections.synchronizedList(new ArrayList<>());
   
   > Using `Collections.synchronizedList` does seem simpler. Alternatively, synchronizing on the exceptions object rather than having a separate lock object probably makes sense to simplify this change further.
   
   Sounds good, thanks.
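
   For reference, a minimal sketch of the suggested pattern, not the actual 
change in the PR: worker threads add failures to a 
`Collections.synchronizedList`, and the list is only read after all tasks have 
finished. The class name `SubSectionLoadSketch`, the methods `loadSubSection` 
and `loadAll`, and the rule that odd-numbered sub-sections fail are all 
hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: none of these names come from the Hadoop code base.
final class SubSectionLoadSketch {

  // Hypothetical stand-in for loading one sub-section; odd indices fail.
  static void loadSubSection(int index) throws IOException {
    if (index % 2 == 1) {
      throw new IOException("failed to load sub-section " + index);
    }
  }

  // Loads all sub-sections in parallel and returns the collected failures.
  static List<IOException> loadAll(int subSections, int threads)
      throws InterruptedException {
    // Each add() takes the list's own lock; nothing is copied on write.
    final List<IOException> exceptions =
        Collections.synchronizedList(new ArrayList<>());
    final CountDownLatch latch = new CountDownLatch(subSections);
    final ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      for (int i = 0; i < subSections; i++) {
        final int index = i;
        executor.submit(() -> {
          try {
            loadSubSection(index);
          } catch (IOException e) {
            exceptions.add(e);
          } finally {
            latch.countDown();
          }
        });
      }
      // The list is inspected only after every task has completed,
      // so no thread iterates while others are still adding.
      latch.await();
    } finally {
      executor.shutdown();
    }
    return exceptions;
  }
}
```

   Because nothing iterates over the list while the executor threads are 
still adding to it, the snapshot semantics of a copy-on-write list are not 
needed here.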




> Avoid the overhead of copy-on-write exception list while loading inodes sub 
> sections in parallel
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16891
>                 URL: https://issues.apache.org/jira/browse/HDFS-16891
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> If we enable parallel loading and persisting of inodes from/to the fs image, 
> we get the benefit of improved performance. However, while loading the 
> sub-sections INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a 
> copy-on-write list to maintain the list of exceptions. Since our use case is 
> not to iterate over this list while executor threads are adding new elements 
> to it, using copy-on-write is a bit of an overhead for this use case.
> It would be better to synchronize adding new elements to the list rather than 
> having the list copy all elements over every time a new element is added to 
> the list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
