[ https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678031#comment-17678031 ]
ASF GitHub Bot commented on HDFS-16891:
---------------------------------------

virajjasani commented on PR #5300:
URL: https://github.com/apache/hadoop/pull/5300#issuecomment-1386270795

Thanks for the reviews @cnauroth @sodonnel.

> which will result in the image failing to load and the NN aborting, so it's an exception that we really don't expect to happen.

That is correct. As such, this is going to lead to failure eventually. The only reason I came across this some time back was while profiling a test that deliberately asserts a failure. We would like to use this parallel loading of inodes with Hadoop 3 upgrades (we are still running Hadoop 2 on the majority of clusters), and hence we are running some tests around it.

> Can the code be simplified to this?
>     final List<IOException> exceptions = Collections.synchronizedList(new ArrayList<>());

> Using `Collections.synchronizedList` does seem simpler, or synchronizing on the exceptions object rather than having a separate lock object probably makes sense to simplify this change further.

Sounds good, thanks.

> Avoid the overhead of copy-on-write exception list while loading inodes sub sections in parallel
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16891
>                 URL: https://issues.apache.org/jira/browse/HDFS-16891
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> If we enable parallel loading and persisting of inodes from/to the fs image, we
> get the benefit of improved performance. However, while loading the sub-sections
> INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a copy-on-write
> list to maintain the list of exceptions. Since our use case is not to iterate
> over this list while executor threads are adding new elements to it,
> using copy-on-write is a bit of an overhead for this use case.
> It would be better to synchronize adding new elements to the list rather than
> having the list copy all elements over every time a new element is added to the
> list.
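
For illustration, here is a minimal sketch of the pattern discussed above: worker threads record failures in a list wrapped with `Collections.synchronizedList`, and the list is only read after all workers have finished. The class name `ExceptionCollectionSketch`, the method `collectFailures`, and the always-failing task body are hypothetical stand-ins and are not taken from the Hadoop code or from PR #5300.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; names are illustrative and not from the Hadoop source.
final class ExceptionCollectionSketch {

  /** Runs `tasks` always-failing tasks on `threads` threads and returns the recorded exceptions. */
  static List<IOException> collectFailures(int tasks, int threads) {
    // Each add() takes the wrapper's lock; the backing array is not copied
    // on every insertion, unlike with CopyOnWriteArrayList.
    final List<IOException> exceptions =
        Collections.synchronizedList(new ArrayList<>());
    final CountDownLatch latch = new CountDownLatch(tasks);
    final ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      for (int i = 0; i < tasks; i++) {
        final int id = i;
        executor.execute(() -> {
          try {
            // Stand-in for loading one sub-section; it always fails here.
            throw new IOException("sub-section " + id + " failed");
          } catch (IOException e) {
            exceptions.add(e);
          } finally {
            latch.countDown();
          }
        });
      }
      // Wait for every task before reading the list.
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    } finally {
      executor.shutdown();
    }
    // All writers are done, so no thread is adding while we copy.
    return new ArrayList<>(exceptions);
  }

  public static void main(String[] args) {
    List<IOException> failures = collectFailures(8, 4);
    System.out.println("collected " + failures.size() + " exceptions");
  }
}
```

Because the list is never iterated while tasks are still adding to it, the snapshot-on-write guarantee of a copy-on-write list buys nothing here, while a synchronized wrapper keeps each insertion cheap.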