[ https://issues.apache.org/jira/browse/HADOOP-17559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HADOOP-17559 started by Steve Loughran.
-----------------------------------------------
> S3Guard import can OOM on large imports
> ---------------------------------------
>
>                 Key: HADOOP-17559
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17559
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> I know I'm closing ~all S3Guard issues as wontfix, but this is pressing so
> I'm going to do it anyway
> S3Guard import of a directory tree containing many, many files will OOM.
> Looking at the code this is going to be because
> * import tool builds a map of all dirs imported, which, as the comments note,
> is "superfluous for DDB". - *cut*
> * DDB AncestorState tracks files as well as dirs, purely as a safety check to
> make sure the current op doesn't somehow write a file entry above a dir entry
> in the same operation
> We've been running S3Guard for a long time, and condition #2 has never arisen.
> Propose: don't store filenames there, so memory consumption goes from
> O(files + dirs) to O(dirs)
> Code straightforward, can't think of any tests
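
The proposal above (keep directories in the ancestor state, stop retaining filenames) can be illustrated with a minimal sketch. This is not the Hadoop `AncestorState` code: the class `DirOnlyAncestorState`, its methods, and the use of plain `String` paths are all hypothetical, chosen only to show why the retained set shrinks from O(files + dirs) to O(dirs).

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for an ancestor tracker used during a bulk import.
 * It is NOT the Hadoop class; names and signatures are assumptions.
 */
final class DirOnlyAncestorState {
    // Only directory paths are retained, so memory grows with the
    // number of directories rather than the number of files.
    private final Set<String> dirs = new HashSet<>();

    // Files are counted for reporting, but their names are not stored.
    private long filesSeen = 0;

    /** Record an imported entry; file paths are deliberately discarded. */
    void put(String path, boolean isDirectory) {
        if (isDirectory) {
            dirs.add(path);
        } else {
            filesSeen++;
        }
    }

    /** True if the path was recorded as a directory in this operation. */
    boolean containsDir(String path) {
        return dirs.contains(path);
    }

    /** Number of entries actually held in memory. */
    int trackedEntries() {
        return dirs.size();
    }

    long filesSeen() {
        return filesSeen;
    }

    public static void main(String[] args) {
        DirOnlyAncestorState state = new DirOnlyAncestorState();
        state.put("/data", true);
        for (int i = 0; i < 1000; i++) {
            state.put("/data/file-" + i, false);
        }
        System.out.println(state.trackedEntries() + " entries tracked, "
            + state.filesSeen() + " files seen");
    }
}
```

The trade-off in this sketch matches the one described in the issue: without the stored filenames, the tracker can no longer detect a file entry being written above a directory entry within the same operation.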