[ https://issues.apache.org/jira/browse/HADOOP-18291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733712#comment-17733712 ]
ASF GitHub Bot commented on HADOOP-18291:
-----------------------------------------

virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1595661932

   `us-west-2`:
   ```
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale -Dprefetch
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
   ```

> S3A prefetch - Implement LRU cache for SingleFilePerBlockCache
> --------------------------------------------------------------
>
>                 Key: HADOOP-18291
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18291
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.4.0
>            Reporter: Ahmar Suhail
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> Currently there is no limit on the size of the disk cache. This means we
> could end up with a large number of block files on disk, especially for
> access patterns that are very random and do not always read the block fully.
>
> eg:
> in.seek(5);
> in.read();
> in.seek(blockSize + 10); // block 0 gets saved to disk as it's not fully read
> in.read();
> in.seek(2 * blockSize + 10); // block 1 gets saved to disk
> .. and so on
>
> The in-memory cache is bounded, and by default has a limit of 72MB (9
> blocks). When a block is fully read and a seek is issued, it is released
> [here|https://github.com/apache/hadoop/blob/feature-HADOOP-18028-s3a-prefetch/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3CachingInputStream.java#L109].
> We could also delete the on-disk file for the block at that point, if it
> exists.
>
> Also, maybe add an upper limit on disk space, and when that limit is
> reached, delete the file that stores the block furthest from the current
> block (similar to the in-memory cache).
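The bounded eviction described above could be sketched with an access-ordered `LinkedHashMap`, which gives LRU iteration order for free. This is a minimal illustration, not the actual `SingleFilePerBlockCache` implementation; the class name `BlockDiskCache`, its methods, and the `maxBlocks` parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU-bounded disk block cache.
// Maps a block index to the path of its on-disk cache file and evicts
// the least-recently-used entry once more than maxBlocks are cached.
public class BlockDiskCache {
    private final int maxBlocks;
    private final Map<Integer, String> blockFiles;

    public BlockDiskCache(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder=true makes iteration order least-recently-accessed
        // first; removeEldestEntry is consulted on every put().
        this.blockFiles = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                if (size() > BlockDiskCache.this.maxBlocks) {
                    deleteBlockFile(eldest.getValue());
                    return true; // drop the LRU entry from the map
                }
                return false;
            }
        };
    }

    public synchronized void put(int blockIndex, String path) {
        blockFiles.put(blockIndex, path);
    }

    // containsKey on an access-ordered map does NOT refresh recency;
    // get() does, so use get() for lookups that should count as a "use".
    public synchronized boolean contains(int blockIndex) {
        return blockFiles.get(blockIndex) != null;
    }

    public synchronized int size() {
        return blockFiles.size();
    }

    private void deleteBlockFile(String path) {
        // In a real cache this would delete the on-disk block file.
        System.out.println("evicting " + path);
    }
}
```

With `maxBlocks = 2`, caching blocks 0, 1, 2 in order evicts block 0's file on the third `put`, bounding disk usage regardless of how random the read pattern is.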
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org