[ https://issues.apache.org/jira/browse/HADOOP-16202?focusedWorklogId=652405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652405 ]
ASF GitHub Bot logged work on HADOOP-16202:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Sep/21 17:37
            Start Date: 17/Sep/21 17:37
    Worklog Time Spent: 10m
      Work Description: steveloughran commented on a change in pull request #2584:
URL: https://github.com/apache/hadoop/pull/2584#discussion_r711237755

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Options.java
##########
@@ -518,4 +522,112 @@ public String toString() {
     MD5MD5CRC, // MD5 of block checksums, which are MD5 over chunk CRCs
     COMPOSITE_CRC // Block/chunk-independent composite CRC
   }
+
+  /**
+   * The standard {@code openFile()} options.
+   */
+  @InterfaceAudience.Public
+  @InterfaceStability.Evolving
+  public static final class OpenFileOptions {
+
+    private OpenFileOptions() {
+    }
+
+    /**
+     * Prefix for all standard filesystem options: {@value}.
+     */
+    public static final String FILESYSTEM_OPTION = "fs.option.";
+
+    /**
+     * Prefix for all openFile options: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE =
+        FILESYSTEM_OPTION + "openfile.";
+
+    /**
+     * OpenFile option for file length: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_LENGTH =
+        FS_OPTION_OPENFILE + "length";
+
+    /**
+     * OpenFile option for split start: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_SPLIT_START =
+        FS_OPTION_OPENFILE + "split.start";
+
+    /**
+     * OpenFile option for split end: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_SPLIT_END =
+        FS_OPTION_OPENFILE + "split.end";
+
+    /**
+     * OpenFile option for buffer size: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_BUFFER_SIZE =
+        FS_OPTION_OPENFILE + "buffer.size";
+
+    /**
+     * OpenFile option for read policies: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_READ_POLICY =
+        FS_OPTION_OPENFILE + "read.policy";
+
+    /**
+     * Read policy for adaptive IO: {@value}.
+     */
+    public static final String FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE =

Review comment:
   good point. I want us to have some basic names of them so they can be used more broadly, and in particular, we can have tools asking for some "whole-file", knowing that an FS which recognises them will handle them properly.

   FWIW whole-file is one I've realised abfs & s3a can do very efficiently
   * switch to a buffer read strategy of big buffers but fewer readers
   * start to read immediately

   in contrast, `sequential` is often used for splits, so lazy seek is needed and as it may end earlier, smaller buffers still make sense.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 652405)
    Time Spent: 12h 10m  (was: 12h)

> Stabilize openFile() and adopt internally
> -----------------------------------------
>
>                 Key: HADOOP-16202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16202
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3, tools/distcp
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows
> the length of the file to be declared. If set, *no check for the existence of
> the file is issued when opening the file*
> Also: withFileStatus() to take any FileStatus implementation, rather than
> only S3AFileStatus -and not check that the path matches the path being
> opened. Needed to support viewFS-style wrapping and mounting.
> and Adopt where appropriate to stop clusters with S3A reads switched to
> random IO from killing download/localization
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
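
Editor's sketch, not part of the original message: the review comment above contrasts a "whole-file" read policy (big buffers, start reading immediately) with `sequential` (lazy seek, smaller buffers because a split may end early). The self-contained Java below models that distinction in miniature. The key names are composed as in the diff; everything else is an assumption made for illustration: the `ReadPlan` class, the `plan()` method, the helper methods, the concrete buffer sizes, and the policy value strings (taken from the wording of the comment). It does not use or reproduce the Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model only; this is not Hadoop code. */
public class OpenFileOptionsSketch {

  // Option keys, composed the same way as in the diff above.
  static final String FILESYSTEM_OPTION = "fs.option.";
  static final String FS_OPTION_OPENFILE = FILESYSTEM_OPTION + "openfile.";
  static final String FS_OPTION_OPENFILE_LENGTH = FS_OPTION_OPENFILE + "length";
  static final String FS_OPTION_OPENFILE_SPLIT_START = FS_OPTION_OPENFILE + "split.start";
  static final String FS_OPTION_OPENFILE_SPLIT_END = FS_OPTION_OPENFILE + "split.end";
  static final String FS_OPTION_OPENFILE_READ_POLICY = FS_OPTION_OPENFILE + "read.policy";

  // Policy names as worded in the review comment (assumed to be the values).
  static final String POLICY_WHOLE_FILE = "whole-file";
  static final String POLICY_SEQUENTIAL = "sequential";

  // Hypothetical buffer sizes, chosen only to make the contrast visible.
  static final int BIG_BUFFER = 8 * 1024 * 1024;
  static final int SMALL_BUFFER = 64 * 1024;

  /** Hypothetical description of how a store might serve a read. */
  static final class ReadPlan {
    final int bufferSize;
    final boolean lazySeek;

    ReadPlan(int bufferSize, boolean lazySeek) {
      this.bufferSize = bufferSize;
      this.lazySeek = lazySeek;
    }

    @Override
    public String toString() {
      return "ReadPlan{bufferSize=" + bufferSize + ", lazySeek=" + lazySeek + "}";
    }
  }

  /** Pick a plan from the declared options; unknown policies fall back to the cautious plan. */
  static ReadPlan plan(Map<String, String> options) {
    String policy = options.getOrDefault(FS_OPTION_OPENFILE_READ_POLICY, "");
    if (POLICY_WHOLE_FILE.equals(policy)) {
      // Whole-file: big buffers, and start reading immediately (no lazy seek).
      return new ReadPlan(BIG_BUFFER, false);
    }
    // Sequential (or unrecognised): the read may stop early, so seek lazily with small buffers.
    return new ReadPlan(SMALL_BUFFER, true);
  }

  /** Options a copy tool might declare when it will read the entire file. */
  static Map<String, String> wholeFileOptions(long length) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put(FS_OPTION_OPENFILE_READ_POLICY, POLICY_WHOLE_FILE);
    options.put(FS_OPTION_OPENFILE_LENGTH, Long.toString(length));
    return options;
  }

  /** Options a split reader might declare for the byte range [start, end). */
  static Map<String, String> splitOptions(long start, long end) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put(FS_OPTION_OPENFILE_READ_POLICY, POLICY_SEQUENTIAL);
    options.put(FS_OPTION_OPENFILE_SPLIT_START, Long.toString(start));
    options.put(FS_OPTION_OPENFILE_SPLIT_END, Long.toString(end));
    return options;
  }

  public static void main(String[] args) {
    System.out.println(wholeFileOptions(1_000_000L) + " -> " + plan(wholeFileOptions(1_000_000L)));
    System.out.println(splitOptions(0L, 4096L) + " -> " + plan(splitOptions(0L, 4096L)));
  }
}
```

Keeping the options as plain string key/value pairs mirrors the intent stated in the comment: a tool can ask for a named policy, and only a filesystem that recognises the name needs to act on it.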