ManManson commented on code in PR #13796:
URL: https://github.com/apache/arrow/pull/13796#discussion_r939504438


##########
cpp/src/arrow/filesystem/filesystem.h:
##########
@@ -132,6 +133,9 @@ struct ARROW_EXPORT FileSelector {
   bool recursive;
   /// The maximum number of subdirectories to recurse into.
   int32_t max_recursion;
+  /// How many partitions should be processed in parallel. May not be 
supported by all
+  /// implementations of `GetFileSystemGenerator`.
+  util::optional<int32_t> partitions_readahead;

Review Comment:
   >We don't have any ABI guarantees. But when there is a well-known default 
value, it doesn't make sense to pass an optional, IMHO. 
   
   Maybe you are right, this can be supplied a reasonable default value, so no 
need to make it optional.
   
   > However, I also don't understand why this is an attribute of FileSelector. 
This sounds more like an implementation-specific know that should probably be 
in LocalFileSystemOptions.
   
   Well, I don't have a strong opinion on this one. Don't see anything wrong 
about it being a part of FileSelector (which is exactly designed to specify 
details of file selection algorithms). Though, since this option is only 
applicable for `LocalFileSystem`, maybe it makes sense to hide it behind 
`LocalFileSystemOptions` thing.
   
   > (I'm also curious why this needs to be exposed at all)
   
   To be able to fine-tune the behavior if the default doesn't work well in a 
particular case, or better performance can be achieved with another value. For 
example, various filesystems have varying capabilities regarding parallel IO, 
e.g. XFS, which is the only FS I know of, that is capable of truly async IO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to