Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731

1. Updated the code to bypass the glob routine when there is no wildcard; this skips something fairly inefficient.
2. Reporting FNFE (FileNotFoundException) on that base dir differently: skip the stack trace (maybe log at a lower level?).
3. Updated the docs with a special list of blobstore best practices. It's a bit hard to get the phrasing of what the wildcard does right; needs careful review.

Tested using my s3 streaming test, which did use a * wildcard in the path. All works, but no improvement in speed on what is a fairly unrealistic structure. Recursively listing a remote object store is tangibly slow. Maybe that should go in the text too: "it can take seconds to scan object stores for new data, with the time being proportional to directory depth and the number of files in a directory. Shallow and wide directory trees are faster"
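For reviewers who want the idea in a few lines: the sketch below illustrates the first two points (skip the glob when the path has no wildcard, and report a missing base directory quietly). It is not the code in this pull request. It assumes a local filesystem and Python's standard `glob`, `os` and `logging` modules rather than the Hadoop FileSystem API, and the names `WILDCARD_CHARS`, `has_wildcard` and `list_candidate_files` are hypothetical.

```python
import glob
import logging
import os
from typing import List

log = logging.getLogger("filestream-sketch")

# Assumption: the wildcard characters understood by Python's glob module.
# Hadoop's glob syntax has more (for example "{a,b}"); this is only a sketch.
WILDCARD_CHARS = "*?["


def has_wildcard(path: str) -> bool:
    """Return True if the path contains any glob wildcard character."""
    return any(ch in path for ch in WILDCARD_CHARS)


def list_candidate_files(path: str) -> List[str]:
    """List the files directly under `path`, or under every directory it matches."""
    if has_wildcard(path):
        # Wildcard present: pay for pattern expansion.
        directories = [d for d in glob.glob(path) if os.path.isdir(d)]
    else:
        # No wildcard: bypass the glob and list the directory directly.
        directories = [path]

    files: List[str] = []
    for directory in directories:
        try:
            names = os.listdir(directory)
        except FileNotFoundError:
            # A missing base directory is reported in one line at a low
            # level, with no stack trace.
            log.debug("Base directory not found: %s", directory)
            continue
        files.extend(os.path.join(directory, name) for name in names)
    return sorted(files)
```

On a real object store each listing is a remote call, so the saving from skipping the glob depends on how the store implements pattern expansion; the sketch only shows the control flow.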