Reading files directly from Amazon S3 can be frustrating, especially if you're dealing with a large number of input files. Could you please elaborate on your use case? Does the S3 bucket in question already contain a large number of files?
The * wildcard in S3 input paths is implemented with an S3 API call that lists everything under the common prefix. So if your input is something like:

s3://my-bucket/<year>/<month>/<date>/*.json

then the prefix "<year>/<month>/<date>/" is passed to the API, and the listing should be fairly efficient. However, if you're doing something more adventurous like:

s3://my-bucket/*/*/*/*.json

there is no common prefix to give the API. It will literally list every object in the bucket and then filter client-side to find anything that matches "*.json". These types of requests are prone to timeouts and other intermittent issues, and they can take a ridiculous amount of time before the job can even start.
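
To make the difference concrete, here is a rough Python sketch of how a listing prefix could be derived from a wildcard path. It is only an illustration, not the actual implementation of whatever framework you are using: the function name listing_prefix, the set of characters treated as wildcards, and the choice to cut the prefix at the last "/" before the first wildcard are all my own assumptions, and the 2024/01/15 values are just example dates.

```python
import re

# Characters treated as the start of a wildcard expression.
# Assumption: the usual glob metacharacters; the real set depends on your framework.
GLOB_CHARS = re.compile(r"[*?\[{]")


def listing_prefix(s3_path):
    """Return (bucket, prefix) for a wildcard S3 path.

    The prefix is the literal part of the key before the first wildcard,
    trimmed back to the last "/" so it ends on a "directory" boundary.
    Hypothetical helper for illustration only.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError("expected a path starting with s3://")

    # Split "bucket/key..." into the bucket name and the object key pattern.
    bucket, _, key = s3_path[len("s3://"):].partition("/")

    match = GLOB_CHARS.search(key)
    if match is None:
        # No wildcard at all: the whole key can be used as the prefix.
        return bucket, key

    literal = key[:match.start()]
    # rfind returns -1 when there is no "/", which yields an empty prefix.
    return bucket, literal[: literal.rfind("/") + 1]


if __name__ == "__main__":
    for path in (
        "s3://my-bucket/2024/01/15/*.json",
        "s3://my-bucket/*/*/*/*.json",
    ):
        bucket, prefix = listing_prefix(path)
        print(f"{path} -> bucket={bucket!r} prefix={prefix!r}")
```

With the first path the listing is limited to keys under "2024/01/15/". With the second the prefix is empty, so a listing call has nothing to narrow it down and has to page through the whole bucket, which is exactly the slow case described above.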