nchammas commented on a change in pull request #26958: [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC
URL: https://github.com/apache/spark/pull/26958#discussion_r360436906
##########
File path: python/pyspark/sql/readwriter.py
##########
@@ -520,20 +537,24 @@ def func(iterator):
             raise TypeError("path can be only string, list or RDD")

     @since(1.5)
-    def orc(self, path, mergeSchema=None, recursiveFileLookup=None):
+    def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=None):
         """Loads ORC files, returning the result as a :class:`DataFrame`.

         :param mergeSchema: sets whether we should merge schemas collected from all
             ORC part-files. This will override ``spark.sql.orc.mergeSchema``.
             The default value is specified in ``spark.sql.orc.mergeSchema``.
+        :param pathGlobFilter: an optional glob pattern to only include files with paths matching
+                               the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
+                               It does not change the behavior of `partition discovery`_.
         :param recursiveFileLookup: recursively scan a directory for files. Using this option
-            disables `partition discovery`_.
+                                    disables `partition discovery`_.

Review comment:
Would you be open to a batch update of all docstrings to tweak the indentation? Or would it be considered as adding too much noise to the git history? I assumed we didn't want to make sweeping changes like that, so my idea was to update the indentation bit by bit as part of more significant changes, which is what I did with `recursiveFileLookup` and `mergeSchema` when I added those in.