Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22396#discussion_r217802972

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
       - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
       - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
       - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string.
    +  - Since Spark 2.4 load command from local filesystem supports wildcards in the folder level paths(e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/).Also in Older versions space in folder/file names has been represented using '%20'(e.g. LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv), this usage will not be supported from spark 2.4 version. Since Spark 2.4, Spark supports normal space character in folder/file names (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/file Name.csv') and wildcard character '?' can be used. (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/fileName?.csv')
    --- End diff --

@cloud-fan We follow the same syntax as older versions for the LOAD command path. The one exception is that in older versions users could not use wildcard characters at the folder level of a local file system path. The new implementation supports that, and HDFS paths accept the same syntax, so the behavior is now consistent.
Everything I described applies to both local and HDFS file systems, so the usage is now more consistent than in older versions. For more details, please refer to the PR below, and let me know if anything needs clarification. Thanks.

https://github.com/apache/spark/pull/20611
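
To make the wildcard behavior described above concrete, here is a minimal Python sketch of the matching semantics only. It is not Spark's implementation: `matches_load_path` is a hypothetical helper, the file layout is invented for illustration, and the standard-library `fnmatch` module stands in for whatever glob matching the LOAD command actually performs. It assumes `*` matches any run of characters within one path component, `?` matches exactly one character, and a space or `%20` in the pattern is compared literally.

```python
from fnmatch import fnmatchcase


def matches_load_path(pattern: str, path: str) -> bool:
    """Match a path against a pattern one '/'-separated component at a time.

    Illustration only (hypothetical helper, not Spark code): '*' matches any
    run of characters and '?' matches exactly one character, and neither
    crosses a '/' boundary here.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


# Hypothetical file layout, invented for this example.
files = [
    "tmp/folder1/data.csv",
    "tmp/folder2/data.csv",
    "tmp/other/data.csv",
    "tmp/folderName/fileName1.csv",
    "tmp/folderName/fileName10.csv",
    "tmp/folderName/file Name.csv",
]

# Wildcard at the folder level.
folder_level = [f for f in files if matches_load_path("tmp/folder*/data.csv", f)]
# '?' stands for exactly one character.
single_char = [f for f in files if matches_load_path("tmp/folderName/fileName?.csv", f)]
# A plain space in the pattern matches a plain space in the file name.
literal_space = [f for f in files if matches_load_path("tmp/folderName/file Name.csv", f)]
# '%20' is compared literally in this sketch, so it does not match a space.
percent_encoded = [f for f in files if matches_load_path("tmp/folderName/file%20Name.csv", f)]
```

In this sketch the `%20` pattern finds nothing because no decoding step is applied, which mirrors the note in the diff that the `%20` form is no longer supported.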