Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22396#discussion_r217673140

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
       - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
       - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
       - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string.
    +  - Since Spark 2.4 load command from local filesystem supports wildcards in the folder level paths(e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/). Now onwards normal space convention can be used in folder/file names (e.g. LOAD DATA INPATH 'tmp/folderName/file Name.csv), Older versions space in folder/file names has been represented using '%20'(e.g. LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv), this usage will not be supported from spark 2.4 version.
    --- End diff --

    Is it specific to the local file system?
    << Yes, it is specific to the local file system. On HDFS, users can already provide wildcard characters at the folder level; for the local file system, folder-level wildcards were not supported and an error was thrown. >>

    Can this text add a quick example of using ? too?
    << Yes, I added the same. >>
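To make the wildcard and space conventions in the proposed note concrete, here is a minimal sketch in plain Python. It is not Spark's implementation and does not call Spark at all; it only uses the standard `glob` module to show how `*` and `?` behave at the folder level on a local file system, and `urllib.parse.unquote` to show what the older `%20` spelling stands for. The function name `match_folders` and the folder names `folder1`, `folder2`, and `folderAB` are hypothetical, chosen for illustration.

```python
import glob
import os
import tempfile
from urllib.parse import unquote


def match_folders():
    """Create a throwaway local tree and glob it with folder-level wildcards.

    Illustration only: this mimics shell-style matching with Python's glob,
    it is not how Spark resolves LOAD DATA paths internally.
    """
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: three folders, each holding a file whose
        # name contains a literal space.
        for folder in ("folder1", "folder2", "folderAB"):
            os.makedirs(os.path.join(root, folder))
            with open(os.path.join(root, folder, "file Name.csv"), "w") as f:
                f.write("a,1\n")

        # Escape the temp root so only our own wildcards are interpreted.
        base = glob.escape(root)

        # '*' matches any run of characters in the folder name.
        star = glob.glob(os.path.join(base, "folder*", "*.csv"))
        # '?' matches exactly one character in the folder name.
        question = glob.glob(os.path.join(base, "folder?", "*.csv"))

        def folder_of(path):
            return os.path.basename(os.path.dirname(path))

        return sorted(map(folder_of, star)), sorted(map(folder_of, question))


if __name__ == "__main__":
    star, question = match_folders()
    print("folder* matched:", star)
    print("folder? matched:", question)
    # The older convention wrote a space as '%20'; decoding shows the
    # literal name that the new convention lets users type directly.
    print(unquote("myFile%20Name.csv"))
```

In this sketch `folder*` picks up all three folders while `folder?` picks up only the two with a single trailing character, which is the distinction a `?` example in the migration note would need to convey.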
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org