[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

sujith71955 Fri, 14 Sep 2018 03:45:44 -0700

Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22396#discussion_r217673140
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
       - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
       - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
       - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string.
    +  - Since Spark 2.4 load command from local filesystem supports wildcards in the folder level paths(e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/). Now onwards normal space convention can be used in folder/file names (e.g. LOAD DATA INPATH 'tmp/folderName/file Name.csv), Older versions space in folder/file names has been represented using '%20'(e.g. LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv), this usage will not be supported from spark 2.4 version.
    --- End diff --
    
    Is it specific to the local file system? << Yes, it is specific to the local file system. On HDFS the user can already provide wildcard characters at the folder level; for the local file system, folder-level wildcards were not supported and an error was thrown. >>
    Can this text add a quick example of using ? too? << Yes, I added the same. >>
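
To make the two points in the note concrete, here is a small Python sketch of what the patterns are expected to select and how the old `%20` form relates to the new plain-space form. It is only an illustration of glob semantics using the standard library, not Spark's or Hadoop's implementation; the helper `matches` and the directory names in `candidates` are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import unquote


def matches(pattern: str, path: str) -> bool:
    """Match a glob pattern against a path one segment at a time,
    so that a wildcard never crosses a '/' boundary."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(s, p) for p, s in zip(pattern_parts, path_parts))


# Hypothetical folder names, for illustration only.
candidates = ["tmp/folder1", "tmp/folder22", "tmp/other"]

# '*' matches any run of characters within one folder name.
star = [c for c in candidates if matches("tmp/folder*/", c)]

# '?' matches exactly one character within one folder name.
question = [c for c in candidates if matches("tmp/folder?/", c)]

# The pre-2.4 style wrote a space as '%20'; decoding it gives the
# plain-space path that the note says should be used from 2.4 onwards.
encoded = "tmp/folderName/myFile%20Name.csv"
plain = unquote(encoded)

print(star)
print(question)
print(plain)
```

Under these assumptions, `folder*` selects both `folder1` and `folder22`, while `folder?` selects only `folder1`.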



