GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19389#discussion_r152307222
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1577,6 +1577,145 @@ options.
     
       - Since Spark 2.3, the queries from raw JSON/CSV files are disallowed 
when the referenced columns only include the internal corrupt record column 
(named `_corrupt_record` by default). For example, 
`spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()`
 and `spark.read.schema(schema).json(file).select("_corrupt_record").show()`. 
Instead, you can cache or save the parsed results and then run the same query. 
For example, `val df = spark.read.schema(schema).json(file).cache()` and then 
`df.filter($"_corrupt_record".isNotNull).count()`.
      - The `percentile_approx` function previously accepted numeric type 
input and produced results of double type. Now it supports date type, timestamp type 
and numeric types as input types. The result type is also changed to be the 
same as the input type, which is more reasonable for percentiles.
    +  - Partition column inference previously found an incorrect common type for 
different inferred types; for example, it ended up with double type 
as the common type for double type and date type. Now it finds the correct 
common type for such conflicts. The conflict resolution follows the table below:
    --- End diff --
    
    The doc renders as below:
    <img width="1144" alt="2017-11-22 12 19 44" 
src="https://user-images.githubusercontent.com/6477701/33080370-45ae19ba-cf1b-11e7-9876-0f794974dff4.png";>


