Github user youngbink commented on the issue: https://github.com/apache/spark/pull/20015

@HyukjinKwon I just took a look at PR #14788. My point in mentioning those databases was to give examples of a function that Spark doesn't support but other databases commonly do. (They all have a `date_trunc` that takes a `timestamp` and outputs a `timestamp`.) As you said, we could extend `trunc` and simply create an alias `date_trunc`, but it's actually not that simple. For example, PR #14788 would not handle the following commands correctly in PySpark:

```
df = spark.createDataFrame([('1997-02-28 05:02:11',)], ['d'])
df.select(functions.trunc(df.d, 'year').alias('year')).collect()
df.select(functions.trunc(df.d, 'SS').alias('SS')).collect()
```

This is because `trunc(string, string)` isn't handled correctly. We could find a way around this and get it working, but after a discussion with @cloud-fan, @gatorsmile, @rednaxelafx and Reynold, we decided to add `date_trunc` to be compatible with Postgres for now.
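To illustrate the timestamp-in/timestamp-out semantics being discussed, here is a minimal pure-Python sketch of what a `date_trunc`-style function does for a few field names. This is only an illustration of the semantics, not the Spark implementation, and the `date_trunc` helper below is a hypothetical stand-in:

```python
from datetime import datetime

def date_trunc(fmt, ts):
    # Hypothetical sketch: truncate a timestamp to the given field,
    # returning a timestamp (mirroring the Postgres-style contract).
    if fmt == 'year':
        return ts.replace(month=1, day=1, hour=0, minute=0,
                          second=0, microsecond=0)
    if fmt == 'month':
        return ts.replace(day=1, hour=0, minute=0,
                          second=0, microsecond=0)
    if fmt == 'second':
        return ts.replace(microsecond=0)
    raise ValueError("unsupported field: " + fmt)

print(date_trunc('year', datetime(1997, 2, 28, 5, 2, 11)))
# 1997-01-01 00:00:00
```

The key point of the contract shown here is that the result stays a timestamp, whereas Spark's pre-existing `trunc` returns a `date`, which is why a separate function was preferred over an alias.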