[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85146/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85146/testReport)** for PR 20015 at commit [`238d7d4`](https://github.com/apache/spark/commit/238d7d470c583c910bccbca8bbcaa681b67d6025).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes`
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20015 LGTM
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20015 Thanks! Merged to master
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85141/ Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85141/testReport)** for PR 20015 at commit [`0d1a8cb`](https://github.com/apache/spark/commit/0d1a8cbc922bc410d8d4a69c26b16290773a197c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes`
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85146/testReport)** for PR 20015 at commit [`238d7d4`](https://github.com/apache/spark/commit/238d7d470c583c910bccbca8bbcaa681b67d6025).
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test FAILed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85145/ Test FAILed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85140/ Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85140/testReport)** for PR 20015 at commit [`80a1959`](https://github.com/apache/spark/commit/80a195989bd138fade5910cbf495b472a57ce445).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes`
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85141/testReport)** for PR 20015 at commit [`0d1a8cb`](https://github.com/apache/spark/commit/0d1a8cbc922bc410d8d4a69c26b16290773a197c).
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85140/testReport)** for PR 20015 at commit [`80a1959`](https://github.com/apache/spark/commit/80a195989bd138fade5910cbf495b472a57ce445).
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85132/ Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85132/testReport)** for PR 20015 at commit [`b12ba92`](https://github.com/apache/spark/commit/b12ba92add942c087dd45933464937479fc24bcd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes`
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85131/ Test FAILed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test FAILed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85131/testReport)** for PR 20015 at commit [`3547b7c`](https://github.com/apache/spark/commit/3547b7c0bf8caa972e018f6875e4b0f599c4e12f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes`
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85132/testReport)** for PR 20015 at commit [`b12ba92`](https://github.com/apache/spark/commit/b12ba92add942c087dd45933464937479fc24bcd).
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85131/testReport)** for PR 20015 at commit [`3547b7c`](https://github.com/apache/spark/commit/3547b7c0bf8caa972e018f6875e4b0f599c4e12f).
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20015
- The API proposed by this PR is consistent with other DBs.
- The implementation does not introduce behavior changes.

The implementation is clean and the PR quality is pretty good.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015 SPARK-17174 originally described a few functions related to hour, min, etc., but I received advice to fix up other related functions too, even though they could also be done separately. I agreed at that time and tried to propose it that way. What I am saying is that I think this PR actually targets adding another (better) version of `trunc` that supports day, hour, min, etc. in the format. In that case, I think we should deduplicate the logic with the related functions too. So I am less sure why this should be done alone, leaving out other related changes and other functions we (I) usually reject. I think you and @cloud-fan are saying the reasons are that it is common and that this PR targets a separate functionality consistent with other DBMSs.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user youngbink commented on the issue: https://github.com/apache/spark/pull/20015 Hmm... even if we decide to change this later, I honestly think merging `trunc` and `date_trunc` would be simple work that only touches a couple of files, mostly `datetimeExpressions.scala`. This PR isn't too small, as you said, but most of the code here can be reused without modification when we merge `date_trunc`.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015 I get that `date_trunc` is common in other DBMSs. I can see that this can be done now and that we can still proceed with `trunc`, etc. later. So I am fine with it, but still not fully sure.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015 If we did not already have a similar function, I would have given a +1, but what I am less sure about is that `date_trunc` honestly sounds like a better version of `trunc`. Both also seem to extend the same parent here, `TruncTime`. I feel like we are adding this better version alone as a workaround, because updating the other related functions consistently would take a relatively larger change.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20015 We had an offline discussion and want to send this out to get more feedback. Generally, just adding `date_trunc` is pretty straightforward and makes Spark consistent with other databases for this function, while extending `trunc` to support the timestamp type is a better API design.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20015 Yeah, keep any substantive discussion on the public lists. Sometimes a side conversation happens; summarize the points here. We've rejected a lot of other functions that other DBs, but not Hive, support. Spark mostly follows Hive, and for everything else, there are UDFs. I'm not against this so much as unclear on why it's exceptional.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015 OK. I am fine with it if you all feel strongly about this.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85083/ Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85083/testReport)** for PR 20015 at commit [`f94f401`](https://github.com/apache/spark/commit/f94f401bcfd765b21c3fb466041b42a605d6a814).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015
> after having a discussion with @cloud-fan, @gatorsmile, @rednaxelafx and Reynold

Where did the discussion happen? Was this an offline discussion? I also want to actively join the discussion. Many implementations of `trunc` work differently, and I think we should decide on the behaviour after sufficient discussion. If we don't fix the issues around #14788 within the 2.3.0 timeline, it could be even more difficult because we would need to keep the previous behaviour.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user youngbink commented on the issue: https://github.com/apache/spark/pull/20015 @HyukjinKwon Just took a look at PR #14788. My point in mentioning those databases was just to give examples of a function that Spark doesn't support but other databases commonly do. (They all have this `date_trunc`, which takes a `timestamp` and outputs a `timestamp`.) As you said, we could extend `trunc` and simply create an alias `date_trunc`, but it's actually not that simple. For example, PR #14788 won't be able to handle the following commands correctly on PySpark:
```
from pyspark.sql import functions

# Assumes an active SparkSession bound to `spark`.
df = spark.createDataFrame([('1997-02-28 05:02:11',)], ['d'])
df.select(functions.trunc(df.d, 'year').alias('year')).collect()
df.select(functions.trunc(df.d, 'SS').alias('SS')).collect()
```
This is because `trunc(string, string)` isn't correctly handled. We could find a way around this and get it working, but after having a discussion with @cloud-fan, @gatorsmile, @rednaxelafx and Reynold, we decided to add `date_trunc` to be compatible with Postgres for now.
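For readers following along, the timestamp-in, timestamp-out contract of `date_trunc` mentioned above can be sketched in plain Python. This is only an illustration of the semantics, not Spark's implementation: the helper name `date_trunc_sketch` and the set of supported units are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical helper, NOT Spark's implementation: it only mimics the
# "timestamp in, timestamp out" contract of a Postgres-style date_trunc.
_FIELDS = ["year", "month", "day", "hour", "minute", "second"]
# Lowest legal value for each field: month and day start at 1, the rest at 0.
_FLOOR = [1, 1, 1, 0, 0, 0]


def date_trunc_sketch(unit: str, ts: datetime) -> datetime:
    """Zero out every field finer than `unit` and return a datetime."""
    unit = unit.lower()
    if unit not in _FIELDS:
        # Raising here is this sketch's own choice for unknown units.
        raise ValueError(f"unsupported unit: {unit}")
    keep = _FIELDS.index(unit) + 1
    parts = [ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second]
    return datetime(*(parts[:keep] + _FLOOR[keep:]))


ts = datetime(1997, 2, 28, 5, 2, 11)
print(date_trunc_sketch("year", ts))  # 1997-01-01 00:00:00
print(date_trunc_sketch("hour", ts))  # 1997-02-28 05:00:00
```

The point of the sketch is the return type: whatever the unit, the result is still a full timestamp rather than a date.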
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20015 @cloud-fan and @youngbink, how about reviving https://github.com/apache/spark/pull/14788 with a configuration to control this? AWS Redshift seems to have `TRUNC`, which just converts a timestamp to a date, whereas Spark's `trunc` supports date formats. These are not quite equivalent. I think Spark's `trunc` is more like Redshift's `DATE_TRUNC`. PostgreSQL does not have a `trunc` for dates but has `date_trunc`, where we can specify the format and which always returns a timestamp. Presto also does not seem to have duplicated functionality. I think we can simply introduce an alias for `trunc` after resolving https://github.com/apache/spark/pull/14788. Did I maybe miss something?
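The three behaviours compared in this comment differ mainly in what they accept and what they return. A rough plain-Python sketch of that contrast follows; all three function names are hypothetical, the format strings handled are a small illustrative subset, and none of this is the actual Redshift, Spark, or PostgreSQL code.

```python
from datetime import date, datetime


def trunc_to_date_sketch(ts: datetime) -> date:
    # Redshift-style TRUNC as described above: timestamp in, date out.
    return ts.date()


def trunc_sketch(d: date, fmt: str) -> date:
    # Spark-style trunc as described above: date plus format in, date out.
    fmt = fmt.lower()
    if fmt in ("year", "yyyy", "yy"):
        return d.replace(month=1, day=1)
    if fmt in ("month", "mon", "mm"):
        return d.replace(day=1)
    # Raising is this sketch's own choice for formats it does not handle.
    raise ValueError(f"unsupported format: {fmt}")


def date_trunc_year_sketch(ts: datetime) -> datetime:
    # PostgreSQL-style date_trunc('year', ts): timestamp in, timestamp out.
    return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)


ts = datetime(1997, 2, 28, 5, 2, 11)
print(trunc_to_date_sketch(ts))         # 1997-02-28
print(trunc_sketch(ts.date(), "year"))  # 1997-01-01
print(date_trunc_year_sketch(ts))       # 1997-01-01 00:00:00
```

Seen this way, an alias alone would not cover the difference in return type, which is the crux of the disagreement in this thread.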