Repository: spark
Updated Branches:
  refs/heads/master c17a8ff52 -> ad43e2c1e


[SPARK-23792][DOCS] Documentation improvements for datetime functions

## What changes were proposed in this pull request?

Improved the documentation for the datetime functions in 
`org.apache.spark.sql.functions` by adding details about the supported column 
input types, the column return type, behaviour on invalid input, supporting 
examples and clarifications.
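
As a rough illustration of the documented contract (a date comes back, or null when a string cannot be cast to a date), the plain-Scala sketch below models `add_months` and `date_add` with `java.time`. It does not call Spark. `DateAddSketch`, `toDate`, `addMonths` and `dateAdd` are hypothetical names, `None` stands in for SQL null, and the parsing is a deliberate simplification of Spark's cast rules.

```scala
import java.time.LocalDate
import scala.util.Try

object DateAddSketch {
  // Hypothetical helper: accepts `yyyy-MM-dd`, optionally followed by a time part,
  // and yields None where the Spark function is documented to yield null.
  def toDate(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s.trim.take(10))).toOption

  // Models add_months: numMonths may be negative to subtract months.
  def addMonths(startDate: String, numMonths: Int): Option[LocalDate] =
    toDate(startDate).map(_.plusMonths(numMonths.toLong))

  // Models date_add: days may be negative to subtract days.
  def dateAdd(start: String, days: Int): Option[LocalDate] =
    toDate(start).map(_.plusDays(days.toLong))
}
```

Under these assumptions, `DateAddSketch.addMonths("2018-03-15", -3)` gives `Some(2017-12-15)` and `DateAddSketch.dateAdd("not a date", 1)` gives `None`.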

## How was this patch tested?

Manually tested each of the datetime functions with different inputs to ensure 
that the corresponding Javadoc/Scaladoc matches the behaviour of the function. 
Successfully ran the `unidoc` SBT process.
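
Several of the documented examples can also be cross-checked without a Spark session. The sketch below recomputes them with `java.time` only. It is not the patch's test procedure, and `DocExampleChecks` and its method names are hypothetical.

```scala
import java.time.{DayOfWeek, LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import java.time.temporal.{ChronoUnit, IsoFields, TemporalAdjusters}

object DocExampleChecks {
  private val tsFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // datediff: only the date part of each input is considered.
  def dateDiff(end: String, start: String): Int = {
    val e = LocalDateTime.parse(end, tsFormat).toLocalDate
    val s = LocalDateTime.parse(start, tsFormat).toLocalDate
    ChronoUnit.DAYS.between(s, e).toInt
  }

  // dayofweek: 1 for a Sunday through to 7 for a Saturday.
  // java.time numbers Monday as 1 and Sunday as 7, hence the shift.
  def dayOfWeek(d: LocalDate): Int = d.getDayOfWeek.getValue % 7 + 1

  // weekofyear: ISO 8601 week number (weeks start on Monday).
  def weekOfYear(d: LocalDate): Int = d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)

  // last_day: last day of the month the date belongs to.
  def lastDay(d: LocalDate): LocalDate = d.`with`(TemporalAdjusters.lastDayOfMonth())

  // next_day: first date strictly after d that falls on the given day of the week.
  def nextDay(d: LocalDate, dow: DayOfWeek): LocalDate = d.`with`(TemporalAdjusters.next(dow))
}
```

For instance, `DocExampleChecks.dateDiff("2018-01-10 00:00:00", "2018-01-09 23:59:59")` evaluates to 1, and `DocExampleChecks.nextDay(LocalDate.of(2015, 7, 27), DayOfWeek.SUNDAY)` evaluates to 2015-08-02, matching the Scaladoc examples.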

Closes #20901 from abradbury/SPARK-23792.

Authored-by: Adam Bradbury <abradb...@users.noreply.github.com>
Signed-off-by: Sean Owen <sean.o...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad43e2c1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad43e2c1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad43e2c1

Branch: refs/heads/master
Commit: ad43e2c1e8c2142a66b135766ff0d7712ce965db
Parents: c17a8ff
Author: Adam Bradbury <abradb...@users.noreply.github.com>
Authored: Sun Aug 26 08:37:52 2018 -0500
Committer: Sean Owen <sean.o...@databricks.com>
Committed: Sun Aug 26 08:37:52 2018 -0500

----------------------------------------------------------------------
 .../scala/org/apache/spark/sql/functions.scala  | 189 +++++++++++++++----
 1 file changed, 154 insertions(+), 35 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ad43e2c1/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index c933188..1d806e0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -2626,8 +2626,12 @@ object functions {
   
//////////////////////////////////////////////////////////////////////////////////////////////
 
   /**
-   * Returns the date that is numMonths after startDate.
+   * Returns the date that is `numMonths` after `startDate`.
    *
+   * @param startDate A date, timestamp or string. If a string, the data must 
be in a format that
+   *                  can be cast to a date, such as `yyyy-MM-dd` or 
`yyyy-MM-dd HH:mm:ss.SSSS`
+   * @param numMonths The number of months to add to `startDate`, can be 
negative to subtract months
+   * @return A date, or null if `startDate` was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2655,12 +2659,15 @@ object functions {
    * Converts a date/timestamp/string to a value of string in the format 
specified by the date
    * format given by the second argument.
    *
-   * A pattern `dd.MM.yyyy` would return a string like `18.03.1993`.
-   * All pattern letters of `java.text.SimpleDateFormat` can be used.
+   * See [[java.text.SimpleDateFormat]] for valid date and time format patterns
    *
+   * @param dateExpr A date, timestamp or string. If a string, the data must 
be in a format that
+   *                 can be cast to a timestamp, such as `yyyy-MM-dd` or 
`yyyy-MM-dd HH:mm:ss.SSSS`
+   * @param format A pattern `dd.MM.yyyy` would return a string like 
`18.03.1993`
+   * @return A string, or null if `dateExpr` was a string that could not be 
cast to a timestamp
    * @note Use specialized functions like [[year]] whenever possible as they 
benefit from a
    * specialized implementation.
-   *
+   * @throws IllegalArgumentException if the `format` pattern is invalid
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2670,6 +2677,11 @@ object functions {
 
   /**
    * Returns the date that is `days` days after `start`
+   *
+   * @param start A date, timestamp or string. If a string, the data must be 
in a format that
+   *              can be cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param days  The number of days to add to `start`, can be negative to 
subtract days
+   * @return A date, or null if `start` was a string that could not be cast to 
a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2677,6 +2689,11 @@ object functions {
 
   /**
    * Returns the date that is `days` days before `start`
+   *
+   * @param start A date, timestamp or string. If a string, the data must be 
in a format that
+   *              can be cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param days  The number of days to subtract from `start`, can be negative 
to add days
+   * @return A date, or null if `start` was a string that could not be cast to 
a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2684,6 +2701,19 @@ object functions {
 
   /**
    * Returns the number of days from `start` to `end`.
+   *
+   * Only considers the date part of the input. For example:
+   * {{{
+   * datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
+   * // returns 1
+   * }}}
+   *
+   * @param end A date, timestamp or string. If a string, the data must be in 
a format that
+   *            can be cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param start A date, timestamp or string. If a string, the data must be 
in a format that
+   *              can be cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @return An integer, or null if either `end` or `start` were strings that 
could not be cast to
+   *         a date. Negative if `end` is before `start`
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2691,6 +2721,7 @@ object functions {
 
   /**
    * Extracts the year as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2698,6 +2729,7 @@ object functions {
 
   /**
    * Extracts the quarter as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2705,6 +2737,7 @@ object functions {
 
   /**
    * Extracts the month as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2712,6 +2745,8 @@ object functions {
 
   /**
    * Extracts the day of the week as an integer from a given 
date/timestamp/string.
+   * Ranges from 1 for a Sunday through to 7 for a Saturday
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 2.3.0
    */
@@ -2719,6 +2754,7 @@ object functions {
 
   /**
    * Extracts the day of the month as an integer from a given 
date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2726,6 +2762,7 @@ object functions {
 
   /**
    * Extracts the day of the year as an integer from a given 
date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2733,16 +2770,20 @@ object functions {
 
   /**
    * Extracts the hours as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
   def hour(e: Column): Column = withExpr { Hour(e.expr) }
 
   /**
-   * Given a date column, returns the last day of the month which the given 
date belongs to.
+   * Returns the last day of the month which the given date belongs to.
    * For example, input "2015-07-27" returns "2015-07-31" since July 31 is the 
last day of the
    * month in July 2015.
    *
+   * @param e A date, timestamp or string. If a string, the data must be in a 
format that can be
+   *          cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @return A date, or null if the input was a string that could not be cast 
to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2750,46 +2791,60 @@ object functions {
 
   /**
    * Extracts the minutes as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
   def minute(e: Column): Column = withExpr { Minute(e.expr) }
 
   /**
-   * Returns number of months between dates `date1` and `date2`.
-   * If `date1` is later than `date2`, then the result is positive.
-   * If `date1` and `date2` are on the same day of month, or both are the last 
day of month,
-   * time of day will be ignored.
+   * Returns number of months between dates `start` and `end`.
+   *
+   * A whole number is returned if both inputs have the same day of month or 
both are the last day
+   * of their respective months. Otherwise, the difference is calculated 
assuming 31 days per month.
    *
-   * Otherwise, the difference is calculated based on 31 days per month, and 
rounded to
-   * 8 digits.
+   * For example:
+   * {{{
+   * months_between("2017-11-14", "2017-07-14")  // returns 4.0
+   * months_between("2017-01-01", "2017-01-10")  // returns -0.29032258
+   * months_between("2017-06-01", "2017-06-16 12:00:00")  // returns -0.5
+   * }}}
+   *
+   * @param end   A date, timestamp or string. If a string, the data must be 
in a format that can
+   *              be cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param start A date, timestamp or string. If a string, the data must be in a format that can
+   *              be cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
+   * @return A double, or null if either `end` or `start` were strings that 
could not be cast to a
+   *         timestamp. Negative if `end` is before `start`
    * @group datetime_funcs
    * @since 1.5.0
    */
-  def months_between(date1: Column, date2: Column): Column = withExpr {
-    new MonthsBetween(date1.expr, date2.expr)
+  def months_between(end: Column, start: Column): Column = withExpr {
+    new MonthsBetween(end.expr, start.expr)
   }
 
   /**
-   * Returns number of months between dates `date1` and `date2`. If `roundOff` 
is set to true, the
+   * Returns number of months between dates `end` and `start`. If `roundOff` 
is set to true, the
    * result is rounded off to 8 digits; it is not rounded otherwise.
    * @group datetime_funcs
    * @since 2.4.0
    */
-  def months_between(date1: Column, date2: Column, roundOff: Boolean): Column = withExpr {
-    MonthsBetween(date1.expr, date2.expr, lit(roundOff).expr)
+  def months_between(end: Column, start: Column, roundOff: Boolean): Column = withExpr {
+    MonthsBetween(end.expr, start.expr, lit(roundOff).expr)
   }
 
   /**
-   * Given a date column, returns the first date which is later than the value 
of the date column
-   * that is on the specified day of the week.
+   * Returns the first date which is later than the value of the `date` column 
that is on the
+   * specified day of the week.
    *
    * For example, `next_day('2015-07-27', "Sunday")` returns 2015-08-02 
because that is the first
    * Sunday after 2015-07-27.
    *
-   * Day of the week parameter is case insensitive, and accepts:
-   * "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
-   *
+   * @param date      A date, timestamp or string. If a string, the data must 
be in a format that
+   *                  can be cast to a date, such as `yyyy-MM-dd` or 
`yyyy-MM-dd HH:mm:ss.SSSS`
+   * @param dayOfWeek Case insensitive, and accepts: "Mon", "Tue", "Wed", 
"Thu", "Fri", "Sat", "Sun"
+   * @return A date, or null if `date` was a string that could not be cast to 
a date or if
+   *         `dayOfWeek` was an invalid value
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2799,6 +2854,7 @@ object functions {
 
   /**
    * Extracts the seconds as an integer from a given date/timestamp/string.
+   * @return An integer, or null if the input was a string that could not be 
cast to a timestamp
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2806,6 +2862,11 @@ object functions {
 
   /**
    * Extracts the week number as an integer from a given date/timestamp/string.
+   *
+   * A week is considered to start on a Monday and week 1 is the first week 
with more than 3 days,
+   * as defined by ISO 8601
+   *
+   * @return An integer, or null if the input was a string that could not be 
cast to a date
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2813,8 +2874,12 @@ object functions {
 
   /**
    * Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) 
to a string
-   * representing the timestamp of that moment in the current system time zone 
in the given
-   * format.
+   * representing the timestamp of that moment in the current system time zone 
in the
+   * yyyy-MM-dd HH:mm:ss format.
+   *
+   * @param ut A number of a type that is castable to a long, such as string 
or integer. Can be
+   *           negative for timestamps before the unix epoch
+   * @return A string, or null if the input was a string that could not be 
cast to a long
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2826,6 +2891,14 @@ object functions {
    * Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) 
to a string
    * representing the timestamp of that moment in the current system time zone 
in the given
    * format.
+   *
+   * See [[java.text.SimpleDateFormat]] for valid date and time format patterns
+   *
+   * @param ut A number of a type that is castable to a long, such as string 
or integer. Can be
+   *           negative for timestamps before the unix epoch
+   * @param f  A date time pattern that the input will be formatted to
+   * @return A string, or null if `ut` was a string that could not be cast to 
a long or `f` was
+   *         an invalid date time pattern
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2834,7 +2907,7 @@ object functions {
   }
 
   /**
-   * Returns the current Unix timestamp (in seconds).
+   * Returns the current Unix timestamp (in seconds) as a long.
    *
    * @note All calls of `unix_timestamp` within the same query return the same 
value
    * (i.e. the current timestamp is calculated at the start of query 
evaluation).
@@ -2849,8 +2922,10 @@ object functions {
   /**
    * Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in 
seconds),
    * using the default timezone and the default locale.
-   * Returns `null` if fails.
    *
+   * @param s A date, timestamp or string. If a string, the data must be in the
+   *          `yyyy-MM-dd HH:mm:ss` format
+   * @return A long, or null if the input was a string not of the correct 
format
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2860,17 +2935,25 @@ object functions {
 
   /**
    * Converts time string with given pattern to Unix timestamp (in seconds).
-   * Returns `null` if fails.
    *
-   * @see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html">
-   * Customizing Formats</a>
+   * See [[java.text.SimpleDateFormat]] for valid date and time format patterns
+   *
+   * @param s A date, timestamp or string. If a string, the data must be in a 
format that can be
+   *          cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param p A date time pattern detailing the format of `s` when `s` is a 
string
+   * @return A long, or null if `s` was a string that could not be cast to a 
date or `p` was
+   *         an invalid format
    * @group datetime_funcs
    * @since 1.5.0
    */
   def unix_timestamp(s: Column, p: String): Column = withExpr { UnixTimestamp(s.expr, Literal(p)) }
 
   /**
-   * Convert time string to a Unix timestamp (in seconds) by casting rules to 
`TimestampType`.
+   * Converts to a timestamp by casting rules to `TimestampType`.
+   *
+   * @param s A date, timestamp or string. If a string, the data must be in a 
format that can be
+   *          cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @return A timestamp, or null if the input was a string that could not be 
cast to a timestamp
    * @group datetime_funcs
    * @since 2.2.0
    */
@@ -2879,9 +2962,15 @@ object functions {
   }
 
   /**
-   * Convert time string to a Unix timestamp (in seconds) with a specified 
format
-   * (see 
[http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
-   * to Unix timestamp (in seconds), return null if fail.
+   * Converts time string with the given pattern to timestamp.
+   *
+   * See [[java.text.SimpleDateFormat]] for valid date and time format patterns
+   *
+   * @param s   A date, timestamp or string. If a string, the data must be in 
a format that can be
+   *            cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param fmt A date time pattern detailing the format of `s` when `s` is a 
string
+   * @return A timestamp, or null if `s` was a string that could not be cast 
to a timestamp or
+   *         `fmt` was an invalid format
    * @group datetime_funcs
    * @since 2.2.0
    */
@@ -2899,9 +2988,14 @@ object functions {
 
   /**
    * Converts the column into a `DateType` with a specified format
-   * (see 
[http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
-   * return null if fail.
    *
+   * See [[java.text.SimpleDateFormat]] for valid date and time format patterns
+   *
+   * @param e   A date, timestamp or string. If a string, the data must be in 
a format that can be
+   *            cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param fmt A date time pattern detailing the format of `e` when `e` is a string
+   * @return A date, or null if `e` was a string that could not be cast to a 
date or `fmt` was an
+   *         invalid format
    * @group datetime_funcs
    * @since 2.2.0
    */
@@ -2912,9 +3006,15 @@ object functions {
   /**
    * Returns date truncated to the unit specified by the format.
    *
+   * For example, `trunc("2018-11-19 12:01:19", "year")` returns 2018-01-01
+   *
+   * @param date A date, timestamp or string. If a string, the data must be in 
a format that can be
+   *             cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
    * @param format: 'year', 'yyyy', 'yy' for truncate by year,
    *               or 'month', 'mon', 'mm' for truncate by month
    *
+   * @return A date, or null if `date` was a string that could not be cast to 
a date or `format`
+   *         was an invalid value
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2925,11 +3025,16 @@ object functions {
   /**
    * Returns timestamp truncated to the unit specified by the format.
    *
+   * For example, `date_trunc("year", "2018-11-19 12:01:19")` returns 2018-01-01 00:00:00
+   *
    * @param format: 'year', 'yyyy', 'yy' for truncate by year,
    *                'month', 'mon', 'mm' for truncate by month,
    *                'day', 'dd' for truncate by day,
    *                Other options are: 'second', 'minute', 'hour', 'week', 
'month', 'quarter'
-   *
+   * @param timestamp A date, timestamp or string. If a string, the data must 
be in a format that
+   *                  can be cast to a timestamp, such as `yyyy-MM-dd` or 
`yyyy-MM-dd HH:mm:ss.SSSS`
+   * @return A timestamp, or null if `timestamp` was a string that could not 
be cast to a timestamp
+   *         or `format` was an invalid value
    * @group datetime_funcs
    * @since 2.3.0
    */
@@ -2941,6 +3046,13 @@ object functions {
    * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time 
in UTC, and renders
    * that time as a timestamp in the given time zone. For example, 'GMT+1' 
would yield
    * '2017-07-14 03:40:00.0'.
+   *
+   * @param ts A date, timestamp or string. If a string, the data must be in a 
format that can be
+   *           cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param tz A string detailing the time zone that the input should be 
adjusted to, such as
+   *           `Europe/London`, `PST` or `GMT+5`
+   * @return A timestamp, or null if `ts` was a string that could not be cast 
to a timestamp or
+   *         `tz` was an invalid value
    * @group datetime_funcs
    * @since 1.5.0
    */
@@ -2963,6 +3075,13 @@ object functions {
    * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time 
in the given time
    * zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' 
would yield
    * '2017-07-14 01:40:00.0'.
+   *
+   * @param ts A date, timestamp or string. If a string, the data must be in a 
format that can be
+   *           cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd 
HH:mm:ss.SSSS`
+   * @param tz A string detailing the time zone that the input belongs to, 
such as `Europe/London`,
+   *           `PST` or `GMT+5`
+   * @return A timestamp, or null if `ts` was a string that could not be cast 
to a timestamp or
+   *         `tz` was an invalid value
    * @group datetime_funcs
    * @since 1.5.0
    */
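
The `months_between` rule described in the Scaladoc above (a whole number when both inputs share a day of month or are both month ends, otherwise a fraction based on 31 days per month, rounded to 8 digits) can be modelled in plain Scala. This is a sketch of the documented rule, not Spark's `MonthsBetween` implementation, and `MonthsBetweenSketch` and its members are hypothetical names.

```scala
import java.time.LocalDateTime
import java.time.temporal.TemporalAdjusters

object MonthsBetweenSketch {
  private def isLastDayOfMonth(t: LocalDateTime): Boolean =
    t.toLocalDate == t.toLocalDate.`with`(TemporalAdjusters.lastDayOfMonth())

  // Positive when `end` is later than `start`, negative otherwise.
  def monthsBetween(end: LocalDateTime, start: LocalDateTime): Double = {
    val wholeMonths =
      (end.getYear - start.getYear) * 12 + (end.getMonthValue - start.getMonthValue)
    val sameDay = end.getDayOfMonth == start.getDayOfMonth
    if (sameDay || (isLastDayOfMonth(end) && isLastDayOfMonth(start))) {
      // Time of day is ignored in this case.
      wholeMonths.toDouble
    } else {
      val secondsPerDay = 86400.0
      val dayDiff = (end.getDayOfMonth - start.getDayOfMonth).toDouble
      val timeDiff =
        (end.toLocalTime.toSecondOfDay - start.toLocalTime.toSecondOfDay) / secondsPerDay
      // The fractional part assumes 31 days per month; round to 8 digits.
      val raw = wholeMonths + (dayDiff + timeDiff) / 31.0
      BigDecimal(raw).setScale(8, BigDecimal.RoundingMode.HALF_UP).toDouble
    }
  }
}
```

For the three input pairs used in the Scaladoc example, taken in the same (end, start) order, this model yields 4.0, -0.29032258 and -0.5.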

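The two time zone conversions documented at the end of the diff are mirror images of each other, which the following `java.time` sketch tries to make concrete. It is only a model of the documented examples, not of Spark's implementation. `UtcShiftSketch`, `fromUtc` and `toUtc` are hypothetical names, and `ZoneId.of` accepts a narrower set of zone strings than Spark does (short IDs such as `PST` are rejected here).

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

object UtcShiftSketch {
  // Models from_utc_timestamp: read `ts` as UTC, render it in zone `tz`.
  def fromUtc(ts: LocalDateTime, tz: String): LocalDateTime =
    ts.atZone(ZoneOffset.UTC).withZoneSameInstant(ZoneId.of(tz)).toLocalDateTime

  // Models to_utc_timestamp: read `ts` as a time in zone `tz`, render it in UTC.
  def toUtc(ts: LocalDateTime, tz: String): LocalDateTime =
    ts.atZone(ZoneId.of(tz)).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime
}
```

With `ts = LocalDateTime.of(2017, 7, 14, 2, 40)` and `tz = "GMT+1"`, `fromUtc` gives 2017-07-14T03:40 and `toUtc` gives 2017-07-14T01:40, in line with the two Scaladoc examples.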
