RE: determine week of month from date in spark3

2022-02-11 Thread Appel, Kevin
The output I sent originally for WEEKOFMONTHF was with LEGACY set; when 
EXCEPTION is set, this is the result, which is also different:

+---+----------+------------+
|  a|         b|WEEKOFMONTHF|
+---+----------+------------+
|  1|2014-03-07|           7|
|  2|2014-03-08|           1|
|  3|2014-03-30|           2|
|  4|2014-03-31|           3|
|  5|2015-03-07|           7|
|  6|2015-03-08|           1|
|  7|2015-03-30|           2|
|  8|2015-03-31|           3|
+---+----------+------------+
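The EXCEPTION output above no longer looks like a week-of-month at all; the
repeating 7, 1, 2, 3 matches an aligned day-of-week-in-month, i.e.
((day_of_month - 1) % 7) + 1. A quick plain-Python check of that reading (my
hypothesis from the numbers, not something stated in the docs quoted below):

# Hypothesis: in EXCEPTION mode, "F" yields ((day_of_month - 1) % 7) + 1,
# which depends only on the day of month, hence identical for 2014 and 2015.
for day in (7, 8, 30, 31):
    print(day, (day - 1) % 7 + 1)   # -> 7 7, 8 1, 30 2, 31 3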

From: Appel, Kevin
Sent: Friday, February 11, 2022 2:35 PM
To: user@spark.apache.org; 'Sean Owen' 
Subject: RE: determine week of month from date in spark3

Thanks for the reply, that looks to be along the lines of what is going on. 
That item does mention that W is banned, which matches the error I saw; 
however, F is not giving the same result as W.

+---+----------+------------+------------+
|  a|         b|WEEKOFMONTHW|WEEKOFMONTHF|
+---+----------+------------+------------+
|  1|2014-03-07|           2|           1|
|  2|2014-03-08|           2|           2|
|  3|2014-03-30|           6|           5|
|  4|2014-03-31|           6|           5|
|  5|2015-03-07|           1|           1|
|  6|2015-03-08|           2|           2|
|  7|2015-03-30|           5|           5|
|  8|2015-03-31|           5|           5|
+---+----------+------------+------------+

The best way to explain what W gives: if you look at a printed calendar for 
March 2014, March 30 and March 31 fall on row 6, which is week 6; on the same 
calendar for March 2015, they fall on row 5, which is week 5.
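
If we want that calendar-row behavior without LEGACY, one option is to rebuild
W's logic from the non-week-based built-ins. A minimal sketch, assuming W's
SimpleDateFormat defaults (weeks start on Sunday and a partial first week
counts as week 1); the WEEKOFMONTH_CALC column name is just for illustration:

from pyspark.sql import functions as F

# dayofweek() is 1=Sunday..7=Saturday, so subtracting 1 gives the 0-based
# position of the 1st of the month within its calendar row.
first_of_month = F.trunc(F.col("b"), "month")
offset = F.dayofweek(first_of_month) - 1

# Calendar row = floor((day_of_month + offset - 1) / 7) + 1
df1 = df1.withColumn(
    "WEEKOFMONTH_CALC",
    F.floor((F.dayofmonth(F.col("b")) + offset - 1) / 7) + 1,
)

On the sample dates above this should reproduce the W column
(2, 2, 6, 6, 1, 2, 5, 5) with the policy left at EXCEPTION.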

From: Sean Owen [mailto:sro...@gmail.com]
Sent: Friday, February 11, 2022 2:11 PM
To: Appel, Kevin <kevin.ap...@bofa.com>
Cc: user@spark.apache.org
Subject: Re: determine week of month from date in spark3

Here is some back-story: 
https://issues.apache.org/jira/browse/SPARK-32683
I think the answer may be: use "F"?

On Fri, Feb 11, 2022 at 12:43 PM Appel, Kevin 
<kevin.ap...@bofa.com.invalid> wrote:
Previously, in Spark 2, we could use the Spark function date_format with the 
“W” flag and it would return the week of month for that date.  In Spark 3, 
trying this returns an error:


• org.apache.spark.SparkUpgradeException: You may get a different 
result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the 
DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY 
to restore the behavior before Spark 3.0. 2) You can form a valid datetime 
pattern with the guide from 
https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html


• Caused by: java.lang.IllegalArgumentException: All week-based 
patterns are unsupported since Spark 3.0, detected: W, Please use the SQL 
function EXTRACT instead

If I use the first solution and set the policy to LEGACY (currently it is 
EXCEPTION), then the code runs through:

from datetime import date
from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df1 = spark.createDataFrame(
    [
        (1, date(2014, 3, 7)),
        (2, date(2014, 3, 8)),
        (3, date(2014, 3, 30)),
        (4, date(2014, 3, 31)),
        (5, date(2015, 3, 7)),
        (6, date(2015, 3, 8)),
        (7, date(2015, 3, 30)),
        (8, date(2015, 3, 31)),
    ],
    schema="a long, b date",
)
df1 = df1.withColumn("WEEKOFMONTH", F.date_format(F.col("b"), "W"))
df1.show()

+---+----------+-----------+
|  a|         b|WEEKOFMONTH|
+---+----------+-----------+
|  1|2014-03-07|          2|
|  2|2014-03-08|          2|
|  3|2014-03-30|          6|
|  4|2014-03-31|          6|
|  5|2015-03-07|          1|
|  6|2015-03-08|          2|
|  7|2015-03-30|          5|
|  8|2015-03-31|          5|
+---+----------+-----------+

Trying to explore the latter options, in both EXTRACT and the datetime 
patterns that are listed, I don’t see the “W” option or an equivalent way to 
produce this.
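
For what it's worth, EXTRACT does accept a WEEK field, but that is the week
of the year (ISO), not the week of the month, so it does not substitute for W
directly. A quick check (output shown as I would expect it, assuming ISO week
numbering):

spark.sql("SELECT extract(WEEK FROM date '2014-03-07') AS wk").show()
# +---+
# | wk|
# +---+
# | 10|
# +---+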
The datetime link mentions: In Spark 3.0, we define our own pattern strings 
in Datetime Patterns for Formatting and Parsing 
(https://spark.apache.org/docs/3.2.1/sql-ref-datetime-pattern.html), which is 
implemented via DateTimeFormatter 
(https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) 
under the hood.
If I follow the link into the DateTimeFormatter then I see the W existing 
there:

   W   week-of-month   number   4

My first question is: with Spark 3 and without setting the policy to LEGACY, 
is there no longer a way to compute this using the Spark built-in functions?

The second question is: if we do set the policy to LEGACY, are there any 
caveats or items to be aware of that might get us later? For example, in a 
future Spark 3.3.x is this option going to be deprecated?

This was an item that we ran into in the Spark 2 to Spark 3 conversion, and 
we are trying to see how best to handle it.

Thanks for your feedback,

Kevin

Re: determine week of month from date in spark3

2022-02-11 Thread Sean Owen
Here is some back-story: https://issues.apache.org/jira/browse/SPARK-32683
I think the answer may be: use "F"?
