The output I sent originally with WEEKOFMONTHF was produced with LEGACY set; when 
EXCEPTION is set, this is the result, which is also different:

+---+----------+------------+
|  a|         b|WEEKOFMONTHF|
+---+----------+------------+
|  1|2014-03-07|           7|
|  2|2014-03-08|           1|
|  3|2014-03-30|           2|
|  4|2014-03-31|           3|
|  5|2015-03-07|           7|
|  6|2015-03-08|           1|
|  7|2015-03-30|           2|
|  8|2015-03-31|           3|
+---+----------+------------+
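
For reference, a minimal sketch of how the two WEEKOFMONTHF outputs were produced, 
assuming the df1 DataFrame defined in my original message further below; the only 
change between the two runs is the parser policy:

from pyspark.sql import functions as F

# EXCEPTION is the Spark 3 default; "F" is not a banned week-based pattern, so this runs
spark.conf.set("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
df1.withColumn("WEEKOFMONTHF", F.date_format(F.col("b"), "F")).show()

# switching to LEGACY changes how "F" is interpreted, which is why the two tables differ
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df1.withColumn("WEEKOFMONTHF", F.date_format(F.col("b"), "F")).show()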

From: Appel, Kevin
Sent: Friday, February 11, 2022 2:35 PM
To: user@spark.apache.org; 'Sean Owen' <sro...@gmail.com>
Subject: RE: determine week of month from date in spark3

Thanks for the reply; that looks to be along the lines of what is going on. That 
item mentions that W is banned, which matches the error I saw; however, F is not 
giving the same result as W.

+---+----------+------------+------------+
|  a|         b|WEEKOFMONTHW|WEEKOFMONTHF|
+---+----------+------------+------------+
|  1|2014-03-07|           2|           1|
|  2|2014-03-08|           2|           2|
|  3|2014-03-30|           6|           5|
|  4|2014-03-31|           6|           5|
|  5|2015-03-07|           1|           1|
|  6|2015-03-08|           2|           2|
|  7|2015-03-30|           5|           5|
|  8|2015-03-31|           5|           5|
+---+----------+------------+------------+

The best way to explain what W gives: if you look at a printed calendar for March 
2014, March 30 and March 31 fall on row 6, which is week 6; whereas on the calendar 
for March 2015 they fall on row 5, which is week 5.
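
Not from the thread, but a possible workaround sketch: that calendar-row number 
which "W" produced can be rebuilt from non-banned built-ins, assuming weeks start 
on Sunday and a partial first week counts as week 1 (the java.util.Calendar 
defaults the legacy "W" used):

from pyspark.sql import functions as F

# first day of the month that column "b" falls in
first_of_month = F.trunc(F.col("b"), "month")
# dayofweek() returns 1 = Sunday ... 7 = Saturday
offset = F.dayofweek(first_of_month) - F.lit(1)
# shift the day-of-month by the weekday of the 1st, then count 7-day calendar rows
week_of_month = F.ceil((F.dayofmonth(F.col("b")) + offset) / F.lit(7))

df1.withColumn("WEEKOFMONTH_CALC", week_of_month).show()

On the sample dates above this matches the WEEKOFMONTHW column (2, 2, 6, 6 for 
March 2014 and 1, 2, 5, 5 for March 2015) without touching 
spark.sql.legacy.timeParserPolicy.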

From: Sean Owen <sro...@gmail.com>
Sent: Friday, February 11, 2022 2:11 PM
To: Appel, Kevin <kevin.ap...@bofa.com>
Cc: user@spark.apache.org
Subject: Re: determine week of month from date in spark3

Here is some back-story: 
https://issues.apache.org/jira/browse/SPARK-32683
I think the answer may be: use "F"?

On Fri, Feb 11, 2022 at 12:43 PM Appel, Kevin 
<kevin.ap...@bofa.com.invalid> wrote:
Previously, in Spark 2 we could use the Spark function date_format with the “W” 
flag and it would return the week of month for that date.  In Spark 3, trying this 
produces an error (a minimal reproduction sketch follows the quoted error text below):


•         org.apache.spark.SparkUpgradeException: You may get a different 
result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the 
DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY 
to restore the behavior before Spark 3.0. 2) You can form a valid datetime 
pattern with the guide from 
https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html


•         Caused by: java.lang.IllegalArgumentException: All week-based 
patterns are unsupported since Spark 3.0, detected: W, Please use the SQL 
function EXTRACT instead
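
For context, a minimal reproduction sketch under the default EXCEPTION policy (it 
uses the df1 defined in the code just below):

df1.withColumn("WEEKOFMONTH", F.date_format(F.col("b"), "W")).show()
# -> raises org.apache.spark.SparkUpgradeException with the message quoted above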

If I use the first solution and set the policy to LEGACY (currently it is 
EXCEPTION), then the code runs through:

from datetime import date
from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")

df1 = spark.createDataFrame(
    [
        (1, date(2014, 3, 7)),
        (2, date(2014, 3, 8)),
        (3, date(2014, 3, 30)),
        (4, date(2014, 3, 31)),
        (5, date(2015, 3, 7)),
        (6, date(2015, 3, 8)),
        (7, date(2015, 3, 30)),
        (8, date(2015, 3, 31)),
    ],
    schema="a long, b date",
)
df1 = df1.withColumn("WEEKOFMONTH", F.date_format(F.col("b"), "W"))
df1.show()

+---+----------+-----------+
|  a|         b|WEEKOFMONTH|
+---+----------+-----------+
|  1|2014-03-07|          2|
|  2|2014-03-08|          2|
|  3|2014-03-30|          6|
|  4|2014-03-31|          6|
|  5|2015-03-07|          1|
|  6|2015-03-08|          2|
|  7|2015-03-30|          5|
|  8|2015-03-31|          5|
+---+----------+-----------+

Trying to explore the latter options, in both EXTRACT and the datetime patterns 
that are listed, I don’t see a “W” option or an equivalent way to produce this.
The datetime link mentions: In Spark 3.0, we define our own pattern strings in 
Datetime Patterns for Formatting and 
Parsing<https://spark.apache.org/docs/3.2.1/sql-ref-datetime-pattern.html>, which 
is implemented via 
DateTimeFormatter<https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html> 
under the hood.
If I follow the link into DateTimeFormatter, I do see W listed there:
   W       week-of-month               number            4
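
As far as I can tell, EXTRACT only exposes week-of-year (and day-of-week style 
fields), not week-of-month; a quick check sketch:

df1.selectExpr("b", "extract(week FROM b) AS week_of_year").show()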

My first question is: with Spark 3 and without setting the policy to LEGACY, is 
there no longer a way to compute this using the Spark built-in functions?

The second question is: if we do set the policy to LEGACY, are there any caveats 
or items to be aware of that might get us later? For example, in a future Spark 
3.3.x, is this option going to be deprecated?

This was an item we ran into in the Spark 2 to Spark 3 conversion, and we are 
trying to see how best to handle it.

Thanks for your feedback,

Kevin



