[ https://issues.apache.org/jira/browse/SPARK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro updated SPARK-32683:
-------------------------------------
    Component/s:     (was: Spark Core)
                 SQL

> Datetime Pattern F not working as expected
> ------------------------------------------
>
>                 Key: SPARK-32683
>                 URL: https://issues.apache.org/jira/browse/SPARK-32683
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Windows 10 Pro
> * with Jupyter Lab - Docker Image
> ** jupyter/all-spark-notebook:f1811928b3dd
> *** spark 3.0.0
> *** python 3.8.5
> *** openjdk 11.0.8
>            Reporter: Daeho Ro
>            Priority: Major
>         Attachments: comment.png
>
>
> h3. Background
> According to the
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html],
> the pattern F should give the week of the month.
> |*Symbol*|*Meaning*|*Presentation*|*Example*|
> |F|week-of-month|number(1)|3|
> h3. Test Data
> My test data is the following CSV file:
> {code:java}
> date
> 2020-08-01
> 2020-08-02
> 2020-08-03
> 2020-08-04
> 2020-08-05
> 2020-08-06
> 2020-08-07
> 2020-08-08
> 2020-08-09
> 2020-08-10 {code}
> h3. Steps to reproduce
> I tested with both Scala Spark 3.0.0 and PySpark 3.0.0:
> {code:java}
> // Spark
> df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
>   .withColumn("week", date_format('date, "F")).show
> +-------------------+----+
> |               date|week|
> +-------------------+----+
> |2020-08-01 00:00:00|   1|
> |2020-08-02 00:00:00|   2|
> |2020-08-03 00:00:00|   3|
> |2020-08-04 00:00:00|   4|
> |2020-08-05 00:00:00|   5|
> |2020-08-06 00:00:00|   6|
> |2020-08-07 00:00:00|   7|
> |2020-08-08 00:00:00|   1|
> |2020-08-09 00:00:00|   2|
> |2020-08-10 00:00:00|   3|
> +-------------------+----+
>
> # pyspark
> df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
>   .withColumn('week', date_format('date', 'F')) \
>   .show(10, False)
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|3   |
> |2020-08-04 00:00:00|4   |
> |2020-08-05 00:00:00|5   |
> |2020-08-06 00:00:00|6   |
> |2020-08-07 00:00:00|7   |
> |2020-08-08 00:00:00|1   |
> |2020-08-09 00:00:00|2   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
> h3. Expected result
> The `week` column is not the week of the month. It looks like a
> day-of-week-style number instead: it counts from 1 to 7 and then restarts.
> !comment.png!
> According to my calendar, August 1st should have week-of-month 1, the 2nd
> through the 8th should have 2, and so on.
> {code:java}
> +-------------------+----+
> |date               |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1   |
> |2020-08-02 00:00:00|2   |
> |2020-08-03 00:00:00|2   |
> |2020-08-04 00:00:00|2   |
> |2020-08-05 00:00:00|2   |
> |2020-08-06 00:00:00|2   |
> |2020-08-07 00:00:00|2   |
> |2020-08-08 00:00:00|2   |
> |2020-08-09 00:00:00|3   |
> |2020-08-10 00:00:00|3   |
> +-------------------+----+{code}
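
For reference, the two columns in this report can be reproduced with plain date arithmetic, which may help when comparing the observed and expected semantics. The following is only a sketch in plain Python (no Spark session needed), not Spark's implementation. The helper names `aligned_day_in_month` and `week_of_month_sunday_start` are hypothetical, and the Sunday week start is an assumption inferred from the expected table above (2020-08-02, a Sunday, opens week 2).

```python
from datetime import date


def aligned_day_in_month(d: date) -> int:
    """Position of d inside consecutive 7-day blocks counted from the 1st (1..7).

    Hypothetical helper: this arithmetic matches the values observed in the report.
    """
    return (d.day - 1) % 7 + 1


def week_of_month_sunday_start(d: date) -> int:
    """Calendar week of the month, assuming weeks start on Sunday.

    Assumption: the 1st always belongs to week 1, even if that week is partial.
    """
    first = d.replace(day=1)
    # isoweekday() is Monday=1 .. Sunday=7; map it to Sunday=0 .. Saturday=6.
    offset = first.isoweekday() % 7
    return (d.day - 1 + offset) // 7 + 1


# The ten dates from the test data in the report.
days = [date(2020, 8, n) for n in range(1, 11)]
observed = [aligned_day_in_month(d) for d in days]
expected = [week_of_month_sunday_start(d) for d in days]

for d, o, e in zip(days, observed, expected):
    print(d.isoformat(), o, e)
```

For the ten test dates, `observed` matches the `week` values that `date_format(..., "F")` returned, and `expected` matches the table the reporter asked for. In Spark, the second formula could presumably be expressed with column functions such as `dayofmonth` and `dayofweek`, but that translation is not tested here.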