I changed the format to yyyy-mm-dd for the example.

On Tue, Jun 14, 2022 at 10:52 PM, Sean Owen <sro...@gmail.com> wrote:
> Look at your data - it doesn't match the date format you give.
>
> On Tue, Jun 14, 2022 at 3:41 PM, marc nicole <mk1853...@gmail.com> wrote:
>
>> For the input (I changed the format):
>>
>> +------------+
>> |    Date    |
>> +------------+
>> | 2019-02-08 |
>> | 2019-02-07 |
>> | 2019-12-01 |
>> | 2015-02-02 |
>> | 2012-02-03 |
>> | 2018-05-06 |
>> | 2022-02-08 |
>> +------------+
>>
>> the output was 2012-01-03.
>>
>> Note that for my code below to work, I cast the resulting min column to
>> string.
>>
>> On Tue, Jun 14, 2022 at 9:12 PM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> You haven't shown your input or the result.
>>>
>>> On Tue, Jun 14, 2022 at 1:40 PM, marc nicole <mk1853...@gmail.com> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> Even with MM for months it gives an incorrect (but different this time)
>>>> min value.
>>>>
>>>> On Tue, Jun 14, 2022 at 8:18 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Yes, that is right. It has to be parsed as a date to correctly reason
>>>>> about ordering. Otherwise you are finding the minimum string,
>>>>> alphabetically.
>>>>>
>>>>> Small note: MM is month; mm is minute. You have to fix that for this
>>>>> to work. These are Java format strings.
>>>>>
>>>>> On Tue, Jun 14, 2022 at 12:32 PM, marc nicole <mk1853...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to identify a column of dates as such. The column has
>>>>>> formatted strings like "06-14-2022" (the format being mm-dd-yyyy),
>>>>>> and I want to get the minimum of those dates.
>>>>>>
>>>>>> I tried the following in Java:
>>>>>>
>>>>>>     if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>>>>             dataset.col(colName),
>>>>>>             "mm-dd-yyyy").isNotNull()).select(colName).count() != 0) {
>>>>>>         ....
>>>>>>
>>>>>> And to get the *min* of the column:
>>>>>>
>>>>>>     Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>>>>>>             org.apache.spark.sql.functions.to_date(
>>>>>>                     dataset.col(colName), "mm-dd-yyyy")))
>>>>>>             .first().get(0);
>>>>>>     // then I cast colMin to string.
>>>>>>
>>>>>> Note that if I don't apply *to_date*() to the target column, the
>>>>>> result will be erroneous (I think Spark will take the values as
>>>>>> strings and will compute the min as if applied on alphabetical
>>>>>> strings).
>>>>>>
>>>>>> Any better approach to accomplish this?
>>>>>> Thanks.
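[Editor's note] The MM/mm distinction Sean points out comes from java.time format patterns, and Spark's to_date uses the same pattern letters; in the Spark code above, the fix is simply replacing "mm-dd-yyyy" with "MM-dd-yyyy". A minimal plain-Java sketch (outside Spark, using only java.time, with the thread's sample values) illustrating both the pitfall and the correct min-of-dates:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

public class MinDateDemo {
    public static void main(String[] args) {
        // Sample values mirroring the thread's yyyy-MM-dd example data.
        List<String> dates = List.of(
                "2019-02-08", "2019-02-07", "2019-12-01",
                "2015-02-02", "2012-02-03", "2018-05-06", "2022-02-08");

        // "MM" is month-of-year; "mm" is minute-of-hour.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");

        // With the lowercase "mm" pattern there is no month field at all,
        // so a LocalDate cannot be built and parsing throws.
        boolean lowercaseFails = false;
        try {
            LocalDate.parse("2019-02-08",
                    DateTimeFormatter.ofPattern("yyyy-mm-dd"));
        } catch (Exception e) {
            lowercaseFails = true;
        }
        System.out.println("yyyy-mm-dd parse failed: " + lowercaseFails);

        // Parse each string as a real date, then take the natural-order min;
        // comparing LocalDate values (not strings) gives the true earliest date.
        LocalDate min = dates.stream()
                .map(s -> LocalDate.parse(s, fmt))
                .min(Comparator.naturalOrder())
                .orElseThrow();
        System.out.println("min = " + min); // min = 2012-02-03
    }
}
```

Note that for strings already laid out as yyyy-MM-dd (zero-padded, big-endian), lexicographic and chronological order happen to coincide; for MM-dd-yyyy strings like "06-14-2022" they do not, which is why parsing to dates before aggregating is essential.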