Finally solved with the "MM for months" format recommendation. Thanks!

On Tue, Jun 14, 2022 at 11:02 PM, marc nicole <mk1853...@gmail.com> wrote:
> I changed the format to yyyy-mm-dd for the example.
>
> On Tue, Jun 14, 2022 at 10:52 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> Look at your data - it doesn't match the date format you give.
>>
>> On Tue, Jun 14, 2022, 3:41 PM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> For the input (I changed the format):
>>>
>>> +------------+
>>> |    Date    |
>>> +------------+
>>> | 2019-02-08 |
>>> | 2019-02-07 |
>>> | 2019-12-01 |
>>> | 2015-02-02 |
>>> | 2012-02-03 |
>>> | 2018-05-06 |
>>> | 2022-02-08 |
>>> +------------+
>>>
>>> the output was 2012-01-03.
>>>
>>> Note that for my code below to work, I cast the resulting min column
>>> to string.
>>>
>>> On Tue, Jun 14, 2022 at 9:12 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> You haven't shown your input or the result.
>>>>
>>>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole <mk1853...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Even with MM for months it gives an incorrect (but different this
>>>>> time) min value.
>>>>>
>>>>> On Tue, Jun 14, 2022 at 8:18 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> Yes, that is right. It has to be parsed as a date to correctly
>>>>>> reason about ordering. Otherwise you are finding the minimum string
>>>>>> alphabetically.
>>>>>>
>>>>>> Small note: MM is month; mm is minute. You have to fix that for
>>>>>> this to work. These are Java format strings.
>>>>>>
>>>>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to identify a column of dates as such; the column has
>>>>>>> formatted strings like "06-14-2022" (the format being mm-dd-yyyy),
>>>>>>> and I want to get the minimum of those dates.
>>>>>>>
>>>>>>> I tried in Java as follows:
>>>>>>>
>>>>>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>>>>>         dataset.col(colName), "mm-dd-yyyy").isNotNull())
>>>>>>>         .select(colName).count() != 0) {
>>>>>>>     ....
>>>>>>> }
>>>>>>>
>>>>>>> And to get the *min* of the column:
>>>>>>>
>>>>>>> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>>>>>>>         org.apache.spark.sql.functions.to_date(
>>>>>>>             dataset.col(colName), "mm-dd-yyyy"))).first().get(0);
>>>>>>> // then I cast colMin to string
>>>>>>>
>>>>>>> Note that if I don't apply *to_date*() to the target column, the
>>>>>>> result will be erroneous (I think Spark will take the values as
>>>>>>> strings and compute the min as if it were applied to alphabetical
>>>>>>> strings).
>>>>>>>
>>>>>>> Any better approach to accomplish this?
>>>>>>> Thanks.
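A standalone sketch of the fix discussed in the thread, using plain java.time rather than Spark (so it runs without a cluster); the MM-vs-mm pattern letters behave the same way, and the sample values are the thread's dates rewritten in the original mm-dd-yyyy column format:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

public class MinDateDemo {
    public static void main(String[] args) {
        // The thread's sample dates, as mm-dd-yyyy strings
        List<String> raw = List.of("02-08-2019", "02-07-2019", "12-01-2019",
                "02-02-2015", "02-03-2012", "05-06-2018", "02-08-2022");

        // Wrong: without parsing, min is the alphabetically smallest string
        String stringMin = raw.stream().min(Comparator.naturalOrder()).get();
        System.out.println("string min = " + stringMin); // 02-02-2015

        // Right: parse with MM (month, not mm = minute), then take the min date
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        LocalDate dateMin = raw.stream()
                .map(s -> LocalDate.parse(s, fmt))
                .min(Comparator.naturalOrder())
                .get();
        System.out.println("date min = " + dateMin); // 2012-02-03
    }
}
```

The same correction applies to the Spark code in the thread: replacing "mm-dd-yyyy" with "MM-dd-yyyy" in the to_date() calls makes min() order by actual dates instead of by string comparison.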