Look at your data - it doesn't match the date format you give.

On Tue, Jun 14, 2022, 3:41 PM marc nicole <mk1853...@gmail.com> wrote:
> For the input (I changed the format):
>
> +------------+
> | Date       |
> +------------+
> | 2019-02-08 |
> | 2019-02-07 |
> | 2019-12-01 |
> | 2015-02-02 |
> | 2012-02-03 |
> | 2018-05-06 |
> | 2022-02-08 |
> +------------+
>
> the output was 2012-01-03.
>
> Note that for my code below to work, I cast the resulting min column to
> string.
>
> On Tue, Jun 14, 2022 at 9:12 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> You haven't shown your input or the result.
>>
>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> Hi Sean,
>>>
>>> Even with MM for months it gives an incorrect (but different this time)
>>> min value.
>>>
>>> On Tue, Jun 14, 2022 at 8:18 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Yes, that is right. It has to be parsed as a date to correctly reason
>>>> about ordering. Otherwise you are finding the minimum string
>>>> alphabetically.
>>>>
>>>> Small note: MM is month, mm is minute. You have to fix that for this to
>>>> work. These are Java format strings.
>>>>
>>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to identify a column of dates as such; the column has formatted
>>>>> strings like "06-14-2022" (the format being mm-dd-yyyy), and I want to
>>>>> get the minimum of those dates.
>>>>>
>>>>> I tried in Java as follows:
>>>>>
>>>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>>>         dataset.col(colName), "mm-dd-yyyy").isNotNull())
>>>>>         .select(colName).count() != 0) { ...
>>>>>
>>>>> And to get the min of the column:
>>>>>
>>>>> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>>>>>         org.apache.spark.sql.functions.to_date(
>>>>>                 dataset.col(colName), "mm-dd-yyyy"))).first().get(0);
>>>>> // then I cast colMin to string.
>>>>>
>>>>> Note that if I don't apply to_date() to the target column, then the
>>>>> result is erroneous (I think Spark takes the values as strings and
>>>>> gets the min as if it were computed on alphabetical strings).
>>>>>
>>>>> Any better approach to accomplish this?
>>>>> Thanks.
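[Editor's note: Sean's point about Java format strings can be checked with the JDK alone. The sketch below (plain java.time, no Spark; the class and method names are made up for illustration) parses "MM-dd-yyyy" strings - where MM is the month, while mm would mean minutes - and takes the minimum of the parsed dates, mirroring what the Spark min/to_date aggregation should do once the pattern is fixed:]

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

public class MinDateSketch {
    // Parse "MM-dd-yyyy" strings (MM = month of year; lowercase mm would be
    // minute of hour) and return the earliest date by chronological order.
    static LocalDate minDate(List<String> raw) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        return raw.stream()
                .map(s -> LocalDate.parse(s, fmt))
                .min(Comparator.naturalOrder())
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Sample values in MM-dd-yyyy form; comparing them as raw strings
        // would pick "02-03-2012"'s neighbor by alphabet, not by date.
        List<String> dates = List.of("06-14-2022", "02-03-2012", "12-01-2019");
        System.out.println(minDate(dates)); // prints 2012-02-03
    }
}
```

[Because LocalDate compares chronologically, no cast to string is needed to get a correct minimum; the same reasoning is why to_date must be applied before min in the Spark query.]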