Yes, that is right. The column has to be parsed as a date to reason correctly
about ordering. Otherwise you are finding the minimum string alphabetically.

One small note: MM is month, mm is minute. You have to fix that for this to
work. These are Java date/time format patterns.
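
For example, something like this should work (just a sketch, reusing the
dataset and colName names from your snippet):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

// Capital MM is month-of-year; lowercase mm would be minute-of-hour.
Dataset<Row> parsed = dataset.withColumn("parsedDate",
        to_date(col(colName), "MM-dd-yyyy"));

// Proceed only if at least one value actually parsed as a date.
if (parsed.filter(col("parsedDate").isNotNull()).count() != 0) {
    // min() on a DateType column compares chronologically,
    // not alphabetically.
    Object colMin = parsed.agg(min(col("parsedDate"))).first().get(0);
    // colMin is a java.sql.Date; toString() gives "yyyy-MM-dd" text.
}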

On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com> wrote:

> Hi,
>
> I want to identify a column of dates: the column has formatted strings
> like "06-14-2022" (the format being mm-dd-yyyy), and I want to get the
> minimum of those dates.
>
> I tried in Java as follows:
>
> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>         dataset.col(colName), "mm-dd-yyyy").isNotNull())
>         .select(colName).count() != 0) { ....
>
>
> And to get the *min* of the column:
>
> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>         org.apache.spark.sql.functions.to_date(dataset.col(colName),
>         "mm-dd-yyyy"))).first().get(0);
>
> // then I cast *colMin* to a string.
>
> Note that if I don't apply *to_date*() to the target column, the result
> is erroneous (I think Spark treats the values as strings and computes
> the min as if it were applied to alphabetical strings).
>
> Any better approach to accomplish this?
> Thanks.
>
