Hi,

I want to identify a column of dates as such: the column holds formatted
strings like "06-14-2022" (month-day-year, i.e. the Spark datetime pattern
"MM-dd-yyyy"), and I want to get the minimum of those dates.

I tried the following in Java:

// note: the pattern must be "MM-dd-yyyy" -- in Spark's datetime
// patterns, lowercase "mm" means minute-of-hour, while "MM" is the month
if (dataset.filter(org.apache.spark.sql.functions.to_date(
        dataset.col(colName), "MM-dd-yyyy").isNotNull())
        .select(colName).count() != 0) { ....
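
For reference, the same check as a self-contained sketch with static
imports (a minimal sketch; dataset and colName are assumed to already
exist, and the redundant select() before count() is dropped):

    import static org.apache.spark.sql.functions.to_date;

    // Strings that fail to parse as MM-dd-yyyy come back as null from
    // to_date() (with the default, non-ANSI settings), so a non-zero
    // count of non-null results means the column holds at least one
    // valid date.
    long parsedCount = dataset
            .filter(to_date(dataset.col(colName), "MM-dd-yyyy").isNotNull())
            .count();
    if (parsedCount != 0) {
        // treat colName as a date column
    }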


And to get the *min* of the column:

Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
        org.apache.spark.sql.functions.to_date(
            dataset.col(colName), "MM-dd-yyyy")))
        .first().get(0);

// then I cast *colMin* to a string.
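
Concretely, the cast looks like this (a sketch, assuming Spark hands back
DateType values as java.sql.Date, which is the default unless the Java 8
time API is enabled via spark.sql.datetime.java8API.enabled):

    // to_date() produces a DateType column, so the aggregated minimum
    // arrives as java.sql.Date; toString() renders it as yyyy-MM-dd.
    java.sql.Date minDate = (java.sql.Date) colMin;
    String minAsString = minDate.toString(); // e.g. "2020-12-01"

If the original MM-dd-yyyy form is wanted back, wrapping the aggregate in
date_format(..., "MM-dd-yyyy") inside the agg() would avoid the cast
altogether.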

Note that if I don't apply *to_date*() to the target column, the result is
erroneous: I think Spark treats the values as plain strings and takes the
min lexicographically, so the year at the end of the string is effectively
ignored.
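
A quick demonstration of why the raw-string min goes wrong (a minimal
sketch; the two-row dataset and the column name "d" are made up for
illustration, and a SparkSession named spark is assumed):

    import java.util.Arrays;
    import org.apache.spark.sql.*;
    import static org.apache.spark.sql.functions.*;

    // Two dates in MM-dd-yyyy form; 12-01-2020 is the earlier date,
    // but "06-14-2022" is the smaller *string*.
    Dataset<Row> df = spark.createDataset(
            Arrays.asList("06-14-2022", "12-01-2020"),
            Encoders.STRING()).toDF("d");

    df.agg(min(col("d")),                          // lexicographic min
           min(to_date(col("d"), "MM-dd-yyyy")))   // true date min
      .show();
    // string min -> "06-14-2022" (year ignored)
    // date min   -> 2020-12-01   (actual earliest date)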

Any better approach to accomplish this?
Thanks.
