Look at your data - it doesn't match the date format you give

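The mismatch Sean points at can be reproduced with plain java.time, since Spark 3.x date patterns follow `DateTimeFormatter` conventions — a minimal sketch (class name is illustrative):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class FormatMismatch {
    public static void main(String[] args) {
        // The sample data is in yyyy-MM-dd form, but the pattern says MM-dd-yyyy.
        DateTimeFormatter wrongPattern = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        try {
            LocalDate.parse("2019-02-08", wrongPattern);
        } catch (DateTimeParseException e) {
            // The mismatch surfaces as a parse failure in strict java.time;
            // Spark's to_date instead yields null (or garbage under lenient
            // legacy parsing), which is why the min looked wrong.
            System.out.println("pattern does not match the data");
        }

        // A pattern that matches the data parses cleanly.
        DateTimeFormatter rightPattern = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        System.out.println(LocalDate.parse("2019-02-08", rightPattern)); // 2019-02-08
    }
}
```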
On Tue, Jun 14, 2022, 3:41 PM marc nicole <mk1853...@gmail.com> wrote:

> for the input  (I changed the format)  :
>
> +------------+
> |    Date    |
> +------------+
> | 2019-02-08 |
> +------------+
> | 2019-02-07 |
> +------------+
> | 2019-12-01 |
> +------------+
> | 2015-02-02 |
> +------------+
> | 2012-02-03 |
> +------------+
> | 2018-05-06 |
> +------------+
> | 2022-02-08 |
> +------------+
> the output was 2012-01-03
>
> Note that for my code below to work, I cast the resulting min column to
> string.
>
> Le mar. 14 juin 2022 à 21:12, Sean Owen <sro...@gmail.com> a écrit :
>
>> You haven't shown your input or the result
>>
>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> Hi Sean,
>>>
>>> Even with MM for months, it gives an incorrect (though different this
>>> time) min value.
>>>
>>> Le mar. 14 juin 2022 à 20:18, Sean Owen <sro...@gmail.com> a écrit :
>>>
>>>> Yes that is right. It has to be parsed as a date to correctly reason
>>>> about ordering. Otherwise you are finding the minimum string
>>>> alphabetically.
>>>>
>>>> Small note, MM is month. mm is minute. You have to fix that for this to
>>>> work. These are Java format strings.
>>>>
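Sean's point about the pattern letters can be checked directly with java.time, whose conventions Spark's date patterns follow — a small sketch (class name is illustrative):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class MonthVsMinute {
    public static void main(String[] args) {
        // MM is month-of-year: this parses as June 14, 2022.
        LocalDate d = LocalDate.parse("06-14-2022",
                DateTimeFormatter.ofPattern("MM-dd-yyyy"));
        System.out.println(d); // 2022-06-14

        // mm is minute-of-hour: the text still matches the pattern, but the
        // parse result carries no month field, so no LocalDate can be built.
        try {
            LocalDate.parse("06-14-2022",
                    DateTimeFormatter.ofPattern("mm-dd-yyyy"));
        } catch (DateTimeParseException e) {
            System.out.println("mm gives minutes, not months");
        }
    }
}
```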
>>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to identify a column of dates as such: the column contains
>>>>> formatted strings like "06-14-2022" (the format being mm-dd-yyyy), and I
>>>>> want to get the minimum of those dates.
>>>>>
>>>>> I tried in Java as follows:
>>>>>
>>>>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>>>>         dataset.col(colName), "mm-dd-yyyy").isNotNull())
>>>>>>     .select(colName).count() != 0) { ...
>>>>>
>>>>>
>>>>> And to get the *min *of the column:
>>>>>
>>>>>> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>>>>>>     org.apache.spark.sql.functions.to_date(dataset.col(colName),
>>>>>>         "mm-dd-yyyy"))).first().get(0);
>>>>>
>>>>> // then I cast *colMin *to string.
>>>>>
>>>>> Note that if I don't apply *to_date*() to the target column, the result
>>>>> is erroneous (I think Spark treats the values as strings and computes
>>>>> the min alphabetically).
>>>>>
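The alphabetical-versus-chronological pitfall described above is easy to reproduce outside Spark; a sketch in plain Java with made-up sample values (class name is illustrative):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class StringMinVsDateMin {
    public static void main(String[] args) {
        List<String> raw = List.of("06-14-2022", "12-01-2019");

        // Min of the raw strings is alphabetical: "0" < "1", so the
        // later date "wins" purely on its leading characters.
        String stringMin = Collections.min(raw);

        // Parsing first gives the true chronological minimum.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        LocalDate dateMin = raw.stream()
                .map(s -> LocalDate.parse(s, fmt))
                .min(Comparator.naturalOrder())
                .orElseThrow();

        System.out.println(stringMin); // 06-14-2022
        System.out.println(dateMin);   // 2019-12-01
    }
}
```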
>>>>> Any better approach to accomplish this?
>>>>> Thanks.
>>>>>
>>>>
