finally solved with the MM for months format recommendation. thanks

Le mar. 14 juin 2022 à 23:02, marc nicole <mk1853...@gmail.com> a écrit :

> i changed the format to yyyy-mm-dd for the example
>
> Le mar. 14 juin 2022 à 22:52, Sean Owen <sro...@gmail.com> a écrit :
>
>> Look at your data - doesn't match date format you give
>>
>> On Tue, Jun 14, 2022, 3:41 PM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> for the input  (I changed the format)  :
>>>
>>> +---------------+
>>> |    Date        |
>>> +---------------+
>>> | 2019-02-08 |
>>> +----------------+
>>> | 2019-02-07 |
>>> +----------------+
>>> | 2019-12-01 |
>>> +----------------+
>>> | 2015-02-02 |
>>> +----------------+
>>> | 2012-02-03 |
>>> +----------------+
>>> | 2018-05-06 |
>>> +----------------+
>>> | 2022-02-08 |
>>> +----------------+
>>> the output was 2012-01-03
>>>
>>> To note that for my below code to work I cast to string the resulting
>>> min column.
>>>
>>> Le mar. 14 juin 2022 à 21:12, Sean Owen <sro...@gmail.com> a écrit :
>>>
>>>> You haven't shown your input or the result
>>>>
>>>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole <mk1853...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Even with MM for months it gives incorrect (but different this time)
>>>>> min value.
>>>>>
>>>>> Le mar. 14 juin 2022 à 20:18, Sean Owen <sro...@gmail.com> a écrit :
>>>>>
>>>>>> Yes that is right. It has to be parsed as a date to correctly reason
>>>>>> about ordering. Otherwise you are finding the minimum string
>>>>>> alphabetically.
>>>>>>
>>>>>> Small note, MM is month. mm is minute. You have to fix that for this
>>>>>> to work. These are Java format strings.
>>>>>>
>>>>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to identify a column of dates as such, the column has
>>>>>>> formatted strings in the likes of: "06-14-2022" (the format being
>>>>>>> mm-dd-yyyy) and get the minimum of those dates.
>>>>>>>
>>>>>>> I tried in Java as follows:
>>>>>>>
>>>>>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>>>>>> dataset.col(colName), 
>>>>>>>> "mm-dd-yyyy").isNotNull()).select(colName).count() !=
>>>>>>>> 0) { ....
>>>>>>>
>>>>>>>
>>>>>>> And to get the *min *of the column:
>>>>>>>
>>>>>>> Object colMin =
>>>>>>>> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
>>>>>>>> "mm-dd-yyyy"))).first().get(0);
>>>>>>>
>>>>>>> // then I cast the *colMin *to string.
>>>>>>>
>>>>>>> To note that if i don't apply *to_date*() to the target column then
>>>>>>> the result will be erroneous (i think Spark will take the values as 
>>>>>>> string
>>>>>>> and will get the min as if it was applied on an alphabetical string).
>>>>>>>
>>>>>>> Any better approach to accomplish this?
>>>>>>> Thanks.
>>>>>>>
>>>>>>

Reply via email to