I changed the format to yyyy-mm-dd for the example.
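
For reference, a minimal sketch of what the fully corrected call might look like (my assumptions: `dataset` is a Dataset<Row> and colName names the string column; note the capital MM, since MM is the month field and mm would be minutes in Java date patterns):

    // Hedged sketch, not the exact code from this thread.
    org.apache.spark.sql.Column parsed =
        org.apache.spark.sql.functions.to_date(dataset.col(colName), "yyyy-MM-dd");
    Object colMin = dataset.agg(org.apache.spark.sql.functions.min(parsed))
        .first().get(0);                          // a java.sql.Date
    String minAsString = String.valueOf(colMin);  // e.g. "2012-02-03"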

On Tue, Jun 14, 2022 at 10:52 PM Sean Owen <sro...@gmail.com> wrote:

> Look at your data - it doesn't match the date format you give.
>
> On Tue, Jun 14, 2022, 3:41 PM marc nicole <mk1853...@gmail.com> wrote:
>
>> For the input (I changed the format):
>>
>> +------------+
>> |    Date    |
>> +------------+
>> | 2019-02-08 |
>> | 2019-02-07 |
>> | 2019-12-01 |
>> | 2015-02-02 |
>> | 2012-02-03 |
>> | 2018-05-06 |
>> | 2022-02-08 |
>> +------------+
>> The output was 2012-01-03.
>>
>> Note that for my code below to work, I cast the resulting min column to
>> string.
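>>
>> As a sketch of that cast (same dataset/colName as in the code below,
>> but with the corrected yyyy-MM-dd pattern), the string could also be
>> produced inside the query itself:
>>
>> String minDate = dataset.agg(
>>         org.apache.spark.sql.functions.min(
>>             org.apache.spark.sql.functions.to_date(
>>                 dataset.col(colName), "yyyy-MM-dd"))
>>         .cast("string"))
>>     .first().getString(0);  // e.g. "2012-02-03"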
>>
>> On Tue, Jun 14, 2022 at 9:12 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> You haven't shown your input or the result.
>>>
>>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole <mk1853...@gmail.com> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> Even with MM for months, it gives an incorrect (though different this
>>>> time) min value.
>>>>
>>>> On Tue, Jun 14, 2022 at 8:18 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Yes, that is right. It has to be parsed as a date to correctly reason
>>>>> about ordering; otherwise you are finding the minimum string
>>>>> alphabetically.
>>>>>
>>>>> Small note: MM is month, mm is minute. You have to fix that for this
>>>>> to work. These are Java date format strings.
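>>>>>
>>>>> As a quick illustration (toy values, not taken from this thread) of
>>>>> why the parse matters:
>>>>>
>>>>> // As strings, "01-05-2022" < "12-31-1999"; parsed as dates, the
>>>>> // ordering is reversed.
>>>>> org.apache.spark.sql.Column asDate = org.apache.spark.sql.functions
>>>>>     .to_date(dataset.col(colName), "MM-dd-yyyy");  // MM = month, mm = minute
>>>>> Object earliest = dataset.agg(
>>>>>     org.apache.spark.sql.functions.min(asDate)).first().get(0);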
>>>>>
>>>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to identify a column of dates as such: the column holds
>>>>>> formatted strings like "06-14-2022" (the format being mm-dd-yyyy),
>>>>>> and I want to get the minimum of those dates.
>>>>>>
>>>>>> I tried in Java as follows:
>>>>>>
>>>>>> if (dataset.filter(
>>>>>>         org.apache.spark.sql.functions.to_date(
>>>>>>             dataset.col(colName), "mm-dd-yyyy")
>>>>>>         .isNotNull())
>>>>>>     .select(colName).count() != 0) { ...
>>>>>>
>>>>>>
>>>>>> And to get the *min* of the column:
>>>>>>
>>>>>> Object colMin = dataset.agg(
>>>>>>     org.apache.spark.sql.functions.min(
>>>>>>         org.apache.spark.sql.functions.to_date(
>>>>>>             dataset.col(colName), "mm-dd-yyyy")))
>>>>>>     .first().get(0);
>>>>>>
>>>>>> // then I cast *colMin* to string.
>>>>>>
>>>>>> Note that if I don't apply *to_date*() to the target column, the
>>>>>> result is erroneous (I think Spark takes the values as strings and
>>>>>> computes the min alphabetically).
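>>>>>>
>>>>>> For example, a hedged sketch (made-up values) of that failure mode
>>>>>> without *to_date*():
>>>>>>
>>>>>> // min over the raw strings compares character by character
>>>>>> Object wrongMin = dataset.agg(
>>>>>>     org.apache.spark.sql.functions.min(dataset.col(colName)))
>>>>>>     .first().get(0);
>>>>>> // could return "01-05-2022" even when "12-31-1999" is present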
>>>>>>
>>>>>> Any better approach to accomplish this?
>>>>>> Thanks.
>>>>>>
>>>>>
