Hi,

You can truncate datetimes like this (in PySpark), e.g. to 5 minutes:

import pyspark.sql.functions as F

# Cast to epoch seconds, round down to a multiple of 300 s (5 minutes), cast back.
df.select((F.floor(F.col('myDateColumn').cast('long') / 300) * 300).cast('timestamp'))
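
The arithmetic behind that expression can be checked without a Spark session. The following is only a plain-Python sketch of the same idea (floor the epoch seconds to a multiple of the bucket size); the helper names `truncate_epoch_seconds` and `truncate_datetime` are made up for illustration and are not part of any Spark API:

```python
from datetime import datetime, timezone


def truncate_epoch_seconds(epoch_seconds: int, bucket_seconds: int = 300) -> int:
    """Round epoch seconds down to the start of the enclosing bucket."""
    # Integer floor division drops the remainder, mirroring floor(x / 300) * 300.
    return (epoch_seconds // bucket_seconds) * bucket_seconds


def truncate_datetime(ts: datetime, bucket_seconds: int = 300) -> datetime:
    """Truncate a timezone-aware datetime to a bucket boundary (hypothetical helper)."""
    epoch = int(ts.timestamp())  # whole seconds since 1970-01-01 UTC
    truncated = truncate_epoch_seconds(epoch, bucket_seconds)
    return datetime.fromtimestamp(truncated, tz=timezone.utc)


example = datetime(2017, 11, 13, 12, 27, 42, tzinfo=timezone.utc)
print(truncate_datetime(example))        # 5-minute bucket
print(truncate_datetime(example, 3600))  # hourly bucket
```

Because the buckets are counted from the Unix epoch, a bucket of 86400 seconds truncates to midnight UTC, which is not necessarily midnight in the local or session time zone.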

Best,
Eike

David Hodefi <davidhodeffi.w...@gmail.com> wrote on Mon, 13 Nov 2017 at
12:27:

> I am familiar with those functions, but none of them actually truncates a
> date. We can use those methods to help implement a truncate method. I think
> truncating to a day or an hour should be as simple as truncate(..., "DD") or
> truncate(..., "HH").
>
> On Thu, Nov 9, 2017 at 8:23 PM, Gaspar Muñoz <gmu...@datiobd.com> wrote:
>
>> There are functions for the day (called dayofmonth and dayofyear) and the
>> hour (called hour). You can view them here:
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions
>>
>> Example:
>>
>> import org.apache.spark.sql.functions._
>> // Bind to a new name: `val df = df.select(...)` would be a recursive definition.
>> val extracted = df.select(hour($"myDateColumn"), dayofmonth($"myDateColumn"),
>>   dayofyear($"myDateColumn"))
>>
>> 2017-11-09 12:05 GMT+01:00 David Hodefi <davidhodeffi.w...@gmail.com>:
>>
>>> I would like to truncate a date to its day or hour. Currently it is only
>>> possible to truncate to MONTH or YEAR.
>>> 1. How can I achieve that?
>>> 2. Is there any pull request about this issue?
>>> 3. If there is no open pull request about this issue, what implications
>>> should I be aware of when coding and contributing it as a pull request?
>>>
>>> One last question: looking at the DateTimeUtils class code, it seems the
>>> implementation does not use any open-source library for handling dates,
>>> e.g. Apache Commons. Why implement it instead of reusing open source?
>>>
>>> Thanks, David
>>>
>>
>>
>>
>> --
>> Gaspar Muñoz Soria
>>
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473
>>
>
>
