[ 
https://issues.apache.org/jira/browse/SPARK-17971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583159#comment-15583159
 ] 

Gabriele Del Prete commented on SPARK-17971:
--------------------------------------------

Thanks, I know the basics of time handling in modern APIs. 

Unix timestamps *are* defined as the seconds (or millis, or nanos, if you want 
more precision) since 1970-01-01 at 00:00:00 *UTC*. In Java, UTC can be used as 
a TimeZone. UTC is *not* by itself a timezone, but Java treats it as one, and 
Java API-wise it's perfectly legal to do calendar-based computations using a 
Calendar object in the UTC "timezone". This is done all the time; I've seen it 
multiple times, when one does not need to deal with DST or wall-clock time. We 
can debate whether that's good or bad, but it happens. The way Spark's 
time-handling functions are designed simply prevents that. 
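
The kind of computation I mean can be sketched in plain Java (a standalone 
illustration, not Spark code; the class and method names are mine):

```java
import java.util.Calendar;
import java.util.TimeZone;

public class UtcFields {
    // Extract the hour-of-day of a Unix timestamp (in seconds),
    // interpreted in UTC rather than the JVM's default timezone.
    static int utcHour(long unixSeconds) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(unixSeconds * 1000L);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        // 18000 s after the epoch is 1970-01-01 05:00:00 UTC,
        // no matter what the machine's default timezone is.
        System.out.println(utcHour(18000L)); // prints 5
    }
}
```

No DST or wall-clock concerns are involved; the Calendar is simply doing field 
extraction on the UTC timeline.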

The method in your link takes a format string, not a timezone.
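
The behavior the issue complains about can also be reproduced outside Spark: 
in Spark 1.6, from_unixtime formats through the JVM's default timezone, much 
like a SimpleDateFormat with no timezone override. A minimal Java sketch of 
that effect (the timezones are set explicitly here so the contrast is visible 
on any machine):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampDemo {
    // Format a Unix timestamp (in seconds) in the given timezone.
    static String format(long unixSeconds, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(tz);
        return fmt.format(new Date(unixSeconds * 1000L));
    }

    public static void main(String[] args) {
        long ts = 0L; // the epoch: 1970-01-01 00:00:00 UTC
        // Same instant, two different renderings:
        System.out.println(format(ts, TimeZone.getTimeZone("UTC")));
        // prints 1970-01-01 00:00:00
        System.out.println(format(ts, TimeZone.getTimeZone("US/Eastern")));
        // prints 1969-12-31 19:00:00
    }
}
```

A UTC-pinned server and a US Eastern developer machine will therefore extract 
different field values from the same bigint unless the timezone can be forced 
to UTC, which is exactly what the issue says the Spark SQL functions don't 
allow.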

> Unix timestamp handling in Spark SQL not allowing calculations on UTC times
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-17971
>                 URL: https://issues.apache.org/jira/browse/SPARK-17971
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2
>         Environment: MacOS X JDK 7
>            Reporter: Gabriele Del Prete
>
> In our Spark data pipeline we store timed events using a bigint column called 
> 'timestamp', the values contained being Unix timestamp time points.
> The Java VMs on our datacenter servers are all set up to start with the 
> timezone set to UTC, while developers' computers are all in the US Eastern 
> timezone. 
> Given how Spark SQL datetime functions work, it's impossible to do 
> calculations (e.g. extract and compare hours, or year-month-day triplets) 
> using UTC values:
> - from_unixtime takes a bigint Unix timestamp and forces it to the computer's 
> local timezone;
> - casting the bigint column to timestamp does the same (it converts it to the 
> local timezone);
> - from_utc_timestamp works in the same way, the only difference being that it 
> takes a string as input instead of a bigint.
> The result of all of this is that it's impossible to extract individual 
> fields of a UTC timestamp, since all timestamps always get converted to the 
> local timezone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
