[ 
https://issues.apache.org/jira/browse/OOZIE-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned OOZIE-2155:
------------------------------------

    Assignee: Robert Kanter

> Incorrect DST Shifts are occurring based on the Database timezone
> -----------------------------------------------------------------
>
>                 Key: OOZIE-2155
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2155
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: trunk, 4.1.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Blocker
>
> We discovered a critical bug where incorrect Daylight Saving Time shifts were 
> occurring based on the Database timezone.
> Try running this Coordinator:
> {code:xml}
> <coordinator-app name=“foo” frequency="${coord:hours(1)}" 
> start="2014-11-02T04:15Z" end="2014-11-02T16:00Z" timezone=“UTC” 
> xmlns="uri:oozie:coordinator:0.1">
>    <controls>
>       <concurrency>10</concurrency>
>    </controls>
>    <action>
>       <workflow>
>          <app-path>${appPath}</app-path>
>          <configuration>
>             <property>
>                <name>jobTracker</name>
>                <value>${jobTracker}</value>
>             </property>
>             <property>
>                <name>nameNode</name>
>                <value>${nameNode}</value>
>             </property>
>             <property>
>                <name>queueName</name>
>                <value>${queueName}</value>
>             </property>
>          </configuration>
>       </workflow>
>    </action>     
> </coordinator-app>
> {code}
> Note that it runs over a DST shift (at least in most US timezones).
> Here's a sample of some of the actions, along with their Nominal Times:
> {noformat}
> 0000013-150212140838307-oozie-rkan-C@4     2014-11-02 07:15 GMT 
> 0000013-150212140838307-oozie-rkan-C@6     2014-11-02 09:15 GMT 
> 0000013-150212140838307-oozie-rkan-C@5     2014-11-02 09:15 GMT 
> 0000013-150212140838307-oozie-rkan-C@7     2014-11-02 10:15 GMT 
> {noformat}
> Note that actions 5 and 6 have the same time.  This is incorrect because the 
> times are in GMT (with no DST).  I've also confirmed via a debugger that 
> these dates have the same number of seconds since epoch (so it's not just a 
> rendering issue).  
> By the way, if you're in a different timezone, you'll see this problem occur 
> on different actions because the issue is related to the timezone that your 
> database is in.  It depends on when the DST shift occurs in your timezone; 
> for example, in "America/New_York", it happens with actions 2 and 3.  
> On that note, if I ask Oozie to use "America/Los_Angeles" to print the dates, 
> I get this:
> {noformat}
> 0000013-150212140838307-oozie-rkan-C@4     2014-11-02 00:15 PDT 
> 0000013-150212140838307-oozie-rkan-C@6     2014-11-02 01:15 PST 
> 0000013-150212140838307-oozie-rkan-C@5     2014-11-02 01:15 PST 
> 0000013-150212140838307-oozie-rkan-C@7     2014-11-02 02:15 PST 
> {noformat}
> Action 5's nominal time should be {{2014-11-02 01:15 PDT}}, not {{2014-11-02 
> 01:15 PST}}.  
> Using the debugger some more, I verified that Oozie is creating the nominal 
> times correctly, and writing them to the database correctly (at least, it's 
> converting them to Java's SQL TimeStamp objects correctly; OpenJPA handles 
> writing them).  But when the problematic value is read back from the 
> database, it has the wrong value!
> Here's something interesting from the [MySQL 
> documentation|http://dev.mysql.com/doc/refman/5.5/en/datetime.html]:
> {quote}MySQL converts TIMESTAMP values from the current time zone to UTC for 
> storage, and back from UTC to the current time zone for retrieval. (This does 
> not occur for other types such as DATETIME.) By default, the current time 
> zone for each connection is the server's time. The time zone can be set on a 
> per-connection basis. As long as the time zone setting remains constant, you 
> get back the same value you store. If you store a TIMESTAMP value, and then 
> change the time zone and retrieve the value, the retrieved value is different 
> from the value you stored. This occurs because the same time zone was not 
> used for conversion in both directions.
> {quote}
> So, I think what's happening is that it's interpreting everything in PST, and 
> not in PST and PDT depending on the time itself.  Ideally, it would just 
> store the time since epoch, like Java's Date does, but it's doing this broken 
> interpretation instead.  In fact, this [Stack 
> Overflow|http://stackoverflow.com/questions/1646171/mysql-datetime-fields-and-daylight-savings-time-how-do-i-reference-the-extra/1650910#1650910]
>  I found talks about this problem.
> I tried to create a unit test to show the problem without all the action 
> stuff, but it seems like HSQLDB doesn't have this problem, so the test 
> passes.  I've confirmed that this affects Derby and MySQL; I'm not sure about 
> Postgres, Oracle, or SQLServer.
> I was able to find a workaround for Derby and one of my colleagues found one 
> for MySQL:
> - For Derby, you just need to change the JVM's timezone to GMT (see 
> [here|http://objectmix.com/apache/647950-moving-derby-database-across-timezones.html]).
>   All you have to do is add {{-Duser.timezone=GMT}} to {{CATALINA_OPTS}}.
> - For MySQL, you can either change the global timezone to GMT (which the DB 
> admin probably won't go for), or you can add 
> {{useLegacyDatetimeCode=false&serverTimezone=GMT}} to JDBC URL.  For example:
> {code:xml}
> <property>
>    <name>oozie.service.JPAService.jdbc.url</name>
>    
> <value>jdbc:mysql://HOST/oozie?useUnicode=true&characterEncoding=UTF-8&useLegacyDatetimeCode=false&serverTimezone=GMT</value>
> </property>
> {code}
> I imagine the Derby workaround won't work for anything else, but perhaps the 
> MySQL workaround would?  
> The easiest way to fix this is to add the {{user.timezone}} thing (to fix 
> Derby), and have Oozie always add the JDBC configs to the JDBC URL (for the 
> others).  If Oracle, Postgres, and SQLServer don't support that, we'll have 
> to come up with other ideas.  The other ideas I had were to replace TIMESTAMP 
> with DATETIME or the SQL equivalent of {{long}}; though these have the 
> downside of being schema changes and possibly other side effects.
> I have some homework for everyone :)
> Please check if the DB type listed below is affected by this, and if so, does 
> the above workaround (or some  similar configs) fix it?
> - [~bowenzhangusa]: SQLServer
> - [~chitnis], [~puru], [~rohini], or [~egashira]: Oracle
> - Me: Postgres



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to