Robert Kanter created OOZIE-2155:
------------------------------------

             Summary: Incorrect DST Shifts are occurring based on the Database 
timezone
                 Key: OOZIE-2155
                 URL: https://issues.apache.org/jira/browse/OOZIE-2155
             Project: Oozie
          Issue Type: Bug
          Components: core
    Affects Versions: 4.1.0, trunk
            Reporter: Robert Kanter
            Priority: Blocker


We discovered a critical bug where incorrect Daylight Saving Time shifts were 
occurring based on the Database timezone.

Try running this Coordinator:
{code:xml}
<coordinator-app name=“foo” frequency="${coord:hours(1)}" 
start="2014-11-02T04:15Z" end="2014-11-02T16:00Z" timezone=“UTC” 
xmlns="uri:oozie:coordinator:0.1">
   <controls>
      <concurrency>10</concurrency>
   </controls>
   <action>
      <workflow>
         <app-path>${appPath}</app-path>
         <configuration>
            <property>
               <name>jobTracker</name>
               <value>${jobTracker}</value>
            </property>
            <property>
               <name>nameNode</name>
               <value>${nameNode}</value>
            </property>
            <property>
               <name>queueName</name>
               <value>${queueName}</value>
            </property>
         </configuration>
      </workflow>
   </action>     
</coordinator-app>
{code}
Note that it runs over a DST shift (at least in most US timezones).

Here's a sample of some of the actions, along with their Nominal Times:
{noformat}
0000013-150212140838307-oozie-rkan-C@4     2014-11-02 07:15 GMT 
0000013-150212140838307-oozie-rkan-C@6     2014-11-02 09:15 GMT 
0000013-150212140838307-oozie-rkan-C@5     2014-11-02 09:15 GMT 
0000013-150212140838307-oozie-rkan-C@7     2014-11-02 10:15 GMT 
{noformat}
Note that actions 5 and 6 have the same time.  This is incorrect because the 
times are in GMT (with no DST).  I've also confirmed via a debugger that these 
dates have the same number of seconds since epoch (so it's not just a rendering 
issue).  
By the way, if you're in a different timezone, you'll see this problem occur on 
different actions because the issue is related to the timezone that your 
database is in.  It depends on when the DST shift occurs in your timezone; for 
example, in "America/New_York", it happens with actions 2 and 3.  
On that note, if I ask Oozie to use "America/Los_Angeles" to print the dates, I 
get this:
{noformat}
0000013-150212140838307-oozie-rkan-C@4     2014-11-02 00:15 PDT 
0000013-150212140838307-oozie-rkan-C@6     2014-11-02 01:15 PST 
0000013-150212140838307-oozie-rkan-C@5     2014-11-02 01:15 PST 
0000013-150212140838307-oozie-rkan-C@7     2014-11-02 02:15 PST 
{noformat}
Action 5's nominal time should be {{2014-11-02 01:15 PDT}}, not {{2014-11-02 
01:15 PST}}.  

Using the debugger some more, I verified that Oozie is creating the nominal 
times correctly, and writing them to the database correctly (at least, it's 
converting them to Java's SQL TimeStamp objects correctly; OpenJPA handles 
writing them).  But when the problematic value is read back from the database, 
it has the wrong value!

Here's something interesting from the [MySQL 
documentation|http://dev.mysql.com/doc/refman/5.5/en/datetime.html]:
{quote}MySQL converts TIMESTAMP values from the current time zone to UTC for 
storage, and back from UTC to the current time zone for retrieval. (This does 
not occur for other types such as DATETIME.) By default, the current time zone 
for each connection is the server's time. The time zone can be set on a 
per-connection basis. As long as the time zone setting remains constant, you 
get back the same value you store. If you store a TIMESTAMP value, and then 
change the time zone and retrieve the value, the retrieved value is different 
from the value you stored. This occurs because the same time zone was not used 
for conversion in both directions.
{quote}
So, I think what's happening is that it's interpreting everything in PST, and 
not in PST and PDT depending on the time itself.  Ideally, it would just store 
the time since epoch, like Java's Date does, but it's doing this broken 
interpretation instead.  In fact, this [Stack 
Overflow|http://stackoverflow.com/questions/1646171/mysql-datetime-fields-and-daylight-savings-time-how-do-i-reference-the-extra/1650910#1650910]
 I found talks about this problem.

I tried to create a unit test to show the problem without all the action stuff, 
but it seems like HSQLDB doesn't have this problem, so the test passes.  I've 
confirmed that this affects Derby and MySQL; I'm not sure about Postgres, 
Oracle, or SQLServer.

I was able to find a workaround for Derby and one of my colleagues found one 
for MySQL:
- For Derby, you just need to change the JVM's timezone to GMT (see 
[here|http://objectmix.com/apache/647950-moving-derby-database-across-timezones.html]).
  All you have to do is add {{-Duser.timezone=GMT}} to {{CATALINA_OPTS}}.
- For MySQL, you can either change the global timezone to GMT (which the DB 
admin probably won't go for), or you can add 
{{useLegacyDatetimeCode=false&serverTimezone=GMT}} to JDBC URL.  For example:
{code:xml}
<property>
   <name>oozie.service.JPAService.jdbc.url</name>
   
<value>jdbc:mysql://HOST/oozie?useUnicode=true&characterEncoding=UTF-8&useLegacyDatetimeCode=false&serverTimezone=GMT</value>
</property>
{code}
I imagine the Derby workaround won't work for anything else, but perhaps the 
MySQL workaround would?  

The easiest way to fix this is to add the {{user.timezone}} thing (to fix 
Derby), and have Oozie always add the JDBC configs to the JDBC URL (for the 
others).  If Oracle, Postgres, and SQLServer don't support that, we'll have to 
come up with other ideas.  The other ideas I had were to replace TIMESTAMP with 
DATETIME or the SQL equivalent of {{long}}; though these have the downside of 
being schema changes and possibly other side effects.

I have some homework for everyone :)
Please check if the DB type listed below is affected by this, and if so, does 
the above workaround (or some  similar configs) fix it?
- [~bowenzhangusa]: SQLServer
- [~chitnis], [~puru], [~rohini], or [~egashira]: Oracle
- Me: Postgres



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to