Werner Daehn created AVRO-2950:
----------------------------------

             Summary: LocalDateTime-millis and -micros is bound to lead to wrong data
                 Key: AVRO-2950
                 URL: https://issues.apache.org/jira/browse/AVRO-2950
             Project: Apache Avro
          Issue Type: Bug
          Components: logical types
    Affects Versions: 1.10.0
            Reporter: Werner Daehn


I find the recent addition of the LocalDateTime logical types extremely 
dangerous. It will lead to wrong data for many users without them noticing.

While I understand the idea and the reasoning behind it, in my opinion the 
oversight is the difference between Hadoop files and Avro messages: Hadoop is 
for data storage, Avro is for data exchange. Hadoop runs in a single cluster 
with a well-defined time zone, so a LocalDateTime does have a meaning there. 
Avro, however, is used to exchange data between systems. Serializing data on a 
system in time zone 1 and loading it into a Hadoop cluster located in time 
zone 2 will, with high likelihood, lead to wrong data.

Example: a Kafka Connect producer is running in the US (PST) and Hadoop in the 
UK (GMT).

User 1 expectation: In Hadoop, a LocalDateTime value means GMT. The Java data 
types Date, java.sql.Timestamp and LocalDateTime are used, all of which carry 
no time zone information. They therefore return correct data only if the 
loaded values represent UK time. But the Kafka producer does not know the time 
zone of the Hadoop cluster.

User 2 expectation: In Hadoop, the data belongs to an office and hence has an 
implicit time zone, namely that of the office location. In that case a 
LocalDateTime means the time as shown on the office clock.

As these two cases cannot be distinguished from each other and people tend to 
think locally, we are inviting people to produce wrong data.
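
To make the ambiguity concrete, here is a minimal sketch using only java.time. 
It assumes a local-timestamp-millis value is stored as epoch milliseconds 
computed as if the local date-time were UTC, so no zone is recorded; the class 
and method names (LocalTimestampAmbiguity, encodeLocal, decodeLocal, 
interpretIn) are hypothetical and not part of the Avro API. The producer in 
PST writes 09:00; user 1 reads the value as GMT, user 2 as office (PST) time.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class LocalTimestampAmbiguity {

    // Hypothetical encoder (assumption): store the wall-clock reading as
    // epoch millis, treating it as if it were UTC. No zone is recorded.
    static long encodeLocal(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Hypothetical decoder: recovers the same wall-clock reading.
    static LocalDateTime decodeLocal(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    // A reader has to guess a zone to turn the reading into a real instant.
    static Instant interpretIn(LocalDateTime reading, ZoneId zone) {
        return reading.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        ZoneId producerZone = ZoneId.of("America/Los_Angeles"); // PST in January
        ZoneId hadoopZone = ZoneId.of("Europe/London");         // GMT in January

        // The producer in the US writes its local reading of an event.
        LocalDateTime written = LocalDateTime.of(2020, 1, 15, 9, 0);
        LocalDateTime read = decodeLocal(encodeLocal(written));

        // User 1 assumes Hadoop (GMT) time, user 2 assumes office (PST) time.
        Instant user1 = interpretIn(read, hadoopZone);
        Instant user2 = interpretIn(read, producerZone);

        System.out.println("user 1 instant: " + user1);
        System.out.println("user 2 instant: " + user2);
        System.out.println("disagreement:   " + Duration.between(user1, user2));
    }
}
```

Both users decode exactly the same bytes, yet their instants differ by eight 
hours (PT8H), and nothing in the stored value tells them which one is right.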

The better logical type would have been one for Java's ZonedDateTime. Then 
producer and consumer are in sync: the producer loads data in the PST time 
zone, and the consumer can read the data as GMT times. If the consumer wants 
the local office times, it has to apply the office time zone offset.
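
A sketch of the zone-aware alternative, again using only java.time: the 
producer normalizes its ZonedDateTime to an absolute instant before writing, 
and the consumer chooses the zone it wants when reading. Storing the instant 
as epoch milliseconds is an assumption about the encoding, and the names 
(ZonedTimestampExchange, encodeZoned, decodeIn) are hypothetical, not an 
existing Avro API.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class ZonedTimestampExchange {

    // Hypothetical encoder: normalize the producer's zoned value to an
    // absolute instant (epoch millis), so no zone is left implicit.
    static long encodeZoned(ZonedDateTime value) {
        return value.toInstant().toEpochMilli();
    }

    // Hypothetical decoder: the consumer picks the zone it wants to see.
    static ZonedDateTime decodeIn(long millis, ZoneId zone) {
        return Instant.ofEpochMilli(millis).atZone(zone);
    }

    public static void main(String[] args) {
        ZoneId pst = ZoneId.of("America/Los_Angeles");
        ZoneId gmt = ZoneId.of("Europe/London");

        // Producer in the US records 09:00 office time, zone included.
        ZonedDateTime event = ZonedDateTime.of(2020, 1, 15, 9, 0, 0, 0, pst);
        long stored = encodeZoned(event);

        // Consumer in the UK reads the value as GMT time ...
        LocalDateTime inGmt = decodeIn(stored, gmt).toLocalDateTime();
        // ... and applies the office zone to recover the office clock time.
        LocalDateTime officeClock = decodeIn(stored, pst).toLocalDateTime();

        System.out.println("GMT view:    " + inGmt);
        System.out.println("office view: " + officeClock);
    }
}
```

Here the GMT view is 2020-01-15T17:00 and the office view is 2020-01-15T09:00; 
both refer to the same instant, so there is nothing to guess.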

LocalDateTime was introduced here: 
https://issues.apache.org/jira/browse/AVRO-2328

Can you please open the discussion on this item to make sure you are fully 
aware of the implications and still want to go with it?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)