[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160099#comment-16160099
 ] 

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/916
  
    Back to my original question. The premise of this bug seems to be that we 
corrupt Parquet dates and convert perfectly valid 4-digit years into invalid 
5-digit years. That is clearly a data corruption bug that should never occur. 
Why don't we fix that?
    
    Given that we've accepted the data corruption, we need to display 
five-digit years, which the Java date and time classes don't support in 
`toString()`. The code relies on `toString()` instead of formatting dates 
properly with the classes provided. Date display should honor the format 
preferences provided by the user, not the defaults built into `toString()`. 
That's bug number two.
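
    As a rough sketch of what formatting through an explicit formatter 
could look like, here is an example that assumes the `java.time` API 
(Java 8+) rather than whatever date classes Drill uses internally. The names 
`YearFormatSketch` and `formatDate` are made up for illustration, and the 
fixed pattern stands in for a real user preference.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class YearFormatSketch {
    // Hypothetical formatter: year padded to at least 4 digits, allowed to
    // grow up to 10 digits, with a sign printed only for negative years.
    static final DateTimeFormatter WIDE_YEAR = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 10, SignStyle.NORMAL)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter();

    // Formats a date with the explicit formatter instead of toString().
    static String formatDate(LocalDate date) {
        return WIDE_YEAR.format(date);
    }

    public static void main(String[] args) {
        LocalDate future = LocalDate.of(11356, 2, 16);
        // LocalDate.toString() follows ISO-8601 and prefixes '+' to years above 9999.
        System.out.println(future);             // +11356-02-16
        System.out.println(formatDate(future)); // 11356-02-16
    }
}
```

    The point of the sketch is only that the output width is decided by the 
formatter, not by a default `toString()` that assumes four-digit years.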
    
    Now, given the above two bugs, we introduce a third by creating ad-hoc, 
Drill-specific date/time classes, violating the JDBC standard, to display the 
corrupt five-digit years. Drill will no longer return the `java.sql.Date` 
class specified by the standard, but rather our own subclass. How will this 
affect client code that relies on standard behavior?
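
    To make the question concrete, here is a hedged sketch of what changes 
for a client when a driver returns a subclass. `WideYearDate` is a 
hypothetical stand-in, not the class from the pull request, and its 
`toString()` body is invented purely to show that the override is visible to 
callers.

```java
import java.sql.Date;

public class SubclassSketch {
    // Hypothetical driver-specific subclass that overrides toString().
    static class WideYearDate extends Date {
        WideYearDate(long millis) {
            super(millis);
        }

        @Override
        public String toString() {
            return "custom:" + getTime();
        }
    }

    // Client-side check that insists on the exact standard class.
    static boolean isExactlySqlDate(Object value) {
        return value != null && value.getClass() == Date.class;
    }

    public static void main(String[] args) {
        Date standard = new Date(0L);
        Date custom = new WideYearDate(0L);

        // instanceof checks and equals() (inherited from java.util.Date,
        // which compares getTime()) still behave as before.
        System.out.println(custom instanceof Date);    // true
        System.out.println(standard.equals(custom));   // true

        // Exact-class checks and anything that depends on the text
        // produced by toString() see a difference.
        System.out.println(isExactlySqlDate(custom));  // false
        System.out.println(custom);                    // custom:0
    }
}
```

    So code that only uses the `java.sql.Date` interface keeps working, while 
code that compares classes or parses the `toString()` output may not.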
    
    I feel we are compounding error upon error. Can we go back and fix the 
original problem? Users would presumably prefer that we not corrupt the dates 
in their data. That is, the problem is not so much that we format corrupt 
data incorrectly, but rather that we corrupt the data in the first place.


> Five-digit year dates are displayed incorrectly via jdbc
> --------------------------------------------------------
>
>                 Key: DRILL-5377
>                 URL: https://issues.apache.org/jira/browse/DRILL-5377
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>              Labels: ready-to-commit
>             Fix For: 1.12.0
>
>
> git.commit.id.abbrev=38ef562
> The issue concerns displaying five-digit year dates via JDBC.
> Below is the output I get from the test framework when I disable auto-correction 
> for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--------------+
> | FUTURE_DATE  |
> +--------------+
> | 356-02-16   |
> +--------------+
> 1 row selected (0.293 seconds)
> {code}



