[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160099#comment-16160099 ]
ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/916

Back to my original question. The premise of this bug seems to be that we corrupt Parquet dates and convert perfectly valid 4-digit years into invalid 5-digit years. That is clearly a data corruption bug that should never occur. Why don't we fix that?

Given that we've accepted the data corruption, we need to display five-digit years which the Java classes for date and time don't support in `toString()`. The code uses `toString()` because it does not do correct formatting using the classes provided. That's the second bug. Date display should make use of format preferences provided by the user, not the default ones provided by `toString()`. So, that's bug number 2.

Now given the above two bugs, we introduce a third by creating ad-hoc, Drill-specific date/time classes, violating the JDBC standard, to display the corrupt five-digit years. So, no longer will Drill return the java.sql.Date class as specified by the standard, but rather our own subclass. How will this affect client code that relies on standard behavior?

I feel we are compounding error upon error. Can we go back and fix the original problem: that users might prefer that we don't corrupt dates in their data? That is, the problem is not so much that we don't format corrupt data correctly, but rather that we do, in fact, corrupt data.

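To illustrate the second point (formatting with an explicit formatter rather than relying on `toString()`), here is a minimal sketch using only the standard `java.time` classes. It is not Drill's code: the class name `FiveDigitYearDemo`, the constant `WIDE_YEAR`, and the `format` helper are hypothetical, and the choice of `yyyy-MM-dd`-style output with an unsigned, unpadded wide year is an assumption about what a user might want displayed.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class FiveDigitYearDemo {

    // Hypothetical formatter: year printed with at least 4 digits and up to 10,
    // with a sign only for negative years, followed by 2-digit month and day.
    public static final DateTimeFormatter WIDE_YEAR = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 10, SignStyle.NORMAL)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter();

    // Formats a date with the explicit formatter instead of calling toString().
    public static String format(LocalDate date) {
        return WIDE_YEAR.format(date);
    }

    public static void main(String[] args) {
        // The five-digit year from the "simpler case" in the issue description.
        System.out.println(format(LocalDate.of(11356, 2, 16))); // prints 11356-02-16
        // An ordinary four-digit year is unaffected.
        System.out.println(format(LocalDate.of(2017, 9, 9)));   // prints 2017-09-09
    }
}
```

A real fix would presumably take the pattern from user preferences, as the comment suggests, rather than hard-coding one formatter as this sketch does.
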
> Five-digit year dates are displayed incorrectly via jdbc
> --------------------------------------------------------
>
>                 Key: DRILL-5377
>                 URL: https://issues.apache.org/jira/browse/DRILL-5377
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>              Labels: ready-to-commit
>             Fix For: 1.12.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc
> Below is the output, I get from test framework when I disable auto correction for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from (VALUES(1));
> +--------------+
> | FUTURE_DATE  |
> +--------------+
> | 356-02-16    |
> +--------------+
> 1 row selected (0.293 seconds)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)