[ 
https://issues.apache.org/jira/browse/DRILL-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161763#comment-16161763
 ] 

ASF GitHub Bot commented on DRILL-5002:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/937#discussion_r138133454
  
    --- Diff: 
contrib/storage-hive/core/src/main/codegen/templates/ObjectInspectorHelper.java 
---
    @@ -204,7 +204,11 @@ public static JBlock getDrillObject(JCodeModel m, 
ObjectInspector oi,
               <#elseif entry.hiveType == "TIMESTAMP">
                 JVar tsVar = 
jc._else().decl(m.directClass(java.sql.Timestamp.class.getCanonicalName()), 
"ts",
                   castedOI.invoke("getPrimitiveJavaObject").arg(returnValue));
    -            jc._else().assign(returnValueHolder.ref("value"), 
tsVar.invoke("getTime"));
    +            // Bringing relative timestamp value without timezone info to 
timestamp value in UTC, since Drill keeps date-time values in UTC
    --- End diff --
    
    I'm not fully agreed. Let me explain:
    
    Let's take for example TimeStampVector, Drill keeps date-time values in 
DrillBuf as millis from epoch. Then timestamp values are extracted via 
[getObject()](https://github.com/apache/drill/blob/1c09c2f13bd0f50ca40c17dc0bfa7aae5826b8c3/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java#L446)
 method (I agree that the code in this method is questionable, but it is not 
current issue, this jira contains other bug).  
      For PST timezone machine and query `select timestamp '1970-01-01 
00:00:00' from (VALUES(1))` the `0 millis` timestamp value is stored in 
DrillBuf and only on extracting stage the value is represented with server 
timezone. The same result for querying the timestamp data from any data source: 
'1970-01-01 00:00:00' will be stored as `0 millis` timestamp for any timezone. 
But only for hive functions the logic differs:
    Let's consider `select to_utc_timestamp('1969-12-31 16:00:00','PST') from 
(VALUES(1))`. The result should be "1970-01-01 00:00:00" even with current 
Drill date/time logic. 
    The output from hive's UDF is "java.sql.Timestamp" value (for PST timezone 
machine -- `28800000 millis` internally, for UTC timezone machine -- `0 millis` 
internally). But in favour of above logic of 
[getObject()](https://github.com/apache/drill/blob/1c09c2f13bd0f50ca40c17dc0bfa7aae5826b8c3/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java#L446)
 the `0 millis` timestamp value should be storing in the DrillBuf for any 
timezone.
    
    In the result, with my fix the same timestamp values will be stored in 
DrillBuf for any timezone machine. 
    Let me know if I am wrong with something.


> Using hive's date functions on top of date column gives wrong results for 
> local time-zone
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-5002
>                 URL: https://issues.apache.org/jira/browse/DRILL-5002
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Hive, Storage - Parquet
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>            Priority: Critical
>         Attachments: 0_0_0.parquet
>
>
> git.commit.id.abbrev=190d5d4
> Wrong Result 1 :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1994-02-01' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1994-02-01  | 1       |
> | 1994-02-01  | 1       |
> +-------------+---------+
> {code}
> Wrong Result 2 : 
> {code}
> select l_shipdate, `day`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 1       |
> | 1998-06-02  | 1       |
> +-------------+---------+
> {code}
> Correct Result :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 6       |
> | 1998-06-02  | 6       |
> +-------------+---------+
> {code}
> It looks like we are getting wrong results when the 'day' is '01'. I only 
> tried month and day hive functions....but wouldn't be surprised if they have 
> similar issues too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to