[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

ASF GitHub Bot (JIRA) Mon, 17 Oct 2016 11:59:39 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583114#comment-15583114
 ]


ASF GitHub Bot commented on DRILL-4373:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/600#discussion_r83710133
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -754,15 +764,45 @@ public void testImpalaParquetVarBinary_DictChange() throws Exception {
         compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
       }
     
    +  @Test
    +  public void testImpalaParquetBinaryTimeStamp_DictChange() throws Exception {
    +    try {
    +      test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
    +      compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
    --- End diff --
    
    1. Is it better to compare the result against baseline columns and values taken from the file, or is it OK to compare against `sqlBaselineQuery` with the new `PARQUET_READER_INT96_AS_TIMESTAMP` option disabled?
    2. While investigating this test I found that the primitive data type of the column in the file `int96_dict_change.parquet` is BINARY, not INT96.
    This confuses me a little. Do we need to convert this BINARY to TIMESTAMP as well?
    The CONVERT_FROM function with the IMPALA_TIMESTAMP argument works properly for this field.
    I will investigate further whether Impala and Hive can store timestamps in Parquet as BINARY.
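
For context on point 2, here is a minimal sketch of how a 12-byte Impala/Hive-style timestamp value can be decoded, whether it arrives as INT96 or as a 12-byte BINARY. It assumes the commonly used layout: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. The class and method names are hypothetical; this is not Drill's reader code, nor necessarily what CONVERT_FROM with IMPALA_TIMESTAMP does internally.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Drill.
final class Int96TimestampSketch {
  // Julian day number of 1970-01-01, the Unix epoch.
  static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  static final long SECONDS_PER_DAY = 86_400L;
  static final long NANOS_PER_SECOND = 1_000_000_000L;

  // Decodes 12 bytes (assumed layout: nanos-of-day, then Julian day) into a UTC instant.
  static Instant decode(byte[] value) {
    if (value == null || value.length != 12) {
      throw new IllegalArgumentException("expected exactly 12 bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(value).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong(); // bytes 0-7
    long julianDay = buf.getInt();   // bytes 8-11
    long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
    // ofEpochSecond normalizes a nano adjustment larger than one second.
    return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
  }

  // Inverse of decode, used here to build sample input.
  static byte[] encode(Instant ts) {
    long epochDay = Math.floorDiv(ts.getEpochSecond(), SECONDS_PER_DAY);
    long secondOfDay = Math.floorMod(ts.getEpochSecond(), SECONDS_PER_DAY);
    long nanosOfDay = secondOfDay * NANOS_PER_SECOND + ts.getNano();
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt((int) (epochDay + JULIAN_DAY_OF_EPOCH))
        .array();
  }

  public static void main(String[] args) {
    Instant sample = Instant.parse("2016-10-17T18:59:39Z");
    System.out.println(decode(encode(sample)));
  }
}
```

Under this assumption the physical type matters less than the byte layout: a BINARY column holding exactly 12 bytes per value in this layout could be decoded the same way as INT96.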


> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive, Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Karthikeyan Manivannan
>              Labels: doc-impacting
>             Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a Parquet file with a timestamp column using Drill. If I then define a 
> Hive table on top of that Parquet file and use "timestamp" as the column type, 
> Drill fails to read the Hive table through the Hive storage plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
