GitHub user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/600#discussion_r83908798
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, String validationSelection, Str
       }
     
       /*
    -  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
    +    Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
    +    the first one converts parquet INT96 into drill VARBINARY and the second one (works while
    +    store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
        */
       @Test
       public void testImpalaParquetInt96() throws Exception {
         compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
    +    try {
    +      test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
    +      compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
    --- End diff --
    
    GitHub seems to have swallowed the previous comments, so I am including 
@vdiravka's questions here:
    
    >  1) Is it better to compare the result with baseline columns and values from 
the file, or is it OK to compare with sqlBaselineQuery while the new 
PARQUET_READER_INT96_AS_TIMESTAMP option is disabled?
    > While investigating this test I found that the primitive data 
type of the column in the file int96_dict_change.parquet is BINARY, not INT96.
    > 2) I am a little confused by this. Do we need to convert this BINARY 
to TIMESTAMP as well? The CONVERT_FROM function with the IMPALA_TIMESTAMP argument 
works properly for this field. I will investigate further whether 
Impala and Hive can store timestamps as Parquet BINARY.
    
    For 1) I think it is better to compare values from the file as opposed to 
running with the PARQUET_READER_INT96_AS_TIMESTAMP option disabled.
    For 2) Can you correct the int96 data in the file? AFAIK, the data should 
be int96 for the test.
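    
    For reference, here is a rough sketch of what an INT96-to-timestamp 
conversion involves. It assumes the commonly used Impala/Hive layout: 12 bytes, 
with the nanoseconds within the day as an 8-byte little-endian value followed by 
the Julian day number as a 4-byte little-endian value. The class and method names 
(Int96TimestampSketch, toEpochMillis, encode) are hypothetical, this is not the 
converter in the PR, and it ignores any timezone adjustment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Drill's actual converter.
public class Int96TimestampSketch {
    // Julian day number of 1970-01-01 (the Unix epoch).
    public static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    public static final long MILLIS_PER_DAY = 86_400_000L;
    public static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts a 12-byte INT96 value (assumed layout: 8 bytes nanos-of-day,
    // then 4 bytes Julian day, both little-endian) to epoch milliseconds.
    public static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
            + nanosOfDay / NANOS_PER_MILLI;
    }

    // Builds a 12-byte INT96 value in the same assumed layout, for testing.
    public static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt(julianDay)
            .array();
    }

    public static void main(String[] args) {
        // One day and 1.5 seconds after the epoch.
        byte[] value = encode(2440589, 1_500_000_000L);
        System.out.println(toEpochMillis(value));
    }
}
```

    A column whose physical type is BINARY rather than INT96 would not go through 
this kind of reader-level conversion, which is why the test file should contain 
real INT96 data.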

