[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429335#comment-16429335 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1166

> Error reading INT96 created by Apache Spark
> ---
>
> Key: DRILL-6016
> URL: https://issues.apache.org/jira/browse/DRILL-6016
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Rahul Raj
> Assignee: Rahul Raj
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
>
> Hi,
> I am getting the error - SYSTEM ERROR : ClassCastException: org.apache.drill.exec.vector.TimeStampVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark INT96 datetime field on Drill 1.11 in spite of setting the property store.parquet.reader.int96_as_timestamp to true.
> I believe this was fixed in drill 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong.
> I have attached the dataset at https://github.com/rajrahul/files/blob/master/result.tar.gz

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422075#comment-16422075 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1166

    @rajrahul thanks for making all the changes (and of course for the fix)!
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421944#comment-16421944 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on the issue:

    https://github.com/apache/drill/pull/1166

    @vdiravka removed the extra line.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421643#comment-16421643 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178456861

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -61,6 +60,7 @@ import org.junit.runners.Parameterized;
     @RunWith(Parameterized.class)
    +
    --- End diff --

    ok, just remove it
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421642#comment-16421642 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178456935

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
         Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
         */
       @Test
    -  @Ignore("relies on particular time zone, works for UTC")
       public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         try {
           testBuilder()
    -          .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
    +          .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`")
    --- End diff --

    It is just more obvious what result is expected. But using MIN is ok.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420683#comment-16420683 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178324303

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -61,6 +60,7 @@ import org.junit.runners.Parameterized;
     @RunWith(Parameterized.class)
    +
    --- End diff --

    Actually it is not required. I tried to add another RunWith for mocking and removed it later on, leaving the newline behind.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420493#comment-16420493 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178290675

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
         Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
         */
       @Test
    -  @Ignore("relies on particular time zone, works for UTC")
       public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         try {
           testBuilder()
    -          .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
    +          .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`")
    --- End diff --

    I did not try a WHERE statement, MIN was used to select a single record to compare. Was there any specific reason to use WHERE?
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420294#comment-16420294 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178255942

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
         Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
         */
       @Test
    -  @Ignore("relies on particular time zone, works for UTC")
       public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         try {
           testBuilder()
    -          .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
    +          .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`")
    --- End diff --

    Did you try WHERE statement?
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420295#comment-16420295 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178255699

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -61,6 +60,7 @@ import org.junit.runners.Parameterized;
     @RunWith(Parameterized.class)
    +
    --- End diff --

    new line?
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420144#comment-16420144 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on the issue:

    https://github.com/apache/drill/pull/1166

    @vdiravka Done. Please review.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419087#comment-16419087 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178071020

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -35,6 +36,7 @@ import java.util.Map;
     import com.google.common.base.Joiner;
    +import mockit.integration.junit4.JMockit;
    --- End diff --

    the same
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419088#comment-16419088 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178070635

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -780,17 +783,42 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
         Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages
         */
       @Test
    -  @Ignore("relies on particular time zone, works for UTC")
       public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         try {
           testBuilder()
               .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts")
               .optionSettingQueriesForTestQuery(
                   "alter session set `%s` = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
               .ordered()
    -          .csvBaselineFile("testframework/testParquetReader/testInt96DictChange/q1.tsv")
    -          .baselineTypes(TypeProtos.MinorType.TIMESTAMP)
               .baselineColumns("int96_ts")
    +          .baselineValues(new DateTime(convertToLocalTimestamp("1970-01-01 00:00:01.000")))
    --- End diff --

    One baselineValue is enough. Please use `where` in the query.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419086#comment-16419086 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178072549

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -27,6 +27,7 @@ import java.math.BigDecimal;
     import java.nio.file.Paths;
     import java.sql.Date;
    +import java.sql.Timestamp;
    --- End diff --

    unused import?
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418981#comment-16418981 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on the issue:

    https://github.com/apache/drill/pull/1166

    @vdiravka I have made similar changes for testSparkParquetBinaryAsTimeStamp_DictChange, testHiveParquetTimestampAsInt96_basic and testImpalaParquetBinaryAsTimeStamp_DictChange. All tests are passing, please have a look.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418688#comment-16418688 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r178009588

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         }
       }
    +  @Test
    +  public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception {
    +    try {
    +      mockUtcDateTimeZone();
    --- End diff --

    I see that `TestParquetWriter` uses only one parameter, `repeat`. I think you can replace the `Parameterized` runner for this test with a simple variable.
    Another approach is to use JMockit programmatically. But I prefer not to use mocks if possible, so try to use `convertToLocalTimestamp`. With it you can also enable the `testHiveParquetTimestampAsInt96_basic` and `testImpalaParquetBinaryAsTimeStamp_DictChange` tests, removing the redundant rows.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418423#comment-16418423 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r177950795

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         }
       }
    +  @Test
    +  public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception {
    +    try {
    +      mockUtcDateTimeZone();
    --- End diff --

    @vdiravka your thoughts on comment above?
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415088#comment-16415088 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r177318780

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         }
       }
    +  @Test
    +  public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception {
    +    try {
    +      mockUtcDateTimeZone();
    --- End diff --

    I could see two ways of doing this within the code itself.

    1. Mock and run with UTC, and compare the results in UTC as in TestCastFunctions#testToDateForTimeStamp. Since TestParquetWriter already has a RunWith annotation, we might have to create another class and move both the methods.
    2. Run with the JVM timezone (no mocking) and compare the results after a 'convertToLocalTimestamp' as in TestParquetWriter#testInt96TimeStampValueWidth.

    Approach 2 does not use a fixed UTC timezone. Which approach do you suggest?
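The second approach above (no mocking, expected values derived from the JVM time zone) can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions: `LocalTimestampSketch` and `expectedInZone` are hypothetical names, and the code is not Drill's `convertToLocalTimestamp` implementation, whose exact behavior is not shown in this thread.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not Drill code: derive the expected baseline from a UTC
// literal and a target zone, so the comparison does not depend on a fixed zone.
public class LocalTimestampSketch {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

  // Interprets utcText as a UTC wall-clock time and returns the wall-clock
  // text of the same instant as seen in the given zone.
  public static String expectedInZone(String utcText, ZoneId zone) {
    return LocalDateTime.parse(utcText, FORMAT)
        .atOffset(ZoneOffset.UTC)   // the literal is defined in UTC
        .atZoneSameInstant(zone)    // same instant, viewed in the target zone
        .toLocalDateTime()
        .format(FORMAT);
  }

  public static void main(String[] args) {
    // In a test, the zone would typically be the JVM default.
    System.out.println(expectedInZone("1970-01-01 00:00:01.000", ZoneId.systemDefault()));
  }
}
```

Because the expected value is computed from whatever zone the test runs in, the same assertion can hold on a UTC build machine and on a developer machine in another zone, which is the property the reviewers ask for.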
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414091#comment-16414091 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1166#discussion_r177154051

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
    @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
         }
       }
    +  @Test
    +  public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception {
    +    try {
    +      mockUtcDateTimeZone();
    --- End diff --

    It doesn't work without `@RunWith(JMockit.class)`.
    Also, please enable the test case above, `testImpalaParquetBinaryAsTimeStamp_DictChange`, with the same change. And make sure that the tests pass in another time zone.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413783#comment-16413783 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user rajrahul commented on the issue:

    https://github.com/apache/drill/pull/1166

    @vdiravka I have made the changes. Please have a look.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412718#comment-16412718 ]

ASF GitHub Bot commented on DRILL-6016:
---

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/1166

    @rajrahul The unit test from your PR relies on a particular time zone, similar to `TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange`.
    Could you please edit the test case so that it works in any time zone? Please see PR #904 for more details.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407690#comment-16407690 ] ASF GitHub Bot commented on DRILL-6016: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 +1. LGTM > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Assignee: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401444#comment-16401444 ] ASF GitHub Bot commented on DRILL-6016: --- Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166
The schema given below triggers the issue; as @vdiravka pointed out, int96 is marked required here. This parquet file was generated with an older version of Spark and is included in the test case.
```
message spark_schema {
  optional binary article_no (UTF8);
  optional binary qty (UTF8);
  required int96 run_date;
}
```
A newer Spark version created the schema below, where int96 has become optional.
```
message spark_schema {
  optional binary country (UTF8);
  optional double sales;
  optional int96 targetDate;
}
```
> Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400151#comment-16400151 ] ASF GitHub Bot commented on DRILL-6016: --- Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra @vdiravka I have added the test case using the same parquet file(2.9k bytes). I tried creating a smaller file using Spark, but could not replicate the behavior. I have rebased the changes on the same commit and PR. > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398652#comment-16398652 ] ASF GitHub Bot commented on DRILL-6016: --- Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra I have compared the metadata of the files from `TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange` with the metadata from Rahul's dataset and found that the test case does indeed query two parquet files: one is dictionary encoded and the other isn't. But the dataMode of the column there is `Optional`, which is why the `Nullable` column reader is used. Rahul's dataset has `required` mode for the INT96 column. That is the difference, and so a separate non-nullable column reader is necessary. I also believe the names of those column readers are somewhat messy, and some refactoring might be worthwhile: for example, removing the `Dictionary` prefix from the nested classes but keeping it in the top-level class name. What do you think? > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
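The distinction described in the comment above (an `optional` column goes to a nullable reader, a `required` column needs its own non-nullable reader) can be sketched as a small selection function. This is a simplified illustration only: the enum and method names are hypothetical and do not correspond to Drill's actual column reader classes or factory.

```java
// Hypothetical sketch, not Drill's real reader factory: shows that the
// choice of an INT96 column reader depends on both the column's
// repetition and the store.parquet.reader.int96_as_timestamp option.
final class Int96ReaderSelectionSketch {

    enum Repetition { REQUIRED, OPTIONAL }

    enum ReaderKind {
        NULLABLE_VARBINARY,   // optional column, INT96 kept as raw bytes
        NULLABLE_TIMESTAMP,   // optional column, INT96 read as timestamp
        REQUIRED_VARBINARY,   // required column, INT96 kept as raw bytes
        REQUIRED_TIMESTAMP    // required column, INT96 read as timestamp
    }

    // All four combinations need a matching reader; if the required +
    // timestamp case fell through to a VARBINARY reader, the reader and
    // the value vector would disagree about the vector type.
    static ReaderKind select(Repetition repetition, boolean int96AsTimestamp) {
        if (repetition == Repetition.OPTIONAL) {
            return int96AsTimestamp
                ? ReaderKind.NULLABLE_TIMESTAMP
                : ReaderKind.NULLABLE_VARBINARY;
        }
        return int96AsTimestamp
            ? ReaderKind.REQUIRED_TIMESTAMP
            : ReaderKind.REQUIRED_VARBINARY;
    }
}
```

Under this reading, Rahul's file corresponds to the `REQUIRED` plus timestamp combination, while the existing Impala test exercises only the `OPTIONAL` branch.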
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398508#comment-16398508 ] ASF GitHub Bot commented on DRILL-6016: --- Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra I will create a unit test with few time stamp fields. > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398446#comment-16398446 ] ASF GitHub Bot commented on DRILL-6016: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 @rajrahul this link is good. As expected, the int96 column is dictionary encoded. Is it possible for you to extract just a couple of records from this file and then use that for a unit test? see [TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java#L784) @vdiravka TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange also uses an int96 that is dictionary encoded. Any idea whether (and why) it might be going thru a different code path? > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398379#comment-16398379 ] ASF GitHub Bot commented on DRILL-6016: --- Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra please use the link https://github.com/rajrahul/files/raw/master/result.tar.gz The files are present inside result/parquet/latest. > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398344#comment-16398344 ] ASF GitHub Bot commented on DRILL-6016: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 @rajrahul, thanks for submitting the patch. It looks good. I guess we missed dictionary encoded int96 timestamps (even though timestamps with nanosecond precision are the one thing that should never, ever, be dictionary encoded)! Just to make sure, I tried to use the sample file in DRILL-6016, but I could not even unzip it! Can you please check and see if the file is correct? We can use that to create the unit test as well. > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398190#comment-16398190 ] ASF GitHub Bot commented on DRILL-6016: --- Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra @vdiravka I do not have a test case for this. I have manually verified the scenario with and without the patch. The sample input file is attached with https://issues.apache.org/jira/browse/DRILL-6016. > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398167#comment-16398167 ] ASF GitHub Bot commented on DRILL-6016: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra would you please review this? > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj >Priority: Major > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. > I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281561#comment-16281561 ] Vitalii Diravka commented on DRILL-6016:
Interesting dataset. Drill reads INT96 as VARBINARY by default: https://drill.apache.org/docs/parquet-format/#sql-data-types-to-parquet
But with the provided dataset it returns an error, even with an explicit conversion:
{code}
0: jdbc:drill:zk=local> select CONVERT_FROM(run_date, 'TIMESTAMP_IMPALA') from dfs.`/home/vitalii/Downloads/result/parquet/latest/part-r-0-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet` limit 1;
Error: DATA_READ ERROR: Error reading from Parquet file
File: /home/vitalii/Downloads/result/parquet/latest/part-r-0-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
Column: run_date
Row Group Start: 5523
Fragment 0:0
{code}
But the schema looks good:
{code}
vitalii@vitalii-pc:~/parquet-tools/parquet-mr/parquet-tools/target$ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema /home/vitalii/Downloads/result/parquet/latest/part-r-0-0c44161e-49e7-4b40-b4ab-c3d8e492bf33.snappy.parquet
message spark_schema {
  optional binary article_no (UTF8);
  optional binary qty (UTF8);
  required int96 run_date;
}
{code}
> Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Environment: Drill 1.11 >Reporter: Rahul Raj > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: > org.apache.drill.exec.vector.TimeStampVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector while trying to read a spark > INT96 datetime field on Drill 1.11 in spite of setting the property > store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in drill > 1.10(https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong. 
> I have attached the dataset at > https://github.com/rajrahul/files/blob/master/result.tar.gz -- This message was sent by Atlassian JIRA (v6.4.14#64029)
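For readers unfamiliar with what a `TIMESTAMP_IMPALA` style conversion has to do, the following sketch decodes a 12-byte INT96 value using the layout commonly written by Impala and Spark: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. It is a standalone illustration under that assumption, with hypothetical class and method names, and is not Drill's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone decoder, not Drill code. Assumes the common
// Impala/Spark INT96 layout: nanos-of-day (8 bytes) + Julian day (4 bytes),
// both little-endian.
final class Int96TimestampSketch {

    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts a 12-byte INT96 value to milliseconds since the Unix epoch.
    // Sub-millisecond precision is truncated.
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        long julianDay = buf.getInt();   // bytes 8..11
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
            + nanosOfDay / NANOS_PER_MILLI;
    }

    // Inverse helper used only to build sample values for the sketch.
    static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt(julianDay)
            .array();
    }
}
```

The result is an instant in UTC; whether and how it is then shifted to a local time zone is a separate decision that this sketch leaves out.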