[GitHub] drill pull request #1067: DRILL-3958: Return a valid error message when stor...
GitHub user rajrahul opened a pull request: https://github.com/apache/drill/pull/1067 DRILL-3958: Return a valid error message when storage plugin fails You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajrahul/drill DRILL-3958 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1067 commit d5db51c795e9a9537c9feb4cf330d3b11045be8f Author: Rahul Raj Date: 2017-12-12T08:51:53Z DRILL-3958: Return a valid error message when storage plugin fails ---
[GitHub] drill pull request #1166: DRILL-6016 - Fix for Error reading INT96 created b...
GitHub user rajrahul opened a pull request: https://github.com/apache/drill/pull/1166 DRILL-6016 - Fix for Error reading INT96 created by Apache Spark This fixes DRILL-6016 where drill was failing to read int96 generated by Apache Spark even after setting store.parquet.reader.int96_as_timestamp to true. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajrahul/drill DRILL-6016 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1166.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1166 commit 6d74884f314638e8d95e0c6ea3431222cb308ec5 Author: Rahul Raj Date: 2018-03-14T06:35:45Z DRILL-6016 - Fix for Error reading INT96 created by Apache Spark ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra @vdiravka I do not have a test case for this. I have manually verified the scenario with and without the patch. The sample input file is attached with https://issues.apache.org/jira/browse/DRILL-6016. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra please use the link https://github.com/rajrahul/files/raw/master/result.tar.gz The files are present inside result/parquet/latest. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra I will create a unit test with few time stamp fields. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @parthchandra @vdiravka I have added the test case using the same parquet file(2.9k bytes). I tried creating a smaller file using Spark, but could not replicate the behavior. I have rebased the changes on the same commit and PR. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 The schema given below creates the issue, as @vdiravka pointed int96 is marked required here. This parquet was generated with an older version of spark and is included in the test case. ``` message spark_schema { optional binary article_no (UTF8); optional binary qty (UTF8); required int96 run_date; } ``` Newer spark version created the schema below where int96 has become optional. ``` message spark_schema { optional binary country (UTF8); optional double sales; optional int96 targetDate; } ``` ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @vdiravka I have made the changes. Please have a look. ---
[GitHub] drill pull request #1166: DRILL-6016 - Fix for Error reading INT96 created b...
Github user rajrahul commented on a diff in the pull request: https://github.com/apache/drill/pull/1166#discussion_r177318780 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java --- @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception { } } + @Test + public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception { +try { + mockUtcDateTimeZone(); --- End diff -- I could see two ways of doing this within the code itself. 1. Mock and run with UTC, and compare the results in UTC as in TestCastFunctions#testToDateForTimeStamp. Since TestParquetWriter already has a RunWith annotation, we might have to create another class and move both the methods. 2. Run with the JVM timezone(no mocking) and compare the results after a 'convertToLocalTimestamp' as in TestParquetWriter#testInt96TimeStampValueWidth Approach 2 does not used fixed UTC timezone. Which approach do you suggest? ---
[GitHub] drill pull request #1166: DRILL-6016 - Fix for Error reading INT96 created b...
Github user rajrahul commented on a diff in the pull request: https://github.com/apache/drill/pull/1166#discussion_r177950795 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java --- @@ -797,6 +797,24 @@ public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception { } } + @Test + public void testSparkParquetBinaryAsTimeStamp_DictChange() throws Exception { +try { + mockUtcDateTimeZone(); --- End diff -- @vdiravka your thoughts on comment above? ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @vdiravka I have made similar changes for testSparkParquetBinaryAsTimeStamp_DictChange, testHiveParquetTimestampAsInt96_basic and testImpalaParquetBinaryAsTimeStamp_DictChange. All tests are passing, please have a look. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @vdiravka Done. Please review. ---
[GitHub] drill pull request #1166: DRILL-6016 - Fix for Error reading INT96 created b...
Github user rajrahul commented on a diff in the pull request: https://github.com/apache/drill/pull/1166#discussion_r178290675 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java --- @@ -780,17 +780,31 @@ public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception { Test the reading of a binary field as drill timestamp where data is in dictionary _and_ non-dictionary encoded pages */ @Test - @Ignore("relies on particular time zone, works for UTC") public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception { try { testBuilder() - .sqlQuery("select int96_ts from dfs.`parquet/int96_dict_change` order by int96_ts") + .sqlQuery("select min(int96_ts) date_value from dfs.`parquet/int96_dict_change`") --- End diff -- I did not try a WHERE statement, MIN was used to select a single record to compare. Was there any specific reason to use WHERE? ---
[GitHub] drill pull request #1166: DRILL-6016 - Fix for Error reading INT96 created b...
Github user rajrahul commented on a diff in the pull request: https://github.com/apache/drill/pull/1166#discussion_r178324303 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java --- @@ -61,6 +60,7 @@ import org.junit.runners.Parameterized; @RunWith(Parameterized.class) + --- End diff -- Actually not required, tried to add another RunWith for Mocking and removed later on leaving the newline. ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user rajrahul commented on the issue: https://github.com/apache/drill/pull/1166 @vdiravka removed the extra line. ---