[jira] [Comment Edited] (DRILL-7038) Queries on partitioned columns scan the entire datasets
[ https://issues.apache.org/jira/browse/DRILL-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808416#comment-16808416 ] Bohdan Kazydub edited comment on DRILL-7038 at 4/3/19 6:48 AM: --- Hi, [~bbevens]. I think it's OK, but it should be specified that, in addition to the {{DISTINCT}} or {{GROUP BY}} operation, the query has to select ({{SELECT}}) only partition columns (dir0, dir1, ..., dirN). was (Author: kazydubb): Hi, [~bbevens]. I think it's OK, but I think it is needed to specify that additionally for {{DISTINCT}} or {{GROUP BY}} operation the query has to query ({{SELECT}}) partition columns (dir0, dir1,..., dirN) only. > Queries on partitioned columns scan the entire datasets > --- > > Key: DRILL-7038 > URL: https://issues.apache.org/jira/browse/DRILL-7038 > Project: Apache Drill > Issue Type: Improvement >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > For tables with hive-style partitions like > {code} > /table/2018/Q1 > /table/2018/Q2 > /table/2019/Q1 > etc. > {code} > if any of the following queries is run: > {code} > select distinct dir0 from dfs.`/table` > {code} > {code} > select dir0 from dfs.`/table` group by dir0 > {code} > it will actually scan every single record in the table rather than just > getting a list of directories at the dir0 level. This applies even when > cached metadata is available. This is a big penalty especially as the > datasets grow. > To avoid such situations, a logical prune rule can be used to collect > partition columns (`dir0`), either from metadata cache (if available) or > group scan, and drop unnecessary files from being read. The rule will be > applied under the following conditions: > 1) all queried columns are partition columns, and > 2) either {{DISTINCT}} or {{GROUP BY}} operations are performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7038) Queries on partitioned columns scan the entire datasets
[ https://issues.apache.org/jira/browse/DRILL-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808416#comment-16808416 ] Bohdan Kazydub commented on DRILL-7038: --- Hi, [~bbevens]. I think it's OK, but it should be specified that, in addition to the {{DISTINCT}} or {{GROUP BY}} operation, the query has to select ({{SELECT}}) only partition columns (dir0, dir1, ..., dirN). > Queries on partitioned columns scan the entire datasets > --- > > Key: DRILL-7038 > URL: https://issues.apache.org/jira/browse/DRILL-7038 > Project: Apache Drill > Issue Type: Improvement >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > For tables with hive-style partitions like > {code} > /table/2018/Q1 > /table/2018/Q2 > /table/2019/Q1 > etc. > {code} > if any of the following queries is run: > {code} > select distinct dir0 from dfs.`/table` > {code} > {code} > select dir0 from dfs.`/table` group by dir0 > {code} > it will actually scan every single record in the table rather than just > getting a list of directories at the dir0 level. This applies even when > cached metadata is available. This is a big penalty especially as the > datasets grow. > To avoid such situations, a logical prune rule can be used to collect > partition columns (`dir0`), either from metadata cache (if available) or > group scan, and drop unnecessary files from being read. The rule will be > applied under the following conditions: > 1) all queried columns are partition columns, and > 2) either {{DISTINCT}} or {{GROUP BY}} operations are performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
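The applicability conditions described in the issue (only partition columns dir0...dirN are projected, and the query uses DISTINCT or GROUP BY) can be sketched as a standalone check. This is a hypothetical illustration, not Drill's actual planner rule; the class and method names are invented:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the prune rule's applicability test (not Drill code).
public class PartitionPruneCheck {
  // Partition columns follow the dir0, dir1, ..., dirN naming convention.
  private static final Pattern PARTITION_COL = Pattern.compile("dir\\d+");

  // The rule fires only if every projected column is a partition column
  // and the query performs a DISTINCT or GROUP BY.
  public static boolean ruleApplies(List<String> projectedColumns, boolean hasDistinctOrGroupBy) {
    return hasDistinctOrGroupBy
        && projectedColumns.stream().allMatch(c -> PARTITION_COL.matcher(c).matches());
  }

  public static void main(String[] args) {
    System.out.println(ruleApplies(Arrays.asList("dir0"), true));          // prune: partition column + DISTINCT/GROUP BY
    System.out.println(ruleApplies(Arrays.asList("dir0", "col1"), true));  // no prune: a data column is queried
    System.out.println(ruleApplies(Arrays.asList("dir0"), false));         // no prune: no DISTINCT/GROUP BY
  }
}
```

When both conditions hold, the scan can be satisfied from directory names (or cached metadata) alone, which is why the rule avoids reading any data files.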
[jira] [Closed] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types
[ https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-7132. - > Metadata cache does not have correct min/max values for varchar and interval > data types > --- > > Key: DRILL-7132 > URL: https://issues.apache.org/jira/browse/DRILL-7132 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.14.0 >Reporter: Robert Hou >Priority: Major > Fix For: 1.17.0 > > Attachments: 0_0_10.parquet > > > The parquet metadata cache does not have correct min/max values for varchar > and interval data types. > I have attached a parquet file. Here is what parquet tools shows for varchar: > [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 > average: 67 total: 67 (raw data: 65 saving -3%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 65 max: 65 average: 65 total: 65 > column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "varchar_col" ], > "minValue" : "aW9lZ2pOSkt2bmtk", > "maxValue" : "aW9lZ2pOSkt2bmtk", > "nulls" : 0 > Here is what parquet tools shows for interval: > [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 > average: 52 total: 52 (raw data: 50 saving -4%) > values: min: 1 max: 1 average: 1 total: 1 > uncompressed: min: 50 max: 50 average: 50 total: 50 > column values statistics: min: P18582D, max: P18582D, num_nulls: 0 > Here is what the metadata cache file shows: > "name" : [ "interval_col" ], > "minValue" : "UDE4NTgyRA==", > "maxValue" : "UDE4NTgyRA==", > "nulls" : 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
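The metadata cache serializes binary min/max statistics as Base64, so the cached strings quoted above can be decoded to see what values they actually hold. A minimal sketch (the class name is invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decode the Base64-encoded min/max strings found in the parquet metadata cache.
public class DecodeCacheMinMax {
  public static String decode(String b64) {
    return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Values taken verbatim from the metadata cache excerpts in the issue.
    System.out.println(decode("aW9lZ2pOSkt2bmtk")); // ioegjNJKvnkd (varchar_col)
    System.out.println(decode("UDE4NTgyRA=="));     // P18582D (interval_col)
  }
}
```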
[jira] [Commented] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65
[ https://issues.apache.org/jira/browse/DRILL-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808309#comment-16808309 ] ASF GitHub Bot commented on DRILL-7153: --- cgivre commented on pull request #1731: DRILL-7153: Drill Fails to Build using JDK 1.8.0_65 URL: https://github.com/apache/drill/pull/1731 This PR fixes a bug in which building Drill using JDK 1.8.0_65 results in the following error. ``` [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project drill-java-exec: Compilation failure [ERROR] /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68] error: unreported exception E; must be caught or declared to be thrown [ERROR] where E,T,V are type-variables: [ERROR] E extends Exception declared in method accept(ExprVisitor,V) [ERROR] T extends Object declared in method accept(ExprVisitor,V) [ERROR] V extends Object declared in method accept(ExprVisitor,V) [ERROR] [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :drill-java-exec ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Drill Fails to Build using JDK 1.8.0_65 > --- > > Key: DRILL-7153 > URL: https://issues.apache.org/jira/browse/DRILL-7153 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Blocker > Fix For: 1.16.0 > > > Drill fails to build when using Java 1.8.0_65. Throws the following error: > [{{ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile > (default-compile) on project drill-java-exec: Compilation failure > [ERROR] > /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68] > error: unreported exception E; must be caught or declared to be thrown > [ERROR] where E,T,V are type-variables: > [ERROR] E extends Exception declared in method > accept(ExprVisitor,V) > [ERROR] T extends Object declared in method > accept(ExprVisitor,V) > [ERROR] V extends Object declared in method > accept(ExprVisitor,V) > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :drill-java-exec}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65
Charles Givre created DRILL-7153: Summary: Drill Fails to Build using JDK 1.8.0_65 Key: DRILL-7153 URL: https://issues.apache.org/jira/browse/DRILL-7153 Project: Apache Drill Issue Type: Bug Affects Versions: 1.16.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.16.0 Drill fails to build when using Java 1.8.0_65. Throws the following error: [{{ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project drill-java-exec: Compilation failure [ERROR] /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68] error: unreported exception E; must be caught or declared to be thrown [ERROR] where E,T,V are type-variables: [ERROR] E extends Exception declared in method accept(ExprVisitor,V) [ERROR] T extends Object declared in method accept(ExprVisitor,V) [ERROR] V extends Object declared in method accept(ExprVisitor,V) [ERROR] [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :drill-java-exec}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha resolved DRILL-7152. --- Resolution: Fixed Fixed in 54384a9. > Histogram creation throws exception for all nulls column > > > Key: DRILL-7152 > URL: https://issues.apache.org/jira/browse/DRILL-7152 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Aman Sinha >Assignee: Aman Sinha >Priority: Major > Fix For: 1.16.0 > > > ANALYZE command fails when creating the histogram for a table with 1 column > with all NULLs. > Analyze table `table_stats/parquet_col_nulls` compute statistics; > {noformat} > Error: SYSTEM ERROR: NullPointerException > (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get > TDigest output > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085 > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492 > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224 > > org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > {noformat} > This table has 1 column with all NULL values: > {noformat} > apache drill (dfs.drilltestdir)> select * from > `table_stats/parquet_col_nulls` limit 20; > +--+--+ > | col1 | col2 | > +--+--+ > | 0| null | > | 1| null | > | 2| null | > | 3| null | > | 4| null | > | 5| null | > | 6| null | > | 7| null | > | 8| null | > | 9| null | > | 10 | null | > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
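The direction suggested by the PR title ("During histogram creation handle the case when all values…") can be illustrated with a hedged, hypothetical sketch: check the non-null count before reading the aggregate output, and emit "no histogram" for an all-NULLs column instead of dereferencing a digest that never received a value. This is not Drill's actual fix; all names here are invented:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch: guard histogram construction against all-NULL columns.
public class AllNullsHistogram {
  // Returns null (meaning "no histogram") when the column has no non-null values.
  public static double[] buildEndpoints(List<Integer> column, int numBuckets) {
    List<Integer> nonNull = column.stream()
        .filter(Objects::nonNull)
        .sorted()
        .collect(Collectors.toList());
    if (nonNull.isEmpty()) {
      return null; // all-NULLs column: skip the histogram instead of throwing NPE
    }
    // Equi-depth endpoints over the sorted non-null values.
    double[] endpoints = new double[numBuckets + 1];
    for (int i = 0; i <= numBuckets; i++) {
      int idx = Math.min(nonNull.size() - 1, i * nonNull.size() / numBuckets);
      endpoints[i] = nonNull.get(idx);
    }
    return endpoints;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(buildEndpoints(Arrays.asList(null, null, null), 4))); // null
    System.out.println(Arrays.toString(buildEndpoints(Arrays.asList(1, 2, 3, 4), 2)));
  }
}
```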
[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808298#comment-16808298 ] ASF GitHub Bot commented on DRILL-7152: --- amansinha100 commented on pull request #1730: DRILL-7152: During histogram creation handle the case when all values… URL: https://github.com/apache/drill/pull/1730 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Histogram creation throws exception for all nulls column > > > Key: DRILL-7152 > URL: https://issues.apache.org/jira/browse/DRILL-7152 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Aman Sinha >Assignee: Aman Sinha >Priority: Major > Fix For: 1.16.0 > > > ANALYZE command fails when creating the histogram for a table with 1 column > with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics; > {noformat} > Error: SYSTEM ERROR: NullPointerException > (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get > TDigest output > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085 > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492 > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224 > > org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > {noformat} > This table has 1 column with all NULL values: > {noformat} > apache drill (dfs.drilltestdir)> select * from > `table_stats/parquet_col_nulls` limit 20; > +--+--+ > | col1 | col2 | > +--+--+ > | 0| null | > | 1| null | > | 2| null | > | 3| null | > | 4| null | > | 5| null | > | 6| null | > | 7| null | > | 8| null | > | 9| null | > | 10 | null | > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808294#comment-16808294 ] ASF GitHub Bot commented on DRILL-7143: --- paul-rogers commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271557253 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java ## @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) { @Override protected final void fillEmpties(final int writeIndex) { final int width = width(); - final int stride = ZERO_BUF.length / width; + final int stride = emptyValue.length / width; int dest = lastWriteIndex + 1; while (dest < writeIndex) { int length = writeIndex - dest; length = Math.min(length, stride); -drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width); +drillBuf.setBytes(dest * width, emptyValue, 0, length * width); dest += length; } } } + /** + * Base class for writers that use the Java int type as their native + * type. Handles common implicit conversions from other types to int. + */ + public static abstract class BaseIntWriter extends BaseFixedWidthWriter { + +@Override +public final void setLong(final long value) { + try { +// Catches int overflow. Does not catch overflow for smaller types. +setInt(Math.toIntExact(value)); + } catch (final ArithmeticException e) { +throw InvalidConversionError.writeError(schema(), value, e); + } +} +@Override +public final void setDouble(final double value) { Review comment: Yes, just as setInt() covers TinyInt, SmallInt, Int, UInt1, and UInt2. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes to the base functionality revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
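The setLong() implementation shown in the review thread above relies on Math.toIntExact to detect long-to-int overflow and rethrow it as a conversion error. A small standalone check of that standard-library behavior:

```java
// Demonstrates Math.toIntExact: returns the value when it fits in an int,
// and throws ArithmeticException on overflow (the case setLong() rethrows
// as InvalidConversionError in the PR above).
public class ToIntExactDemo {
  public static void main(String[] args) {
    System.out.println(Math.toIntExact(42L)); // fits: prints 42
    try {
      Math.toIntExact(Long.MAX_VALUE);        // does not fit in an int
    } catch (ArithmeticException e) {
      System.out.println("overflow caught: " + e.getMessage());
    }
  }
}
```

As the review notes, this catches only int overflow; narrower targets (TinyInt, SmallInt, etc.) need their own range checks.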
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808295#comment-16808295 ] ASF GitHub Bot commented on DRILL-7143: --- paul-rogers commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271557119 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java ## @@ -33,7 +32,10 @@ public static void printOffsets(UInt4Vector vector, int start, int length) { header(vector, start, length); for (int i = start, j = 0; j < length; i++, j++) { - if (j > 0) { + if (j % 40 == 0) { Review comment: Before this change, I had a vector of 1000 items all on one line. After this change, the output is 40 elements per line. Note that this code is used only during debugging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808273#comment-16808273 ] ASF GitHub Bot commented on DRILL-7150: --- amansinha100 commented on pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST URL: https://github.com/apache/drill/pull/1729#discussion_r271548124 ## File path: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java ## @@ -93,7 +95,9 @@ public static CompareFunctionsProcessor processWithTimeZoneOffset(FunctionCall c protected boolean visitTimestampExpr(SchemaPath path, TimeStampExpression valueArg) { // converts timestamp value from local time zone to UTC since the record reader // reads the timestamp in local timezone if the readTimestampWithZoneOffset flag is enabled -long timeStamp = valueArg.getTimeStamp() - DateUtility.TIMEZONE_OFFSET_MILLIS; +long timeStamp = Instant.ofEpochMilli(valueArg.getTimeStamp()).atZone(ZoneId.of("UTC")) Review comment: This is a long chain of functions .. could you split this into a couple of statements? Helps both readability and debugging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. 
Create the table in MaprDB shell: > {noformat} > create /tmp/testtimestamp > insert /tmp/testtimestamp --value > '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' > {noformat} > 2. Create an external hive table: > {code:sql} > CREATE EXTERNAL TABLE default.timeTest > (`_id` string, > `str` string, > `ts` timestamp) > ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') > {code} > 3. Enable native reader and timezone conversion for MaprDB timestamp: > {code:sql} > alter session set > `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; > alter session set > `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; > {code} > 4. Run the query on the table from Drill using hive plugin: > {code:java} > 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; > +--+--+--+ > | _id | str|ts| > +--+--+--+ > | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | > | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | > | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | > | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | > +--+--+--+ > 4 rows selected (0.343 seconds) > {code} > Please note that timestamps for {{eot}} and {{pst}} values are incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808272#comment-16808272 ] ASF GitHub Bot commented on DRILL-7150: --- amansinha100 commented on pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST URL: https://github.com/apache/drill/pull/1729#discussion_r271548171 ## File path: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java ## @@ -357,7 +357,8 @@ protected void writeTimeStamp(MapOrListWriterImpl writer, String fieldName, Docu * @param readerdocument reader */ private void writeTimestampWithLocalZoneOffset(MapOrListWriterImpl writer, String fieldName, DocumentReader reader) { -long timestamp = reader.getTimestampLong() + DateUtility.TIMEZONE_OFFSET_MILLIS; +long timestamp = Instant.ofEpochMilli(reader.getTimestampLong()).atZone(ZoneId.systemDefault()) Review comment: Same as above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. 
Create the table in MaprDB shell: > {noformat} > create /tmp/testtimestamp > insert /tmp/testtimestamp --value > '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' > {noformat} > 2. Create an external hive table: > {code:sql} > CREATE EXTERNAL TABLE default.timeTest > (`_id` string, > `str` string, > `ts` timestamp) > ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') > {code} > 3. Enable native reader and timezone conversion for MaprDB timestamp: > {code:sql} > alter session set > `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; > alter session set > `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; > {code} > 4. Run the query on the table from Drill using hive plugin: > {code:java} > 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; > +--+--+--+ > | _id | str|ts| > +--+--+--+ > | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | > | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | > | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | > | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | > +--+--+--+ > 4 rows selected (0.343 seconds) > {code} > Please note that timestamps for {{eot}} and {{pst}} values are incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
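The PR replaces the fixed DateUtility.TIMEZONE_OFFSET_MILLIS with an Instant/ZoneId-based conversion because the local UTC offset depends on the instant being converted: America/Los_Angeles is UTC-8 (PST) in winter and UTC-7 (PDT) in summer, so a single precomputed offset is wrong for timestamps on the other side of the transition. A minimal demonstration of the per-instant lookup:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

// Shows why a fixed offset breaks across the PST/PDT transition:
// the zone's offset must be looked up for each timestamp.
public class DstOffsetDemo {
  public static void main(String[] args) {
    ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();
    System.out.println(rules.getOffset(Instant.parse("2019-01-15T00:00:00Z"))); // -08:00 (PST)
    System.out.println(rules.getOffset(Instant.parse("2019-07-15T00:00:00Z"))); // -07:00 (PDT)
  }
}
```

This matches the symptom in the issue: with a fixed offset taken on 2019-03-29, the `pdt` row converts correctly while the `pst` rows land one hour off.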
[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808207#comment-16808207 ] ASF GitHub Bot commented on DRILL-7152: --- gparai commented on issue #1730: DRILL-7152: During histogram creation handle the case when all values… URL: https://github.com/apache/drill/pull/1730#issuecomment-479236474 @amansinha100 please take a look at the Travis failure. Otherwise, changes LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Histogram creation throws exception for all nulls column > > > Key: DRILL-7152 > URL: https://issues.apache.org/jira/browse/DRILL-7152 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Aman Sinha >Assignee: Aman Sinha >Priority: Major > Fix For: 1.16.0 > > > ANALYZE command fails when creating the histogram for a table with 1 column > with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics; > {noformat} > Error: SYSTEM ERROR: NullPointerException > (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get > TDigest output > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085 > > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492 > org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224 > > org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > > org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > {noformat} > This table has 1 column with all NULL values: > {noformat} > apache drill (dfs.drilltestdir)> select * from > `table_stats/parquet_col_nulls` limit 20; > +--+--+ > | col1 | col2 | > +--+--+ > | 0| null | > | 1| null | > | 2| null | > | 3| null | > | 4| null | > | 5| null | > | 6| null | > | 7| null | > | 8| null | > | 9| null | > | 10 | null | > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-7152: -- Reviewer: Gautam Parai
[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808181#comment-16808181 ] ASF GitHub Bot commented on DRILL-7152: --- amansinha100 commented on issue #1730: DRILL-7152: During histogram creation handle the case when all values… URL: https://github.com/apache/drill/pull/1730#issuecomment-479218730 @gparai could you please review ? Thanks.
[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column
[ https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808178#comment-16808178 ] ASF GitHub Bot commented on DRILL-7152: --- amansinha100 commented on pull request #1730: DRILL-7152: During histogram creation handle the case when all values… URL: https://github.com/apache/drill/pull/1730 … of a column are NULLs. Please see [DRILL-7152](https://issues.apache.org/jira/browse/DRILL-7152) for a description of the issue. It occurred because all of the column's values were NULLs and the t-digest code-gen functions tried to generate output for an empty t-digest, since a t-digest does not store NULL values. The fix is to check the t-digest size() before trying to create the output.
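The guard described in the PR comment above (check the t-digest's size before producing histogram output) can be sketched as follows. This is a hedged Python illustration only, not Drill's actual Java code-gen; `SimpleDigest` is a stand-in for the real t-digest class, which keeps centroids rather than raw values.

```python
# Hedged sketch of the fix described in the PR: guard on size() before
# producing output, so an all-NULLs column yields no histogram instead of
# an exception. SimpleDigest is a hypothetical stand-in for a t-digest.
class SimpleDigest:
    def __init__(self):
        self.values = []            # a real t-digest stores centroids, not raw values

    def add(self, v):
        if v is not None:           # NULLs are never added to the digest
            self.values.append(v)

    def size(self):
        return len(self.values)

    def quantile(self, q):
        s = sorted(self.values)
        return s[min(int(q * len(s)), len(s) - 1)]

def histogram_output(digest, q=0.5):
    # The NPE arose when generating output for an empty digest (all-NULL
    # column); checking size() first returns no histogram instead of failing.
    if digest.size() == 0:
        return None
    return digest.quantile(q)

d = SimpleDigest()
for v in [None, None, None]:        # a column containing only NULLs
    d.add(v)
print(histogram_output(d))          # None, instead of an exception
```

With at least one non-NULL value the same path produces a normal quantile, so only the degenerate all-NULLs case changes behavior.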
[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)
[ https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808175#comment-16808175 ] ASF GitHub Bot commented on DRILL-7063: --- dvjyothsna commented on pull request #1723: DRILL-7063: Separate metadata cache file into summary, file metadata URL: https://github.com/apache/drill/pull/1723#discussion_r271507937 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java ## @@ -161,7 +161,7 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ForemanSetupException { */ private SqlNodeList getColumnList(final SqlRefreshMetadata sqlrefreshMetadata) { SqlNodeList columnList = sqlrefreshMetadata.getFieldList(); -if (columnList == null || !SqlNodeList.isEmptyList(columnList)) { Review comment: Removed the extra check. > Create separate summary file for schema, totalRowCount, totalNullCount > (includes maintenance) > - > > Key: DRILL-7063 > URL: https://issues.apache.org/jira/browse/DRILL-7063 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 252h > Remaining Estimate: 252h >
[jira] [Assigned] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate
[ https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-7136: Assignee: Gautam Parai (was: Pritesh Maker) > Num_buckets for HashAgg in profile may be inaccurate > > > Key: DRILL-7136 > URL: https://issues.apache.org/jira/browse/DRILL-7136 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Gautam Parai >Priority: Major > Fix For: 1.16.0 > > Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill > > > I ran TPCH query 17 with sf 1000. Here is the query: > {noformat} > select > sum(l.l_extendedprice) / 7.0 as avg_yearly > from > lineitem l, > part p > where > p.p_partkey = l.l_partkey > and p.p_brand = 'Brand#13' > and p.p_container = 'JUMBO CAN' > and l.l_quantity < ( > select > 0.2 * avg(l2.l_quantity) > from > lineitem l2 > where > l2.l_partkey = p.p_partkey > ); > {noformat} > One of the hash agg operators has resized 6 times. It should have 4M > buckets. But the profile shows it has 64K buckets. > I have attached a sample profile. In this profile, the hash agg operator is > (04-02). > {noformat} > Operator Metrics > Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING > RESIZING_TIME_MSNUM_PARTITIONS SPILLED_PARTITIONS SPILL_MB > SPILL_CYCLE INPUT_BATCH_COUNT AVG_INPUT_BATCH_BYTES > AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT OUTPUT_BATCH_COUNT > AVG_OUTPUT_BATCH_BYTES AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT > 04-00-02 65,536 748,746 6 364 1 > 582 0 813 582,653 18 26,316,456 401 1,631,943 > 25 26,176,350 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
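The reporter's expected bucket count follows from the usual doubling scheme. Assuming each resize doubles the bucket count (standard hash-table behavior, though the profile itself does not state this), six resizes starting from the reported 64K would end at 4M, which is why the profile's NUM_BUCKETS value looks stale:

```python
# Values taken from the attached profile for the HashAgg operator (04-02)
num_buckets_reported = 65_536   # NUM_BUCKETS column
num_resizing = 6                # NUM_RESIZING column

# Assuming each resize doubles the bucket count, six resizes from 64K
# buckets should leave the table at 4M buckets, not the 64K still shown
expected_final = num_buckets_reported * 2 ** num_resizing
print(f"{expected_final:,}")    # 4,194,304
```

So the metric appears to report the initial (pre-resize) bucket count rather than the final one.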
[jira] [Created] (DRILL-7152) Histogram creation throws exception for all nulls column
Aman Sinha created DRILL-7152: - Summary: Histogram creation throws exception for all nulls column Key: DRILL-7152 URL: https://issues.apache.org/jira/browse/DRILL-7152 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Aman Sinha Assignee: Aman Sinha Fix For: 1.16.0 ANALYZE command fails when creating the histogram for a table with 1 column with all NULLs. Analyze table `table_stats/parquet_col_nulls` compute statistics; {noformat} Error: SYSTEM ERROR: NullPointerException (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get TDigest output {noformat}
[jira] [Commented] (DRILL-7045) UDF string_binary java.lang.IndexOutOfBoundsException:
[ https://issues.apache.org/jira/browse/DRILL-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808134#comment-16808134 ] ASF GitHub Bot commented on DRILL-7045: --- sohami commented on issue #1671: DRILL-7045 UDF string_binary java.lang.IndexOutOfBoundsException URL: https://github.com/apache/drill/pull/1671#issuecomment-479186883 @jcmcote - I have addressed @KazydubB's comment in this commit and rebased on the latest apache. Can you please make the change or pull in this commit so that we can close this PR? https://github.com/sohami/drill/commit/7aaef8691a4a594442464301035ea3aefd7497dd > UDF string_binary java.lang.IndexOutOfBoundsException: > -- > > Key: DRILL-7045 > URL: https://issues.apache.org/jira/browse/DRILL-7045 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.15.0 >Reporter: jean-claude >Assignee: jean-claude >Priority: Minor > Fix For: 1.16.0 > > > Given a large field like > > cat input.json > { "col0": >
"lajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;
dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasd
[jira] [Commented] (DRILL-540) Allow querying hive views in Drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808132#comment-16808132 ] Bridget Bevens commented on DRILL-540: -- Hi [~IhorHuzenko] and [~vitalii], Aside from removing the note stating Hive views are not supported, do I need to add any other information to the docs? For example, should I also include the warning? Warning: Because views in Hive aren't present as physical files and access can't be granted using file system commands, access to Hive views for Storage Based Authorization is based on the underlying tables used in the view definition. For the current example, views were defined as selections over the appropriate tables. Thanks, Bridget > Allow querying hive views in Drill > -- > > Key: DRILL-540 > URL: https://issues.apache.org/jira/browse/DRILL-540 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive >Reporter: Ramana Inukonda Nagaraj >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > Currently Hive views cannot be queried from Drill. > This Jira aims to add support for Hive views in Drill. > *Implementation details:* > # Drill persists its view metadata in files with the suffix .view.drill using > JSON format. For example: > {noformat} > { > "name" : "view_from_calcite_1_4", > "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0", > "fields" : [ { > "name" : "*", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "tmp" ] > } > {noformat} > Later Drill parses the metadata and uses it to treat view names in SQL as a > subquery. > 2. In Apache Hive, metadata about views is stored in a similar way to > tables.
Below is an example from metastore.TBLS: > > {noformat} > TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT > 2 | 1542111078 | 1 | 0 | mapr | 0 | 2 | cview | VIRTUAL_VIEW | SELECT COUNT(*) FROM `default`.`customers` > {noformat} > 3. So in the Hive metastore, views are treated as tables of a special type. > The main benefit is that we also have the expanded SQL definition of views (just > like in .view.drill files). Reading this metadata is already > implemented in Drill with the help of the Thrift Metastore API. > 4. To enable querying of Hive views, we'll reuse the existing code for Drill > views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for > _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which > is actually the model for data persisted in .view.drill files_) and then, based on > this instance, return a new _*DrillViewTable*_. Using this approach, Drill will > handle Hive views the same way as if they had initially been defined in Drill and > persisted in a .view.drill file. > 5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, > we'll reuse existing code from _*DrillHiveTable*_; the conversion > functionality will be extracted and used for both table and view field > type conversions. > *Security implications* > Consider a simple example case where we have users > {code:java} > user0 user1 user2 >\ / > group12 > {code} > and a sample db where object names contain the user or group who should access > them: > {code:java} > db_all > tbl_user0 > vw_user0 > tbl_group12 > vw_group12 > {code} > There are two Hive authorization modes supported by Drill - SQL Standard and > Storage Based authorization.
For SQL Standard authorization, permissions > were granted using SQL: > {code:java} > SET ROLE admin; > GRANT SELECT ON db_all.tbl_user0 TO USER user0; > GRANT SELECT ON db_all.vw_user0 TO USER user0; > CREATE ROLE group12; > GRANT ROLE group12 TO USER user1; > GRANT ROLE group12 TO USER user2; > GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; > GRANT SELECT ON db_all.vw_group12 TO ROLE group12; > {code} > And for Storage Based authorization, permissions were granted using these commands: > {code:java} > hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 > hadoop fs -chown user1:group12 > /user/hive/warehouse/db_all.db/tbl_group12 {code} > Then the following table shows the results of queries for both authorization > models. > *SQL > Standard Storage Ba
[jira] [Updated] (DRILL-7146) Query failing with NPE when ZK queue is enabled
[ https://issues.apache.org/jira/browse/DRILL-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7146: - Labels: ready-to-commit (was: ) > Query failing with NPE when ZK queue is enabled > --- > > Key: DRILL-7146 > URL: https://issues.apache.org/jira/browse/DRILL-7146 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.16.0 >Reporter: Sorabh Hamirwasia >Assignee: Hanumath Rao Maduri >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > > {code:java} > >> Query: alter system reset all; > SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. > [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > java.sql.SQLException: SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. > [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:535) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:607) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58) > at > oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118) > at > oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217) > at 
org.apache.drill.test.framework.Utils.execSQL(Utils.java:917) > at org.apache.drill.test.framework.TestDriver.setup(TestDriver.java:632) > at org.apache.drill.test.framework.TestDriver.runTests(TestDriver.java:152) > at org.apache.drill.test.framework.TestDriver.main(TestDriver.java:94) > Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: > SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. > [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > at > oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > at > 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312) > at > oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.ja
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808053#comment-16808053 ] ASF GitHub Bot commented on DRILL-7048: --- kkhatua commented on issue #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740 @vvysotskyi , @ihuzenko I've done the changes and verified the tests. If everything is fine, I'll rebase on the latest master (there are small conflicts due to new commits on master introducing additional system options) I've also included a trim for the values, so an input of `100 ` will be treated as valid. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
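The trim-before-validation behavior the comment describes ("an input of `100 ` will be treated as valid") can be sketched as follows; `parseMaxRows` is a hypothetical helper for illustration, not Drill's actual option-handling code:

```java
public class MaxRowsOption {
    // Trim surrounding whitespace, then require a non-negative integer;
    // 0 conventionally means "no limit".
    public static int parseMaxRows(String raw) {
        String value = raw.trim();   // "100 " is accepted after trimming
        int maxRows = Integer.parseInt(value);
        if (maxRows < 0) {
            throw new IllegalArgumentException("max rows must be >= 0, got " + maxRows);
        }
        return maxRows;
    }

    public static void main(String[] args) {
        System.out.println(parseMaxRows("100 "));  // prints 100
        System.out.println(parseMaxRows("0"));     // prints 0 (no limit)
    }
}
```

On the client side, the standard JDBC entry point for this feature is `java.sql.Statement.setMaxRows(int)`, which the server-side option would then cap.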
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808051#comment-16808051 ] ASF GitHub Bot commented on DRILL-7048: --- kkhatua commented on issue #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740 @vvysotskyi , @ihuzenko I've done the changes and verified the tests. If everything is fine, I'll rebase on the latest master (there are small conflicts due to new commits on master introducing additional system options) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808047#comment-16808047 ] ASF GitHub Bot commented on DRILL-7048: --- kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r271452501 ## File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java ## @@ -462,4 +618,25 @@ public void testParamSettingWhenUnsupportedTypeSaysUnsupported() throws SQLExcep } } + + // Sets the SystemMaxRows option + private void setSystemMaxRows(int sysValueToSet) throws SQLException { Review comment: As per our chat, I've introduced `@Before` and `@After` methods for synchronizing the `system` level modifications to the options. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-7060) Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in JsonReader
[ https://issues.apache.org/jira/browse/DRILL-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Girish closed DRILL-7060. -- > Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in > JsonReader > - > > Key: DRILL-7060 > URL: https://issues.apache.org/jira/browse/DRILL-7060 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.15.0, 1.16.0 >Reporter: Abhishek Girish >Assignee: Abhishek Girish >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > Some JSON files may have strings with backslashes - which are read as escape > characters. By default only standard escape characters are allowed. So > querying such files would fail. For example see: > Data > {code} > {"file":"C:\Sfiles\escape.json"} > {code} > Error > {code} > (com.fasterxml.jackson.core.JsonParseException) Unrecognized character escape > 'S' (code 83) > at [Source: (org.apache.drill.exec.store.dfs.DrillFSDataInputStream); line: > 1, column: 178] > com.fasterxml.jackson.core.JsonParser._constructError():1804 > com.fasterxml.jackson.core.base.ParserMinimalBase._reportError():663 > > com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape():640 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped():3243 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipString():2537 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken():683 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():342 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():298 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():246 > org.apache.drill.exec.vector.complex.fn.JsonReader.write():205 > org.apache.drill.exec.store.easy.json.JSONRecordReader.next():216 > org.apache.drill.exec.physical.impl.ScanBatch.internalNext():223 > ... > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
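The strict-versus-lenient escape handling at issue can be illustrated with a simplified unescaper. This is only a sketch of the behavior that Jackson's `ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER` feature enables (pass the escaped character through instead of failing); it is not the `JsonReader` implementation, and the escape table is abbreviated:

```java
public class EscapeDemo {
    // Resolves backslash escapes in a JSON-style string value.
    // In strict mode, an unknown escape such as '\S' is an error;
    // in lenient mode the raw character is kept.
    static String unescape(String s, boolean lenient) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case '"': case '\\': case '/': out.append(next); break;
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    // (b, f, r, uXXXX cases omitted for brevity)
                    default:
                        if (lenient) {
                            out.append(next);  // keep the escaped character as-is
                        } else {
                            throw new IllegalArgumentException(
                                "Unrecognized character escape '" + next + "'");
                        }
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the data from the report, `"C:\Sfiles"` fails in strict mode on the `\S` escape but is read in lenient mode (note the backslash itself is consumed either way).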
[jira] [Commented] (DRILL-7060) Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in JsonReader
[ https://issues.apache.org/jira/browse/DRILL-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808009#comment-16808009 ] Abhishek Girish commented on DRILL-7060: [~kkhatua], I don't think any additional documentation is necessary as such. I think the option description is clear. When someone really needs this, they'll be able to find it. It's not required in most scenarios. > Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in > JsonReader > - > > Key: DRILL-7060 > URL: https://issues.apache.org/jira/browse/DRILL-7060 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.15.0, 1.16.0 >Reporter: Abhishek Girish >Assignee: Abhishek Girish >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > Some JSON files may have strings with backslashes - which are read as escape > characters. By default only standard escape characters are allowed. So > querying such files would fail. 
For example see: > Data > {code} > {"file":"C:\Sfiles\escape.json"} > {code} > Error > {code} > (com.fasterxml.jackson.core.JsonParseException) Unrecognized character escape > 'S' (code 83) > at [Source: (org.apache.drill.exec.store.dfs.DrillFSDataInputStream); line: > 1, column: 178] > com.fasterxml.jackson.core.JsonParser._constructError():1804 > com.fasterxml.jackson.core.base.ParserMinimalBase._reportError():663 > > com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape():640 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped():3243 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipString():2537 > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken():683 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():342 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():298 > org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():246 > org.apache.drill.exec.vector.complex.fn.JsonReader.write():205 > org.apache.drill.exec.store.easy.json.JSONRecordReader.next():216 > org.apache.drill.exec.physical.impl.ScanBatch.internalNext():223 > ... > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7146) Query failing with NPE when ZK queue is enabled
[ https://issues.apache.org/jira/browse/DRILL-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807996#comment-16807996 ] ASF GitHub Bot commented on DRILL-7146: --- HanumathRao commented on pull request #1725: DRILL-7146: Query failing with NPE when ZK queue is enabled. URL: https://github.com/apache/drill/pull/1725#discussion_r271437936 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/planner/rm/TestMemoryCalculator.java ## @@ -59,6 +59,7 @@ private static final long DEFAULT_SLICE_TARGET = 10L; private static final long DEFAULT_BATCH_SIZE = 16*1024*1024; + private static final String ENABLE_QUEUE = "drill.exec.queue.embedded.enable"; Review comment: I have updated the test case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Query failing with NPE when ZK queue is enabled > --- > > Key: DRILL-7146 > URL: https://issues.apache.org/jira/browse/DRILL-7146 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.16.0 >Reporter: Sorabh Hamirwasia >Assignee: Hanumath Rao Maduri >Priority: Major > Fix For: 1.16.0 > > > > {code:java} > >> Query: alter system reset all; > SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. > [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > java.sql.SQLException: SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. 
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:535) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:607) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58) > at > oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118) > at > oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217) > at org.apache.drill.test.framework.Utils.execSQL(Utils.java:917) > at org.apache.drill.test.framework.TestDriver.setup(TestDriver.java:632) > at org.apache.drill.test.framework.TestDriver.runTests(TestDriver.java:152) > at org.apache.drill.test.framework.TestDriver.main(TestDriver.java:94) > Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: > SYSTEM ERROR: NullPointerException > Please, refer to logs for more information. 
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010] > at > oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.
[jira] [Commented] (DRILL-6558) Drill query fails when file name contains semicolon
[ https://issues.apache.org/jira/browse/DRILL-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807973#comment-16807973 ] Vitalii Diravka commented on DRILL-6558:
---

The original issue is present for Drill with Hadoop 3.2 libs:
{code:java}
Apache Drill 1.16.0-SNAPSHOT
"What ever the mind of man can conceive and believe, Drill can query."
apache drill> select * from dfs.`/tmp/af:3`;
Error: VALIDATION ERROR: java.net.URISyntaxException: Relative path in absolute URI: af:3

[Error Id: a6687d43-24f4-460b-8a39-ea05c1fb2f3f on vitalii-UX331UN:31010] (state=,code=0)
{code}
However, I could not reproduce the second case:
{code:java}
apache drill> select * from dfs.`/tmp/af:3`;
Error: VALIDATION ERROR: java.net.URISyntaxException: Relative path in absolute URI: af:3

[Error Id: a6687d43-24f4-460b-8a39-ea05c1fb2f3f on vitalii-UX331UN:31010] (state=,code=0)
apache drill> use dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to [dfs.tmp] |
+------+-------------------------------------+
1 row selected (0.24 seconds)
apache drill (dfs.tmp)> select * from sys.version;
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
| version         | commit_id                                | commit_message                           | commit_time                | build_email        | build_time                 |
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
| 1.16.0-SNAPSHOT | a070d0b592b3f77411864c04d9c4025e0d1cf888 | Fix test failures. Update HBase version  | 02.04.2019 @ 13:26:18 EEST | vita...@apache.org | 02.04.2019 @ 20:37:18 EEST |
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
1 row selected (1.295 seconds)
{code}

> Drill query fails when file name contains semicolon
> ---
>
> Key: DRILL-6558
> URL: https://issues.apache.org/jira/browse/DRILL-6558
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Volodymyr Vysotskyi
> Priority: Major
>
> Queries on the tables which contain semicolon in the name:
> {code:sql}
> select * from dfs.`/tmp/q:0`
> {code}
> fails with error:
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> java.net.URISyntaxException: Relative path in absolute URI: q:0
> SQL Query null
> [Error Id: 34fafee1-8fbe-4fe0-9fcb-ddcc926bb192 on user515050-pc:31010]
> (java.lang.IllegalArgumentException) java.net.URISyntaxException: Relative path in absolute URI: q:0
> org.apache.hadoop.fs.Path.initialize():205
> org.apache.hadoop.fs.Path.<init>():171
> org.apache.hadoop.fs.Path.<init>():93
> org.apache.hadoop.fs.Globber.glob():253
> org.apache.hadoop.fs.FileSystem.globStatus():1655
> org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
> org.apache.drill.exec.store.dfs.FileSelection.create():274
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
> org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
> org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
> org.apache.calcite.jdbc.CalciteSchema.getTable():257
> org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
> org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
> org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
> org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
> org.apache.calcite.sql.validate.AbstractNamespace.validate():84
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
> org.apache.calcite.sql.SqlSelect.validate():226
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
> org.apache.drill.exec.planner.sql.SqlConverter.validate():190
> org.apache.drill.exec.plann
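The "Relative path in absolute URI" failure above can be reproduced at the JDK level without Hadoop: a file name containing `:` is split as if the prefix were a URI scheme, and `java.net.URI` then rejects the relative remainder. The helper below is an illustrative sketch of that mechanism, not Hadoop's actual `Path` code:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    // Mimics how a path-like string with a ':' gets split into
    // (scheme, path) components before URI construction.
    public static String failureMessage(String name) {
        int colon = name.indexOf(':');
        String scheme = name.substring(0, colon);   // "af" is mistaken for a scheme
        String path = name.substring(colon + 1);    // "3" is a relative path
        try {
            // The multi-argument URI constructor rejects a relative path
            // when a scheme is present.
            new URI(scheme, null, path, null, null);
            return "no error";
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Matches the error text seen in the Drill stack trace.
        System.out.println(failureMessage("af:3"));
    }
}
```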
[jira] [Updated] (DRILL-6097) Create an interface for the QueryContext
[ https://issues.apache.org/jira/browse/DRILL-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6097: Environment: (was: Currently the QueryContext does not implement an interface and the concrete class is passed around everywhere. Additionally Mockito is used in tests to mock it. Ideally we would make the QueryContext implement an interface and create a mock implementation of it that is used in the tests, just like what we did for the FragmentContext.) > Create an interface for the QueryContext > > > Key: DRILL-6097 > URL: https://issues.apache.org/jira/browse/DRILL-6097 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the QueryContext does not implement an interface and the concrete > class is passed around everywhere. Additionally Mockito is used in tests to > mock it. Ideally we would make the QueryContext implement an interface and > create a mock implementation of it that is used in the tests, just like what > we did for the FragmentContext. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6097) Create an interface for the QueryContext
[ https://issues.apache.org/jira/browse/DRILL-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6097: Description: Currently the QueryContext does not implement an interface and the concrete class is passed around everywhere. Additionally Mockito is used in tests to mock it. Ideally we would make the QueryContext implement an interface and create a mock implementation of it that is used in the tests, just like what we did for the FragmentContext. > Create an interface for the QueryContext > > > Key: DRILL-6097 > URL: https://issues.apache.org/jira/browse/DRILL-6097 > Project: Apache Drill > Issue Type: Improvement > Environment: Currently the QueryContext does not implement an > interface and the concrete class is passed around everywhere. Additionally > Mockito is used in tests to mock it. Ideally we would make the QueryContext > implement an interface and create a mock implementation of it that is used in > the tests, just like what we did for the FragmentContext. >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the QueryContext does not implement an interface and the concrete > class is passed around everywhere. Additionally Mockito is used in tests to > mock it. Ideally we would make the QueryContext implement an interface and > create a mock implementation of it that is used in the tests, just like what > we did for the FragmentContext. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
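The pattern proposed in this ticket can be sketched minimally; the method names below are hypothetical (Drill's real QueryContext has a much larger surface). Production code depends only on the interface, and tests supply a hand-written mock instead of a Mockito mock:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative interface extracted from a concrete context class.
interface QueryContext {
    String getQueryId();
    long getOptionLong(String name);
}

// A hand-written mock for tests, replacing Mockito stubbing.
class MockQueryContext implements QueryContext {
    private final Map<String, Long> options = new HashMap<>();

    MockQueryContext withOption(String name, long value) {
        options.put(name, value);
        return this;  // fluent setup, like what was done for FragmentContext mocks
    }

    @Override
    public String getQueryId() {
        return "mock-query";
    }

    @Override
    public long getOptionLong(String name) {
        return options.getOrDefault(name, 0L);
    }
}
```

A test would then build the mock directly, e.g. `new MockQueryContext().withOption("planner.width.max_per_node", 4L)`, and pass it wherever a `QueryContext` is expected.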
[jira] [Resolved] (DRILL-6377) typeof() does not return DECIMAL scale, precision
[ https://issues.apache.org/jira/browse/DRILL-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-6377.
---
Resolution: Fixed
Fix Version/s: 1.16.0

> typeof() does not return DECIMAL scale, precision
> ---
>
> Key: DRILL-6377
> URL: https://issues.apache.org/jira/browse/DRILL-6377
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Priority: Minor
> Fix For: 1.16.0
>
> The {{typeof()}} function returns the type of a column:
> {noformat}
> SELECT typeof(CAST(a AS DOUBLE)) FROM (VALUES (1)) AS T(a);
> +--------+
> | EXPR$0 |
> +--------+
> | FLOAT8 |
> +--------+
> {noformat}
> In Drill, the {{DECIMAL}} type is parameterized with scale and precision.
> However, {{typeof()}} does not return this information:
> {noformat}
> ALTER SESSION SET `planner.enable_decimal_data_type` = true;
> SELECT typeof(CAST(a AS DECIMAL)) FROM (VALUES (1)) AS T(a);
> +-----------------+
> | EXPR$0          |
> +-----------------+
> | DECIMAL38SPARSE |
> +-----------------+
> SELECT typeof(CAST(a AS DECIMAL(6, 3))) FROM (VALUES (1)) AS T(a);
> +----------+
> | EXPR$0   |
> +----------+
> | DECIMAL9 |
> +----------+
> {noformat}
> Expected something of the form {{DECIMAL(<precision>, <scale>)}}.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6377) typeof() does not return DECIMAL scale, precision
[ https://issues.apache.org/jira/browse/DRILL-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807929#comment-16807929 ] Arina Ielchiieva commented on DRILL-6377:
---

For decimals, the SqlTypeOf function covers precision and scale:
{noformat}
apache drill> select sqltypeof(cast(10.56 as decimal(5,2))) from (values(1));
+---------------+
| EXPR$0        |
+---------------+
| DECIMAL(5, 2) |
+---------------+
{noformat}
For intervals, it looks like the issue was fixed:
{noformat}
apache drill> select sqltypeof(INTERVAL '1' YEAR) from (values(1));
+------------------------+
| EXPR$0                 |
+------------------------+
| INTERVAL YEAR TO MONTH |
+------------------------+
1 row selected (0.109 seconds)
apache drill> select typeof(INTERVAL '1' YEAR) from (values(1));
+--------------+
| EXPR$0       |
+--------------+
| INTERVALYEAR |
+--------------+
{noformat}

> typeof() does not return DECIMAL scale, precision
> ---
>
> Key: DRILL-6377
> URL: https://issues.apache.org/jira/browse/DRILL-6377
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Priority: Minor
>
> The {{typeof()}} function returns the type of a column:
> {noformat}
> SELECT typeof(CAST(a AS DOUBLE)) FROM (VALUES (1)) AS T(a);
> +--------+
> | EXPR$0 |
> +--------+
> | FLOAT8 |
> +--------+
> {noformat}
> In Drill, the {{DECIMAL}} type is parameterized with scale and precision.
> However, {{typeof()}} does not return this information:
> {noformat}
> ALTER SESSION SET `planner.enable_decimal_data_type` = true;
> SELECT typeof(CAST(a AS DECIMAL)) FROM (VALUES (1)) AS T(a);
> +-----------------+
> | EXPR$0          |
> +-----------------+
> | DECIMAL38SPARSE |
> +-----------------+
> SELECT typeof(CAST(a AS DECIMAL(6, 3))) FROM (VALUES (1)) AS T(a);
> +----------+
> | EXPR$0   |
> +----------+
> | DECIMAL9 |
> +----------+
> {noformat}
> Expected something of the form {{DECIMAL(<precision>, <scale>)}}.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-4211) Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.
[ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi reassigned DRILL-4211:
---
Assignee: Volodymyr Vysotskyi (was: Timothy Farkas)

> Column aliases not pushed down to JDBC stores in some cases when Drill
> expects aliased columns to be returned.
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Storage - JDBC
> Affects Versions: 1.3.0, 1.11.0
> Environment: Postgres db storage
> Reporter: Robert Hamilton-Smith
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Labels: newbie
> Fix For: 1.16.0
>
> When making an SQL statement that incorporates a join to a table and then a
> self-join to that table to get a parent value, Drill brings back
> inconsistent results.
> Here is the SQL in Postgres with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food&Dining|
> |id1|restaurants|food&Dining|
> |id2|Coffee Shops|food&Dining|
> |id2|Coffee Shops|food&Dining|
> When run in Drill with the correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
>
00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) > categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = > {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293 > 00-01 Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : > rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292 > 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) > : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, > VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, > 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291 > 00-03 Jdbc(sql=[SELECT * > FROM "public"."transactions" > INNER JOIN (SELECT * > FROM "public"."categories" > WHERE "categoryparentguid" IS NOT NULL) AS "t" ON > "transactions"."categoryguid" = "t"."categoryguid" > INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" > = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) > transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) > transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) > transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, > VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, > VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) > transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) > transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) > transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) > transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, > VARCHAR(50) transactionorigpartyguid, VARCHAR(255) > transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) > transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, 
DECIMAL(1, > 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) > transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) > transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) > transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, > VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, > TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) > transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, > VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) > categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0)
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807920#comment-16807920 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on issue #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#issuecomment-479094218 @vdiravka , I've addressed the comments. I fully agree with you that refactoring is better placed in separate commits, and I'll use this approach in the future. The [DRILL-7151](https://issues.apache.org/jira/browse/DRILL-7151) ticket was created for the show tables authorization improvement. For caches, the type of ```tableNamesCache``` was changed to ```LoadingCache> ```; previously only names were cached here. All work with Guava caches was also unified under the ```HiveMetadataCache``` facade. For the Drill Hive SASL (Kerberos) connection I didn't introduce changes; the related code from ```DrillHiveMetaStoreClientFactory``` was previously in ```DrillHiveMetaStoreClient```. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is returning the result in a split second (~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. 
It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
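The caching approach mentioned in the comment above (a per-schema table-name cache behind a ```HiveMetadataCache``` facade) can be sketched with JDK classes. Drill actually uses Guava's ```LoadingCache```; here ```ConcurrentHashMap.computeIfAbsent``` stands in for it, and the class and method names are hypothetical illustrations, not Drill's real API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical facade: caches the table-name list per schema so that
// "show tables" hits the (slow) metastore at most once per schema.
class MetadataCacheSketch {
    private final Map<String, List<String>> tableNamesCache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> metastoreLookup;

    MetadataCacheSketch(Function<String, List<String>> metastoreLookup) {
        this.metastoreLookup = metastoreLookup;
    }

    List<String> getTableNames(String schema) {
        // computeIfAbsent stands in for Guava's LoadingCache.get(key):
        // the loader runs only on a cache miss.
        return tableNamesCache.computeIfAbsent(schema, metastoreLookup);
    }
}
```

With such a cache, repeated "show tables" calls on the same schema pay the metastore round trip only once until the entry is invalidated.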
[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7150: --- Reviewer: Aman Sinha > Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. Create the table in MaprDB shell: > {noformat} > create /tmp/testtimestamp > insert /tmp/testtimestamp --value > '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' > {noformat} > 2. Create an external hive table: > {code:sql} > CREATE EXTERNAL TABLE default.timeTest > (`_id` string, > `str` string, > `ts` timestamp) > ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') > {code} > 3. Enable native reader and timezone conversion for MaprDB timestamp: > {code:sql} > alter session set > `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; > alter session set > `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; > {code} > 4. 
Run the query on the table from Drill using hive plugin: > {code:java} > 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; > +--+--+--+ > | _id | str|ts| > +--+--+--+ > | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | > | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | > | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | > | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | > +--+--+--+ > 4 rows selected (0.343 seconds) > {code} > Please note that timestamps for {{eot}} and {{pst}} values are incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807910#comment-16807910 ] ASF GitHub Bot commented on DRILL-7150: --- vvysotskyi commented on issue #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST URL: https://github.com/apache/drill/pull/1729#issuecomment-479083498 @amansinha100, since you have reviewed the original PR, could you please review this one? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. Create the table in MaprDB shell: > {noformat} > create /tmp/testtimestamp > insert /tmp/testtimestamp --value > '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' > {noformat} > 2. 
Create an external hive table: > {code:sql} > CREATE EXTERNAL TABLE default.timeTest > (`_id` string, > `str` string, > `ts` timestamp) > ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') > {code} > 3. Enable native reader and timezone conversion for MaprDB timestamp: > {code:sql} > alter session set > `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; > alter session set > `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; > {code} > 4. Run the query on the table from Drill using hive plugin: > {code:java} > 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; > +--+--+--+ > | _id | str|ts| > +--+--+--+ > | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | > | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | > | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | > | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | > +--+--+--+ > 4 rows selected (0.343 seconds) > {code} > Please note that timestamps for {{eot}} and {{pst}} values are incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807909#comment-16807909 ] ASF GitHub Bot commented on DRILL-7150: --- vvysotskyi commented on pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST URL: https://github.com/apache/drill/pull/1729 Used JDK classes to convert timestamp from one timezone to another one instead of adding milliseconds which corresponds to the offset. For problem description please see [DRILL-7151](https://issues.apache.org/jira/browse/DRILL-7151). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. Create the table in MaprDB shell: > {noformat} > create /tmp/testtimestamp > insert /tmp/testtimestamp --value > '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' > insert /tmp/testtimestamp --value > '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' > {noformat} > 2. 
Create an external hive table: > {code:sql} > CREATE EXTERNAL TABLE default.timeTest > (`_id` string, > `str` string, > `ts` timestamp) > ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') > {code} > 3. Enable native reader and timezone conversion for MaprDB timestamp: > {code:sql} > alter session set > `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; > alter session set > `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; > {code} > 4. Run the query on the table from Drill using hive plugin: > {code:java} > 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; > +--+--+--+ > | _id | str|ts| > +--+--+--+ > | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | > | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | > | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | > | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | > +--+--+--+ > 4 rows selected (0.343 seconds) > {code} > Please note that timestamps for {{eot}} and {{pst}} values are incorrect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
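The fix described in the PR above (using JDK classes to convert the timestamp between timezones instead of adding a fixed millisecond offset) can be illustrated with `java.time`. This is a sketch of the failure mode, not Drill's actual code: a fixed PDT offset applied to a January instant lands one hour off, while `LocalDateTime.ofInstant` picks the offset in effect at that instant (PST, UTC-8).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TzConvert {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        Instant utc = Instant.parse("2019-01-02T07:59:59.999Z");

        // Correct: java.time applies the zone rules valid at that instant (PST, UTC-8).
        LocalDateTime correct = LocalDateTime.ofInstant(utc, la);
        System.out.println(correct); // 2019-01-01T23:59:59.999

        // Buggy approach: shift by the offset in effect "now" (PDT, UTC-7) as a
        // fixed number of milliseconds; instants on the other side of the DST
        // switch come out one hour wrong, matching the eot/pst rows in the issue.
        long pdtOffsetMillis = 7 * 3600 * 1000L;
        LocalDateTime wrong = LocalDateTime.ofInstant(utc.minusMillis(pdtOffsetMillis), ZoneOffset.UTC);
        System.out.println(wrong); // 2019-01-02T00:59:59.999
    }
}
```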
[jira] [Commented] (DRILL-7147) Source order of "drill-env.sh" and "distrib-env.sh" should be swapped
[ https://issues.apache.org/jira/browse/DRILL-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807905#comment-16807905 ] Abhishek Girish commented on DRILL-7147: [~Paul.Rogers], you are right. With the "simple" way, there is no issue with {{drill-env.sh}} and {{distrib-env.sh}}. Like you said, setting variables the simple way in {{drill-env.sh}} could cause issues if the corresponding ENV variables are set. And I think that's something we could document instead of finding a fix. > Source order of "drill-env.sh" and "distrib-env.sh" should be swapped > - > > Key: DRILL-7147 > URL: https://issues.apache.org/jira/browse/DRILL-7147 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.15.0 >Reporter: Hao Zhu >Assignee: Abhishek Girish >Priority: Minor > Fix For: 1.16.0 > > > In bin/drill-config.sh, the description of the source order is: > {code:java} > # Variables may be set in one of four places: > # > # Environment (per run) > # drill-env.sh (per site) > # distrib-env.sh (per distribution) > # drill-config.sh (this file, Drill defaults) > # > # Properties "inherit" from items lower on the list, and may be "overridden" > by items > # higher on the list. In the environment, just set the variable: > {code} > However actually bin/drill-config.sh sources drill-env.sh firstly, and then > distrib-env.sh. > {code:java} > drillEnv="$DRILL_CONF_DIR/drill-env.sh" > if [ -r "$drillEnv" ]; then > . "$drillEnv" > fi > ... > distribEnv="$DRILL_CONF_DIR/distrib-env.sh" > if [ -r "$distribEnv" ]; then > . "$distribEnv" > else > distribEnv="$DRILL_HOME/conf/distrib-env.sh" > if [ -r "$distribEnv" ]; then > . "$distribEnv" > fi > fi > {code} > We need to swap the source order of drill-env.sh and distrib-env.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
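The precedence documented in drill-config.sh (environment > drill-env.sh > distrib-env.sh > drill-config.sh defaults) amounts to a later-wins merge applied from lowest to highest priority: whichever source is applied last overrides the earlier ones, which is why the sourcing order of the two scripts matters. The actual files are shell scripts; the Java sketch below only models the override semantics, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Models the documented precedence: sources are applied from lowest priority
// (Drill defaults) to highest (per-run environment), so later sources win.
class ConfigPrecedence {
    @SafeVarargs
    static Map<String, String> resolve(Map<String, String>... lowestToHighest) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> source : lowestToHighest) {
            result.putAll(source); // higher-priority source overrides earlier values
        }
        return result;
    }
}
```

Sourcing drill-env.sh before distrib-env.sh is equivalent to passing the site map before the distribution map here: the distribution silently overrides site settings, the opposite of the documented intent.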
[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.2 libraries
[ https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6540: --- Description: Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, hadoop-yarn-client). A year ago the [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released and recently it was updated to [Hadoop 3.2.0|https://hadoop.apache.org/docs/r3.2.0/]. To use Drill under Hadoop3.0 distribution we need this upgrade. Also the newer version includes new features, which can be useful for Drill. This upgrade is also needed to leverage the newest version of Zookeeper libraries and Hive 3.1 version. was: Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, hadoop-yarn-client). Half of year ago the [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released and recently it was an update - [Hadoop 3.2.0|https://hadoop.apache.org/docs/r3.2.0/]. To use Drill under Hadoop3.0 distribution we need this upgrade. Also the newer version includes new features, which can be useful for Drill. This upgrade is also needed to leverage the newest version of Zookeeper libraries and Hive 3.1 version. > Upgrade to HADOOP-3.2 libraries > > > Key: DRILL-6540 > URL: https://issues.apache.org/jira/browse/DRILL-6540 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.14.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.17.0 > > > Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, > hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, > hadoop-yarn-client). 
> A year ago, [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] > was released, and recently it was updated to [Hadoop > 3.2.0|https://hadoop.apache.org/docs/r3.2.0/]. > To use Drill under a Hadoop 3.0 distribution we need this upgrade. The > newer version also includes new features that can be useful for Drill. > This upgrade is also needed to leverage the newest version of the Zookeeper > libraries and Hive 3.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.2 libraries
[ https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6540: --- Summary: Upgrade to HADOOP-3.2 libraries (was: Upgrade to HADOOP-3.1 libraries ) > Upgrade to HADOOP-3.2 libraries > > > Key: DRILL-6540 > URL: https://issues.apache.org/jira/browse/DRILL-6540 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.14.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.17.0 > > > Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, > hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, > hadoop-yarn-client). > Half of year ago the [Hadoop > 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released and > recently it was an update - [Hadoop > 3.2.0|https://hadoop.apache.org/docs/r3.2.0/]. > To use Drill under Hadoop3.0 distribution we need this upgrade. Also the > newer version includes new features, which can be useful for Drill. > This upgrade is also needed to leverage the newest version of Zookeeper > libraries and Hive 3.1 version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7151) Show only accessible tables when Hive authorization enabled
Igor Guzenko created DRILL-7151: --- Summary: Show only accessible tables when Hive authorization enabled Key: DRILL-7151 URL: https://issues.apache.org/jira/browse/DRILL-7151 Project: Apache Drill Issue Type: Improvement Reporter: Igor Guzenko Assignee: Igor Guzenko The SHOW TABLES command for Hive has worked inconsistently for a very long time. Before the changes introduced by DRILL-7115, only accessible tables were shown when Hive Storage Based Authorization was enabled, but for SQL Standard Based Authorization all tables were shown to the user ([related discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). In the scope of DRILL-7115 the accessible-only restriction for Storage Based Authorization was weakened in order to improve query performance. There is still a need to improve the security of the Hive show tables query without violating the performance requirements. For SQL Standard Based Authorization this can be done by asking ```HiveAuthorizationHelper.authorizerV2``` for the table's 'SELECT' permission. For Storage Based Authorization no performance-acceptable approach is known for now; one idea is to try using an appropriate Hive storage based authorizer class for the purpose. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
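The proposed check for SQL Standard Based Authorization boils down to filtering the table list with a per-table 'SELECT' permission predicate. The sketch below is hypothetical: the ```canSelect``` predicate stands in for a call into Hive's authorizer (e.g. ```HiveAuthorizationHelper.authorizerV2```), and the class and method names are not Drill's real API.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only the tables the current user may SELECT from.
// canSelect stands in for an authorizer call checking the SELECT privilege.
class ShowTablesFilter {
    static List<String> accessibleTables(List<String> allTables, String user,
                                         BiPredicate<String, String> canSelect) {
        return allTables.stream()
                .filter(table -> canSelect.test(user, table))
                .collect(Collectors.toList());
    }
}
```

The open performance question from the ticket remains: with ~8000 tables, one authorizer round trip per table may be too slow, so a batched or cached permission check would likely be needed.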
[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7150: --- Description: Steps to reproduce: 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} 1. Create the table in MaprDB shell: {noformat} create /tmp/testtimestamp insert /tmp/testtimestamp --value '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' {noformat} 2. Create an external hive table: {code:sql} CREATE EXTERNAL TABLE default.timeTest (`_id` string, `str` string, `ts` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') {code} 3. Enable native reader and timezone conversion for MaprDB timestamp: {code:sql} alter session set `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; alter session set `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; {code} 4. Run the query on the table from Drill using hive plugin: {code:java} 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; +--+--+--+ | _id | str|ts| +--+--+--+ | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | +--+--+--+ 4 rows selected (0.343 seconds) {code} Please note that timestamps for {{eot}} and {{pst}} values are incorrect. 
was: Steps to reproduce: 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} 1. Create the table in MaprDB shell: {noformat} create /tmp/testtimestamp insert /tmp/testtimestamp --value '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' {noformat} 2. Create an external hive table: {code:sql} CREATE EXTERNAL TABLE default.timeTest (`_id` string, `str` string, `ts` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') {code} 3. Enable native reader and timezone conversion for MaprDB timestamp: {code:sql} alter session set `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; alter session set `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; {code} 4. Run the query on the table from Drill using hive plugin: {code:java} 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; +--+--+--+ | _id | str|ts| +--+--+--+ | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | +--+--+--+ 4 rows selected (0.343 seconds) {code} Please note that timestamps for {{eot}} and {{pst}} values are wrong. 
> Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1.
[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
[ https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7150: --- Description: Steps to reproduce: 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} 1. Create the table in MaprDB shell: {noformat} create /tmp/testtimestamp insert /tmp/testtimestamp --value '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' {noformat} 2. Create an external hive table: {code:sql} CREATE EXTERNAL TABLE default.timeTest (`_id` string, `str` string, `ts` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') {code} 3. Enable native reader and timezone conversion for MaprDB timestamp: {code:sql} alter session set `store.hive.maprdb_json.optimize_scan_with_native_reader`=true; alter session set `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true; {code} 4. Run the query on the table from Drill using hive plugin: {code:java} 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; +--+--+--+ | _id | str|ts| +--+--+--+ | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | +--+--+--+ 4 rows selected (0.343 seconds) {code} Please note that timestamps for {{eot}} and {{pst}} values are wrong. 
was: Steps to reproduce: 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} 1. Create the table in MaprDB shell: {noformat} create /tmp/testtimestamp insert /tmp/testtimestamp --value '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' {noformat} 2. Create a hive table: {code:sql} CREATE EXTERNAL TABLE default.timeTest (`_id` string, `str` string, `ts` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') {code} 3. Enable native reader and timezone conversion for maprdb timestamp: {code:sql} alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true; alter session store.hive.maprdb_json.read_timestamp_with_timezone_offset=true; {code} 4. Run the query on the table from Drill using hive plugin: {code} 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; +--+--+--+ | _id | str|ts| +--+--+--+ | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | +--+--+--+ 4 rows selected (0.343 seconds) {code} Plese note that the results for {{eot}} and {{pst}} values are wrong. 
> Fix timezone conversion for timestamp from maprdb after the transition from > PDT to PST > -- > > Key: DRILL-7150 > URL: https://issues.apache.org/jira/browse/DRILL-7150 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MapRDB >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > Steps to reproduce: > 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} > 1. Create the table in MaprDB
[jira] [Created] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
Volodymyr Vysotskyi created DRILL-7150: -- Summary: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST Key: DRILL-7150 URL: https://issues.apache.org/jira/browse/DRILL-7150 Project: Apache Drill Issue Type: Bug Components: Storage - MapRDB Affects Versions: 1.16.0 Reporter: Volodymyr Vysotskyi Assignee: Volodymyr Vysotskyi Fix For: 1.16.0 Steps to reproduce: 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}} 1. Create the table in MaprDB shell: {noformat} create /tmp/testtimestamp insert /tmp/testtimestamp --value '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}' insert /tmp/testtimestamp --value '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}' {noformat} 2. Create a hive table: {code:sql} CREATE EXTERNAL TABLE default.timeTest (`_id` string, `str` string, `ts` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe' STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest') {code} 3. Enable native reader and timezone conversion for maprdb timestamp: {code:sql} alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true; alter session set store.hive.maprdb_json.read_timestamp_with_timezone_offset=true; {code} 4. 
Run the query on the table from Drill using hive plugin: {code} 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest; +--+--+--+ | _id | str|ts| +--+--+--+ | eot | -01-01T23:59:59.999 | -01-02 00:59:59.999 | | pdt | 2019-04-01T23:59:59.999 | 2019-04-01 23:59:59.999 | | pst | 2019-01-01T23:59:59.999 | 2019-01-02 00:59:59.999 | | unk | 2017-07-08T20:01:49.885 | 2017-07-08 20:01:49.885 | +--+--+--+ 4 rows selected (0.343 seconds) {code} Please note that the results for {{eot}} and {{pst}} values are wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807815#comment-16807815 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271339849 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ## @@ -920,46 +920,11 @@ public void dropTable(String table) { } @Override -public List> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) { - final List> tableNamesAndTypes = Lists.newArrayList(); - - // Look for raw tables first - if (!tables.isEmpty()) { -for (Map.Entry tableEntry : tables.entrySet()) { - tableNamesAndTypes - .add(Pair.of(tableEntry.getKey().sig.name, tableEntry.getValue().getJdbcTableType())); -} - } - // Then look for files that start with this name and end in .drill. - List files = Collections.emptyList(); - try { -files = DotDrillUtil.getDotDrills(getFS(), new Path(config.getLocation()), DotDrillType.VIEW); - } catch (AccessControlException e) { -if (!schemaConfig.getIgnoreAuthErrors()) { - logger.debug(e.getMessage()); - throw UserException.permissionError(e) - .message("Not authorized to list or query tables in schema [%s]", getFullSchemaName()) - .build(logger); -} - } catch (IOException e) { -logger.warn("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e); - } catch (UnsupportedOperationException e) { -// the file system (e.g. the classpath filesystem) may not support listing -// of files. 
But see getViews(), it ignores the exception and continues -logger.debug("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e); - } - - try { -for (DotDrillFile f : files) { - if (f.getType() == DotDrillType.VIEW) { -tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW)); - } -} - } catch (UnsupportedOperationException e) { -logger.debug("The filesystem for this workspace does not support this operation.", e); Review comment: This deleted code mostly duplicated the body of the existing ```getViews()``` method. The logging statement is also present in that method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is returning the result in a split second (~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7149) Kerberos Code Missing from Drill on YARN
Charles Givre created DRILL-7149: Summary: Kerberos Code Missing from Drill on YARN Key: DRILL-7149 URL: https://issues.apache.org/jira/browse/DRILL-7149 Project: Apache Drill Issue Type: Bug Components: Security Affects Versions: 1.14.0 Reporter: Charles Givre My company is trying to deploy Drill using Drill on YARN (DoY), and we have run into the issue that DoY does not appear to support passing Kerberos credentials in order to interact with HDFS. On checking the source code available in Git (https://github.com/apache/drill/blob/1.14.0/drill-yarn/src/main/java/org/apache/drill/yarn/core/) and referring to the Apache YARN documentation (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html), we found no code for passing the security credentials the application needs to interact with Hadoop cluster services and applications. We feel this needs to be added to the source code so that delegation tokens can be passed inside the container, allowing the process to access the Drill archive on HDFS and start. They should probably be added to the ContainerLaunchContext within the ApplicationSubmissionContext for DoY, as suggested in the Apache documentation. We tried the same DoY utility on a non-Kerberized cluster and the process started fine, although we ran into a different issue there of hosts getting blacklisted. We tested with the single-principal-per-cluster option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
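The fix the reporter describes would, in real code, serialize the submitter's credentials with Hadoop's Credentials.writeTokenStorageToStream(...) and attach them to the container via ContainerLaunchContext.setTokens(ByteBuffer). The sketch below uses only the JDK, with a plain map of token name/value pairs standing in for Hadoop's Credentials object, so the shape of the hand-off is visible without a Hadoop dependency; all class and method names here are illustrative, not Drill's or YARN's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Stand-in for the delegation-token hand-off DoY would need: the client
 * serializes its tokens into a ByteBuffer (as Credentials.writeTokenStorageToStream
 * does) and the container side reads them back before touching HDFS.
 */
public class TokenHandOffSketch {

  /** Serialize token name/value pairs into the buffer that would be passed
   *  to ContainerLaunchContext.setTokens(...). */
  public static ByteBuffer serializeTokens(Map<String, byte[]> tokens) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      out.writeInt(tokens.size());
      for (Map.Entry<String, byte[]> e : tokens.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeInt(e.getValue().length);
        out.write(e.getValue());
      }
      out.flush();
      return ByteBuffer.wrap(bos.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** What the launched container would do to recover the tokens. */
  public static Map<String, byte[]> deserializeTokens(ByteBuffer buf) {
    try {
      byte[] raw = new byte[buf.remaining()];
      buf.get(raw);
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
      Map<String, byte[]> tokens = new LinkedHashMap<>();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        String name = in.readUTF();
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        tokens.put(name, value);
      }
      return tokens;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In the real fix, the write side would run in the DoY client before application submission, and the read side is what YARN itself performs when it materializes the tokens for the container's UserGroupInformation.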
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807808#comment-16807808 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271335079 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.hive.client; + +import java.util.List; +import java.util.stream.Collectors; + +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.exec.store.hive.ColumnListsCache; +import org.apache.drill.exec.store.hive.HiveReadEntry; +import org.apache.drill.exec.store.hive.HiveTableWithColumnCache; +import org.apache.drill.exec.store.hive.HiveTableWrapper; +import org.apache.drill.exec.store.hive.HiveUtilities; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.api.UnknownTableException; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableEntryCacheLoader extends CacheLoader { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableEntryCacheLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + + @Override + @SuppressWarnings("NullableProblems") + public HiveReadEntry load(TableName key) throws Exception { +Table table; +List partitions; +synchronized (client) { + table = getTable(key); + partitions = getPartitions(key); +} +HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, new ColumnListsCache(table)); +List partitionWrappers = partitions.isEmpty() +? 
null Review comment: I've considered the possibility of using empty lists and concluded that doing so would break backward compatibility: ```HiveReadEntry``` is part of the JSON-serializable ```HiveScan``` operator, and deserializing empty lists on an older drillbit that expects null lists may break null-dependent checks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) returns the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
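The TableEntryCacheLoader quoted above synchronizes on the shared metastore client and retries after reconnecting when a call fails. A hedged, JDK-only sketch of that pattern is below; the real class extends Guava's CacheLoader and talks to DrillHiveMetaStoreClient, whereas here a hypothetical Client interface and a ConcurrentHashMap stand in for both.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of a cache loader that serializes access to a shared
 * client and retries once after reconnecting, mirroring the approach of
 * TableEntryCacheLoader. All names are illustrative.
 */
public class RetryingLoaderSketch {

  /** Stand-in for DrillHiveMetaStoreClient. */
  public interface Client {
    String fetch(String table) throws Exception;
    void reconnect();
  }

  private final Client client;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  public RetryingLoaderSketch(Client client) {
    this.client = client;
  }

  public String load(String table) {
    return cache.computeIfAbsent(table, key -> {
      synchronized (client) {           // one metastore call at a time
        try {
          return client.fetch(key);
        } catch (Exception first) {
          client.reconnect();           // assume the connection went stale
          try {
            return client.fetch(key);   // single retry after reconnect
          } catch (Exception second) {
            throw new RuntimeException("load failed for " + key, second);
          }
        }
      }
    });
  }
}
```

Once a value is loaded it is served from the cache, so the (slow, lock-guarded) metastore round trip happens at most once per table entry.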
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807779#comment-16807779 ] ASF GitHub Bot commented on DRILL-7089: --- vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728#issuecomment-479008647 @amansinha100, could you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807778#comment-16807778 ] ASF GitHub Bot commented on DRILL-7089: --- vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728#issuecomment-479008544 Diagrams of the classes introduced in this PR: https://docs.google.com/presentation/d/1XG_xgR4okzXaJ3Z7HFHfzCwlM5VkNfre8GFEAd2Zo8k/edit?usp=sharing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807710#comment-16807710 ] ASF GitHub Bot commented on DRILL-7089: --- vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728 This PR introduces caching of table metadata (schema and statistics) at the query level. It adds `MetadataProviderManager`, which holds the `SchemaProvider`, the `DrillStatsTable`, and the `TableMetadataProvider` if one was already created. The `MetadataProviderManager` instance is cached and used for every `DrillTable` that corresponds to the same table. This approach preserves lazy initialization of group scan and `TableMetadataProvider` instances: once the first `TableMetadataProvider` is created, it is stored in the `MetadataProviderManager` and its metadata is reused by all further `TableMetadataProvider` instances. Another part of this PR adapts statistics to use the Drill Metastore API: the logic that distinguishes exact from estimated metadata was enhanced, and `TableMetadata` is now used for obtaining statistics. I will create and attach a class diagram later. Tests still need to be run for this PR, so for now I'll leave it in draft state. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
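The query-level caching described in the PR above can be sketched with a per-query map of lazily built providers, so every DrillTable referring to the same table reuses one metadata provider instead of rebuilding schema and statistics. This is a JDK-only illustration; the class and method names are hypothetical, not Drill's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of query-level metadata caching: each table's (possibly expensive)
 * provider is built once per query and shared by every scan of that table.
 */
public class QueryMetadataCacheSketch {

  /** Stand-in for TableMetadataProvider. */
  public static class Provider {
    final String table;
    Provider(String table) { this.table = table; }
  }

  private final Map<String, Provider> providers = new ConcurrentHashMap<>();
  private int creations = 0;    // how many times the expensive path ran

  /** Lazily create the provider on first use, then reuse the instance. */
  public Provider providerFor(String table) {
    return providers.computeIfAbsent(table, t -> {
      creations++;
      return new Provider(t);   // real code would load schema + statistics here
    });
  }

  public int creations() {
    return creations;
  }
}
```

The cache lives for the duration of one query (one planning session), which is what keeps heap usage bounded while still avoiding repeated metadata reads.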
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807700#comment-16807700 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271269107 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java ## @@ -33,7 +32,10 @@ public static void printOffsets(UInt4Vector vector, int start, int length) { header(vector, start, length); for (int i = start, j = 0; j < length; i++, j++) { - if (j > 0) { + if (j % 40 == 0) { Review comment: How this will look like after the change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807703#comment-16807703 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271270612 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/OffsetVectorWriterImpl.java ## @@ -302,4 +312,9 @@ public void dump(HierarchicalFormatter format) { .attribute("nextOffset", nextOffset) .endObject(); } + + @Override + public void setDefaultValue(Object value) { +throw new UnsupportedOperationException("Encoding not supported for offset vectors"); Review comment: Same here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807697#comment-16807697 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271268522 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/ScalarReader.java ## @@ -86,4 +87,10 @@ LocalDate getDate(); LocalTime getTime(); Instant getTimestamp(); + + /** + * Return the value of the object using the extended type. + * @return Review comment: Please move add description to the return to avoid warnings in the IDE (just move upper line to `@return`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807699#comment-16807699 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271265638 ## File path: common/src/main/java/org/apache/drill/common/types/Types.java ## @@ -463,23 +462,29 @@ public static boolean usesHolderForGet(final MajorType type) { default: return true; } - } public static boolean isFixedWidthType(final MajorType type) { -switch(type.getMinorType()) { +return isFixedWidthType(type.getMinorType()); + } + + public static boolean isFixedWidthType(final MinorType type) { +return ! isVarWidthType(type); + } + + public static boolean isVarWidthType(final MinorType type) { +switch(type) { Review comment: ```suggestion switch (type) { ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. 
> This PR may also contain other fixes to the base functionality revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807698#comment-16807698 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271270255 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/NullableScalarWriter.java ## @@ -278,4 +278,9 @@ public void dump(HierarchicalFormatter format) { baseWriter.dump(format); format.endObject(); } + + @Override + public void setDefaultValue(Object value) { +throw new UnsupportedOperationException("Default values not supported for nullable types"); Review comment: Maybe include `value` into the error message? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807701#comment-16807701 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271269796 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java ## @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) { @Override protected final void fillEmpties(final int writeIndex) { final int width = width(); - final int stride = ZERO_BUF.length / width; + final int stride = emptyValue.length / width; int dest = lastWriteIndex + 1; while (dest < writeIndex) { int length = writeIndex - dest; length = Math.min(length, stride); -drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width); +drillBuf.setBytes(dest * width, emptyValue, 0, length * width); dest += length; } } } + /** + * Base class for writers that use the Java int type as their native + * type. Handles common implicit conversions from other types to int. + */ + public static abstract class BaseIntWriter extends BaseFixedWidthWriter { + +@Override +public final void setLong(final long value) { + try { +// Catches int overflow. Does not catch overflow for smaller types. +setInt(Math.toIntExact(value)); + } catch (final ArithmeticException e) { +throw InvalidConversionError.writeError(schema(), value, e); + } +} + +@Override +public final void setDouble(final double value) { Review comment: Double covers Float as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes to the base functionality revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
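Two mechanisms in the diff reviewed above are easy to miss in the fragmented hunks: `fillEmpties` back-fills skipped slots by copying a pre-built buffer of repeated default values in stride-sized chunks, and `BaseIntWriter.setLong` catches int overflow via `Math.toIntExact`. Below is a hedged, self-contained sketch of both, simplified from AbstractFixedWidthWriter; a plain `byte[]` stands in for the underlying DrillBuf, and the method signatures are illustrative.

```java
/**
 * Sketch of the fixed-width writer mechanics under review: stride-based
 * back-filling of empty slots with a default value, and range-checked
 * long-to-int narrowing.
 */
public class FixedWidthSketch {

  /**
   * Fill slots [from, to) of a fixed-width buffer with the default value.
   * emptyValue holds the default repeated `stride` times, so each copy
   * writes up to a whole stride of slots at once.
   */
  public static void fillEmpties(byte[] buf, int width, byte[] emptyValue,
                                 int from, int to) {
    int stride = emptyValue.length / width;   // slots per pre-built pattern
    int dest = from;
    while (dest < to) {
      int count = Math.min(to - dest, stride);
      System.arraycopy(emptyValue, 0, buf, dest * width, count * width);
      dest += count;
    }
  }

  /**
   * setLong on an int column: Math.toIntExact throws ArithmeticException
   * on overflow, which the real writer wraps in an InvalidConversionError.
   */
  public static int narrowToInt(long value) {
    return Math.toIntExact(value);
  }
}
```

As the diff's comment notes, `Math.toIntExact` only catches overflow of the int range; narrowing to smaller types (smallint, tinyint) would need its own bounds check.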
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807702#comment-16807702 ] ASF GitHub Bot commented on DRILL-7143: --- arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default value for empty columns URL: https://github.com/apache/drill/pull/1726#discussion_r271268972 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/convert/AbstractWriteConverter.java ## @@ -68,6 +68,11 @@ public ColumnMetadata schema() { return baseWriter.schema(); } + @Override + public void setDefaultValue(Object value) { +throw new IllegalStateException("Cannot set a default value through a shim; types conflict."); Review comment: Should we include `value` in the error message? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce column-level constraints when using a schema > > > Key: DRILL-7143 > URL: https://issues.apache.org/jira/browse/DRILL-7143 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > The recently added schema framework enforces schema constraints at the table > level. We now wish to add additional constraints at the column level. > * If a column is marked as "strict", then the reader will use the exact type > and mode from the column schema, or fail if it is not possible to do so. > * If a column is marked as required, and provides a default value, then that > value is used instead of 0 if a row is missing a value for that column. > This PR may also contain other fixes the the base functional revealed through > additional testing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
[ https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7145: Comment: was deleted (was: aielchiieva commented on pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe… URL: https://github.com/apache/drill/pull/1727#discussion_r271257957 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java ## @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, QueryWritableBatch result loader.clear(); } } catch (Exception e) { - exception = UserException.systemError(e).build(logger); + throw UserException.systemError(e).build(logger); Review comment: I don't think we should throw an exception here. We should stick to original approach and store it but just add method `getException()` in`AbstractDisposableUserClientConnection` class, similar to `getError()`. And then do proper handling of both. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org ) > Exceptions happened during retrieving values from ValueVector are not being > displayed at the Drill Web UI > - > > Key: DRILL-7145 > URL: https://issues.apache.org/jira/browse/DRILL-7145 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.16.0 > > > *Data:* > A text file with the following content: > {noformat} > Id,col1,col2 > 1,aaa,bbb > 2,ccc,ddd > 3,eee > 4,fff,ggg > {noformat} > Note that the record with id 3 has not value for the third column. > exec.storage.enable_v3_text_reader should be false. 
> *Submit the query from the Web UI:* > {code:sql} > select * from > table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true)) > {code} > *Expected result:* > Exception should happen due to DRILL-4814. It should be properly displayed. > *Actual result:* > Incorrect data is returned but without error. Query status: success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
[ https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7145: Reviewer: Arina Ielchiieva > Exceptions happened during retrieving values from ValueVector are not being > displayed at the Drill Web UI > - > > Key: DRILL-7145 > URL: https://issues.apache.org/jira/browse/DRILL-7145 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.16.0 > > > *Data:* > A text file with the following content: > {noformat} > Id,col1,col2 > 1,aaa,bbb > 2,ccc,ddd > 3,eee > 4,fff,ggg > {noformat} > Note that the record with id 3 has no value for the third column. > exec.storage.enable_v3_text_reader should be false. > *Submit the query from the Web UI:* > {code:sql} > select * from > table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true)) > {code} > *Expected result:* > Exception should happen due to DRILL-4814. It should be properly displayed. > *Actual result:* > Incorrect data is returned but without error. Query status: success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
[ https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807680#comment-16807680 ] ASF GitHub Bot commented on DRILL-7145: --- arina-ielchiieva commented on pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe… URL: https://github.com/apache/drill/pull/1727#discussion_r271258333 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java ## @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, QueryWritableBatch result loader.clear(); } } catch (Exception e) { - exception = UserException.systemError(e).build(logger); + throw UserException.systemError(e).build(logger); Review comment: I don't think we should throw an exception here. We should stick to the original approach and store it, but just add a method `getException()` in the `AbstractDisposableUserClientConnection` class, similar to `getError()`. And then do proper handling of both. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Exceptions happened during retrieving values from ValueVector are not being > displayed at the Drill Web UI > - > > Key: DRILL-7145 > URL: https://issues.apache.org/jira/browse/DRILL-7145 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.16.0 > > > *Data:* > A text file with the following content: > {noformat} > Id,col1,col2 > 1,aaa,bbb > 2,ccc,ddd > 3,eee > 4,fff,ggg > {noformat} > Note that the record with id 3 has no value for the third column. > exec.storage.enable_v3_text_reader should be false. 
> *Submit the query from the Web UI:* > {code:sql} > select * from > table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true)) > {code} > *Expected result:* > Exception should happen due to DRILL-4814. It should be properly displayed. > *Actual result:* > Incorrect data is returned but without error. Query status: success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
[ https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807678#comment-16807678 ] ASF GitHub Bot commented on DRILL-7145: --- aielchiieva commented on pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe… URL: https://github.com/apache/drill/pull/1727#discussion_r271257957 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java ## @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, QueryWritableBatch result loader.clear(); } } catch (Exception e) { - exception = UserException.systemError(e).build(logger); + throw UserException.systemError(e).build(logger); Review comment: I don't think we should throw an exception here. We should stick to the original approach and store it, but just add a method `getException()` in the `AbstractDisposableUserClientConnection` class, similar to `getError()`. And then do proper handling of both. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Exceptions happened during retrieving values from ValueVector are not being > displayed at the Drill Web UI > - > > Key: DRILL-7145 > URL: https://issues.apache.org/jira/browse/DRILL-7145 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.16.0 > > > *Data:* > A text file with the following content: > {noformat} > Id,col1,col2 > 1,aaa,bbb > 2,ccc,ddd > 3,eee > 4,fff,ggg > {noformat} > Note that the record with id 3 has no value for the third column. > exec.storage.enable_v3_text_reader should be false. 
> *Submit the query from the Web UI:* > {code:sql} > select * from > table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true)) > {code} > *Expected result:* > Exception should happen due to DRILL-4814. It should be properly displayed. > *Actual result:* > Incorrect data is returned but without error. Query status: success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
[ https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807670#comment-16807670 ] ASF GitHub Bot commented on DRILL-7145: --- agozhiy commented on pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe… URL: https://github.com/apache/drill/pull/1727 …ctor are not being displayed at the Drill Web UI This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Exceptions happened during retrieving values from ValueVector are not being > displayed at the Drill Web UI > - > > Key: DRILL-7145 > URL: https://issues.apache.org/jira/browse/DRILL-7145 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Anton Gozhiy >Priority: Major > Fix For: 1.16.0 > > > *Data:* > A text file with the following content: > {noformat} > Id,col1,col2 > 1,aaa,bbb > 2,ccc,ddd > 3,eee > 4,fff,ggg > {noformat} > Note that the record with id 3 has no value for the third column. > exec.storage.enable_v3_text_reader should be false. > *Submit the query from the Web UI:* > {code:sql} > select * from > table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true)) > {code} > *Expected result:* > Exception should happen due to DRILL-4814. It should be properly displayed. > *Actual result:* > Incorrect data is returned but without error. Query status: success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill
[ https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807667#comment-16807667 ] Arina Ielchiieva commented on DRILL-7087: - [~weijie] please start a conversation about this on the mailing list; let's see what the community thinks about having an Arrow fork. Personally I am against having an Arrow fork. > Integrate Arrow's Gandiva into Drill > > > Key: DRILL-7087 > URL: https://issues.apache.org/jira/browse/DRILL-7087 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Codegen, Execution - Relational Operators >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > > It's prior work to integrate Arrow into Drill by invoking its Gandiva > feature. Comparing Arrow's and Drill's in-memory column representations, > there is a different internal null representation now: Drill uses 1 byte while > Arrow uses 1 bit to indicate a null row. Also, all Arrow columns are > nullable now. Apart from those basic differences, they have the same memory > representation for the different data types. > The integration strategy is to invoke Arrow's JniWrapper native methods > directly by passing the ValueVector's memory address. > I have done an implementation in our own Drill version by integrating Gandiva > into Drill's project operator. The results show nearly a 1x > performance gain in expression computation. > So if there's no objection, I will submit a related PR to contribute this > feature. Also, this issue waits on Arrow's related issue ARROW-4819. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill
[ https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807656#comment-16807656 ] weijie.tong commented on DRILL-7087: What's your opinion about a self-managed branch of Arrow, since the Arrow folks don't seem to agree with ARROW-4819? > Integrate Arrow's Gandiva into Drill > > > Key: DRILL-7087 > URL: https://issues.apache.org/jira/browse/DRILL-7087 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Codegen, Execution - Relational Operators >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > > It's prior work to integrate Arrow into Drill by invoking its Gandiva > feature. Comparing Arrow's and Drill's in-memory column representations, > there is a different internal null representation now: Drill uses 1 byte while > Arrow uses 1 bit to indicate a null row. Also, all Arrow columns are > nullable now. Apart from those basic differences, they have the same memory > representation for the different data types. > The integration strategy is to invoke Arrow's JniWrapper native methods > directly by passing the ValueVector's memory address. > I have done an implementation in our own Drill version by integrating Gandiva > into Drill's project operator. The results show nearly a 1x > performance gain in expression computation. > So if there's no objection, I will submit a related PR to contribute this > feature. Also, this issue waits on Arrow's related issue ARROW-4819. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
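As background for the null-representation difference described in the issue above (Drill keeps one byte per value for nullability, while Arrow packs one bit per value into a validity bitmap), the bookkeeping cost can be illustrated with a back-of-the-envelope calculation. This is an illustrative sketch only, not Drill or Arrow buffer code, and it ignores the data buffers themselves and any allocator rounding.

```java
// Illustrative only: size of the "is null" bookkeeping for N values.
// Drill's nullable vectors carry a byte-per-value "bits" vector,
// while Arrow uses a packed validity bitmap (one bit per value).
class NullBookkeepingSketch {
  static long drillNullBytes(long valueCount) {
    return valueCount;            // 1 byte per value
  }

  static long arrowValidityBytes(long valueCount) {
    return (valueCount + 7) / 8;  // 1 bit per value, rounded up to whole bytes
  }
}
```

For a million-row batch that is 1 MB of null flags in the byte-per-value layout versus about 122 KiB in the bitmap layout — one reason the two formats cannot share nullability buffers directly.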
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807612#comment-16807612 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271222800 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java ## @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map recordValues, Function for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); - if (exprResult == Result.FALSE) { -return exprResult; - } - if (exprResult == Result.INCONCLUSIVE) { -result = Result.INCONCLUSIVE; + switch (exprResult) { Review comment: The suggested change will break the logic: this is a loop, and even when an invocation of ```evaluateHelper(recordValues, arg)``` returns ```Result.INCONCLUSIVE``` once, there is still a chance that a later iteration will return FALSE, I guess. Previously here was the chunk: ```java for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); if (exprResult == Result.FALSE) { return exprResult; } if (exprResult == Result.INCONCLUSIVE) { result = Result.INCONCLUSIVE; } } ``` I see that my change made it more confusing, so I'll revert it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is throwing the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
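The loop semantics defended in the review above — `FALSE` short-circuits immediately, while `INCONCLUSIVE` is only remembered and returned after all arguments have been examined — can be reproduced in isolation like this. It is a simplified stand-in for `InfoSchemaFilter.evaluateHelperFunction`, with the recursive `evaluateHelper` call replaced by precomputed argument results:

```java
import java.util.List;

class FilterEvalSketch {
  enum Result { TRUE, FALSE, INCONCLUSIVE }

  // Mirrors the shape of the original chunk quoted in the review:
  // returning INCONCLUSIVE eagerly would skip a later, decisive FALSE.
  static Result evaluateAnd(List<Result> argResults) {
    Result result = Result.TRUE;
    for (Result exprResult : argResults) {
      if (exprResult == Result.FALSE) {
        return exprResult;            // FALSE is decisive, stop here
      }
      if (exprResult == Result.INCONCLUSIVE) {
        result = Result.INCONCLUSIVE; // remember, but keep scanning
      }
    }
    return result;
  }
}
```

With inputs `[INCONCLUSIVE, FALSE]` this yields `FALSE`, whereas a version that returns on the first `INCONCLUSIVE` would yield `INCONCLUSIVE` — exactly the behavioral difference the reviewer points out.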
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807592#comment-16807592 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271215506 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ## @@ -63,89 +58,38 @@ public Table getTable(String tableName) { return hiveSchema.getDrillTable(this.name, tableName); } + @Override + public Collection> getTableNamesAndTypes() { +ensureInitTables(); +return tables.entrySet(); + } + @Override public Set getTableNames() { +ensureInitTables(); +return tables.keySet(); + } + + private void ensureInitTables() { if (tables == null) { try { -tables = Sets.newHashSet(mClient.getTableNames(this.name, schemaConfig.getIgnoreAuthErrors())); - } catch (final TException e) { -logger.warn("Failure while attempting to access HiveDatabase '{}'.", this.name, e.getCause()); -tables = Sets.newHashSet(); // empty set. +tables = mClient.getTableNamesAndTypes(this.name, schemaConfig.getIgnoreAuthErrors()); + } catch (TException e) { +logger.warn(String.format( Review comment: It's an invocation of ```warn(String msg, Throwable t)```, which means the stack trace won't be missed in the logs. Using a string with ```{}``` placeholders and ```warn(String format, Object... arguments)``` will most probably just call ```toString()``` on the exception object, and stack trace details won't be shown. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is throwing the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
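The logging point made in the review above — pass the exception as the final `Throwable` argument so the full stack trace reaches the log, rather than relying on the exception being stringified into the message — can be illustrated as follows. This uses `java.util.logging` purely as a stand-in for the slf4j logger in the actual Drill code, and the method and database names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class WarnLoggingSketch {
  private static final Logger logger = Logger.getLogger(WarnLoggingSketch.class.getName());

  static void reportFailure(String dbName, Exception e) {
    // The Throwable goes in as its own argument, so the log handler
    // receives the full stack trace, not just e.toString().
    logger.log(Level.WARNING,
        String.format("Failure while attempting to access HiveDatabase '%s'.", dbName), e);
  }
}
```

The same shape applies to slf4j's `warn(String msg, Throwable t)` overload, which is what the reviewer is defending here.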
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807572#comment-16807572 ] ASF GitHub Bot commented on DRILL-7115: --- ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271203347 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.hive.client; + +import java.util.List; +import java.util.stream.Collectors; + +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.exec.store.hive.ColumnListsCache; +import org.apache.drill.exec.store.hive.HiveReadEntry; +import org.apache.drill.exec.store.hive.HiveTableWithColumnCache; +import org.apache.drill.exec.store.hive.HiveTableWrapper; +import org.apache.drill.exec.store.hive.HiveUtilities; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.api.UnknownTableException; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableEntryCacheLoader extends CacheLoader { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableEntryCacheLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + + @Override + @SuppressWarnings("NullableProblems") + public HiveReadEntry load(TableName key) throws Exception { +Table table; +List partitions; +synchronized (client) { + table = getTable(key); + partitions = getPartitions(key); +} +HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, new ColumnListsCache(table)); +List partitionWrappers = partitions.isEmpty() +? null Review comment: Good catch, the logic was here previously since the class was static nested. So I extracted it and preserved existing logic, but I'll try to use empty list and maybe somewhere else redundant null check will be removed too. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is throwing the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807523#comment-16807523 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271177669 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.Objects; + +/** + * Combination of dbName and tableName fields used Review comment: ```suggestion * Combination of database and table names used ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is throwing the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
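The `TableName` class being reviewed above is a small value object combining database and table names so it can serve as a cache key; that role depends on consistent `equals`/`hashCode`. A minimal stand-in (hypothetical names, not the actual Drill class) might look like:

```java
import java.util.Objects;

// Hypothetical sketch of a two-field cache key like the PR's TableName.
final class TableKeySketch {
  private final String dbName;
  private final String tableName;

  TableKeySketch(String dbName, String tableName) {
    this.dbName = dbName;
    this.tableName = tableName;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof TableKeySketch)) return false;
    TableKeySketch that = (TableKeySketch) o;
    return Objects.equals(dbName, that.dbName)
        && Objects.equals(tableName, that.tableName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(dbName, tableName);
  }
}
```

Two keys built from the same database and table names must compare equal and hash identically, otherwise a Guava-style cache would miss on every lookup.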
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807526#comment-16807526 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271164002 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ## @@ -266,8 +266,7 @@ private void scanSchema(String schemaPath, SchemaPlus schema) { */ public void visitTables(String schemaPath, SchemaPlus schema) { final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class); -final List tableNames = Lists.newArrayList(schema.getTableNames()); -for(Pair tableNameToTable : drillSchema.getTablesByNames(tableNames)) { +for(Pair tableNameToTable : drillSchema.getTablesByNames(schema.getTableNames())) { Review comment: ```suggestion for (Pair tableNameToTable : drillSchema.getTablesByNames(schema.getTableNames())) { ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Hive schema show tables performance > --- > > Key: DRILL-7115 > URL: https://issues.apache.org/jira/browse/DRILL-7115 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Information Schema >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to > 20mins. The schema has nearly ~8000 tables. > Whereas the same in beeline(Hive) is throwing the result in a split second(~ > 0.2 secs). > I tested the same in my test cluster by creating 6000 tables(empty!) in Hive > and then doing "show tables" in Drill. 
It took more than 2 mins(~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807525#comment-16807525 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271179071 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ## @@ -63,89 +58,38 @@ public Table getTable(String tableName) { return hiveSchema.getDrillTable(this.name, tableName); } + @Override + public Collection<Map.Entry<String, TableType>> getTableNamesAndTypes() { +ensureInitTables(); +return tables.entrySet(); + } + + @Override public Set<String> getTableNames() { +ensureInitTables(); +return tables.keySet(); + } + + private void ensureInitTables() { if (tables == null) { try { -tables = Sets.newHashSet(mClient.getTableNames(this.name, schemaConfig.getIgnoreAuthErrors())); - } catch (final TException e) { -logger.warn("Failure while attempting to access HiveDatabase '{}'.", this.name, e.getCause()); -tables = Sets.newHashSet(); // empty set. +tables = mClient.getTableNamesAndTypes(this.name, schemaConfig.getIgnoreAuthErrors()); + } catch (TException e) { +logger.warn(String.format( Review comment: Why `String.format`?
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807521#comment-16807521 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271163913 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java ## @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map recordValues, Function for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); - if (exprResult == Result.FALSE) { -return exprResult; - } - if (exprResult == Result.INCONCLUSIVE) { -result = Result.INCONCLUSIVE; + switch (exprResult) { Review comment: consider ``` if (exprResult == Result.FALSE || exprResult == Result.INCONCLUSIVE) { return exprResult; } ``` i find it simpler
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807516#comment-16807516 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271145021 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; + +import org.apache.calcite.schema.Schema.TableType; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static java.util.stream.Collectors.toMap; +import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableNameLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + @Override + @SuppressWarnings("NullableProblems") + public Map<String, TableType> load(String dbName) throws Exception { +List<String> tableAndViewNames; +final Set<String> viewNames = new HashSet<>(); +synchronized (client) { + try { +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } catch (MetaException e) { + /* + HiveMetaStoreClient is encapsulating both the MetaException/TExceptions inside MetaException. + Since we don't have good way to differentiate, we will close older connection and retry once. + This is only applicable for getAllTables and getAllDatabases method since other methods are + properly throwing correct exceptions. + */ +logger.warn("Failure while attempting to get hive tables. Retries once.", e); +AutoCloseables.closeSilently(client::close); +client.reconnect(); +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } +} +Function<String, TableType> valueMapper = viewNames.isEmpty() +? tableName -> TableType.TABLE Review comment: Please replace two-level ternary operator. We are trying to avoid it in Drill for readability.
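For readers following along outside the PR, the kind of rewrite the reviewer asks for can be sketched in isolation. This is a hedged standalone example, not Drill's actual code; the class, enum, and method names below are hypothetical:

```java
import java.util.Set;

public class TernaryExample {

    enum TableType { TABLE, VIEW }

    // Equivalent of the flagged two-level ternary
    //   viewNames.isEmpty() ? n -> TABLE
    //                       : n -> viewNames.contains(n) ? VIEW : TABLE
    // written as a plain conditional: when viewNames is empty,
    // contains() is simply false and TABLE is returned anyway.
    static TableType classify(String name, Set<String> viewNames) {
        if (viewNames.contains(name)) {
            return TableType.VIEW;
        }
        return TableType.TABLE;
    }

    public static void main(String[] args) {
        Set<String> views = Set.of("v_orders");
        System.out.println(classify("v_orders", views)); // VIEW
        System.out.println(classify("customers", views)); // TABLE
    }
}
```

The flattened form also removes the special-case lambda for the empty set, since the single conditional already covers it.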
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807524#comment-16807524 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271177349 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.Objects; + +/** + * Combination of dbName and tableName fields used + * to represent key for getting table data from cache. + */ +final class TableName { + + private final String dbName; + Review comment: ```suggestion ```
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807520#comment-16807520 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271160776 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ## @@ -920,46 +920,11 @@ public void dropTable(String table) { } @Override -public List<Pair<String, TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) { - final List<Pair<String, TableType>> tableNamesAndTypes = Lists.newArrayList(); - - // Look for raw tables first - if (!tables.isEmpty()) { -for (Map.Entry tableEntry : tables.entrySet()) { - tableNamesAndTypes - .add(Pair.of(tableEntry.getKey().sig.name, tableEntry.getValue().getJdbcTableType())); -} - } - // Then look for files that start with this name and end in .drill. - List<DotDrillFile> files = Collections.emptyList(); - try { -files = DotDrillUtil.getDotDrills(getFS(), new Path(config.getLocation()), DotDrillType.VIEW); - } catch (AccessControlException e) { -if (!schemaConfig.getIgnoreAuthErrors()) { - logger.debug(e.getMessage()); - throw UserException.permissionError(e) - .message("Not authorized to list or query tables in schema [%s]", getFullSchemaName()) - .build(logger); -} - } catch (IOException e) { -logger.warn("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e); - } catch (UnsupportedOperationException e) { -// the file system (e.g. the classpath filesystem) may not support listing -// of files. But see getViews(), it ignores the exception and continues -logger.debug("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e); - } - - try { -for (DotDrillFile f : files) { - if (f.getType() == DotDrillType.VIEW) { -tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW)); - } -} - } catch (UnsupportedOperationException e) { -logger.debug("The filesystem for this workspace does not support this operation.", e); Review comment: What about logging?
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807522#comment-16807522 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271161918 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java ## @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map recordValues, Function for(ExprNode arg : exprNode.args) { Review comment: ```suggestion for (ExprNode arg : exprNode.args) { ``` please edit in 3 other cases in this class
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807518#comment-16807518 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271150560 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.List; +import java.util.stream.Collectors; + +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.exec.store.hive.ColumnListsCache; +import org.apache.drill.exec.store.hive.HiveReadEntry; +import org.apache.drill.exec.store.hive.HiveTableWithColumnCache; +import org.apache.drill.exec.store.hive.HiveTableWrapper; +import org.apache.drill.exec.store.hive.HiveUtilities; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.api.UnknownTableException; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableEntryCacheLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + + @Override + @SuppressWarnings("NullableProblems") + public HiveReadEntry load(TableName key) throws Exception { +Table table; +List<Partition> partitions; +synchronized (client) { + table = getTable(key); + partitions = getPartitions(key); +} +HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, new ColumnListsCache(table)); +List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = partitions.isEmpty() +? null Review comment: Why not empty list instead of null in case of empty partitions list? Depends on the above answer you can use `Optional` or `Stream` `filter(Objects::nonNull)` for better stream chaining. You can ignore it, if you added `if` condition intentionally to avoid creation of `Optional` or `Stream` objects.
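The empty-list-over-null alternative raised in this comment can be shown in isolation. The sketch below uses made-up names and is not the PR's code; it only illustrates why returning an empty collection keeps call sites free of null checks:

```java
import java.util.Collections;
import java.util.List;

public class EmptyVsNull {

    // Returning Collections.emptyList() instead of null (the alternative
    // raised in the review) lets every caller iterate or stream the result
    // directly. Names here are illustrative, not Drill's.
    static List<String> partitionNames(List<String> rawPartitions) {
        if (rawPartitions == null || rawPartitions.isEmpty()) {
            return Collections.emptyList();
        }
        return List.copyOf(rawPartitions);
    }

    public static void main(String[] args) {
        // Safe to chain directly; no NullPointerException on the empty case.
        long count = partitionNames(null).stream().count();
        System.out.println(count); // 0
    }
}
```

A null return, by contrast, forces every caller to guard before iterating, which is the readability cost the reviewer is probing for.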
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807519#comment-16807519 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271160223 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ## @@ -67,23 +66,24 @@ import org.apache.drill.exec.store.AbstractSchema; import org.apache.drill.exec.store.PartitionNotFoundException; import org.apache.drill.exec.store.SchemaConfig; -import org.apache.drill.exec.util.DrillFileSystemUtil; import org.apache.drill.exec.store.StorageStrategy; import org.apache.drill.exec.store.easy.json.JSONFormatPlugin; +import org.apache.drill.exec.util.DrillFileSystemUtil; import org.apache.drill.exec.util.ImpersonationUtil; +import org.apache.drill.shaded.guava.com.google.common.base.Joiner; +import org.apache.drill.shaded.guava.com.google.common.base.Strings; +import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList; +import org.apache.drill.shaded.guava.com.google.common.collect.Lists; +import org.apache.drill.shaded.guava.com.google.common.collect.Sets; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.permission.FsAction; import org.apache.hadoop.fs.permission.FsPermission; import org.apache.hadoop.security.AccessControlException; -import com.fasterxml.jackson.databind.ObjectMapper; -import org.apache.drill.shaded.guava.com.google.common.base.Joiner; -import org.apache.drill.shaded.guava.com.google.common.base.Strings; -import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList; -import org.apache.drill.shaded.guava.com.google.common.collect.Lists; -import org.apache.drill.shaded.guava.com.google.common.collect.Sets; +import static java.util.Collections.unmodifiableList; Review comment: Usually we don't touch imports ordering, since different IDE can change it for a lot of classes. But it is ok here.
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807517#comment-16807517 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271145356 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; + +import org.apache.calcite.schema.Schema.TableType; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static java.util.stream.Collectors.toMap; +import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableNameLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + @Override + @SuppressWarnings("NullableProblems") + public Map<String, TableType> load(String dbName) throws Exception { +List<String> tableAndViewNames; +final Set<String> viewNames = new HashSet<>(); +synchronized (client) { + try { +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } catch (MetaException e) { + /* + HiveMetaStoreClient is encapsulating both the MetaException/TExceptions inside MetaException. + Since we don't have good way to differentiate, we will close older connection and retry once. + This is only applicable for getAllTables and getAllDatabases method since other methods are + properly throwing correct exceptions. + */ +logger.warn("Failure while attempting to get hive tables. Retries once.", e); +AutoCloseables.closeSilently(client::close); +client.reconnect(); +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } +} +Function<String, TableType> valueMapper = viewNames.isEmpty() +? tableName -> TableType.TABLE +: tableOrViewName -> viewNames.contains(tableOrViewName) ? TableType.VIEW : TableType.TABLE; +return Collections.unmodifiableMap(tableAndViewNames.stream() +.collect(toMap(Function.identity(), valueMapper))); Review comment: please follow the common way to use `Collectors` class name with a static method usage.
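The qualified-`Collectors` style the reviewer asks for (calling `Collectors.toMap(...)` rather than statically importing `toMap`) can be shown on a toy version of the same name-to-type mapping. Names below are illustrative, not Drill's:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorsStyle {

    // Same name -> type mapping idea as the diff, but with the qualified
    // Collectors.toMap(...) call instead of a static import of toMap.
    static Map<String, String> classifyNames(List<String> names, Set<String> viewNames) {
        return names.stream()
            .collect(Collectors.toMap(
                Function.identity(),
                name -> viewNames.contains(name) ? "VIEW" : "TABLE"));
    }

    public static void main(String[] args) {
        Map<String, String> types = classifyNames(List.of("t1", "v1"), Set.of("v1"));
        System.out.println(types.get("v1")); // VIEW
    }
}
```

Keeping the `Collectors.` qualifier makes it obvious at the call site which library the collector comes from, which is the readability argument behind the convention.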
[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance
[ https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807515#comment-16807515 ] ASF GitHub Bot commented on DRILL-7115: --- vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r271144225 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.hive.client; + +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; + +import org.apache.calcite.schema.Schema.TableType; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static java.util.stream.Collectors.toMap; +import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW; + +/** + * CacheLoader that synchronized on client and tries to reconnect when + * client fails. Used by {@link HiveMetadataCache}. + */ +final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> { + + private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class); + + private final DrillHiveMetaStoreClient client; + + TableNameLoader(DrillHiveMetaStoreClient client) { +this.client = client; + } + + @Override + @SuppressWarnings("NullableProblems") + public Map<String, TableType> load(String dbName) throws Exception { +List<String> tableAndViewNames; +final Set<String> viewNames = new HashSet<>(); +synchronized (client) { + try { +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } catch (MetaException e) { + /* + HiveMetaStoreClient is encapsulating both the MetaException/TExceptions inside MetaException. + Since we don't have good way to differentiate, we will close older connection and retry once. + This is only applicable for getAllTables and getAllDatabases method since other methods are + properly throwing correct exceptions. + */ +logger.warn("Failure while attempting to get hive tables. Retries once.", e); +AutoCloseables.closeSilently(client::close); +client.reconnect(); +tableAndViewNames = client.getAllTables(dbName); +viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW)); + } +} +Function<String, TableType> valueMapper = viewNames.isEmpty() +? tableName -> TableType.TABLE +: tableOrViewName -> viewNames.contains(tableOrViewName) ? TableType.VIEW : TableType.TABLE; +return Collections.unmodifiableMap(tableAndViewNames.stream() +.collect(toMap(Function.identity(), valueMapper))); + } + +} Review comment: ```suggestion } ```
[jira] [Commented] (DRILL-7072) Query with semi join fails for JDBC storage plugin
[ https://issues.apache.org/jira/browse/DRILL-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807500#comment-16807500 ]

Volodymyr Vysotskyi commented on DRILL-7072:
--------------------------------------------

No, it doesn't.

> Query with semi join fails for JDBC storage plugin
> --------------------------------------------------
>
> Key: DRILL-7072
> URL: https://issues.apache.org/jira/browse/DRILL-7072
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JDBC
> Affects Versions: 1.15.0
> Reporter: Volodymyr Vysotskyi
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>
> When running a query with a semi join for the JDBC storage plugin, it fails with a class cast exception:
> {code:sql}
> select person_id from mysql.`drill_mysql_test`.person t1
> where exists (
>   select person_id from mysql.`drill_mysql_test`.person
>   where t1.person_id = person_id)
> {code}
> {noformat}
> SYSTEM ERROR: ClassCastException: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to org.apache.drill.exec.planner.logical.DrillAggregateRel
> Please, refer to logs for more information.
> [Error Id: 85a27762-a4e5-4571-909f-0efa18ca0689 on user515050-pc:31013]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ClassCastException: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to org.apache.drill.exec.planner.logical.DrillAggregateRel
> Please, refer to logs for more information.
> [Error Id: 85a27762-a4e5-4571-909f-0efa18ca0689 on user515050-pc:31013]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:779) [classes/:na]
> at org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:325) [classes/:na]
> at org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:221) [classes/:na]
> at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83) [classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:299) [classes/:na]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to org.apache.drill.exec.planner.logical.DrillAggregateRel
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:300) [classes/:na]
> ... 3 common frames omitted
> Caused by: java.lang.ClassCastException: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to org.apache.drill.exec.planner.logical.DrillAggregateRel
> at org.apache.drill.exec.planner.logical.DrillSemiJoinRule.matches(DrillSemiJoinRule.java:171) ~[classes/:na]
> at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:557) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:257) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:216) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:203) ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:431) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:382) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:365) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:289) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:331) ~[classes/:na]
> at org.apa
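The ClassCastException in `DrillSemiJoinRule.matches` above comes from unconditionally casting a planner node that can also be a foreign (JDBC) rel. The usual remedy for this class of bug is to check the node's type before casting; a generic illustrative sketch with stand-in class names (not Calcite or Drill classes, and not necessarily the actual fix):

```java
// Illustrates guarding a cast in a planner-rule-style match, the kind of
// unconditional cast that produced the ClassCastException above.
// RelNodeLike, DrillAggregateLike, and JdbcAggregateLike are stand-in names.
public class GuardedCastDemo {

  interface RelNodeLike {}
  static class DrillAggregateLike implements RelNodeLike {}
  static class JdbcAggregateLike implements RelNodeLike {}

  // Returns true only when the node is the expected Drill type, instead of
  // casting blindly; foreign (e.g. JDBC) rel nodes simply fail to match.
  static boolean matches(RelNodeLike node) {
    return node instanceof DrillAggregateLike;
  }

  public static void main(String[] args) {
    System.out.println(matches(new DrillAggregateLike())); // true
    System.out.println(matches(new JdbcAggregateLike()));  // false
  }
}
```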
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807485#comment-16807485 ]

ASF GitHub Bot commented on DRILL-7143:
---------------------------------------

paul-rogers commented on issue #1726: DRILL-7143: Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-478882176

@arina-ielchiieva, here is a first cut at the improved default values. I have tested selected mechanisms and CSV with schema, but have not yet run the full set of unit tests. Consider this a "preview" to begin the code review in parallel with the remaining busy-work needed to complete the PR.

> Enforce column-level constraints when using a schema
> ----------------------------------------------------
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.16.0
>
> The recently added schema framework enforces schema constraints at the table level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", the reader will use the exact type and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required and provides a default value, that value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through additional testing.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema
[ https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807481#comment-16807481 ]

ASF GitHub Bot commented on DRILL-7143:
---------------------------------------

paul-rogers commented on pull request #1726: DRILL-7143: Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726

Modifies the prior work to add default values for columns. The prior work added defaults when the entire column is missing from a reader (the old Nullable Int column). The Row Set mechanism now also "fills empty" slots with the default value.

Added default support for the column writers. The writers automatically obtain the default value from the column schema. The default can also be set explicitly on the column writer. Updated the null-column mechanism to use this feature rather than the ad-hoc implementation in the prior commit.

Semantics changed a bit: only required columns take a default. The default value is ignored for nullable columns, since nullable columns already have a default: NULL.

Updated the CSV-with-schema tests to illustrate the new behavior.

> Enforce column-level constraints when using a schema
> ----------------------------------------------------
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.16.0
>
> The recently added schema framework enforces schema constraints at the table level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", the reader will use the exact type and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required and provides a default value, that value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through additional testing.
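The fill-empty semantics described in the PR (a required column takes its declared default, a nullable column ignores the default and gets NULL) can be modeled with a small sketch. The class, enum, and method names here are hypothetical and only illustrate the rule, not Drill's actual column-writer API:

```java
// Models the "fill empty" rule: a required column uses its declared default,
// a nullable column ignores the default because its default is already NULL.
// FillEmptyDemo, Mode, and fillEmpty are illustrative names, not Drill APIs.
public class FillEmptyDemo {

  enum Mode { REQUIRED, NULLABLE }

  static Object fillEmpty(Mode mode, Object declaredDefault) {
    if (mode == Mode.NULLABLE) {
      return null;              // nullable columns already default to NULL
    }
    if (declaredDefault == null) {
      return 0;                 // prior behavior: required column with no default gets 0
    }
    return declaredDefault;     // required column takes the declared default
  }

  public static void main(String[] args) {
    System.out.println(fillEmpty(Mode.REQUIRED, "N/A"));  // N/A
    System.out.println(fillEmpty(Mode.NULLABLE, "N/A"));  // null
    System.out.println(fillEmpty(Mode.REQUIRED, null));   // 0
  }
}
```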