[GitHub] [drill] HanumathRao commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables.
HanumathRao commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables.
URL: https://github.com/apache/drill/pull/1708#discussion_r268039787

File path: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java

@@ -107,15 +107,13 @@ private static CompareFunctionsProcessor processWithEvaluator(FunctionCall call,
     LogicalExpression nameArg = call.args.get(0);
     LogicalExpression valueArg = call.args.size() >= 2 ? call.args.get(1) : null;
-    if (valueArg != null) {
-      if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
-        LogicalExpression swapArg = valueArg;
-        valueArg = nameArg;
-        nameArg = swapArg;
-        evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
-      }
-      evaluator.success = nameArg.accept(evaluator, valueArg);
+    if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
+      LogicalExpression swapArg = valueArg;
+      valueArg = nameArg;
+      nameArg = swapArg;
+      evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
     }
+    evaluator.success = nameArg.accept(evaluator, valueArg);

Review comment:
@vvysotskyi I have made the changes.

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
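The change above drops the null guard so the operand swap runs unconditionally: when the predicate is written value-first (e.g. `5 < col`), the operands are swapped and the comparison function is transposed (`col > 5`). A minimal standalone sketch of that transposition, where the map contents mirror the usual operator pairs but are an assumption, not Drill's actual COMPARE_FUNCTIONS_TRANSPOSE_MAP:

```java
import java.util.Map;

// Illustrative sketch only: operator transposition for a comparison whose
// operands arrive value-first. "5 < col" and "col > 5" are the same
// predicate once the operands are swapped and the operator is mirrored.
public class TransposeSketch {
  // Assumed operator names; not Drill's actual map contents.
  private static final Map<String, String> TRANSPOSE_MAP = Map.of(
      "equal", "equal",
      "not_equal", "not_equal",
      "less_than", "greater_than",
      "less_than_or_equal_to", "greater_than_or_equal_to",
      "greater_than", "less_than",
      "greater_than_or_equal_to", "less_than_or_equal_to");

  public static String transpose(String functionName) {
    return TRANSPOSE_MAP.get(functionName);
  }
}
```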
[jira] [Created] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero
salim achouche created DRILL-7130:
----------------------------------
Summary: IllegalStateException: Read batch count [0] should be greater than zero
Key: DRILL-7130
URL: https://issues.apache.org/jira/browse/DRILL-7130
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: salim achouche
Assignee: salim achouche
Fix For: 1.17.0

The following exception is being hit when reading parquet data:

Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
  at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
  ... 29 common frames omitted

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate
Anisha Reddy created DRILL-7129:
--------------------------------
Summary: Join with more than 1 condition is not using stats to compute row count estimate
Key: DRILL-7129
URL: https://issues.apache.org/jira/browse/DRILL-7129
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Anisha Reddy
Fix For: 1.17.0

Below are the details:

{code:java}
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem`;
+---------+
| EXPR$0  |
+---------+
| 57068   |
+---------+
1 row selected (0.179 seconds)
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/partsupp`;
+---------+
| EXPR$0  |
+---------+
| 7474    |
+---------+
1 row selected (0.171 seconds)
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+---------+
| EXPR$0  |
+---------+
| 53401   |
+---------+
1 row selected (0.769 seconds)
0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes for select * from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+------+------+
| text | json |
+------+------+
| 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107578
00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107577
00-02 Project(T10¦¦**=[$0], T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576
00-03 HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107575
00-05 Project(T10¦¦**=[$0], l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107572
00-07 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571
00-04 Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107574
00-06 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
{code}

The NDVs are:
l_partkey = 2000
ps_partkey = 1817
l_suppkey = 100
ps_suppkey = 100

We see that the row count estimate for such joins just takes the max of the left-side and right-side row counts.
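For reference, a textbook multi-key join estimate divides the Cartesian product by, for each join key pair, the larger of the two NDVs; with the stats above that would be 57068 × 7474 / (max(2000, 1817) × max(100, 100)). The sketch below is that generic formula, not Drill's planner code:

```java
// Generic sketch of the classic multi-key equi-join cardinality estimate:
// |L| * |R| / product over key pairs of max(NDV(leftKey), NDV(rightKey)).
// It illustrates what a stats-based estimate could look like; it makes no
// claim about what Drill's planner actually implements.
public class JoinEstimateSketch {
  public static double estimate(double leftRows, double rightRows,
                                double[] leftNdvs, double[] rightNdvs) {
    double denominator = 1.0;
    for (int i = 0; i < leftNdvs.length; i++) {
      // Each equi-join key pair shrinks the Cartesian product by the
      // larger of the two distinct-value counts.
      denominator *= Math.max(leftNdvs[i], rightNdvs[i]);
    }
    return leftRows * rightRows / denominator;
  }
}
```

Note that with correlated keys (as in lineitem/partsupp, where the actual count is 53401) this independence assumption can still be far off; the point of the report is only that the NDV stats are being ignored entirely.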
[GitHub] [drill] Agirish merged pull request #1709: DRILL-7126: Contrib format-ltsv is not being included in distribution
Agirish merged pull request #1709: DRILL-7126: Contrib format-ltsv is not being included in distribution
URL: https://github.com/apache/drill/pull/1709
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903717

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

@@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
       new OptionDescription("Uses a hash algorithm to distribute data on partition keys in a CTAS partitioning operation. An alpha option--for experimental use at this stage. Do not use in production systems."));
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option to regulate bulk size,
+   * because big amount of views was causing performance degradation. After last improvements for
+   * DRILL-7115 both options ({@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = "exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);

   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
    */
+  @Deprecated
+  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY,
+      new OptionDescription("Deprecated after DRILL-7115 improvement."));
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   */
+  @Deprecated
   public static final String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = "exec.bulk_load_table_list.bulk_size";
-  public static final PositiveLongValidator BULK_LOAD_TABLE_LIST_BULK_SIZE = new PositiveLongValidator(BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY, Integer.MAX_VALUE, null);
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903537

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * @deprecated option. It will not take any effect.
   * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903099

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903537

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * @deprecated option. It will not take any effect.
   * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for details
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267937747

File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java

@@ -275,29 +274,14 @@ public void dropTable(String tableName) {
         .build(logger);
   }

-  /**
-   * Get the collection of {@link Table} tables specified in the tableNames with bulk-load (if the underlying storage
-   * plugin supports).
-   * It is not guaranteed that the retrieved tables would have RowType and Statistic being fully populated.
-   *
-   * Specifically, calling {@link Table#getRowType(org.apache.calcite.rel.type.RelDataTypeFactory)} or {@link Table#getStatistic()} might incur
-   * {@link UnsupportedOperationException} being thrown.
-   *
-   * @param tableNames the requested tables, specified by the table names
-   * @return the collection of requested tables
-   */
-  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, int bulkSize) {
-    return getTablesByNames(tableNames);
-  }
-
   /**
    * Get the collection of {@link Table} tables specified in the tableNames.
    *
-   * @param tableNames the requested tables, specified by the table names
+   * @param tableNames the requested tables, specified by the table namesbulkSize

Review comment:
typo?
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903099

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for details
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267937301

File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java

   public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
-    final List<Pair<String, ? extends Table>> tables = Lists.newArrayList();
+    final List<Pair<String, ? extends Table>> tables = new ArrayList<>(tableNames.size());
     for (String tableName : tableNames) {

Review comment:
```
return tableNames.stream()
    // Schema may return NULL for table if the query user doesn't have permissions to load the table. Ignore such
    // tables as INFO SCHEMA is about showing tables which the use has access to query.
    .map(tableName -> Pair.of(tableName, getTable(tableName)))
    .filter(pair -> Objects.nonNull(pair.getRight()))
    .collect(Collectors.toList());
```
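The stream pipeline suggested above can be exercised in isolation. In this sketch a plain `Map.Entry` stands in for the `Pair` type and a lookup function stands in for the schema's `getTable`; both substitutions are assumptions made so the example is self-contained:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Standalone sketch of the suggested stream rewrite: pair each requested
// name with its lookup result, then drop names whose lookup returned null
// (e.g. tables the querying user has no permission to load).
public class TableLookupSketch {
  public static List<Map.Entry<String, String>> getTablesByNames(
      List<String> tableNames, Function<String, String> getTable) {
    return tableNames.stream()
        .<Map.Entry<String, String>>map(
            name -> new SimpleImmutableEntry<>(name, getTable.apply(name)))
        .filter(pair -> Objects.nonNull(pair.getValue()))
        .collect(Collectors.toList());
  }
}
```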
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267902483

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * The option added as part of DRILL-4577, was used to mark that hive tables should be loaded

Review comment:
```suggestion
   * @deprecated option. It will not take any effect.
   * The option added as part of DRILL-4577, was used to mark that hive tables should be loaded
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903717

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}

Review comment:
```suggestion
   * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for details
```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267904395

File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillParserUtil.java

@@ -17,31 +17,31 @@
  */
 package org.apache.drill.exec.planner.sql.parser;

+import java.util.ArrayList;
 import java.util.List;

 import org.apache.calcite.sql.SqlNode;
 import org.apache.calcite.sql.SqlOperator;
 import org.apache.calcite.sql.parser.SqlParserPos;
 import org.apache.calcite.sql.parser.SqlParserUtil;
-import org.apache.calcite.util.Util;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;

 /**
  * Helper methods or constants used in parsing a SQL query.
  */
 public class DrillParserUtil {

-  public static final String CHARSET = Util.getDefaultCharset().name();
+  private static final int CONDITION_LIST_CAPACITY = 3;

   public static SqlNode createCondition(SqlNode left, SqlOperator op, SqlNode right) {

     // if one of the operands is null, return the other
-    if (left == null || right == null) {
-      return left != null ? left : right;
+    if (left == null) {

Review comment:
gj
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r267898826 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ## @@ -63,14 +60,35 @@ public Table getTable(String tableName) { return hiveSchema.getDrillTable(this.name, tableName); } + @Override + public List<Pair<String, TableType>> getTableNamesAndTypes() { +Set<String> views = getViewNames(); +// optimization for empty views +Function<String, Pair<String, TableType>> toNameAndTypePair = views.isEmpty() +? (name) -> Pair.of(name, TableType.TABLE) +: (name) -> Pair.of(name, views.contains(name) ? TableType.VIEW : TableType.TABLE); Review comment: consider adding brackets for better code readability: ```suggestion : (name) -> Pair.of(name, (views.contains(name) ? TableType.VIEW : TableType.TABLE)); ```
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r267900169 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ## @@ -63,14 +60,35 @@ public Table getTable(String tableName) { return hiveSchema.getDrillTable(this.name, tableName); } + @Override + public List<Pair<String, TableType>> getTableNamesAndTypes() { +Set<String> views = getViewNames(); +// optimization for empty views +Function<String, Pair<String, TableType>> toNameAndTypePair = views.isEmpty() +? (name) -> Pair.of(name, TableType.TABLE) +: (name) -> Pair.of(name, views.contains(name) ? TableType.VIEW : TableType.TABLE); +return getTableNames().stream() +.map(toNameAndTypePair) +.collect(toList()); + } + + private Set<String> getViewNames() { +try { + return new HashSet<>(mClient.getTables(this.name, "*", org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW)); +} catch (MetaException e) { + logger.warn("Failed to get view names, views and tables won't be separated by type.", e.getCause()); Review comment: ```suggestion logger.warn("Failed to get view names. Views and tables won't be separated by type.", e.getCause()); ```
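The hunk under review picks the name-to-type mapping function once, before streaming, so the per-name `Set` membership check is skipped entirely when the schema has no views. A minimal sketch of that pattern, using plain JDK types in place of Drill's `Pair` and Calcite's `TableType` (the class name `NameTypePairing`, the `String` type labels, and the method signature are all illustrative substitutions, not Drill's API):

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NameTypePairing {

  // The mapping function is chosen once, outside the stream: when there are
  // no views, every name maps straight to "TABLE" with no contains() call.
  public static List<Map.Entry<String, String>> namesAndTypes(List<String> names, Set<String> views) {
    Function<String, Map.Entry<String, String>> toNameAndTypePair = views.isEmpty()
        ? (name) -> new SimpleImmutableEntry<>(name, "TABLE")
        : (name) -> new SimpleImmutableEntry<>(name, views.contains(name) ? "VIEW" : "TABLE");
    return names.stream()
        .map(toNameAndTypePair)
        .collect(Collectors.toList());
  }
}
```

The design point is that the `views.isEmpty()` branch is evaluated once per call rather than once per table name, which is the "optimization for empty views" the inline comment refers to.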
[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance URL: https://github.com/apache/drill/pull/1706#discussion_r267905606 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java ## @@ -275,29 +274,14 @@ public void dropTable(String tableName) { .build(logger); } - /** - * Get the collection of {@link Table} tables specified in the tableNames with bulk-load (if the underlying storage - * plugin supports). - * It is not guaranteed that the retrieved tables would have RowType and Statistic being fully populated. - * - * Specifically, calling {@link Table#getRowType(org.apache.calcite.rel.type.RelDataTypeFactory)} or {@link Table#getStatistic()} might incur - * {@link UnsupportedOperationException} being thrown. - * - * @param tableNames the requested tables, specified by the table names - * @return the collection of requested tables - */ - public List> getTablesByNamesByBulkLoad(final List tableNames, int bulkSize) { -return getTablesByNames(tableNames); - } - /** * Get the collection of {@link Table} tables specified in the tableNames. * - * @param tableNames the requested tables, specified by the table namesbulkSize * @return the collection of requested tables */ public List> getTablesByNames(final List tableNames) { -final List> tables = Lists.newArrayList(); +final List> tables = new ArrayList<>(tableNames.size()); for (String tableName : tableNames) { Review comment: Can it be replaced with a stream?
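A stream-based version of that loop could look roughly like the following. This is a sketch under assumptions: the loop body is elided in the diff, so the `catalog` map standing in for the schema's table resolution, the `String` value type replacing Calcite's `Table`, and the decision to drop names that resolve to nothing are all mine, not Drill's implementation:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class BulkTableLookup {

  // Hypothetical stand-in for the schema's table lookup (e.g. getTable(name));
  // the real method would resolve a Calcite Table, not a String.
  private final Map<String, String> catalog;

  public BulkTableLookup(Map<String, String> catalog) {
    this.catalog = catalog;
  }

  // Stream-based equivalent of the for-loop: map each requested name to a
  // (name, table) pair, keeping only names that actually resolve to a table.
  public List<Map.Entry<String, String>> getTablesByNames(List<String> tableNames) {
    return tableNames.stream()
        .<Map.Entry<String, String>>map(name -> new SimpleImmutableEntry<>(name, catalog.get(name)))
        .filter(e -> Objects.nonNull(e.getValue()))
        .collect(Collectors.toList());
  }
}
```

Note the explicit `<Map.Entry<String, String>>` type witness on `map`: without it, the stream's element type would be inferred as `SimpleImmutableEntry` and the `collect` result would not match the declared return type.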
[GitHub] [drill] KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled
KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled URL: https://github.com/apache/drill/pull/1712#discussion_r267943693 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java ## @@ -55,6 +56,13 @@ public DotDrillType getType(){ * @return Return owner of the file in underlying file system. */ public String getOwner() { +if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) { Review comment: Owner for views on S3 is always empty. The `getOwner()` method is currently called only for views, and for other files the method that throws the `IllegalArgumentException` won't be called with an empty `String`. So I suppose we should not throw an `Exception` from the method.
[GitHub] [drill] arina-ielchiieva commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed
arina-ielchiieva commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed URL: https://github.com/apache/drill/pull/1703#issuecomment-475371799 +1, please squash the commits.
[GitHub] [drill] sohami merged pull request #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…
sohami merged pull request #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1… URL: https://github.com/apache/drill/pull/1707
[GitHub] [drill] Agirish commented on issue #1710: DRILL-7127: Updating hbase version for mapr profile
Agirish commented on issue #1710: DRILL-7127: Updating hbase version for mapr profile URL: https://github.com/apache/drill/pull/1710#issuecomment-475365086 I'm seeing some test failures. Will spend some time to analyze them and get back.
[jira] [Created] (DRILL-7128) IllegalStateException: Read batch count [0] should be greater than zero
Khurram Faraaz created DRILL-7128: - Summary: IllegalStateException: Read batch count [0] should be greater than zero Key: DRILL-7128 URL: https://issues.apache.org/jira/browse/DRILL-7128 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 1.15.0 Reporter: Khurram Faraaz
Source table is a Hive table stored as parquet. The issue is seen only when querying the datacapturekey column, which is of VARCHAR type.
Hive 2.3
MapR Drill : 1.15.0.0-mapr
commit id : 951ef156fb1025677a2ca2dcf84e11002bf4b513
{noformat}
0: jdbc:drill:drillbit=test.a.node1> describe bt_br_cc_invalid_leads ;
+-------------------------------------+--------------------+--------------+
| COLUMN_NAME                         | DATA_TYPE          | IS_NULLABLE  |
+-------------------------------------+--------------------+--------------+
| wrapup                              | CHARACTER VARYING  | YES          |
| datacapturekey                      | CHARACTER VARYING  | YES          |
| leadgendate                         | CHARACTER VARYING  | YES          |
| crla1                               | CHARACTER VARYING  | YES          |
| crla2                               | CHARACTER VARYING  | YES          |
| invalid_lead                        | INTEGER            | YES          |
| destination_advertiser_vendor_name  | CHARACTER VARYING  | YES          |
| source_program_key                  | CHARACTER VARYING  | YES          |
| publisher_publisher                 | CHARACTER VARYING  | YES          |
| areaname                            | CHARACTER VARYING  | YES          |
| data_abertura_ficha                 | CHARACTER VARYING  | YES          |
+-------------------------------------+--------------------+--------------+
11 rows selected (1.85 seconds)
0: jdbc:drill:drillbit=test.a.node1>

// from the view definition, note that column datacapturekey is of type VARCHAR with precision 2000
{
  "name" : "bt_br_cc_invalid_leads",
  "sql" : "SELECT CAST(`wrapup` AS VARCHAR(2000)) AS `wrapup`, CAST(`datacapturekey` AS VARCHAR(2000)) AS `datacapturekey`, CAST(`leadgendate` AS VARCHAR(2000)) AS `leadgendate`, CAST(`crla1` AS VARCHAR(2000)) AS `crla1`, CAST(`crla2` AS VARCHAR(2000)) AS `crla2`, CAST(`invalid_lead` AS INTEGER) AS `invalid_lead`, CAST(`destination_advertiser_vendor_name` AS VARCHAR(2000)) AS `destination_advertiser_vendor_name`, CAST(`source_program_key` AS VARCHAR(2000)) AS `source_program_key`, CAST(`publisher_publisher` AS VARCHAR(2000)) AS `publisher_publisher`, CAST(`areaname` AS VARCHAR(2000)) AS `areaname`, CAST(`data_abertura_ficha` AS VARCHAR(2000)) AS `data_abertura_ficha`\nFROM `dfs`.`root`.`/user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads`",
  "fields" : [
    { "name" : "wrapup", "type" : "VARCHAR", "precision" : 2000, "isNullable" : true },
    { "name" : "datacapturekey", "type" : "VARCHAR", "precision" : 2000, "isNullable" : true
  ...
  ...

// total number of rows in bt_br_cc_invalid_leads
0: jdbc:drill:drillbit=test.a.node1> select count(*) from bt_br_cc_invalid_leads ;
+---------+
| EXPR$0  |
+---------+
| 20599   |
+---------+
1 row selected (0.173 seconds)
{noformat}

Stack trace from drillbit.log
{noformat}
2019-03-18 12:19:01,610 [237010da-6eda-a913-0424-32f63fbe01be:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 237010da-6eda-a913-0424-32f63fbe01be issued by bigtable: SELECT `bt_br_cc_invalid_leads`.`datacapturekey` AS `datacapturekey` FROM `dfs.drill_views`.`bt_br_cc_invalid_leads` `bt_br_cc_invalid_leads` GROUP BY `bt_br_cc_invalid_leads`.`datacapturekey`
2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 237010da-6eda-a913-0424-32f63fbe01be:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 237010da-6eda-a913-0424-32f63fbe01be:0:0: State to report: RUNNING
2019-03-18 12:19:02,502 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet record reader.
Message: Hadoop path: /user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads/08_0 Total records read: 0 Row group index: 0 Records in row group: 1551 Parquet Metadata: ParquetMetaData{FileMetaData{schema: message hive_schema { optional binary wrapup (UTF8); optional binary datacapturekey (UTF8); optional binary leadgendate (UTF8); optional binary crla1 (UTF8); optional binary crla2 (UTF8); optional binary invalid_lead (UTF8); optional binary destination_advertiser_vendor_name (UTF8); optional binary source_program_key (UTF8); optional binary publisher_publisher (UTF8); optional binary areaname (UTF8); optional binary data_abertura_ficha (UTF8); } , metadata: {}}, blocks: [BlockMetaData\{1551, 139906 [ColumnMetaData{UNCOMPRESSED [wrapup] optional binary wrapup (UTF8) [PLAIN_DICTIONARY, RLE, BIT_PACKED], 4}, ColumnMetaData\{UNCOMPRESSED [datacapturekey] optional binary datacapturekey (UTF8) [RLE, PLAIN, BIT_PACKED], 656}, ColumnMetaData\{UNCOMPRESSED [leadgendate] optional binary leadgendate (UTF8) [PLAIN_DICTIONARY, RLE, BIT_PACKED], 23978}, ColumnMetaDa
[GitHub] [drill] sohami commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed
sohami commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed URL: https://github.com/apache/drill/pull/1703#issuecomment-475334557 LGTM
[GitHub] [drill] kkhatua commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed
kkhatua commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed URL: https://github.com/apache/drill/pull/1703#issuecomment-475332636 @sohami / @arina-ielchiieva I've made the minor changes. Please review and let me know if anything else is missing.
[GitHub] [drill] amansinha100 closed pull request #1704: DRILL-7113: Fix creation of filter conditions for IS NULL and IS NOT …
amansinha100 closed pull request #1704: DRILL-7113: Fix creation of filter conditions for IS NULL and IS NOT … URL: https://github.com/apache/drill/pull/1704
[GitHub] [drill] amansinha100 closed pull request #1646: DRILL-6852: Adapt current Parquet Metadata cache implementation to use Drill Metastore API
amansinha100 closed pull request #1646: DRILL-6852: Adapt current Parquet Metadata cache implementation to use Drill Metastore API URL: https://github.com/apache/drill/pull/1646
[GitHub] [drill] kkhatua commented on a change in pull request #1692: DRILL-6562: Plugin Management improvements
kkhatua commented on a change in pull request #1692: DRILL-6562: Plugin Management improvements URL: https://github.com/apache/drill/pull/1692#discussion_r267873794 ## File path: exec/java-exec/src/main/resources/rest/storage/list.ftl ## @@ -17,79 +17,280 @@ limitations under the License. --> + <#include "*/generic.ftl"> <#macro page_head> + + + + + <#macro page_body> - Enabled Storage Plugins - + + Plugin Management + + + + + + Create + + + Export all + + + + + + + + + +Enabled Storage Plugins <#list model as plugin> <#if plugin.enabled() == true> - + ${plugin.getName()} -Update -Disable -Export + + Update + + + Disable + + + Export + - - - Disabled Storage Plugins - + + +Disabled Storage Plugins <#list model as plugin> <#if plugin.enabled() == false> - + ${plugin.getName()} -Update -Enable + + Update + + + Enable + + + Export + - + + + <#-- Modal window for exporting plugin config (including group plugins modal) --> + + + + + × + Plugin config + + + +Format + + + +JSON + + + + + +HOCON + + + + + +Plugin group + + + +ALL + + + + + +ENABLED + + + + + +DISABLED + + + + + + + Close + Export + + + - -New Storage Plugin - - - + <#-- Modal window for exporting plugin config (including group plugins modal) --> + + <#-- Modal window for creating plugin --> + + + + + × + New Storage Plugin + + + + + +Configuration + + + + + + + Close + Create + + + + + + - Create - + + <#-- Modal window for creating plugin --> +
[GitHub] [drill] sohami commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…
sohami commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1… URL: https://github.com/apache/drill/pull/1707#issuecomment-475307113 Thanks for the review. Squashed and rebased.
[GitHub] [drill] ihuzenko commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled
ihuzenko commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled URL: https://github.com/apache/drill/pull/1712#discussion_r267835326 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java ## @@ -55,6 +56,13 @@ public DotDrillType getType(){ * @return Return owner of the file in underlying file system. */ public String getOwner() { +if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) { Review comment: If owner is not always empty for .view.drill files on S3, maybe it makes sense to invert the check slightly, like:
```java
public String getOwner() {
  String owner = status.getOwner();
  if (owner.isEmpty() && type == DotDrillType.VIEW) {
    // Drill view S3AFileStatus is not populated with owner (it has default value of "").
    // This empty String causes IllegalArgumentException to be thrown (if impersonation is enabled) in
    // SchemaTreeProvider#createRootSchema(String, SchemaConfigInfoProvider). To work around the issue
    // we can return the current user as if they were the owner of the file (since they have access to it).
    owner = ImpersonationUtil.getProcessUserName();
  }
  return owner;
}
```
Also, what if the owner is empty but the file type is not ```DotDrillType.VIEW```? Should we throw an exception in that case, to detect potential future problems early?
[GitHub] [drill] arina-ielchiieva commented on issue #1692: DRILL-6562: Plugin Management improvements
arina-ielchiieva commented on issue #1692: DRILL-6562: Plugin Management improvements URL: https://github.com/apache/drill/pull/1692#issuecomment-475215017 Looks good, really nice improvement.
[GitHub] [drill] KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled
KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled URL: https://github.com/apache/drill/pull/1712#discussion_r267733833 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java ## @@ -55,6 +56,13 @@ public DotDrillType getType(){ * @return Return owner of the file in underlying file system. */ public String getOwner() { +if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) { Review comment: No, the other 'files' work fine.
[GitHub] [drill] arina-ielchiieva commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…
arina-ielchiieva commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1… URL: https://github.com/apache/drill/pull/1707#issuecomment-475202366 +1
[GitHub] [drill] vvysotskyi commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables.
vvysotskyi commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables. URL: https://github.com/apache/drill/pull/1708#discussion_r267705343 ## File path: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java ## @@ -107,15 +107,13 @@ private static CompareFunctionsProcessor processWithEvaluator(FunctionCall call, LogicalExpression nameArg = call.args.get(0); LogicalExpression valueArg = call.args.size() >= 2 ? call.args.get(1) : null; -if (valueArg != null) { - if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) { -LogicalExpression swapArg = valueArg; -valueArg = nameArg; -nameArg = swapArg; -evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName); - } - evaluator.success = nameArg.accept(evaluator, valueArg); +if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) { + LogicalExpression swapArg = valueArg; + valueArg = nameArg; + nameArg = swapArg; + evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName); } +evaluator.success = nameArg.accept(evaluator, valueArg); Review comment: This check was added to avoid an NPE, but for the case when `VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())` is false and `valueArg` is null, an NPE will not occur. But should we add the following check?
```suggestion
if (nameArg != null) {
  evaluator.success = nameArg.accept(evaluator, valueArg);
}
```
[GitHub] [drill] vvysotskyi commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled
vvysotskyi commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled URL: https://github.com/apache/drill/pull/1712#discussion_r267691682 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java ## @@ -55,6 +56,13 @@ public DotDrillType getType(){ * @return Return owner of the file in underlying file system. */ public String getOwner() { +if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) { Review comment: What about regular tables, parquet metadata cache files, or stat files? Is it possible that an empty `status.getOwner()` may cause problems for them?
[GitHub] [drill] KazydubB opened a new pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled
KazydubB opened a new pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled URL: https://github.com/apache/drill/pull/1712
[GitHub] [drill] paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan framework
paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan framework URL: https://github.com/apache/drill/pull/1711#issuecomment-475131734 @arina-ielchiieva, here is the full integrated schema feature for review. For some reason, the TestCsvWithSchema unit test runs in Eclipse, but not from the command line. I'll investigate that tomorrow. Because of that, I've not done a full unit test run. Still, there is plenty of code to review while I figure out the unit test issue.
[GitHub] [drill] paul-rogers opened a new pull request #1711: DRILL-7011: Support schema in scan framework
paul-rogers opened a new pull request #1711: DRILL-7011: Support schema in scan framework URL: https://github.com/apache/drill/pull/1711 Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework. Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead. Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors. Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.