[GitHub] [drill] HanumathRao commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables.

2019-03-21 Thread GitBox
HanumathRao commented on a change in pull request #1708: DRILL-7118: Filter not 
getting pushed down on MapR-DB tables.
URL: https://github.com/apache/drill/pull/1708#discussion_r268039787
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java
 ##
 @@ -107,15 +107,13 @@ private static CompareFunctionsProcessor 
processWithEvaluator(FunctionCall call,
 LogicalExpression nameArg = call.args.get(0);
 LogicalExpression valueArg = call.args.size() >= 2 ? call.args.get(1) : 
null;
 
-if (valueArg != null) {
-  if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
-LogicalExpression swapArg = valueArg;
-valueArg = nameArg;
-nameArg = swapArg;
-evaluator.functionName = 
COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
-  }
-  evaluator.success = nameArg.accept(evaluator, valueArg);
+if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
+  LogicalExpression swapArg = valueArg;
+  valueArg = nameArg;
+  nameArg = swapArg;
+  evaluator.functionName = 
COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
 }
+evaluator.success = nameArg.accept(evaluator, valueArg);
 
 Review comment:
   @vvysotskyi  I have made the changes.
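
The refactored branch in the diff swaps the two operands when the first argument is a literal and then looks up the mirrored comparison. A minimal Java sketch of that idea follows. The class name `TransposeSketch`, the operator names in the map, and the string operands are assumptions for illustration; they are not taken from `CompareFunctionsProcessor`, and the real `COMPARE_FUNCTIONS_TRANSPOSE_MAP` may hold different entries.

```java
import java.util.Map;

// Hypothetical illustration: rewrite "literal OP column" as "column OP' literal".
class TransposeSketch {
  // Assumed operator names; each comparison maps to its mirror image.
  static final Map<String, String> TRANSPOSE = Map.of(
      "equal", "equal",
      "not_equal", "not_equal",
      "less_than", "greater_than",
      "less_than_or_equal_to", "greater_than_or_equal_to",
      "greater_than", "less_than",
      "greater_than_or_equal_to", "less_than_or_equal_to");

  /** Returns {functionName, nameArg, valueArg} with the column reference always first. */
  static String[] normalize(String functionName, String first, String second, boolean firstIsLiteral) {
    if (firstIsLiteral) {
      // Swap the operands and flip the comparison so the meaning is preserved.
      return new String[] {TRANSPOSE.get(functionName), second, first};
    }
    return new String[] {functionName, first, second};
  }
}
```

With this sketch, `normalize("less_than", "5", "a", true)` describes `5 < a` and comes back as `greater_than` applied to `a` and `5`.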


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread salim achouche (JIRA)
salim achouche created DRILL-7130:
-

 Summary: IllegalStateException: Read batch count [0] should be 
greater than zero
 Key: DRILL-7130
 URL: https://issues.apache.org/jira/browse/DRILL-7130
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.17.0


The following exception is being hit when reading parquet data:

Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
    at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    ... 29 common frames omitted
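
The message in the trace comes from a state check on the read batch count. The sketch below reproduces that style of guard in plain Java so the failure mode is easy to see. `checkState` here is a local stand-in written for this example rather than Guava's `Preconditions.checkState`, and `requirePositive` and `readBatchCount` are hypothetical names, not the reader's actual code.

```java
// Hypothetical illustration of a guard that rejects a non-positive batch count.
class BatchCountGuard {
  // Local stand-in for a Guava-style state check with a single %s placeholder.
  static void checkState(boolean ok, String template, Object arg) {
    if (!ok) {
      throw new IllegalStateException(template.replace("%s", String.valueOf(arg)));
    }
  }

  // Fails with the same wording as the reported exception when the count is zero or negative.
  static void requirePositive(int readBatchCount) {
    checkState(readBatchCount > 0, "Read batch count [%s] should be greater than zero", readBatchCount);
  }
}
```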

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

2019-03-21 Thread Anisha Reddy (JIRA)
Anisha Reddy created DRILL-7129:
---

 Summary: Join with more than 1 condition is not using stats to 
compute row count estimate
 Key: DRILL-7129
 URL: https://issues.apache.org/jira/browse/DRILL-7129
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Anisha Reddy
 Fix For: 1.17.0


Below are the details: 

 
{code:java}
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem`;
+---------+
| EXPR$0  |
+---------+
| 57068   |
+---------+
1 row selected (0.179 seconds)

0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/partsupp`;
+---------+
| EXPR$0  |
+---------+
| 7474    |
+---------+
1 row selected (0.171 seconds)

0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+---------+
| EXPR$0  |
+---------+
| 53401   |
+---------+
1 row selected (0.769 seconds)

0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes for select * from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+------+------+
| text | json |
+------+------+
| 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107578
00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107577
00-02 Project(T10¦¦**=[$0], T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576
00-03 HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107575
00-05 Project(T10¦¦**=[$0], l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107572
00-07 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571
00-04 Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107574
00-06 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
{code}

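The NDV (number of distinct values) for each join column is:
l_partkey = 2000
ps_partkey = 1817
l_suppkey = 100
ps_suppkey = 100

We see that the row count estimate for such joins is just the max of the left-side and right-side table row counts (57068 in the plan above), so the NDV statistics are not being used.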
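
For comparison, here is a sketch of the textbook NDV-based estimate for an equi-join on several keys: multiply the input row counts and divide by the larger NDV of each key pair. It assumes the join keys are independent. It is not Drill's or Calcite's actual estimator, and `JoinEstimateSketch` and `estimate` are hypothetical names.

```java
// Hypothetical illustration of a textbook multi-key equi-join cardinality estimate.
class JoinEstimateSketch {
  /** Estimate = |L| * |R| / product over join keys of max(ndvLeft[i], ndvRight[i]). */
  static double estimate(double leftRows, double rightRows, double[] leftNdv, double[] rightNdv) {
    double rows = leftRows * rightRows;
    for (int i = 0; i < leftNdv.length; i++) {
      // Each key pair divides the cross product by its larger distinct-value count.
      rows /= Math.max(leftNdv[i], rightNdv[i]);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Row counts and NDVs taken from the report above.
    double est = estimate(57068, 7474, new double[] {2000, 100}, new double[] {1817, 100});
    System.out.println(est);
  }
}
```

For the numbers in this report the formula gives roughly 2,133 rows, well below the actual 53,401. The likely reason is that the two keys are correlated, so the independence assumption does not hold. The sketch only shows how an estimate can depend on NDV rather than defaulting to the larger input.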






[GitHub] [drill] Agirish merged pull request #1709: DRILL-7126: Contrib format-ltsv is not being included in distribution

2019-03-21 Thread GitBox
Agirish merged pull request #1709: DRILL-7126: Contrib format-ltsv is not being 
included in distribution
URL: https://github.com/apache/drill/pull/1709
 
 
   




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903717
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
*/
+  @Deprecated
+  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY,
+  new OptionDescription("Deprecated after DRILL-7115 improvement."));
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   */
+  @Deprecated
   public static final String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = 
"exec.bulk_load_table_list.bulk_size";
-  public static final PositiveLongValidator BULK_LOAD_TABLE_LIST_BULK_SIZE = 
new PositiveLongValidator(BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY, 
Integer.MAX_VALUE, null);
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903537
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
*/
+  @Deprecated
+  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY,
+  new OptionDescription("Deprecated after DRILL-7115 improvement."));
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * @deprecated option. It will not have any effect.
  * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY
   ```
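
The suggestion pairs the `@Deprecated` annotation with a `@deprecated` Javadoc tag and an `@see` reference. A small sketch of how the two fit together is below. `OptionKeysSketch` is a hypothetical class; the constant names and values mirror the diff, but the Javadoc wording is illustrative only.

```java
// Hypothetical illustration of documenting deprecated option keys.
class OptionKeysSketch {
  /**
   * Key of the option that enabled bulk loading of the Hive table list.
   *
   * @deprecated the option no longer has any effect and may be removed in a future release.
   */
  @Deprecated
  public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = "exec.enable_bulk_load_table_list";

  /**
   * Key of the option that regulated the bulk size.
   *
   * @deprecated the option no longer has any effect.
   * @see OptionKeysSketch#ENABLE_BULK_LOAD_TABLE_LIST_KEY
   */
  @Deprecated
  public static final String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = "exec.bulk_load_table_list.bulk_size";
}
```

The annotation is what the compiler uses to warn callers, while the `@deprecated` tag carries the human-readable reason, so keeping both avoids a warning without an explanation.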




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903099
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * @see ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903537
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
*/
+  @Deprecated
+  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY,
+  new OptionDescription("Deprecated after DRILL-7115 improvement."));
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * @deprecated option. It will not have any effect.
  * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for 
details
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267937747
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java
 ##
 @@ -275,29 +274,14 @@ public void dropTable(String tableName) {
 .build(logger);
   }
 
-  /**
-   * Get the collection of {@link Table} tables specified in the tableNames 
with bulk-load (if the underlying storage
-   * plugin supports).
-   * It is not guaranteed that the retrieved tables would have RowType and 
Statistic being fully populated.
-   *
-   * Specifically, calling {@link 
Table#getRowType(org.apache.calcite.rel.type.RelDataTypeFactory)} or {@link 
Table#getStatistic()} might incur
-   * {@link UnsupportedOperationException} being thrown.
-   *
-   * @param  tableNames the requested tables, specified by the table names
-   * @return the collection of requested tables
-   */
-  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, int bulkSize) {
-return getTablesByNames(tableNames);
-  }
-
   /**
* Get the collection of {@link Table} tables specified in the tableNames.
*
-   * @param  tableNames the requested tables, specified by the table names
+   * @param  tableNames the requested tables, specified by the table 
namesbulkSize
 
 Review comment:
   typo?




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903099
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for 
details
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267937301
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java
 ##
 @@ -275,29 +274,14 @@ public void dropTable(String tableName) {
 .build(logger);
   }
 
-  /**
-   * Get the collection of {@link Table} tables specified in the tableNames 
with bulk-load (if the underlying storage
-   * plugin supports).
-   * It is not guaranteed that the retrieved tables would have RowType and 
Statistic being fully populated.
-   *
-   * Specifically, calling {@link 
Table#getRowType(org.apache.calcite.rel.type.RelDataTypeFactory)} or {@link 
Table#getStatistic()} might incur
-   * {@link UnsupportedOperationException} being thrown.
-   *
-   * @param  tableNames the requested tables, specified by the table names
-   * @return the collection of requested tables
-   */
-  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, int bulkSize) {
-return getTablesByNames(tableNames);
-  }
-
   /**
* Get the collection of {@link Table} tables specified in the tableNames.
*
-   * @param  tableNames the requested tables, specified by the table names
+   * @param  tableNames the requested tables, specified by the table 
namesbulkSize
* @return the collection of requested tables
*/
   public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
-final List<Pair<String, ? extends Table>> tables = Lists.newArrayList();
+final List<Pair<String, ? extends Table>> tables = new ArrayList<>(tableNames.size());
 for (String tableName : tableNames) {
 
 Review comment:
   ```
   return tableNames.stream()
   // Schema may return NULL for table if the query user doesn't have 
permissions to load the table. Ignore such
   // tables as INFO SCHEMA is about showing tables which the use has 
access to query.
   .map(tableName -> Pair.of(tableName, getTable(tableName)))
   .filter(pair -> Objects.nonNull(pair.getRight()))
   .collect(Collectors.toList());
   ```
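
A self-contained sketch of the same pipeline is below, for readers who want to run it outside Drill. It uses `AbstractMap.SimpleImmutableEntry` in place of `Pair` because `Map.entry` rejects null values. `TableLookupSketch`, `CATALOG`, and the string "tables" are hypothetical stand-ins rather than Drill types.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical illustration of mapping names to tables and dropping inaccessible ones.
class TableLookupSketch {
  // Assumed catalog: a missing entry models a table the user is not allowed to load.
  static final Map<String, String> CATALOG = Map.of("orders", "TABLE", "v_orders", "VIEW");

  static String getTable(String name) {
    return CATALOG.get(name); // null when the table cannot be loaded
  }

  static List<Map.Entry<String, String>> getTablesByNames(List<String> tableNames) {
    return tableNames.stream()
        // Pair each name with its table; the value may be null at this point.
        .<Map.Entry<String, String>>map(name -> new SimpleImmutableEntry<>(name, getTable(name)))
        // Skip tables that could not be loaded.
        .filter(entry -> Objects.nonNull(entry.getValue()))
        .collect(Collectors.toList());
  }
}
```

Calling `getTablesByNames(List.of("orders", "secret", "v_orders"))` keeps the two known names in their original order and skips `secret`.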




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267902483
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
 
 Review comment:
   ```suggestion
  * @deprecated option. It will not have any effect.
  * The option added as part of DRILL-4577, was used to mark that hive 
tables should be loaded
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267903717
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -744,16 +744,37 @@ private ExecConstants() {
   public static final BooleanValidator 
CTAS_PARTITIONING_HASH_DISTRIBUTE_VALIDATOR = new 
BooleanValidator(CTAS_PARTITIONING_HASH_DISTRIBUTE,
   new OptionDescription("Uses a hash algorithm to distribute data on 
partition keys in a CTAS partitioning operation. An alpha option--for 
experimental use at this stage. Do not use in production systems."));
 
+
+  /**
+   * The option added as part of DRILL-4577, was used to mark that hive tables 
should be loaded
+   * for all table names at once. Then as part of DRILL-4826 was added option 
to regulate bulk size,
+   * because big amount of views was causing performance degradation. After 
last improvements for
+   * DRILL-7115 both options ({@link 
ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   * and {@link ExecConstants#BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY}) became 
obsolete and may be removed
+   * in future releases.
+   */
+  @Deprecated
   public static final String ENABLE_BULK_LOAD_TABLE_LIST_KEY = 
"exec.enable_bulk_load_table_list";
-  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, null);
 
   /**
-   * When getting Hive Table information with exec.enable_bulk_load_table_list 
set to true,
-   * use the exec.bulk_load_table_list.bulk_size to determine how many tables 
to fetch from HiveMetaStore
-   * at a time. (The number of tables can get to be quite large.)
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
*/
+  @Deprecated
+  public static final BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new 
BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY,
+  new OptionDescription("Deprecated after DRILL-7115 improvement."));
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
+   */
+  @Deprecated
   public static final String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = 
"exec.bulk_load_table_list.bulk_size";
-  public static final PositiveLongValidator BULK_LOAD_TABLE_LIST_BULK_SIZE = 
new PositiveLongValidator(BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY, 
Integer.MAX_VALUE, null);
+
+  /**
+   * See javadoc for {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY}
 
 Review comment:
   ```suggestion
  * See {@link ExecConstants#ENABLE_BULK_LOAD_TABLE_LIST_KEY} javadoc for 
details
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267904395
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillParserUtil.java
 ##
 @@ -17,31 +17,31 @@
  */
 package org.apache.drill.exec.planner.sql.parser;
 
+import java.util.ArrayList;
 import java.util.List;
 
 import org.apache.calcite.sql.SqlNode;
 import org.apache.calcite.sql.SqlOperator;
 import org.apache.calcite.sql.parser.SqlParserPos;
 import org.apache.calcite.sql.parser.SqlParserUtil;
-import org.apache.calcite.util.Util;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 
 /**
  * Helper methods or constants used in parsing a SQL query.
  */
 public class DrillParserUtil {
 
-  public static final String CHARSET = Util.getDefaultCharset().name();
+  private static final int CONDITION_LIST_CAPACITY = 3;
 
   public static SqlNode createCondition(SqlNode left, SqlOperator op, SqlNode 
right) {
 
 // if one of the operands is null, return the other
-if (left == null || right == null) {
-  return left != null ? left : right;
+if (left == null) {
 
 Review comment:
   gj




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267898826
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,14 +60,35 @@ public Table getTable(String tableName) {
 return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public List> getTableNamesAndTypes() {
+Set views = getViewNames();
+// optimization for empty views
+Function> toNameAndTypePair = 
views.isEmpty()
+? (name) -> Pair.of(name, TableType.TABLE)
+: (name) -> Pair.of(name, views.contains(name) ? TableType.VIEW : 
TableType.TABLE);
 
 Review comment:
   Consider adding parentheses to make the expression easier to read: 
   ```suggestion
   : (name) -> Pair.of(name, (views.contains(name) ? TableType.VIEW : 
TableType.TABLE));
   ```
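
The parentheses in the suggestion above are a readability aid only: a conditional expression passed as a method argument is already parsed as one argument. A minimal sketch of that equivalence follows. It assumes a hypothetical `TableKind` enum and uses `Map.entry` as stand-ins for the `TableType` and `Pair.of` seen in the diff; neither stand-in is taken from the pull request.

```java
import java.util.Map;
import java.util.Set;

final class TernaryArgumentDemo {
  // Hypothetical stand-in for the TableType enum referenced in the diff.
  enum TableKind { TABLE, VIEW }

  // Conditional expression passed directly as the second argument.
  static Map.Entry<String, TableKind> plain(String name, Set<String> views) {
    return Map.entry(name, views.contains(name) ? TableKind.VIEW : TableKind.TABLE);
  }

  // Same expression with the extra parentheses proposed in the review.
  static Map.Entry<String, TableKind> parenthesized(String name, Set<String> views) {
    return Map.entry(name, (views.contains(name) ? TableKind.VIEW : TableKind.TABLE));
  }

  public static void main(String[] args) {
    Set<String> views = Set.of("v1");
    // Both forms produce equal entries for a view name and for a table name.
    System.out.println(plain("v1", views).equals(parenthesized("v1", views)));
    System.out.println(plain("t1", views).equals(parenthesized("t1", views)));
  }
}
```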
   




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267900169
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,14 +60,35 @@ public Table getTable(String tableName) {
 return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public List> getTableNamesAndTypes() {
+Set views = getViewNames();
+// optimization for empty views
+Function> toNameAndTypePair = 
views.isEmpty()
+? (name) -> Pair.of(name, TableType.TABLE)
+: (name) -> Pair.of(name, views.contains(name) ? TableType.VIEW : 
TableType.TABLE);
+return getTableNames().stream()
+.map(toNameAndTypePair)
+.collect(toList());
+  }
+
+  private Set getViewNames() {
+try {
+  return new HashSet<>(mClient.getTables(this.name, "*", 
org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW));
+} catch (MetaException e) {
+  logger.warn("Failed to get view names, views and tables won't be 
separated by type.", e.getCause());
 
 Review comment:
   ```suggestion
 logger.warn("Failed to get view names. Views and tables won't be 
separated by type.", e.getCause());
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-03-21 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r267905606
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java
 ##
 @@ -275,29 +274,14 @@ public void dropTable(String tableName) {
 .build(logger);
   }
 
-  /**
-   * Get the collection of {@link Table} tables specified in the tableNames 
with bulk-load (if the underlying storage
-   * plugin supports).
-   * It is not guaranteed that the retrieved tables would have RowType and 
Statistic being fully populated.
-   *
-   * Specifically, calling {@link 
Table#getRowType(org.apache.calcite.rel.type.RelDataTypeFactory)} or {@link 
Table#getStatistic()} might incur
-   * {@link UnsupportedOperationException} being thrown.
-   *
-   * @param  tableNames the requested tables, specified by the table names
-   * @return the collection of requested tables
-   */
-  public List> getTablesByNamesByBulkLoad(final 
List tableNames, int bulkSize) {
-return getTablesByNames(tableNames);
-  }
-
   /**
* Get the collection of {@link Table} tables specified in the tableNames.
*
-   * @param  tableNames the requested tables, specified by the table names
+   * @param  tableNames the requested tables, specified by the table 
namesbulkSize
* @return the collection of requested tables
*/
   public List> getTablesByNames(final 
List tableNames) {
-final List> tables = Lists.newArrayList();
+final List> tables = new 
ArrayList<>(tableNames.size());
 for (String tableName : tableNames) {
 
 Review comment:
   Can this loop be replaced with a stream?
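
One possible shape for a stream-based version is sketched below. The loop body is not visible in the quoted diff, so the sketch assumes it looks up each name and skips names that resolve to no table. `TablesByNamesSketch`, the `lookup` parameter, and the use of `Map.Entry` in place of the pair type are all hypothetical and chosen only to keep the example self-contained.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TablesByNamesSketch {
  // Hypothetical helper: `lookup` stands in for the schema's getTable(String).
  static <T> List<Map.Entry<String, T>> tablesByNames(List<String> tableNames,
                                                      Function<String, T> lookup) {
    return tableNames.stream()
        .<Map.Entry<String, T>>map(name -> new SimpleImmutableEntry<>(name, lookup.apply(name)))
        // Assumed behavior: names that resolve to no table are skipped.
        .filter(entry -> entry.getValue() != null)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Integer> catalog = Map.of("orders", 1, "users", 2);
    // "missing" has no entry in the catalog, so it is dropped from the result.
    System.out.println(tablesByNames(List.of("orders", "missing", "users"), catalog::get));
  }
}
```

Unlike the pre-sized `ArrayList` in the diff, `Collectors.toList()` gives no control over the initial capacity, which may or may not matter for large table lists.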




[GitHub] [drill] KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-21 Thread GitBox
KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't 
query views from the S3 storage when plain authentication is enabled
URL: https://github.com/apache/drill/pull/1712#discussion_r267943693
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java
 ##
 @@ -55,6 +56,13 @@ public DotDrillType getType(){
* @return Return owner of the file in underlying file system.
*/
   public String getOwner() {
+if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) {
 
 Review comment:
   The owner for views on S3 is always empty.
   The `getOwner()` method is currently called only for views, so for other 
files the method that throws the `IllegalArgumentException` is never called 
with an empty `String`. I therefore suppose we should not throw an `Exception` 
from this method.




[GitHub] [drill] arina-ielchiieva commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed

2019-03-21 Thread GitBox
arina-ielchiieva commented on issue #1703: DRILL-7110: Skip writing profile 
when an ALTER SESSION is executed
URL: https://github.com/apache/drill/pull/1703#issuecomment-475371799
 
 
   +1, please squash the commits.




[GitHub] [drill] sohami merged pull request #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…

2019-03-21 Thread GitBox
sohami merged pull request #1707: DRILL-7125: REFRESH TABLE METADATA fails 
after upgrade from Drill 1.1…
URL: https://github.com/apache/drill/pull/1707
 
 
   




[GitHub] [drill] Agirish commented on issue #1710: DRILL-7127: Updating hbase version for mapr profile

2019-03-21 Thread GitBox
Agirish commented on issue #1710: DRILL-7127: Updating hbase version for mapr 
profile
URL: https://github.com/apache/drill/pull/1710#issuecomment-475365086
 
 
   I'm seeing some test failures. I'll spend some time analyzing them and get 
back. 




[jira] [Created] (DRILL-7128) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-7128:
-

 Summary: IllegalStateException: Read batch count [0] should be 
greater than zero
 Key: DRILL-7128
 URL: https://issues.apache.org/jira/browse/DRILL-7128
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: Khurram Faraaz


The source table is a Hive table stored as Parquet.
The issue is seen only when querying the datacapturekey column, which is of 
type VARCHAR.

Hive 2.3
MapR Drill : 1.15.0.0-mapr 
commit id : 951ef156fb1025677a2ca2dcf84e11002bf4b513

{noformat}
0: jdbc:drill:drillbit=test.a.node1> describe bt_br_cc_invalid_leads ;
+------------------------------------+-------------------+-------------+
| COLUMN_NAME                        | DATA_TYPE         | IS_NULLABLE |
+------------------------------------+-------------------+-------------+
| wrapup                             | CHARACTER VARYING | YES         |
| datacapturekey                     | CHARACTER VARYING | YES         |
| leadgendate                        | CHARACTER VARYING | YES         |
| crla1                              | CHARACTER VARYING | YES         |
| crla2                              | CHARACTER VARYING | YES         |
| invalid_lead                       | INTEGER           | YES         |
| destination_advertiser_vendor_name | CHARACTER VARYING | YES         |
| source_program_key                 | CHARACTER VARYING | YES         |
| publisher_publisher                | CHARACTER VARYING | YES         |
| areaname                           | CHARACTER VARYING | YES         |
| data_abertura_ficha                | CHARACTER VARYING | YES         |
+------------------------------------+-------------------+-------------+
11 rows selected (1.85 seconds)
0: jdbc:drill:drillbit=test.a.node1>

// from the view definition, note that column datacapturekey is of type 
VARCHAR with precision 2000
{
"name" : "bt_br_cc_invalid_leads",
"sql" : "SELECT CAST(`wrapup` AS VARCHAR(2000)) AS `wrapup`, 
CAST(`datacapturekey` AS VARCHAR(2000)) AS `datacapturekey`, CAST(`leadgendate` 
AS VARCHAR(2000)) AS `leadgendate`, CAST(`crla1` AS VARCHAR(2000)) AS `crla1`, 
CAST(`crla2` AS VARCHAR(2000)) AS `crla2`, CAST(`invalid_lead` AS INTEGER) AS 
`invalid_lead`, CAST(`destination_advertiser_vendor_name` AS VARCHAR(2000)) AS 
`destination_advertiser_vendor_name`, CAST(`source_program_key` AS 
VARCHAR(2000)) AS `source_program_key`, CAST(`publisher_publisher` AS 
VARCHAR(2000)) AS `publisher_publisher`, CAST(`areaname` AS VARCHAR(2000)) AS 
`areaname`, CAST(`data_abertura_ficha` AS VARCHAR(2000)) AS 
`data_abertura_ficha`\nFROM 
`dfs`.`root`.`/user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads`",
"fields" : [ {
"name" : "wrapup",
"type" : "VARCHAR",
"precision" : 2000,
"isNullable" : true
}, {
"name" : "datacapturekey",
"type" : "VARCHAR",
"precision" : 2000,
"isNullable" : true
...
...

// total number of rows in bt_br_cc_invalid_leads
0: jdbc:drill:drillbit=test.a.node1> select count(*) from 
bt_br_cc_invalid_leads ;
+---------+
| EXPR$0  |
+---------+
| 20599   |
+---------+
1 row selected (0.173 seconds)
{noformat}

Stack trace from drillbit.log
{noformat}
2019-03-18 12:19:01,610 [237010da-6eda-a913-0424-32f63fbe01be:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
237010da-6eda-a913-0424-32f63fbe01be issued by bigtable: SELECT 
`bt_br_cc_invalid_leads`.`datacapturekey` AS `datacapturekey`
FROM `dfs.drill_views`.`bt_br_cc_invalid_leads` `bt_br_cc_invalid_leads`
GROUP BY `bt_br_cc_invalid_leads`.`datacapturekey`

2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.e.w.fragment.FragmentExecutor - 237010da-6eda-a913-0424-32f63fbe01be:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.e.w.f.FragmentStatusReporter - 237010da-6eda-a913-0424-32f63fbe01be:0:0: 
State to report: RUNNING
2019-03-18 12:19:02,502 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet 
record reader.
Message:
Hadoop path: /user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads/08_0
Total records read: 0
Row group index: 0
Records in row group: 1551
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message hive_schema {
 optional binary wrapup (UTF8);
 optional binary datacapturekey (UTF8);
 optional binary leadgendate (UTF8);
 optional binary crla1 (UTF8);
 optional binary crla2 (UTF8);
 optional binary invalid_lead (UTF8);
 optional binary destination_advertiser_vendor_name (UTF8);
 optional binary source_program_key (UTF8);
 optional binary publisher_publisher (UTF8);
 optional binary areaname (UTF8);
 optional binary data_abertura_ficha (UTF8);
}
, metadata: {}}, blocks: [BlockMetaData\{1551, 139906 
[ColumnMetaData{UNCOMPRESSED [wrapup] optional binary wrapup (UTF8) 
[PLAIN_DICTIONARY, RLE, BIT_PACKED], 4}, ColumnMetaData\{UNCOMPRESSED 
[datacapturekey] optional binary datacapturekey (UTF8) [RLE, PLAIN, 
BIT_PACKED], 656}, ColumnMetaData\{UNCOMPRESSED [leadgendate] optional binary 
leadgendate (UTF8) [PLAIN_DICTIONARY, RLE, BIT_PACKED], 23978}, 
ColumnMetaDa

[GitHub] [drill] sohami commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed

2019-03-21 Thread GitBox
sohami commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER 
SESSION is executed
URL: https://github.com/apache/drill/pull/1703#issuecomment-475334557
 
 
   LGTM




[GitHub] [drill] kkhatua commented on issue #1703: DRILL-7110: Skip writing profile when an ALTER SESSION is executed

2019-03-21 Thread GitBox
kkhatua commented on issue #1703: DRILL-7110: Skip writing profile when an 
ALTER SESSION is executed
URL: https://github.com/apache/drill/pull/1703#issuecomment-475332636
 
 
   @sohami / @arina-ielchiieva I've made the minor changes. Please review and 
let me know if anything else is missing. 




[GitHub] [drill] amansinha100 closed pull request #1704: DRILL-7113: Fix creation of filter conditions for IS NULL and IS NOT …

2019-03-21 Thread GitBox
amansinha100 closed pull request #1704: DRILL-7113: Fix creation of filter 
conditions for IS NULL and IS NOT …
URL: https://github.com/apache/drill/pull/1704
 
 
   




[GitHub] [drill] amansinha100 closed pull request #1646: DRILL-6852: Adapt current Parquet Metadata cache implementation to use Drill Metastore API

2019-03-21 Thread GitBox
amansinha100 closed pull request #1646: DRILL-6852: Adapt current Parquet 
Metadata cache implementation to use Drill Metastore API
URL: https://github.com/apache/drill/pull/1646
 
 
   




[GitHub] [drill] kkhatua commented on a change in pull request #1692: DRILL-6562: Plugin Management improvements

2019-03-21 Thread GitBox
kkhatua commented on a change in pull request #1692: DRILL-6562: Plugin 
Management improvements
URL: https://github.com/apache/drill/pull/1692#discussion_r267873794
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/storage/list.ftl
 ##
 @@ -17,79 +17,280 @@
 limitations under the License.
 
 -->
+
 <#include "*/generic.ftl">
 <#macro page_head>
+  
+
+  
+  
+  
 
 
 <#macro page_body>
   
   
-  Enabled Storage Plugins
-  
+
+  Plugin Management
+  
+
+
+  
+
+  Create
+
+
+  Export all
+
+  
+
+
+  
+
+  
+
+  
+Enabled Storage Plugins
 
   
 <#list model as plugin>
   <#if plugin.enabled() == true>
 
-  
+  
 ${plugin.getName()}
   
   
-Update
-Disable
-Export
+
+  Update
+
+
+  Disable
+
+
+  Export
+
   
 
   
 
   
 
   
-  
-  
-  Disabled Storage Plugins
-  
+
+  
+Disabled Storage Plugins
 
   
 <#list model as plugin>
   <#if plugin.enabled() == false>
 
-  
+  
 ${plugin.getName()}
   
   
-Update
-Enable
+
+  Update
+
+
+  Enable
+
+
+  Export
+
   
 
   
 
   
 
   
-  
+
+
+  <#-- Modal window for exporting plugin config (including group plugins 
modal) -->
+  
+
+  
+
+  ×
+  Plugin config
+
+
+  
+Format
+
+  
+
+JSON
+  
+
+
+  
+
+HOCON
+  
+
+  
+
+  
+Plugin group
+
+  
+
+ALL
+  
+
+
+  
+
+ENABLED
+  
+
+
+  
+
+DISABLED
+  
+
+  
+
+
+
+  Close
+  Export
+
+  
+
   
-  
-New Storage Plugin
-
-  
-
+  <#-- Modal window for exporting plugin config (including group plugins 
modal) -->
+
+  <#-- Modal window for creating plugin -->
+  
+
+  
+
+  ×
+  New Storage Plugin
+
+
+
+  
+
+Configuration
+
+  
+
+
+
+
+  Close
+  Create
+
+  
+
+  
+  
+
   
-  Create
-
+
   
+  <#-- Modal window for creating plugin -->
+
   

[GitHub] [drill] sohami commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…

2019-03-21 Thread GitBox
sohami commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after 
upgrade from Drill 1.1…
URL: https://github.com/apache/drill/pull/1707#issuecomment-475307113
 
 
   Thanks for the review. Squashed and rebased.




[GitHub] [drill] ihuzenko commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-21 Thread GitBox
ihuzenko commented on a change in pull request #1712: DRILL-7079: Drill can't 
query views from the S3 storage when plain authentication is enabled
URL: https://github.com/apache/drill/pull/1712#discussion_r267835326
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java
 ##
 @@ -55,6 +56,13 @@ public DotDrillType getType(){
* @return Return owner of the file in underlying file system.
*/
   public String getOwner() {
+if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) {
 
 Review comment:
   If the owner is not always empty for .view.drill files on S3, maybe it makes 
sense to invert the check slightly, like: 
   
   ```java
 public String getOwner() {
   String owner = status.getOwner();
   if (owner.isEmpty() && type == DotDrillType.VIEW) {
     // Drill view S3AFileStatus is not populated with owner (it has default value of "").
     // This empty String causes IllegalArgumentException to be thrown (if impersonation is enabled) in
     // SchemaTreeProvider#createRootSchema(String, SchemaConfigInfoProvider). To work around the issue
     // we can return the current user as if they were the owner of the file (since they have access to it).
     owner = ImpersonationUtil.getProcessUserName();
   }
   return owner;
 }
   ```
   Also, what if the owner is empty but the file type is not `DotDrillType.VIEW`? 
Should we throw an exception in that case to detect potential future problems 
early?
   
   




[GitHub] [drill] arina-ielchiieva commented on issue #1692: DRILL-6562: Plugin Management improvements

2019-03-21 Thread GitBox
arina-ielchiieva commented on issue #1692: DRILL-6562: Plugin Management 
improvements
URL: https://github.com/apache/drill/pull/1692#issuecomment-475215017
 
 
   Looks good, really nice improvement.




[GitHub] [drill] KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-21 Thread GitBox
KazydubB commented on a change in pull request #1712: DRILL-7079: Drill can't 
query views from the S3 storage when plain authentication is enabled
URL: https://github.com/apache/drill/pull/1712#discussion_r267733833
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java
 ##
 @@ -55,6 +56,13 @@ public DotDrillType getType(){
* @return Return owner of the file in underlying file system.
*/
   public String getOwner() {
+if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) {
 
 Review comment:
   No, the other 'files' work fine.




[GitHub] [drill] arina-ielchiieva commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA fails after upgrade from Drill 1.1…

2019-03-21 Thread GitBox
arina-ielchiieva commented on issue #1707: DRILL-7125: REFRESH TABLE METADATA 
fails after upgrade from Drill 1.1…
URL: https://github.com/apache/drill/pull/1707#issuecomment-475202366
 
 
   +1




[GitHub] [drill] vvysotskyi commented on a change in pull request #1708: DRILL-7118: Filter not getting pushed down on MapR-DB tables.

2019-03-21 Thread GitBox
vvysotskyi commented on a change in pull request #1708: DRILL-7118: Filter not 
getting pushed down on MapR-DB tables.
URL: https://github.com/apache/drill/pull/1708#discussion_r267705343
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java
 ##
 @@ -107,15 +107,13 @@ private static CompareFunctionsProcessor 
processWithEvaluator(FunctionCall call,
 LogicalExpression nameArg = call.args.get(0);
 LogicalExpression valueArg = call.args.size() >= 2 ? call.args.get(1) : 
null;
 
-if (valueArg != null) {
-  if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
-LogicalExpression swapArg = valueArg;
-valueArg = nameArg;
-nameArg = swapArg;
-evaluator.functionName = 
COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
-  }
-  evaluator.success = nameArg.accept(evaluator, valueArg);
+if (VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())) {
+  LogicalExpression swapArg = valueArg;
+  valueArg = nameArg;
+  nameArg = swapArg;
+  evaluator.functionName = 
COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
 }
+evaluator.success = nameArg.accept(evaluator, valueArg);
 
 Review comment:
   This check was added to avoid an NPE. When 
`VALUE_EXPRESSION_CLASSES.contains(nameArg.getClass())` is false and 
`valueArg` is null, no NPE will occur. But should we add the following check?
   ```suggestion
   if (nameArg != null) {
 evaluator.success = nameArg.accept(evaluator, valueArg);
   }
   ```
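
The case the extra check would guard against can be sketched in isolation. The types below (`Expr`, `FieldRef`, `Literal`) and the `VALUE_CLASSES` set are hypothetical stand-ins for `LogicalExpression` subclasses and `VALUE_EXPRESSION_CLASSES`; whether a one-argument call with a value expression in first position can actually reach this code in Drill is not established by this thread.

```java
import java.util.Set;

final class SwapGuardSketch {
  // Hypothetical stand-ins for LogicalExpression and two of its subclasses.
  interface Expr { }
  static final class FieldRef implements Expr { }
  static final class Literal implements Expr { }

  // Stand-in for VALUE_EXPRESSION_CLASSES in the diff.
  static final Set<Class<?>> VALUE_CLASSES = Set.of(Literal.class);

  // Returns the expression that accept(...) would be invoked on after the
  // optional swap, or null when the swap leaves no such expression.
  static Expr targetAfterSwap(Expr nameArg, Expr valueArg) {
    if (VALUE_CLASSES.contains(nameArg.getClass())) {
      Expr swapArg = valueArg;
      valueArg = nameArg;
      nameArg = swapArg;
    }
    return nameArg;
  }

  public static void main(String[] args) {
    // Field reference first, no second argument: the target stays non-null.
    System.out.println(targetAfterSwap(new FieldRef(), null) != null);
    // Literal first, no second argument: the swap moves null into nameArg,
    // so an unguarded nameArg.accept(...) would throw a NullPointerException.
    System.out.println(targetAfterSwap(new Literal(), null) == null);
  }
}
```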




[GitHub] [drill] vvysotskyi commented on a change in pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-21 Thread GitBox
vvysotskyi commented on a change in pull request #1712: DRILL-7079: Drill can't 
query views from the S3 storage when plain authentication is enabled
URL: https://github.com/apache/drill/pull/1712#discussion_r267691682
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java
 ##
 @@ -55,6 +56,13 @@ public DotDrillType getType(){
* @return Return owner of the file in underlying file system.
*/
   public String getOwner() {
+if (type == DotDrillType.VIEW && status.getOwner().isEmpty()) {
 
 Review comment:
   What about regular tables, parquet metadata cache files or stat files? Is it 
possible that an empty `status.getOwner()` may cause problems for them?




[GitHub] [drill] KazydubB opened a new pull request #1712: DRILL-7079: Drill can't query views from the S3 storage when plain authentication is enabled

2019-03-21 Thread GitBox
KazydubB opened a new pull request #1712: DRILL-7079: Drill can't query views 
from the S3 storage when plain authentication is enabled
URL: https://github.com/apache/drill/pull/1712
 
 
   




[GitHub] [drill] paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan framework

2019-03-21 Thread GitBox
paul-rogers commented on issue #1711: DRILL-7011: Support schema in scan 
framework
URL: https://github.com/apache/drill/pull/1711#issuecomment-475131734
 
 
   @arina-ielchiieva, here is the full integrated schema feature for review. 
For some reason, the TestCsvWithSchema unit test runs in Eclipse, but not from 
the command line. I'll investigate that tomorrow. Because of that, I have not 
yet done a full unit test run. Still, there is plenty of code to review while 
I figure out the unit test issue. 




[GitHub] [drill] paul-rogers opened a new pull request #1711: DRILL-7011: Support schema in scan framework

2019-03-21 Thread GitBox
paul-rogers opened a new pull request #1711: DRILL-7011: Support schema in scan 
framework
URL: https://github.com/apache/drill/pull/1711
 
 
   Adds schema support to the row set-based scan framework and to the "V3" text 
reader based on that framework.
   
   Adding the schema made clear that passing options as a long list of 
constructor arguments was not sustainable. Refactored code to use a builder 
pattern instead.
   
   Added support for default values in the "null column loader", which required 
adding a "setValue" method to the column accessors.
   
   Added unit tests for all new or changed functionality. See TestCsvWithSchema 
for the overall test of the entire integrated mechanism.
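
The move from long constructor argument lists to a builder, mentioned in the description above, can be illustrated with a generic sketch. This is not the code from the pull request: the class name `ScanOptionsSketch` and the options `schemaText`, `batchRowLimit`, and `useDefaultValues` are hypothetical and exist only to show the pattern.

```java
final class ScanOptionsSketch {
  // All option names are hypothetical; they are not taken from the Drill code base.
  final String schemaText;
  final int batchRowLimit;
  final boolean useDefaultValues;

  private ScanOptionsSketch(Builder builder) {
    this.schemaText = builder.schemaText;
    this.batchRowLimit = builder.batchRowLimit;
    this.useDefaultValues = builder.useDefaultValues;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private String schemaText;         // null means no schema was provided
    private int batchRowLimit = 4096;  // arbitrary default chosen for the sketch
    private boolean useDefaultValues;

    Builder schemaText(String value) {
      this.schemaText = value;
      return this;
    }

    Builder batchRowLimit(int value) {
      if (value <= 0) {
        throw new IllegalArgumentException("batchRowLimit must be positive");
      }
      this.batchRowLimit = value;
      return this;
    }

    Builder useDefaultValues(boolean value) {
      this.useDefaultValues = value;
      return this;
    }

    ScanOptionsSketch build() {
      return new ScanOptionsSketch(this);
    }
  }

  public static void main(String[] args) {
    // Only the options that differ from the defaults need to be named.
    ScanOptionsSketch options = ScanOptionsSketch.builder()
        .schemaText("(a INT, b VARCHAR)")
        .useDefaultValues(true)
        .build();
    System.out.println(options.batchRowLimit + " " + options.useDefaultValues);
  }
}
```

With a builder, adding a new option such as a provided schema means adding one setter rather than changing every constructor call site.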

