[jira] [Comment Edited] (DRILL-7038) Queries on partitioned columns scan the entire datasets

2019-04-02 Thread Bohdan Kazydub (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808416#comment-16808416
 ] 

Bohdan Kazydub edited comment on DRILL-7038 at 4/3/19 6:48 AM:
---

Hi, [~bbevens]. I think it's OK, but it should be specified that, in addition 
to the {{DISTINCT}} or {{GROUP BY}} operation, the query has to select 
({{SELECT}}) only the partition columns (dir0, dir1, ..., dirN).


was (Author: kazydubb):
Hi, [~bbevens]. I think it's OK, but I think it is needed to specify that 
additionally for {{DISTINCT}} or {{GROUP BY}} operation the query has to query 
({{SELECT}}) partition columns (dir0, dir1,..., dirN) only.

> Queries on partitioned columns scan the entire datasets
> ---
>
> Key: DRILL-7038
> URL: https://issues.apache.org/jira/browse/DRILL-7038
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> For tables with hive-style partitions like
> {code}
> /table/2018/Q1
> /table/2018/Q2
> /table/2019/Q1
> etc.
> {code}
> if any of the following queries is run:
> {code}
> select distinct dir0 from dfs.`/table`
> {code}
> {code}
> select dir0 from dfs.`/table` group by dir0
> {code}
> it will actually scan every single record in the table rather than just 
> getting a list of directories at the dir0 level. This applies even when 
> cached metadata is available. This is a big penalty especially as the 
> datasets grow.
> To avoid such situations, a logical prune rule can be used to collect the 
> partition columns (`dir0`), either from the metadata cache (if available) or 
> from the group scan, and to drop unnecessary files from being read. The rule 
> will be applied under the following conditions:
> 1) all queried columns are partition columns, and
> 2) either a {{DISTINCT}} or a {{GROUP BY}} operation is performed.
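
For illustration, the applicability check amounts to something like the following sketch (the class, method, and the Calcite {{Aggregate}} usage here are illustrative assumptions, not Drill's actual rule code):

{code:java}
import java.util.List;
import org.apache.calcite.rel.core.Aggregate;

// Hypothetical sketch of the prune rule's match condition; not Drill's actual implementation.
class PartitionPruneCheck {
  static boolean canApply(Aggregate agg, List<String> queriedColumns, List<String> partitionColumns) {
    // Condition 2: DISTINCT or GROUP BY is present; DISTINCT is planned as an
    // Aggregate with group keys and no aggregate calls.
    boolean aggregatesOnKeysOnly = !agg.getGroupSet().isEmpty() && agg.getAggCallList().isEmpty();
    // Condition 1: every queried column is a partition column (dir0, dir1, ..., dirN),
    // so the distinct values can be taken from directory names without reading data files.
    return aggregatesOnKeysOnly && partitionColumns.containsAll(queriedColumns);
  }
}
{code}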



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7038) Queries on partitioned columns scan the entire datasets

2019-04-02 Thread Bohdan Kazydub (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808416#comment-16808416
 ] 

Bohdan Kazydub commented on DRILL-7038:
---

Hi, [~bbevens]. I think it's OK, but it should be specified that, in addition 
to the {{DISTINCT}} or {{GROUP BY}} operation, the query has to select 
({{SELECT}}) only the partition columns (dir0, dir1, ..., dirN).

> Queries on partitioned columns scan the entire datasets
> ---
>
> Key: DRILL-7038
> URL: https://issues.apache.org/jira/browse/DRILL-7038
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> For tables with hive-style partitions like
> {code}
> /table/2018/Q1
> /table/2018/Q2
> /table/2019/Q1
> etc.
> {code}
> if any of the following queries is run:
> {code}
> select distinct dir0 from dfs.`/table`
> {code}
> {code}
> select dir0 from dfs.`/table` group by dir0
> {code}
> it will actually scan every single record in the table rather than just 
> getting a list of directories at the dir0 level. This applies even when 
> cached metadata is available. This is a big penalty especially as the 
> datasets grow.
> To avoid such situations, a logical prune rule can be used to collect the 
> partition columns (`dir0`), either from the metadata cache (if available) or 
> from the group scan, and to drop unnecessary files from being read. The rule 
> will be applied under the following conditions:
> 1) all queried columns are partition columns, and
> 2) either a {{DISTINCT}} or a {{GROUP BY}} operation is performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-04-02 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-7132.
-

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808309#comment-16808309
 ] 

ASF GitHub Bot commented on DRILL-7153:
---

cgivre commented on pull request #1731: DRILL-7153: Drill Fails to Build using 
JDK 1.8.0_65
URL: https://github.com/apache/drill/pull/1731
 
 
   This PR fixes a bug in which building Drill using JDK 1.8.0_65 results in 
the following error. 
   
   ```
   [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) 
on project drill-java-exec: Compilation failure
   [ERROR] 
/Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
 error: unreported exception E; must be caught or declared to be thrown
   [ERROR]   where E,T,V are type-variables:
   [ERROR] E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR] T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR] V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR]
   [ERROR] -> [Help 1]
   [ERROR]
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR]
   [ERROR] For more information about the errors and possible solutions, please 
read the following articles:
   [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   [ERROR]
   [ERROR] After correcting the problems, you can resume the build with the 
command
   [ERROR]   mvn  -rf :drill-java-exec
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill Fails to Build using JDK 1.8.0_65
> ---
>
> Key: DRILL-7153
> URL: https://issues.apache.org/jira/browse/DRILL-7153
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Drill fails to build when using Java 1.8.0_65. It throws the following error:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project drill-java-exec: Compilation failure
> [ERROR] 
> /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
>  error: unreported exception E; must be caught or declared to be thrown
> [ERROR]   where E,T,V are type-variables:
> [ERROR] E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
> [ERROR] T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
> [ERROR] V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn -rf :drill-java-exec
> {noformat}
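
The failure is a generic-exception inference issue in javac. A minimal sketch of the pattern (simplified; Drill's real {{ExprVisitor}} is more involved):

{code:java}
// Simplified sketch of the pattern that trips older javac builds such as 1.8.0_65.
interface ExprVisitor<T, V, E extends Exception> {
  T visit(Object expr, V value) throws E;
}

class Caller {
  static <T, V, E extends Exception> T accept(ExprVisitor<T, V, E> visitor, V value) throws E {
    return visitor.visit(null, value);
  }

  static void run() {
    // A current javac infers E = RuntimeException for this lambda; 1.8.0_65 reportedly
    // fails with "unreported exception E; must be caught or declared to be thrown".
    Integer result = accept((expr, v) -> 42, "input");
  }
}
{code}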



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65

2019-04-02 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7153:


 Summary: Drill Fails to Build using JDK 1.8.0_65
 Key: DRILL-7153
 URL: https://issues.apache.org/jira/browse/DRILL-7153
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.16.0


Drill fails to build when using Java 1.8.0_65. It throws the following error:

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) 
on project drill-java-exec: Compilation failure
[ERROR] 
/Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
 error: unreported exception E; must be caught or declared to be thrown
[ERROR]   where E,T,V are type-variables:
[ERROR] E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR] T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR] V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :drill-java-exec
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-7152.
---
Resolution: Fixed

Fixed in 54384a9. 

> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808298#comment-16808298
 ] 

ASF GitHub Bot commented on DRILL-7152:
---

amansinha100 commented on pull request #1730: DRILL-7152: During histogram 
creation handle the case when all values…
URL: https://github.com/apache/drill/pull/1730
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808294#comment-16808294
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

paul-rogers commented on pull request #1726: DRILL-7143: Support default value 
for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271557253
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java
 ##
 @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) {
 @Override
 protected final void fillEmpties(final int writeIndex) {
   final int width = width();
-  final int stride = ZERO_BUF.length / width;
+  final int stride = emptyValue.length / width;
   int dest = lastWriteIndex + 1;
   while (dest < writeIndex) {
     int length = writeIndex - dest;
     length = Math.min(length, stride);
-    drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width);
+    drillBuf.setBytes(dest * width, emptyValue, 0, length * width);
     dest += length;
   }
 }
 }
 
+  /**
+   * Base class for writers that use the Java int type as their native
+   * type. Handles common implicit conversions from other types to int.
+   */
+  public static abstract class BaseIntWriter extends BaseFixedWidthWriter {
+
+    @Override
+    public final void setLong(final long value) {
+      try {
+        // Catches int overflow. Does not catch overflow for smaller types.
+        setInt(Math.toIntExact(value));
+      } catch (final ArithmeticException e) {
+        throw InvalidConversionError.writeError(schema(), value, e);
+      }
+    }
+
+    @Override
+    public final void setDouble(final double value) {
 
 Review comment:
   Yes, just as setInt() covers TinyInt, SmallInt, Int, UInt1, and UInt2.
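
The hunk above cuts off at setDouble(); its body presumably mirrors setLong(), range-checking before delegating to setInt(). A guess at the shape, not the actual source:

```java
@Override
public final void setDouble(final double value) {
  try {
    // Hypothetical body: round to long, then reuse the same int-overflow check as setLong().
    setInt(Math.toIntExact(Math.round(value)));
  } catch (final ArithmeticException e) {
    throw InvalidConversionError.writeError(schema(), value, e);
  }
}
```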
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through 
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808295#comment-16808295
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

paul-rogers commented on pull request #1726: DRILL-7143: Support default value 
for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271557119
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java
 ##
 @@ -33,7 +32,10 @@
   public static void printOffsets(UInt4Vector vector, int start, int length) {
 header(vector, start, length);
 for (int i = start, j = 0; j < length; i++, j++) {
-  if (j > 0) {
+  if (j % 40 == 0) {
 
 Review comment:
   Before this change, I had a vector of 1000 items all on one line. After this 
change, the output is 40 elements per line. Note that this code is used only 
during debugging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through 
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808273#comment-16808273
 ] 

ASF GitHub Bot commented on DRILL-7150:
---

amansinha100 commented on pull request #1729: DRILL-7150: Fix timezone 
conversion for timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729#discussion_r271548124
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java
 ##
 @@ -93,7 +95,9 @@ public static CompareFunctionsProcessor processWithTimeZoneOffset(FunctionCall c
   protected boolean visitTimestampExpr(SchemaPath path, TimeStampExpression valueArg) {
     // converts timestamp value from local time zone to UTC since the record reader
     // reads the timestamp in local timezone if the readTimestampWithZoneOffset flag is enabled
-    long timeStamp = valueArg.getTimeStamp() - DateUtility.TIMEZONE_OFFSET_MILLIS;
+    long timeStamp = Instant.ofEpochMilli(valueArg.getTimeStamp()).atZone(ZoneId.of("UTC"))
 
 Review comment:
   This is a long chain of functions .. could you split this into couple of 
statements ? Helps both readability and debugging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB shell:
> {noformat}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value 
> '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
> {noformat}
> 2. Create an external hive table:
> {code:sql}
> CREATE EXTERNAL TABLE default.timeTest
> (`_id` string,
> `str` string,
> `ts` timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
> {code}
> 3. Enable native reader and timezone conversion for MaprDB timestamp:
> {code:sql}
> alter session set 
> `store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
> alter session set 
> `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
> {code}
> 4. Run the query on the table from Drill using hive plugin:
> {code:java}
> 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
> +------+--------------------------+--------------------------+
> | _id  |           str            |            ts            |
> +------+--------------------------+--------------------------+
> | eot  | -01-01T23:59:59.999      | -01-02 00:59:59.999      |
> | pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
> | pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
> | unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
> +------+--------------------------+--------------------------+
> 4 rows selected (0.343 seconds)
> {code}
> Please note that timestamps for {{eot}} and {{pst}} values are incorrect.
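
The underlying problem: a single fixed offset cannot be correct on both sides of a DST transition, which is why the fix replaces {{DateUtility.TIMEZONE_OFFSET_MILLIS}} with a {{java.time}} conversion. A standalone illustration (not Drill code):

{code:java}
import java.time.Instant;
import java.time.ZoneId;

public class DstOffsetDemo {
  public static void main(String[] args) {
    ZoneId la = ZoneId.of("America/Los_Angeles");
    Instant winter = Instant.parse("2019-01-02T07:59:59.999Z"); // zone observes PST here, UTC-8
    Instant summer = Instant.parse("2019-04-02T06:59:59.999Z"); // zone observes PDT here, UTC-7
    System.out.println(winter.atZone(la).toLocalDateTime()); // 2019-01-01T23:59:59.999
    System.out.println(summer.atZone(la).toLocalDateTime()); // 2019-04-01T23:59:59.999
    // No single constant offset maps both instants to the expected local times,
    // so values read with a fixed offset end up an hour off across the PST/PDT boundary.
  }
}
{code}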



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808272#comment-16808272
 ] 

ASF GitHub Bot commented on DRILL-7150:
---

amansinha100 commented on pull request #1729: DRILL-7150: Fix timezone 
conversion for timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729#discussion_r271548171
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -357,7 +357,8 @@ protected void writeTimeStamp(MapOrListWriterImpl writer, String fieldName, Docu
    * @param reader document reader
    */
   private void writeTimestampWithLocalZoneOffset(MapOrListWriterImpl writer, String fieldName, DocumentReader reader) {
-    long timestamp = reader.getTimestampLong() + DateUtility.TIMEZONE_OFFSET_MILLIS;
+    long timestamp = Instant.ofEpochMilli(reader.getTimestampLong()).atZone(ZoneId.systemDefault())
 
 Review comment:
   Same as above. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB shell:
> {noformat}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value 
> '{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
> {noformat}
> 2. Create an external hive table:
> {code:sql}
> CREATE EXTERNAL TABLE default.timeTest
> (`_id` string,
> `str` string,
> `ts` timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
> {code}
> 3. Enable native reader and timezone conversion for MaprDB timestamp:
> {code:sql}
> alter session set 
> `store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
> alter session set 
> `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
> {code}
> 4. Run the query on the table from Drill using hive plugin:
> {code:java}
> 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
> +------+--------------------------+--------------------------+
> | _id  |           str            |            ts            |
> +------+--------------------------+--------------------------+
> | eot  | -01-01T23:59:59.999      | -01-02 00:59:59.999      |
> | pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
> | pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
> | unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
> +------+--------------------------+--------------------------+
> 4 rows selected (0.343 seconds)
> {code}
> Please note that timestamps for {{eot}} and {{pst}} values are incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808207#comment-16808207
 ] 

ASF GitHub Bot commented on DRILL-7152:
---

gparai commented on issue #1730: DRILL-7152: During histogram creation handle 
the case when all values…
URL: https://github.com/apache/drill/pull/1730#issuecomment-479236474
 
 
   @amansinha100 please take a look at the Travis failure. Otherwise, changes 
LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7152:
--
Reviewer: Gautam Parai

> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808181#comment-16808181
 ] 

ASF GitHub Bot commented on DRILL-7152:
---

amansinha100 commented on issue #1730: DRILL-7152: During histogram creation 
handle the case when all values…
URL: https://github.com/apache/drill/pull/1730#issuecomment-479218730
 
 
   @gparai could you please review? Thanks. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808178#comment-16808178
 ] 

ASF GitHub Bot commented on DRILL-7152:
---

amansinha100 commented on pull request #1730: DRILL-7152: During histogram 
creation handle the case when all values…
URL: https://github.com/apache/drill/pull/1730
 
 
   … of a column are NULLs.
   
   Please see [DRILL-7152](https://issues.apache.org/jira/browse/DRILL-7152) 
for a description of the issue. It was caused by all of the column's values 
being NULL: the t-digest code-gen functions tried to generate output for an 
empty t-digest, since a t-digest does not store NULL values. The fix is to 
check the t-digest size() first before trying to create the output. 
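
   A rough sketch of that guard follows; the generated code differs, and `tdigest` here stands for the com.tdunning TDigest instance the aggregate builds:

```java
// Hedged sketch of the described fix, not the actual generated code.
if (tdigest.size() > 0) {
  // Serialize the digest only when at least one non-NULL value was added.
  java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(tdigest.smallByteSize());
  tdigest.asSmallBytes(buf);
  // ... hand buf to the output vector ...
}
// else: every input value was NULL, so the digest is empty; emit no histogram
// bytes rather than reading an absent digest (the NPE in the stack trace above).
```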
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808175#comment-16808175
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

dvjyothsna commented on pull request #1723: DRILL-7063: Seperate metadata cache 
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r271507937
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java
 ##
 @@ -161,7 +161,7 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ForemanSetupException {
    */
   private SqlNodeList getColumnList(final SqlRefreshMetadata sqlrefreshMetadata) {
     SqlNodeList columnList = sqlrefreshMetadata.getFieldList();
-    if (columnList == null || !SqlNodeList.isEmptyList(columnList)) {
 
 Review comment:
   Removed the extra check.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-04-02 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7136:


Assignee: Gautam Parai  (was: Pritesh Maker)

> Num_buckets for HashAgg in profile may be inaccurate
> 
>
> Key: DRILL-7136
> URL: https://issues.apache.org/jira/browse/DRILL-7136
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Gautam Parai
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPCH query 17 with sf 1000.  Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
> select
>   0.2 * avg(l2.l_quantity)
> from
>   lineitem l2
> where
>   l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times.  It should have 4M 
> buckets.  But the profile shows it has 64K buckets.
> I have attached a sample profile.  In this profile, the hash agg operator is 
> (04-02).
> {noformat}
> Operator Metrics
> Minor Fragment  NUM_BUCKETS  NUM_ENTRIES  NUM_RESIZING  RESIZING_TIME_MS  NUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  SPILL_CYCLE  INPUT_BATCH_COUNT  AVG_INPUT_BATCH_BYTES  AVG_INPUT_ROW_BYTES  INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTES  OUTPUT_RECORD_COUNT
> 04-00-02  65,536  748,746  6  364  1  582  0  813  582,653  18  26,316,456  401  1,631,943  25  26,176,350
> {noformat}
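
For reference, the expected count follows from doubling on each resize (assuming the hash table doubles per resize and that the reported 65,536 is the pre-resize starting size):

{code:java}
int initialBuckets = 65_536;                          // 64K, the value the profile still reports
int numResizing = 6;                                  // NUM_RESIZING from the profile above
int expectedBuckets = initialBuckets << numResizing;  // 65,536 * 2^6 = 4,194,304, i.e. ~4M
{code}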



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-7152:
-

 Summary: Histogram creation throws exception for all nulls column
 Key: DRILL-7152
 URL: https://issues.apache.org/jira/browse/DRILL-7152
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Aman Sinha
 Fix For: 1.16.0


ANALYZE command fails when creating the histogram for a table with 1 column 
with all NULLs. 

Analyze table `table_stats/parquet_col_nulls` compute statistics;

{noformat}
Error: SYSTEM ERROR: NullPointerException
  (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
TDigest output

org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085

org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224

org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1669
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{noformat}

This table has 1 column with all NULL values:

{noformat}
apache drill (dfs.drilltestdir)> select * from `table_stats/parquet_col_nulls` 
limit 20;
+------+------+
| col1 | col2 |
+------+------+
| 0    | null |
| 1    | null |
| 2    | null |
| 3    | null |
| 4    | null |
| 5    | null |
| 6    | null |
| 7    | null |
| 8    | null |
| 9    | null |
| 10   | null |
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7045) UDF string_binary java.lang.IndexOutOfBoundsException:

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808134#comment-16808134
 ] 

ASF GitHub Bot commented on DRILL-7045:
---

sohami commented on issue #1671: DRILL-7045 UDF string_binary 
java.lang.IndexOutOfBoundsException
URL: https://github.com/apache/drill/pull/1671#issuecomment-479186883
 
 
   @jcmcote - I have addressed @KazydubB's comment in this commit and rebased on 
the latest apache master. Can you please make the change or pull in this commit 
so that we can close this PR? 
https://github.com/sohami/drill/commit/7aaef8691a4a594442464301035ea3aefd7497dd
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UDF string_binary java.lang.IndexOutOfBoundsException:
> --
>
> Key: DRILL-7045
> URL: https://issues.apache.org/jira/browse/DRILL-7045
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
> Fix For: 1.16.0
>
>
> Given a large field like
>  
> cat input.json
> { "col0": 
> "lajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasd

[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-02 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808132#comment-16808132
 ] 

Bridget Bevens commented on DRILL-540:
--

Hi [~IhorHuzenko] and [~vitalii],

Aside from removing the note stating hive views are not supported, do I need to 
add any other information to the docs? For example, should I also include the 
warning?

Warning: Because views in Hive aren't present as physical files and access 
can't be granted using file system commands, access to Hive views for 
Storage Based Authorization is based on the underlying tables used in the view 
definition. For the current example, the views were defined as selections over the 
appropriate tables.

Thanks,
Bridget

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill, using 
> JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
> -------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
> 2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |
> {noformat}
>       3. So in the Hive metastore, views are considered tables of a special type. 
> The main benefit is that we also have the expanded SQL definition of views (just 
> like in view.drill files). Also, reading of the metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse existing code for Drill 
> views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which 
> is actually the model for data persisted in .view.drill files_) and then, based on 
> this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
> handle Hive views the same way as if they were initially defined in Drill and 
> persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both (table and view) field 
> type conversions. 
> *Security implications*
> Consider simple example case where we have users, 
> {code:java}
> user0  user1 user2
>\ /
>   group12
> {code}
> and a sample db where object names contain the user or group who should access 
> them      
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage based authorization permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models. 
>                                                                     *SQL 
> Standard                    Storage Ba

[jira] [Updated] (DRILL-7146) Query failing with NPE when ZK queue is enabled

2019-04-02 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7146:
-
Labels: ready-to-commit  (was: )

> Query failing with NPE when ZK queue is enabled
> ---
>
> Key: DRILL-7146
> URL: https://issues.apache.org/jira/browse/DRILL-7146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Sorabh Hamirwasia
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
>  
> {code:java}
> >> Query: alter system reset all;
>  SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  java.sql.SQLException: SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:535)
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:607)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>  at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>  at org.apache.drill.test.framework.Utils.execSQL(Utils.java:917)
>  at org.apache.drill.test.framework.TestDriver.setup(TestDriver.java:632)
>  at org.apache.drill.test.framework.TestDriver.runTests(TestDriver.java:152)
>  at org.apache.drill.test.framework.TestDriver.main(TestDriver.java:94)
>  Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>  at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.ja

[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808053#comment-16808053
 ] 

ASF GitHub Bot commented on DRILL-7048:
---

kkhatua commented on issue #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740
 
 
   @vvysotskyi , @ihuzenko 
   I've done the changes and verified the tests. If everything is fine, I'll 
rebase on the latest master (there are small conflicts due to new commits on 
master introducing additional system options)
   
   I've also included a trim for the values, so an input of `100 `  will be 
treated as valid.
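
A tiny sketch of the trim-then-parse idea (hypothetical helper, not the PR's code):

{code:java}
public class MaxRowsOption {

  // Accepts inputs like "100 " or " 100" by trimming before parsing;
  // rejects negatives and non-numeric values.
  static int parseMaxRows(String raw) {
    int value = Integer.parseInt(raw.trim());
    if (value < 0) {
      throw new IllegalArgumentException("maxRows must be >= 0, got " + value);
    }
    return value; // 0 means "no limit"
  }

  public static void main(String[] args) {
    System.out.println(parseMaxRows("100 ")); // 100, trailing space is fine
    System.out.println(parseMaxRows("0"));    // 0 -> unlimited
  }
}
{code}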
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> With DRILL-6960, the webUI will get an auto-limit on the number of results 
> fetched.
> Since more of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the Server can have a pre-defined value as 
> well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a 
> max limit on the resultset size as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808051#comment-16808051
 ] 

ASF GitHub Bot commented on DRILL-7048:
---

kkhatua commented on issue #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740
 
 
   @vvysotskyi , @ihuzenko 
   I've done the changes and verified the tests. If everything is fine, I'll 
rebase on the latest master (there are small conflicts due to new commits on 
master introducing additional system options)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> With DRILL-6960, the webUI will get an auto-limit on the number of results 
> fetched.
> Since more of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the Server can have a pre-defined value as 
> well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a 
> max limit on the resultset size as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808047#comment-16808047
 ] 

ASF GitHub Bot commented on DRILL-7048:
---

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r271452501
 
 

 ##
 File path: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java
 ##
 @@ -462,4 +618,25 @@ public void 
testParamSettingWhenUnsupportedTypeSaysUnsupported() throws SQLExcep
 }
   }
 
+
+  // Sets the SystemMaxRows option
+  private void setSystemMaxRows(int sysValueToSet) throws SQLException {
 
 Review comment:
   As per our chat, I've introduced `@Before` and `@After` methods for 
synchronizing the `system` level modifications to the options.
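
A minimal JUnit 4 sketch of that save/restore pattern around system-level option changes (hypothetical option name, with a map standing in for the server's option store):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class SystemOptionGuardTest {

  // Stand-in for the server-side option store that the tests mutate.
  private static final Map<String, Integer> OPTIONS = new HashMap<>();
  private static final String MAX_ROWS_OPTION = "exec.query.max_rows"; // hypothetical name

  private Integer savedValue;

  @Before
  public void saveOption() {
    // Remember the system value before a test changes it.
    savedValue = OPTIONS.getOrDefault(MAX_ROWS_OPTION, 0);
  }

  @After
  public void restoreOption() {
    // Put it back so system-level state never leaks between tests.
    OPTIONS.put(MAX_ROWS_OPTION, savedValue);
  }

  @Test
  public void overridesMaxRowsForOneTestOnly() {
    OPTIONS.put(MAX_ROWS_OPTION, 100);
    assertEquals(100, (int) OPTIONS.get(MAX_ROWS_OPTION));
  }
}
{code}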
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement JDBC Statement.setMaxRows() with System Option
> 
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC, Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> With DRILL-6960, the webUI will get an auto-limit on the number of results 
> fetched.
> Since more of the plumbing is already there, it makes sense to provide the 
> same for the JDBC client.
> In addition, it would be nice if the Server can have a pre-defined value as 
> well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a 
> max limit on the resultset size as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-7060) Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in JsonReader

2019-04-02 Thread Abhishek Girish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish closed DRILL-7060.
--

> Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in 
> JsonReader
> -
>
> Key: DRILL-7060
> URL: https://issues.apache.org/jira/browse/DRILL-7060
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Some JSON files may have strings with backslashes - which are read as escape 
> characters. By default only standard escape characters are allowed. So 
> querying such files would fail. For example see:
> Data
> {code}
> {"file":"C:\Sfiles\escape.json"}
> {code}
> Error
> {code}
> (com.fasterxml.jackson.core.JsonParseException) Unrecognized character escape 
> 'S' (code 83)
>  at [Source: (org.apache.drill.exec.store.dfs.DrillFSDataInputStream); line: 
> 1, column: 178]
> com.fasterxml.jackson.core.JsonParser._constructError():1804
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError():663
> 
> com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape():640
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped():3243
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipString():2537
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken():683
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():342
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():298
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():246
> org.apache.drill.exec.vector.complex.fn.JsonReader.write():205
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next():216
> org.apache.drill.exec.physical.impl.ScanBatch.internalNext():223
> ...
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7060) Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in JsonReader

2019-04-02 Thread Abhishek Girish (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808009#comment-16808009
 ] 

Abhishek Girish commented on DRILL-7060:


[~kkhatua], I don't think any additional documentation is necessary as such. I 
think the option description is clear. When someone really needs this, they'll 
be able to find it. It's not required in most scenarios.
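
For reference, the underlying Jackson feature can be exercised standalone; with it disabled the parse fails exactly as in the report above (minimal sketch, assuming Jackson 2.x on the classpath):

{code:java}
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BackslashEscapeDemo {
  public static void main(String[] args) throws Exception {
    // JSON text: {"file":"C:\Sfiles\escape.json"} -- \S is not a standard escape.
    String json = "{\"file\":\"C:\\Sfiles\\escape.json\"}";

    try {
      new ObjectMapper().readTree(json); // default: strict escapes
    } catch (JsonProcessingException e) {
      System.out.println(e.getOriginalMessage()); // Unrecognized character escape 'S' (code 83)
    }

    JsonFactory factory = new JsonFactory();
    factory.enable(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER);
    ObjectMapper lenient = new ObjectMapper(factory);
    // \S now decodes to the literal character S.
    System.out.println(lenient.readTree(json).get("file").asText()); // C:Sfilesescape.json
  }
}
{code}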

> Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in 
> JsonReader
> -
>
> Key: DRILL-7060
> URL: https://issues.apache.org/jira/browse/DRILL-7060
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Some JSON files may have strings with backslashes - which are read as escape 
> characters. By default only standard escape characters are allowed. So 
> querying such files would fail. For example see:
> Data
> {code}
> {"file":"C:\Sfiles\escape.json"}
> {code}
> Error
> {code}
> (com.fasterxml.jackson.core.JsonParseException) Unrecognized character escape 
> 'S' (code 83)
>  at [Source: (org.apache.drill.exec.store.dfs.DrillFSDataInputStream); line: 
> 1, column: 178]
> com.fasterxml.jackson.core.JsonParser._constructError():1804
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError():663
> 
> com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape():640
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped():3243
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipString():2537
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken():683
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():342
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():298
> org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():246
> org.apache.drill.exec.vector.complex.fn.JsonReader.write():205
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next():216
> org.apache.drill.exec.physical.impl.ScanBatch.internalNext():223
> ...
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7146) Query failing with NPE when ZK queue is enabled

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807996#comment-16807996
 ] 

ASF GitHub Bot commented on DRILL-7146:
---

HanumathRao commented on pull request #1725: DRILL-7146: Query failing with NPE 
when ZK queue is enabled.
URL: https://github.com/apache/drill/pull/1725#discussion_r271437936
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/planner/rm/TestMemoryCalculator.java
 ##
 @@ -59,6 +59,7 @@
 
   private static final long DEFAULT_SLICE_TARGET = 10L;
   private static final long DEFAULT_BATCH_SIZE = 16*1024*1024;
+  private static final String ENABLE_QUEUE = 
"drill.exec.queue.embedded.enable";
 
 Review comment:
   I have updated the test case.
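
A sketch of how a test can turn the embedded queue on through the cluster fixture (assuming Drill's test-framework API; the config key follows the diff above):

{code:java}
// Hypothetical test excerpt; assumes Drill's test framework
// (org.apache.drill.test.ClusterFixture and friends) is on the classpath.
private static final String ENABLE_QUEUE = "drill.exec.queue.embedded.enable";

@Test
public void runsWithEmbeddedQueueEnabled() throws Exception {
  ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
      // Turn the queue on so planning exercises the RM memory-calculator path.
      .configProperty(ENABLE_QUEUE, true);

  try (ClusterFixture cluster = builder.build();
       ClientFixture client = cluster.clientFixture()) {
    // Should plan and execute without the NPE from this ticket.
    client.queryBuilder().sql("SELECT 1 FROM (VALUES (1))").run();
  }
}
{code}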
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Query failing with NPE when ZK queue is enabled
> ---
>
> Key: DRILL-7146
> URL: https://issues.apache.org/jira/browse/DRILL-7146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Sorabh Hamirwasia
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
>
>  
> {code:java}
> >> Query: alter system reset all;
>  SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  java.sql.SQLException: SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:535)
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:607)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107)
>  at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>  at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>  at org.apache.drill.test.framework.Utils.execSQL(Utils.java:917)
>  at org.apache.drill.test.framework.TestDriver.setup(TestDriver.java:632)
>  at org.apache.drill.test.framework.TestDriver.runTests(TestDriver.java:152)
>  at org.apache.drill.test.framework.TestDriver.main(TestDriver.java:94)
>  Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: NullPointerException
> Please, refer to logs for more information.
> [Error Id: ec4b9c66-9f5c-4736-acf3-605f84ea0226 on drill80:31010]
>  at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.

[jira] [Commented] (DRILL-6558) Drill query fails when file name contains semicolon

2019-04-02 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807973#comment-16807973
 ] 

Vitalii Diravka commented on DRILL-6558:


The original issue is present for Drill with Hadoop 3.2 libs:
{code:java}
Apache Drill 1.16.0-SNAPSHOT
"What ever the mind of man can conceive and believe, Drill can query."
apache drill> select * from dfs.`/tmp/af:3`;
Error: VALIDATION ERROR: java.net.URISyntaxException: Relative path in absolute 
URI: af:3


[Error Id: a6687d43-24f4-460b-8a39-ea05c1fb2f3f on vitalii-UX331UN:31010] 
(state=,code=0)
{code}
But I didn't reproduce the second case:
{code:java}
apache drill> select * from dfs.`/tmp/af:3`;
Error: VALIDATION ERROR: java.net.URISyntaxException: Relative path in absolute 
URI: af:3


[Error Id: a6687d43-24f4-460b-8a39-ea05c1fb2f3f on vitalii-UX331UN:31010] 
(state=,code=0)
apache drill> use dfs.tmp;
+------+-------------------------------------+
|  ok  |               summary               |
+------+-------------------------------------+
| true | Default schema changed to [dfs.tmp] |
+------+-------------------------------------+
1 row selected (0.24 seconds)
apache drill (dfs.tmp)> select * from sys.version;
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
|     version     |                commit_id                 |              commit_message              |        commit_time         |    build_email     |         build_time         |
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
| 1.16.0-SNAPSHOT | a070d0b592b3f77411864c04d9c4025e0d1cf888 | Fix test failures. Update HBase version  | 02.04.2019 @ 13:26:18 EEST | vita...@apache.org | 02.04.2019 @ 20:37:18 EEST |
+-----------------+------------------------------------------+------------------------------------------+----------------------------+--------------------+----------------------------+
1 row selected (1.295 seconds)
{code}
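
The root cause is that Hadoop's Path treats the text before the first colon as a URI scheme; a short demonstration, plus the usual workaround of building the Path from an explicit URI (assuming hadoop-common on the classpath):

{code:java}
import java.net.URI;
import org.apache.hadoop.fs.Path;

public class ColonInPathDemo {
  public static void main(String[] args) throws Exception {
    try {
      // "af:3" is parsed as scheme "af" + relative path "3".
      new Path("af:3");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // ... Relative path in absolute URI: af:3
    }

    // Building the URI explicitly keeps the colon as part of the file name.
    Path ok = new Path(new URI(null, null, "/tmp/af:3", null));
    System.out.println(ok); // /tmp/af:3
  }
}
{code}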

> Drill query fails when file name contains semicolon
> ---
>
> Key: DRILL-6558
> URL: https://issues.apache.org/jira/browse/DRILL-6558
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Volodymyr Vysotskyi
>Priority: Major
>
> Queries on the tables which contain semicolon in the name:
> {code:sql}
> select * from dfs.`/tmp/q:0`
> {code}
> fails with error:
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> java.net.URISyntaxException: Relative path in absolute URI: q:0
> SQL Query null
> [Error Id: 34fafee1-8fbe-4fe0-9fcb-ddcc926bb192 on user515050-pc:31010]
> (java.lang.IllegalArgumentException) java.net.URISyntaxException: Relative 
> path in absolute URI: q:0
>  org.apache.hadoop.fs.Path.initialize():205
>  org.apache.hadoop.fs.Path.():171
>  org.apache.hadoop.fs.Path.():93
>  org.apache.hadoop.fs.Globber.glob():253
>  org.apache.hadoop.fs.FileSystem.globStatus():1655
>  org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
>  org.apache.drill.exec.store.dfs.FileSelection.create():274
>  
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
>  
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
>  org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
>  org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
>  
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
>  
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
>  org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
>  org.apache.calcite.jdbc.CalciteSchema.getTable():257
>  org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
>  org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
>  org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
>  
> org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
>  
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
>  org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
>  org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
>  org.apache.calcite.sql.validate.AbstractNamespace.validate():84
>  org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
>  org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
>  org.apache.calcite.sql.SqlSelect.validate():226
>  
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
>  org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
>  org.apache.drill.exec.planner.sql.SqlConverter.validate():190
>  
> org.apache.drill.exec.plann

[jira] [Updated] (DRILL-6097) Create an interface for the QueryContext

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6097:

Environment: (was: Currently the QueryContext does not implement an 
interface and the concrete class is passed around everywhere. Additionally 
Mockito is used in tests to mock it. Ideally we would make the QueryContext 
implement an interface and create a mock implementation of it that is used in 
the tests, just like what we did for the FragmentContext.)

> Create an interface for the QueryContext
> 
>
> Key: DRILL-6097
> URL: https://issues.apache.org/jira/browse/DRILL-6097
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Currently the QueryContext does not implement an interface and the concrete 
> class is passed around everywhere. Additionally Mockito is used in tests to 
> mock it. Ideally we would make the QueryContext implement an interface and 
> create a mock implementation of it that is used in the tests, just like what 
> we did for the FragmentContext.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6097) Create an interface for the QueryContext

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6097:

Description: Currently the QueryContext does not implement an interface and 
the concrete class is passed around everywhere. Additionally Mockito is used in 
tests to mock it. Ideally we would make the QueryContext implement an interface 
and create a mock implementation of it that is used in the tests, just like 
what we did for the FragmentContext.

> Create an interface for the QueryContext
> 
>
> Key: DRILL-6097
> URL: https://issues.apache.org/jira/browse/DRILL-6097
> Project: Apache Drill
>  Issue Type: Improvement
> Environment: Currently the QueryContext does not implement an 
> interface and the concrete class is passed around everywhere. Additionally 
> Mockito is used in tests to mock it. Ideally we would make the QueryContext 
> implement an interface and create a mock implementation of it that is used in 
> the tests, just like what we did for the FragmentContext.
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Currently the QueryContext does not implement an interface and the concrete 
> class is passed around everywhere. Additionally Mockito is used in tests to 
> mock it. Ideally we would make the QueryContext implement an interface and 
> create a mock implementation of it that is used in the tests, just like what 
> we did for the FragmentContext.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6377) typeof() does not return DECIMAL scale, precision

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-6377.
-
   Resolution: Fixed
Fix Version/s: 1.16.0

> typeof() does not return DECIMAL scale, precision
> -
>
> Key: DRILL-6377
> URL: https://issues.apache.org/jira/browse/DRILL-6377
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.16.0
>
>
> The {{typeof()}} function returns the type of a column:
> {noformat}
> SELECT typeof(CAST(a AS DOUBLE)) FROM (VALUES (1)) AS T(a);
> +---------+
> | EXPR$0  |
> +---------+
> | FLOAT8  |
> +---------+
> {noformat}
> In Drill, the {{DECIMAL}} type is parameterized with scale and precision. 
> However, {{typeof()}} does not return this information:
> {noformat}
> ALTER SESSION SET `planner.enable_decimal_data_type` = true;
> SELECT typeof(CAST(a AS DECIMAL)) FROM (VALUES (1)) AS T(a);
> +------------------+
> |      EXPR$0      |
> +------------------+
> | DECIMAL38SPARSE  |
> +------------------+
> SELECT typeof(CAST(a AS DECIMAL(6, 3))) FROM (VALUES (1)) AS T(a);
> +-----------+
> |  EXPR$0   |
> +-----------+
> | DECIMAL9  |
> +-----------+
> {noformat}
> Expected something of the form {{DECIMAL(precision, scale)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6377) typeof() does not return DECIMAL scale, precision

2019-04-02 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807929#comment-16807929
 ] 

Arina Ielchiieva commented on DRILL-6377:
-

For decimals, the SqlTypeOf function covers precision and scale:
{noformat}
apache drill> select sqltypeof(cast(10.56 as decimal(5,2))) from (values(1));
+---------------+
|    EXPR$0     |
+---------------+
| DECIMAL(5, 2) |
+---------------+
{noformat}

For intervals, it looks like the issue was fixed:
{noformat}
apache drill> select sqltypeof(INTERVAL '1' YEAR) from (values(1));
+------------------------+
|         EXPR$0         |
+------------------------+
| INTERVAL YEAR TO MONTH |
+------------------------+
1 row selected (0.109 seconds)
apache drill> select typeof(INTERVAL '1' YEAR) from (values(1));
+--------------+
|    EXPR$0    |
+--------------+
| INTERVALYEAR |
+--------------+
{noformat}


> typeof() does not return DECIMAL scale, precision
> -
>
> Key: DRILL-6377
> URL: https://issues.apache.org/jira/browse/DRILL-6377
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The {{typeof()}} function returns the type of a column:
> {noformat}
> SELECT typeof(CAST(a AS DOUBLE)) FROM (VALUES (1)) AS T(a);
> +---------+
> | EXPR$0  |
> +---------+
> | FLOAT8  |
> +---------+
> {noformat}
> In Drill, the {{DECIMAL}} type is parameterized with scale and precision. 
> However, {{typeof()}} does not return this information:
> {noformat}
> ALTER SESSION SET `planner.enable_decimal_data_type` = true;
> SELECT typeof(CAST(a AS DECIMAL)) FROM (VALUES (1)) AS T(a);
> +------------------+
> |      EXPR$0      |
> +------------------+
> | DECIMAL38SPARSE  |
> +------------------+
> SELECT typeof(CAST(a AS DECIMAL(6, 3))) FROM (VALUES (1)) AS T(a);
> +-----------+
> |  EXPR$0   |
> +-----------+
> | DECIMAL9  |
> +-----------+
> {noformat}
> Expected something of the form {{DECIMAL(precision, scale)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-4211) Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi reassigned DRILL-4211:
--

Assignee: Volodymyr Vysotskyi  (was: Timothy Farkas)

> Column aliases not pushed down to JDBC stores in some cases when Drill 
> expects aliased columns to be returned.
> --
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - JDBC
>Affects Versions: 1.3.0, 1.11.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: newbie
> Fix For: 1.16.0
>
>
> When making an SQL statement that incorporates a join to a table and then a 
> self join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food&Dining|
> |id1|restaurants|food&Dining|
> |id2|Coffee Shops|food&Dining|
> |id2|Coffee Shops|food&Dining|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) 

[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807920#comment-16807920
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on issue #1706: DRILL-7115: Improve Hive schema show tables 
performance
URL: https://github.com/apache/drill/pull/1706#issuecomment-479094218
 
 
   @vdiravka , I've addressed the comments. 
   
   I totally agree with you that refactoring is better put into separate 
commits, and I'll use this approach in the future. 
   
   For the show tables authorization improvement, the 
[DRILL-7151](https://issues.apache.org/jira/browse/DRILL-7151) ticket was created. 
   
   For caches, the type of ```tableNamesCache``` was changed to 
```LoadingCache```; previously only 
names were cached here. Also, all work with Guava caches was unified under the 
```HiveMetadataCache``` facade. 
   
   For the Drill Hive SASL (Kerberos) connection I didn't introduce changes; the 
related code in ```DrillHiveMetaStoreClientFactory``` was previously in 
```DrillHiveMetaStoreClient```.
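
A sketch of the cache shape described above: a Guava LoadingCache keyed by schema name that loads the full name-to-type map in one metastore round trip (illustrative; the names and TTL are assumptions):

{code:java}
import java.util.Map;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.collect.ImmutableMap;

public class HiveTableNamesCache {

  enum TableType { TABLE, VIEW }

  // One bulk metastore round trip per schema, cached for a short TTL,
  // instead of one per-table lookup for every name.
  private final LoadingCache<String, Map<String, TableType>> tableNamesCache =
      CacheBuilder.newBuilder()
          .expireAfterWrite(1, TimeUnit.MINUTES)
          .build(new CacheLoader<String, Map<String, TableType>>() {
            @Override
            public Map<String, TableType> load(String schemaName) {
              return fetchTablesAndViews(schemaName);
            }
          });

  // Stand-in for the thrift metastore call.
  private Map<String, TableType> fetchTablesAndViews(String schemaName) {
    return ImmutableMap.of("customers", TableType.TABLE, "cview", TableType.VIEW);
  }

  public Map<String, TableType> tablesFor(String schema) throws Exception {
    return tableNamesCache.get(schema);
  }
}
{code}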
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline(Drill), "show tables" on a Hive schema is taking nearly 15mins to 
> 20mins. The schema has nearly ~8000 tables.
> Whereas the same in beeline (Hive) returns the result in a split second (~ 
> 0.2 secs).
> I tested the same in my test cluster by creating 6000 tables (empty!) in Hive 
> and then doing "show tables" in Drill. It took more than 2 mins (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7150:
---
Reviewer: Aman Sinha

> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB shell:
> {noformat}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value 
> '{"_id":"eot","str":"9999-01-01T23:59:59.999","ts":{"$date":"9999-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
> {noformat}
> 2. Create an external hive table:
> {code:sql}
> CREATE EXTERNAL TABLE default.timeTest
> (`_id` string,
> `str` string,
> `ts` timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
> {code}
> 3. Enable native reader and timezone conversion for MaprDB timestamp:
> {code:sql}
> alter session set 
> `store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
> alter session set 
> `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
> {code}
> 4. Run the query on the table from Drill using hive plugin:
> {code:java}
> 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
> +------+--------------------------+--------------------------+
> | _id  |           str            |            ts            |
> +------+--------------------------+--------------------------+
> | eot  | 9999-01-01T23:59:59.999  | 9999-01-02 00:59:59.999  |
> | pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
> | pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
> | unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
> +------+--------------------------+--------------------------+
> 4 rows selected (0.343 seconds)
> {code}
> Please note that timestamps for {{eot}} and {{pst}} values are incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807910#comment-16807910
 ] 

ASF GitHub Bot commented on DRILL-7150:
---

vvysotskyi commented on issue #1729: DRILL-7150: Fix timezone conversion for 
timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729#issuecomment-479083498
 
 
   @amansinha100, since you have reviewed the original PR, could you please 
review this one?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB shell:
> {noformat}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value 
> '{"_id":"eot","str":"9999-01-01T23:59:59.999","ts":{"$date":"9999-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
> {noformat}
> 2. Create an external hive table:
> {code:sql}
> CREATE EXTERNAL TABLE default.timeTest
> (`_id` string,
> `str` string,
> `ts` timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
> {code}
> 3. Enable native reader and timezone conversion for MaprDB timestamp:
> {code:sql}
> alter session set 
> `store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
> alter session set 
> `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
> {code}
> 4. Run the query on the table from Drill using hive plugin:
> {code:java}
> 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
> +------+--------------------------+--------------------------+
> | _id  |           str            |            ts            |
> +------+--------------------------+--------------------------+
> | eot  | 9999-01-01T23:59:59.999  | 9999-01-02 00:59:59.999  |
> | pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
> | pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
> | unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
> +------+--------------------------+--------------------------+
> 4 rows selected (0.343 seconds)
> {code}
> Please note that timestamps for {{eot}} and {{pst}} values are incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807909#comment-16807909
 ] 

ASF GitHub Bot commented on DRILL-7150:
---

vvysotskyi commented on pull request #1729: DRILL-7150: Fix timezone conversion 
for timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729
 
 
   Used JDK classes to convert the timestamp from one timezone to another, 
instead of adding the milliseconds which correspond to the offset.
   
   For the problem description please see 
[DRILL-7151](https://issues.apache.org/jira/browse/DRILL-7151).
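
The difference the fix makes, sketched with JDK time classes: a fixed offset in milliseconds bakes in whichever of PDT/PST was active when it was computed, while zone-rule conversion picks the right offset for each instant (standalone example, not the PR's code):

{code:java}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TzConversionDemo {
  public static void main(String[] args) {
    ZoneId la = ZoneId.of("America/Los_Angeles");
    Instant winterUtc = Instant.parse("2019-01-02T07:59:59.999Z"); // PST in effect

    // Buggy approach: reuse an offset captured while PDT (-07:00) was active.
    int staleOffsetMillis = -7 * 3600 * 1000;
    Instant wrong = winterUtc.plusMillis(staleOffsetMillis);
    System.out.println(LocalDateTime.ofInstant(wrong, ZoneOffset.UTC)); // 2019-01-02T00:59:59.999, off by an hour

    // Correct approach: let the zone rules pick PST vs PDT for that instant.
    System.out.println(LocalDateTime.ofInstant(winterUtc, la)); // 2019-01-01T23:59:59.999
  }
}
{code}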
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB shell:
> {noformat}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value 
> '{"_id":"eot","str":"9999-01-01T23:59:59.999","ts":{"$date":"9999-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
> insert /tmp/testtimestamp --value 
> '{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
> {noformat}
> 2. Create an external hive table:
> {code:sql}
> CREATE EXTERNAL TABLE default.timeTest
> (`_id` string,
> `str` string,
> `ts` timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
> {code}
> 3. Enable native reader and timezone conversion for MaprDB timestamp:
> {code:sql}
> alter session set 
> `store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
> alter session set 
> `store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
> {code}
> 4. Run the query on the table from Drill using hive plugin:
> {code:java}
> 0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
> +------+--------------------------+--------------------------+
> | _id  |           str            |            ts            |
> +------+--------------------------+--------------------------+
> | eot  | 9999-01-01T23:59:59.999  | 9999-01-02 00:59:59.999  |
> | pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
> | pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
> | unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
> +------+--------------------------+--------------------------+
> 4 rows selected (0.343 seconds)
> {code}
> Please note that timestamps for {{eot}} and {{pst}} values are incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7147) Source order of "drill-env.sh" and "distrib-env.sh" should be swapped

2019-04-02 Thread Abhishek Girish (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807905#comment-16807905
 ] 

Abhishek Girish commented on DRILL-7147:


[~Paul.Rogers], you are right. With the "simple" way, there is no issue with 
{{drill-env.sh}} and {{distrib-env.sh}}. Like you said, setting variables the simple way 
in {{drill-env.sh}} could cause issues if the corresponding ENV variables are set, 
and I think that's something we could document instead of finding a fix. 

> Source order of "drill-env.sh" and "distrib-env.sh" should be swapped
> -
>
> Key: DRILL-7147
> URL: https://issues.apache.org/jira/browse/DRILL-7147
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.15.0
>Reporter: Hao Zhu
>Assignee: Abhishek Girish
>Priority: Minor
> Fix For: 1.16.0
>
>
> In bin/drill-config.sh, the description of the source order is:
> {code:java}
> # Variables may be set in one of four places:
> #
> #   Environment (per run)
> #   drill-env.sh (per site)
> #   distrib-env.sh (per distribution)
> #   drill-config.sh (this file, Drill defaults)
> #
> # Properties "inherit" from items lower on the list, and may be "overridden" 
> by items
> # higher on the list. In the environment, just set the variable:
> {code}
> However, bin/drill-config.sh actually sources drill-env.sh first, and then 
> distrib-env.sh.
> {code:java}
> drillEnv="$DRILL_CONF_DIR/drill-env.sh"
> if [ -r "$drillEnv" ]; then
>   . "$drillEnv"
> fi
> ...
> distribEnv="$DRILL_CONF_DIR/distrib-env.sh"
> if [ -r "$distribEnv" ]; then
>   . "$distribEnv"
> else
>   distribEnv="$DRILL_HOME/conf/distrib-env.sh"
>   if [ -r "$distribEnv" ]; then
> . "$distribEnv"
>   fi
> fi
> {code}
> We need to swap the source order of drill-env.sh and distrib-env.sh.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.2 libraries

2019-04-02 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6540:
---
Description: 
Currently Drill uses version 2.7.4 of the Hadoop libraries (hadoop-common, 
hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
hadoop-yarn-client).
A year ago [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was 
released, and recently it was updated to [Hadoop 
3.2.0|https://hadoop.apache.org/docs/r3.2.0/].

To use Drill with a Hadoop 3.0 distribution we need this upgrade. The newer 
version also includes new features which can be useful for Drill.
This upgrade is also needed to leverage the newest versions of the Zookeeper 
libraries and Hive 3.1.

  was:
Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, 
hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
hadoop-yarn-client).
 Half of year ago the [Hadoop 
3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released and recently 
it was an update - [Hadoop 3.2.0|https://hadoop.apache.org/docs/r3.2.0/].

To use Drill under Hadoop3.0 distribution we need this upgrade. Also the newer 
version includes new features, which can be useful for Drill.
 This upgrade is also needed to leverage the newest version of Zookeeper 
libraries and Hive 3.1 version.


> Upgrade to HADOOP-3.2 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses version 2.7.4 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> A year ago [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was 
> released, and recently it was updated to [Hadoop 
> 3.2.0|https://hadoop.apache.org/docs/r3.2.0/].
> To use Drill with a Hadoop 3.0 distribution we need this upgrade. The newer 
> version also includes new features which can be useful for Drill.
>  This upgrade is also needed to leverage the newest versions of the Zookeeper 
> libraries and Hive 3.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.2 libraries

2019-04-02 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6540:
---
Summary: Upgrade to HADOOP-3.2 libraries   (was: Upgrade to HADOOP-3.1 
libraries )

> Upgrade to HADOOP-3.2 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses 2.7.4 version of hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
>  Half of year ago the [Hadoop 
> 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released and 
> recently it was an update - [Hadoop 
> 3.2.0|https://hadoop.apache.org/docs/r3.2.0/].
> To use Drill under Hadoop3.0 distribution we need this upgrade. Also the 
> newer version includes new features, which can be useful for Drill.
>  This upgrade is also needed to leverage the newest version of Zookeeper 
> libraries and Hive 3.1 version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7151) Show only accessible tables when Hive authorization enabled

2019-04-02 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7151:
---

 Summary: Show only accessible tables when Hive authorization 
enabled
 Key: DRILL-7151
 URL: https://issues.apache.org/jira/browse/DRILL-7151
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Igor Guzenko
Assignee: Igor Guzenko


SHOW TABLES for Hive has worked inconsistently for a very long time.

Before the changes introduced by DRILL-7115, only accessible tables were shown 
when Hive Storage Based Authorization was enabled, but with SQL Standard Based 
Authorization all tables were shown to the user ([related 
discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). 

In the scope of DRILL-7115, the accessible-only restriction for Storage Based 
Authorization was weakened in order to improve query performance.

There is still a need to improve the security of the Hive SHOW TABLES query 
while not violating the performance requirements. 

For SQL Standard Based Authorization this can be done by asking 
```HiveAuthorizationHelper.authorizerV2``` for the table's 'SELECT' permission; 
see the sketch below.

For Storage Based Authorization a performance-acceptable approach is not known 
for now; one idea is to try using an appropriate Hive storage-based authorizer 
class for the purpose. 
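
A minimal sketch of the SQL Standard Based Authorization idea, using Hive's 
pluggable authorization API; the helper class and the null {{HiveAuthzContext}} 
argument are assumptions for illustration, not the eventual Drill implementation:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzPluginException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType;

/** Hypothetical helper: keep only the tables the current user may SELECT from. */
final class AccessibleTablesFilter {

  static List<String> filterSelectable(HiveAuthorizer authorizer, String dbName,
      List<String> tableNames) throws HiveAuthzPluginException {
    List<String> accessible = new ArrayList<>();
    for (String table : tableNames) {
      List<HivePrivilegeObject> inputs = Collections.singletonList(
          new HivePrivilegeObject(HivePrivilegeObjectType.TABLE_OR_VIEW, dbName, table));
      try {
        // checkPrivileges throws HiveAccessControlException when SELECT is not granted.
        authorizer.checkPrivileges(HiveOperationType.QUERY, inputs,
            Collections.emptyList(), null);
        accessible.add(table);
      } catch (HiveAccessControlException e) {
        // Not authorized: drop the table from the SHOW TABLES output.
      }
    }
    return accessible;
  }
}
{code}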



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7150:
---
Description: 
Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create an external hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
{code}
3. Enable native reader and timezone conversion for MaprDB timestamp:
{code:sql}
alter session set 
`store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
alter session set 
`store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code:java}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}
Please note that timestamps for {{eot}} and {{pst}} values are incorrect.

  was:
Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create an external hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
{code}
3. Enable native reader and timezone conversion for MaprDB timestamp:
{code:sql}
alter session set 
`store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
alter session set 
`store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code:java}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}
Please note that timestamps for {{eot}} and {{pst}} values are wrong.


> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. 

[jira] [Updated] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7150:
---
Description: 
Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create an external hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
{code}
3. Enable native reader and timezone conversion for MaprDB timestamp:
{code:sql}
alter session set 
`store.hive.maprdb_json.optimize_scan_with_native_reader`=true;
alter session set 
`store.hive.maprdb_json.read_timestamp_with_timezone_offset`=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code:java}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}
Please note that timestamps for {{eot}} and {{pst}} values are wrong.

  was:
Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create a hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
{code}
3. Enable native reader and timezone conversion for maprdb timestamp:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true;
alter session store.hive.maprdb_json.read_timestamp_with_timezone_offset=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}

Please note that the results for {{eot}} and {{pst}} values are wrong.



> Fix timezone conversion for timestamp from maprdb after the transition from 
> PDT to PST
> --
>
> Key: DRILL-7150
> URL: https://issues.apache.org/jira/browse/DRILL-7150
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MapRDB
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
> 1. Create the table in MaprDB 

[jira] [Created] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7150:
--

 Summary: Fix timezone conversion for timestamp from maprdb after 
the transition from PDT to PST
 Key: DRILL-7150
 URL: https://issues.apache.org/jira/browse/DRILL-7150
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MapRDB
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
 Fix For: 1.16.0


Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create a hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/timeTest')
{code}
3. Enable native reader and timezone conversion for maprdb timestamp:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true;
alter session store.hive.maprdb_json.read_timestamp_with_timezone_offset=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}

Please note that the results for {{eot}} and {{pst}} values are wrong.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807815#comment-16807815
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271339849
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -920,46 +920,11 @@ public void dropTable(String table) {
 }
 
 @Override
-    public List<Pair<String, TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
-      final List<Pair<String, TableType>> tableNamesAndTypes = Lists.newArrayList();
-
-      // Look for raw tables first
-      if (!tables.isEmpty()) {
-        for (Map.Entry<TableInstance, DrillTable> tableEntry : tables.entrySet()) {
-          tableNamesAndTypes
-              .add(Pair.of(tableEntry.getKey().sig.name, tableEntry.getValue().getJdbcTableType()));
-        }
-      }
-      // Then look for files that start with this name and end in .drill.
-      List<DotDrillFile> files = Collections.emptyList();
-      try {
-        files = DotDrillUtil.getDotDrills(getFS(), new Path(config.getLocation()), DotDrillType.VIEW);
-      } catch (AccessControlException e) {
-        if (!schemaConfig.getIgnoreAuthErrors()) {
-          logger.debug(e.getMessage());
-          throw UserException.permissionError(e)
-              .message("Not authorized to list or query tables in schema [%s]", getFullSchemaName())
-              .build(logger);
-        }
-      } catch (IOException e) {
-        logger.warn("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e);
-      } catch (UnsupportedOperationException e) {
-        // the file system (e.g. the classpath filesystem) may not support listing
-        // of files. But see getViews(), it ignores the exception and continues
-        logger.debug("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e);
-      }
-
-      try {
-        for (DotDrillFile f : files) {
-          if (f.getType() == DotDrillType.VIEW) {
-            tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW));
-          }
-        }
-      } catch (UnsupportedOperationException e) {
-        logger.debug("The filesystem for this workspace does not support this operation.", e);
 
 Review comment:
   This deleted code mostly duplicated the body of the existing ```getViews()``` 
method. This logging statement is also present in that method. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has ~8000 tables.
> The same query in Beeline (Hive) returns the result in a split second (~0.2 
> secs).
> I tested the same in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7149) Kerberos Code Missing from Drill on YARN

2019-04-02 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7149:


 Summary: Kerberos Code Missing from Drill on YARN
 Key: DRILL-7149
 URL: https://issues.apache.org/jira/browse/DRILL-7149
 Project: Apache Drill
  Issue Type: Bug
  Components: Security
Affects Versions: 1.14.0
Reporter: Charles Givre


My company is trying to deploy Drill using Drill on YARN (DoY), and we have run 
into the issue that DoY does not seem to support passing the Kerberos 
credentials needed to interact with HDFS. 

Upon checking the source code available in Git 
(https://github.com/apache/drill/blob/1.14.0/drill-yarn/src/main/java/org/apache/drill/yarn/core/)
 and referring to the Apache YARN documentation 
(https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html)
 , we saw no code for passing the security credentials the application needs in 
order to interact with Hadoop cluster services and applications. 

We feel this needs to be added to the source code so that delegation tokens can 
be passed inside the container, allowing the process to access the Drill archive 
on HDFS and start. It probably should be added to the ContainerLaunchContext 
within the ApplicationSubmissionContext for DoY, as suggested in the Apache 
documentation; a sketch follows.
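
For reference, a minimal sketch (not DoY code) of how HDFS delegation tokens are 
typically attached to the AM container in a YARN client; the renewer principal 
and the surrounding wiring are assumptions:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

final class TokenSketch {

  static void attachHdfsTokens(Configuration conf,
      ApplicationSubmissionContext appContext) throws IOException {
    // Collect HDFS delegation tokens for the Kerberos-authenticated submitter.
    Credentials credentials = new Credentials();
    FileSystem fs = FileSystem.get(conf);
    fs.addDelegationTokens("yarn", credentials); // renewer principal is an assumption

    // Serialize the tokens and place them into the container launch context,
    // so the launched process can authenticate to HDFS when it starts.
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setTokens(tokens);
    appContext.setAMContainerSpec(amContainer);
  }
}
{code}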
 
We tried the same DoY utility on a non-kerberized cluster and the process 
started well, although we ran into a different issue there of hosts getting 
blacklisted. We tested with the Single Principal per cluster option.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807808#comment-16807808
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271335079
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronizes on the client and tries to reconnect when
+ * the client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+    Table table;
+    List<Partition> partitions;
+    synchronized (client) {
+      table = getTable(key);
+      partitions = getPartitions(key);
+    }
+    HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, new ColumnListsCache(table));
+    List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = partitions.isEmpty()
+        ? null
 
 Review comment:
   I've considered the possibility of using empty lists and can conclude that 
doing this would break backward compatibility: ```HiveReadEntry``` is part of 
the JSON-serializable ```HiveScan``` operator, and deserializing empty lists on 
an older drillbit which expects a null list may break null-dependent checks. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has ~8000 tables.
> The same query in Beeline (Hive) returns the result in a split second (~0.2 
> secs).
> I tested the same in my test cluster by creating 6000 (empty!) tables in Hive 
> and then running "show tables" in Drill. It took more than 2 minutes (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807779#comment-16807779
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-479008647
 
 
   @amansinha100, could you please take a look?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced. 
> These classes may be reused in other GroupScan instances to reduce heap usage 
> when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of a single query it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807778#comment-16807778
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-479008544
 
 
   Diagrams of the classes introduced in this PR: 
https://docs.google.com/presentation/d/1XG_xgR4okzXaJ3Z7HFHfzCwlM5VkNfre8GFEAd2Zo8k/edit?usp=sharing
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced. 
> These classes may be reused in other GroupScan instances to reduce heap usage 
> when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of a single query it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807710#comment-16807710
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728
 
 
   In the scope of this PR, caching of table metadata (schema and statistics) 
at the query level is introduced.
   Introduced `MetadataProviderManager`, which holds `SchemaProvider` and 
`DrillStatsTable`, as well as `TableMetadataProvider` once it has been created.
   A `MetadataProviderManager` instance will be cached and used for every 
`DrillTable` which corresponds to the same table.
   Such an approach preserves lazy initialization of group scan and 
`TableMetadataProvider` instances: once the first `TableMetadataProvider` 
instance is created, it is stored in the `MetadataProviderManager` and its 
metadata is reused by all further `TableMetadataProvider` instances.
   
   Another part of this PR adapts statistics to use the Drill metastore API: 
the logic for distinguishing exact and estimated metadata was enhanced, and 
`TableMetadata` is now used for obtaining statistics.
   
   Will create and attach a class diagram later.
   
   Also, tests should be run for this PR, so for now I'll leave it in draft 
state.
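
   As a rough sketch of the caching idea above (the method shown is 
illustrative, not the PR's actual API):

{code:java}
import java.util.function.Supplier;

interface TableMetadataProvider {} // stand-in for Drill's provider interface

// Illustrative only: one manager instance is cached per table for the duration
// of a query, so the provider (and its metadata) is built once by the first
// group scan and reused by every later group scan over the same table.
class MetadataProviderManager {
  private TableMetadataProvider tableMetadataProvider;

  TableMetadataProvider getOrCreateProvider(Supplier<TableMetadataProvider> factory) {
    if (tableMetadataProvider == null) {
      tableMetadataProvider = factory.get(); // lazy initialization
    }
    return tableMetadataProvider;
  }
}
{code}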
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced. 
> These classes may be reused in other GroupScan instances to reduce heap usage 
> when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of a single query it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807700#comment-16807700
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271269107
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java
 ##
 @@ -33,7 +32,10 @@
   public static void printOffsets(UInt4Vector vector, int start, int length) {
 header(vector, start, length);
 for (int i = start, j = 0; j < length; i++, j++) {
-  if (j > 0) {
+  if (j % 40 == 0) {
 
 Review comment:
   How will this look after the change?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807703#comment-16807703
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271270612
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/OffsetVectorWriterImpl.java
 ##
 @@ -302,4 +312,9 @@ public void dump(HierarchicalFormatter format) {
   .attribute("nextOffset", nextOffset)
   .endObject();
   }
+
+  @Override
+  public void setDefaultValue(Object value) {
+throw new UnsupportedOperationException("Encoding not supported for offset 
vectors");
 
 Review comment:
   Same here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807697#comment-16807697
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271268522
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/ScalarReader.java
 ##
 @@ -86,4 +87,10 @@
   LocalDate getDate();
   LocalTime getTime();
   Instant getTimestamp();
+
+  /**
+   * Return the value of the object using the extended type.
+   * @return
 
 Review comment:
   Please add the description to the `@return` tag to avoid warnings in the IDE 
(just move the upper line to `@return`).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807699#comment-16807699
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271265638
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/types/Types.java
 ##
 @@ -463,23 +462,29 @@ public static boolean usesHolderForGet(final MajorType 
type) {
 default:
   return true;
 }
-
   }
 
   public static boolean isFixedWidthType(final MajorType type) {
-switch(type.getMinorType()) {
+return isFixedWidthType(type.getMinorType());
+  }
+
+  public static boolean isFixedWidthType(final MinorType type) {
+return ! isVarWidthType(type);
+  }
+
+  public static boolean isVarWidthType(final MinorType type) {
+switch(type) {
 
 Review comment:
   ```suggestion
   switch (type) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807698#comment-16807698
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271270255
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/NullableScalarWriter.java
 ##
 @@ -278,4 +278,9 @@ public void dump(HierarchicalFormatter format) {
 baseWriter.dump(format);
 format.endObject();
   }
+
+  @Override
+  public void setDefaultValue(Object value) {
+throw new UnsupportedOperationException("Default values not supported for 
nullable types");
 
 Review comment:
   Maybe include `value` into the error message?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807701#comment-16807701
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271269796
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java
 ##
 @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) {
 @Override
 protected final void fillEmpties(final int writeIndex) {
   final int width = width();
-  final int stride = ZERO_BUF.length / width;
+  final int stride = emptyValue.length / width;
   int dest = lastWriteIndex + 1;
   while (dest < writeIndex) {
     int length = writeIndex - dest;
     length = Math.min(length, stride);
-    drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width);
+    drillBuf.setBytes(dest * width, emptyValue, 0, length * width);
     dest += length;
   }
 }
 }
 
+  /**
+   * Base class for writers that use the Java int type as their native
+   * type. Handles common implicit conversions from other types to int.
+   */
+  public static abstract class BaseIntWriter extends BaseFixedWidthWriter {
+
+    @Override
+    public final void setLong(final long value) {
+      try {
+        // Catches int overflow. Does not catch overflow for smaller types.
+        setInt(Math.toIntExact(value));
+      } catch (final ArithmeticException e) {
+        throw InvalidConversionError.writeError(schema(), value, e);
+      }
+    }
+
+    @Override
+    public final void setDouble(final double value) {
 
 Review comment:
   Double covers Float as well?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807702#comment-16807702
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

arina-ielchiieva commented on pull request #1726: DRILL-7143: Support default 
value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271268972
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/convert/AbstractWriteConverter.java
 ##
 @@ -68,6 +68,11 @@ public ColumnMetadata schema() {
 return baseWriter.schema();
   }
 
+  @Override
+  public void setDefaultValue(Object value) {
+throw new IllegalStateException("Cannot set a default value through a 
shim; types conflict.");
 
 Review comment:
   Should we include `value` in the error message?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed 
> through additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7145:

Comment: was deleted

(was: aielchiieva commented on pull request #1727: DRILL-7145: Exceptions 
happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271257957
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to original 
approach and store it but just add method `getException()` 
in`AbstractDisposableUserClientConnection` class, similar to `getError()`. And 
then do proper handling of both.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
)

> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> exec.storage.enable_v3_text_reader should be false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7145:

Comment: was deleted

(was: aielchiieva commented on pull request #1727: DRILL-7145: Exceptions 
happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271257957
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` 
in the `AbstractDisposableUserClientConnection` class, similar to `getError()`. 
And then do proper handling of both.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
)

> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> exec.storage.enable_v3_text_reader should be false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7145:

Reviewer: Arina Ielchiieva

> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> exec.storage.enable_v3_text_reader should be false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807680#comment-16807680
 ] 

ASF GitHub Bot commented on DRILL-7145:
---

arina-ielchiieva commented on pull request #1727: DRILL-7145: Exceptions 
happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271258333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in the 
`AbstractDisposableUserClientConnection` class, similar to `getError()`. And 
then do proper handling of both.
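
A minimal sketch of the store-and-expose pattern being suggested, assuming a 
stored `exception` field; the class shape and names below are illustrative, 
not the actual Drill code:

```java
import org.apache.drill.common.exceptions.UserException;

// Sketch only: store the failure (the original approach) and expose it via
// an accessor that mirrors the existing getError(). Names are assumptions.
abstract class AbstractDisposableUserClientConnectionSketch {

  // Populated in the catch block of sendData(...), e.g.:
  //   exception = UserException.systemError(e).build(logger);
  protected volatile UserException exception;

  /** Accessor the Web UI layer can check, similar to getError(). */
  public UserException getException() {
    return exception;
  }
}
```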
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> The option exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807681#comment-16807681
 ] 

ASF GitHub Bot commented on DRILL-7145:
---

arina-ielchiieva commented on pull request #1727: DRILL-7145: Exceptions 
happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271258333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener<Ack> listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in 
`AbstractDisposableUserClientConnection`, similar to `getError()`. And 
then do proper handling of both.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> The option exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807678#comment-16807678
 ] 

ASF GitHub Bot commented on DRILL-7145:
---

aielchiieva commented on pull request #1727: DRILL-7145: Exceptions happened 
during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271257957
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener<Ack> listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in 
the `AbstractDisposableUserClientConnection` class, similar to `getError()`. 
And then do proper handling of both.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> The option exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807676#comment-16807676
 ] 

ASF GitHub Bot commented on DRILL-7145:
---

aielchiieva commented on pull request #1727: DRILL-7145: Exceptions happened 
during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271257957
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener<Ack> listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in 
the `AbstractDisposableUserClientConnection` class, similar to `getError()`. 
And then do proper handling of both.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> The option exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807670#comment-16807670
 ] 

ASF GitHub Bot commented on DRILL-7145:
---

agozhiy commented on pull request #1727: DRILL-7145: Exceptions happened during 
retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727
 
 
   …ctor are not being displayed at the Drill Web UI
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> The option exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill

2019-04-02 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807667#comment-16807667
 ] 

Arina Ielchiieva commented on DRILL-7087:
-

[~weijie] please start a conversation about this on the mailing list; let's 
see what the community thinks about having an Arrow fork. Personally, I am 
against having an Arrow fork.

> Integrate Arrow's Gandiva into Drill
> 
>
> Key: DRILL-7087
> URL: https://issues.apache.org/jira/browse/DRILL-7087
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen, Execution - Relational Operators
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This is preliminary work to integrate Arrow into Drill by invoking its 
> Gandiva feature. Comparing Arrow's and Drill's in-memory column 
> representations, the internal null representation currently differs: Drill 
> uses 1 byte while Arrow uses 1 bit to indicate a null row. Also, all Arrow 
> columns are now nullable. Apart from those basic differences, they share the 
> same memory representation for the different data types. 
> The integration strategy is to invoke Arrow's JniWrapper native methods 
> directly, passing the ValueVector's memory address. 
> I have done an implementation in our own Drill version by integrating 
> Gandiva into Drill's Project operator. The results show nearly a 100% 
> performance gain in expression computation.
> So if there are no objections, I will submit a related PR to contribute this 
> feature. This issue also waits on Arrow's related issue [ARROW-4819].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill

2019-04-02 Thread weijie.tong (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807656#comment-16807656
 ] 

weijie.tong commented on DRILL-7087:


What's your opinion about a self-managed branch of Arrow, since the Arrow 
maintainers don't seem to agree with ARROW-4819?

> Integrate Arrow's Gandiva into Drill
> 
>
> Key: DRILL-7087
> URL: https://issues.apache.org/jira/browse/DRILL-7087
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen, Execution - Relational Operators
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This is preliminary work to integrate Arrow into Drill by invoking its 
> Gandiva feature. Comparing Arrow's and Drill's in-memory column 
> representations, the internal null representation currently differs: Drill 
> uses 1 byte while Arrow uses 1 bit to indicate a null row. Also, all Arrow 
> columns are now nullable. Apart from those basic differences, they share the 
> same memory representation for the different data types. 
> The integration strategy is to invoke Arrow's JniWrapper native methods 
> directly, passing the ValueVector's memory address. 
> I have done an implementation in our own Drill version by integrating 
> Gandiva into Drill's Project operator. The results show nearly a 100% 
> performance gain in expression computation.
> So if there are no objections, I will submit a related PR to contribute this 
> feature. This issue also waits on Arrow's related issue [ARROW-4819].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807612#comment-16807612
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271222800
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function
 
 for(ExprNode arg : exprNode.args) {
   Result exprResult = evaluateHelper(recordValues, arg);
-  if (exprResult == Result.FALSE) {
-return exprResult;
-  }
-  if (exprResult == Result.INCONCLUSIVE) {
-result = Result.INCONCLUSIVE;
+  switch (exprResult) {
 
 Review comment:
   The suggested change will break the logic: this is a loop, and when an 
invocation of ```evaluateHelper(recordValues, arg)``` returns 
```Result.INCONCLUSIVE``` once, there is still a chance that a later iteration 
will return ```Result.FALSE```. Previously the chunk here was:
   ```java
   for(ExprNode arg : exprNode.args) {
 Result exprResult = evaluateHelper(recordValues, arg);
 if (exprResult == Result.FALSE) {
   return exprResult;
 }
 if (exprResult == Result.INCONCLUSIVE) {
   result = Result.INCONCLUSIVE;
 }
   }
   ```
   I see that my change made it more confusing, so I'll revert it. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807592#comment-16807592
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271215506
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,89 +58,38 @@ public Table getTable(String tableName) {
 return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public Collection<Map.Entry<String, TableType>> getTableNamesAndTypes() {
+ensureInitTables();
+return tables.entrySet();
+  }
+
   @Override
   public Set<String> getTableNames() {
+ensureInitTables();
+return tables.keySet();
+  }
+
+  private void ensureInitTables() {
 if (tables == null) {
   try {
-tables = Sets.newHashSet(mClient.getTableNames(this.name, 
schemaConfig.getIgnoreAuthErrors()));
-  } catch (final TException e) {
-logger.warn("Failure while attempting to access HiveDatabase '{}'.", 
this.name, e.getCause());
-tables = Sets.newHashSet(); // empty set.
+tables = mClient.getTableNamesAndTypes(this.name, 
schemaConfig.getIgnoreAuthErrors());
+  } catch (TException e) {
+logger.warn(String.format(
 
 Review comment:
   This is an invocation of ```warn(String msg, Throwable t)```, which means 
the stack trace won't be missing from the logs. Using a string with ```{}``` 
placeholders and ```warn(String format, Object... arguments)``` would most 
probably just call ```toString()``` on the exception object, so the stack 
trace details wouldn't be shown. 
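
To illustrate the distinction (a sketch with an arbitrary logger, not the PR 
code): passing the exception as the `Throwable` argument preserves the stack 
trace, while letting it fill a `{}` placeholder logs only its `toString()`:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class Slf4jThrowableSketch {
  private static final Logger logger = LoggerFactory.getLogger(Slf4jThrowableSketch.class);

  void demo(String dbName, Exception e) {
    // warn(String msg, Throwable t): the full stack trace appears in the log.
    logger.warn(String.format("Failure while accessing HiveDatabase '%s'.", dbName), e);

    // The exception fills the second placeholder, so only e.toString()
    // is logged and the stack trace is lost.
    logger.warn("Failure while accessing HiveDatabase '{}': {}", dbName, e);

    // SLF4J also keeps the stack trace when the throwable is a trailing
    // argument beyond the placeholders.
    logger.warn("Failure while accessing HiveDatabase '{}'.", dbName, e);
  }
}
```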
   
 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807572#comment-16807572
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

ihuzenko commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271203347
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableEntryCacheLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+Table table;
+List<Partition> partitions;
+synchronized (client) {
+  table = getTable(key);
+  partitions = getPartitions(key);
+}
+HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, 
new ColumnListsCache(table));
+List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = 
partitions.isEmpty()
+? null
 
 Review comment:
   Good catch. The logic was here previously, when the class was a static 
nested class, so I extracted it and preserved the existing logic. I'll try 
using an empty list instead, and maybe a redundant null check elsewhere can be 
removed too.  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807523#comment-16807523
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271177669
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Objects;
+
+/**
+ * Combination of dbName and tableName fields used
 
 Review comment:
   ```suggestion
* Combination of database and table names used
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807526#comment-16807526
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271164002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java
 ##
 @@ -266,8 +266,7 @@ private void scanSchema(String schemaPath, SchemaPlus 
schema) {
*/
   public void visitTables(String schemaPath, SchemaPlus schema) {
 final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
-final List<String> tableNames = Lists.newArrayList(schema.getTableNames());
-for(Pair<String, ? extends Table> tableNameToTable : 
drillSchema.getTablesByNames(tableNames)) {
+for(Pair<String, ? extends Table> tableNameToTable : 
drillSchema.getTablesByNames(schema.getTableNames())) {
 
 Review comment:
   ```suggestion
   for (Pair<String, ? extends Table> tableNameToTable : 
drillSchema.getTablesByNames(schema.getTableNames())) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807525#comment-16807525
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271179071
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,89 +58,38 @@ public Table getTable(String tableName) {
 return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public Collection<Map.Entry<String, TableType>> getTableNamesAndTypes() {
+ensureInitTables();
+return tables.entrySet();
+  }
+
   @Override
   public Set<String> getTableNames() {
+ensureInitTables();
+return tables.keySet();
+  }
+
+  private void ensureInitTables() {
 if (tables == null) {
   try {
-tables = Sets.newHashSet(mClient.getTableNames(this.name, 
schemaConfig.getIgnoreAuthErrors()));
-  } catch (final TException e) {
-logger.warn("Failure while attempting to access HiveDatabase '{}'.", 
this.name, e.getCause());
-tables = Sets.newHashSet(); // empty set.
+tables = mClient.getTableNamesAndTypes(this.name, 
schemaConfig.getIgnoreAuthErrors());
+  } catch (TException e) {
+logger.warn(String.format(
 
 Review comment:
   Why `String.format`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807521#comment-16807521
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271163913
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function
 
 for(ExprNode arg : exprNode.args) {
   Result exprResult = evaluateHelper(recordValues, arg);
-  if (exprResult == Result.FALSE) {
-return exprResult;
-  }
-  if (exprResult == Result.INCONCLUSIVE) {
-result = Result.INCONCLUSIVE;
+  switch (exprResult) {
 
 Review comment:
   Consider
   ```
   if (exprResult == Result.FALSE || exprResult == Result.INCONCLUSIVE) {
 return exprResult;
   }
   ```
   I find it simpler.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807516#comment-16807516
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271145021
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+List<String> tableAndViewNames;
+final Set<String> viewNames = new HashSet<>();
+synchronized (client) {
+  try {
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  } catch (MetaException e) {
+  /*
+ HiveMetaStoreClient is encapsulating both the 
MetaException/TExceptions inside MetaException.
+ Since we don't have good way to differentiate, we will close older 
connection and retry once.
+ This is only applicable for getAllTables and getAllDatabases method 
since other methods are
+ properly throwing correct exceptions.
+  */
+logger.warn("Failure while attempting to get hive tables. Retries 
once.", e);
+AutoCloseables.closeSilently(client::close);
+client.reconnect();
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  }
+}
+Function<String, TableType> valueMapper = viewNames.isEmpty()
+? tableName -> TableType.TABLE
 
 Review comment:
   Please replace the two-level ternary operator; we try to avoid nested 
ternaries in Drill for readability.
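
One possible flattening, as a sketch (reusing the names from the diff above, 
not the actual PR change): since `contains()` on an empty set is false for 
every name, a single lambda with one ternary covers both branches:

```java
import java.util.Set;
import java.util.function.Function;

import org.apache.calcite.schema.Schema.TableType;

final class TableTypeMapperSketch {
  // The outer viewNames.isEmpty() branch becomes unnecessary: an empty
  // set never contains a name, so every name maps to TableType.TABLE.
  static Function<String, TableType> valueMapper(Set<String> viewNames) {
    return name -> viewNames.contains(name) ? TableType.VIEW : TableType.TABLE;
  }
}
```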
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807524#comment-16807524
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271177349
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Objects;
+
+/**
+ * Combination of dbName and tableName fields used
+ * to represent key for getting table data from cache.
+ */
+final class TableName {
+
+  private final String dbName;
+
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807520#comment-16807520
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271160776
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -920,46 +920,11 @@ public void dropTable(String table) {
 }
 
 @Override
-public List<Pair<String, TableType>> getTableNamesAndTypes(boolean 
bulkLoad, int bulkSize) {
-  final List<Pair<String, TableType>> tableNamesAndTypes = 
Lists.newArrayList();
-
-  // Look for raw tables first
-  if (!tables.isEmpty()) {
-for (Map.Entry<TableInstance, DrillTable> tableEntry : 
tables.entrySet()) {
-  tableNamesAndTypes
-  .add(Pair.of(tableEntry.getKey().sig.name, 
tableEntry.getValue().getJdbcTableType()));
-}
-  }
-  // Then look for files that start with this name and end in .drill.
-  List<DotDrillFile> files = Collections.emptyList();
-  try {
-files = DotDrillUtil.getDotDrills(getFS(), new 
Path(config.getLocation()), DotDrillType.VIEW);
-  } catch (AccessControlException e) {
-if (!schemaConfig.getIgnoreAuthErrors()) {
-  logger.debug(e.getMessage());
-  throw UserException.permissionError(e)
-  .message("Not authorized to list or query tables in schema 
[%s]", getFullSchemaName())
-  .build(logger);
-}
-  } catch (IOException e) {
-logger.warn("Failure while trying to list view tables in workspace 
[{}]", getFullSchemaName(), e);
-  } catch (UnsupportedOperationException e) {
-// the file system (e.g. the classpath filesystem) may not support 
listing
-// of files. But see getViews(), it ignores the exception and continues
-logger.debug("Failure while trying to list view tables in workspace 
[{}]", getFullSchemaName(), e);
-  }
-
-  try {
-for (DotDrillFile f : files) {
-  if (f.getType() == DotDrillType.VIEW) {
-tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW));
-  }
-}
-  } catch (UnsupportedOperationException e) {
-logger.debug("The filesystem for this workspace does not support this 
operation.", e);
 
 Review comment:
   What about logging?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807522#comment-16807522
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271161918
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function
 
 for(ExprNode arg : exprNode.args) {
 
 Review comment:
   ```suggestion
for (ExprNode arg : exprNode.args) {
   ```
   please edit in 3 other cases in this class
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807518#comment-16807518
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271150560
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableEntryCacheLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+Table table;
+List<Partition> partitions;
+synchronized (client) {
+  table = getTable(key);
+  partitions = getPartitions(key);
+}
+HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, 
new ColumnListsCache(table));
+List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = 
partitions.isEmpty()
+? null
 
 Review comment:
   Why not an empty list instead of null when the partitions list is empty?
   Depending on the answer above, you could use `Optional` or 
`Stream.filter(Objects::nonNull)` for better stream chaining. You can ignore 
this if you added the `if` condition intentionally to avoid creating 
`Optional` or `Stream` objects.
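
A sketch of the empty-list alternative, with illustrative generic types 
rather than the actual Hive wrapper API: returning `Collections.emptyList()` 
lets callers chain streams without any null checks.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class EmptyListSketch {
  // Returns an empty list instead of null when there are no partitions,
  // so downstream code never needs a null check before streaming.
  static <T, R> List<R> wrapAll(List<T> partitions, Function<T, R> wrap) {
    if (partitions.isEmpty()) {
      return Collections.emptyList();
    }
    return partitions.stream().map(wrap).collect(Collectors.toList());
  }
}
```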
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807519#comment-16807519
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271160223
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -67,23 +66,24 @@
 import org.apache.drill.exec.store.AbstractSchema;
 import org.apache.drill.exec.store.PartitionNotFoundException;
 import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.util.DrillFileSystemUtil;
 import org.apache.drill.exec.store.StorageStrategy;
 import org.apache.drill.exec.store.easy.json.JSONFormatPlugin;
+import org.apache.drill.exec.util.DrillFileSystemUtil;
 import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsAction;
 import org.apache.hadoop.fs.permission.FsPermission;
 import org.apache.hadoop.security.AccessControlException;
 
-import com.fasterxml.jackson.databind.ObjectMapper;
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Strings;
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import static java.util.Collections.unmodifiableList;
 
 Review comment:
   Usually we don't touch import ordering, since different IDEs can change it 
across a lot of classes.
   But it is OK here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 
> minutes. The schema has nearly 8,000 tables.
> In Beeline (Hive), the same statement returns the result in a split second 
> (~0.2 secs).
> I tested the same in my test cluster by creating 6,000 (empty!) tables in 
> Hive and then running "show tables" in Drill. It took more than 2 minutes 
> (~140 secs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807517#comment-16807517
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271145356
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronizes on the client and tries to reconnect when
+ * the client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+    this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+    List<String> tableAndViewNames;
+    final Set<String> viewNames = new HashSet<>();
+    synchronized (client) {
+      try {
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      } catch (MetaException e) {
+        /*
+           HiveMetaStoreClient encapsulates both MetaException/TExceptions inside MetaException.
+           Since we don't have a good way to differentiate, we will close the older connection and retry once.
+           This is only applicable for the getAllTables and getAllDatabases methods, since other methods
+           properly throw correct exceptions.
+        */
+        logger.warn("Failure while attempting to get hive tables. Retries once.", e);
+        AutoCloseables.closeSilently(client::close);
+        client.reconnect();
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      }
+    }
+    Function<String, TableType> valueMapper = viewNames.isEmpty()
+        ? tableName -> TableType.TABLE
+        : tableOrViewName -> viewNames.contains(tableOrViewName) ? TableType.VIEW : TableType.TABLE;
+    return Collections.unmodifiableMap(tableAndViewNames.stream()
+        .collect(toMap(Function.identity(), valueMapper)));
 
 Review comment:
   Please follow the common convention of calling through the `Collectors` class
name (`Collectors.toMap(...)`) rather than static-importing the method.
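   As a sketch of the suggested style (class name hypothetical), the call site qualifies `toMap` with the `Collectors` class instead of using a static import:
```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorsStyleExample {
  public static void main(String[] args) {
    List<String> names = Arrays.asList("employees", "orders_view");
    // Qualified call, instead of `import static java.util.stream.Collectors.toMap;`.
    Map<String, Integer> lengths = names.stream()
        .collect(Collectors.toMap(Function.identity(), String::length));
    System.out.println(lengths);
  }
}
```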
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
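
For context, a minimal sketch of how a `CacheLoader` like the one above is typically wired into a Guava `LoadingCache` (assumptions: plain Guava rather than Drill's shaded copy, a stand-in loader instead of the real metastore call, and an illustrative TTL that is not Drill's actual setting):
```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class TableNameCacheSketch {
  public static void main(String[] args) throws ExecutionException {
    // Stand-in loader; the real TableNameLoader queries the Hive metastore.
    CacheLoader<String, Map<String, String>> loader =
        new CacheLoader<String, Map<String, String>>() {
          @Override
          public Map<String, String> load(String dbName) {
            return Collections.singletonMap(dbName + ".some_table", "TABLE");
          }
        };
    LoadingCache<String, Map<String, String>> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(5, TimeUnit.MINUTES) // illustrative TTL only
        .build(loader);
    // The first get() invokes load(); later calls within the TTL are served
    // from the cache, which is what makes repeated "show tables" fast.
    System.out.println(cache.get("default"));
  }
}
```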


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20
> minutes. The schema has ~8000 tables.
> Whereas the same in beeline (Hive) returns the result in a split second
> (~0.2 secs).

[jira] [Commented] (DRILL-7115) Improve Hive schema show tables performance

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807515#comment-16807515
 ] 

ASF GitHub Bot commented on DRILL-7115:
---

vdiravka commented on pull request #1706: DRILL-7115: Improve Hive schema show 
tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271144225
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronizes on the client and tries to reconnect when
+ * the client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+    this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+    List<String> tableAndViewNames;
+    final Set<String> viewNames = new HashSet<>();
+    synchronized (client) {
+      try {
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      } catch (MetaException e) {
+        /*
+           HiveMetaStoreClient encapsulates both MetaException/TExceptions inside MetaException.
+           Since we don't have a good way to differentiate, we will close the older connection and retry once.
+           This is only applicable for the getAllTables and getAllDatabases methods, since other methods
+           properly throw correct exceptions.
+        */
+        logger.warn("Failure while attempting to get hive tables. Retries once.", e);
+        AutoCloseables.closeSilently(client::close);
+        client.reconnect();
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      }
+    }
+    Function<String, TableType> valueMapper = viewNames.isEmpty()
+        ? tableName -> TableType.TABLE
+        : tableOrViewName -> viewNames.contains(tableOrViewName) ? TableType.VIEW : TableType.TABLE;
+    return Collections.unmodifiableMap(tableAndViewNames.stream()
+        .collect(toMap(Function.identity(), valueMapper)));
+  }
+
+}
 
 Review comment:
   ```suggestion
   }
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
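
The catch block in the quoted code follows a close-and-retry-once pattern; a generic sketch of that pattern (names hypothetical, not Drill's actual API):
```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

public class RetryOnceSketch {
  // Run the call; on failure, reconnect and try exactly once more.
  // A second failure propagates to the caller.
  static <T> T callWithOneRetry(Callable<T> call, Runnable reconnect) throws Exception {
    try {
      return call.call();
    } catch (Exception first) {
      reconnect.run(); // drop the stale connection and open a fresh one
      return call.call();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for client.getAllTables(dbName).
    List<String> tables = callWithOneRetry(
        () -> Arrays.asList("employees", "orders"),
        () -> System.out.println("reconnecting..."));
    System.out.println(tables);
  }
}
```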


> Improve Hive schema show tables performance
> ---
>
> Key: DRILL-7115
> URL: https://issues.apache.org/jira/browse/DRILL-7115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Information Schema
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20
> minutes. The schema has ~8000 tables.
> Whereas the same in beeline (Hive) returns the result in a split second
> (~0.2 secs).
> I tested the same in my test cluster by creating 6000 tables (empty!) in Hive
> and then doing "show tables" in Drill. It took more than 2 minutes (~140 secs).

[jira] [Commented] (DRILL-7072) Query with semi join fails for JDBC storage plugin

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807500#comment-16807500
 ] 

Volodymyr Vysotskyi commented on DRILL-7072:


No, it doesn't.

> Query with semi join fails for JDBC storage plugin
> --
>
> Key: DRILL-7072
> URL: https://issues.apache.org/jira/browse/DRILL-7072
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> When running a query with semi join for JDBC storage plugin, it fails with 
> class cast exception:
> {code:sql}
> select person_id from mysql.`drill_mysql_test`.person t1
> where exists (
> select person_id from mysql.`drill_mysql_test`.person
> where t1.person_id = person_id)
> {code}
> {noformat}
> SYSTEM ERROR: ClassCastException: 
> org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to 
> org.apache.drill.exec.planner.logical.DrillAggregateRel
> Please, refer to logs for more information.
> [Error Id: 85a27762-a4e5-4571-909f-0efa18ca0689 on user515050-pc:31013]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate 
> cannot be cast to org.apache.drill.exec.planner.logical.DrillAggregateRel
> Please, refer to logs for more information.
> [Error Id: 85a27762-a4e5-4571-909f-0efa18ca0689 on user515050-pc:31013]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:779)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:325)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:221)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
>  [classes/:na]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:299) 
> [classes/:na]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_191]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_191]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: 
> org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to 
> org.apache.drill.exec.planner.logical.DrillAggregateRel
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:300) 
> [classes/:na]
>   ... 3 common frames omitted
> Caused by: java.lang.ClassCastException: 
> org.apache.calcite.adapter.jdbc.JdbcRules$JdbcAggregate cannot be cast to 
> org.apache.drill.exec.planner.logical.DrillAggregateRel
>   at 
> org.apache.drill.exec.planner.logical.DrillSemiJoinRule.matches(DrillSemiJoinRule.java:171)
>  ~[classes/:na]
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:557) 
> ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420) 
> ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:257)
>  ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>  ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:216) 
> ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:203) 
> ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:431)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:382)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:365)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:289)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:331)
>  ~[classes/:na]
>   at 
> org.apa
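
The trace shows {{DrillSemiJoinRule.matches}} casting a {{JdbcRules$JdbcAggregate}} to {{DrillAggregateRel}} unconditionally. Purely as an illustration (stand-in types; not the actual Drill fix), an {{instanceof}} guard lets the rule skip non-Drill aggregates instead of throwing a ClassCastException:
{code:java}
public class MatchGuardSketch {
  // Stand-ins for the real planner node types.
  interface RelNode {}
  static class DrillAggregateRel implements RelNode {}
  static class JdbcAggregate implements RelNode {}

  // Guarded match: only report a match when the later cast would be safe.
  static boolean matches(RelNode aggregate) {
    return aggregate instanceof DrillAggregateRel;
  }

  public static void main(String[] args) {
    System.out.println(matches(new JdbcAggregate()));      // false: rule skips it
    System.out.println(matches(new DrillAggregateRel()));  // true: cast is safe
  }
}
{code}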

[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807485#comment-16807485
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

paul-rogers commented on issue #1726: DRILL-7143: Support default value for 
empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-478882176
 
 
   @arina-ielchiieva, here is a first cut at the improved default values. I have
tested selected mechanisms and CSV with schema, but have not yet run the full set
of unit tests. Consider this a "preview" to begin the code review in parallel
with the remaining busy-work needed to complete the PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807481#comment-16807481
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

paul-rogers commented on pull request #1726: DRILL-7143: Support default value 
for empty columns
URL: https://github.com/apache/drill/pull/1726
 
 
   Modifies the prior work to add default values for columns. The prior work
added defaults
   when the entire column is missing from a reader (the old Nullable Int
column). The Row
   Set mechanism will now also "fill empty" slots with the default value.
   
   Added default support for the column writers. The writers automatically
obtain the
   default value from the column schema. The default can also be set explicitly
on
   the column writer.
   
   Updated the null-column mechanism to use this feature rather than the ad-hoc
   implementation in the prior commit.
   
   Semantics changed a bit. Only required columns take a default. The default
value
   is ignored for nullable columns, since nullable columns already have a
natural default: NULL.
   
   Updated the CSV-with-schema tests to illustrate the new behavior.
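   A minimal sketch of the described semantics (hypothetical descriptor type; this is not the row-set writer API): nullable columns fall back to NULL and ignore any schema default, while required columns use the schema default when a slot is left empty:
```java
import java.util.Optional;

public class FillEmptySketch {
  // Hypothetical column descriptor, purely to illustrate the semantics above.
  static final class Column {
    final boolean nullable;
    final Optional<String> defaultValue;
    Column(boolean nullable, Optional<String> defaultValue) {
      this.nullable = nullable;
      this.defaultValue = defaultValue;
    }
  }

  // "Fill empty" semantics: nullable -> NULL; required -> schema default,
  // else the type's zero/empty value.
  static String fillEmpty(Column col) {
    if (col.nullable) {
      return null; // nullable columns already have a natural default: NULL
    }
    return col.defaultValue.orElse("");
  }

  public static void main(String[] args) {
    System.out.println(fillEmpty(new Column(true, Optional.of("ignored"))));  // null
    System.out.println(fillEmpty(new Column(false, Optional.of("N/A"))));     // N/A
  }
}
```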
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)