[jira] [Commented] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813054#comment-16813054
 ] 

ASF GitHub Bot commented on DRILL-7159:
---

sohami commented on pull request #1741: DRILL-7159: Fix typeString method to 
return correct name for MAP (aka STRUCT)
URL: https://github.com/apache/drill/pull/1741
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> After renaming MAP to STRUCT typeString method still outputs MAP name
> -
>
> Key: DRILL-7159
> URL: https://issues.apache.org/jira/browse/DRILL-7159
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> After renaming MAP to STRUCT, the typeString method still outputs the MAP name.
> Reproduce:
> {noformat}
> apache drill> CREATE or replace SCHEMA
> . .semicolon> (
> . . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
> . . . . . .)> struct_column STRUCT>
> . . . . . .)> )
> . .semicolon>  FOR TABLE dfs.tmp.`text_table`;
> {noformat}
> Error:
> {noformat}
> apache drill> describe schema for table dfs.tmp.`text_table`;
> Error: RESOURCE ERROR: Cannot construct instance of 
> `org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
> [1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
> alternative at input '`struct_column`MAP'
>  at [Source: 
> (org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
> column: 7] (through reference chain: 
> org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])
> Error while accessing table location for [dfs.tmp.text_table]
> {noformat}
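For readers following along, the behavior being fixed can be sketched in a few lines of plain Java (hypothetical names, not Drill's actual AbstractColumnMetadata code): the keyword emitted for a nested tuple type must be STRUCT, the name the schema parser accepts after the rename, rather than the legacy MAP.

```java
// Hypothetical sketch: emit the schema keyword for a column's type.
public class TypeStringSketch {
  enum Kind { VARCHAR, INT, TUPLE }

  static String typeString(Kind kind) {
    switch (kind) {
      case TUPLE:
        // Before the fix this returned "MAP", which the schema parser
        // no longer accepts after the MAP -> STRUCT rename.
        return "STRUCT";
      default:
        return kind.name();
    }
  }

  public static void main(String[] args) {
    System.out.println(typeString(Kind.TUPLE)); // prints STRUCT
  }
}
```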



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813055#comment-16813055
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

sohami commented on pull request #1723: DRILL-7063: Seperate metadata cache 
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723
 
 
   
 



> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>






[jira] [Commented] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813056#comment-16813056
 ] 

ASF GitHub Bot commented on DRILL-7154:
---

sohami commented on pull request #1737: DRILL-7154: TPCH query 4, 17 and 18 
take longer with sf 1000 when Statistics are disabled.
URL: https://github.com/apache/drill/pull/1737
 
 
   
 



> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer. The plan is the same, but the Hash Agg 
> operator in the new plan is taking longer. One possible reason is that the 
> Hash Agg operator in the new plan is not using as many buckets as the old 
> plan did; it is also using less memory than in the old plan.
> Here is the old plan:
> {noformat}
> 00-00  Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02  SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01  SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03  HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01  HashAgg(group=[{0}

[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813019#comment-16813019
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742
 
 
   The fix is to force the setting to zero only if autoLimit was intended to be 
set originally but is inapplicable, such as for 'SHOW DATABASES'. If autoLimit 
was not intended to be applied, setting the value to zero is not required.
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles tab even when it was not set explicitly. The issue arises because the 
> option is being set on the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.
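The suggested guard can be sketched as follows (hypothetical class and method names, not the actual DrillSqlWorker code): write the QUERY-level option only when the auto-limit value differs from the value already in effect, so an unchanged setting never appears in the profile.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggested guard: set the QUERY-level option
// only when the auto-limit actually changes the effective value, so the
// profile never reports an option that was not explicitly set.
public class AutoLimitSketch {
  final Map<String, Long> queryOptions = new HashMap<>();

  void checkAndApplyAutoLimit(long currentMaxRows, long autoLimit) {
    if (autoLimit != currentMaxRows) { // the suggested difference check
      queryOptions.put("exec.query.max_rows", autoLimit);
    }
  }

  public static void main(String[] args) {
    AutoLimitSketch s = new AutoLimitSketch();
    s.checkAndApplyAutoLimit(0L, 0L);   // no change -> option not recorded
    System.out.println(s.queryOptions); // prints {}
  }
}
```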





[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7160:

Reviewer: Volodymyr Vysotskyi

> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles tab even when it was not set explicitly. The issue arises because the 
> option is being set on the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813011#comment-16813011
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

dvjyothsna commented on issue #1736: DRILL-7064: Leverage the summary metadata 
for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#issuecomment-481106364
 
 
   Looks good to me.
 



> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By. These are currently converted to a 
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row group 
> metadata. However, this rule is applied on Drill logical rels and converts the 
> logical plan to a physical plan with DirectScanPrel, which is too late: the 
> DrillScanRel created during logical planning has already read the entire 
> metadata cache file along with its full list of row group entries. The 
> metadata cache file can grow quite large, so this does not scale.
> The solution is to use the Metadata Summary file created in DRILL-7063 and 
> add a new rule that applies early, operating on the Calcite logical rels 
> instead of the Drill logical rels, thereby preventing eager expansion of the 
> list of files/row groups.
> We will not remove the existing rule. It will continue to operate as before, 
> because it is possible that after some transformations we still want to apply 
> the optimizations for COUNT queries.
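The arithmetic the summary enables is straightforward; a minimal sketch with hypothetical names (not Drill's actual planner code): COUNT(*) is answered by totalRowCount alone, and COUNT(col) by subtracting that column's totalNullCount, so no file or row-group expansion is needed.

```java
import java.util.Map;

// Sketch (hypothetical names): answering plain COUNT aggregates from
// summary stats without expanding the file/row-group list.
public class CountFromSummary {
  // COUNT(*) is simply the summary's total row count
  static long countStar(long totalRowCount) {
    return totalRowCount;
  }

  // COUNT(col) excludes NULLs: total rows minus the column's null count
  static long countColumn(long totalRowCount, Map<String, Long> nullCounts, String col) {
    return totalRowCount - nullCounts.getOrDefault(col, 0L);
  }

  public static void main(String[] args) {
    Map<String, Long> nullCounts = Map.of("o_comment", 25L);
    System.out.println(countStar(1000L));                        // 1000
    System.out.println(countColumn(1000L, nullCounts, "o_comment")); // 975
  }
}
```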





[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-08 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813007#comment-16813007
 ] 

Igor Guzenko commented on DRILL-540:


Hi [~bbevens], 

Sounds very good, thank you. 

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently, Hive views cannot be queried from Drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill, 
> using JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later, Drill parses this metadata and uses it to treat view names in SQL as 
> subqueries.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS:
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in the Hive metastore, views are treated as tables of a special 
> type. The main benefit is that we also have the expanded SQL definition of 
> views (just like in view.drill files). Reading this metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views, we'll reuse the existing code for 
> Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* 
> for _*HiveReadEntry*_, we'll convert the metadata to an instance of _*View*_ 
> (_which is actually the model for data persisted in .view.drill files_) and 
> then, based on this instance, return a new _*DrillViewTable*_. With this 
> approach, Drill will handle Hive views the same way as if they were initially 
> defined in Drill and persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, 
> we'll reuse existing code from _*DrillHiveTable*_: the conversion 
> functionality will be extracted and used for field type conversions of both 
> tables and views. 
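A simplified, hypothetical sketch of step 4 (stand-in types, not the actual HiveReadEntry/DrillViewTable API): a Hive VIRTUAL_VIEW's expanded SQL text is wrapped in the same model Drill uses for .view.drill files, so the view is then handled exactly like a Drill view.

```java
// Hypothetical sketch: wrap a Hive VIRTUAL_VIEW's expanded SQL in a
// Drill-style view model so it can be treated as a subquery.
public class HiveViewSketch {
  static final class View {          // stand-in for Drill's persisted view model
    final String name;
    final String sql;
    View(String name, String sql) { this.name = name; this.sql = sql; }
  }

  static View toDrillView(String tblName, String tblType, String viewExpandedText) {
    if (!"VIRTUAL_VIEW".equals(tblType)) {
      throw new IllegalArgumentException(tblName + " is not a Hive view");
    }
    // The expanded text is a complete SELECT, so it can be used directly
    return new View(tblName, viewExpandedText);
  }

  public static void main(String[] args) {
    View v = toDrillView("cview", "VIRTUAL_VIEW",
        "SELECT COUNT(*) FROM `default`.`customers`");
    System.out.println(v.name + " -> " + v.sql);
  }
}
```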
> *Security implications*
> Consider a simple example case where we have the users
> {code:java}
> user0   user1  user2
>           \    /
>          group12
> {code}
> and a sample db where object names contain the user or group that should 
> access them:
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill: SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage Based authorization, permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models.
>
>                           *SQL Standard     |            Storage Based 
> Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill:*| | | | | | | |
> |SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
> views|Accessible tables + all views|Accessible tables + all views|
> |SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|     
>    (x)|         (x)|
> |SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|      
>   (

[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812936#comment-16812936
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273302668
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext 
context, AbstractParquetRow
 return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
+   * @param rowGroup
+   * @param fs
+   * @param footer
+   * @param readSchemaOnly - if true sets the number of rows to read to be zero
+   * @return
+   */
+  private Map 
createReaderAndImplicitColumns(ExecutorFragmentContext context,
+ 
AbstractParquetRowGroupScan rowGroupScan,
+ OperatorContext 
oContext,
+ ColumnExplorer 
columnExplorer,
+ 
List readers,
+ List> implicitColumns,
+ Map mapWithMaxColumns,
+ RowGroupReadEntry 
rowGroup,
+ DrillFileSystem 
fs,
+ ParquetMetadata 
footer,
+ boolean 
readSchemaOnly) {
+ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = 
ParquetReaderUtility.detectCorruptDates(footer,
+  rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
+logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
+
+boolean useNewReader = 
context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
+boolean containsComplexColumn = 
ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns());
+logger.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", 
useNewReader ? "enabled" : "disabled",
+containsComplexColumn ? "found." : "not found.");
+RecordReader reader;
+
+if (useNewReader || containsComplexColumn) {
+  reader = new DrillParquetReader(context,
+  footer,
+  rowGroup,
+  columnExplorer.getTableColumns(),
+  fs,
+  containsCorruptDates);
+} else {
+  reader = new ParquetRecordReader(context,
+  rowGroup.getPath(),
+  rowGroup.getRowGroupIndex(),
+  rowGroup.getNumRecordsToRead(), // if readSchemaOnly - then set to 
zero rows to read (currently breaks the ScanBatch)
 
 Review comment:
   Commented this out, and added TODO comments.
   
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812934#comment-16812934
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273302407
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -68,76 +83,131 @@ protected ScanBatch getBatch(ExecutorFragmentContext 
context, AbstractParquetRow
 List readers = new LinkedList<>();
 List> implicitColumns = new ArrayList<>();
 Map mapWithMaxColumns = new LinkedHashMap<>();
-for (RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) {
-  /*
-  Here we could store a map from file names to footers, to prevent 
re-reading the footer for each row group in a file
-  TODO - to prevent reading the footer again in the parquet record reader 
(it is read earlier in the ParquetStorageEngine)
-  we should add more information to the RowGroupInfo that will be 
populated upon the first read to
-  provide the reader with all of th file meta-data it needs
-  These fields will be added to the constructor below
-  */
-  try {
-Stopwatch timer = logger.isTraceEnabled() ? 
Stopwatch.createUnstarted() : null;
-DrillFileSystem fs = fsManager.get(rowGroupScan.getFsConf(rowGroup), 
rowGroup.getPath());
-ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
-if (!footers.containsKey(rowGroup.getPath())) {
-  if (timer != null) {
-timer.start();
+ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+RowGroupReadEntry firstRowGroup = null; // to be scanned in case ALL row 
groups are pruned out
+ParquetMetadata firstFooter = null;
+long rowgroupsPruned = 0; // for stats
+
+try {
+
+  LogicalExpression filterExpr = rowGroupScan.getFilter();
+  Path selectionRoot = rowGroupScan.getSelectionRoot();
+  // Runtime pruning: Avoid recomputing metadata objects for each 
row-group in case they use the same file
+  // by keeping the following objects computed earlier (relies on same 
file being in consecutive rowgroups)
+  Path prevRowGroupPath = null;
+  Metadata_V3.ParquetTableMetadata_v3 tableMetadataV3 = null;
+  Metadata_V3.ParquetFileMetadata_v3 fileMetadataV3 = null;
+  FileSelection fileSelection = null;
+  ParquetTableMetadataProviderImpl metadataProvider = null;
+
+  for (RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) 
{
+/*
+Here we could store a map from file names to footers, to prevent 
re-reading the footer for each row group in a file
+TODO - to prevent reading the footer again in the parquet record 
reader (it is read earlier in the ParquetStorageEngine)
+we should add more information to the RowGroupInfo that will be 
populated upon the first read to
+provide the reader with all of th file meta-data it needs
+These fields will be added to the constructor below
+*/
+
+  Stopwatch timer = logger.isTraceEnabled() ? 
Stopwatch.createUnstarted() : null;
+  DrillFileSystem fs = fsManager.get(rowGroupScan.getFsConf(rowGroup), 
rowGroup.getPath());
+  if (!footers.containsKey(rowGroup.getPath())) {
+if (timer != null) {
+  timer.start();
+}
+
+ParquetMetadata footer = readFooter(fs.getConf(), 
rowGroup.getPath(), readerConfig);
+if (timer != null) {
+  long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
+  logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", 
"", rowGroup.getPath(), "", 0, 0, 0, timeToRead);
+}
+footers.put(rowGroup.getPath(), footer);
   }
+  ParquetMetadata footer = footers.get(rowGroup.getPath());
+
+  //
+  //   If a filter is given (and it is not just "TRUE") - then use it 
to perform run-time pruning
+  //
+  if ( filterExpr != null && ! (filterExpr instanceof 
ValueExpressions.BooleanExpression)  ) { // skip when no filter or filter is 
TRUE
 
 Review comment:
   Added a check for true - getBoolean() .
   
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  

[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812933#comment-16812933
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273302288
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext 
context, AbstractParquetRow
 return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
 
 Review comment:
   Done.
   
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812932#comment-16812932
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273302187
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext 
context, AbstractParquetRow
 return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
+   * @param rowGroup
+   * @param fs
+   * @param footer
+   * @param readSchemaOnly - if true sets the number of rows to read to be zero
+   * @return
 
 Review comment:
   Done
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812927#comment-16812927
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273300809
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ##
 @@ -52,6 +52,7 @@
 
   private final ParquetFormatPlugin formatPlugin;
   private final ParquetFormatConfig formatConfig;
+  private final Collection drillbitEndpoints;
 
 Review comment:
   With the latest changes, no need for a special constructor for the 
ParquetGroupScan.
   
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812926#comment-16812926
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273300586
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##
 @@ -127,9 +127,21 @@ public static void createMeta(FileSystem fs, Path path, 
ParquetReaderConfig read
*
* @return parquet table metadata
*/
-  public static ParquetTableMetadata_v3 getParquetTableMetadata(FileSystem fs, 
String path, ParquetReaderConfig readerConfig) throws IOException {
+  public static ParquetTableMetadata_v3 
getParquetTableMetadata(ParquetMetadata footer, FileSystem fs, String path, 
ParquetReaderConfig readerConfig) throws IOException {
 Metadata metadata = new Metadata(readerConfig);
-return metadata.getParquetTableMetadata(path, fs);
+return metadata.getParquetTableMetadata(path, fs, footer);
+  }
+
+  /**
+   *  When the footer is not yet available (it would be read)
 
 Review comment:
   Done.
   
 









[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812923#comment-16812923
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273299242
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding 
ConvertCountToDirectScanPrule
+ * physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to 
retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for 
Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.some(Project.class,
+RelOptHelper.any(TableScan.class))), 
"Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.any(TableScan.class)), 
"Agg_on_scan:logic

[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812924#comment-16812924
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on issue #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#issuecomment-481070256
 
 
   @vvysotskyi , @dvjyothsna  I have addressed your review comments.  Pls take 
a look. 
 



> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By.  These are currently converted to a 
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row-group 
> metadata. However, this rule is applied on Drill logical rels and converts the 
> logical plan to a physical plan with a DirectScanPrel. This is too late: the 
> DrillScanRel created during logical planning has already read the entire 
> metadata cache file along with its full list of row-group entries. The 
> metadata cache file can grow quite large, and this does not scale. 
> The solution is to use the Metadata Summary file that is created in 
> DRILL-7063 and create a new rule that will apply early on such that it 
> operates on the Calcite logical rels instead of the Drill logical rels and 
> prevents eager expansion of the list of files/row groups.   
> We will not remove the existing rule. The existing rule will continue to 
> operate as before because it is possible that after some transformations, we 
> still want to apply the optimizations for COUNT queries. 
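The summary-based COUNT rewrite described above can be sketched as follows. This is a minimal illustration only, not Drill's actual Metadata_V4 API: `MetadataSummary` and `answerCount` are hypothetical names standing in for the summary's totalRowCount and per-column totalNullCount.

```java
import java.util.Map;

// Hypothetical sketch of answering plain COUNT aggregates from summary
// metadata alone; class and method names are illustrative, not Drill's.
public class SummaryCountSketch {

  static final class MetadataSummary {
    final long totalRowCount;               // rows across all files/row groups
    final Map<String, Long> totalNullCount; // per-column null counts

    MetadataSummary(long totalRowCount, Map<String, Long> totalNullCount) {
      this.totalRowCount = totalRowCount;
      this.totalNullCount = totalNullCount;
    }

    // COUNT(*) -> totalRowCount; COUNT(col) -> totalRowCount - nullCount(col).
    // A null column argument stands for COUNT(*).
    long answerCount(String column) {
      if (column == null) {
        return totalRowCount;
      }
      Long nulls = totalNullCount.get(column);
      // If the summary has no null count for the column, the rewrite cannot
      // be applied; a real rule would fall back to a scan (-1 signals that here).
      return nulls == null ? -1 : totalRowCount - nulls;
    }
  }

  public static void main(String[] args) {
    MetadataSummary summary =
        new MetadataSummary(1000L, Map.of("l_orderkey", 0L, "l_comment", 250L));
    System.out.println("count(*)         = " + summary.answerCount(null));
    System.out.println("count(l_comment) = " + summary.answerCount("l_comment"));
  }
}
```

Conceptually, the new logical rule replaces the Aggregate/Scan subtree with a DirectGroupScan that feeds these precomputed values through a record reader, so the file list never needs to be expanded.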





[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812921#comment-16812921
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273299166
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 Review comment:
   Removed the 'logical' (and corresponding 'physical') suffixes from all the 
places and instead just use the name of the rule since it already has a 'Rule' 
or 'Prule' suffix. 
 
---

[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812922#comment-16812922
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273299198
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##

[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812920#comment-16812920
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273298899
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/CountToDirectScanUtils.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.common;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rel.type.RelDataTypeFieldImpl;
+import org.apache.calcite.rel.type.RelRecordType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.type.SqlTypeName;
+
+import java.util.List;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+
+/**
+ * A utility class that contains helper functions used by rules that convert 
COUNT(*) and COUNT(col)
+ * aggregates (no group-by) to DirectScan
+ */
+public class CountToDirectScanUtils {
+
+  /**
+   * Checks if aggregate call contains star or non-null expression:
+   * 
+   * count(*)  == >  empty arg  ==>  rowCount
+   * count(Not-null-input) ==> rowCount
+   * 
+   *
+   * @param aggregateCall aggregate call
+   * @param aggregate aggregate relation expression
+   * @return true if the aggregate call contains star or non-null expression
+   */
+  public static boolean containsStarOrNotNullInput(AggregateCall 
aggregateCall, Aggregate aggregate) {
+return aggregateCall.getArgList().isEmpty() ||
+(aggregateCall.getArgList().size() == 1 &&
+
!aggregate.getInput().getRowType().getFieldList().get(aggregateCall.getArgList().get(0)).getType().isNullable());
+  }
+
+  /**
+   * For each aggregate call creates field based on its name with bigint type.
+   * Constructs record type for created fields.
+   *
+   * @param aggregateRel aggregate relation expression
+   * @param fieldNames field names
+   * @return record type
+   */
+  public static RelDataType constructDataType(Aggregate aggregateRel, Collection<String> fieldNames) {
+    List<RelDataTypeField> fields = new ArrayList<>();
+    Iterator<String> filedNamesIterator = fieldNames.iterator();
 
 Review comment:
   Done. 
 




[jira] [Commented] (DRILL-6582) SYSLOG (RFC-5424) Format Plugin

2019-04-08 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812919#comment-16812919
 ] 

Bridget Bevens commented on DRILL-6582:
---

Hi [~cgivre],

I've added the doc here: https://drill.apache.org/docs/syslog-format-plugin/ 
Let me know if I need to change anything.

Thanks,
Bridget

> SYSLOG (RFC-5424) Format Plugin
> ---
>
> Key: DRILL-6582
> URL: https://issues.apache.org/jira/browse/DRILL-6582
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Many security log files are in the format defined by RFC-5424.  A format 
> plugin that can read data formatted according to this specification would be 
> very useful for security engineers as well as network engineers. 
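As a rough illustration of what such a plugin must extract, here is a minimal, hand-rolled parse of an RFC-5424 header. This is not the plugin's actual code; real syslog parsing must also handle structured-data elements, NIL values, and escaping.

```java
// Hand-rolled sketch of splitting an RFC-5424 syslog header into fields.
// Illustrative only: not the format plugin's implementation.
public class Rfc5424Sketch {

  public static void main(String[] args) {
    String line =
        "<165>1 2019-04-08T22:14:15.003Z host01 appname 1234 ID47 - Booting up";

    // PRIVAL is enclosed in angle brackets: facility = pri / 8, severity = pri % 8.
    int priEnd = line.indexOf('>');
    int pri = Integer.parseInt(line.substring(1, priEnd));

    // Remaining header fields, space-separated:
    // VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then SD + MSG (fields[6]).
    String[] fields = line.substring(priEnd + 1).split(" ", 7);

    System.out.println("facility=" + (pri / 8) + " severity=" + (pri % 8));
    System.out.println("timestamp=" + fields[1]
        + " host=" + fields[2] + " app=" + fields[3]);
  }
}
```

Each parsed field would map naturally onto a column in the plugin's output schema, which is what makes this format a good fit for a Drill format plugin.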





[jira] [Updated] (DRILL-6582) SYSLOG (RFC-5424) Format Plugin

2019-04-08 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6582:
--
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> SYSLOG (RFC-5424) Format Plugin
> ---
>
> Key: DRILL-6582
> URL: https://issues.apache.org/jira/browse/DRILL-6582
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> Many security log files are in the format defined by RFC-5424.  A format 
> plugin that can read data formatted according to this specification would be 
> very useful for security engineers as well as network engineers. 





[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-08 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7154:
-
Reviewer: Boaz Ben-Zvi

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan is taking longer.  One possible reason is that the 
> Hash Agg operator in the new plan is not using as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> the old plan did.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id

[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled

2019-04-08 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7154:
-
Labels: ready-to-commit  (was: )

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -
>
> Key: DRILL-7154
> URL: https://issues.apache.org/jira/browse/DRILL-7154
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
> 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, 
> hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, 
> hashagg.stats.disabled.foreman.log
>
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and 
>   exists (
> select
>   *
> from
>   lineitem l
> where
>   l.l_orderkey = o.o_orderkey
>   and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
> operator in the new plan is taking longer.  One possible reason is that the 
> Hash Agg operator in the new plan is not using as many buckets as the old 
> plan did.  The Hash Agg operator in the new plan also uses less memory than 
> in the old plan.
> Here is the old plan:
> {noformat}
> 00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
> order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
> rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 
> network, 2.2631985057468002E10 memory}, id = 5645
> 00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
> 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 
> memory}, id = 5644
> 00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
> 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
> o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
> {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
> 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType 
> = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
> cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 
> memory}, id = 5639
> 02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
> cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
> 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5638
> 03-01HashAgg(group=[{0}], order_count=[COUNT()]) : 
> rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 
> 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 
> cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 
> memory}, id = 5637
> 03-02  Project(o_orderpriority=[$1]) : rowType = 
> RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
> {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
> 3.25631968386

[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812910#comment-16812910
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

dvjyothsna commented on issue #1723: DRILL-7063: Seperate metadata cache file 
into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#issuecomment-481062555
 
 
   Thank you @amansinha100. Squashed the commits into one commit. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6992) Support column histogram statistics

2019-04-08 Thread Aman Sinha (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812899#comment-16812899
 ] 

Aman Sinha commented on DRILL-6992:
---

Histogram creation and usage are supported for the following data types as of 
commit 849f896: INT, BIGINT, FLOAT, DOUBLE, TIME, DATE, TIMESTAMP, BOOLEAN. 
Marking this umbrella JIRA as fixed. For the other data types, in particular 
VARCHAR and VARBINARY, I will open separate enhancement JIRAs for a future 
release.
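The way histogram buckets drive range-predicate selectivity can be illustrated with a small sketch. This is a hypothetical equi-depth model, not Drill's actual implementation: each bucket holds roughly the same number of rows, so the selectivity of `col < v` is the fraction of buckets the range covers, with linear interpolation inside the boundary bucket.

```java
// Hypothetical sketch of range-selectivity estimation over an
// equi-depth histogram; not Drill's actual code.
public class HistogramSelectivity {

  // boundaries[i]..boundaries[i+1] is one bucket; in an equi-depth
  // histogram each bucket holds ~the same number of rows.
  static double estimateLessThan(double[] boundaries, double value) {
    int numBuckets = boundaries.length - 1;
    if (value <= boundaries[0]) return 0.0;
    if (value >= boundaries[numBuckets]) return 1.0;
    double covered = 0.0;
    for (int i = 0; i < numBuckets; i++) {
      double lo = boundaries[i], hi = boundaries[i + 1];
      if (value >= hi) {
        covered += 1.0;                       // bucket fully below the value
      } else if (value > lo) {
        covered += (value - lo) / (hi - lo);  // partial boundary bucket
        break;
      } else {
        break;
      }
    }
    return covered / numBuckets;  // equi-depth: buckets have equal weight
  }

  public static void main(String[] args) {
    double[] b = {0, 25, 50, 75, 100};  // 4 equi-depth buckets
    System.out.println(estimateLessThan(b, 50));  // 0.5
    System.out.println(estimateLessThan(b, 10));  // 0.1
  }
}
```

This is why non-uniform data matters: with skew, the bucket boundaries are unevenly spaced, and the interpolation yields a much better estimate than assuming a uniform distribution over the min/max range.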

> Support column histogram statistics
> ---
>
> Key: DRILL-6992
> URL: https://issues.apache.org/jira/browse/DRILL-6992
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> As a follow-up to 
> [DRILL-1328|https://issues.apache.org/jira/browse/DRILL-1328], which is adding 
> NDV (number of distinct values) support and creating the framework for 
> statistics, we also need histograms. These are needed for range-predicate 
> selectivity estimation, as well as for equality predicates when the data 
> distribution is non-uniform.





[jira] [Commented] (DRILL-7119) Modify selectivity calculations to use histograms for supported data types

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812897#comment-16812897
 ] 

ASF GitHub Bot commented on DRILL-7119:
---

amansinha100 commented on pull request #1733: DRILL-7119: Compute range 
predicate selectivity using histograms.
URL: https://github.com/apache/drill/pull/1733
 
 
   
 



> Modify selectivity calculations to use histograms for supported data types
> --
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest based histogram is created, we need to read it back and 
> modify the selectivity calculations such that they use the histogram buckets 
> for range conditions.





[jira] [Updated] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7063:
--
Labels: ready-to-commit  (was: )

> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>






[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-08 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812894#comment-16812894
 ] 

Bridget Bevens commented on DRILL-540:
--

Hi [~IhorHuzenko],

Is the following note okay?
_For storage-based authorization, access to Hive views depends on the user’s 
permissions on the underlying tables in the view definition. When a user 
selects from a Hive view, the view is expanded (converted into a query), and 
the underlying tables referenced in the query are validated for permissions._

Thanks,
Bridget



> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently, Hive views cannot be queried from Drill.
> This JIRA aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in files with the suffix .view.drill, 
> using JSON format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later, Drill parses this metadata and uses it to treat view names in SQL as 
> subqueries.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS :
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in the Hive metastore, views are considered tables of a special 
> type. The main benefit is that we also have the expanded SQL definition of 
> views (just like in .view.drill files). Reading of this metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views, we'll reuse the existing code for 
> Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* 
> for _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ 
> (_which is actually the model for data persisted in .view.drill files_) and 
> then, based on this instance, return a new _*DrillViewTable*_. With this 
> approach Drill will handle Hive views the same way as if they were initially 
> defined in Drill and persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, 
> we'll reuse existing code from _*DrillHiveTable*_: the conversion 
> functionality will be extracted and used for both table and view field 
> type conversions. 
> *Security implications*
> Consider simple example case where we have users, 
> {code:java}
> user0  user1 user2
>\ /
>   group12
> {code}
> and a sample database where each object's name indicates the user or group 
> that should access it:
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill: SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> For Storage Based authorization, permissions were granted using the commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models.
>                           *SQL Standard     |            Storage Based 
> Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill :*| | | | | | | |
> |S

[jira] [Updated] (DRILL-7119) Modify selectivity calculations to use histograms for supported data types

2019-04-08 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7119:
--
Labels: ready-to-commit  (was: )

> Modify selectivity calculations to use histograms for supported data types
> --
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest based histogram is created, we need to read it back and 
> modify the selectivity calculations such that they use the histogram buckets 
> for range conditions.





[jira] [Updated] (DRILL-7119) Modify selectivity calculations to use histograms for supported data types

2019-04-08 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-7119:
--
Summary: Modify selectivity calculations to use histograms for supported 
data types  (was: Modify selectivity calculations to use histograms)

> Modify selectivity calculations to use histograms for supported data types
> --
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest based histogram is created, we need to read it back and 
> modify the selectivity calculations such that they use the histogram buckets 
> for range conditions.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812815#comment-16812815
 ] 

Volodymyr Vysotskyi commented on DRILL-7160:


Marking this as a blocker since it is actually a regression and should be fixed 
before the release. 

> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on the Web 
> UI's Profiles tab even when it was not set explicitly. The issue is that the 
> option is being set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check whether the 
> value differs from the existing one should be added.
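The suggested fix — recording a query-level option only when its value actually differs from the current effective value — can be sketched like this. All names are illustrative; Drill's actual OptionManager API is different:

```java
// Hypothetical sketch of "set only if different"; not Drill's real classes.
import java.util.HashMap;
import java.util.Map;

public class AutoLimitOption {

  final Map<String, Long> effective = new HashMap<>();   // system/session defaults
  final Map<String, Long> queryLevel = new HashMap<>();  // shows up in the profile

  AutoLimitOption(long defaultMaxRows) {
    effective.put("exec.query.max_rows", defaultMaxRows);
  }

  // Only record a query-level override when it changes the effective value,
  // so an unchanged default never appears on the Profiles tab.
  void checkAndApplyAutoLimit(long maxRows) {
    long current = effective.get("exec.query.max_rows");
    if (maxRows != current) {
      queryLevel.put("exec.query.max_rows", maxRows);
    }
  }

  public static void main(String[] args) {
    AutoLimitOption opts = new AutoLimitOption(0);
    opts.checkAndApplyAutoLimit(0);     // same as default: nothing recorded
    System.out.println(opts.queryLevel.isEmpty());
    opts.checkAndApplyAutoLimit(1000);  // differs: recorded as QUERY-level
    System.out.println(opts.queryLevel.isEmpty());
  }
}
```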





[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7160:
---
Fix Version/s: (was: 1.17.0)
   1.16.0

> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on the Web 
> UI's Profiles tab even when it was not set explicitly. The issue is that the 
> option is being set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7160:
---
Priority: Blocker  (was: Major)

> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on the Web 
> UI's Profiles tab even when it was not set explicitly. The issue is that the 
> option is being set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812811#comment-16812811
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

dvjyothsna commented on issue #1723: DRILL-7063: Seperate metadata cache file 
into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#issuecomment-481014934
 
 
   @amansinha100 I have addressed the review comments. Can you please take a 
look at it?
 



> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>






[jira] [Commented] (DRILL-3846) Metadata Caching : A count(*) query took more time with the cache in place

2019-04-08 Thread Aman Sinha (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812807#comment-16812807
 ] 

Aman Sinha commented on DRILL-3846:
---

Let's try this after DRILL-7064 is fixed. 

> Metadata Caching : A count(*) query took more time with the cache in place
> --
>
> Key: DRILL-3846
> URL: https://issues.apache.org/jira/browse/DRILL-3846
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.16.0
>
>
> git.commit.id.abbrev=3c89b30
> I have a folder with 10k complex files. The generated cache file is around 
> 486 MB. The numbers below indicate that performance regressed 
> after the metadata cache was generated:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from 
> `complex_sparse_5files`;
> +--+
> |  EXPR$0  |
> +--+
> | 100  |
> +--+
> 1 row selected (30.835 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata 
> `complex_sparse_5files`;
> +---+-+
> |  ok   |   summary   
> |
> +---+-+
> | true  | Successfully updated metadata for table complex_sparse_5files.  
> |
> +---+-+
> 1 row selected (10.69 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from 
> `complex_sparse_5files`;
> +--+
> |  EXPR$0  |
> +--+
> | 100  |
> +--+
> 1 row selected (47.614 seconds)
> {code}





[jira] [Assigned] (DRILL-3846) Metadata Caching : A count(*) query took more time with the cache in place

2019-04-08 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha reassigned DRILL-3846:
-

Assignee: Aman Sinha  (was: Venkata Jyothsna Donapati)

> Metadata Caching : A count(*) query took more time with the cache in place
> --
>
> Key: DRILL-3846
> URL: https://issues.apache.org/jira/browse/DRILL-3846
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.16.0
>
>
> git.commit.id.abbrev=3c89b30
> I have a folder with 10k complex files. The generated cache file is around 
> 486 MB. The numbers below indicate that performance regressed 
> after the metadata cache was generated:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from 
> `complex_sparse_5files`;
> +--+
> |  EXPR$0  |
> +--+
> | 100  |
> +--+
> 1 row selected (30.835 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata 
> `complex_sparse_5files`;
> +---+-+
> |  ok   |   summary   
> |
> +---+-+
> | true  | Successfully updated metadata for table complex_sparse_5files.  
> |
> +---+-+
> 1 row selected (10.69 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from 
> `complex_sparse_5files`;
> +--+
> |  EXPR$0  |
> +--+
> | 100  |
> +--+
> 1 row selected (47.614 seconds)
> {code}





[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812800#comment-16812800
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

dvjyothsna commented on pull request #1723: DRILL-7063: Seperate metadata cache 
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r273244458
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##
 @@ -149,20 +157,25 @@ public static ParquetTableMetadata_v3 
getParquetTableMetadata(Map 
paths,
  
MetadataContext metaContext,
  
ParquetReaderConfig readerConfig) {
-if (ignoreReadingMetadata(metaContext, path)) {
-  return null;
-}
 Metadata metadata = new Metadata(readerConfig);
-metadata.readBlockMeta(path, false, metaContext, fs);
+if (paths.isEmpty()) {
+  metaContext.setMetadataCacheCorrupted(true);
+}
+for (Path path: paths) {
+  if (ignoreReadingMetadata(metaContext, path)) {
+return null;
 
 Review comment:
   Added this comment.
 



> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>






[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812801#comment-16812801
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

dvjyothsna commented on pull request #1723: DRILL-7063: Seperate metadata cache 
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r273244597
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##
 @@ -633,43 +716,169 @@ private void readBlockMeta(Path path, boolean dirsOnly, 
MetadataContext metaCont
 parquetTableMetadataDirs.updateRelativePaths(metadataParentDirPath);
 if (!alreadyCheckedModification && 
tableModified(parquetTableMetadataDirs.getDirectories(), path, 
metadataParentDir, metaContext, fs)) {
   parquetTableMetadataDirs =
-  
(createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()),
 fs, true, null)).getRight();
+  
(createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()),
 fs, true, null, true)).getRight();
   newMetadata = true;
 }
   } else {
-parquetTableMetadata = mapper.readValue(is, 
ParquetTableMetadataBase.class);
+if (isFileMetadata) {
+  parquetTableMetadata.assignFiles((mapper.readValue(is, 
FileMetadata.class)).getFiles());
+  if (new 
MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new 
MetadataVersion(4, 0)) >= 0) {
+((ParquetTableMetadata_v4) 
parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
+  }
+
+  if (!alreadyCheckedModification && 
tableModified(parquetTableMetadata.getDirectories(), path, metadataParentDir, 
metaContext, fs)) {
+parquetTableMetadata =
+
(createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()),
 fs, true, null, true)).getLeft();
+newMetadata = true;
+  }
+} else if (isSummaryFile) {
+  MetadataSummary metadataSummary = mapper.readValue(is, 
Metadata_V4.MetadataSummary.class);
+  ParquetTableMetadata_v4 parquetTableMetadata_v4 = new 
ParquetTableMetadata_v4(metadataSummary);
+  parquetTableMetadata = (ParquetTableMetadataBase) 
parquetTableMetadata_v4;
+} else {
+  parquetTableMetadata = mapper.readValue(is, 
ParquetTableMetadataBase.class);
+  if (new 
MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new 
MetadataVersion(3, 0)) >= 0) {
+((Metadata_V3.ParquetTableMetadata_v3) 
parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
+  }
+  if (!alreadyCheckedModification && 
tableModified((parquetTableMetadata.getDirectories()), path, metadataParentDir, 
metaContext, fs)) {
+parquetTableMetadata =
+
(createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()),
 fs, true, null, true)).getLeft();
+newMetadata = true;
+  }
+}
 if (timer != null) {
   logger.debug("Took {} ms to read metadata from cache file", 
timer.elapsed(TimeUnit.MILLISECONDS));
   timer.stop();
 }
-if (new 
MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new 
MetadataVersion(3, 0)) >= 0) {
-  ((ParquetTableMetadata_v3) 
parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
-}
-  if (!alreadyCheckedModification && 
tableModified(parquetTableMetadata.getDirectories(), path, metadataParentDir, 
metaContext, fs)) {
-  // TODO change with current columns in existing metadata (auto 
refresh feature)
-  parquetTableMetadata =
-  
(createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()),
 fs, true, null)).getLeft();
-  newMetadata = true;
+if (!isSummaryFile) {
+  // DRILL-5009: Remove the RowGroup if it is empty
+  List files = 
parquetTableMetadata.getFiles();
+  if (files != null) {
+for (ParquetFileMetadata file : files) {
+  List rowGroups = file.getRowGroups();
+  rowGroups.removeIf(r -> r.getRowCount() == 0);
+}
+  }
 }
-
-// DRILL-5009: Remove the RowGroup if it is empty
-List files = 
parquetTableMetadata.getFiles();
-for (ParquetFileMetadata file : files) {
-  List rowGroups = file.getRowGroups();
-  rowGroups.removeIf(r -> r.getRowCount() == 0);
+if (newMetadata) {
+  // if new metadata files were created, invalidate the existing 
metadata context
+  metaContext.clear();
 }
-
-  }
-  if (newMetadata) {
-// if new metadata files wer

[jira] [Updated] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-04-08 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7136:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Num_buckets for HashAgg in profile may be inaccurate
> 
>
> Key: DRILL-7136
> URL: https://issues.apache.org/jira/browse/DRILL-7136
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill
>
>
> I ran TPCH query 17 with sf 1000.  Here is the query:
> {noformat}
> select
>   sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem l,
>   part p
> where
>   p.p_partkey = l.l_partkey
>   and p.p_brand = 'Brand#13'
>   and p.p_container = 'JUMBO CAN'
>   and l.l_quantity < (
> select
>   0.2 * avg(l2.l_quantity)
> from
>   lineitem l2
> where
>   l2.l_partkey = p.p_partkey
>   );
> {noformat}
> One of the hash agg operators has resized 6 times.  It should have 4M 
> buckets.  But the profile shows it has 64K buckets.
> I have attached a sample profile.  In this profile, the hash agg operator is 
> (04-02).
> {noformat}
> Operator Metrics
> Minor FragmentNUM_BUCKETS NUM_ENTRIES NUM_RESIZING
> RESIZING_TIME_MSNUM_PARTITIONS  SPILLED_PARTITIONS  SPILL_MB  
>   SPILL_CYCLE INPUT_BATCH_COUNT   AVG_INPUT_BATCH_BYTES   
> AVG_INPUT_ROW_BYTES INPUT_RECORD_COUNT  OUTPUT_BATCH_COUNT  
> AVG_OUTPUT_BATCH_BYTES  AVG_OUTPUT_ROW_BYTESOUTPUT_RECORD_COUNT
> 04-00-02  65,536 748,746  6   364 1   
> 582 0   813 582,653 18  26,316,456  401 1,631,943 
>   25  26,176,350
> {noformat}
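Assuming each resize doubles the bucket array — the usual scheme for hash tables, though this is an assumption about HashAgg's behavior — the expected final bucket count is easy to check, which is why NUM_BUCKETS = 65,536 together with NUM_RESIZING = 6 looks inconsistent:

```java
// Sanity check on the reported profile metrics: 64K initial buckets with
// 6 doublings should yield 4M buckets, not the 65,536 shown in the profile.
public class BucketMath {
  static long bucketsAfterResizes(long initial, int resizes) {
    return initial << resizes;  // each resize doubles the bucket array
  }

  public static void main(String[] args) {
    System.out.println(bucketsAfterResizes(65_536, 6));  // 4194304
  }
}
```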





[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7160:

Fix Version/s: (was: 1.16.0)
   1.17.0

> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> As [~arina] has noticed, the option {{exec.query.max_rows}} is shown on the Web UI's 
> Profiles tab even when it was not set explicitly. This happens because the option 
> is set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check should be added for 
> whether the value differs from the existing one.
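The suggested fix amounts to comparing the auto-limit against the value already in effect before recording a query-level override. An illustrative, self-contained sketch (the option maps and method names below are hypothetical, not Drill's actual OptionManager API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Drill's actual classes): only record a QUERY-level
// override when the auto-limit actually differs from the value already in
// effect, so unchanged options never appear in the profile's option list.
public class AutoLimitCheck {
  final Map<String, Long> sessionOptions = new HashMap<>();
  final Map<String, Long> queryOptions = new HashMap<>();   // shown in the profile

  void applyAutoLimit(String name, long autoLimit) {
    long current = sessionOptions.getOrDefault(name, 0L);
    if (current != autoLimit) {         // the missing "differs from existing" check
      queryOptions.put(name, autoLimit);
    }
  }

  public static void main(String[] args) {
    AutoLimitCheck c = new AutoLimitCheck();
    c.sessionOptions.put("exec.query.max_rows", 1000L);
    c.applyAutoLimit("exec.query.max_rows", 1000L);
    System.out.println(c.queryOptions.isEmpty()); // prints true: nothing leaks into the profile
  }
}
```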





[jira] [Commented] (DRILL-7119) Modify selectivity calculations to use histograms

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812794#comment-16812794
 ] 

ASF GitHub Bot commented on DRILL-7119:
---

amansinha100 commented on issue #1733: DRILL-7119: Compute range predicate 
selectivity using histograms.
URL: https://github.com/apache/drill/pull/1733#issuecomment-481012094
 
 
   @gparai I added a unit test for histogram usage and addressed your comment. Could you please take another look?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Modify selectivity calculations to use histograms
> -
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest based histogram is created, we need to read it back and 
> modify the selectivity calculations such that they use the histogram buckets 
> for range conditions.
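For background, range selectivity over an equi-depth histogram can be estimated by counting the whole buckets to the right of the literal plus a uniform fraction of the bucket containing it. A hedged sketch under that assumption (all names are illustrative, not Drill's NumericEquiDepthHistogram API):

```java
// Hypothetical sketch of equi-depth selectivity for "col > value": whole
// buckets above the literal qualify entirely, and the bucket containing the
// literal contributes a uniform fraction. The boundaries array has N+1
// entries for N buckets, each bucket holding numRowsPerBucket rows.
public class EquiDepthSelectivity {
  static double greaterThanSelectivity(double[] boundaries, long numRowsPerBucket, double value) {
    int numBuckets = boundaries.length - 1;
    long totalRows = numBuckets * numRowsPerBucket;
    if (value <= boundaries[0]) {
      return 1.0;                         // all rows qualify
    }
    if (value > boundaries[numBuckets]) {
      return 0.0;                         // no rows qualify
    }
    for (int i = 0; i < numBuckets; i++) {
      if (value <= boundaries[i + 1]) {   // bucket i contains the literal
        double fraction = (boundaries[i + 1] - value) / (boundaries[i + 1] - boundaries[i]);
        double qualifying = fraction * numRowsPerBucket
            + (double) (numBuckets - i - 1) * numRowsPerBucket;
        return qualifying / totalRows;
      }
    }
    return 0.0; // unreachable for valid input
  }

  public static void main(String[] args) {
    // Four buckets of 100 rows over [0, 40]; the literal 20 leaves exactly
    // the upper half of the rows qualifying.
    double[] b = {0, 10, 20, 30, 40};
    System.out.println(greaterThanSelectivity(b, 100, 20.0)); // prints 0.5
  }
}
```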





[jira] [Updated] (DRILL-7062) Run-time row group pruning

2019-04-08 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-7062:

Reviewer: Aman Sinha

> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Commented] (DRILL-7119) Modify selectivity calculations to use histograms

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812793#comment-16812793
 ] 

ASF GitHub Bot commented on DRILL-7119:
---

amansinha100 commented on pull request #1733: DRILL-7119: Compute range 
predicate selectivity using histograms.
URL: https://github.com/apache/drill/pull/1733#discussion_r273239384
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java
 ##
 @@ -69,27 +79,177 @@ public void setNumRowsPerBucket(long numRows) {
   }
 
   @Override
-  public Double estimatedSelectivity(RexNode filter) {
+  public Double estimatedSelectivity(final RexNode filter) {
     if (numRowsPerBucket >= 0) {
-      return 1.0;
-    } else {
-      return null;
+      // at a minimum, the histogram should have a start and end point of 1 bucket, so at least 2 entries
+      Preconditions.checkArgument(buckets.length >= 2, "Histogram has invalid number of entries");
+      final int first = 0;
+      final int last = buckets.length - 1;
+
+      // number of buckets is 1 less than the total # entries in the buckets array since last
+      // entry is the end point of the last bucket
+      final int numBuckets = buckets.length - 1;
+      final long totalRows = numBuckets * numRowsPerBucket;
+      if (filter instanceof RexCall) {
+        // get the operator
+        SqlOperator op = ((RexCall) filter).getOperator();
+        if (op.getKind() == SqlKind.GREATER_THAN ||
+            op.getKind() == SqlKind.GREATER_THAN_OR_EQUAL) {
+          Double value = getLiteralValue(filter);
+          if (value != null) {
+
+            // *** Handle the boundary conditions first ***
+
+            // if value is less than or equal to the first bucket's start point then all rows qualify
+            int result = value.compareTo(buckets[first]);
+            if (result <= 0) {
+              return LARGE_SELECTIVITY;
+            }
+            // if value is greater than the end point of the last bucket, then none of the rows qualify
+            result = value.compareTo(buckets[last]);
+            if (result > 0) {
+              return SMALL_SELECTIVITY;
 
 Review comment:
   Done.  I added a comment when we declare the class variable. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Modify selectivity calculations to use histograms
> -
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest based histogram is created, we need to read it back and 
> modify the selectivity calculations such that they use the histogram buckets 
> for range conditions.





[jira] [Comment Edited] (DRILL-6985) Fix sqlline.bat issues on Windows and add drill-embedded.bat

2019-04-08 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812737#comment-16812737
 ] 

Bridget Bevens edited comment on DRILL-6985 at 4/8/19 9:01 PM:
---

Hi [~vvysotskyi],

I've updated https://drill.apache.org/docs/installing-drill-on-windows/ 
and https://drill.apache.org/docs/starting-drill-on-windows/ with this 
information.

Please let me know if I need to make any other changes.

Thanks,
Bridget


was (Author: bbevens):
Hi [~vvysotskyi],

I've updated https://drill.apache.org/docs/installing-drill-on-windows/ 
with this information.

Please let me know if I need to make any other changes.

Thanks,
Bridget

> Fix sqlline.bat issues on Windows and add drill-embedded.bat
> 
>
> Key: DRILL-6985
> URL: https://issues.apache.org/jira/browse/DRILL-6985
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
> Environment: Windows 10
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> *For documentation*
> {{drill-embedded.bat}} was added as a handy script to start Drill on Windows 
> without passing any params.
> Please update the following section: 
> https://drill.apache.org/docs/starting-drill-on-windows/
> Other issues covered in this Jira:
> {{sqlline.bat}} fails in the following cases:
>  1. Specified file in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f /tmp/q.sql
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 2. Specified file path that contains spaces:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f "/tmp/q q.sql"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> q.sql""=="test" was unexpected at this time.
> {noformat}
> 3. Specified query in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -e "select * 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> * was unexpected at this time.
> {noformat}
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -q "select 'a' 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> 'a' was unexpected at this time.
> {noformat}
> 4. Specified custom config location:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config=/tmp/conf
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 5. Specified custom config location with spaces in the path:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config="/tmp/conf test"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> test"" was unexpected at this time.
> {noformat}
> 6. Sqlline was run from non-bin directory:
> {noformat}
> apache-drill-1.15.0>bin\sqlline.bat -u "jdbc:drill:zk=local"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}





[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812764#comment-16812764
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

amansinha100 commented on pull request #1723: DRILL-7063: Seperate metadata 
cache file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r273224323
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##
 @@ -633,43 +716,169 @@ private void readBlockMeta(Path path, boolean dirsOnly, MetadataContext metaCont
     parquetTableMetadataDirs.updateRelativePaths(metadataParentDirPath);
     if (!alreadyCheckedModification && tableModified(parquetTableMetadataDirs.getDirectories(), path, metadataParentDir, metaContext, fs)) {
       parquetTableMetadataDirs =
-          (createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()), fs, true, null)).getRight();
+          (createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()), fs, true, null, true)).getRight();
       newMetadata = true;
     }
   } else {
-    parquetTableMetadata = mapper.readValue(is, ParquetTableMetadataBase.class);
+    if (isFileMetadata) {
+      parquetTableMetadata.assignFiles((mapper.readValue(is, FileMetadata.class)).getFiles());
+      if (new MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new MetadataVersion(4, 0)) >= 0) {
+        ((ParquetTableMetadata_v4) parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
+      }
+
+      if (!alreadyCheckedModification && tableModified(parquetTableMetadata.getDirectories(), path, metadataParentDir, metaContext, fs)) {
+        parquetTableMetadata =
+            (createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()), fs, true, null, true)).getLeft();
+        newMetadata = true;
+      }
+    } else if (isSummaryFile) {
+      MetadataSummary metadataSummary = mapper.readValue(is, Metadata_V4.MetadataSummary.class);
+      ParquetTableMetadata_v4 parquetTableMetadata_v4 = new ParquetTableMetadata_v4(metadataSummary);
+      parquetTableMetadata = (ParquetTableMetadataBase) parquetTableMetadata_v4;
+    } else {
+      parquetTableMetadata = mapper.readValue(is, ParquetTableMetadataBase.class);
+      if (new MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new MetadataVersion(3, 0)) >= 0) {
+        ((Metadata_V3.ParquetTableMetadata_v3) parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
+      }
+      if (!alreadyCheckedModification && tableModified((parquetTableMetadata.getDirectories()), path, metadataParentDir, metaContext, fs)) {
+        parquetTableMetadata =
+            (createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()), fs, true, null, true)).getLeft();
+        newMetadata = true;
+      }
+    }
     if (timer != null) {
       logger.debug("Took {} ms to read metadata from cache file", timer.elapsed(TimeUnit.MILLISECONDS));
       timer.stop();
     }
-    if (new MetadataVersion(parquetTableMetadata.getMetadataVersion()).compareTo(new MetadataVersion(3, 0)) >= 0) {
-      ((ParquetTableMetadata_v3) parquetTableMetadata).updateRelativePaths(metadataParentDirPath);
-    }
-    if (!alreadyCheckedModification && tableModified(parquetTableMetadata.getDirectories(), path, metadataParentDir, metaContext, fs)) {
-      // TODO change with current columns in existing metadata (auto refresh feature)
-      parquetTableMetadata =
-          (createMetaFilesRecursivelyAsProcessUser(Path.getPathWithoutSchemeAndAuthority(path.getParent()), fs, true, null)).getLeft();
-      newMetadata = true;
+    if (!isSummaryFile) {
+      // DRILL-5009: Remove the RowGroup if it is empty
+      List files = parquetTableMetadata.getFiles();
+      if (files != null) {
+        for (ParquetFileMetadata file : files) {
+          List rowGroups = file.getRowGroups();
+          rowGroups.removeIf(r -> r.getRowCount() == 0);
+        }
+      }
     }
-
-    // DRILL-5009: Remove the RowGroup if it is empty
-    List files = parquetTableMetadata.getFiles();
-    for (ParquetFileMetadata file : files) {
-      List rowGroups = file.getRowGroups();
-      rowGroups.removeIf(r -> r.getRowCount() == 0);
+    if (newMetadata) {
+      // if new metadata files were created, invalidate the existing metadata context
+      metaContext.clear();
     }
-
-  }
-  if (newMetadata) {
-// if new metadata files w

[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812752#comment-16812752
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

amansinha100 commented on pull request #1723: DRILL-7063: Seperate metadata 
cache file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r273217885
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##
 @@ -149,20 +157,25 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(Map paths,
                                                               MetadataContext metaContext,
                                                               ParquetReaderConfig readerConfig) {
-    if (ignoreReadingMetadata(metaContext, path)) {
-      return null;
-    }
     Metadata metadata = new Metadata(readerConfig);
-    metadata.readBlockMeta(path, false, metaContext, fs);
+    if (paths.isEmpty()) {
+      metaContext.setMetadataCacheCorrupted(true);
+    }
+    for (Path path: paths) {
+      if (ignoreReadingMetadata(metaContext, path)) {
+        return null;
 
 Review comment:
   Please add a comment indicating that the ignore flag is set as part of `readBlockMeta()`, which is called later in this loop, so each iteration of the loop needs to check whether a previous call to readBlockMeta found corrupted metadata.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
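The loop shape the reviewer describes can be sketched as follows (all names here are hypothetical stand-ins, not Drill's Metadata class): a shared context may be flagged corrupted by one iteration's read, and every subsequent iteration must re-check that flag before reading more files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the review comment's point: readBlockMeta() may flag
// the shared context as corrupted, and every later iteration of the loop must
// re-check that flag before reading the next metadata file.
public class MetaLoopSketch {
  static class MetaContext { boolean corrupted; }

  static boolean readBlockMeta(MetaContext ctx, String path) {
    // pretend a file named "bad" corrupts the metadata cache
    if (path.contains("bad")) { ctx.corrupted = true; }
    return !ctx.corrupted;
  }

  static int readAll(MetaContext ctx, List<String> paths) {
    int read = 0;
    for (String p : paths) {
      if (ctx.corrupted) {     // flag may have been set by a *previous* iteration
        return read;
      }
      if (readBlockMeta(ctx, p)) { read++; }
    }
    return read;
  }

  public static void main(String[] args) {
    MetaContext ctx = new MetaContext();
    // Only "a" is read; "bad" corrupts the context and "c" is never attempted.
    System.out.println(readAll(ctx, Arrays.asList("a", "bad", "c"))); // prints 1
  }
}
```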


> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>






[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812742#comment-16812742
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

vvysotskyi commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273211742
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding ConvertCountToDirectScanPrule physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.some(Project.class,
+              RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.any(TableScan.class)), "Agg_on_scan:logical

[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812741#comment-16812741
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

vvysotskyi commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273213654
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding ConvertCountToDirectScanPrule physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.some(Project.class,
+              RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.any(TableScan.class)), "Agg_on_scan:logical

[jira] [Commented] (DRILL-6985) Fix sqlline.bat issues on Windows and add drill-embedded.bat

2019-04-08 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812737#comment-16812737
 ] 

Bridget Bevens commented on DRILL-6985:
---

Hi [~vvysotskyi],

I've updated https://drill.apache.org/docs/installing-drill-on-windows/ 
with this information.

Please let me know if I need to make any other changes.

Thanks,
Bridget

> Fix sqlline.bat issues on Windows and add drill-embedded.bat
> 
>
> Key: DRILL-6985
> URL: https://issues.apache.org/jira/browse/DRILL-6985
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
> Environment: Windows 10
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> *For documentation*
> {{drill-embedded.bat}} was added as a handy script to start Drill on Windows 
> without passing any params.
> Please update the following section: 
> https://drill.apache.org/docs/starting-drill-on-windows/
> Other issues covered in this Jira:
> {{sqlline.bat}} fails in the following cases:
>  1. Specified file in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f /tmp/q.sql
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 2. Specified file path that contains spaces:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f "/tmp/q q.sql"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> q.sql""=="test" was unexpected at this time.
> {noformat}
> 3. Specified query in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -e "select * 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> * was unexpected at this time.
> {noformat}
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -q "select 'a' 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> 'a' was unexpected at this time.
> {noformat}
> 4. Specified custom config location:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config=/tmp/conf
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 5. Specified custom config location with spaces in the path:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config="/tmp/conf test"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> test"" was unexpected at this time.
> {noformat}
> 6. Sqlline was run from non-bin directory:
> {noformat}
> apache-drill-1.15.0>bin\sqlline.bat -u "jdbc:drill:zk=local"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}





[jira] [Updated] (DRILL-6985) Fix sqlline.bat issues on Windows and add drill-embedded.bat

2019-04-08 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6985:
--
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Fix sqlline.bat issues on Windows and add drill-embedded.bat
> 
>
> Key: DRILL-6985
> URL: https://issues.apache.org/jira/browse/DRILL-6985
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
> Environment: Windows 10
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> *For documentation*
> {{drill-embedded.bat}} was added as a handy script to start Drill on Windows 
> without passing any params.
> Please update the following section: 
> https://drill.apache.org/docs/starting-drill-on-windows/
> Other issues covered in this Jira:
> {{sqlline.bat}} fails in the following cases:
>  1. Specified file in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f /tmp/q.sql
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 2. Specified file path that contains spaces:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f "/tmp/q q.sql"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> q.sql""=="test" was unexpected at this time.
> {noformat}
> 3. A query is specified as an argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -e "select * 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> * was unexpected at this time.
> {noformat}
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -q "select 'a' 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> 'a' was unexpected at this time.
> {noformat}
> 4. A custom config location is specified:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config=/tmp/conf
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 5. The custom config location contains spaces in the path:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config="/tmp/conf test"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> test"" was unexpected at this time.
> {noformat}
> 6. Sqlline was run from a non-bin directory:
> {noformat}
> apache-drill-1.15.0>bin\sqlline.bat -u "jdbc:drill:zk=local"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
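Most of the failing cases above come from cmd.exe re-parsing quoted arguments when the batch script expands them. As a hedged illustration only (a hypothetical launcher sketch, not Drill's actual scripts), building the command as a Java argument list via ProcessBuilder sidesteps batch re-quoting, because each list element reaches the child process as a single argument regardless of spaces or `*`:

```java
import java.util.Arrays;
import java.util.List;

public class LaunchSketch {
    // Build the argument list exactly as the user supplied it. ProcessBuilder
    // passes each element as one argument, so spaces and '*' survive intact.
    static List<String> buildArgs(String url, String query) {
        return Arrays.asList("java", "-cp", "jars/*", "sqlline.SqlLine",
                "-u", url, "-e", query);
    }

    public static void main(String[] args) {
        List<String> cmd = buildArgs("jdbc:drill:zk=local",
                "select * from sys.version");
        // A real launcher would run: new ProcessBuilder(cmd).inheritIO().start();
        System.out.println(cmd);
    }
}
```

The class path `jars/*` and the argument names here are assumptions for illustration; the point is that no shell-level quoting round-trip occurs.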


[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2019-04-08 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812734#comment-16812734
 ] 

Bridget Bevens commented on DRILL-5679:
---

Hi [~arina],
 
I've updated https://drill.apache.org/docs/installing-drill-on-windows/ with 
the information here. 
Was this JIRA meant for Drill in distributed mode on Windows, embedded mode on 
Windows, or both?
Currently, we do not have a doc for Drill in distributed mode on Windows, but I can 
add a note to the prerequisites section for Drill in distributed mode.

Thanks,
Bridget

> Document JAVA_HOME requirements for installing Drill in distributed mode
> 
>
> Key: DRILL-5679
> URL: https://issues.apache.org/jira/browse/DRILL-5679
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: Future
>
>
> There is a general requirement that the JAVA_HOME variable should not contain 
> spaces.
> For example, during Drill installation in distributed mode on Windows, a user 
> may see the following error:
> {noformat}
> C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> {noformat}
> There are two options to fix this problem:
> {noformat}
> 1. Install Java in a directory without spaces.
> 2. Replace "Program Files" in your JAVA_HOME variable with progra~1, or progra~2 
> (for the x86 Program Files directory).
> Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
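The constraint above can be checked up front. A minimal sketch (a hypothetical helper, not part of Drill) that flags a JAVA_HOME value containing whitespace before the launch scripts expand it unquoted:

```java
public class JavaHomeCheck {
    // Returns true when the given JAVA_HOME value would break shell scripts
    // that expand it unquoted (i.e., it contains whitespace).
    static boolean hasUnsafeSpaces(String javaHome) {
        return javaHome != null && javaHome.matches(".*\\s.*");
    }

    public static void main(String[] args) {
        System.out.println(hasUnsafeSpaces("C:\\Program Files\\Java\\jdk1.7.0_71")); // true
        System.out.println(hasUnsafeSpaces("C:\\progra~1\\Java\\jdk1.7.0_71"));      // false
    }
}
```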


[jira] [Updated] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2019-04-08 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-5679:
--
Labels: doc-complete  (was: doc-impacting)

> Document JAVA_HOME requirements for installing Drill in distributed mode
> 
>
> Key: DRILL-5679
> URL: https://issues.apache.org/jira/browse/DRILL-5679
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: Future
>
>
> There is a general requirement that the JAVA_HOME variable should not contain 
> spaces.
> For example, during Drill installation in distributed mode on Windows, a user 
> may see the following error:
> {noformat}
> C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> {noformat}
> There are two options to fix this problem:
> {noformat}
> 1. Install Java in a directory without spaces.
> 2. Replace "Program Files" in your JAVA_HOME variable with progra~1, or progra~2 
> (for the x86 Program Files directory).
> Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812640#comment-16812640
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

dvjyothsna commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273161135
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding 
ConvertCountToDirectScanPrule
+ * physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to 
retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for 
Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.some(Project.class,
+RelOptHelper.any(TableScan.class))), 
"Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.any(TableScan.class)), 
"Agg_on_scan:logical

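The javadoc in the review comment above describes rewriting plain COUNT aggregates into a direct scan over cached metadata. A hedged, simplified sketch of that idea (hypothetical names, not Drill's metadata API): count(*) is answered from the summary's total row count, while count(column) subtracts that column's null count, so no data files need to be read:

```java
import java.util.Map;

public class CountFromSummarySketch {
    // Hypothetical stand-in for the Parquet metadata cache summary.
    static final long TOTAL_ROW_COUNT = 1000L;
    static final Map<String, Long> NULL_COUNTS =
            Map.of("column1", 40L, "column2", 0L);

    // count(*) or count(non-nullable-expr): the summary's row count directly.
    static long countStar() {
        return TOTAL_ROW_COUNT;
    }

    // count(column): rows minus that column's nulls, per SQL COUNT semantics.
    static long countColumn(String column) {
        return TOTAL_ROW_COUNT - NULL_COUNTS.getOrDefault(column, 0L);
    }

    public static void main(String[] args) {
        System.out.println(countStar());            // 1000
        System.out.println(countColumn("column1")); // 960
    }
}
```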
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812637#comment-16812637
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

dvjyothsna commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273158196
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding 
ConvertCountToDirectScanPrule
+ * physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to 
retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for 
Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.some(Project.class,
+RelOptHelper.any(TableScan.class))), 
"Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.any(TableScan.class)), 
"Agg_on_scan:logical

[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812626#comment-16812626
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273124323
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding 
ConvertCountToDirectScanPrule
+ * physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to 
retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for 
Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.some(Project.class,
+RelOptHelper.any(TableScan.class))), 
"Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new 
ConvertCountToDirectScanRule(
+  RelOptHelper.some(Aggregate.class,
+RelOptHelper.any(TableScan.class)), 
"Agg_on_scan:logic

[jira] [Updated] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7159:
---
Labels: ready-to-commit  (was: )

> After renaming MAP to STRUCT typeString method still outputs MAP name
> -
>
> Key: DRILL-7159
> URL: https://issues.apache.org/jira/browse/DRILL-7159
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> After renaming MAP to STRUCT typeString method still outputs MAP name.
> Reproduce:
> {noformat}
> apache drill> CREATE or replace SCHEMA
> . .semicolon> (
> . . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
> . . . . . .)> struct_column STRUCT>
> . . . . . .)> )
> . .semicolon>  FOR TABLE dfs.tmp.`text_table`;
> {noformat}
> Error:
> {noformat}
> apache drill> describe schema for table dfs.tmp.`text_table`;
> Error: RESOURCE ERROR: Cannot construct instance of 
> `org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
> [1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
> alternative at input '`struct_column`MAP'
>  at [Source: 
> (org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
> column: 7] (through reference chain: 
> org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])
> Error while accessing table location for [dfs.tmp.text_table]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7160:
--

 Summary: exec.query.max_rows QUERY-level options are shown on 
Profiles tab
 Key: DRILL-7160
 URL: https://issues.apache.org/jira/browse/DRILL-7160
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Kunal Khatua
 Fix For: 1.16.0


As [~arina] has noticed, the option {{exec.query.max_rows}} is shown on the Web 
UI's Profiles tab even when it was not set explicitly. This happens because the 
option is being set internally at the query level.

From the code, it looks like it is set in 
{{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check of whether the 
value differs from the existing one should be added.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
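The suggested guard can be sketched as follows. This is a hypothetical illustration only (the names `queryOptions` and `applyAutoLimit` are invented; Drill's OptionManager works differently): the query-level option is recorded only when the auto-limit actually changes the effective value, so an unchanged option never shows up as query-level in profiles:

```java
import java.util.HashMap;
import java.util.Map;

public class AutoLimitGuardSketch {
    // Hypothetical query-level option store.
    static final Map<String, Long> queryOptions = new HashMap<>();

    // Record the auto-limit only when it differs from the current effective
    // value, so it does not appear as an explicit query-level setting.
    static void applyAutoLimit(long current, long autoLimit) {
        if (autoLimit != current) {
            queryOptions.put("exec.query.max_rows", autoLimit);
        }
    }

    public static void main(String[] args) {
        applyAutoLimit(0L, 0L);   // no-op: value unchanged
        applyAutoLimit(0L, 10L);  // recorded: value changed
        System.out.println(queryOptions);
    }
}
```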


[jira] [Commented] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812579#comment-16812579
 ] 

ASF GitHub Bot commented on DRILL-7159:
---

arina-ielchiieva commented on issue #1741: DRILL-7159: Fix typeString method to 
return correct name for MAP (aka STRUCT)
URL: https://github.com/apache/drill/pull/1741#issuecomment-480907197
 
 
   @vvysotskyi please review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> After renaming MAP to STRUCT typeString method still outputs MAP name
> -
>
> Key: DRILL-7159
> URL: https://issues.apache.org/jira/browse/DRILL-7159
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> After renaming MAP to STRUCT typeString method still outputs MAP name.
> Reproduce:
> {noformat}
> apache drill> CREATE or replace SCHEMA
> . .semicolon> (
> . . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
> . . . . . .)> struct_column STRUCT>
> . . . . . .)> )
> . .semicolon>  FOR TABLE dfs.tmp.`text_table`;
> {noformat}
> Error:
> {noformat}
> apache drill> describe schema for table dfs.tmp.`text_table`;
> Error: RESOURCE ERROR: Cannot construct instance of 
> `org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
> [1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
> alternative at input '`struct_column`MAP'
>  at [Source: 
> (org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
> column: 7] (through reference chain: 
> org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])
> Error while accessing table location for [dfs.tmp.text_table]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812577#comment-16812577
 ] 

ASF GitHub Bot commented on DRILL-7159:
---

arina-ielchiieva commented on pull request #1741: DRILL-7159: Fix typeString 
method to return correct name for MAP (aka STRUCT)
URL: https://github.com/apache/drill/pull/1741
 
 
   Jira [DRILL-7159](https://issues.apache.org/jira/browse/DRILL-7159).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> After renaming MAP to STRUCT typeString method still outputs MAP name
> -
>
> Key: DRILL-7159
> URL: https://issues.apache.org/jira/browse/DRILL-7159
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> After renaming MAP to STRUCT typeString method still outputs MAP name.
> Reproduce:
> {noformat}
> apache drill> CREATE or replace SCHEMA
> . .semicolon> (
> . . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
> . . . . . .)> struct_column STRUCT>
> . . . . . .)> )
> . .semicolon>  FOR TABLE dfs.tmp.`text_table`;
> {noformat}
> Error:
> {noformat}
> apache drill> describe schema for table dfs.tmp.`text_table`;
> Error: RESOURCE ERROR: Cannot construct instance of 
> `org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
> [1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
> alternative at input '`struct_column`MAP'
>  at [Source: 
> (org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
> column: 7] (through reference chain: 
> org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])
> Error while accessing table location for [dfs.tmp.text_table]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7159:

Reviewer: Volodymyr Vysotskyi

> After renaming MAP to STRUCT typeString method still outputs MAP name
> -
>
> Key: DRILL-7159
> URL: https://issues.apache.org/jira/browse/DRILL-7159
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> After renaming MAP to STRUCT typeString method still outputs MAP name.
> Reproduce:
> {noformat}
> apache drill> CREATE or replace SCHEMA
> . .semicolon> (
> . . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
> . . . . . .)> struct_column STRUCT>
> . . . . . .)> )
> . .semicolon>  FOR TABLE dfs.tmp.`text_table`;
> {noformat}
> Error:
> {noformat}
> apache drill> describe schema for table dfs.tmp.`text_table`;
> Error: RESOURCE ERROR: Cannot construct instance of 
> `org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
> [1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
> alternative at input '`struct_column`MAP'
>  at [Source: 
> (org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
> column: 7] (through reference chain: 
> org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])
> Error while accessing table location for [dfs.tmp.text_table]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
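The DRILL-7159 fix concerns the schema serializer emitting the old internal name, which the schema parser can no longer read back. A hedged sketch of the general pattern (hypothetical types; not Drill's actual ColumnMetadata or typeString implementation): the printed SQL keyword is looked up through a rename table instead of reusing the internal enum name directly:

```java
import java.util.Map;

public class TypeStringSketch {
    // Internal type names that were renamed in the SQL surface syntax.
    static final Map<String, String> RENAMES = Map.of("MAP", "STRUCT");

    // Render the SQL keyword for an internal type name, applying renames so
    // the serialized schema can be parsed back (the bug was emitting raw "MAP").
    static String typeString(String internalName) {
        return RENAMES.getOrDefault(internalName, internalName);
    }

    public static void main(String[] args) {
        System.out.println(typeString("MAP"));     // STRUCT
        System.out.println(typeString("VARCHAR")); // VARCHAR
    }
}
```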


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812540#comment-16812540
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273124323
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ * This rule is a logical planning counterpart to the corresponding
+ * ConvertCountToDirectScanPrule physical rule.
+ *
+ * This rule will convert " select count(*) as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ *
+ *    Project(mycount)
+ *        \
+ *    DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ *
+ *    Project(mycount)
+ *        \
+ *    DirectGroupScan ( PojoRecordReader ( columnValueCount ))
+ *
+ * The rule can also be applied if the query contains multiple count expressions:
+ * " select count(column1), count(column2), count(*) from table "
+ *
+ * The rule utilizes the Parquet metadata cache's summary information to retrieve
+ * the total row count and the per-column null count. As such, the rule is only
+ * applicable to Parquet tables, and only if the metadata cache has been created
+ * with the summary information.
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+        RelOptHelper.some(Project.class,
+            RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+        RelOptHelper.any(TableScan.class)), "Agg_on_scan:logic
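
The truncated rule fragment above only registers the match patterns. The core idea it implements can be sketched as follows, under assumed names (totalRowCount, nullCounts, countStar, countColumn are stand-ins for illustration, not Drill or Calcite APIs): once the Parquet metadata summary already holds the total row count and per-column null counts, COUNT queries can be answered directly from metadata instead of scanning data files.

```java
import java.util.Map;

public class CountFromMetadataSketch {
  // Stand-ins for the Parquet metadata cache summary (assumed names).
  static long totalRowCount = 4;
  static Map<String, Long> nullCounts = Map.of("col1", 1L);

  // COUNT(*) is simply the total row count from the summary.
  static long countStar() {
    return totalRowCount;
  }

  // COUNT(column) is total rows minus the null count for that column.
  static long countColumn(String col) {
    return totalRowCount - nullCounts.getOrDefault(col, 0L);
  }

  public static void main(String[] args) {
    System.out.println(countStar());         // 4
    System.out.println(countColumn("col1")); // 3
  }
}
```

A DirectGroupScan then wraps such precomputed values in a PojoRecordReader, so the query plan never touches the table's data files.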

[jira] [Commented] (DRILL-7145) Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI

2019-04-08 Thread Anton Gozhiy (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812537#comment-16812537
 ] 

Anton Gozhiy commented on DRILL-7145:
-

Merged into master with commit 463f01627c298eb0f29396f635ecfbb945800c80.

> Exceptions happened during retrieving values from ValueVector are not being 
> displayed at the Drill Web UI
> -
>
> Key: DRILL-7145
> URL: https://issues.apache.org/jira/browse/DRILL-7145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> *Data:*
> A text file with the following content:
> {noformat}
> Id,col1,col2
> 1,aaa,bbb
> 2,ccc,ddd
> 3,eee
> 4,fff,ggg
> {noformat}
> Note that the record with id 3 has no value for the third column.
> exec.storage.enable_v3_text_reader should be set to false.
> *Submit the query from the Web UI:*
> {code:sql}
> select * from 
> table(dfs.tmp.`/drill/text/test`(type=>'text',lineDelimiter=>'\n',fieldDelimiter=>',',extractHeader=>true))
> {code}
> *Expected result:*
> Exception should happen due to DRILL-4814. It should be properly displayed.
> *Actual result:*
> Incorrect data is returned but without error. Query status: success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7159) After renaming MAP to STRUCT typeString method still outputs MAP name

2019-04-08 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7159:
---

 Summary: After renaming MAP to STRUCT typeString method still 
outputs MAP name
 Key: DRILL-7159
 URL: https://issues.apache.org/jira/browse/DRILL-7159
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.16.0


After renaming MAP to STRUCT typeString method still outputs MAP name.

Reproduce:
{noformat}
apache drill> CREATE or replace SCHEMA
. .semicolon> (
. . . . . .)> varchar_column VARCHAR(10) NOT NULL, 
. . . . . .)> struct_column STRUCT>
. . . . . .)> )
. .semicolon>  FOR TABLE dfs.tmp.`text_table`;
{noformat}

Error:
{noformat}
apache drill> describe schema for table dfs.tmp.`text_table`;
Error: RESOURCE ERROR: Cannot construct instance of 
`org.apache.drill.exec.record.metadata.AbstractColumnMetadata`, problem: Line 
[1], position [16], offending symbol [@1,16:18='MAP',<26>,1:16]: no viable 
alternative at input '`struct_column`MAP'
 at [Source: 
(org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream); line: 14, 
column: 7] (through reference chain: 
org.apache.drill.exec.record.metadata.schema.SchemaContainer["schema"]->org.apache.drill.exec.record.metadata.TupleSchema["columns"]->java.util.ArrayList[1])

Error while accessing table location for [dfs.tmp.text_table]

{noformat}
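
The failure above happens because the schema string is generated with the old type keyword and then fails to re-parse. A minimal sketch of the fix (stand-in code, not the actual Drill implementation): the method that renders a column's type into DDL text must emit the renamed keyword so the generated schema round-trips through the parser.

```java
public class TypeStringSketch {
  // Stand-in for rendering a minor type name into schema DDL text.
  // "MAP" is Drill's internal name for the struct-like type.
  static String typeString(String minorType) {
    // The bug: returning the internal name prints "MAP", which the schema
    // parser no longer accepts. The fix maps it to the renamed keyword.
    if ("MAP".equals(minorType)) {
      return "STRUCT";
    }
    return minorType;
  }

  public static void main(String[] args) {
    System.out.println(typeString("MAP"));     // STRUCT
    System.out.println(typeString("VARCHAR")); // VARCHAR
  }
}
```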



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812515#comment-16812515
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

amansinha100 commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-480885230
 
 
   @vvysotskyi thanks for the explanations. I am mostly good with the PR, so 
it's a +1 from me, but it would be good if @gparai could take a quick look at the 
statistics-related changes. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes were introduced for metadata usage. 
> These classes may be reused in other GroupScan instances to reduce heap 
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so that within the scope of a single query they 
> can be reused.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty string in SqlLine

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7158:

Description: 
null values for varchar, interval, boolean are displayed as empty string in 
SqlLine.
Caused by a SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
A possible workaround is to set nullValue to a case other than lowercase: 
{{!set nullValue Null}}.

Should be fixed in the next SqlLine upgrade (to 1.8.0), once fixed in SqlLine 
itself.


  was:
null values for varchar, interval, boolean are displayed as empty in SqLine.
Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
Possible workaround to set nullValue other case than lower: {{!set nullValue 
Null}}.

Should be fixed in the next SqlLine upgrade (to 1.8.0) when prior fixed in 
SqlLine.



> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine
> ---
>
> Key: DRILL-7158
> URL: https://issues.apache.org/jira/browse/DRILL-7158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine.
> Caused by a SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
> A possible workaround is to set nullValue to a case other than lowercase: 
> {{!set nullValue Null}}.
> Should be fixed in the next SqlLine upgrade (to 1.8.0), once fixed in 
> SqlLine itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty string in SqlLine

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7158:

Summary: null values for varchar, interval, boolean are displayed as empty 
string in SqlLine  (was: null values for varchar, interval, boolean are 
displayed as empty in SqLine)

> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine
> ---
>
> Key: DRILL-7158
> URL: https://issues.apache.org/jira/browse/DRILL-7158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> null values for varchar, interval, boolean are displayed as empty in SqLine.
> Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
> Possible workaround to set nullValue other case than lower: {{!set nullValue 
> Null}}.
> Should be fixed in the next SqlLine upgrade (to 1.8.0) when prior fixed in 
> SqlLine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty in SqLine

2019-04-08 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-7158:
---

 Summary: null values for varchar, interval, boolean are displayed 
as empty in SqLine
 Key: DRILL-7158
 URL: https://issues.apache.org/jira/browse/DRILL-7158
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Arina Ielchiieva
 Fix For: 1.17.0


null values for varchar, interval, boolean are displayed as empty in SqLine.
Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
Possible workaround to set nullValue other case than lower: {{!set nullValue 
Null}}.

Should be fixed in the next SqlLine upgrade (to 1.8.0) when prior fixed in 
SqlLine.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7144) sqlline option : !set useLineContinuation false, fails with ParseException

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7144.
-
Resolution: Fixed

> sqlline option : !set useLineContinuation false, fails with ParseException
> --
>
> Key: DRILL-7144
> URL: https://issues.apache.org/jira/browse/DRILL-7144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.15.0
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>Priority: Major
>
> The sqlline option {{!set useLineContinuation false}} does not work as 
> intended; a ParseException is returned instead.
> On mapr-drill-1.13.0 we hit the below exception:
> {noformat}
> 0: jdbc:drill:drillbit=drill-abcd-dev.dev.schw> !set useLineContinuation false
> Error setting configuration: useLineContinuation: 
> java.lang.IllegalArgumentException: No method matching 
> "setuseLineContinuation" was found in sqlline.SqlLineOpts.
> {noformat}
> It does not work on drill-1.15.0-mapr-r1
> git.branch=drill-1.15.0-mapr-r1
> git.commit.id=ebc9fe49d4477b04701fdd81884d5a0b748a13ae
> {noformat}
> [test@test-ab bin]# ./sqlline -u 
> "jdbc:drill:schema=dfs.tmp;auth=MAPRSASL;drillbit=test-ab.qa.lab" -n mapr -p 
> mapr
> Apache Drill 1.15.0.3-mapr
> "Start your SQL engine."
> 0: jdbc:drill:schema=dfs.tmp> !set useLineContinuation false
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.version
> > select * from sys.memory
> Error: PARSE ERROR: Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  "NATURAL" ...
>  "JOIN" ...
>  "INNER" ...
>  "LEFT" ...
>  "RIGHT" ...
>  "FULL" ...
>  "CROSS" ...
>  "," ...
>  "OUTER" ...
>  "EXTEND" ...
>  "(" ...
>  "MATCH_RECOGNIZE" ...
>  "AS" ...
>   ...
>   ...
>   ...
>   ...
>   ...
>  "TABLESAMPLE" ...
>  "WHERE" ...
>  "GROUP" ...
>  "HAVING" ...
>  "WINDOW" ...
>  "UNION" ...
>  "INTERSECT" ...
>  "EXCEPT" ...
>  "MINUS" ...
>  "." ...
>  "[" ...
> SQL Query select * from sys.version
> select * from sys.memory
> ^
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 on test-ab.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:185) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:138)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:110)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:76)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_151]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_151]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered 
> "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  ...
>  "[" ...
> at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.convertException(DrillParserImpl.java:350)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.normalizeException(DrillParserImpl.java:131)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:137) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:162) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:177) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  ... 8 common frames omitted
> Caused by: org.apache.drill.exec.planner.sql.parser.impl.ParseException: 
> Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  "NATURAL" ...
>  ...
>  ...
>  "[" ...
> at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.generateParseException(DrillParse

[jira] [Updated] (DRILL-7144) sqlline option : !set useLineContinuation false, fails with ParseException

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7144:

Fix Version/s: 1.16.0

> sqlline option : !set useLineContinuation false, fails with ParseException
> --
>
> Key: DRILL-7144
> URL: https://issues.apache.org/jira/browse/DRILL-7144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.15.0
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> The sqlline option {{!set useLineContinuation false}} does not work as 
> intended; a ParseException is returned instead.
> On mapr-drill-1.13.0 we hit the below exception:
> {noformat}
> 0: jdbc:drill:drillbit=drill-abcd-dev.dev.schw> !set useLineContinuation false
> Error setting configuration: useLineContinuation: 
> java.lang.IllegalArgumentException: No method matching 
> "setuseLineContinuation" was found in sqlline.SqlLineOpts.
> {noformat}
> It does not work on drill-1.15.0-mapr-r1
> git.branch=drill-1.15.0-mapr-r1
> git.commit.id=ebc9fe49d4477b04701fdd81884d5a0b748a13ae
> {noformat}
> [test@test-ab bin]# ./sqlline -u 
> "jdbc:drill:schema=dfs.tmp;auth=MAPRSASL;drillbit=test-ab.qa.lab" -n mapr -p 
> mapr
> Apache Drill 1.15.0.3-mapr
> "Start your SQL engine."
> 0: jdbc:drill:schema=dfs.tmp> !set useLineContinuation false
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.version
> > select * from sys.memory
> Error: PARSE ERROR: Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  "NATURAL" ...
>  "JOIN" ...
>  "INNER" ...
>  "LEFT" ...
>  "RIGHT" ...
>  "FULL" ...
>  "CROSS" ...
>  "," ...
>  "OUTER" ...
>  "EXTEND" ...
>  "(" ...
>  "MATCH_RECOGNIZE" ...
>  "AS" ...
>   ...
>   ...
>   ...
>   ...
>   ...
>  "TABLESAMPLE" ...
>  "WHERE" ...
>  "GROUP" ...
>  "HAVING" ...
>  "WINDOW" ...
>  "UNION" ...
>  "INTERSECT" ...
>  "EXCEPT" ...
>  "MINUS" ...
>  "." ...
>  "[" ...
> SQL Query select * from sys.version
> select * from sys.memory
> ^
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 on test-ab.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:185) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:138)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:110)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:76)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_151]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_151]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered 
> "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  ...
>  "[" ...
> at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.convertException(DrillParserImpl.java:350)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.normalizeException(DrillParserImpl.java:131)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:137) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:162) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:177) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  ... 8 common frames omitted
> Caused by: org.apache.drill.exec.planner.sql.parser.impl.ParseException: 
> Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  "NATURAL" ...
>  ...
>  ...
>  "[" ...
> at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl

[jira] [Commented] (DRILL-7144) sqlline option : !set useLineContinuation false, fails with ParseException

2019-04-08 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812495#comment-16812495
 ] 

Arina Ielchiieva commented on DRILL-7144:
-

Not reproduced after SqlLine upgrade 
(https://issues.apache.org/jira/browse/DRILL-6989).

> sqlline option : !set useLineContinuation false, fails with ParseException
> --
>
> Key: DRILL-7144
> URL: https://issues.apache.org/jira/browse/DRILL-7144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.15.0
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> The sqlline option {{!set useLineContinuation false}} does not work as 
> intended; a ParseException is returned instead.
> On mapr-drill-1.13.0 we hit the below exception:
> {noformat}
> 0: jdbc:drill:drillbit=drill-abcd-dev.dev.schw> !set useLineContinuation false
> Error setting configuration: useLineContinuation: 
> java.lang.IllegalArgumentException: No method matching 
> "setuseLineContinuation" was found in sqlline.SqlLineOpts.
> {noformat}
> It does not work on drill-1.15.0-mapr-r1
> git.branch=drill-1.15.0-mapr-r1
> git.commit.id=ebc9fe49d4477b04701fdd81884d5a0b748a13ae
> {noformat}
> [test@test-ab bin]# ./sqlline -u 
> "jdbc:drill:schema=dfs.tmp;auth=MAPRSASL;drillbit=test-ab.qa.lab" -n mapr -p 
> mapr
> Apache Drill 1.15.0.3-mapr
> "Start your SQL engine."
> 0: jdbc:drill:schema=dfs.tmp> !set useLineContinuation false
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.version
> > select * from sys.memory
> Error: PARSE ERROR: Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  "NATURAL" ...
>  "JOIN" ...
>  "INNER" ...
>  "LEFT" ...
>  "RIGHT" ...
>  "FULL" ...
>  "CROSS" ...
>  "," ...
>  "OUTER" ...
>  "EXTEND" ...
>  "(" ...
>  "MATCH_RECOGNIZE" ...
>  "AS" ...
>   ...
>   ...
>   ...
>   ...
>   ...
>  "TABLESAMPLE" ...
>  "WHERE" ...
>  "GROUP" ...
>  "HAVING" ...
>  "WINDOW" ...
>  "UNION" ...
>  "INTERSECT" ...
>  "EXCEPT" ...
>  "MINUS" ...
>  "." ...
>  "[" ...
> SQL Query select * from sys.version
> select * from sys.memory
> ^
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 on test-ab.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> [Error Id: 067d5402-b965-4660-8981-34491ab5a051 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:185) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:138)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:110)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:76)
>  [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_151]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_151]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered 
> "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "FETCH" ...
>  ...
>  "[" ...
> at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.convertException(DrillParserImpl.java:350)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at 
> org.apache.drill.exec.planner.sql.parser.impl.DrillParserImpl.normalizeException(DrillParserImpl.java:131)
>  ~[drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:137) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:162) 
> ~[calcite-core-1.17.0-drill-r2.jar:1.17.0-drill-r2]
>  at 
> org.apache.drill.exec.planner.sql.SqlConverter.parse(SqlConverter.java:177) 
> [drill-java-exec-1.15.0.3-mapr.jar:1.15.0.3-mapr]
>  ... 8 common frames omitted
> Caused by: org.apache.drill.exec.planner.sql.parser.impl.ParseException: 
> Encountered "select" at line 2, column 1.
> Was expecting one of:
>  
>  "ORDER" ...
>  "LIMIT" ...
>  "OFFSET" ...
>  "

[jira] [Updated] (DRILL-6835) Schema Provision using File / Table Function

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6835:

Description: 
Schema Provision using File / Table Function design document:

https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing

Phase 1 functional specification - 
https://docs.google.com/document/d/1ExVgx2FDqxAz5GTqyWt-_1-UqwRSTGLGEYuc8gsESG8/edit?usp=sharing

  was:
Schema Provision using File / Table Function design document:

https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing

Phase 1 design document - 
https://docs.google.com/document/d/1ExVgx2FDqxAz5GTqyWt-_1-UqwRSTGLGEYuc8gsESG8/edit?usp=sharing


> Schema Provision using File / Table Function
> 
>
> Key: DRILL-6835
> URL: https://issues.apache.org/jira/browse/DRILL-6835
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Schema Provision using File / Table Function design document:
> https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit?usp=sharing
> Phase 1 functional specification - 
> https://docs.google.com/document/d/1ExVgx2FDqxAz5GTqyWt-_1-UqwRSTGLGEYuc8gsESG8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7157:

Issue Type: Sub-task  (was: Bug)
Parent: DRILL-6835

> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing, we should throw a UserException 
> rather than a system error:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> {noformat}
> After the changes, the exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}
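
The before/after error messages above amount to catching the internal exception at the parsing boundary and re-raising it as a user-facing parse error that carries the offending schema string. A minimal sketch of that pattern, with stand-in exception classes (SchemaParsingException and UserException here are simplified placeholders, not the actual Drill types):

```java
public class WrapParseErrorSketch {
  // Stand-in for the internal exception thrown by the schema parser.
  static class SchemaParsingException extends RuntimeException {
    SchemaParsingException(String msg) { super(msg); }
  }

  // Stand-in for a user-facing exception carrying a categorized message.
  static class UserException extends RuntimeException {
    UserException(String msg, Throwable cause) { super(msg, cause); }
  }

  static void parseSchema(String schema) {
    try {
      // Simulate the parser rejecting the bad type name in "(col iint)".
      throw new SchemaParsingException(
          "Line [1], position [5]: no viable alternative at input 'coliint'");
    } catch (SchemaParsingException e) {
      // The wrap: the user sees a PARSE ERROR with the schema context
      // instead of a SYSTEM ERROR, while the cause is preserved.
      throw new UserException(
          "PARSE ERROR: " + e.getMessage() + "\nSchema: " + schema, e);
    }
  }

  public static void main(String[] args) {
    try {
      parseSchema("(col iint)");
    } catch (UserException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The key design point is that the original diagnostic text survives unchanged; only its classification and presentation to the user improve.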



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7045) UDF string_binary java.lang.IndexOutOfBoundsException:

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812327#comment-16812327
 ] 

ASF GitHub Bot commented on DRILL-7045:
---

arina-ielchiieva commented on issue #1734: DRILL-7045: UDF string_binary 
java.lang.IndexOutOfBoundsException
URL: https://github.com/apache/drill/pull/1734#issuecomment-480780319
 
 
   Merged with commits:
   a1986a3fec1634812712e47be0be2565b303ea2d
   771fd270b684bbf388c0e2fa10b359eba3dfdb7c
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UDF string_binary java.lang.IndexOutOfBoundsException:
> --
>
> Key: DRILL-7045
> URL: https://issues.apache.org/jira/browse/DRILL-7045
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Given a large field like
>  
> cat input.json
> { "col0": 
> "lajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsal
k;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlk

[jira] [Commented] (DRILL-7045) UDF string_binary java.lang.IndexOutOfBoundsException:

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812328#comment-16812328
 ] 

ASF GitHub Bot commented on DRILL-7045:
---

arina-ielchiieva commented on pull request #1734: DRILL-7045: UDF string_binary 
java.lang.IndexOutOfBoundsException
URL: https://github.com/apache/drill/pull/1734
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UDF string_binary java.lang.IndexOutOfBoundsException:
> --
>
> Key: DRILL-7045
> URL: https://issues.apache.org/jira/browse/DRILL-7045
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Given a large field like
>  
> cat input.json
> { "col0": 
> "lajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsal
k;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkf

[jira] [Commented] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812323#comment-16812323
 ] 

ASF GitHub Bot commented on DRILL-7157:
---

asfgit commented on pull request #1740: DRILL-7157: Wrap SchemaParsingException 
into UserException when creating schema
URL: https://github.com/apache/drill/pull/1740
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing we should throw UserException 
> but not system:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> {noformat}
> After changes exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812322#comment-16812322
 ] 

ASF GitHub Bot commented on DRILL-7143:
---

asfgit commented on pull request #1726: DRILL-7143: Support default value for 
empty columns
URL: https://github.com/apache/drill/pull/1726
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes the the base functional revealed through 
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812324#comment-16812324
 ] 

ASF GitHub Bot commented on DRILL-7049:
---

asfgit commented on pull request #1739: DRILL-7049: REST API returns the 
toString of byte arrays (VARBINARY types)
URL: https://github.com/apache/drill/pull/1739
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> REST API returns the toString of byte arrays (VARBINARY types)
> --
>
> Key: DRILL-7049
> URL: https://issues.apache.org/jira/browse/DRILL-7049
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Web Server
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Doing a query using the REST API will return VARBINARY columns as a Java byte 
> array hashcode instead of the actual data of the VARBINARY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-5239) Drill text reader reports wrong results when column value starts with '#'

2019-04-08 Thread Hefei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hefei Li reassigned DRILL-5239:
---

Assignee: (was: Hefei Li)

> Drill text reader reports wrong results when column value starts with '#'
> -
>
> Key: DRILL-5239
> URL: https://issues.apache.org/jira/browse/DRILL-5239
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Priority: Blocker
>  Labels: doc-impacting
> Fix For: Future
>
>
> git.commit.id.abbrev=2af709f
> Data Set :
> {code}
> D|32
> 8h|234
> ;#|3489
> ^$*(|308
> #|98
> {code}
> Wrong Result : (Last row is missing)
> {code}
> select columns[0] as col1, columns[1] as col2 from 
> dfs.`/drill/testdata/wtf2.tbl`;
> +---+---+
> | col1  | col2  |
> +---+---+
> | D | 32|
> | 8h| 234   |
> | ;#| 3489  |
> | ^$*(  | 308   |
> +---+---+
> 4 rows selected (0.233 seconds)
> {code}
> The issue does not however happen with a parquet file



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812299#comment-16812299
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#discussion_r272945631
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/metastore/ColumnStatisticsKind.java
 ##
 @@ -106,6 +107,53 @@ public boolean isValueStatistic() {
 public boolean isExact() {
   return true;
 }
+  },
+
+  /**
+   * Column statistics kind which represents number of non-null values for the 
specific column.
+   */
+  NON_NULL_COUNT(Statistic.NNROWCOUNT) {
+@Override
+public Double mergeStatistics(List 
statisticsList) {
+  double nonNullRowCount = 0;
+  for (ColumnStatistics statistics : statisticsList) {
+Double nnRowCount = (Double) statistics.getStatistic(this);
+if (nnRowCount != null) {
+  nonNullRowCount += nnRowCount;
+}
+  }
+  return nonNullRowCount;
+}
+  },
+
+  /**
+   * Column statistics kind which represents number of distinct values for the 
specific column.
+   */
+  NVD(Statistic.NDV) {
+@Override
+public Object mergeStatistics(List 
statisticsList) {
+  throw new UnsupportedOperationException("Cannot merge statistics for 
NDV");
+}
+  },
+
+  /**
+   * Column statistics kind which width of the specific column.
+   */
+  AVG_WIDTH(Statistic.AVG_WIDTH) {
+@Override
+public Object mergeStatistics(List 
statisticsList) {
+  throw new UnsupportedOperationException("Cannot merge statistics for 
avg_width");
+}
+  },
+
+  /**
+   * Column statistics kind which width of the specific column.
 
 Review comment:
   Thanks, missed it during the rebase onto the master.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852 were introduced new classes for metadata usage. 
> These classes may be reused in other GroupScan instances to preserve heap 
> usage for the case when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812297#comment-16812297
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#discussion_r272945731
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/metastore/ColumnStatisticsKind.java
 ##
 @@ -106,6 +107,53 @@ public boolean isValueStatistic() {
 public boolean isExact() {
   return true;
 }
+  },
+
+  /**
+   * Column statistics kind which represents number of non-null values for the 
specific column.
+   */
+  NON_NULL_COUNT(Statistic.NNROWCOUNT) {
+@Override
+public Double mergeStatistics(List 
statisticsList) {
+  double nonNullRowCount = 0;
+  for (ColumnStatistics statistics : statisticsList) {
+Double nnRowCount = (Double) statistics.getStatistic(this);
+if (nnRowCount != null) {
+  nonNullRowCount += nnRowCount;
+}
+  }
+  return nonNullRowCount;
+}
+  },
+
+  /**
+   * Column statistics kind which represents number of distinct values for the 
specific column.
+   */
+  NVD(Statistic.NDV) {
+@Override
+public Object mergeStatistics(List 
statisticsList) {
+  throw new UnsupportedOperationException("Cannot merge statistics for 
NDV");
+}
+  },
+
+  /**
+   * Column statistics kind which width of the specific column.
+   */
+  AVG_WIDTH(Statistic.AVG_WIDTH) {
+@Override
+public Object mergeStatistics(List 
statisticsList) {
+  throw new UnsupportedOperationException("Cannot merge statistics for 
avg_width");
+}
+  },
+
+  /**
+   * Column statistics kind which width of the specific column.
+   */
+  HISTOGRAM("histogram") {
+@Override
+public Object mergeStatistics(List 
statisticsList) {
+  throw new UnsupportedOperationException("Cannot merge statistics for 
avg_width");
 
 Review comment:
   Thanks, fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852 were introduced new classes for metadata usage. 
> These classes may be reused in other GroupScan instances to preserve heap 
> usage for the case when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812296#comment-16812296
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#discussion_r272954092
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/MetadataProviderManager.java
 ##
 @@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import org.apache.drill.exec.planner.common.DrillStatsTable;
+import org.apache.drill.exec.record.metadata.schema.SchemaProvider;
+
+/**
+ * Base interface for passing and obtaining {@link SchemaProvider}, {@link 
DrillStatsTable} and
+ * {@link TableMetadataProvider}, responsible for creating required
+ * {@link TableMetadataProviderBuilder} which constructs required {@link 
TableMetadataProvider}
+ * based on specified providers
+ */
+public interface MetadataProviderManager {
 
 Review comment:
   `MetadataProviderManager` exposes `SchemaProvider` and `StatsProvider` in 
order to pass them into builder for constructing `TableMetadataProvider`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852 were introduced new classes for metadata usage. 
> These classes may be reused in other GroupScan instances to preserve heap 
> usage for the case when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812298#comment-16812298
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#discussion_r272964601
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java
 ##
 @@ -49,9 +50,7 @@
   private final String userName;
   private GroupScan scan;
   private SessionOptionManager options;
-  // Stores the statistics(rowcount, NDV etc.) associated with the table
-  private DrillStatsTable statsTable;
-  private TupleMetadata schema;
+  private MetadataProviderManager metadataProviderManager;
 
 Review comment:
   This solution wasn't a result of following a particular Design Pattern. It 
was done in order to allow caching of `MetadataProvider` and preserve lazy 
initialization. `MetadataProviderManager` will be stored to the cache and once 
`MetadataProvider` is created, `MetadataProviderManager` will receive a link to 
it and it will be reused for constructing all further `MetadataProvider` 
instances.
   
   Yes, that's correct that "there's a 1-to-many relationship between 
Manager-->Provider and a 1-to-1 relationship between a Provider and its 
associated DrillTable".
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852 were introduced new classes for metadata usage. 
> These classes may be reused in other GroupScan instances to preserve heap 
> usage for the case when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812274#comment-16812274
 ] 

ASF GitHub Bot commented on DRILL-7049:
---

agozhiy commented on issue #1739: DRILL-7049: REST API returns the toString of 
byte arrays (VARBINARY types)
URL: https://github.com/apache/drill/pull/1739#issuecomment-480757987
 
 
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> REST API returns the toString of byte arrays (VARBINARY types)
> --
>
> Key: DRILL-7049
> URL: https://issues.apache.org/jira/browse/DRILL-7049
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Web Server
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Doing a query using the REST API will return VARBINARY columns as a Java byte 
> array hashcode instead of the actual data of the VARBINARY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7157:
---
Labels: ready-to-commit  (was: )

> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing we should throw UserException 
> but not system:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> {noformat}
> After changes exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7049:

Labels: ready-to-commit  (was: )

> REST API returns the toString of byte arrays (VARBINARY types)
> --
>
> Key: DRILL-7049
> URL: https://issues.apache.org/jira/browse/DRILL-7049
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Web Server
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Doing a query using the REST API will return VARBINARY columns as a Java byte 
> array hashcode instead of the actual data of the VARBINARY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812194#comment-16812194
 ] 

ASF GitHub Bot commented on DRILL-7049:
---

arina-ielchiieva commented on issue #1739: DRILL-7049: REST API returns the 
toString of byte arrays (VARBINARY types)
URL: https://github.com/apache/drill/pull/1739#issuecomment-480720296
 
 
   +1, LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> REST API returns the toString of byte arrays (VARBINARY types)
> --
>
> Key: DRILL-7049
> URL: https://issues.apache.org/jira/browse/DRILL-7049
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Web Server
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Doing a query using the REST API will return VARBINARY columns as a Java byte 
> array hashcode instead of the actual data of the VARBINARY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7045) UDF string_binary java.lang.IndexOutOfBoundsException:

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812191#comment-16812191
 ] 

ASF GitHub Bot commented on DRILL-7045:
---

arina-ielchiieva commented on issue #1734: DRILL-7045: UDF string_binary 
java.lang.IndexOutOfBoundsException
URL: https://github.com/apache/drill/pull/1734#issuecomment-480719700
 
 
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UDF string_binary java.lang.IndexOutOfBoundsException:
> --
>
> Key: DRILL-7045
> URL: https://issues.apache.org/jira/browse/DRILL-7045
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Given a large field like
>  
> cat input.json
> { "col0": 
> "lajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsal
k;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjjflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjasdlfjsalk;dfjasdlkfjlajsldfjlkajflksdjfjas

[jira] [Updated] (DRILL-7143) Enforce column-level constraints when using a schema

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7143:

Labels: ready-to-commit  (was: )

> Enforce column-level constraints when using a schema
> 
>
> Key: DRILL-7143
> URL: https://issues.apache.org/jira/browse/DRILL-7143
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through
> additional testing.
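The two column-level behaviors described above (strict type enforcement and default-value substitution) can be sketched as follows. This is a minimal Python illustration, not Drill's reader code; the `ColumnSchema` class and every name in it are invented for the example:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ColumnSchema:
    name: str
    type: type                 # declared type, enforced exactly when strict
    strict: bool = False
    required: bool = False
    default: Optional[Any] = None

def read_value(col: ColumnSchema, raw: Optional[str]) -> Any:
    # Required column with a default: substitute the default for a missing
    # value instead of falling back to 0.
    if raw is None:
        if col.required and col.default is not None:
            return col.default
        return None
    try:
        return col.type(raw)
    except (TypeError, ValueError):
        if col.strict:
            # Strict column: the declared type must apply, or the read fails.
            raise ValueError(
                f"column {col.name!r}: cannot read {raw!r} as {col.type.__name__}")
        return raw  # lenient column: fall back to the raw text

age = ColumnSchema("age", int, strict=True, required=True, default=-1)
print(read_value(age, "42"))   # 42
print(read_value(age, None))   # -1
```

The sketch only illustrates the two rules stated in the issue; the real enforcement happens inside Drill's column readers.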



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812187#comment-16812187
 ] 

ASF GitHub Bot commented on DRILL-7049:
---

vdiravka commented on pull request #1672: DRILL-7049 return VARBINARY as a 
string with escaped non printable bytes
URL: https://github.com/apache/drill/pull/1672
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> REST API returns the toString of byte arrays (VARBINARY types)
> --
>
> Key: DRILL-7049
> URL: https://issues.apache.org/jira/browse/DRILL-7049
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server, Web Server
>Affects Versions: 1.15.0
>Reporter: jean-claude
>Assignee: jean-claude
>Priority: Minor
> Fix For: 1.16.0
>
>
> Doing a query using the REST API will return VARBINARY columns as a Java byte 
> array hashcode instead of the actual data of the VARBINARY.
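The fix named in the pull request title (returning VARBINARY as a string with escaped non-printable bytes) can be sketched like this; a hedged Python illustration of the escaping idea, not the actual Drill web-server code:

```python
def escape_varbinary(data: bytes) -> str:
    """Render bytes as text: printable ASCII passes through; everything
    else (and the backslash itself) is escaped as \\xNN."""
    out = []
    for b in data:
        if 32 <= b <= 126 and b != 0x5C:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)

print(escape_varbinary(b"abc\x00\xff"))  # abc\x00\xff
```

Without some such rendering, serializing a Java byte array via its default `toString` yields only the type-and-hashcode form (e.g. `[B@...`), which is the behavior this issue reports.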





[jira] [Updated] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7157:

Description: 
When there is an error during schema parsing, we should throw a UserException
rather than a SYSTEM error:
{noformat}
apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], offending 
symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
{noformat}

After the change, the exception will be the following:
{noformat}
apache drill> create or replace schema (col iint) for table dfs.tmp.text_table;
Error: PARSE ERROR: Line [1], position [5], offending symbol 
[@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'

Schema: (col iint)
{noformat}


  was:
When there is an error during schema parsing, we should throw a UserException
rather than a SYSTEM error:
{noformat}
apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], offending 
symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'


Please, refer to logs for more information.
{noformat}

After the change, the exception will be the following:
{noformat}
apache drill> create or replace schema (col iint) for table dfs.tmp.text_table;
Error: PARSE ERROR: Line [1], position [5], offending symbol 
[@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'

Schema: (col iint)
{noformat}



> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing, we should throw a UserException
> rather than a SYSTEM error:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> {noformat}
> After the change, the exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}
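The wrapping described above can be sketched as follows; this is a minimal Python analogue of the change, with invented class shapes (Drill's actual UserException is Java code with a builder-style API):

```python
class SchemaParsingException(Exception):
    """Internal parser error, analogous to Drill's SchemaParsingException."""

class UserException(Exception):
    """User-facing error labeled with an error type; loosely modeled on
    Drill's UserException (names and shape here are illustrative)."""
    def __init__(self, error_type: str, message: str, context: str = ""):
        self.error_type = error_type
        text = f"{error_type} ERROR: {message}"
        if context:
            text += f"\n\n{context}"
        super().__init__(text)

def parse_schema(text: str):
    # Stand-in parser that always rejects its input, the way the real
    # parser rejects the unknown type 'iint'.
    raise SchemaParsingException("no viable alternative at input 'coliint'")

def create_schema(text: str):
    try:
        return parse_schema(text)
    except SchemaParsingException as e:
        # Wrap the internal error so the user sees PARSE ERROR rather than
        # SYSTEM ERROR, along with the schema string that failed.
        raise UserException("PARSE", str(e), context=f"Schema: {text}") from e

try:
    create_schema("(col iint)")
except UserException as e:
    print(e)
```

Chaining with `raise ... from e` preserves the original parser error as the cause, mirroring how the wrapped Java exception keeps the underlying SchemaParsingException available for logs.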





[jira] [Commented] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812184#comment-16812184
 ] 

ASF GitHub Bot commented on DRILL-7157:
---

arina-ielchiieva commented on pull request #1740: DRILL-7157: Wrap 
SchemaParsingException into UserException when creating schema
URL: https://github.com/apache/drill/pull/1740
 
 
   Jira - [DRILL-7157](https://issues.apache.org/jira/browse/DRILL-7157)
 



> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing, we should throw a UserException
> rather than a SYSTEM error:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> Please, refer to logs for more information.
> {noformat}
> After the change, the exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}





[jira] [Commented] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812186#comment-16812186
 ] 

ASF GitHub Bot commented on DRILL-7157:
---

arina-ielchiieva commented on issue #1740: DRILL-7157: Wrap 
SchemaParsingException into UserException when creating schema
URL: https://github.com/apache/drill/pull/1740#issuecomment-480718133
 
 
   @vvysotskyi please review.
 



> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing, we should throw a UserException
> rather than a SYSTEM error:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> Please, refer to logs for more information.
> {noformat}
> After the change, the exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}





[jira] [Updated] (DRILL-7157) Wrap SchemaParsingException into UserException when creating schema

2019-04-08 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7157:

Description: 
When there is an error during schema parsing, we should throw a UserException
rather than a SYSTEM error:
{noformat}
apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], offending 
symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'


Please, refer to logs for more information.
{noformat}

After the change, the exception will be the following:
{noformat}
apache drill> create or replace schema (col iint) for table dfs.tmp.text_table;
Error: PARSE ERROR: Line [1], position [5], offending symbol 
[@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'

Schema: (col iint)
{noformat}


  was:
When there is an error during schema parsing, we should throw a UserException
rather than a SYSTEM error:
{noformat}
apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], offending 
symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'


Please, refer to logs for more information.
{noformat}



> Wrap SchemaParsingException into UserException when creating schema
> ---
>
> Key: DRILL-7157
> URL: https://issues.apache.org/jira/browse/DRILL-7157
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.16.0
>
>
> When there is an error during schema parsing, we should throw a UserException
> rather than a SYSTEM error:
> {noformat}
> apache drill>create or replace schema (col iint) for table dfs.tmp.text_table;
> Error: SYSTEM ERROR: SchemaParsingException: Line [1], position [5], 
> offending symbol [@2,5:8='iint',<47>,1:5]: no viable alternative at input 
> 'coliint'
> Please, refer to logs for more information.
> {noformat}
> After the change, the exception will be the following:
> {noformat}
> apache drill> create or replace schema (col iint) for table 
> dfs.tmp.text_table;
> Error: PARSE ERROR: Line [1], position [5], offending symbol 
> [@2,5:8='iint',<47>,1:5]: no viable alternative at input 'coliint'
> Schema: (col iint)
> {noformat}


