[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815019#comment-16815019
 ] 

ASF GitHub Bot commented on DRILL-7166:
---

amansinha100 commented on pull request #1745: DRILL-7166: Count query with 
wildcard should skip reading of metadata summary file
URL: https://github.com/apache/drill/pull/1745#discussion_r274237088
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/planner/logical/TestConvertCountToDirectScan.java
 ##
 @@ -238,4 +238,40 @@ public void testCountsWithMetadataCacheSummaryAndDirPruning() throws Exception {
       test("drop table if exists %s", tableName);
     }
   }
+
+  @Test
+  public void testCountsWithWildCard() throws Exception {
+    test("use dfs.tmp");
+    String tableName = "parquet_table_counts";
+
+    try {
+      for (int i = 0; i < 10; i++) {
+        test(String.format("create table `%s/12/%s` as select * from cp.`tpch/orders.parquet`", tableName, i));
 
 Review comment:
   The test does not actually need a large table, so could you use a smaller 
table (like nation) here? There are 13 CTAS statements, and doing 
a CTAS on 'orders' adds extra time to the test. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tests doing count(* ) with wildcards in table name are querying metadata 
> cache and returning wrong results
> --
>
> Key: DRILL-7166
> URL: https://issues.apache.org/jira/browse/DRILL-7166
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Venkata Jyothsna Donapati
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Tests:
> {code}
> Functional/metadata_caching/data/drill4376_1.q
> Functional/metadata_caching/data/drill4376_2.q
> Functional/metadata_caching/data/drill4376_3.q
> Functional/metadata_caching/data/drill4376_4.q
> Functional/metadata_caching/data/drill4376_5.q
> Functional/metadata_caching/data/drill4376_6.q
> Functional/metadata_caching/data/drill4376_8.q
> {code}
> Example pattern of queries:
> {code}
> select count(*) from `lineitem_hierarchical_intint/*8*/3*`;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814962#comment-16814962
 ] 

ASF GitHub Bot commented on DRILL-7166:
---

dvjyothsna commented on pull request #1745: DRILL-7166: Count query with 
wildcard should skip reading of metadata summary file
URL: https://github.com/apache/drill/pull/1745
 
 
   Count(*) and Count(column) queries use the aggregated row count and null 
count from the metadata summary file without reading the large metadata file. 
When the directory filter has a wildcard, the count cannot be fetched from the 
metadata summary file, since the summary file contains the count of all the children 
underneath that directory and there is no way to filter it by wildcard. The 
ConvertCountToDirectScan physical rule will be applied to these cases.
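
A minimal sketch of the decision described above, not the actual change in the PR: a helper that reports whether a table path contains a wildcard and therefore should not be answered from the summary file. The class name `SummaryFileEligibility`, both method names, and the choice of `*` and `?` as wildcard characters are assumptions made purely for illustration.

```java
// Hypothetical illustration; names and the wildcard set are assumptions, not Drill's API.
final class SummaryFileEligibility {
  private SummaryFileEligibility() {}

  /** Returns true if the selection path contains a glob character ('*' or '?', by assumption). */
  static boolean hasWildcard(String selectionPath) {
    for (int i = 0; i < selectionPath.length(); i++) {
      char c = selectionPath.charAt(i);
      if (c == '*' || c == '?') {
        return true;
      }
    }
    return false;
  }

  /**
   * The summary file holds totals for a whole directory tree, so in this sketch
   * it may only answer counts for paths without wildcards.
   */
  static boolean canUseSummaryFile(String selectionPath) {
    return !hasWildcard(selectionPath);
  }
}
```

With this sketch, a path such as `lineitem_hierarchical_intint/*8*/3*` is rejected for the summary-file shortcut, while a plain directory path is accepted.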
 



[jira] [Assigned] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread Venkata Jyothsna Donapati (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati reassigned DRILL-7166:


Assignee: Venkata Jyothsna Donapati  (was: Pritesh Maker)



[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814922#comment-16814922
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

sohami commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742
 
 
   
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, the option {{exec.query.max_rows}} is shown on the Web UI's 
> Profiles tab even when it was not set explicitly. This happens because the option 
> is set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check for whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814880#comment-16814880
 ] 

ASF GitHub Bot commented on DRILL-7165:
---

sohami commented on issue #1743: DRILL-7165: Redundant Checksum calculating for 
ASC files
URL: https://github.com/apache/drill/pull/1743#issuecomment-481873156
 
 
   @vdiravka - I would recommend handling the rename change in a separate PR 
rather than doing it now if there is even a slight risk of it breaking anything.
 



> Redundant Checksum calculating for ASC files
> 
>
> Key: DRILL-7165
> URL: https://issues.apache.org/jira/browse/DRILL-7165
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.16.0
>
>
> Currently {{checksum-maven-plugin}} creates sha-512 checksum files for tar and 
> zip archives and for ASC (signature) files. The latter are redundant. For 
> example:
> apache-drill-1.15.0-src.tar.gz.asc.sha512
> apache-drill-1.15.0-src.zip.asc.sha512
> apache-drill-1.15.0.tar.gz.asc.sha512
> The proper list of files: 
> [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/]





[jira] [Updated] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7166:
-
Priority: Blocker  (was: Critical)

> Tests doing count(* ) with wildcards in table name are querying metadata 
> cache and returning wrong results
> --
>
> Key: DRILL-7166
> URL: https://issues.apache.org/jira/browse/DRILL-7166
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Pritesh Maker
>Priority: Blocker
> Fix For: 1.16.0
>
>
> Tests:
> {code}
> Functional/metadata_caching/data/drill4376_1.q
> Functional/metadata_caching/data/drill4376_2.q
> Functional/metadata_caching/data/drill4376_3.q
> Functional/metadata_caching/data/drill4376_4.q
> Functional/metadata_caching/data/drill4376_5.q
> Functional/metadata_caching/data/drill4376_6.q
> Functional/metadata_caching/data/drill4376_8.q
> {code}
> Example pattern of queries:
> {code}
> select count(*) from `lineitem_hierarchical_intint/*8*/3*`;
> {code}





[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814876#comment-16814876
 ] 

ASF GitHub Bot commented on DRILL-7165:
---

vdiravka commented on issue #1743: DRILL-7165: Redundant Checksum calculating 
for ASC files
URL: https://github.com/apache/drill/pull/1743#issuecomment-481870840
 
 
   @sohami Could you please review?
   There is one optional change: renaming the project `artifactId` from 
`drill-root` to `apache-drill`. 
   It is more convenient to use it as a parameter, `${project.artifactId}` or 
`${project.parent.artifactId}`, instead of hardcoding `apache-drill` everywhere. 
   The question is whether it is safe to change the Drill project `artifactId`. It looks like 
[`drill-root`](https://mvnrepository.com/artifact/org.apache.drill/drill-root) 
isn't used by external tools. 
   I'm not sure whether there are other risks in renaming it. If there are, please let me know.
 



[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814874#comment-16814874
 ] 

ASF GitHub Bot commented on DRILL-7165:
---

vdiravka commented on pull request #1743: DRILL-7165: Redundant Checksum 
calculating for ASC files
URL: https://github.com/apache/drill/pull/1743
 
 
   - change 'checksum-maven-plugin' 'goal' - 'artifacts' -> 'files'
   - specify 'includes' in 'fileSet' for 'checksum-maven-plugin'
   - change 'drill-root' -> 'apache-drill' of 'project.artifactId'
 



[jira] [Updated] (DRILL-7135) Upgrade to Jetty 9.4

2019-04-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7135:
-
Fix Version/s: (was: Future)
   1.17.0

> Upgrade to Jetty 9.4
> 
>
> Key: DRILL-7135
> URL: https://issues.apache.org/jira/browse/DRILL-7135
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Priority: Minor
> Fix For: 1.17.0
>
>
> Initially, DRILL-7051 updated Jetty to version 9.4 and DRILL-7081 updated 
> Jersey to version 2.28. These versions work fine for Drill with 
> Hadoop versions below 3.0.
>  Starting from version 3.0, Hadoop uses 
> [org.eclipse.jetty|https://github.com/apache/hadoop/blob/branch-3.0/hadoop-project/pom.xml#L38]
>  version 9.3,
>  which is why it conflicts with newer Jetty versions.
> Drill can update its Jetty and Jersey versions once HADOOP-14930 and 
> HBASE-19256 are resolved.
>  Alternatively, these libraries can be shaded in Drill, but there is no real 
> reason to do so at present.
> See details in 
> [#1681|https://github.com/apache/drill/pull/1681#discussion_r265904521] PR.
> _Notes_: 
> * For Jersey update it is necessary to add 
> org.glassfish.jersey.inject:jersey-hk2 in Drill to solve all compilation 
> failures.
> * See doc for Jetty update: 
> https://www.eclipse.org/jetty/documentation/9.4.x/upgrading-jetty.html





[jira] [Assigned] (DRILL-7135) Upgrade to Jetty 9.4

2019-04-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7135:


Assignee: Arina Ielchiieva

> Upgrade to Jetty 9.4
> 
>
> Key: DRILL-7135
> URL: https://issues.apache.org/jira/browse/DRILL-7135
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.17.0
>
>
> Initially, DRILL-7051 updated Jetty to version 9.4 and DRILL-7081 updated 
> Jersey to version 2.28. These versions work fine for Drill with 
> Hadoop versions below 3.0.
>  Starting from version 3.0, Hadoop uses 
> [org.eclipse.jetty|https://github.com/apache/hadoop/blob/branch-3.0/hadoop-project/pom.xml#L38]
>  version 9.3,
>  which is why it conflicts with newer Jetty versions.
> Drill can update its Jetty and Jersey versions once HADOOP-14930 and 
> HBASE-19256 are resolved.
>  Alternatively, these libraries can be shaded in Drill, but there is no real 
> reason to do so at present.
> See details in 
> [#1681|https://github.com/apache/drill/pull/1681#discussion_r265904521] PR.
> _Notes_: 
> * For Jersey update it is necessary to add 
> org.glassfish.jersey.inject:jersey-hk2 in Drill to solve all compilation 
> failures.
> * See doc for Jetty update: 
> https://www.eclipse.org/jetty/documentation/9.4.x/upgrading-jetty.html





[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7160:
---
Labels: ready-to-commit  (was: )



[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread Abhishek Girish (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814809#comment-16814809
 ] 

Abhishek Girish commented on DRILL-7166:


{code}
Query: Functional/metadata_caching/data/drill4376_6.q
select count(*) from `lineitem_hierarchical_intint/*/1*`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
19775 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_8.q
select count(*) from `lineitem_hierarchical_intint/*8*/3*`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
3600 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_3.q
select count(*) from `lineitem_hierarchical_intint/1**2`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
20175 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_2.q
select count(*) from `lineitem_hierarchical_intint/19*4`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
2 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_1.q
select count(*) from `lineitem_hierarchical_intint/199*`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
3 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_5.q
select count(*) from `lineitem_hierarchical_intint/*/1`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
6300 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_4.q
select count(*) from `lineitem_hierarchical_intint/*8*`


 Expected number of rows: 1
Actual number of rows from Drill: 1
 Number of matching rows: 0
  Number of rows missing: 1
   Number of rows unexpected: 1

These rows are not expected (first 10):
70175

These rows are missing (first 10):
40175 (1 occurence(s))
{code}

> Tests doing count(* ) with wildcards in table name are querying metadata 
> cache and returning wrong results
> --
>
> Key: DRILL-7166
> URL: https://issues.apache.org/jira/browse/DRILL-7166
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Pritesh Maker
>Priority: Critical
> Fix For: 1.16.0
>
>
> Tests:
> {code}
> Functional/metadata_caching/data/drill4376_1.q
> Functional/metadata_caching/data/drill4376_2.q
> Functional/metadata_caching/data/drill4376_3.q
> Functional/metadata_caching/data/drill4376_4.q
> Functional/metadata_caching/data/drill4376_5.q
> Functional/metadata_caching/data/drill4376_6.q
> Functional/metadata_caching/data/drill4376_8.q
> {code}
> Example pattern of queries:
> {code}
> select count(*) from `lineitem_hierarchical_intint/*8*/3*`;
> {code}





[jira] [Created] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results

2019-04-10 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-7166:
--

 Summary: Tests doing count(* ) with wildcards in table name are 
querying metadata cache and returning wrong results
 Key: DRILL-7166
 URL: https://issues.apache.org/jira/browse/DRILL-7166
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.16.0
Reporter: Abhishek Girish
Assignee: Pritesh Maker
 Fix For: 1.16.0


Tests:
{code}
Functional/metadata_caching/data/drill4376_1.q
Functional/metadata_caching/data/drill4376_2.q
Functional/metadata_caching/data/drill4376_3.q
Functional/metadata_caching/data/drill4376_4.q
Functional/metadata_caching/data/drill4376_5.q
Functional/metadata_caching/data/drill4376_6.q
Functional/metadata_caching/data/drill4376_8.q
{code}
Example pattern of queries:
{code}
select count(*) from `lineitem_hierarchical_intint/*8*/3*`;
{code}





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814784#comment-16814784
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows 
QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r274107814
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-    return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+    // Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+    return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   @kkhatua, you haven't reverted this change.
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, the option {{exec.query.max_rows}} is shown on the Web UI's 
> Profiles tab even when it was not set explicitly. This happens because the option 
> is set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}; perhaps a check for whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814742#comment-16814742
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

kkhatua commented on issue #1742: DRILL-7160: e.q.max_rows QUERY-level option 
shown even if not set
URL: https://github.com/apache/drill/pull/1742#issuecomment-481803435
 
 
   @vvysotskyi , @arina-ielchiieva 
   
   With the latest update, for non-applicable queries, there are no options 
shown that indicate that the `max_rows` has been set.
   For applicable queries, based on the combination of what the SESSION and 
SYSTEM (default) values are, you get the following outcome with the scope of 
the option also indicated as shown in the profile:
   
   | SYSTEM | SESSION | Final | ScopeSet |
   |--------|---------|-------|----------|
   | 0      | 0       | 0     | N/A      |
   | 15     | 0       | 15    | N/A      |
   | 0      | 10      | 10    | SESSION  |
   | 15     | 10      | 10    | SESSION  |
   | 15     | 20      | 15    | QUERY    |
   
   The last one is required because there is no way for me to remove the 
SESSION level value and let only the SYSTEM value persist.
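
The table above can be read as a small decision rule. The following is a minimal sketch that reproduces its five rows, not the code in the PR: the class `AutoLimitResolver`, the `Resolved` holder, and the reading of 0 as "not set" are assumptions inferred from the table alone. The case where SESSION equals SYSTEM is not covered by the table; the sketch arbitrarily reports it as SESSION.

```java
// Hypothetical illustration of the table above; not Drill's implementation.
final class AutoLimitResolver {
  /** Outcome: the effective row limit and the scope shown in the profile ("N/A", "SESSION" or "QUERY"). */
  static final class Resolved {
    final int finalValue;
    final String scopeSet;

    Resolved(int finalValue, String scopeSet) {
      this.finalValue = finalValue;
      this.scopeSet = scopeSet;
    }
  }

  private AutoLimitResolver() {}

  /** Assumes 0 means "no limit set" at that level, as the table suggests. */
  static Resolved resolve(int systemValue, int sessionValue) {
    if (sessionValue == 0) {
      // Nothing set for the session: the SYSTEM default applies and no option is reported.
      return new Resolved(systemValue, "N/A");
    }
    if (systemValue == 0 || sessionValue <= systemValue) {
      // The SESSION value is the tighter (or the only) limit.
      return new Resolved(sessionValue, "SESSION");
    }
    // The SESSION value exceeds the SYSTEM cap, so the cap is applied at QUERY level.
    return new Resolved(systemValue, "QUERY");
  }
}
```

For example, `AutoLimitResolver.resolve(15, 20)` corresponds to the last row: the SYSTEM value wins and the scope is reported as QUERY.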
 



[jira] [Updated] (DRILL-7014) Format plugin for LTSV files

2019-04-10 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-7014:
--
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> I would like to contribute [this 
> plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill.
> h4. Abstract
> storage-plugins-override.conf
> {code:json}
> "storage":{
>   dfs: {
>     type: "file",
>     connection: "file:///",
>     formats: {
>       "ltsv": {
>         "type": "ltsv",
>         "extensions": [
>           "ltsv"
>         ]
>       }
>     },
>     enabled: true
>   }
> }
> {code}
> sample.ltsv
> {code}
> time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET /v1/xxx HTTP/1.1  status:200  size:4968 referer:- ua:Java/1.8.0_131 reqtime:2.532 apptime:2.532 vhost:api.example.com
> time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET /v1/yyy HTTP/1.1  status:200  size:412  referer:- ua:Java/1.8.0_201 reqtime:3.580 apptime:3.580 vhost:api.example.com
> {code}
> Run query
> {code:sh}
> root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded 
> Apache Drill 1.15.0
> "Drill must go on."
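> 0: jdbc:drill:zk=local> SELECT * FROM dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0;
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+
> | time                       | host            | forwardedfor | req                  | status | size | referer | ua             | reqtime | apptime | vhost           |
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+
> | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | -            | GET /v1/yyy HTTP/1.1 | 200    | 412  | -       | Java/1.8.0_201 | 3.580   | 3.580   | api.example.com |
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+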
> 1 row selected (6.074 seconds)
> 0: jdbc:drill:zk=local> 
> {code}
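
Each record in the sample.ltsv excerpt above is a line of tab-separated `label:value` fields. The following is a minimal sketch of how one such line could be split into fields. It is independent of the contributed plugin's reader, and the class name `LtsvLineParser` and its `parse` method are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; this is not the plugin's record reader.
final class LtsvLineParser {
  private LtsvLineParser() {}

  /** Splits one LTSV record into label/value pairs, preserving field order. */
  static Map<String, String> parse(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      if (field.isEmpty()) {
        continue; // tolerate stray tabs
      }
      // Only the first ':' separates label from value; values such as timestamps contain ':' too.
      int sep = field.indexOf(':');
      if (sep < 0) {
        throw new IllegalArgumentException("Missing ':' in field: " + field);
      }
      fields.put(field.substring(0, sep), field.substring(sep + 1));
    }
    return fields;
  }
}
```

Splitting on the first colon only is what keeps a value like `30/Nov/2016:00:56:37 +0900` intact under the `time` label.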





[jira] [Commented] (DRILL-7014) Format plugin for LTSV files

2019-04-10 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814738#comment-16814738
 ] 

Bridget Bevens commented on DRILL-7014:
---

Hi [~shimamoto]

I've added the doc here: https://drill.apache.org/docs/ltsv-format-plugin/ 
Let me know if I need to change anything.

Thank you!
~Bridget

> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.15.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> I would like to contribute [this 
> plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill.
> h4. Abstract
> storage-plugins-override.conf
> {code:json}
> "storage":{
>   dfs: {
> type: "file",
> connection: "file:///",
> formats: {
>   "ltsv": {
> "type": "ltsv",
> "extensions": [
>   "ltsv"
> ]
>   }
> },
> enabled: true
>   }
> }
> {code}
> sample.ltsv
> {code}
> time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET 
> /v1/xxx HTTP/1.1  status:200  size:4968 referer:- ua:Java/1.8.0_131 
> reqtime:2.532 apptime:2.532 vhost:api.example.com
> time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx  forwardedfor:-  req:GET 
> /v1/yyy HTTP/1.1  status:200  size:412  referer:- ua:Java/1.8.0_201 
> reqtime:3.580 apptime:3.580 vhost:api.example.com
> {code}
> Run query
> {code:sh}
> root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded 
> Apache Drill 1.15.0
> "Drill must go on."
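> 0: jdbc:drill:zk=local> SELECT * FROM dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0;
> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
> |            time             |       host       | forwardedfor  |          req          | status  | size  | referer  |       ua        | reqtime  | apptime  |      vhost       |
> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+
> | 30/Nov/2016:00:56:37 +0900  | xxx.xxx.xxx.xxx  | -             | GET /v1/yyy HTTP/1.1  | 200     | 412   | -        | Java/1.8.0_201  | 3.580    | 3.580    | api.example.com  |
> +-----------------------------+------------------+---------------+-----------------------+---------+-------+----------+-----------------+----------+----------+------------------+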
> 1 row selected (6.074 seconds)
> 0: jdbc:drill:zk=local> 
> {code}
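
For readers unfamiliar with the format: LTSV (Labeled Tab-separated Values) records are lines of tab-separated label:value fields, where only the first colon in a field separates the label from the value. The sketch below shows how one record from sample.ltsv could be split into columns. It assumes the fields in the sample are tab-separated (the tabs appear as spaces in this mail), and the class name LtsvLineParser is made up for illustration; it is not the plugin's actual reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LTSV line parser (illustrative only; not the plugin's real reader). */
final class LtsvLineParser {

  /** Splits one LTSV record into label -> value pairs, preserving field order. */
  static Map<String, String> parse(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      if (field.isEmpty()) {
        continue; // tolerate empty fields, e.g. from doubled tabs
      }
      // Only the first ':' separates label from value; values such as
      // "30/Nov/2016:00:56:37 +0900" contain further colons.
      int sep = field.indexOf(':');
      if (sep < 0) {
        throw new IllegalArgumentException("Field without a label: " + field);
      }
      fields.put(field.substring(0, sep), field.substring(sep + 1));
    }
    return fields;
  }

  public static void main(String[] args) {
    String line = "time:30/Nov/2016:00:56:37 +0900\thost:xxx.xxx.xxx.xxx\treqtime:3.580";
    Map<String, String> row = parse(line);
    // Mirrors the "WHERE reqtime > 3.0" filter from the query above.
    if (Double.parseDouble(row.get("reqtime")) > 3.0) {
      System.out.println(row);
    }
  }
}
```

Every value is kept as a string here, so a numeric comparison such as reqtime > 3.0 needs an explicit conversion in this sketch.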





[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7162:
-
Fix Version/s: 1.17.0

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
> Fix For: 1.17.0
>
>
> Apache Drill uses 3rd-party libraries with roughly 250+ CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at EOL versions and are not patched 
> even in the latest release.
> This creates a security issue when we use Drill in production.
> We are able to replace many older versions of libraries with the latest 
> versions, which have no CVEs; however, many of them cannot be replaced as is and 
> would require some changes in the source code.
> The Jetty version is of the highest priority and needs migration to the 9.4.x 
> version immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications that use Apache Drill.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814693#comment-16814693
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows 
QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r274074779
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   Yes, as I wrote in the previous comment, we can skip the issue I mentioned 
in one of the comments for now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Updated] (DRILL-7062) Run-time row group pruning

2019-04-10 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7062:
-
Fix Version/s: (was: 1.16.0)
   1.17.0

> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






[jira] [Updated] (DRILL-7028) Reduce the planning time of queries on large Parquet tables with large metadata cache files

2019-04-10 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7028:
-
Fix Version/s: 1.17.0

> Reduce the planning time of queries on large Parquet tables with large 
> metadata cache files
> ---
>
> Key: DRILL-7028
> URL: https://issues.apache.org/jira/browse/DRILL-7028
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: performance
> Fix For: 1.16.0, 1.17.0
>
>
> If the Parquet table has a large number of small files, the metadata cache 
> files grow larger and the planner tries to read the large metadata cache file 
> which leads to the planning time overhead. Most of the time of execution is 
> spent during the planning phase.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814495#comment-16814495
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273990569
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   Agreed. I am not sure, but I think I did this here 
   
https://github.com/apache/drill/blob/c3ee7949656fb4c1b144e1633f97002c159ec8f3/exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java#L120
   because of the complexity in the logic  and (I think) because without this, 
the session value (0) gets precedence when the SYSTEM default exists.
   I can revisit this post release to avoid blocking the release for now.
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814479#comment-16814479
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

arina-ielchiieva commented on pull request #1742: DRILL-7160: e.q.max_rows 
QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273980708
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   I think we should not set max rows count to query context if it is the same 
as default, thus this count won't appear in Query Profile. Current solution 
just fixes the symptom not the real problem.
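
   The idea in this comment can be sketched as follows: record a query-level override only when it would change the effective value, so an untouched default never shows up among the QUERY-level options. The class QueryOptionsSketch and its methods are hypothetical stand-ins written for illustration; they are not Drill's option manager API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a query-scoped option manager; not Drill's real API. */
final class QueryOptionsSketch {
  private final Map<String, Long> defaults = new HashMap<>();
  private final Map<String, Long> queryOverrides = new HashMap<>();

  QueryOptionsSketch(Map<String, Long> defaults) {
    this.defaults.putAll(defaults);
  }

  /** Effective value: the query-level override if present, otherwise the default. */
  long get(String name) {
    Long override = queryOverrides.get(name);
    return override != null ? override : defaults.getOrDefault(name, 0L);
  }

  /** Records a query-level override only when it changes the effective value. */
  void setLocalOptionIfChanged(String name, long value) {
    if (get(name) != value) {
      queryOverrides.put(name, value);
    }
  }

  /** What a profile page would list as QUERY-level options (a defensive copy). */
  Map<String, Long> queryLevelOptions() {
    return new HashMap<>(queryOverrides);
  }
}
```

   With a default of 0 for exec.query.max_rows, calling setLocalOptionIfChanged("exec.query.max_rows", 0) leaves the query-level map empty, so nothing would need to be filtered out later when the profile is rendered.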
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814464#comment-16814464
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273972484
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 ##
 @@ -226,7 +226,10 @@ private static SqlNode checkAndApplyAutoLimit(SqlConverter parser, QueryContext
 if (isAutoLimitShouldBeApplied(context, sqlNode)) {
   sqlNode = wrapWithAutoLimit(sqlNode, context);
 } else {
-  context.getOptions().setLocalOption(ExecConstants.QUERY_MAX_ROWS, 0);
+//Force setting to zero IFF autoLimit was intended to be set originally but is inapplicable
 
 Review comment:
    
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814459#comment-16814459
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273966816
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   This change actually helps to not show the value of Zero being applied, when 
the query is inapplicable. Are you sure we want to remove this?
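
   To make the effect of the predicate in the diff concrete, here is a self-contained sketch of the same filter applied to a small list of options. The Option and Scope types are simplified, hypothetical stand-ins for Drill's OptionValue and OptionScope, and the constant is assumed to hold the option name exec.query.max_rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Simplified, hypothetical model of the profile option filter under review. */
final class ProfileOptionFilterSketch {
  enum Scope { SYSTEM, SESSION, QUERY }

  /** Minimal option record: name, scope and value. */
  static final class Option {
    final String name;
    final Scope scope;
    final Object value;

    Option(String name, Scope scope, Object value) {
      this.name = name;
      this.scope = scope;
      this.value = value;
    }
  }

  // Assumed to match the option name behind ExecConstants.QUERY_MAX_ROWS.
  static final String QUERY_MAX_ROWS = "exec.query.max_rows";

  /** QUERY-scoped options, except exec.query.max_rows when its value is 0. */
  static final Predicate<Option> REPORTABLE = o ->
      o.scope == Scope.QUERY
          && !(QUERY_MAX_ROWS.equals(o.name) && "0".equals(String.valueOf(o.value)));

  /** Names of the options that would be shown on the profile page. */
  static List<String> reportedNames(List<Option> options) {
    List<String> names = new ArrayList<>();
    for (Option o : options) {
      if (REPORTABLE.test(o)) {
        names.add(o.name);
      }
    }
    return names;
  }
}
```

   In this sketch a QUERY-level exec.query.max_rows of 0 is hidden, while a non-zero value and any other QUERY-level option are still reported.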
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Created] (DRILL-7165) Redundant Checksum calculating for ASC files

2019-04-10 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-7165:
--

 Summary: Redundant Checksum calculating for ASC files
 Key: DRILL-7165
 URL: https://issues.apache.org/jira/browse/DRILL-7165
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.15.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.16.0


Currently {{checksum-maven-plugin}} creates SHA-512 checksum files for tar and 
zip archives and for ASC (signature) files. The latter are redundant. For example:
apache-drill-1.15.0-src.tar.gz.asc.sha512
apache-drill-1.15.0-src.zip.asc.sha512
apache-drill-1.15.0.tar.gz.asc.sha512

The proper list of files: 
[http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/]
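
The intended outcome can be sketched in plain Java: checksum only the release archives and skip signature files. This is not the {{checksum-maven-plugin}} configuration; the class and method names below are made up for illustration, and only the JDK's MessageDigest API is used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: choose checksum targets and compute SHA-512 digests. */
final class ChecksumTargetsSketch {

  /** Keeps release archives; drops .asc signatures and existing .sha512 files. */
  static List<String> checksumTargets(List<String> artifacts) {
    List<String> targets = new ArrayList<>();
    for (String name : artifacts) {
      if (!name.endsWith(".asc") && !name.endsWith(".sha512")) {
        targets.add(name);
      }
    }
    return targets;
  }

  /** Lower-case hex SHA-512 digest of the given bytes. */
  static String sha512Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-512 is a standard algorithm that every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }
}
```

Given apache-drill-1.15.0.tar.gz and apache-drill-1.15.0.tar.gz.asc, only the first would get a .sha512 companion file in this sketch.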





[jira] [Commented] (DRILL-7161) Aggregation with group by clause

2019-04-10 Thread Gayathri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814412#comment-16814412
 ] 

Gayathri commented on DRILL-7161:
-

[~lhfei]

Thank you for your response. 

The following query works fine without any CAST, even for null 
values:

SELECT sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json`;

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Assignee: Hefei Li
>Priority: Blocker
>  Labels: Drill, issue
> Fix For: 1.14.0
>
>
> We are facing an issue with the following case.
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, the query works without any error. If GROUP BY 
> is used, the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an 
> alternative?





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814409#comment-16814409
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows 
QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273925887
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 ##
 @@ -226,7 +226,10 @@ private static SqlNode checkAndApplyAutoLimit(SqlConverter parser, QueryContext
 if (isAutoLimitShouldBeApplied(context, sqlNode)) {
   sqlNode = wrapWithAutoLimit(sqlNode, context);
 } else {
-  context.getOptions().setLocalOption(ExecConstants.QUERY_MAX_ROWS, 0);
+//Force setting to zero IFF autoLimit was intended to be set originally but is inapplicable
 
 Review comment:
   Please fix indentation and please refactor methods added in the previous 
commit to pass the value of 
`context.getOptions().getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue()`
 instead of `context`.
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814410#comment-16814410
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows 
QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r273924644
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }
 
   public Map getQueryOptions() {
-return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
 
 Review comment:
   Please revert this change.
 



> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's 
> Profiles even when it was not set explicitly. The issue is because the option 
> is being set on the query level internally.
> From the code, looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.





[jira] [Updated] (DRILL-7071) Reserved words documentation udpate

2019-04-10 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-7071:
---
Fix Version/s: (was: 1.16.0)
   Future

> Reserved words documentation udpate
> ---
>
> Key: DRILL-7071
> URL: https://issues.apache.org/jira/browse/DRILL-7071
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc, documentation, keyword, reserved-word
> Fix For: Future
>
>
> Recently, a lot of reserved keywords were added to the Drill project, for 
> instance in DRILL-1328, and DRILL-7058 will introduce a new one too. Therefore, 
> the reserved keywords in the Drill documentation should be updated:
> [https://drill.apache.org/docs/reserved-keywords/]
> These words should be obtained from these sections of Drill Parser file:
> [https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/data/Parser.tdd#L30]
> and 
> [https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/data/Parser.tdd#L390]
> _Note:_ this list will be updated after the next Calcite version update.





[jira] [Resolved] (DRILL-7161) Aggregation with group by clause

2019-04-10 Thread Hefei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hefei Li resolved DRILL-7161.
-
   Resolution: Not A Bug
Fix Version/s: 1.14.0

By default, Drill does not support JSON lists of different types. For details 
on JSON data types, you can refer to the [JSON Data 
Model|https://drill.apache.org/docs/json-data-model/].

In this case, the 'b' column in your test data is all null.
When Drill reads the column, it processes it as VARCHAR by 
default.
So, if you want to work with numeric types as you expect, you can use the 
[CAST|https://drill.apache.org/docs/data-type-conversion/] type conversion 
function provided by Drill.

Such as:
{code:java}
select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json` group by a
{code}

Then it will work fine.
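
To illustrate the result the CAST query is expected to produce for the sample data, here is a plain-Java sketch of standard SQL SUM semantics: nulls are ignored, and a group whose values are all null yields NULL. It only simulates the aggregation and does not call Drill; the class name is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simulates SELECT a, SUM(b) ... GROUP BY a with SQL null handling. */
final class NullAwareSumSketch {

  /** Each row is {a, b}; b may be null. Returns a -> SUM(b), null if all b are null. */
  static Map<Integer, Integer> sumByKey(Integer[][] rows) {
    Map<Integer, Integer> sums = new LinkedHashMap<>();
    for (Integer[] row : rows) {
      Integer a = row[0];
      Integer b = row[1];
      if (!sums.containsKey(a)) {
        sums.put(a, null); // the group exists even if every b is null
      }
      if (b != null) {
        Integer current = sums.get(a);
        sums.put(a, current == null ? b : current + b);
      }
    }
    return sums;
  }

  public static void main(String[] args) {
    Integer[][] sample = {{2, null}, {2, null}, {3, null}, {4, null}};
    // One entry per distinct value of a, each with a null sum.
    System.out.println(sumByKey(sample));
  }
}
```

For the four sample rows this gives three groups (a = 2, 3, 4), each with a null sum.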

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Assignee: Hefei Li
>Priority: Blocker
>  Labels: Drill, issue
> Fix For: 1.14.0
>
>
> We are facing an issue with the following case.
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, the query works without any error. If GROUP BY 
> is used, the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an 
> alternative?





[jira] [Issue Comment Deleted] (DRILL-7161) Aggregation with group by clause

2019-04-10 Thread Hefei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hefei Li updated DRILL-7161:

Comment: was deleted

(was: By default, Drill does not support JSON lists of different types. For 
details on JSON data types, you can refer to the [JSON Data 
Model|https://drill.apache.org/docs/json-data-model/].

In this case, the 'b' column in your test data is all null.
When Drill reads the column, it processes it as VARCHAR by 
default.
So, if you want to work with numeric types as you expect, you can use the 
[CAST|https://drill.apache.org/docs/data-type-conversion/] type conversion 
function provided by Drill.

Such as:
{code:java}
select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json`  group by a
{code}
Then it will work fine.)

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Assignee: Hefei Li
>Priority: Blocker
>  Labels: Drill, issue
>
> We are facing an issue with the following case.
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, the query works without any error. If GROUP BY 
> is used, the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an 
> alternative?





[jira] [Commented] (DRILL-7161) Aggregation with group by clause

2019-04-10 Thread Hefei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814223#comment-16814223
 ] 

Hefei Li commented on DRILL-7161:
-

By default, Drill does not support JSON lists of different types. For details 
on JSON data types, you can refer to the [JSON Data 
Model|https://drill.apache.org/docs/json-data-model/].

In this case, the 'b' column in your test data is all null.
When Drill reads the column, it processes it as VARCHAR by 
default.
So, if you want to work with numeric types as you expect, you can use the 
[CAST|https://drill.apache.org/docs/data-type-conversion/] type conversion 
function provided by Drill.

Such as:
{code:java}
select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json`  group by a
{code}
Then it will work fine.

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Assignee: Hefei Li
>Priority: Blocker
>  Labels: Drill, issue
>
> We are facing an issue with the following case.
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, the query works without any error. If GROUP BY 
> is used, the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an 
> alternative?





[jira] [Resolved] (DRILL-7116) Adapt statistics to use Drill Metastore API

2019-04-10 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-7116.

   Resolution: Fixed
Fix Version/s: (was: 1.17.0)
   1.16.0

Fixed in the scope of DRILL-7089

> Adapt statistics to use Drill Metastore API
> ---
>
> Key: DRILL-7116
> URL: https://issues.apache.org/jira/browse/DRILL-7116
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> The current implementation of statistics assumes that files are used for 
> storing and reading statistics.
>  The aim of this Jira is to adapt statistics to use the Drill Metastore API so that 
> in the future they may be stored in other metastore implementations.
> Implementation details:
>  - Move statistics info into {{TableMetadata}}
>  - Provide a way for obtaining {{TableMetadata}} in the places where 
> statistics may be used (partially implemented in the scope of DRILL-7089)
>  - Investigate and implement (if possible) lazy materialization of 
> {{DrillStatsTable}}.
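
The last item, lazy materialization, can be sketched generically as a memoizing supplier: the expensive load runs on first access only, and later calls reuse the cached result. The class LazySketch is hypothetical and says nothing about how {{DrillStatsTable}} is actually built or loaded.

```java
import java.util.function.Supplier;

/** Generic lazy holder: runs the loader once, on first access, then caches the result. */
final class LazySketch<T> implements Supplier<T> {
  private final Supplier<T> loader;
  private T value;
  private boolean loaded;

  LazySketch(Supplier<T> loader) {
    this.loader = loader;
  }

  /** Materializes the value on the first call and returns the cached value afterwards. */
  @Override
  public synchronized T get() {
    if (!loaded) {
      value = loader.get();
      loaded = true;
    }
    return value;
  }
}
```

A caller that never asks for statistics then never pays for reading them; the synchronized method keeps the sketch safe if several threads request the value at once.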





[jira] [Assigned] (DRILL-7161) Aggregation with group by clause

2019-04-10 Thread Hefei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hefei Li reassigned DRILL-7161:
---

Assignee: Hefei Li

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Assignee: Hefei Li
>Priority: Blocker
>  Labels: Drill, issue
>
> We are facing an issue with the following case.
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, the query works without any error. If GROUP BY 
> is used, the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an 
> alternative?


