[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815019#comment-16815019 ]
ASF GitHub Bot commented on DRILL-7166:
---
amansinha100 commented on pull request #1745: DRILL-7166: Count query with wildcard should skip reading of metadata summary file
URL: https://github.com/apache/drill/pull/1745#discussion_r274237088
## File path: exec/java-exec/src/test/java/org/apache/drill/exec/planner/logical/TestConvertCountToDirectScan.java ##
@@ -238,4 +238,40 @@ public void testCountsWithMetadataCacheSummaryAndDirPruning() throws Exception {
       test("drop table if exists %s", tableName);
     }
   }
+
+  @Test
+  public void testCountsWithWildCard() throws Exception {
+    test("use dfs.tmp");
+    String tableName = "parquet_table_counts";
+
+    try {
+      for (int i = 0; i < 10; i++) {
+        test(String.format("create table `%s/12/%s` as select * from cp.`tpch/orders.parquet`", tableName, i));
Review comment: Since the test does not actually need a larger table, could you perhaps use a smaller table (like nation) here since there are 13 CTAS statements and doing a CTAS on 'orders' adds extra time to the test.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Tests doing count(* ) with wildcards in table name are querying metadata > cache and returning wrong results > -- > > Key: DRILL-7166 > URL: https://issues.apache.org/jira/browse/DRILL-7166 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.16.0 >Reporter: Abhishek Girish >Assignee: Venkata Jyothsna Donapati >Priority: Blocker > Fix For: 1.16.0 > > > Tests: > {code} > Functional/metadata_caching/data/drill4376_1.q > Functional/metadata_caching/data/drill4376_2.q > Functional/metadata_caching/data/drill4376_3.q > Functional/metadata_caching/data/drill4376_4.q > Functional/metadata_caching/data/drill4376_5.q > Functional/metadata_caching/data/drill4376_6.q > Functional/metadata_caching/data/drill4376_8.q > {code} > Example pattern of queries: > {code} > select count(*) from `lineitem_hierarchical_intint/*8*/3*`; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814962#comment-16814962 ] ASF GitHub Bot commented on DRILL-7166: --- dvjyothsna commented on pull request #1745: DRILL-7166: Count query with wildcard should skip reading of metadata summary file URL: https://github.com/apache/drill/pull/1745 Count(*) or Count(column) queries use the aggregated row count and null count from the metadata summary file without reading the large file metadata. When the directory filter has a wildcard, count cannot be fetched from the metadata summary file since the summary file contains count of all the children underneath that and there is no way to filter using wild card. The ConvertCountToDirectScan physical rule will be applied to these cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Tests doing count(* ) with wildcards in table name are querying metadata > cache and returning wrong results > -- > > Key: DRILL-7166 > URL: https://issues.apache.org/jira/browse/DRILL-7166 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.16.0 >Reporter: Abhishek Girish >Assignee: Venkata Jyothsna Donapati >Priority: Blocker > Fix For: 1.16.0 > > > Tests: > {code} > Functional/metadata_caching/data/drill4376_1.q > Functional/metadata_caching/data/drill4376_2.q > Functional/metadata_caching/data/drill4376_3.q > Functional/metadata_caching/data/drill4376_4.q > Functional/metadata_caching/data/drill4376_5.q > Functional/metadata_caching/data/drill4376_6.q > Functional/metadata_caching/data/drill4376_8.q > {code} > Example pattern of queries: > {code} > select count(*) from `lineitem_hierarchical_intint/*8*/3*`; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
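To illustrate why a wildcard defeats the summary file, here is a minimal Java sketch. It is not Drill code: the class name `WildcardCountSketch`, the per-directory row counts, and the simplified wildcard rule (`*` matches within a single path segment) are all assumptions made for the example. The point is only that an aggregated total covers every child directory, so a wildcard query needs per-directory counts that can be filtered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: this is not Drill's implementation.
final class WildcardCountSketch {

  // Hypothetical per-directory row counts, keyed by path relative to the table root.
  static Map<String, Long> sampleCounts() {
    Map<String, Long> counts = new LinkedHashMap<>();
    counts.put("1994/1", 10L);
    counts.put("1994/3", 20L);
    counts.put("1998/1", 40L);
    counts.put("1998/3", 80L);
    return counts;
  }

  // What a summary can offer: one aggregated count over all children.
  static long summaryTotal(Map<String, Long> counts) {
    long total = 0;
    for (long c : counts.values()) {
      total += c;
    }
    return total;
  }

  // Assumed wildcard rule: '*' matches any run of characters except '/'.
  static Pattern toRegex(String wildcard) {
    String[] literals = wildcard.split("\\*", -1);
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < literals.length; i++) {
      if (i > 0) {
        regex.append("[^/]*");
      }
      regex.append(Pattern.quote(literals[i]));
    }
    return Pattern.compile(regex.toString());
  }

  // Sums only the directories whose relative path matches the wildcard.
  static long countMatching(Map<String, Long> counts, String wildcard) {
    Pattern pattern = toRegex(wildcard);
    long total = 0;
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      if (pattern.matcher(e.getKey()).matches()) {
        total += e.getValue();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = sampleCounts();
    System.out.println("summary total: " + summaryTotal(counts));
    System.out.println("*8*/3* count: " + countMatching(counts, "*8*/3*"));
  }
}
```

With these made-up numbers the summary total and the wildcard count differ, which mirrors the failure pattern reported in the issue, where every wildcard query returned the same table-wide count.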
[jira] [Assigned] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Jyothsna Donapati reassigned DRILL-7166: Assignee: Venkata Jyothsna Donapati (was: Pritesh Maker) > Tests doing count(* ) with wildcards in table name are querying metadata > cache and returning wrong results > -- > > Key: DRILL-7166 > URL: https://issues.apache.org/jira/browse/DRILL-7166 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.16.0 >Reporter: Abhishek Girish >Assignee: Venkata Jyothsna Donapati >Priority: Blocker > Fix For: 1.16.0 > > > Tests: > {code} > Functional/metadata_caching/data/drill4376_1.q > Functional/metadata_caching/data/drill4376_2.q > Functional/metadata_caching/data/drill4376_3.q > Functional/metadata_caching/data/drill4376_4.q > Functional/metadata_caching/data/drill4376_5.q > Functional/metadata_caching/data/drill4376_6.q > Functional/metadata_caching/data/drill4376_8.q > {code} > Example pattern of queries: > {code} > select count(*) from `lineitem_hierarchical_intint/*8*/3*`; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814922#comment-16814922 ] ASF GitHub Bot commented on DRILL-7160: --- sohami commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Labels: ready-to-commit > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files
[ https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814880#comment-16814880 ] ASF GitHub Bot commented on DRILL-7165: --- sohami commented on issue #1743: DRILL-7165: Redundant Checksum calculating for ASC files URL: https://github.com/apache/drill/pull/1743#issuecomment-481873156 @vdiravka - I would recommend handling the rename change in a separate PR rather than doing it now if there is even a slight risk of it breaking anything. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Redundant Checksum calculating for ASC files > > > Key: DRILL-7165 > URL: https://issues.apache.org/jira/browse/DRILL-7165 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.16.0 > > > Currently {{checksum-maven-plugin}} creates sha-512 checksum files for tar and > zip archives and for ASC (signature) files. The last is redundant. For > example: > apache-drill-1.15.0-src.tar.gz.asc.sha512 > apache-drill-1.15.0-src.zip.asc.sha512 > apache-drill-1.15.0.tar.gz.asc.sha512 > The proper list of files: > [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7166: - Priority: Blocker (was: Critical) > Tests doing count(* ) with wildcards in table name are querying metadata > cache and returning wrong results > -- > > Key: DRILL-7166 > URL: https://issues.apache.org/jira/browse/DRILL-7166 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.16.0 >Reporter: Abhishek Girish >Assignee: Pritesh Maker >Priority: Blocker > Fix For: 1.16.0 > > > Tests: > {code} > Functional/metadata_caching/data/drill4376_1.q > Functional/metadata_caching/data/drill4376_2.q > Functional/metadata_caching/data/drill4376_3.q > Functional/metadata_caching/data/drill4376_4.q > Functional/metadata_caching/data/drill4376_5.q > Functional/metadata_caching/data/drill4376_6.q > Functional/metadata_caching/data/drill4376_8.q > {code} > Example pattern of queries: > {code} > select count(*) from `lineitem_hierarchical_intint/*8*/3*`; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files
[ https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814876#comment-16814876 ] ASF GitHub Bot commented on DRILL-7165: --- vdiravka commented on issue #1743: DRILL-7165: Redundant Checksum calculating for ASC files URL: https://github.com/apache/drill/pull/1743#issuecomment-481870840 @sohami Could you please review? There is one non-mandatory change: `drill-root` -> `apache-drill` for the project `artifactId`. It is more convenient to use it as a parameter, `${project.artifactId}` or `${project.parent.artifactId}`, instead of hardcoding `apache-drill` everywhere. The question is whether it is safe to change the Drill project `artifactId`. It looks like [`drill-root`](https://mvnrepository.com/artifact/org.apache.drill/drill-root) isn't used by external tools. I am not sure whether there are other risks in renaming it. If there are, please let me know. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Redundant Checksum calculating for ASC files > > > Key: DRILL-7165 > URL: https://issues.apache.org/jira/browse/DRILL-7165 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.16.0 > > > Currently {{checksum-maven-plugin}} creates sha-512 checksum files for tar and > zip archives and for ASC (signature) files. The last is redundant. For > example: > apache-drill-1.15.0-src.tar.gz.asc.sha512 > apache-drill-1.15.0-src.zip.asc.sha512 > apache-drill-1.15.0.tar.gz.asc.sha512 > The proper list of files: > [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7165) Redundant Checksum calculating for ASC files
[ https://issues.apache.org/jira/browse/DRILL-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814874#comment-16814874 ] ASF GitHub Bot commented on DRILL-7165: --- vdiravka commented on pull request #1743: DRILL-7165: Redundant Checksum calculating for ASC files URL: https://github.com/apache/drill/pull/1743
- change 'checksum-maven-plugin' 'goal': 'artifacts' -> 'files'
- specify 'includes' in 'fileSet' for 'checksum-maven-plugin'
- change 'drill-root' -> 'apache-drill' of 'project.artifactId'
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Redundant Checksum calculating for ASC files > > > Key: DRILL-7165 > URL: https://issues.apache.org/jira/browse/DRILL-7165 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.16.0 > > > Currently {{checksum-maven-plugin}} creates sha-512 checksum files for tar and > zip archives and for ASC (signature) files. The last is redundant. For > example: > apache-drill-1.15.0-src.tar.gz.asc.sha512 > apache-drill-1.15.0-src.zip.asc.sha512 > apache-drill-1.15.0.tar.gz.asc.sha512 > The proper list of files: > [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
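The intent of restricting the checksum step to an explicit set of files can be sketched in plain Java. This is not the Maven plugin's code or configuration: the class `ChecksumSelectionSketch` and the suffix-based selection rule are assumptions chosen to mirror the issue description (checksums for tar and zip archives, none for `.asc` signatures). Only the standard `MessageDigest` API is used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: mirrors the intended file selection, not the plugin itself.
final class ChecksumSelectionSketch {

  // Assumed rule: release archives get a checksum; a signature such as
  // "x.tar.gz.asc" ends in neither suffix and is therefore skipped.
  static boolean needsChecksum(String fileName) {
    return fileName.endsWith(".tar.gz") || fileName.endsWith(".zip");
  }

  // Returns the SHA-512 digest of the data as a lowercase hex string.
  static String sha512Hex(byte[] data) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-512");
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-512 is not available in this JRE", e);
    }
  }

  public static void main(String[] args) {
    String[] files = {
      "apache-drill-1.15.0.tar.gz",
      "apache-drill-1.15.0.tar.gz.asc",
      "apache-drill-1.15.0-src.zip",
      "apache-drill-1.15.0-src.zip.asc"
    };
    for (String file : files) {
      System.out.println(file + " -> " + (needsChecksum(file) ? "checksum" : "skip"));
    }
  }
}
```

An include list is the safer design here because new artifact types are skipped by default instead of silently receiving a redundant `.sha512` file.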
[jira] [Updated] (DRILL-7135) Upgrade to Jetty 9.4
[ https://issues.apache.org/jira/browse/DRILL-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-7135: - Fix Version/s: (was: Future) 1.17.0 > Upgrade to Jetty 9.4 > > > Key: DRILL-7135 > URL: https://issues.apache.org/jira/browse/DRILL-7135 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Priority: Minor > Fix For: 1.17.0 > > > Initially DRILL-7051 updated Jetty to 9.4 version and DRILL-7081 updated > Jersey version to 2.28 version. These versions work fine for Drill with > Hadoop version below 3.0. > Starting from Hadoop 3.0 it uses > [org.eclipse.jetty|https://github.com/apache/hadoop/blob/branch-3.0/hadoop-project/pom.xml#L38] > 9.3 version. > That's why it conflicts with newer Jetty versions. > Drill can update Jetty and Jersey versions after resolution HADOOP-14930 and > HBASE-19256. > Or alternatively these libs can be shaded in Drill, but there is no real > reason to do it nowadays. > See details in > [#1681|https://github.com/apache/drill/pull/1681#discussion_r265904521] PR. > _Notes_: > * For Jersey update it is necessary to add > org.glassfish.jersey.inject:jersey-hk2 in Drill to solve all compilation > failures. > * See doc for Jetty update: > https://www.eclipse.org/jetty/documentation/9.4.x/upgrading-jetty.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7135) Upgrade to Jetty 9.4
[ https://issues.apache.org/jira/browse/DRILL-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-7135: Assignee: Arina Ielchiieva > Upgrade to Jetty 9.4 > > > Key: DRILL-7135 > URL: https://issues.apache.org/jira/browse/DRILL-7135 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.17.0 > > > Initially DRILL-7051 updated Jetty to 9.4 version and DRILL-7081 updated > Jersey version to 2.28 version. These versions work fine for Drill with > Hadoop version below 3.0. > Starting from Hadoop 3.0 it uses > [org.eclipse.jetty|https://github.com/apache/hadoop/blob/branch-3.0/hadoop-project/pom.xml#L38] > 9.3 version. > That's why it conflicts with newer Jetty versions. > Drill can update Jetty and Jersey versions after resolution HADOOP-14930 and > HBASE-19256. > Or alternatively these libs can be shaded in Drill, but there is no real > reason to do it nowadays. > See details in > [#1681|https://github.com/apache/drill/pull/1681#discussion_r265904521] PR. > _Notes_: > * For Jersey update it is necessary to add > org.glassfish.jersey.inject:jersey-hk2 in Drill to solve all compilation > failures. > * See doc for Jetty update: > https://www.eclipse.org/jetty/documentation/9.4.x/upgrading-jetty.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7160: --- Labels: ready-to-commit (was: ) > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Labels: ready-to-commit > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814809#comment-16814809 ]
Abhishek Girish commented on DRILL-7166:
{code}
Query: Functional/metadata_caching/data/drill4376_6.q
select count(*) from `lineitem_hierarchical_intint/*/1*`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 19775 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_8.q
select count(*) from `lineitem_hierarchical_intint/*8*/3*`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 3600 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_3.q
select count(*) from `lineitem_hierarchical_intint/1**2`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 20175 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_2.q
select count(*) from `lineitem_hierarchical_intint/19*4`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 2 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_1.q
select count(*) from `lineitem_hierarchical_intint/199*`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 3 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_5.q
select count(*) from `lineitem_hierarchical_intint/*/1`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 6300 (1 occurence(s))

Query: Functional/metadata_caching/data/drill4376_4.q
select count(*) from `lineitem_hierarchical_intint/*8*`
Expected number of rows: 1
Actual number of rows from Drill: 1
Number of matching rows: 0
Number of rows missing: 1
Number of rows unexpected: 1
These rows are not expected (first 10): 70175
These rows are missing (first 10): 40175 (1 occurence(s))
{code}
> Tests doing count(* ) with wildcards in table name are querying metadata > cache and returning wrong results > -- > > Key: DRILL-7166 > URL: https://issues.apache.org/jira/browse/DRILL-7166 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.16.0 >Reporter: Abhishek Girish >Assignee: Pritesh Maker >Priority: Critical > Fix For: 1.16.0 > > > Tests: > {code} > Functional/metadata_caching/data/drill4376_1.q > Functional/metadata_caching/data/drill4376_2.q > Functional/metadata_caching/data/drill4376_3.q > Functional/metadata_caching/data/drill4376_4.q > Functional/metadata_caching/data/drill4376_5.q > Functional/metadata_caching/data/drill4376_6.q > Functional/metadata_caching/data/drill4376_8.q > {code} > Example pattern of queries: > {code} > select count(*) from `lineitem_hierarchical_intint/*8*/3*`; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7166) Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results
Abhishek Girish created DRILL-7166: -- Summary: Tests doing count(* ) with wildcards in table name are querying metadata cache and returning wrong results Key: DRILL-7166 URL: https://issues.apache.org/jira/browse/DRILL-7166 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 1.16.0 Reporter: Abhishek Girish Assignee: Pritesh Maker Fix For: 1.16.0 Tests: {code} Functional/metadata_caching/data/drill4376_1.q Functional/metadata_caching/data/drill4376_2.q Functional/metadata_caching/data/drill4376_3.q Functional/metadata_caching/data/drill4376_4.q Functional/metadata_caching/data/drill4376_5.q Functional/metadata_caching/data/drill4376_6.q Functional/metadata_caching/data/drill4376_8.q {code} Example pattern of queries: {code} select count(*) from `lineitem_hierarchical_intint/*8*/3*`; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814784#comment-16814784 ]
ASF GitHub Bot commented on DRILL-7160:
---
vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#discussion_r274107814
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ##
@@ -336,7 +336,8 @@ public String getOperatorsJSON() {
   }

   public Map getQueryOptions() {
-    return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope());
+    // Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES)
+    return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0")));
Review comment: @kkhatua, you haven't reverted this change.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
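The predicate in the diff under review can be exercised in isolation with a small stand-alone sketch. The `Option` and `Scope` types below are hypothetical stand-ins rather than Drill's `OptionValue` classes, `example.other.option` is a made-up option name, and it is assumed that `ExecConstants.QUERY_MAX_ROWS` resolves to the option name `exec.query.max_rows` from the issue title. The sketch only shows what the lambda keeps and what it drops; per the review comment, this change was expected to be reverted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: hypothetical stand-ins for Drill's option classes.
final class QueryOptionFilterSketch {

  enum Scope { SYSTEM, SESSION, QUERY }

  static final class Option {
    final String name;
    final Scope scope;
    final Object value;

    Option(String name, Scope scope, Object value) {
      this.name = name;
      this.scope = scope;
      this.value = value;
    }
  }

  // Assumed to match the value of ExecConstants.QUERY_MAX_ROWS.
  static final String QUERY_MAX_ROWS = "exec.query.max_rows";

  // Keep QUERY-scoped options, except max_rows when its value prints as "0".
  static final Predicate<Option> REPORTABLE = o ->
      o.scope == Scope.QUERY
          && !(QUERY_MAX_ROWS.equals(o.name) && "0".equals(String.valueOf(o.value)));

  static List<String> reportedNames(List<Option> options) {
    List<String> names = new ArrayList<>();
    for (Option o : options) {
      if (REPORTABLE.test(o)) {
        names.add(o.name + "=" + o.value);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Option> options = Arrays.asList(
        new Option(QUERY_MAX_ROWS, Scope.QUERY, 0),
        new Option(QUERY_MAX_ROWS, Scope.QUERY, 15),
        new Option("example.other.option", Scope.QUERY, "x"),
        new Option("example.other.option", Scope.SESSION, "y"));
    System.out.println(reportedNames(options));
  }
}
```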
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814742#comment-16814742 ]
ASF GitHub Bot commented on DRILL-7160:
---
kkhatua commented on issue #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set
URL: https://github.com/apache/drill/pull/1742#issuecomment-481803435
@vvysotskyi , @arina-ielchiieva With the latest update, for non-applicable queries, there are no options shown that indicate that the `max_rows` has been set. For applicable queries, based on the combination of what the SESSION and SYSTEM (default) values are, you get the following outcome with the scope of the option also indicated as shown in the profile:
| SYSTEM | SESSION | Final | ScopeSet |
|--------|---------|-------|----------|
| 0      | 0       | 0     | N/A      |
| 15     | 0       | 15    | N/A      |
| 0      | 10      | 10    | SESSION  |
| 15     | 10      | 10    | SESSION  |
| 15     | 20      | 15    | QUERY    |
The last one is required because there is no way for me to remove the SESSION level value and let only the SYSTEM value persist.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
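The SYSTEM / SESSION outcomes listed in the comment above can be reproduced with a small decision function. This is a sketch inferred only from the five rows in that table, not the logic of `DrillSqlWorker.checkAndApplyAutoLimit()`: the class `AutoLimitSketch`, the `Resolution` holder, and the reading of `0` as "no limit set" are assumptions.

```java
// Illustrative only: a rule inferred from the five rows quoted in the comment.
final class AutoLimitSketch {

  // NONE stands for the "N/A" entries: nothing is reported for the query.
  enum ScopeSet { NONE, SESSION, QUERY }

  static final class Resolution {
    final int finalLimit;
    final ScopeSet scope;

    Resolution(int finalLimit, ScopeSet scope) {
      this.finalLimit = finalLimit;
      this.scope = scope;
    }

    @Override
    public String toString() {
      return finalLimit + " (" + scope + ")";
    }
  }

  // Assumption: 0 means "no limit configured" at that level.
  static Resolution resolve(int systemLimit, int sessionLimit) {
    if (sessionLimit == 0) {
      // No session value: the system default applies and nothing is reported.
      return new Resolution(systemLimit, ScopeSet.NONE);
    }
    if (systemLimit == 0 || sessionLimit <= systemLimit) {
      // The session value is within the system cap, so it applies as is.
      return new Resolution(sessionLimit, ScopeSet.SESSION);
    }
    // The session value exceeds the system cap: the cap is applied at QUERY level.
    return new Resolution(systemLimit, ScopeSet.QUERY);
  }

  public static void main(String[] args) {
    int[][] cases = { {0, 0}, {15, 0}, {0, 10}, {15, 10}, {15, 20} };
    for (int[] c : cases) {
      System.out.println("SYSTEM=" + c[0] + " SESSION=" + c[1] + " -> " + resolve(c[0], c[1]));
    }
  }
}
```

The last branch corresponds to the case the comment singles out: the smaller system value has to be written at QUERY scope because the session value cannot be removed.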
[jira] [Updated] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bridget Bevens updated DRILL-7014:
--
Labels: doc-complete ready-to-commit (was: doc-impacting ready-to-commit)
> Format plugin for LTSV files
> 
>
> Key: DRILL-7014
> URL: https://issues.apache.org/jira/browse/DRILL-7014
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Affects Versions: 1.15.0
> Reporter: Takako Shimamoto
> Assignee: Takako Shimamoto
> Priority: Major
> Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> I would like to contribute [this plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill.
> h4. Abstract
> storage-plugins-override.conf
> {code:json}
> "storage":{
>   dfs: {
>     type: "file",
>     connection: "file:///",
>     formats: {
>       "ltsv": {
>         "type": "ltsv",
>         "extensions": [
>           "ltsv"
>         ]
>       }
>     },
>     enabled: true
>   }
> }
> {code}
> sample.ltsv
> {code}
> time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 reqtime:2.532 apptime:2.532 vhost:api.example.com
> time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 reqtime:3.580 apptime:3.580 vhost:api.example.com
> {code}
> Run query
> {code:sh}
> root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded
> Apache Drill 1.15.0
> "Drill must go on."
> 0: jdbc:drill:zk=local> SELECT * FROM dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0;
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+
> | time                       | host            | forwardedfor | req                  | status | size | referer | ua             | reqtime | apptime | vhost           |
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+
> | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | -            | GET /v1/yyy HTTP/1.1 | 200    | 412  | -       | Java/1.8.0_201 | 3.580   | 3.580   | api.example.com |
> +----------------------------+-----------------+--------------+----------------------+--------+------+---------+----------------+---------+---------+-----------------+
> 1 row selected (6.074 seconds)
> 0: jdbc:drill:zk=local>
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
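For readers unfamiliar with the format, the record layout in `sample.ltsv` can be parsed with a few lines of Java. This is a minimal sketch and not the contributed plugin's reader: the class `LtsvLineSketch` is hypothetical, and it assumes the usual LTSV convention that fields are separated by tab characters and that each field is split at its first colon (the sample above appears with spaces, presumably because the tabs were lost in the email).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not the drill-ltsv-plugin reader.
final class LtsvLineSketch {

  // Parses one LTSV record into label -> value, preserving field order.
  static Map<String, String> parseLine(String line) {
    Map<String, String> record = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      // Split at the first colon only; values such as timestamps contain colons.
      int colon = field.indexOf(':');
      if (colon <= 0) {
        continue; // skip empty fields and fields without a label
      }
      record.put(field.substring(0, colon), field.substring(colon + 1));
    }
    return record;
  }

  public static void main(String[] args) {
    String line = "time:30/Nov/2016:00:56:37 +0900\thost:xxx.xxx.xxx.xxx\t"
        + "req:GET /v1/yyy HTTP/1.1\tstatus:200\treqtime:3.580";
    for (Map.Entry<String, String> e : parseLine(line).entrySet()) {
      System.out.println(e.getKey() + " = " + e.getValue());
    }
  }
}
```

Splitting on the first colon is what lets the `time` value keep its own colons, which is why each label in the query output maps cleanly to one column.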
[jira] [Commented] (DRILL-7014) Format plugin for LTSV files
[ https://issues.apache.org/jira/browse/DRILL-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814738#comment-16814738 ] Bridget Bevens commented on DRILL-7014: --- Hi [~shimamoto] I've added the doc here: https://drill.apache.org/docs/ltsv-format-plugin/ Let me know if I need to change anything. Thank you! ~Bridget > Format plugin for LTSV files > > > Key: DRILL-7014 > URL: https://issues.apache.org/jira/browse/DRILL-7014 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.15.0 >Reporter: Takako Shimamoto >Assignee: Takako Shimamoto >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > I would like to contribute [this > plugin|https://github.com/bizreach/drill-ltsv-plugin] to Drill. > h4. Abstract > storage-plugins-override.conf > {code:json} > "storage":{ > dfs: { > type: "file", > connection: "file:///", > formats: { > "ltsv": { > "type": "ltsv", > "extensions": [ > "ltsv" > ] > } > }, > enabled: true > } > } > {code} > sample.ltsv > {code} > time:30/Nov/2016:00:55:08 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/xxx HTTP/1.1 status:200 size:4968 referer:- ua:Java/1.8.0_131 > reqtime:2.532 apptime:2.532 vhost:api.example.com > time:30/Nov/2016:00:56:37 +0900 host:xxx.xxx.xxx.xxx forwardedfor:- req:GET > /v1/yyy HTTP/1.1 status:200 size:412 referer:- ua:Java/1.8.0_201 > reqtime:3.580 apptime:3.580 vhost:api.example.com > {code} > Run query > {code:sh} > root@1805183e9b65:/apache-drill-1.15.0# ./bin/drill-embedded > Apache Drill 1.15.0 > "Drill must go on." 
> 0: jdbc:drill:zk=local> SELECT * FROM > dfs.`/apache-drill-1.15.0/sample-data/sample.ltsv` WHERE reqtime > 3.0; > +-+--+---+---+-+---+--+-+--+--+--+ > |time | host | forwardedfor | > req | status | size | referer | ua| reqtime | > apptime | vhost | > +-+--+---+---+-+---+--+-+--+--+--+ > | 30/Nov/2016:00:56:37 +0900 | xxx.xxx.xxx.xxx | - | GET > /v1/yyy HTTP/1.1 | 200 | 412 | -| Java/1.8.0_201 | 3.580| > 3.580| api.example.com | > +-+--+---+---+-+---+--+-+--+--+--+ > 1 row selected (6.074 seconds) > 0: jdbc:drill:zk=local> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs
[ https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-7162: - Fix Version/s: 1.17.0 > Apache Drill uses 3rd Party with Highest CVEs > -- > > Key: DRILL-7162 > URL: https://issues.apache.org/jira/browse/DRILL-7162 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Ayush Sharma >Priority: Major > Fix For: 1.17.0 > > > Apache Drill uses 3rd party libraries with 250+ CVEs. > Most of the CVEs are in the older version of Jetty (9.1.x), whereas the > current version of Jetty is 9.4.x. > Also, many of the other libraries are at EOL versions and they are not patched > even in the latest release. > This creates a security issue when we use it in production. > We are able to replace many older versions of libraries with the latest > versions with no CVEs; however, many of them are not replaceable as is and > would require some changes in the source code. > The Jetty version is of the highest priority and needs migration to the 9.4.x > version immediately. > > Please look into this issue with immediate priority, as it compromises the > security of applications utilizing Apache Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814693#comment-16814693 ] ASF GitHub Bot commented on DRILL-7160: --- vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r274074779 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ## @@ -336,7 +336,8 @@ public String getOperatorsJSON() { } public Map getQueryOptions() { -return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope()); +// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES) +return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0"))); Review comment: Yes, as I wrote in the previous comment, we can skip the issue I mentioned in one of the comments for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. 
> From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
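The filter under review in `getQueryOptions()` can be illustrated in isolation. This is a sketch with stand-in types: `QueryOptionFilter`, `Option`, and `Scope` are hypothetical and only mimic the shape of Drill's `OptionValue`; the option name `exec.query.max_rows` comes from the issue title, and the other option names are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Stand-in types for illustration; they only mimic the shape of Drill's OptionValue.
final class QueryOptionFilter {

    enum Scope { SYSTEM, SESSION, QUERY }

    static final class Option {
        final String name;
        final Scope scope;
        final Object value;

        Option(String name, Scope scope, Object value) {
            this.name = name;
            this.scope = scope;
            this.value = value;
        }
    }

    static final String QUERY_MAX_ROWS = "exec.query.max_rows";

    // Keep QUERY-scoped options, except a max-rows option whose value is zero.
    static final Predicate<Option> REPORTABLE = o ->
        o.scope == Scope.QUERY
            && !(QUERY_MAX_ROWS.equals(o.name) && "0".equals(String.valueOf(o.value)));

    // Returns the names of the options that would be shown in the profile.
    static List<String> reportableNames(List<Option> options) {
        List<String> names = new ArrayList<>();
        for (Option o : options) {
            if (REPORTABLE.test(o)) {
                names.add(o.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Option> options = Arrays.asList(
            new Option(QUERY_MAX_ROWS, Scope.QUERY, 0),                    // hidden: zero max rows
            new Option("example.query.option", Scope.QUERY, 1000),         // placeholder name
            new Option("example.session.option", Scope.SESSION, "value")); // wrong scope
        System.out.println(reportableNames(options));
    }
}
```

As the review thread notes, a filter like this hides the zero value at display time; it does not stop the option from being set at the query level.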
[jira] [Updated] (DRILL-7062) Run-time row group pruning
[ https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7062: - Fix Version/s: (was: 1.16.0) 1.17.0 > Run-time row group pruning > -- > > Key: DRILL-7062 > URL: https://issues.apache.org/jira/browse/DRILL-7062 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Boaz Ben-Zvi >Priority: Major > Fix For: 1.17.0 > > Original Estimate: 504h > Remaining Estimate: 504h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7028) Reduce the planning time of queries on large Parquet tables with large metadata cache files
[ https://issues.apache.org/jira/browse/DRILL-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7028: - Fix Version/s: 1.17.0 > Reduce the planning time of queries on large Parquet tables with large > metadata cache files > --- > > Key: DRILL-7028 > URL: https://issues.apache.org/jira/browse/DRILL-7028 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: performance > Fix For: 1.16.0, 1.17.0 > > > If the Parquet table has a large number of small files, the metadata cache > files grow larger and the planner tries to read the large metadata cache file > which leads to the planning time overhead. Most of the time of execution is > spent during the planning phase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814495#comment-16814495 ] ASF GitHub Bot commented on DRILL-7160: --- kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273990569 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ## @@ -336,7 +336,8 @@ public String getOperatorsJSON() { } public Map getQueryOptions() { -return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope()); +// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES) +return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0"))); Review comment: Agreed. I am not sure, but I think I did this here https://github.com/apache/drill/blob/c3ee7949656fb4c1b144e1633f97002c159ec8f3/exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java#L120 because of the complexity in the logic and (I think) because without this, the session value (0) gets precedence when the SYSTEM default exists. I can revisit this post release to avoid blocking the release for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814479#comment-16814479 ] ASF GitHub Bot commented on DRILL-7160: --- arina-ielchiieva commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273980708 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ## @@ -336,7 +336,8 @@ public String getOperatorsJSON() { } public Map getQueryOptions() { -return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope()); +// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES) +return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0"))); Review comment: I think we should not set max rows count to query context if it is the same as default, thus this count won't appear in Query Profile. Current solution just fixes the symptom not the real problem. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. 
> From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
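The alternative suggested in this comment and in the issue description (write the query-level value only when it differs from the one already in effect) could look roughly like the sketch below. `AutoLimitSketch` is a hypothetical map-backed stand-in, not Drill's option manager, and treating an unset option as 0 is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a query-level option store; not Drill's OptionManager.
final class AutoLimitSketch {

    static final String QUERY_MAX_ROWS = "exec.query.max_rows";

    private final Map<String, Long> inherited;                    // session/system values
    private final Map<String, Long> queryLevel = new HashMap<>(); // query-level overrides

    AutoLimitSketch(Map<String, Long> inherited) {
        this.inherited = inherited;
    }

    // Query-level value wins; otherwise fall back to the inherited value (assumed 0 if unset).
    long effectiveValue(String name) {
        Long local = queryLevel.get(name);
        if (local != null) {
            return local;
        }
        return inherited.getOrDefault(name, 0L);
    }

    // Writes a query-level override only when it would change the effective value.
    void setIfDifferent(String name, long value) {
        if (effectiveValue(name) != value) {
            queryLevel.put(name, value);
        }
    }

    boolean hasQueryLevelOverride(String name) {
        return queryLevel.containsKey(name);
    }

    public static void main(String[] args) {
        AutoLimitSketch options = new AutoLimitSketch(new HashMap<>());
        options.setIfDifferent(QUERY_MAX_ROWS, 0L);
        // No override is recorded, so nothing would show up as a QUERY-level option.
        System.out.println(options.hasQueryLevelOverride(QUERY_MAX_ROWS));
    }
}
```

With this approach the profile needs no display-time filter, because an unchanged value never becomes a query-level option in the first place.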
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814464#comment-16814464 ] ASF GitHub Bot commented on DRILL-7160: --- kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273972484 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java ## @@ -226,7 +226,10 @@ private static SqlNode checkAndApplyAutoLimit(SqlConverter parser, QueryContext if (isAutoLimitShouldBeApplied(context, sqlNode)) { sqlNode = wrapWithAutoLimit(sqlNode, context); } else { - context.getOptions().setLocalOption(ExecConstants.QUERY_MAX_ROWS, 0); +//Force setting to zero IFF autoLimit was intended to be set originally but is inapplicable Review comment: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814459#comment-16814459 ] ASF GitHub Bot commented on DRILL-7160: --- kkhatua commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273966816 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ## @@ -336,7 +336,8 @@ public String getOperatorsJSON() { } public Map getQueryOptions() { -return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope()); +// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES) +return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0"))); Review comment: This change actually helps to not show the value of Zero being applied, when the query is inapplicable. Are you sure we want to remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. 
> From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7165) Redundant Checksum calculating for ASC files
Vitalii Diravka created DRILL-7165: -- Summary: Redundant Checksum calculating for ASC files Key: DRILL-7165 URL: https://issues.apache.org/jira/browse/DRILL-7165 Project: Apache Drill Issue Type: Improvement Components: Tools, Build Test Affects Versions: 1.15.0 Reporter: Vitalii Diravka Assignee: Vitalii Diravka Fix For: 1.16.0 Currently {{checksum-maven-plugin}} creates SHA-512 checksum files for tar and zip archives and also for ASC (signature) files. The checksums for the ASC files are redundant. For example: apache-drill-1.15.0-src.tar.gz.asc.sha512 apache-drill-1.15.0-src.zip.asc.sha512 apache-drill-1.15.0.tar.gz.asc.sha512 The proper list of files: [http://home.apache.org/~vitalii/drill/releases/1.15.0/rc2/] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
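The intended outcome can be sketched outside Maven: produce SHA-512 checksums for the archives but skip signature files. This illustration uses only the JDK; `ChecksumSketch` and its methods are hypothetical names, and it does not show how `checksum-maven-plugin` itself is configured.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; unrelated to the plugin's real implementation.
final class ChecksumSketch {

    // The issue treats checksums of .asc signature files as redundant.
    static boolean needsChecksum(String fileName) {
        return !fileName.endsWith(".asc");
    }

    // Names of the .sha512 files that should exist for the given artifacts.
    static List<String> checksumFileNames(List<String> artifacts) {
        List<String> result = new ArrayList<>();
        for (String name : artifacts) {
            if (needsChecksum(name)) {
                result.add(name + ".sha512");
            }
        }
        return result;
    }

    // Hex-encoded SHA-512 digest of the given bytes.
    static String sha512Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a standard JDK algorithm, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> artifacts = Arrays.asList(
            "apache-drill-1.15.0.tar.gz",
            "apache-drill-1.15.0.tar.gz.asc",
            "apache-drill-1.15.0-src.zip",
            "apache-drill-1.15.0-src.zip.asc");
        System.out.println(checksumFileNames(artifacts));
    }
}
```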
[jira] [Commented] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814412#comment-16814412 ] Gayathri commented on DRILL-7161: - [~lhfei] Thank you for your response. If the following query is given, it is working fine without using any CAST even for null values. SELECT sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json`; > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Assignee: Hefei Li >Priority: Blocker > Labels: Drill, issue > Fix For: 1.14.0 > > > Facing some issues with the following case: > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without using group by, then it is working fine without any > error. If group by is used, then sum of null values is throwing the above > error. > > Can anyone please let us know the solution for this or if there are any > alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814409#comment-16814409 ] ASF GitHub Bot commented on DRILL-7160: --- vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273925887 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java ## @@ -226,7 +226,10 @@ private static SqlNode checkAndApplyAutoLimit(SqlConverter parser, QueryContext if (isAutoLimitShouldBeApplied(context, sqlNode)) { sqlNode = wrapWithAutoLimit(sqlNode, context); } else { - context.getOptions().setLocalOption(ExecConstants.QUERY_MAX_ROWS, 0); +//Force setting to zero IFF autoLimit was intended to be set originally but is inapplicable Review comment: Please fix indentation and please refactor methods added in the previous commit to pass the value of `context.getOptions().getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue()` instead of `context`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. 
> From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814410#comment-16814410 ] ASF GitHub Bot commented on DRILL-7160: --- vvysotskyi commented on pull request #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#discussion_r273924644 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java ## @@ -336,7 +336,8 @@ public String getOperatorsJSON() { } public Map getQueryOptions() { -return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope()); +// Skip reporting QUERY_MAX_ROWS if it is inapplicable and set to zero (e.g. query -> SHOW FILES) +return getOptions(o -> OptionValue.OptionScope.QUERY == o.getScope() && !(ExecConstants.QUERY_MAX_ROWS.equals(o.getName()) && String.valueOf(o.getValue()).equals("0"))); Review comment: Please revert this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7071) Reserved words documentation update
[ https://issues.apache.org/jira/browse/DRILL-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-7071: --- Fix Version/s: (was: 1.16.0) Future > Reserved words documentation update > --- > > Key: DRILL-7071 > URL: https://issues.apache.org/jira/browse/DRILL-7071 > Project: Apache Drill > Issue Type: Task > Components: Documentation >Affects Versions: 1.15.0 >Reporter: Vitalii Diravka >Assignee: Bridget Bevens >Priority: Minor > Labels: doc, documentation, keyword, reserved-word > Fix For: Future > > > Recently many reserved keywords were added to the Drill project, for > instance in DRILL-1328, and DRILL-7058 will introduce a new one too. Therefore > the Drill reserved keywords in the documentation should be updated: > [https://drill.apache.org/docs/reserved-keywords/] > These words should be obtained from these sections of the Drill Parser file: > [https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/data/Parser.tdd#L30] > and > [https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/data/Parser.tdd#L390] > _Note:_ this list will be updated after the next Calcite version update. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hefei Li resolved DRILL-7161. - Resolution: Not A Bug Fix Version/s: 1.14.0 By default, Drill does not support different types of JSON lists. For support on JSON data types, you can refer to the [JSON Data Model|https://drill.apache.org/docs/json-data-model/]. In this case, the 'b' column in your given test data is all null. When Drill reads the column, it will be processed by default according to the VARCHAR type. So, if you want to work with numeric types as you expect, you can use the [CAST|https://drill.apache.org/docs/data-type-conversion/] type conversion function provided by Drill. For example: {code:java} select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json` group by a {code} Then it will work fine. > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Assignee: Hefei Li >Priority: Blocker > Labels: Drill, issue > Fix For: 1.14.0 > > > Facing some issues with the following case: > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without using group by, then it is working fine without any > error. If group by is used, then sum of null values is throwing the above > error. > > Can anyone please let us know the solution for this or if there are any > alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hefei Li updated DRILL-7161: Comment: was deleted (was: By default, Drill does not support different types of JSON lists. For support on JSON data types, you can refer to the *[JSON Data Model|[https://drill.apache.org/docs/json-data-model/]]*. In this case, the ‘B’ column in your given test data is all null. When Drill reads the column, it will be processed by default according to the VARCHAR type. So, if you want to work with numeric types as you expect, you can use the *[CAST|[https://drill.apache.org/docs/data-type-conversion/]]* type conversion function provided by Drill. Such as: {code:java} select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json` group by a {code} Then it will work fine.) > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Assignee: Hefei Li >Priority: Blocker > Labels: Drill, issue > > Facing some issues with the following case: > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without using group by, then it is working fine without any > error. If group by is used, then sum of null values is throwing the above > error. > > Can anyone please let us know the solution for this or if there are any > alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814223#comment-16814223 ] Hefei Li commented on DRILL-7161: - By default, Drill does not support different types of JSON lists. For support on JSON data types, you can refer to the *[JSON Data Model|https://drill.apache.org/docs/json-data-model/]*. In this case, the 'b' column in your given test data is all null. When Drill reads the column, it will be processed by default according to the VARCHAR type. So, if you want to work with numeric types as you expect, you can use the *[CAST|https://drill.apache.org/docs/data-type-conversion/]* type conversion function provided by Drill. For example: {code:java} select a, sum(CAST(b as INT)) from dfs.`/drill/data/sample.json` group by a {code} Then it will work fine. > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Assignee: Hefei Li >Priority: Blocker > Labels: Drill, issue > > Facing some issues with the following case: > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without using group by, then it is working fine without any > error. If group by is used, then sum of null values is throwing the above > error. > > Can anyone please let us know the solution for this or if there are any > alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-7116) Adapt statistics to use Drill Metastore API
[ https://issues.apache.org/jira/browse/DRILL-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-7116. Resolution: Fixed Fix Version/s: (was: 1.17.0) 1.16.0 Fixed in the scope of DRILL-7089 > Adapt statistics to use Drill Metastore API > --- > > Key: DRILL-7116 > URL: https://issues.apache.org/jira/browse/DRILL-7116 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > The current implementation of statistics assumes the use of files for > storing and reading statistics. > The aim of this Jira is to adapt statistics to use the Drill Metastore API so that in > the future they may be stored in other metastore implementations. > Implementation details: > - Move statistics info into {{TableMetadata}} > - Provide a way for obtaining {{TableMetadata}} in the places where > statistics may be used (partially implemented in the scope of DRILL-7089) > - Investigate and implement (if possible) lazy materialization of > {{DrillStatsTable}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hefei Li reassigned DRILL-7161: --- Assignee: Hefei Li > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Assignee: Hefei Li >Priority: Blocker > Labels: Drill, issue > > Facing some issues with the following case: > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without using group by, then it is working fine without any > error. If group by is used, then sum of null values is throwing the above > error. > > Can anyone please let us know the solution for this or if there are any > alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)