[jira] [Updated] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gayathri updated DRILL-7161: Priority: Blocker (was: Critical) > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.14.0 > Reporter: Gayathri > Priority: Blocker > Labels: Drill, issue > > We are facing an issue in the following case: > A JSON file (*sample.json*) contains the following records: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > If we query without the group by clause, it works fine without any > error. If group by is used, summing the null values throws the above > error. > > Can anyone please let us know the solution for this, or whether there is any > alternative? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
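The behavior the reporter expects matches standard SQL semantics: SUM over a group whose values are all NULL returns NULL rather than raising an error; Drill fails here because an all-null JSON field is inferred as VarChar. A minimal sketch of the expected semantics, using SQLite as a stand-in engine with the same data as sample.json:

```python
import sqlite3

# Reproduce the reported data: column "b" is NULL in every row. An engine
# with real type information returns NULL for SUM(b) per group instead of
# raising a type error. This only demonstrates standard SQL semantics,
# not Drill's internal type inference.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [(2, None), (2, None), (3, None), (4, None)])
rows = conn.execute(
    "SELECT a, SUM(b) FROM sample GROUP BY a ORDER BY a").fetchall()
print(rows)  # -> [(2, None), (3, None), (4, None)]
```

A commonly suggested workaround in Drill (not verified here against 1.14.0) is to cast the column to a numeric type before aggregating, e.g. `sum(CAST(b AS INTEGER))`, so the aggregate does not see a VarChar input.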
[jira] [Updated] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-7077: -- Labels: doc-complete ready-to-commit (was: doc-impacting ready-to-commit) > Add Function to Facilitate Time Series Analysis > --- > > Key: DRILL-7077 > URL: https://issues.apache.org/jira/browse/DRILL-7077 > Project: Apache Drill > Issue Type: New Feature > Reporter: Charles Givre > Assignee: Charles Givre > Priority: Major > Labels: doc-complete, ready-to-commit > Fix For: 1.16.0 > > > When analyzing time-based data, you will often have to aggregate by time > grains. While some time grains are easy to calculate, others, such as > quarter, can be quite difficult. These functions enable a user to quickly and > easily aggregate data by various units of time. Usage is as follows: > {code:java} > SELECT <fields> FROM <table> GROUP BY nearestDate(<date_field>, <time_grain>) {code} > Say a user wanted to count the number of hits on a web server > per 15 minutes; the query might look like this: > {code:java} > SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate, > COUNT(*) AS hitCount > FROM dfs.`log.httpd` > GROUP BY nearestDate(`eventDate`, '15MINUTE'){code} > The function currently supports the following time units: > * YEAR > * QUARTER > * MONTH > * WEEK_SUNDAY > * WEEK_MONDAY > * DAY > * HOUR > * HALF_HOUR / 30MIN > * QUARTER_HOUR / 15MIN > * MINUTE > * 30SECOND > * 15SECOND > * SECOND > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
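For illustration, a rough Python re-implementation of a few of the grains described above (the truncate-to-grain behavior and the unit names are assumptions based on the issue text, not Drill's actual UDF code):

```python
from collections import Counter
from datetime import datetime

def nearest_date(ts: datetime, unit: str) -> datetime:
    """Round ts down to the start of the given time grain.

    Simplified sketch of nearestDate() for three grains; the real
    function supports many more units (YEAR, QUARTER, WEEK_SUNDAY, ...).
    """
    if unit == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "15MINUTE":  # QUARTER_HOUR
        return ts.replace(minute=ts.minute - ts.minute % 15,
                          second=0, microsecond=0)
    if unit == "MINUTE":
        return ts.replace(second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

# Counting web-server hits per 15-minute bucket, as in the example query:
hits = [datetime(2019, 4, 9, 10, 3), datetime(2019, 4, 9, 10, 14),
        datetime(2019, 4, 9, 10, 17)]
buckets = Counter(nearest_date(h, "15MINUTE") for h in hits)
print(buckets)  # 10:00 bucket has 2 hits, 10:15 bucket has 1
```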
[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813987#comment-16813987 ] Bridget Bevens commented on DRILL-7077: --- [~cgivre] the info is posted here: https://drill.apache.org/docs/date-time-functions-and-arithmetic/#nearestdate Let me know if I need to change anything. Thanks, Bridget > Add Function to Facilitate Time Series Analysis > --- > > Key: DRILL-7077 > URL: https://issues.apache.org/jira/browse/DRILL-7077 > Project: Apache Drill > Issue Type: New Feature > Reporter: Charles Givre > Assignee: Charles Givre > Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > When analyzing time-based data, you will often have to aggregate by time > grains. While some time grains are easy to calculate, others, such as > quarter, can be quite difficult. These functions enable a user to quickly and > easily aggregate data by various units of time. Usage is as follows: > {code:java} > SELECT <fields> FROM <table> GROUP BY nearestDate(<date_field>, <time_grain>) {code} > Say a user wanted to count the number of hits on a web server > per 15 minutes; the query might look like this: > {code:java} > SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate, > COUNT(*) AS hitCount > FROM dfs.`log.httpd` > GROUP BY nearestDate(`eventDate`, '15MINUTE'){code} > The function currently supports the following time units: > * YEAR > * QUARTER > * MONTH > * WEEK_SUNDAY > * WEEK_MONDAY > * DAY > * HOUR > * HALF_HOUR / 30MIN > * QUARTER_HOUR / 15MIN > * MINUTE > * 30SECOND > * 15SECOND > * SECOND > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813963#comment-16813963 ] ASF GitHub Bot commented on DRILL-7064: --- sohami commented on pull request #1736: DRILL-7064: Leverage the summary metadata for plain COUNT aggregates. URL: https://github.com/apache/drill/pull/1736 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Leverage the summary's totalRowCount and totalNullCount for COUNT() queries > (also prevent eager expansion of files) > --- > > Key: DRILL-7064 > URL: https://issues.apache.org/jira/browse/DRILL-7064 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata > Reporter: Venkata Jyothsna Donapati > Assignee: Aman Sinha > Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > This sub-task is meant to leverage the Parquet metadata cache's summary > stats: totalRowCount (across all files and row groups) and the per-column > totalNullCount (across all files and row groups) to answer plain COUNT > aggregation queries without GROUP BY. These are currently converted to a > DirectScan by the ConvertCountToDirectScanRule, which utilizes the row group > metadata. However, this rule is applied on Drill logical rels and converts the > logical plan to a physical plan with a DirectScanPrel, but this is too late: the > DrillScanRel created during logical planning has already read the entire > metadata cache file along with its full list of row group entries. The metadata > cache file can grow quite large, and this does not scale. 
> The solution is to use the Metadata Summary file created in > DRILL-7063 and add a new rule that applies early, so that it > operates on the Calcite logical rels instead of the Drill logical rels and > prevents eager expansion of the list of files/row groups. > We will not remove the existing rule; it will continue to > operate as before, because it is possible that after some transformations we > still want to apply the optimization for COUNT queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
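The optimization described above can be sketched in a few lines: with only the summary's totalRowCount and per-column totalNullCount, both COUNT(*) and COUNT(col) are answerable without expanding the file/row-group list. Only the two field names come from the issue text; the class layout and function are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MetadataSummary:
    total_row_count: int              # rows across all files and row groups
    total_null_count: Dict[str, int]  # per-column nulls across all files

def answer_count(summary: MetadataSummary,
                 column: Optional[str] = None) -> int:
    """Answer a plain COUNT aggregate from summary stats alone."""
    if column is None:
        return summary.total_row_count            # COUNT(*)
    # COUNT(col) counts only non-null values
    return summary.total_row_count - summary.total_null_count[column]

s = MetadataSummary(total_row_count=1_000_000,
                    total_null_count={"b": 250_000})
print(answer_count(s))        # COUNT(*)  -> 1000000
print(answer_count(s, "b"))   # COUNT(b)  -> 750000
```

This is why the new rule must fire before the DrillScanRel is built: once the scan has eagerly read every row-group entry, the cheap summary-only answer no longer saves anything.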
[jira] [Updated] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7089: - Component/s: Metadata > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata > Affects Versions: 1.16.0 > Reporter: Volodymyr Vysotskyi > Assignee: Volodymyr Vysotskyi > Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > In the scope of DRILL-6852, new classes were introduced for metadata usage. > These classes may be reused in other GroupScan instances to reduce heap > usage when the metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so that within the scope of a single query it is > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
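A minimal sketch of the caching idea, assuming a hypothetical query-scoped cache keyed by table (names are illustrative, not Drill's actual API):

```python
# Query-scoped metadata caching: the first GroupScan for a table builds the
# (potentially large) metadata once; later scans within the same query reuse
# the same object instead of re-reading it, reducing heap pressure.
class QueryMetadataCache:
    def __init__(self, loader):
        self._loader = loader   # expensive metadata loader, called lazily
        self._cache = {}
        self.loads = 0          # how many real loads happened (for demo)

    def metadata_for(self, table_key):
        if table_key not in self._cache:
            self._cache[table_key] = self._loader(table_key)
            self.loads += 1
        return self._cache[table_key]

cache = QueryMetadataCache(loader=lambda key: {"table": key, "rows": 100})
m1 = cache.metadata_for("dfs.tmp.t1")
m2 = cache.metadata_for("dfs.tmp.t1")   # same object, no second load
print(m1 is m2, cache.loads)            # -> True 1
```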
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813964#comment-16813964 ] ASF GitHub Bot commented on DRILL-7089: --- sohami commented on pull request #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task > Affects Versions: 1.16.0 > Reporter: Volodymyr Vysotskyi > Assignee: Volodymyr Vysotskyi > Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > In the scope of DRILL-6852, new classes were introduced for metadata usage. > These classes may be reused in other GroupScan instances to reduce heap > usage when the metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so that within the scope of a single query it is > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813895#comment-16813895 ] Bridget Bevens commented on DRILL-7077: --- Hi [~cgivre], I'm trying this function and may be doing something wrong, but 15SECOND and 30SECOND are not working for me:
select nearestdate(CAST(COLUMNS[2] as timestamp), '30SECOND') as nearest_second from dfs.samples.`/bee/time.csv`;
Error: SYSTEM ERROR: DrillRuntimeException: [30SECOND] is not a valid time statement. Expecting: [YEAR, QUARTER, MONTH, WEEK_SUNDAY, WEEK_MONDAY, DAY, HOUR, HALF_HOUR, QUARTER_HOUR, MINUTE, HALF_MINUTE, QUARTER_MINUTE, SECOND] Fragment 0:0 Please, refer to logs for more information. [Error Id: f119202e-ec24-4670-83c2-14b4a7f83ebf on doc23.lab:31010] (state=,code=0)
apache drill> select nearestdate(CAST(COLUMNS[2] as timestamp), 'SECOND') as nearest_second from dfs.samples.`/bee/time.csv`;
+-----------------------+
| nearest_second        |
+-----------------------+
| 2018-01-01 05:10:15.0 |
| 2017-02-02 01:02:03.0 |
| 2003-04-06 07:11:11.0 |
+-----------------------+
3 rows selected (0.191 seconds)
Are 15SECOND and 30SECOND supported? Thanks, Bridget > Add Function to Facilitate Time Series Analysis > --- > > Key: DRILL-7077 > URL: https://issues.apache.org/jira/browse/DRILL-7077 > Project: Apache Drill > Issue Type: New Feature > Reporter: Charles Givre > Assignee: Charles Givre > Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > When analyzing time-based data, you will often have to aggregate by time > grains. While some time grains are easy to calculate, others, such as > quarter, can be quite difficult. These functions enable a user to quickly and > easily aggregate data by various units of time. Usage is as follows: > {code:java} > SELECT <fields> FROM <table> GROUP BY nearestDate(<date_field>, <time_grain>) {code} > Say a user wanted to count the number of hits on a web server > per 15 minutes; the query might look like this: > {code:java} > SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate, > COUNT(*) AS hitCount > FROM dfs.`log.httpd` > GROUP BY nearestDate(`eventDate`, '15MINUTE'){code} > The function currently supports the following time units: > * YEAR > * QUARTER > * MONTH > * WEEK_SUNDAY > * WEEK_MONDAY > * DAY > * HOUR > * HALF_HOUR / 30MIN > * QUARTER_HOUR / 15MIN > * MINUTE > * 30SECOND > * 15SECOND > * SECOND > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
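The error in the comment above suggests the implementation's canonical unit names (HALF_MINUTE, QUARTER_MINUTE) differ from the aliases listed in the description (30SECOND, 15SECOND). One way such aliases could be reconciled with canonical names is a normalization map; this sketch is purely illustrative and is not Drill's code:

```python
# Hypothetical alias table mapping documented shorthand unit names to the
# canonical names the validator in the error message actually accepts.
ALIASES = {
    "30MIN": "HALF_HOUR",
    "15MIN": "QUARTER_HOUR",
    "30SECOND": "HALF_MINUTE",
    "15SECOND": "QUARTER_MINUTE",
}

def normalize_unit(unit: str) -> str:
    """Map a user-supplied time-grain name to its canonical form."""
    u = unit.strip().upper()
    return ALIASES.get(u, u)

print(normalize_unit("30second"))  # -> HALF_MINUTE
print(normalize_unit("HOUR"))      # -> HOUR (already canonical)
```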
[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.
[ https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanumath Rao Maduri updated DRILL-7164: --- Description: On my private build I am hitting kafka storage tests issue intermittently. Here is the issue which I came across. {code} at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91] 15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest) java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : { "topicName" : "drill-pushdown-topic" }, .* .* "cost" in plan: { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ { "kind" : "STRING", "accessibleScopes" : "ALL", "name" : "store.kafka.record.reader", "string_val" : "org.apache.drill.exec.store.kafka.decoders.JsonMessageReader", "scope" : "SESSION" }, { "kind" : "BOOLEAN", "accessibleScopes" : "ALL", "name" : "exec.errors.verbose", "bool_val" : true, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "store.kafka.poll.timeout", "num_val" : 5000, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "planner.width.max_per_node", "num_val" : 2, "scope" : "SESSION" } ], "queue" : 0, "hasResourcePlan" : false, "resultMode" : "EXEC" }, "graph" : [ { "pop" : "kafka-scan", "@id" : 6, "userName" : "", "kafkaStoragePluginConfig" : { "type" : "kafka", "kafkaConsumerProps" : { "bootstrap.servers" : "127.0.0.1:56524", "group.id" : "drill-test-consumer" }, "enabled" : true }, "columns" : [ "`**`", "`kafkaMsgOffset`" ], "kafkaScanSpec" : { "topicName" : "drill-pushdown-topic" }, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "project", 
"@id" : 5, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`**`" }, { "ref" : "`kafkaMsgOffset`", "expr" : "`kafkaMsgOffset`" } ], "child" : 6, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "filter", "@id" : 4, "child" : 5, "expr" : "equal(`kafkaMsgOffset`, 9) ", "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 0.75 } }, { "pop" : "selection-vector-remover", "@id" : 3, "child" : 4, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 2, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`T23¦¦**`" } ], "child" : 3, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 1, "exprs" : [ { "ref" : "`**`", "expr" : "`T23¦¦**`" } ], "child" : 2, "outputProj" : true, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } } ] }! {code} In the earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) the way the cost is represented was changed, and the test was changed along with it, which I think is not right. The pattern compared against the plan should be made smarter to fix this issue generically. was: On my private build I am intermittently hitting a kafka storage test failure. Here is the issue which I came across. 
{code} at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91] 15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest) java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : { "topicName" : "drill-pushdown-topic" }, .* .* "cost" in
[jira] [Created] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.
Hanumath Rao Maduri created DRILL-7164: -- Summary: KafkaFilterPushdownTest is sometimes failing to pattern match correctly. Key: DRILL-7164 URL: https://issues.apache.org/jira/browse/DRILL-7164 Project: Apache Drill Issue Type: Bug Components: Storage - Kafka Affects Versions: 1.16.0 Reporter: Hanumath Rao Maduri Assignee: Abhishek Ravi Fix For: 1.17.0 On my private build I am hitting kafka storage tests issue intermittently. Here is the issue which I came across. {code} at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91] 15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest) java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : { "topicName" : "drill-pushdown-topic" }, .* .* "cost" in plan: { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ { "kind" : "STRING", "accessibleScopes" : "ALL", "name" : "store.kafka.record.reader", "string_val" : "org.apache.drill.exec.store.kafka.decoders.JsonMessageReader", "scope" : "SESSION" }, { "kind" : "BOOLEAN", "accessibleScopes" : "ALL", "name" : "exec.errors.verbose", "bool_val" : true, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "store.kafka.poll.timeout", "num_val" : 5000, "scope" : "SESSION" }, { "kind" : "LONG", "accessibleScopes" : "ALL", "name" : "planner.width.max_per_node", "num_val" : 2, "scope" : "SESSION" } ], "queue" : 0, "hasResourcePlan" : false, "resultMode" : "EXEC" }, "graph" : [ { "pop" : "kafka-scan", "@id" : 6, "userName" : "", "kafkaStoragePluginConfig" : { "type" : "kafka", "kafkaConsumerProps" : { "bootstrap.servers" : "127.0.0.1:56524", "group.id" : "drill-test-consumer" }, "enabled" : true }, "columns" : [ "`**`", "`kafkaMsgOffset`" ], "kafkaScanSpec" : { 
"topicName" : "drill-pushdown-topic" }, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "project", "@id" : 5, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`**`" }, { "ref" : "`kafkaMsgOffset`", "expr" : "`kafkaMsgOffset`" } ], "child" : 6, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 5.0 } }, { "pop" : "filter", "@id" : 4, "child" : 5, "expr" : "equal(`kafkaMsgOffset`, 9) ", "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 0.75 } }, { "pop" : "selection-vector-remover", "@id" : 3, "child" : 4, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 2, "exprs" : [ { "ref" : "`T23¦¦**`", "expr" : "`T23¦¦**`" } ], "child" : 3, "outputProj" : false, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "project", "@id" : 1, "exprs" : [ { "ref" : "`**`", "expr" : "`T23¦¦**`" } ], "child" : 2, "outputProj" : true, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } }, { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 100, "maxAllocation" : 100, "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 1.0 } } ] }! {code} In an earlier check-in, the way the cost is represented was changed, and the test was changed along with it, which I think is not right. The pattern compared against the plan should be made smarter to fix this issue generically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
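One way to make such a check less brittle than a wildcard string pattern is to parse the plan JSON and assert on fields directly. A sketch with a trimmed-down plan fragment (the real test uses Drill's own test helpers; this only illustrates the idea):

```python
import json

# A cut-down version of the physical plan JSON from the failure above.
# JSON number syntax allows the 1.6777216E7 exponent form Drill emits.
plan = json.loads("""
{ "graph": [
    { "pop": "kafka-scan",
      "kafkaScanSpec": { "topicName": "drill-pushdown-topic" },
      "cost": { "memoryCost": 1.6777216E7, "outputRowCount": 5.0 } }
] }
""")

# Instead of matching '"kafkaScanSpec" : { ... } .* .* "cost"' as text,
# locate the operator structurally and check its attributes.
scan = next(op for op in plan["graph"] if op["pop"] == "kafka-scan")
assert scan["kafkaScanSpec"]["topicName"] == "drill-pushdown-topic"
assert scan["cost"]["outputRowCount"] == 5.0
print("plan check passed")
```

A structural check of this kind stays valid even if the serialized layout of the cost object changes, which is exactly the failure mode described in the issue.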
[jira] [Created] (DRILL-7163) Join query fails with java.lang.IllegalArgumentException: null
Khurram Faraaz created DRILL-7163: - Summary: Join query fails with java.lang.IllegalArgumentException: null Key: DRILL-7163 URL: https://issues.apache.org/jira/browse/DRILL-7163 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0 Reporter: Khurram Faraaz Join query fails with java.lang.IllegalArgumentException: null Drill : 1.15.0 Failing query is {noformat} Select * From ( select convert_from(t.itm.iUUID, 'UTF8') iUUID, convert_from(t.UPC.UPC14, 'UTF8') UPC14, convert_from(t.itm.upcDesc, 'UTF8') upcDesc, convert_from(t.ris.mstBrdOid, 'UTF8') mstBrdOid, convert_from(t.ris.vrfLgyMtch, 'UTF8') vrfLgyMtch, convert_from(t.itm.mtch.cfdMtch, 'UTF8') cfdMtch, convert_from(t.itm.uoM, 'UTF8') uoM, convert_from(t.uomRec.valVal, 'UTF8') uomVal, case when a.iUUID is null then 0 else 1 end as keyind from hbase.`/mapr/tables/item-master` t left outer join ( select distinct convert_from(t.m.iUUID, 'UTF8') iUUID from hbase.`/mapr/tables/items` t ) a on t.itm.iUUID = a.iUUID ) i where (i.mstBrdOid is null or i.vrfLgyMtch is null) and i.keyind=1 {noformat} Stack trace from drillbit.log {noformat} 2019-03-27 11:45:44,563 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR o.a.d.e.physical.impl.BaseRootExec - Batch dump started: dumping last 2 failed batches 2019-03-27 11:45:44,564 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR o.a.d.e.p.i.p.ProjectRecordBatch - ProjectRecordBatch[projector=Projector[vector2=null, selectionVectorMode=NONE], hasRemainder=false, remainderIndex=0, recordCount=0, container=org.apache.drill.exec.record.VectorContainer@2133fd0e[recordCount = 0, schemaChanged = false, schema = BatchSchema [fields=[[`row_key` (VARBINARY:REQUIRED)], [`clnDesc` (MAP:REQUIRED), children=([`bndlCnt` (VARBINARY:OPTIONAL)], [`by` (VARBINARY:OPTIONAL)], [`desc` (VARBINARY:OPTIONAL)], [`dt` (VARBINARY:OPTIONAL)], [`descExt` (VARBINARY:OPTIONAL)])], [`dup` (MAP:REQUIRED), children=([`dupBy` (VARBINARY:OPTIONAL)], [`dupDt` (VARBINARY:OPTIONAL)], [`duplicate` 
(VARBINARY:OPTIONAL)], [`preferred` (VARBINARY:OPTIONAL)])], [`itm` (MAP:REQUIRED), children=([`iUUID` (VARBINARY:OPTIONAL)], [`cfdLgyMtch` (VARBINARY:OPTIONAL)], [`uoM` (VARBINARY:OPTIONAL)], [`upcCd` (VARBINARY:OPTIONAL)], [`upcDesc` (VARBINARY:OPTIONAL)], [`promo` (VARBINARY:OPTIONAL)])], [`lckSts` (MAP:REQUIRED), children=([`lckBy` (VARBINARY:OPTIONAL)], [`lckDt` (VARBINARY:OPTIONAL)])], [`lgy` (MAP:REQUIRED), children=([`lgyBr` (VARBINARY:OPTIONAL)])], [`obs` (MAP:REQUIRED), children=([`POSFile` (VARBINARY:OPTIONAL)])], [`prmRec` (MAP:REQUIRED)], [`ris` (MAP:REQUIRED), children=([`UPC` (VARBINARY:OPTIONAL)], [`brdDesc` (VARBINARY:OPTIONAL)], [`brdExtDesc` (VARBINARY:OPTIONAL)], [`brdFamDesc` (VARBINARY:OPTIONAL)], [`brdTypeCd` (VARBINARY:OPTIONAL)], [`flvDesc` (VARBINARY:OPTIONAL)], [`mfgDesc` (VARBINARY:OPTIONAL)], [`modBy` (VARBINARY:OPTIONAL)], [`modDt` (VARBINARY:OPTIONAL)], [`msaCatCd` (VARBINARY:OPTIONAL)])], [`rjr` (MAP:REQUIRED)], [`uomRec` (MAP:REQUIRED), children=([`valBy` (VARBINARY:OPTIONAL)], [`valDt` (VARBINARY:OPTIONAL)], [`valVal` (VARBINARY:OPTIONAL)], [`recBy` (VARBINARY:OPTIONAL)], [`recDt` (VARBINARY:OPTIONAL)], [`recRat` (VARBINARY:OPTIONAL)], [`recVal` (VARBINARY:OPTIONAL)])], [`upc` (MAP:REQUIRED), children=([`UPC14` (VARBINARY:OPTIONAL)], [`allUPCVar` (VARBINARY:OPTIONAL)])], [`$f12` (VARBINARY:OPTIONAL)], [`iUUID` (VARCHAR:OPTIONAL)]], selectionVector=NONE], wrappers = [org.apache.drill.exec.vector.VarBinaryVector@b23a384[field = [`row_key` (VARBINARY:REQUIRED)], ...], org.apache.drill.exec.vector.complex.MapVector@61c779ff, org.apache.drill.exec.vector.complex.MapVector@575c0f96, org.apache.drill.exec.vector.complex.MapVector@69b943fe, org.apache.drill.exec.vector.complex.MapVector@7f90e2ce, org.apache.drill.exec.vector.complex.MapVector@25c27442, org.apache.drill.exec.vector.complex.MapVector@12d5ffd3, org.apache.drill.exec.vector.complex.MapVector@3150f8c4, org.apache.drill.exec.vector.complex.MapVector@49aefab2, 
org.apache.drill.exec.vector.complex.MapVector@7f78e7a1, org.apache.drill.exec.vector.complex.MapVector@426ea4fa, org.apache.drill.exec.vector.complex.MapVector@74cee2ab, org.apache.drill.exec.vector.NullableVarBinaryVector@4a0bfdea[field = [`$f12` (VARBINARY:OPTIONAL)], ...], org.apache.drill.exec.vector.NullableVarCharVector@72f64ee5[field = [`iUUID` (VARCHAR:OPTIONAL)], ...]], ...]] 2019-03-27 11:45:44,565 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR o.a.d.e.p.impl.join.HashJoinBatch - HashJoinBatch[container=org.apache.drill.exec.record.VectorContainer@45887d35[recordCount = 0, schemaChanged = false, schema = BatchSchema [fields=[[`row_key` (VARBINARY:REQUIRED)], [`clnDesc` (MAP:REQUIRED), children=([`bndlCnt` (VARBINARY:OPTIONAL)], [`by` (VARBINARY:OPTIONAL)], [`desc` (VARBINARY:OPTIONAL)],
[jira] [Commented] (DRILL-7065) Ensure backward compatibility is maintained
[ https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813660#comment-16813660 ] Sorabh Hamirwasia commented on DRILL-7065: -- Merged with DRILL-7063 > Ensure backward compatibility is maintained > > > Key: DRILL-7065 > URL: https://issues.apache.org/jira/browse/DRILL-7065 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7065) Ensure backward compatibility is maintained
[ https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7065: - Reviewer: Volodymyr Vysotskyi > Ensure backward compatibility is maintained > > > Key: DRILL-7065 > URL: https://issues.apache.org/jira/browse/DRILL-7065 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7065) Ensure backward compatibility is maintained
[ https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7065: - Labels: ready-to-commit (was: ) > Ensure backward compatibility is maintained > > > Key: DRILL-7065 > URL: https://issues.apache.org/jira/browse/DRILL-7065 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache
[ https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7066: - Labels: ready-to-commit (was: ) > Auto-refresh should pick up existing columns from metadata cache > > > Key: DRILL-7066 > URL: https://issues.apache.org/jira/browse/DRILL-7066 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 72h > Remaining Estimate: 72h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
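For context on the metadata cache that DRILL-7066's auto-refresh maintains, a minimal sketch of how the cache is built or rebuilt by hand — the table path and column names below are placeholders, and the `COLUMNS` clause is the interesting-columns form introduced around Drill 1.16, so treat the exact syntax as an assumption:

```sql
-- Build (or rebuild) the Parquet metadata cache for a table; later
-- queries that detect changed files trigger an automatic refresh.
REFRESH TABLE METADATA dfs.`/data/orders`;

-- Interesting-columns form (1.16-era syntax): collect column-level
-- metadata only for the listed columns. Per this issue, auto-refresh
-- should pick up the existing columns already in the cache.
REFRESH TABLE METADATA COLUMNS (o_orderdate, o_custkey) dfs.`/data/orders`;
```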
[jira] [Updated] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache
[ https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-7066: - Reviewer: Aman Sinha > Auto-refresh should pick up existing columns from metadata cache > > > Key: DRILL-7066 > URL: https://issues.apache.org/jira/browse/DRILL-7066 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 72h > Remaining Estimate: 72h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache
[ https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813656#comment-16813656 ] Sorabh Hamirwasia commented on DRILL-7066: -- Merged with DRILL-7063 > Auto-refresh should pick up existing columns from metadata cache > > > Key: DRILL-7066 > URL: https://issues.apache.org/jira/browse/DRILL-7066 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 72h > Remaining Estimate: 72h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813655#comment-16813655 ] ASF GitHub Bot commented on DRILL-7064: --- amansinha100 commented on issue #1736: DRILL-7064: Leverage the summary metadata for plain COUNT aggregates. URL: https://github.com/apache/drill/pull/1736#issuecomment-481356363 Rebased on master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Leverage the summary's totalRowCount and totalNullCount for COUNT() queries > (also prevent eager expansion of files) > --- > > Key: DRILL-7064 > URL: https://issues.apache.org/jira/browse/DRILL-7064 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Aman Sinha >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > This sub-task is meant to leverage the Parquet metadata cache's summary > stats: totalRowCount (across all files and row groups) and the per-column > totalNullCount (across all files and row groups) to answer plain COUNT > aggregation queries without Group-By. These are currently converted to a > DirectScan by the ConvertCountToDirectScanRule which utilizes the row group > metadata; however this rule is applied on Drill Logical rels and converts the > logical plan to a physical plan with DirectScanPrel but this is too late > since the DrillScanRel that is already created during logical planning has > already read the entire metadata cache file along with its full list of row > group entries. The metadata cache file can grow quite large and this does not > scale. 
> The solution is to use the Metadata Summary file that is created in > DRILL-7063 and create a new rule that will apply early on such that it > operates on the Calcite logical rels instead of the Drill logical rels and > prevents eager expansion of the list of files/row groups. > We will not remove the existing rule. The existing rule will continue to > operate as before because it is possible that after some transformations, we > still want to apply the optimizations for COUNT queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
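The shape of query this optimization targets can be sketched as follows — the table path and column name are placeholders. Assuming a metadata cache with the summary section exists, both aggregates can in principle be answered from the summary alone, without expanding the per-file/row-group entries:

```sql
-- Assumes REFRESH TABLE METADATA has been run on the table, so the
-- summary holds totalRowCount and per-column totalNullCount.
-- COUNT(*)       = totalRowCount
-- COUNT(column)  = totalRowCount - totalNullCount(column)
SELECT COUNT(*)         AS total_rows,
       COUNT(o_custkey) AS non_null_keys
FROM dfs.`/data/orders`;
```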
[jira] [Updated] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7064: --- Labels: ready-to-commit (was: ) > Leverage the summary's totalRowCount and totalNullCount for COUNT() queries > (also prevent eager expansion of files) > --- > > Key: DRILL-7064 > URL: https://issues.apache.org/jira/browse/DRILL-7064 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Aman Sinha >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > This sub-task is meant to leverage the Parquet metadata cache's summary > stats: totalRowCount (across all files and row groups) and the per-column > totalNullCount (across all files and row groups) to answer plain COUNT > aggregation queries without Group-By. These are currently converted to a > DirectScan by the ConvertCountToDirectScanRule which utilizes the row group > metadata; however this rule is applied on Drill Logical rels and converts the > logical plan to a physical plan with DirectScanPrel but this is too late > since the DrillScanRel that is already created during logical planning has > already read the entire metadata cache file along with its full list of row > group entries. The metadata cache file can grow quite large and this does not > scale. > The solution is to use the Metadata Summary file that is created in > DRILL-7063 and create a new rule that will apply early on such that it > operates on the Calcite logical rels instead of the Drill logical rels and > prevents eager expansion of the list of files/row groups. > We will not remove the existing rule. The existing rule will continue to > operate as before because it is possible that after some transformations, we > still want to apply the optimizations for COUNT queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813614#comment-16813614 ] ASF GitHub Bot commented on DRILL-7089: --- vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728#issuecomment-481335547 @amansinha100, thanks for pointing this, they passed, so I have added `ready-to-commit` label. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7089: --- Labels: ready-to-commit (was: ) > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813611#comment-16813611 ] ASF GitHub Bot commented on DRILL-7089: --- amansinha100 commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728#issuecomment-481333534 @vvysotskyi if the regression tests pass you can update the ready-to-commit label in the JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-6852 were introduced new classes for metadata usage. > These classes may be reused in other GroupScan instances to preserve heap > usage for the case when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so in the scope of the single query, it will be > possible to reuse them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813591#comment-16813591 ] ASF GitHub Bot commented on DRILL-7064: --- amansinha100 commented on issue #1736: DRILL-7064: Leverage the summary metadata for plain COUNT aggregates. URL: https://github.com/apache/drill/pull/1736#issuecomment-481324430 @vvysotskyi I have addressed the missed comment. Pls take a look. Also, I haven't yet rebased on latest master since master already has the DRILL-7063 commit, so I will need to decouple this PR such that only the changes for DRILL-7064 can be applied. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Leverage the summary's totalRowCount and totalNullCount for COUNT() queries > (also prevent eager expansion of files) > --- > > Key: DRILL-7064 > URL: https://issues.apache.org/jira/browse/DRILL-7064 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Venkata Jyothsna Donapati >Assignee: Aman Sinha >Priority: Major > Fix For: 1.16.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > This sub-task is meant to leverage the Parquet metadata cache's summary > stats: totalRowCount (across all files and row groups) and the per-column > totalNullCount (across all files and row groups) to answer plain COUNT > aggregation queries without Group-By. 
These are currently converted to a > DirectScan by the ConvertCountToDirectScanRule which utilizes the row group > metadata; however this rule is applied on Drill Logical rels and converts the > logical plan to a physical plan with DirectScanPrel but this is too late > since the DrillScanRel that is already created during logical planning has > already read the entire metadata cache file along with its full list of row > group entries. The metadata cache file can grow quite large and this does not > scale. > The solution is to use the Metadata Summary file that is created in > DRILL-7063 and create a new rule that will apply early on such that it > operates on the Calcite logical rels instead of the Drill logical rels and > prevents eager expansion of the list of files/row groups. > We will not remove the existing rule. The existing rule will continue to > operate as before because it is possible that after some transformations, we > still want to apply the optimizations for COUNT queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813561#comment-16813561 ] ASF GitHub Bot commented on DRILL-7064: --- amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary metadata for plain COUNT aggregates. URL: https://github.com/apache/drill/pull/1736#discussion_r273561165 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java ## @@ -0,0 +1,296 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.logical; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.AggregateCall; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.core.TableScan; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rex.RexInputRef; +import org.apache.commons.lang3.tuple.ImmutablePair; +import org.apache.commons.lang3.tuple.Pair; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.FormatPluginConfig; + +import org.apache.drill.exec.physical.base.ScanStats; +import org.apache.drill.exec.planner.common.CountToDirectScanUtils; +import org.apache.drill.exec.planner.common.DrillRelOptUtil; + +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.FileSystemPlugin; +import org.apache.drill.exec.store.dfs.FormatSelection; +import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig; +import org.apache.drill.exec.store.direct.MetadataDirectGroupScan; +import org.apache.drill.exec.store.parquet.ParquetFormatConfig; +import org.apache.drill.exec.store.parquet.ParquetReaderConfig; +import org.apache.drill.exec.store.parquet.metadata.Metadata; +import org.apache.drill.exec.store.parquet.metadata.Metadata_V4; +import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader; +import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList; +import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap; +import org.apache.hadoop.fs.Path; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.LinkedHashMap; 
+import java.util.Set; + +/** + * This rule is a logical planning counterpart to a corresponding ConvertCountToDirectScanPrule + * physical rule + * + * + * This rule will convert " select count(*) as mycount from table " + * or " select count(not-nullable-expr) as mycount from table " into + * + *Project(mycount) + * \ + *DirectGroupScan ( PojoRecordReader ( rowCount )) + * + * or " select count(column) as mycount from table " into + * + * Project(mycount) + * \ + *DirectGroupScan (PojoRecordReader (columnValueCount)) + * + * Rule can be applied if query contains multiple count expressions. + * " select count(column1), count(column2), count(*) from table " + * + * + * + * The rule utilizes the Parquet Metadata Cache's summary information to retrieve the total row count + * and the per-column null count. As such, the rule is only applicable for Parquet tables and only if the + * metadata cache has been created with the summary information. + * + */ +public class ConvertCountToDirectScanRule extends RelOptRule { + + public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule( + RelOptHelper.some(Aggregate.class, +RelOptHelper.some(Project.class, +RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical"); + + public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule( + RelOptHelper.some(Aggregate.class, +RelOptHelper.any(TableScan.class)), "Agg_on_scan:logic
[jira] [Comment Edited] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs
[ https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813486#comment-16813486 ] Vitalii Diravka edited comment on DRILL-7162 at 4/9/19 2:35 PM: The Jetty version is updated to 9.3 in the latest master, see DRILL-7051. There is an issue with the Jetty 9.4 version, see DRILL-7135. [~er.ayushsha...@gmail.com] Regarding other CVEs, if you are able to fix them please open the PRs. Thanks was (Author: vitalii): The Jetty version is updated to 9.3 in the latest master, see DRILL-7051. There is an issue with the Jetty 9.4 version, see DRILL-7135. [~er.ayushsha...@gmail.com] Regarding other CVEs, please publish the list here and if you are able to fix them please open the PRs. Thanks > Apache Drill uses 3rd Party with Highest CVEs > -- > > Key: DRILL-7162 > URL: https://issues.apache.org/jira/browse/DRILL-7162 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Ayush Sharma >Priority: Major > > Apache Drill uses 3rd-party libraries with roughly 250+ CVEs. > Most of the CVEs are in the older version of Jetty (9.1.x), whereas the > current version of Jetty is 9.4.x. > Many of the other libraries are also at end-of-life versions and are not patched > even in the latest release. > This creates a security issue when we use it in production. > We are able to replace many older library versions with the latest > versions with no CVEs; however, many of them are not replaceable as-is and > would require some changes in the source code. > The Jetty version is the highest priority and needs migration to 9.4.x > immediately. > > Please look into this issue at immediate priority, as it compromises the > security of applications utilizing Apache Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs
[ https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813486#comment-16813486 ] Vitalii Diravka commented on DRILL-7162: The Jetty version is updated to 9.3 in the latest master, see DRILL-7051. There is an issue with the Jetty 9.4 version, see DRILL-7135. [~er.ayushsha...@gmail.com] Regarding other CVEs, please publish the list here and if you are able to fix them please open the PRs. Thanks > Apache Drill uses 3rd Party with Highest CVEs > -- > > Key: DRILL-7162 > URL: https://issues.apache.org/jira/browse/DRILL-7162 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Ayush Sharma >Priority: Blocker > > Apache Drill uses 3rd-party libraries with roughly 250+ CVEs. > Most of the CVEs are in the older version of Jetty (9.1.x), whereas the > current version of Jetty is 9.4.x. > Many of the other libraries are also at end-of-life versions and are not patched > even in the latest release. > This creates a security issue when we use it in production. > We are able to replace many older library versions with the latest > versions with no CVEs; however, many of them are not replaceable as-is and > would require some changes in the source code. > The Jetty version is the highest priority and needs migration to 9.4.x > immediately. > > Please look into this issue at immediate priority, as it compromises the > security of applications utilizing Apache Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs
[ https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-7162: --- Priority: Major (was: Blocker) > Apache Drill uses 3rd Party with Highest CVEs > -- > > Key: DRILL-7162 > URL: https://issues.apache.org/jira/browse/DRILL-7162 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Ayush Sharma >Priority: Major > > Apache Drill uses 3rd-party libraries with roughly 250+ CVEs. > Most of the CVEs are in the older version of Jetty (9.1.x), whereas the > current version of Jetty is 9.4.x. > Many of the other libraries are also at end-of-life versions and are not patched > even in the latest release. > This creates a security issue when we use it in production. > We are able to replace many older library versions with the latest > versions with no CVEs; however, many of them are not replaceable as-is and > would require some changes in the source code. > The Jetty version is the highest priority and needs migration to 9.4.x > immediately. > > Please look into this issue at immediate priority, as it compromises the > security of applications utilizing Apache Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs
Ayush Sharma created DRILL-7162: --- Summary: Apache Drill uses 3rd Party with Highest CVEs Key: DRILL-7162 URL: https://issues.apache.org/jira/browse/DRILL-7162 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0, 1.14.0, 1.13.0 Reporter: Ayush Sharma Apache Drill uses 3rd-party libraries with roughly 250+ CVEs. Most of the CVEs are in the older version of Jetty (9.1.x), whereas the current version of Jetty is 9.4.x. Many of the other libraries are also at end-of-life versions and are not patched even in the latest release. This creates a security issue when we use it in production. We are able to replace many older library versions with the latest versions with no CVEs; however, many of them are not replaceable as-is and would require some changes in the source code. The Jetty version is the highest priority and needs migration to 9.4.x immediately. Please look into this issue at immediate priority, as it compromises the security of applications utilizing Apache Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gayathri updated DRILL-7161: Affects Version/s: 1.14.0 > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.14.0 >Reporter: Gayathri >Priority: Critical > Labels: Drill, issue > > We are facing an issue with the following case: > The JSON file (*sample.json*) has the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > Querying without group by works fine without any > error. If group by is used, the sum of the null values throws the above > error. > > Can anyone please let us know the solution for this, or whether there is any > alternative? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7161) Aggregation with group by clause
[ https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gayathri updated DRILL-7161: Labels: Drill issue (was: ) > Aggregation with group by clause > > > Key: DRILL-7161 > URL: https://issues.apache.org/jira/browse/DRILL-7161 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Reporter: Gayathri >Priority: Critical > Labels: Drill, issue > > We are facing an issue with the following case: > The JSON file (*sample.json*) has the following content: > {"a":2,"b":null} > {"a":2,"b":null} > {"a":3,"b":null} > {"a":4,"b":null} > *Query:* > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; > *Error:* > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions > supported for VarChar type > *Observation:* > Querying without group by works fine without any > error. If group by is used, the sum of the null values throws the above > error. > > Can anyone please let us know the solution for this, or whether there is any > alternative? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
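One hedged workaround for the DRILL-7161 error above — not an official fix, just a sketch using the path and data from the report — is to force the all-null column to a numeric type before aggregating:

```sql
-- All-null JSON fields come back as VarChar, which SUM rejects under
-- GROUP BY; an explicit cast gives SUM a numeric input. Note that
-- SUM over only NULL values yields NULL for each group.
SELECT a, SUM(CAST(b AS DOUBLE)) AS sum_b
FROM dfs.`C:\\Users\\user\\Desktop\\sample.json`
GROUP BY a;
```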
[jira] [Closed] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65
[ https://issues.apache.org/jira/browse/DRILL-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre closed DRILL-7153. Fixed. > Drill Fails to Build using JDK 1.8.0_65 > --- > > Key: DRILL-7153 > URL: https://issues.apache.org/jira/browse/DRILL-7153 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Blocker > Labels: ready-to-commit > Fix For: 1.16.0 > > > Drill fails to build when using Java 1.8.0_65. Throws the following error: > [{{ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile > (default-compile) on project drill-java-exec: Compilation failure > [ERROR] > /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68] > error: unreported exception E; must be caught or declared to be thrown > [ERROR] where E,T,V are type-variables: > [ERROR] E extends Exception declared in method > accept(ExprVisitor,V) > [ERROR] T extends Object declared in method > accept(ExprVisitor,V) > [ERROR] V extends Object declared in method > accept(ExprVisitor,V) > [ERROR] > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :drill-java-exec}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes
[ https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813263#comment-16813263 ] ASF GitHub Bot commented on DRILL-7089: --- vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API URL: https://github.com/apache/drill/pull/1728#issuecomment-481200553 Rebased onto master and resolved merge conflicts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement caching of BaseMetadata classes > - > > Key: DRILL-7089 > URL: https://issues.apache.org/jira/browse/DRILL-7089 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.16.0 > > > In the scope of DRILL-6852, new classes were introduced for metadata usage. > These classes may be reused in other GroupScan instances to reduce heap > usage when metadata is large. > The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass > them to the {{GroupScan}}, so that, within the scope of a single query, it will be > possible to reuse them.
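The caching idea described in the issue can be sketched generically. The class and method names below are illustrative only, not the actual Drill metastore API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Query-scoped cache sketch: metadata is loaded once per table key and the
// same instance is handed to every GroupScan created within one query,
// instead of being re-read (and re-allocated) for each scan.
class QueryMetadataCache<K, M> {
  private final Map<K, M> cache = new ConcurrentHashMap<>();

  // Load-once semantics: the loader runs only on the first request for a key.
  M getOrLoad(K tableKey, Function<K, M> loader) {
    return cache.computeIfAbsent(tableKey, loader);
  }
}
```

With load-once semantics, a large metadata object is materialized a single time per query rather than once per GroupScan, which is the heap saving the issue describes.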
[jira] [Resolved] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode
[ https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-4946. Resolution: Cannot Reproduce Fix Version/s: 1.16.0 Resolving this Jira, since it is not reproduced anymore using query from the description. > org.objectweb.asm.tree.analysis.AnalyzerException printed to console in > embedded mode > - > > Key: DRILL-4946 > URL: https://issues.apache.org/jira/browse/DRILL-4946 > Project: Apache Drill > Issue Type: Bug >Reporter: Chunhui Shi >Assignee: Chunhui Shi >Priority: Critical > Fix For: 1.16.0 > > > Testing by querying a json file got AnalyzerException printed. > The problem was due to scalar_replacement mode is default to be 'try', and > org.objectweb.asm.util.CheckMethodAdapter is printing stack trace to stderr. > [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json > {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": > "4a31", "kp2": "38"} > {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null} > {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": > "4e4f", "kp2": "51"} > {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": > "6e6f", "kp2": "31"} > 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as > intkey from dfs.`/tmp/conv.json`; > org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: > Expected an object reference, but found . 
> at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294) > at > org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at > org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at > org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at > org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at > org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508) > at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837) > at > org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87) > at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877) > at > org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136) > at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837) > at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726) > at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412) > at > org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223) > at > org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263) > at > org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78) > at > org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74) > at > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) > at > com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) > at > 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) > at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) > at com.google.common.cache.LocalCache.get(LocalCache.java:3937) > at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) > at > com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) > at > org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63) > at > org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56) > at > org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310) > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484) > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) > at > org.apache.drill.exec.record.Abstra
[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode
[ https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813226#comment-16813226 ] ASF GitHub Bot commented on DRILL-4946: --- vvysotskyi commented on issue #619: DRILL-4946: redirect System.err so users under embedded mode won't se… URL: https://github.com/apache/drill/pull/619#issuecomment-481192148 Closing this PR, since most of the problems which caused errors during scalar replacement were resolved.
[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode
[ https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813227#comment-16813227 ] ASF GitHub Bot commented on DRILL-4946: --- vvysotskyi commented on pull request #619: DRILL-4946: redirect System.err so users under embedded mode won't se… URL: https://github.com/apache/drill/pull/619
[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode
[ https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813214#comment-16813214 ] Arina Ielchiieva commented on DRILL-5679: - [~bbevens] I think this was meant for both, but there is no need to mention that; these are just general prerequisites for a Windows installation. {quote} Click New, and enter JAVA_HOME as the variable name. For the variable value, enter the path to your JDK installation. Instead of using Program Files in the path name, use progra~1. This is required because Drill cannot use file paths with spaces. {quote} Not all users have Java installed in the Program Files directory; a user can choose any directory during Java installation. I guess you should just mention that. > Document JAVA_HOME requirements for installing Drill in distributed mode > > > Key: DRILL-5679 > URL: https://issues.apache.org/jira/browse/DRILL-5679 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Bridget Bevens >Priority: Major > Labels: doc-complete > Fix For: Future > > > There is a general requirement that the JAVA_HOME variable should not contain > spaces. > For example, during Drill installation in distributed mode on Windows, a user > can see the following error: > {noformat} > C:\Drill/bin/runbit: line 107: exec: C:\Program: not found > {noformat} > There are two options to fix this problem: > {noformat} > 1. Install Java in a directory without spaces. > 2. Replace "Program Files" in your JAVA_HOME variable with progra~1 or progra~2 > (if x86). > Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71" > {noformat}
[jira] [Comment Edited] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode
[ https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813214#comment-16813214 ] Arina Ielchiieva edited comment on DRILL-5679 at 4/9/19 10:04 AM: -- [~bbevens] I think this was meant for both, but there is no need to mention that; these are just general prerequisites for a Windows installation. {quote} Click New, and enter JAVA_HOME as the variable name. For the variable value, enter the path to your JDK installation. Instead of using Program Files in the path name, use progra~1. This is required because Drill cannot use file paths with spaces. {quote} Not all users have Java installed in the Program Files directory; a user can choose any directory during Java installation. I guess you should just mention that.
[jira] [Created] (DRILL-7161) Aggregation with group by clause
Gayathri created DRILL-7161: --- Summary: Aggregation with group by clause Key: DRILL-7161 URL: https://issues.apache.org/jira/browse/DRILL-7161 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Gayathri We are facing an issue with the following case: The JSON file (*sample.json*) has the following content: {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null} *Query:* SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a; *Error:* UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions supported for VarChar type *Observation:* If we query without group by, it works fine without any error. If group by is used, then the sum of the null values throws the above error. Can anyone please let us know the solution for this, or whether there are any alternatives.
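A commonly suggested workaround for this class of error (untested against this exact case) is to force a numeric type on the all-null column, so the planner does not fall back to VarChar for the aggregate. Note that SUM over a group whose values are all NULL then yields NULL rather than an error:

```sql
-- Workaround sketch: cast the all-null column so SUM is planned against
-- DOUBLE instead of the inferred VarChar type.
SELECT a, SUM(CAST(b AS DOUBLE)) AS sum_b
FROM dfs.`C:\\Users\\user\\Desktop\\sample.json`
GROUP BY a;
```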
[jira] [Commented] (DRILL-6985) Fix sqlline.bat issues on Windows and add drill-embedded.bat
[ https://issues.apache.org/jira/browse/DRILL-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813186#comment-16813186 ] Volodymyr Vysotskyi commented on DRILL-6985: Hi [~bbevens], Updated pages look good, thanks! > Fix sqlline.bat issues on Windows and add drill-embedded.bat > > > Key: DRILL-6985 > URL: https://issues.apache.org/jira/browse/DRILL-6985 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 > Environment: Windows 10 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-complete, ready-to-commit > Fix For: 1.16.0 > > > *For documentation* > {{drill-embedded.bat}} was added as a handy script to start Drill on Windows > without passing any params. > Please update the following section: > https://drill.apache.org/docs/starting-drill-on-windows/ > Other issues covered in this Jira: > {{sqlline.bat}} fails in the following cases: > 1. Specified file in the argument: > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f /tmp/q.sql > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > Error: Could not find or load main class sqlline.SqlLine > {noformat} > 2. Specified file path that contains spaces: > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f "/tmp/q q.sql" > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > q.sql""=="test" was unexpected at this time. > {noformat} > 3. Specified query in the argument: > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -e "select * > from sys.version" > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > * was unexpected at this time. 
> {noformat} > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -q "select 'a' > from sys.version" > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > 'a' was unexpected at this time. > {noformat} > 4. Specified custom config location: > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" > --config=/tmp/conf > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > Error: Could not find or load main class sqlline.SqlLine > {noformat} > 5. Specified custom config location with spaces in the path: > {noformat} > apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" > --config="/tmp/conf test" > DRILL_ARGS - " -u jdbc:drill:zk=local" > test"" was unexpected at this time. > {noformat} > 6. Sqlline was run from non-bin directory: > {noformat} > apache-drill-1.15.0>bin\sqlline.bat -u "jdbc:drill:zk=local" > DRILL_ARGS - " -u jdbc:drill:zk=local" > HADOOP_HOME not detected... > HBASE_HOME not detected... > Calculating Drill classpath... > Error: Could not find or load main class sqlline.SqlLine > {noformat}
[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab
[ https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813174#comment-16813174 ] ASF GitHub Bot commented on DRILL-7160: --- vvysotskyi commented on issue #1742: DRILL-7160: e.q.max_rows QUERY-level option shown even if not set URL: https://github.com/apache/drill/pull/1742#issuecomment-481175705 @kkhatua, the `exec.query.max_rows` query-level option should be present in the query profile only when auto-limit is applied or can be applied; currently, it is present even when a non-select query is submitted and `exec.query.max_rows` is set to a non-zero value. Please fix this case and verify that other corner cases are handled correctly. > exec.query.max_rows QUERY-level options are shown on Profiles tab > - > > Key: DRILL-7160 > URL: https://issues.apache.org/jira/browse/DRILL-7160 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Kunal Khatua >Priority: Blocker > Fix For: 1.16.0 > > > As [~arina] has noticed, option {{exec.query.max_rows}} is shown on Web UI's > Profiles even when it was not set explicitly. The issue is because the option > is being set on the query level internally. > From the code, it looks like it is set in > {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the > value differs from the existing one should be added.
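The check suggested in the description can be sketched generically. The option name is real, but the method shape and the Map-based options below are hypothetical; this is not the actual DrillSqlWorker code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested fix: only record exec.query.max_rows at QUERY level
// when auto-limiting actually applies and changes the effective value, so the
// option does not show up in profiles of non-SELECT queries.
class AutoLimitSketch {
  static final String MAX_ROWS = "exec.query.max_rows";

  static void applyAutoLimit(Map<String, Long> queryOptions, long sessionMaxRows, boolean isSelect) {
    long current = queryOptions.getOrDefault(MAX_ROWS, 0L);
    // Guard: skip non-SELECT statements, a zero/unset limit, and unchanged values.
    if (isSelect && sessionMaxRows > 0 && current != sessionMaxRows) {
      queryOptions.put(MAX_ROWS, sessionMaxRows);
    }
  }
}
```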
[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813124#comment-16813124 ] ASF GitHub Bot commented on DRILL-7064: --- vvysotskyi commented on pull request #1736: DRILL-7064: Leverage the summary metadata for plain COUNT aggregates. URL: https://github.com/apache/drill/pull/1736#discussion_r273380948 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java ## @@ -0,0 +1,296 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.logical; + +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.plan.RelOptRuleOperand; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.AggregateCall; +import org.apache.calcite.rel.core.Project; +import org.apache.calcite.rel.core.TableScan; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rex.RexInputRef; +import org.apache.commons.lang3.tuple.ImmutablePair; +import org.apache.commons.lang3.tuple.Pair; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.FormatPluginConfig; + +import org.apache.drill.exec.physical.base.ScanStats; +import org.apache.drill.exec.planner.common.CountToDirectScanUtils; +import org.apache.drill.exec.planner.common.DrillRelOptUtil; + +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.FileSystemPlugin; +import org.apache.drill.exec.store.dfs.FormatSelection; +import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig; +import org.apache.drill.exec.store.direct.MetadataDirectGroupScan; +import org.apache.drill.exec.store.parquet.ParquetFormatConfig; +import org.apache.drill.exec.store.parquet.ParquetReaderConfig; +import org.apache.drill.exec.store.parquet.metadata.Metadata; +import org.apache.drill.exec.store.parquet.metadata.Metadata_V4; +import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader; +import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList; +import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap; +import org.apache.hadoop.fs.Path; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.LinkedHashMap; 
+import java.util.Set; + +/** + * This rule is a logical planning counterpart to a corresponding ConvertCountToDirectScanPrule + * physical rule + * + * + * This rule will convert " select count(*) as mycount from table " + * or " select count(not-nullable-expr) as mycount from table " into + * + *Project(mycount) + * \ + *DirectGroupScan ( PojoRecordReader ( rowCount )) + * + * or " select count(column) as mycount from table " into + * + * Project(mycount) + * \ + *DirectGroupScan (PojoRecordReader (columnValueCount)) + * + * Rule can be applied if query contains multiple count expressions. + * " select count(column1), count(column2), count(*) from table " + * + * + * + * The rule utilizes the Parquet Metadata Cache's summary information to retrieve the total row count + * and the per-column null count. As such, the rule is only applicable for Parquet tables and only if the + * metadata cache has been created with the summary information. + * + */ +public class ConvertCountToDirectScanRule extends RelOptRule { + + public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule( + RelOptHelper.some(Aggregate.class, +RelOptHelper.some(Project.class, +RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical"); + + public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule( + RelOptHelper.some(Aggregate.class, +RelOptHelper.any(TableScan.class)), "Agg_on_scan:logical
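The arithmetic this rule leans on is simple: COUNT(*) is the summary's total row count, and COUNT(column) is the total row count minus that column's null count, so no data files need to be scanned. A hedged sketch of that shortcut (hypothetical names, not the Drill metadata API):

```java
import java.util.Map;

// Sketch of the summary-based COUNT shortcut used by the rule: answer plain
// COUNT aggregates purely from metadata-cache summary numbers.
class CountFromSummary {
  static long count(long totalRowCount, Map<String, Long> nullCounts, String column) {
    if (column == null) {
      // COUNT(*) or COUNT(<non-nullable expr>): every row counts.
      return totalRowCount;
    }
    // COUNT(column): rows minus that column's recorded null count.
    return totalRowCount - nullCounts.getOrDefault(column, 0L);
  }
}
```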