[jira] [Assigned] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out
[ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-5902: Assignee: Vlad Rozov (was: Arina Ielchiieva) > Regression: Queries encounter random failure due to RPC connection timed out > > > Key: DRILL-5902 > URL: https://issues.apache.org/jira/browse/DRILL-5902 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC >Affects Versions: 1.11.0 >Reporter: Robert Hou >Assignee: Vlad Rozov >Priority: Critical > Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, > node196.drillbit.log > > > Multiple random failures (25) occurred with the latest > Functional-Baseline-88.193 run. Here is a sample query: > {noformat} > /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql > -- Kitchen sink > -- Use all supported functions > select > rank() over W, > dense_rank()over W, > percent_rank() over W, > cume_dist() over W, > avg(c_integer + c_integer) over W, > sum(c_integer/100) over W, > count(*)over W, > min(c_integer) over W, > max(c_integer) over W, > row_number()over W > from > j7 > where > c_boolean is not null > window W as (partition by c_bigint, c_date, c_time, c_boolean order by > c_integer) > {noformat} > From the logs: > {noformat} > 2017-10-23 04:14:36,536 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. 
> 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > 2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > 2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > 2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler > - Dropping request for early fragment termination for path > 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> > 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable. > {noformat} > {noformat} > 2017-10-23 04:14:53,941 [UserServer-1] INFO > o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> > /10.10.88.193:38281 (user server) timed out. Timeout was set to 30 seconds. > Closing connection. 
> 2017-10-23 04:14:53,952 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> > FAILED > 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> > FINISHED > 2017-10-23 04:14:53,956 [UserServer-1] WARN > o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc > response. > java.lang.IllegalArgumentException: Self-suppression not permitted > at java.lang.Throwable.addSuppressed(Throwable.java:1043) > ~[na:1.7.0_45] > at > org.apache.drill.common.DeferredException.addException(DeferredException.java:88) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.work.fr
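The secondary failure in the log above, `Self-suppression not permitted`, is what `Throwable.addSuppressed` throws when it is asked to suppress the very exception it is attached to; Drill's DeferredException hit this while failing the RPC response. A minimal generic-Java reproduction (not Drill code):

```java
public class SelfSuppress {
    public static void main(String[] args) {
        Exception failure = new RuntimeException("rpc connection timed out");
        try {
            // Adding an exception as its own suppressed exception is illegal;
            // the JDK throws IllegalArgumentException with exactly this message.
            failure.addSuppressed(failure);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Self-suppression not permitted
        }
    }
}
```

This is why a deferred-exception accumulator must check that the throwable being added is not the same instance it already holds before calling addSuppressed.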
[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg
[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351174#comment-16351174 ] ASF GitHub Bot commented on DRILL-6032: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1101 An additional comment is that after the defaults were reduced the number of random failures we see in our functional tests decreased to just 3. Typically there are more than that. > Use RecordBatchSizer to estimate size of columns in HashAgg > --- > > Key: DRILL-6032 > URL: https://issues.apache.org/jira/browse/DRILL-6032 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.13.0 > > > We need to use the RecordBatchSize to estimate the size of columns in the > Partition batches created by HashAgg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg
[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351169#comment-16351169 ] ASF GitHub Bot commented on DRILL-6032: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1101 @Ben-Zvi I have responded to comments and implemented the requested changes. Please see the latest commits. I have made some additional changes after noticing that some of the batch sizing calculations were incorrect, and I have also made various documentation and naming changes to improve the readability of the code and to document what needs to be improved in the future: - The size of varchar columns was not properly accounted for in the outgoing batch in the case where varchars are aggregated. I have added logic to the updateEst... method to account for this. - I have added docs to the updateEst method explaining what the various batch sizing estimates do. - I have changed the variable names for the batch size estimates to more accurately reflect what they do. - Previously, if a batch size estimate was smaller than the actual amount of memory allocated, the estimate was updated to be the larger size. I think this was actually the wrong behavior, since it causes the HashAgg operator's memory estimates to be extremely aggressive and in the worst case can cause the operator to run out of memory prematurely. Ideally a good estimate will overestimate 50% of the time and underestimate 50% of the time. - I have changed the HashAgg defaults. Although tests passed with the previous defaults, I felt they were too aggressive. I have changed the default number of partitions to 16 and the minimum number of batches to 1. Let me know if you have any more comments. Thanks. 
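The estimate-update point above (replacing the ratchet-to-max behavior) can be sketched generically. This is a hypothetical illustration, not Drill's HashAggTemplate code; the `BatchSizeEstimate` class and its fields are invented for the example:

```java
// Hypothetical sketch contrasting two ways to maintain a batch-size estimate.
// Ratcheting to the max observed allocation only ever grows, so the operator
// becomes increasingly conservative over time; an incremental running mean
// instead overestimates and underestimates roughly half the time each.
public class BatchSizeEstimate {
    private double maxBased;  // ratchet-to-max strategy
    private double avgBased;  // running-mean strategy
    private long samples;

    public void observe(double actualBytes) {
        maxBased = Math.max(maxBased, actualBytes);
        samples++;
        avgBased += (actualBytes - avgBased) / samples; // incremental mean
    }

    public double maxEstimate() { return maxBased; }
    public double avgEstimate() { return avgBased; }
}
```

Feeding batches of 100, 200, and 100 bytes leaves the max-based estimate pinned at 200 while the mean settles near 133, which is the balance the comment argues a good estimate should strike.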
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6133) RecordBatchSizer throws IndexOutOfBounds Exception for union vector
Padma Penumarthy created DRILL-6133: --- Summary: RecordBatchSizer throws IndexOutOfBounds Exception for union vector Key: DRILL-6133 URL: https://issues.apache.org/jira/browse/DRILL-6133 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.12.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.13.0 RecordBatchSizer throws IndexOutOfBoundsException when trying to get payload byte count of union vector. [Error Id: 430026a7-a963-40f1-bae2-1850649e8434 on 172.30.8.158:31013] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[classes/:na] at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300) [classes/:na] at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [classes/:na] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [classes/:na] at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [classes/:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] Caused by: java.lang.IndexOutOfBoundsException: DrillBuf[2], udle: [1 0..0], index: 4, length: 4 (expected: range(0, 0)) DrillBuf[2], udle: [1 0..0] at org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) ~[classes/:na] at org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) ~[classes/:na] at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) ~[classes/:4.0.48.Final] at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) ~[classes/:4.0.48.Final] at org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) ~[classes/:na] at org.apache.drill.exec.vector.VarCharVector.getPayloadByteCount(VarCharVector.java:308) ~[classes/:na] at 
org.apache.drill.exec.vector.NullableVarCharVector.getPayloadByteCount(NullableVarCharVector.java:256) ~[classes/:na] at org.apache.drill.exec.vector.complex.AbstractMapVector.getPayloadByteCount(AbstractMapVector.java:303) ~[classes/:na] at org.apache.drill.exec.vector.complex.UnionVector.getPayloadByteCount(UnionVector.java:574) ~[classes/:na] at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer$ColumnSize.&lt;init&gt;(RecordBatchSizer.java:147) ~[classes/:na] at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.measureColumn(RecordBatchSizer.java:403) ~[classes/:na] at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.&lt;init&gt;(RecordBatchSizer.java:350) ~[classes/:na] at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.&lt;init&gt;(RecordBatchSizer.java:320) ~[classes/:na] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
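The trace shows a 4-byte read at index 4 against a buffer whose readable range is (0, 0), i.e. an offset-vector lookup on a buffer that was never populated for the union's nested vector. A minimal sketch of that style of bounds check (hypothetical, not Drill's BoundsChecking class) shows how the message arises:

```java
// Hypothetical bounds check in the style of the failure above: reading
// `length` bytes at `index` from a buffer with `capacity` readable bytes.
public class Bounds {
    public static void checkIndex(int capacity, int index, int length) {
        if (index < 0 || length < 0 || index + length > capacity) {
            throw new IndexOutOfBoundsException(
                "index: " + index + ", length: " + length
                    + " (expected: range(0, " + capacity + "))");
        }
    }

    public static void main(String[] args) {
        Bounds.checkIndex(1024, 4, 4);   // a normal 4-byte read succeeds
        try {
            Bounds.checkIndex(0, 4, 4);  // empty buffer, as in the trace
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```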
[jira] [Commented] (DRILL-5360) Timestamp type documented as UTC, implemented as local time
[ https://issues.apache.org/jira/browse/DRILL-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351144#comment-16351144 ] Chun Chang commented on DRILL-5360: --- Timestamps inside Drill should all be UTC: stored, read, moved around, etc., all in UTC. It can't be anything else; otherwise there will be no consistency. This is a bug that needs to be fixed. > Timestamp type documented as UTC, implemented as local time > --- > > Key: DRILL-5360 > URL: https://issues.apache.org/jira/browse/DRILL-5360 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Priority: Major > Fix For: 2.0.0 > > > The Drill documentation implies that the {{Timestamp}} type is in UTC: > bq. JDBC timestamp in year, month, date, hour, minute, second, and optional > milliseconds format: -MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill > stores values in Coordinated Universal Time (UTC). Drill supports time > functions in the range 1971 to 2037. ... Drill does not support TIMESTAMP > with time zone. > The above is ambiguous. The first part talks about JDBC timestamps. From the > JDK Javadoc: > bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is > intended to reflect coordinated universal time (UTC)... > So, a JDBC timestamp is intended to represent time in UTC. (The "intended to > reflect" statement leaves open the possibility of misusing {{Date}} to > represent times in other time zones. This was common practice in early Java > development and was the reason for the eventual development of the Joda, then > Java 8 date/time classes.) > The Drill documentation implies that timestamp *literals* are in UTC, but a > careful read of the documentation does allow an interpretation that the > internal representation can be other than UTC. If this is true, then we would > also rely on a liberal reading of the Java `Timestamp` class to also not be > UTC. 
(Or, we rely on the Drill JDBC driver to convert from the (unknown) > server time zone to a UTC value returned by the Drill JDBC client.) > Still, a superficial reading (and common practice) would suggest that a Drill > Timestamp should be in UTC. > However, a test on a Mac, with an embedded Drillbit (run in the Pacific time > zone, with Daylight Savings Time in effect) shows that the Timestamp binary > value is actually local time: > {code} > long before = System.currentTimeMillis(); > long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" ); > double hrsDiff = (value - before) / (1000.00 * 60 * 60); > System.out.println("Hours: " + hrsDiff); > {code} > The above gets the actual UTC time from Java. Then, it runs a query that gets > Drill's idea of the current time using the {{NOW()}} function. (The > {{getDateValue}} function uses the new test framework to access the actual > {{long}} value from the returned value vector.) Finally, we compute the > difference between the two times, converted to hours. Output: > {code} > Hours: -6.975 > {code} > As it turns out, this is the difference between UTC and PDT. So, the time is > in local time, not UTC. > Since the documentation and implementation are both ambiguous, it is hard to > know the intent of the Drill Timestamp. Clearly, common practice is to use > UTC. But, there is wiggle-room. > If the Timestamp value is supposed to be local time, then Drill should > provide a function to return the server's time zone offset (in ms) from UTC > so that the client can do the needed local-to-UTC conversion to get a true > timestamp. > On the other hand, if the Timestamp is supposed to be UTC (per common > practice), then {{NOW()}} should not report local time, it should return UTC. > Further, if {{NOW()}} returns local time, but Timestamp literals are UTC, > then it is hard to see how any query can be rationally written if one > timestamp value is local, but a literal is UTC. 
> So, job #1 is to define the Timestamp semantics. Then, use that to figure out > where the bug lies to make the implementation consistent with the documentation (or > vice versa). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
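The roughly 7-hour skew measured in the DRILL-5360 test above is exactly what falls out when local wall-clock time is rendered as epoch millis and mislabeled as UTC. A small sketch reproduces the Pacific-DST case; the `wallClockMillis` helper is hypothetical, not Drill code:

```java
import java.util.TimeZone;

public class TimestampSkew {
    // If a server stores local wall-clock time as epoch millis (i.e. local
    // time mislabeled as UTC), the stored value differs from true UTC by
    // the zone's offset at that instant.
    public static long wallClockMillis(long utcMillis, TimeZone zone) {
        return utcMillis + zone.getOffset(utcMillis);
    }

    public static void main(String[] args) {
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        long utc = 1490000000000L;                  // an instant during DST
        long wall = wallClockMillis(utc, pacific);  // the mislabeled value
        double hrsDiff = (wall - utc) / (1000.0 * 60 * 60);
        System.out.println("Hours: " + hrsDiff);    // Hours: -7.0
    }
}
```

The clean -7.0 here matches the ballpark -6.975 observed in the bug report; the report's figure also absorbs the wall-clock time that elapsed between its two clock reads.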
[jira] [Updated] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-6099: -- Labels: (was: ready-to-commit) > Drill does not push limit past project (flatten) if it cannot be pushed into > scan > - > > Key: DRILL-6099 > URL: https://issues.apache.org/jira/browse/DRILL-6099 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Fix For: 1.13.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > It would be useful to have pushdown occur past flatten(project). Here is an > example to illustrate the issue: > {{explain plan without implementation for }}{{select name, > flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}} > {{DrillScreenRel}}{{ }} > {{ DrillLimitRel(fetch=[1])}}{{ }} > {{ DrillProjectRel(name=[$0], category=[FLATTEN($1)])}} > {{ DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan > [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, > `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}} > = > Content of 0_0_0.json > = > { > "name" : "Eric Goldberg, MD", > "categories" : [ "Doctors", "Health & Medical" ] > } { > "name" : "Pine Cone Restaurant", > "categories" : [ "Restaurants" ] > } { > "name" : "Deforest Family Restaurant", > "categories" : [ "American (Traditional)", "Restaurants" ] > } { > "name" : "Culver's", > "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", > "Restaurants" ] > } { > "name" : "Chang Jiang Chinese Kitchen", > "categories" : [ "Chinese", "Restaurants" ] > } > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
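One subtlety behind this pushdown: FLATTEN multiplies rows, so a LIMIT above the flatten is not equivalent to a LIMIT on the scan; pushing it down only bounds how many input rows are read, and the limit above the project must remain. A toy illustration over the sample data, independent of Drill's planner (class and variable names are invented for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenLimit {
    public static void main(String[] args) {
        // Two input rows from the 0_0_0.json sample: each "categories"
        // array flattens into one output row per element.
        List<List<String>> categories = Arrays.asList(
            Arrays.asList("Doctors", "Health & Medical"),
            Arrays.asList("Restaurants"));

        // LIMIT 1 applied above the flatten: exactly one output row.
        List<String> flattened = new ArrayList<>();
        for (List<String> row : categories) {
            flattened.addAll(row);
        }
        List<String> limitAbove = flattened.subList(0, 1);

        // LIMIT 1 pushed below the flatten: only one *input* row is read,
        // but it still expands to two output rows.
        List<String> limitBelow = new ArrayList<>(categories.get(0));

        System.out.println(limitAbove); // [Doctors]
        System.out.println(limitBelow); // [Doctors, Health & Medical]
    }
}
```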
[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351094#comment-16351094 ] Aman Sinha commented on DRILL-6099: --- [~gparai] I had a couple of questions... sorry I didn't get around to sending them earlier. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351090#comment-16351090 ] ASF GitHub Bot commented on DRILL-6099: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r165788152 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java --- @@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) { } }; - public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = - new DrillPushLimitToScanRule( - RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some( - DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))), - "DrillPushLimitToScanRule_LimitOnProject") { + public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule( --- End diff -- I am not sure why this rule is being overloaded for doing limit push past project. This particular rule is about doing limit pushdown into scan for cases where we have LIMIT-SCAN or LIMIT-PROJECT-SCAN. I think we should keep this rule as-is but create a separate rule that does a limit push past project. Was there a strong reason to do it this way? Could there be a side effect of removing the check for the Scan? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351091#comment-16351091 ] ASF GitHub Bot commented on DRILL-6099: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r165791415 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java --- @@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) { } }; - public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = - new DrillPushLimitToScanRule( - RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some( - DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))), - "DrillPushLimitToScanRule_LimitOnProject") { + public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule( + RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)), "DrillPushLimitToScanRule_LimitOnProject") { @Override public boolean matches(RelOptRuleCall call) { DrillLimitRel limitRel = call.rel(0); - DrillScanRel scanRel = call.rel(2); - // For now only applies to Parquet. And pushdown only apply limit but not offset, + DrillProjectRel projectRel = call.rel(1); + // pushdown only apply limit but not offset, // so if getFetch() return null no need to run this rule. - if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) { --- End diff -- I can understand that this check was removed because this matches() method no longer is checking for DrillScanRel, but does that mean that no one is checking the GroupScan for supportsLimitPushdown()? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-5885) Drill consumes 2x memory when sorting and reading a spilled batch from disk.
[ https://issues.apache.org/jira/browse/DRILL-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-5885: Description: The query is: {code:sql} select count(*) from ( select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450], columns[330], columns[230], columns[220], columns[110], columns[90], columns[80], columns[70], columns[40], columns[10], columns[20], columns[30], columns[40], columns[50], columns[454], columns[413], columns[940], columns[834], columns[73], columns[140], columns[104], columns[], columns[30], columns[2420], columns[1520], columns[1410], columns[1110], columns[1290], columns[2380], columns[705], columns[45], columns[1054], columns[2430], columns[420], columns[404], columns[3350], columns[], columns[153], columns[356], columns[84], columns[745], columns[1450], columns[103], columns[2065], columns[343], columns[3420], columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {code} was: The query is: {noformat} select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {noformat} > Drill consumes 2x memory when sorting and reading a spilled batch from disk. 
> > > Key: DRILL-5885 > URL: https://issues.apache.org/jira/browse/DRILL-5885 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Robert Hou >Priority: Major > > The query is: > {code:sql} > select count(*) > from ( > select * > from dfs.`/drill/testdata/resource-manager/3500cols.tbl` > order by > columns[450], columns[330], columns[230], columns[220], columns[110], > columns[90], > columns[80], columns[70], columns[40], columns[10], columns[20], > columns[30], > columns[40], columns[50], columns[454], columns[413], columns[940], > columns[834], > columns[73], columns[140], columns[104], columns[], columns[30], > columns[2420], > columns[1520], columns[1410], columns[1110], columns[1290], > columns[2380], > columns[705], columns[45], columns[1054], columns[2430], columns[420], > columns[404], > columns[3350], columns[], columns[153], columns[356], columns[84], > columns[745], > columns[1450], columns[103], columns[2065], columns[343], > columns[3420], columns[530], columns[3210] >) d > where d.col433 = 'sjka skjf'; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6132) HashPartitionSender leaks memory
Chun Chang created DRILL-6132: - Summary: HashPartitionSender leaks memory Key: DRILL-6132 URL: https://issues.apache.org/jira/browse/DRILL-6132 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.12.0 Reporter: Chun Chang Assignee: Timothy Farkas Enable assertion (-ea), I noticed HashPartitionSender leaks memory if aggregation query fails due to OOM. {noformat} message: "SYSTEM ERROR: IllegalStateException: Allocator[op:2:1:0:HashPartitionSender] closed with outstanding buffers allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. \n DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n\n\nFragment 2:1\n\n[Error Id: c7cc9d37-8881-4db1-8123-2651628c4081 on 10.10.30.168:31010]\n\n (java.lang.IllegalStateException) Allocator[op:2:1:0:HashPartitionSender] closed with outstanding buffers allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. 
\n DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n\n org.apache.drill.exec.memory.BaseAllocator.close():504\n org.apache.drill.exec.ops.BaseOperatorContext.close():157\n org.apache.drill.exec.ops.OperatorContextImpl.close():79\n org.apache.drill.exec.ops.FragmentContext.suppressingClose():429\n org.apache.drill.exec.ops.FragmentContext.close():418\n org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():324\n org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155\n org.apache.drill.exec.work.fragment.FragmentExecutor.run():267\n org.apache.drill.common.SelfCleaningRunnable.run():38\n java.util.concurrent.ThreadPoolExecutor.runWorker():1149\n java.util.concurrent.ThreadPoolExecutor$Worker.run():624\n java.lang.Thread.run():748\n" exception { exception_class: "java.lang.IllegalStateException" message: "Allocator[op:2:1:0:HashPartitionSender] closed with outstanding buffers allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. 
\n DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n" stack_trace { class_name: "org.apache.drill.exec.memory.BaseAllocator" file_name: "BaseAllocator.java" line_number: 504 method_name: "close" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.ops.BaseOperatorContext" file_name: "BaseOperatorContext.java" line_number: 157 method_name: "close" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.ops.OperatorContextImpl" file_name: "OperatorContextImpl.java" line_number: 79 method_name: "close" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.ops.FragmentContext" file_name: "FragmentContext.java" line_number: 429 method_name: "suppressingClose" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.ops.FragmentContext" file_name: "FragmentContext.java" line_number: 418 method_name: "close" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor" file_name: "FragmentExecutor.java" line_number: 324 method_name: "closeOutResources" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor" file_name: "FragmentExecutor.java" line_number: 155 method_name: "cleanup" is_native_method: false } stack_trace { class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor" file_name: "FragmentExecutor.java" line_number: 267 method_name: "run" is_native_method: false } stack_trace { class_name: "org.apache.drill.common.SelfCleaningRunnable" file_name: "SelfCleaningRunnable.java" line_number: 38 method_name: "run" is_native_method: false } stack_trace { class_name: "..." line_number: 0 method_name: "..." is_native_method: false } } }{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
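The IllegalStateException above is an invariant check: an allocator refuses to close while a ledger still holds a live buffer, which is why the leak only surfaces when allocation accounting (-ea) is enabled. A generic sketch of that invariant (hypothetical, not Drill's BaseAllocator):

```java
// Hypothetical leak-detecting allocator: close() fails while any buffer
// allocated from it has not been released, mirroring the error above.
public class LeakCheckingAllocator implements AutoCloseable {
    private int outstanding; // buffers allocated but not yet released

    public byte[] buffer(int size) {
        outstanding++;
        return new byte[size];
    }

    public void release() {
        outstanding--;
    }

    @Override
    public void close() {
        if (outstanding != 0) {
            throw new IllegalStateException(
                "Allocator closed with outstanding buffers allocated ("
                    + outstanding + ").");
        }
    }
}
```

An operator that allocates a batch but fails before transferring or releasing it leaves `outstanding` at 1, and fragment cleanup then reports exactly this kind of close-time failure.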
[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg
[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350669#comment-16350669 ] ASF GitHub Bot commented on DRILL-6032: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r165704469 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -956,21 +925,8 @@ private void spillAPartition(int part) { this.htables[part].outputKeys(currOutBatchIndex, this.outContainer, outStartIdxHolder.value, outNumRecordsHolder.value, numPendingOutput); // set the value count for outgoing batch value vectors - /* int i = 0; */ for (VectorWrapper v : outgoing) { v.getValueVector().getMutator().setValueCount(numOutputRecords); -/* --- End diff -- Thanks! > Use RecordBatchSizer to estimate size of columns in HashAgg > --- > > Key: DRILL-6032 > URL: https://issues.apache.org/jira/browse/DRILL-6032 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.13.0 > > > We need to use the RecordBatchSizer to estimate the size of columns in the > Partition batches created by HashAgg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
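The idea behind DRILL-6032 is to derive per-column width estimates from an actual incoming batch rather than hard-coded guesses. A minimal sketch of that estimation step follows; the class and method names are hypothetical stand-ins, not Drill's real RecordBatchSizer API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: compute an average per-row width for each column from the bytes
// actually buffered in a batch, so downstream allocations (e.g. HashAgg
// partition batches) can be sized from observed data.
public class BatchSizeEstimator {
    /**
     * @param columnBytes column name -> total bytes buffered for that column
     * @param rowCount    number of rows in the sampled batch
     * @return column name -> estimated bytes per row, rounded up
     */
    public static Map<String, Integer> avgWidths(Map<String, Integer> columnBytes,
                                                 int rowCount) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columnBytes.entrySet()) {
            // Round up so the estimate never undershoots allocation needs.
            widths.put(e.getKey(), (e.getValue() + rowCount - 1) / rowCount);
        }
        return widths;
    }
}
```

Rounding up is the conservative choice here: overestimating a column width wastes a little memory, while underestimating it forces batch reallocations mid-operation.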
[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
[ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350643#comment-16350643 ] ASF GitHub Bot commented on DRILL-5846: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1060#discussion_r165698800 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java --- @@ -703,7 +703,18 @@ protected void _setLong(int index, long value) { @Override public ByteBuf getBytes(int index, ByteBuf dst, int dstIndex, int length) { -udle.getBytes(index + offset, dst, dstIndex, length); +final int BULK_COPY_THR = 1024; --- End diff -- Vlad, - I had a chat with @bitblender and he explained that Java invokes a stub (not a function call) to perform copyMemory; he agreed copyMemory will be slower for small buffers, and the task was to determine the cutoff point. - My tests (I will send you my test) indicate that a length of 1024 bytes is the point where copyMemory starts performing on par with getByte(). NOTE - I am using JRE 1.8; static buffers initialized once; payload 1 MB (1,048,576 bytes) and loop-count of 102400; MacOS High Sierra; 1 thread, 4 GB MX, MS > Improve Parquet Reader Performance for Flat Data types > --- > > Key: DRILL-5846 > URL: https://issues.apache.org/jira/browse/DRILL-5846 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: performance > Fix For: 1.13.0 > > > The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to > further improve the Parquet Reader performance as several users reported that > Parquet parsing represents the lion's share of the overall query execution.
It > tracks Flat Data types only, as Nested Data Types might involve functional and > processing enhancements (e.g., a nested column can be seen as a Document; a > user might want to perform operations scoped at the document level, that is, with no > need to span all rows). Another JIRA will be created to handle the nested > columns use-case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
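The BULK_COPY_THR discussion above boils down to a length-based dispatch: below the cutoff, a simple per-byte loop avoids the fixed overhead of the bulk-copy path; above it, the bulk path wins. A sketch of that dispatch, using on-heap arrays and System.arraycopy as a stand-in for the off-heap copyMemory case being benchmarked (the 1024 threshold is the empirical value from the comment, not a universal constant):

```java
// Sketch: threshold-based copy dispatch, as in the DrillBuf.getBytes() diff.
public class ThresholdCopy {
    // Empirical cutoff from the discussion above; real value needs measurement
    // on the target JVM and hardware.
    static final int BULK_COPY_THRESHOLD = 1024;

    public static void copy(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        if (len < BULK_COPY_THRESHOLD) {
            // Small copies: a plain loop, cheap enough for the JIT to optimize
            // without paying the bulk path's setup cost.
            for (int i = 0; i < len; i++) {
                dst[dstOff + i] = src[srcOff + i];
            }
        } else {
            // Large copies: one bulk call amortizes its fixed overhead.
            System.arraycopy(src, srcOff, dst, dstOff, len);
        }
    }
}
```

Both branches produce identical results; only the crossover point where the bulk path becomes faster is machine- and JVM-dependent, which is why the comment stresses the test setup (JRE version, buffer initialization, payload size).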
[jira] [Commented] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350532#comment-16350532 ] ASF GitHub Bot commented on DRILL-6130: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1108 @vdiravka please review. > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task > Components: Storage - Other >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo (no issues) > opentsdb (no issues) > kudu > hbase > Issues occurred due to ser / de bugs in storage plugins' scan and sub-scan > classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350530#comment-16350530 ] ASF GitHub Bot commented on DRILL-6130: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/1108 DRILL-6130: Fix NPE during physical plan submission for various storage plugins 1. Fixed ser / de issues for Hive, Kafka, Hbase plugins. 2. Added physical plan submission unit test for all storage plugins in contrib module. 3. Refactoring. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-6130 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1108.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1108 commit aabd11cc327100d248b22b72d9696d8911545d75 Author: Arina Ielchiieva Date: 2018-02-01T17:44:43Z DRILL-6130: Fix NPE during physical plan submission for various storage plugins 1. Fixed ser / de issues for Hive, Kafka, Hbase plugins. 2. Added physical plan submission unit test for all storage plugins in contrib module. 3. Refactoring. > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task > Components: Storage - Other >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo (no issues) > opentsdb (no issues) > kudu > hbase > Issues occurred due to ser / de bugs in storage plugins scan and sub-scan > classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
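The ser / de bug class behind DRILL-6130 is typically a scan or sub-scan whose deserialization path fails to restore one of its fields, so the first use after plan submission hits a NullPointerException. The sketch below simulates that round-trip with invented names (not Drill's real scan classes); in the real code the equivalent mistake would be, for example, a constructor parameter missing its Jackson property mapping:

```java
// Hypothetical illustration of a sub-scan whose "deserializer" drops a field.
public class SubScanDemo {
    static class SubScan {
        final String tableName;        // serialized and restored correctly
        final String storageConfigKey; // the field the broken path forgets

        SubScan(String tableName, String storageConfigKey) {
            this.tableName = tableName;
            this.storageConfigKey = storageConfigKey;
        }
    }

    // Broken path: never restores storageConfigKey, analogous to a missing
    // property mapping during JSON deserialization of a physical plan.
    static SubScan deserializeBroken(String json /* pretend payload */) {
        return new SubScan("users", null);
    }

    // Fixed path: every field survives the round-trip.
    static SubScan deserializeFixed(String json) {
        return new SubScan("users", "hbase");
    }

    static int planCost(SubScan s) {
        // NPE here when storageConfigKey was dropped on the wire.
        return s.storageConfigKey.length() + s.tableName.length();
    }
}
```

This is also why the PR adds a physical plan submission test per plugin: the bug only manifests on the deserialization side, so a round-trip test is the cheapest way to catch a dropped field.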
[jira] [Updated] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6130: Reviewer: Vitalii Diravka > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task > Components: Storage - Other >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo (no issues) > opentsdb (no issues) > kudu > hbase > Issues occurred due to ser / de bugs in storage plugins' scan and sub-scan > classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6130: Component/s: Storage - Other > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task > Components: Storage - Other >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo (no issues) > opentsdb (no issues) > kudu > hbase > Issues occurred due to ser / de bugs in storage plugins' scan and sub-scan > classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6130: Description: List of storage plugins to review: kafka (DRILL-6111) jdbc (no issues) hive (DRILL-6127) mongo (no issues) opentsdb (no issues) kudu hbase Issues occurred due to ser / de bugs in storage plugins' scan and sub-scan classes. was: List of storage plugins to review: kafka (DRILL-6111) jdbc (no issues) hive (DRILL-6127) mongo opentsdb (no issues) kudu hbase > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task > Components: Storage - Other >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo (no issues) > opentsdb (no issues) > kudu > hbase > Issues occurred due to ser / de bugs in storage plugins' scan and sub-scan > classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6130) Fix NPE during physical plan submission for various storage plugins
[ https://issues.apache.org/jira/browse/DRILL-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6130: Summary: Fix NPE during physical plan submission for various storage plugins (was: Fix NPE during physical plan submission for storage plugins) > Fix NPE during physical plan submission for various storage plugins > --- > > Key: DRILL-6130 > URL: https://issues.apache.org/jira/browse/DRILL-6130 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.12.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.13.0 > > > List of storage plugins to review: > kafka (DRILL-6111) > jdbc (no issues) > hive (DRILL-6127) > mongo > opentsdb (no issues) > kudu > hbase -- This message was sent by Atlassian JIRA (v7.6.3#76005)