[jira] [Updated] (DRILL-6359) All-text mode in JSON still reads missing column as Nullable Int
[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-6359:
----------------------------------
    Affects Version/s: 1.14.0

> All-text mode in JSON still reads missing column as Nullable Int
> ----------------------------------------------------------------
>
>                 Key: DRILL-6359
>                 URL: https://issues.apache.org/jira/browse/DRILL-6359
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Suppose we have the following file:
> {noformat}
> {a: 0}
> {a: 1}
> ...
> {a: 70001, b: 10.5}
> {noformat}
> Where the "..." indicates another 70K records. (Chosen to force the
> appearance of {{b}} into a second or later batch.)
> Suppose we execute the following query:
> {code}
> ALTER SESSION SET `store.json.all_text_mode` = true;
> SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
> {code}
> The query should work. We have an explicit project for column {{b}} and we've
> told JSON to always use text. So, JSON should have enough information to
> create column {{b}} as {{Nullable VarChar}}.
> Yet, the result of the query in {{sqlline}} is:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (VARCHAR:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> The expected result is that the query works: even missing columns should be
> subject to the "all text mode" setting, because the JSON reader handles
> projection push-down and is responsible for filling in the missing columns.
> This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in
> the "batch size handling" JSON reader rewrite, but I've not tested it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
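The reproduction file described above is easy to generate with a short script. This is a sketch, not part of Drill; the filename `70Kmissing.json` and the row counts are taken from the report, which chose 70K+ rows so that column "b" first appears in a second or later scan batch.

```python
import json

# Sketch: build the JSON file from the report. Rows 0..70000 carry only "a";
# the final row adds "b", forcing "b" to first appear in a later batch.
with open("70Kmissing.json", "w") as f:
    for i in range(70001):
        f.write(json.dumps({"a": i}) + "\n")
    f.write(json.dumps({"a": 70001, "b": 10.5}) + "\n")
```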
[jira] [Comment Edited] (DRILL-6359) All-text mode in JSON still reads missing column as Nullable Int
[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453573#comment-16453573 ]

Khurram Faraaz edited comment on DRILL-6359 at 4/26/18 6:40 AM:
----------------------------------------------------------------

[~paul-rogers] On Apache Drill 1.14.0-SNAPSHOT, commit id: 931b43ec54bf47dcbb4aa9ae4499f37a8f21b408, we see the same error message:
{noformat}
[root@qa102-45 ~]# cd /opt/mapr/drill/drill-1.14.0/bin
[root@qa102-45 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit="
apache drill 1.14.0-SNAPSHOT
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:schema=dfs.tmp> SELECT a, b FROM `generated.json` WHERE b IS NOT NULL ORDER BY a;
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
Fragment 0:0

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a on qa102-45.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}
The data looks like this; there are over 70K rows in the JSON data file, generated.json:
{noformat}
{ "a": "5ae16f29fb675a7ed96bb532" }
{ "a": "5ae16f29fb675a7ed96bb532" }
...
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f29972377bc859056c8","b":"10.5"}
{noformat}
Stack trace from drillbit.log for the above failure:
{noformat}
2018-04-25 23:32:08,980 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] WARN  o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `b`, returning null instance.
2018-04-25 23:32:09,050 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] INFO  o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: Schema changes not supported in External Sort. Please enable Union type.
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.setupSchema(ExternalSortBatch.java:459) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:410) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:354) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:299) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:80) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.jav
[jira] [Commented] (DRILL-4824) Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.
[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453560#comment-16453560 ]

Paul Rogers commented on DRILL-4824:
------------------------------------

This ticket has a long history of complexity. Just discovered another one. It appears that the current null handling has been optimized to make results appear nicely in {{sqlline}}. Consider this simple file:
{noformat}
{a: {b: 10}}
{a: {c: "foo"}}
{noformat}
According to our existing rules, the missing columns are stored as null values. Then, JSON omits nulls from its output. Why? So, it seems, {{sqlline}} can display the following:
{noformat}
+-------------+
|      a      |
+-------------+
| {"b":10}    |
| {"c":"foo"} |
+-------------+
{noformat}
In JDBC, the {{getObject()}} method on the Map vector creates a JSON object. That code probably omits null values. Why? So that when {{sqlline}} calls {{toString()}} on the JSON object, it gets the nice display above. This is probably not how {{sqlline}} should format its output: our JSON internals should not be dictated by how we do {{toString()}} in {{sqlline}}. But, there you have it anyway.

> Null maps / lists and non-provided state support for JSON fields. Numeric
> types promotion.
> --------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman Kulyk
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>
> There is incorrect output in case of a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>   "Field1" : {
>   }
> }
> {
>   "Field1" : {
>     "InnerField1": {"key1":"value1"},
>     "InnerField2": {"key2":"value2"}
>   }
> }
> {
>   "Field1" : {
>     "InnerField3" : ["value3", "value4"],
>     "InnerField4" : ["value5", "value6"]
>   }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +--------+
> | Field1 |
> +--------+
> | {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +--------+
> {code}
> There is no need to output missing fields. In case of a deeply nested
> structure we would get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--------+
> | Field1 |
> +--------+
> | {} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}} |
> | {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +--------+
> {code}
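The display rule requested above (drop never-set inner fields instead of materializing them as empty maps and lists) can be illustrated outside Drill. This is a sketch of the desired output transformation, not Drill code; it takes the three Field1 values from the "incorrect result" and prunes them into the "correct result":

```python
import json

def prune_empty(value):
    """Recursively drop empty dicts/lists so never-set inner fields
    are not rendered, mimicking the 'correct result' in the report."""
    if isinstance(value, dict):
        pruned = {k: prune_empty(v) for k, v in value.items()}
        return {k: v for k, v in pruned.items() if v != {} and v != []}
    return value

# The three Field1 values from the incorrect result above:
materialized = [
    {"InnerField1": {}, "InnerField2": {}, "InnerField3": [], "InnerField4": []},
    {"InnerField1": {"key1": "value1"}, "InnerField2": {"key2": "value2"},
     "InnerField3": [], "InnerField4": []},
    {"InnerField1": {}, "InnerField2": {},
     "InnerField3": ["value3", "value4"], "InnerField4": ["value5", "value6"]},
]
for v in materialized:
    print(json.dumps(prune_empty(v), separators=(",", ":")))
```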
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453544#comment-16453544 ]

ASF GitHub Bot commented on DRILL-6242:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184

    One additional note. We noted that JDBC does not support the idea of a nested tuple (a Drill "map"). JDBC does support columns that return a Java object. To bridge the gap, Drill returns a Map column as a Java object. But, why a JSON object? The answer seems to lie with the `sqlline` program. If we query a one-line JSON file with a nested object in `sqlline`, we get the following display:
    ```
    SELECT * FROM `json/nested.json`;
    +--------+---------------------------------+
    | custId |              name               |
    +--------+---------------------------------+
    | 101    | {"first":"John","last":"Smith"} |
    +--------+---------------------------------+
    ```
    So, it seems likely that the value of a Map object was translated to a JSON object so that when `sqlline` calls `toString()` on it, it ends up formatted nicely as above. Because of this, it may be hard to change the kind of objects returned from JDBC for a Map column.

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Assignee: Jiang Wu
>            Priority: Major
>             Fix For: 1.14.0
>
> Some storages (mapr db, mongo db, etc.) have hierarchical objects that
> contain nested fields of date, time, timestamp types. When a query returns
> these objects, the output format for the nested date, time, timestamp values is
> the internal object (org.joda.time.DateTime), rather than the logical
> data value.
> For example, suppose in MongoDB we have a single object that looks like
> this:
> {code:java}
> > db.test.findOne();
> {
>     "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>     "name" : "orange",
>     "context" : {
>         "date" : ISODate("2018-03-13T21:52:54.940Z"),
>         "user" : "jack"
>     }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t;
> +------------+---------+
> |   EXPR$0   | context |
> +------------+---------+
> | 2018-03-13 | {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} |
> {code}
> We can see from the above output that when the date field is retrieved as a
> top-level column, Drill outputs a logical date value. But when the same
> field is within an object hierarchy, Drill outputs the internal object used
> to hold the date value.
> The expected output is the same display whether the date field is shown
> as a top-level column or within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t;
> +------------+-------------------------------------+
> |   EXPR$0   |               context               |
> +------------+-------------------------------------+
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}
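The expected fix amounts to serializing the logical value of a nested date rather than dumping the holder object's internal fields. A minimal sketch of that idea (in Python, not Drill's Java serializer, using a custom JSON encoder):

```python
import json
from datetime import datetime, timezone

class DateAwareEncoder(json.JSONEncoder):
    # Serialize datetime values as their logical date string rather than
    # letting a generic object dump expose internal fields, which is the
    # analogue of the behavior requested in the report.
    def default(self, o):
        if isinstance(o, datetime):
            return o.date().isoformat()
        return super().default(o)

# The nested "context" value from the report's example:
context = {"date": datetime(2018, 3, 13, 21, 52, 54, tzinfo=timezone.utc),
           "user": "jack"}
print(json.dumps(context, cls=DateAwareEncoder, separators=(",", ":")))
# {"date":"2018-03-13","user":"jack"}
```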
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453501#comment-16453501 ]

ASF GitHub Bot commented on DRILL-6242:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184

    Sorry, coming late. There seem to be two problems.

    The original "nested column" issue is an artifact of the JDBC driver. In Drill, a Map (the thing that contains your nested column) is just a nested tuple. But JDBC does not have the idea of a nested field; there is no way to ask for, say, "myMap.ts". All you can ask for is "myMap" if it is a Map. The Drill JDBC driver has to invent a value to return. It invents a Java map. For JDBC, which does not support nested fields, you can project your field up to the top level just by naming it in the select clause:
    ```
    SELECT `context`.`date` as `context_date` ...
    ```
    The second problem that this PR seems to address is how dates are stored. Many tests have been changed to double-down on Drill's original sin: that generic dates (2015-04-15, say) are represented as a timestamp in UTC. But if April 15 is your birthday, it is your birthday in all time zones. We don't say your birthday (or order date, or newspaper issue date, or...) is one day in, say, London and another day in Los Angeles. Drill should have a "Date" type that is not associated with a time zone. But we don't, so we get tied up in the "treat the local time zone as if it were UTC" issue. The original set of Drill types did have types to handle these, but they didn't quite make it into the final version. Also, this issue has been discussed (with some vigor) on the mailing list once or twice.

    One flaw, which you seem to have spotted, is that if I read data on a server in one TZ, then display it on a client in another TZ, the dates and times are all messed up. The server reads dates in local TZ, then simply stores the ms value in a date. The client thinks that date is UTC and adjusts it to its own local TZ. All heck breaks loose. (Or something like that; the original test case was over a year ago and I may have messed up the details...)

    The ideal set of date/time types:

    * Date: a date in an unspecified time zone, such as your birthday.
    * Time: a time relative to midnight of an (unknown) Date, with no association with a TZ. For example, "we have lunch at 11:30 AM" applies regardless of TZ.
    * Timestamp: an absolute time relative to UTC.

    Then, functions can convert back and forth. Joda (and, in Java 8, the new date/time classes) work this way. This means that the many tests that were modified to turn a generic date into a timestamp are simply making the problem worse: we are saying that 2015-04-15 is midnight, April 15 in London, which is highly unexpected.

    Bottom line: we've got two very difficult issues here: how to handle maps in JDBC and how to fix Drill's date/time types.

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Assignee: Jiang Wu
>            Priority: Major
>             Fix For: 1.14.0
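The three date/time notions described in the comment above can be demonstrated concretely. This is a Python sketch (Python's `datetime` module happens to model the same split the comment asks for); it also shows the "birthday moved" surprise that results from encoding a plain date as a UTC timestamp:

```python
from datetime import date, time, datetime, timezone, timedelta

# The three distinct notions described above:
birthday = date(2015, 4, 15)      # Date: no time zone, April 15 everywhere
lunch = time(11, 30)              # Time: wall-clock time, zone-free
instant = datetime(2015, 4, 15, tzinfo=timezone.utc)  # Timestamp: UTC instant

# Encoding the date as "midnight April 15 UTC" and then viewing it from
# another zone changes the calendar day: exactly the surprise described above.
la = timezone(timedelta(hours=-8))
print(instant.astimezone(la).date())  # 2015-04-14: the date appears to move
print(birthday)                       # 2015-04-15: a true Date never moves
```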
[jira] [Commented] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453478#comment-16453478 ]

ASF GitHub Bot commented on DRILL-5927:
---------------------------------------

Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1234

    @parthchandra Please let me know if you have any comments.

> Root allocator consistently Leaks a buffer in unit tests
> --------------------------------------------------------
>
>                 Key: DRILL-5927
>                 URL: https://issues.apache.org/jira/browse/DRILL-5927
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
> TestBsonRecordReader consistently produces this exception when running on my
> laptop:
> {code}
> 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while closing
> java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding buffers allocated (1).
> Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 1
>     ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] holds 1 buffers.
>         DrillBuf[106], udle: [72 0..1024]
>   reservations: 0
>     at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) ~[classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [classes/:na]
>     at org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) ~[classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [classes/:na]
>     at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) [classes/:na]
>     at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) [test-classes/:na]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_144]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_144]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) [junit-4.11.jar:na]
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.11.jar:na]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) [junit-4.11.jar:na]
>     at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) [jmockit-1.3.jar:na]
>     at mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) [jmockit-1.3.jar:na]
>     at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
>     at mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) [jmockit-1.3.jar:na]
>     at mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) [jmockit-1.3.jar:na]
>     at mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) [jmockit-1.3.jar:na]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) [junit-4.11.jar:na]
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) [junit-4.11.jar:na]
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:na]
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:160) [junit-4.11.jar:na]
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
> {code}
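The failure above comes from an allocator that verifies, at close time, that every buffer it handed out has been released. A toy analogue of that check (a sketch; the class and method names are hypothetical, not Drill's BaseAllocator API):

```python
class LeakCheckingAllocator:
    """Toy analogue of a close-time leak check like the one behind
    'Allocator[ROOT] closed with outstanding buffers allocated (1)'."""

    def __init__(self, name="ROOT"):
        self.name = name
        self.outstanding = []   # stands in for the allocator's ledger

    def buffer(self, size):
        buf = bytearray(size)   # stands in for a DrillBuf
        self.outstanding.append(buf)
        return buf

    def release(self, buf):
        self.outstanding.remove(buf)

    def close(self):
        if self.outstanding:
            raise RuntimeError(
                "Allocator[%s] closed with outstanding buffers allocated (%d)."
                % (self.name, len(self.outstanding)))

alloc = LeakCheckingAllocator()
buf = alloc.buffer(1024)
try:
    alloc.close()      # buffer never released: close reports the leak
except RuntimeError as e:
    print(e)
alloc.release(buf)
alloc.close()          # clean close once everything is released
```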
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453462#comment-16453462 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184264961

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --

    I see. In this case, however, arrays (repeated values) will be empty. If we have 10 such rows, there is no reason to have 50 "inner" values. Also, for VarChar, no values will be stored; all columns will be null. (If we are handling non-null columns, then the non-null VarChar will be an empty string.)

    So, we probably need a bit of a special case: prepare data for a run of null rows (with arrays and VarChar of length 0) vs. take our best guess with no knowledge at all about lengths (which may be non-empty).

    Probably not a huge issue if you only need to handle a single row. But creating a batch with only one row will cause all kinds of performance issues downstream. (I found that out the hard way when a bug in sort produced a series of one-row batches...)

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation
> correctly for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the
> outgoing batch.
> Say the first batch is empty. Then, for outgoing, we allocate empty vectors
> with zero capacity. When we read the next batch with data, we will end up
> going through the realloc loop. If we use the right side row width as 0 in
> the outgoing row width calculation, the number of rows we calculate will be
> higher, and later when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on std size for an
> empty input batch. Use the allocation width as the width of the batch in the
> number-of-rows calculation.
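The workaround proposed in the issue can be sketched as a simple fallback rule. This is an illustrative Python sketch, not Drill's RecordBatchSizer; the 50-byte standard width and the function names are assumptions for the example:

```python
STD_ROW_WIDTH_BYTES = 50  # assumed stand-in for a per-type standard size

def row_alloc_width(measured_row_width):
    """Workaround described above: an empty incoming batch measures a row
    width of zero, so fall back to a standard width. That gives outgoing
    vectors non-zero capacity (avoiding the realloc loop) and keeps the
    row-count estimate from being inflated."""
    return measured_row_width if measured_row_width > 0 else STD_ROW_WIDTH_BYTES

def rows_per_batch(memory_budget_bytes, measured_row_width):
    # Width 0 would imply an unbounded row count; the fallback bounds it.
    return memory_budget_bytes // row_alloc_width(measured_row_width)

print(rows_per_batch(16 * 1024 * 1024, 0))    # empty batch: standard width
print(rows_per_batch(16 * 1024 * 1024, 200))  # non-empty: measured width
```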
[jira] [Created] (DRILL-6358) Null value returned from WHERE a IS NOT NULL query
Paul Rogers created DRILL-6358: -- Summary: Null value returned from WHERE a IS NOT NULL query Key: DRILL-6358 URL: https://issues.apache.org/jira/browse/DRILL-6358 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Paul Rogers Consider the following input file: {noformat} {a: null} {a: 10} {noformat} Then run the following. The result is as expected: {noformat} SELECT * FROM `json/null-int.json` WHERE a IS NOT NULL; +-+ | a | +-+ | 10 | +-+ {noformat} Now, do something similar. Create a file that repeats the first record 70,000 times. Call it {{70Knulls.json}}. Run the same query. The results are bizarre: {noformat} SELECT * FROM `gen/70Knulls.json` WHERE a IS NOT NULL; +---+ | ** | +---+ | null | +---+ {noformat} The column name of "\*\*" is probably due to DRILL-6357: the query may be treating {{a}} as a missing column and filling in the bogus "\*\*" field. Under that theory there is no column {{a}}, so no rows should have matched. But then column {{a}} finally appeared. Why does it have a null value, not 10? And why did the null value pass the {{IS NOT NULL}} test? In any case, I expected the same output as the two-line case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
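For anyone reproducing this, the 70,000-record input file can be generated with a few lines of script. The records are "relaxed" JSON with unquoted keys, matching the report, so they are written as literal text rather than via a JSON serializer; the helper name is ours, not from the report.

```python
# Generate a file of repeated relaxed-JSON records, e.g. 70,000 copies of
# {a: null} as in the DRILL-6358 report. Written as plain text because the
# records use unquoted keys.

def write_null_records(path, count=70000, record="{a: null}"):
    with open(path, "w") as f:
        for _ in range(count):
            f.write(record + "\n")

# write_null_records("70Knulls.json") produces the file queried above
```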
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453409#comment-16453409 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184258865 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) { */ private int netRowWidth; private int netRowWidthCap50; + + /** + * actual row size if input is not empty. Otherwise, standard size. + */ + private int rowAllocSize; --- End diff -- actually, this is a problem only for lateral join. In lateral join, right side will work on one row at a time from left side. If right side produces zero rows because of empty array or some other reason, for left outer join, we still have to finish cross join for that row from left side by having nulls for right side columns. Then, we go to next row on left side, continuing to work on filling the output batch till it is full. What that means is we have to allocate vectors based on that first batch on right side, which can be empty. > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. 
When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6357) Unexpected column "**" when reading a JSON file
Paul Rogers created DRILL-6357: -- Summary: Unexpected column "**" when reading a JSON file Key: DRILL-6357 URL: https://issues.apache.org/jira/browse/DRILL-6357 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Paul Rogers Consider the following JSON file, {{all-null.json}}: {noformat} {a: null} {a: null} {noformat} Do a simple query: {code} SELECT * FROM `json/all-null.json`; {code} The column name is unexpectedly reported as "**": {noformat} +---+ | ** | +---+ | null | | null | +---+ {noformat} However, change the second value to "foo" and the results are as expected: {noformat} +---+ | a | +---+ | null | | foo | +---+ {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6281) Refactor TimedRunnable
[ https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453307#comment-16453307 ] ASF GitHub Bot commented on DRILL-6281: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1238 I had the same question as Arina about the need for this change and I got info from DRILL-5908 about the problem cause and how this change will help fix it. Can you elucidate? > Refactor TimedRunnable > -- > > Key: DRILL-6281 > URL: https://issues.apache.org/jira/browse/DRILL-6281 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453305#comment-16453305 ] ASF GitHub Bot commented on DRILL-6307: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184244877 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) { */ private int netRowWidth; private int netRowWidthCap50; + + /** + * actual row size if input is not empty. Otherwise, standard size. + */ + private int rowAllocSize; --- End diff -- Unless I'm missing something, we can't move forward on a join if one side is empty: we won't know if we have the rows we need. Consider a merge join (simplest). The left gets some data, but the right is empty. We can't proceed unless the right hit EOF. Otherwise, we don't know if we have a match or not for the first left row. We need to read another right batch and keep going until we either hit EOF (no matching rows) or get some data. Once we have some data, we can go row-by-row to see if we have a left-only, right-only, or matching set of rows. If we get to EOF on either side, we know that there are no matches for the other side. What we do in the no-match case depends on whether we are doing LEFT OUTER, RIGHT OUTER or an INNER join. The point is, we can't make progress until we get that non-empty right batch (in this example). So, no reason to allocate space based on an empty batch (unless the entire input is empty) because we'll need to find a non-empty (or EOF) batch anyway. 
> Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
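The loop described in the comment above — keep reading right-side batches past empty ones until data or EOF arrives, then null-fill unmatched rows for LEFT OUTER — can be sketched as a toy merge join. This is illustrative only (sorted unique keys, in-memory lists), not Drill's merge-join operator.

```python
# Toy sketch of the reasoning above: a join cannot decide matches until the
# right side yields data or EOF, so empty batches are skipped, and LEFT
# OUTER rows with no match are null-filled. Not Drill's implementation.

def next_non_empty(batches):
    """Advance past empty batches; return the next batch with data, or None at EOF."""
    for batch in batches:
        if batch:
            return batch
    return None


def left_outer_merge(left, right_batches):
    """LEFT OUTER merge join of sorted (key, value) rows with unique keys."""
    it = iter(right_batches)
    right = []
    while True:
        batch = next_non_empty(it)   # empty batches tell us nothing; keep reading
        if batch is None:            # EOF: remaining left rows have no match
            break
        right.extend(batch)
    out, j = [], 0
    for lk, lv in left:
        while j < len(right) and right[j][0] < lk:
            j += 1
        if j < len(right) and right[j][0] == lk:
            out.append((lk, lv, right[j][1]))
        else:
            out.append((lk, lv, None))  # no match: null-fill the right columns
    return out
```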
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453304#comment-16453304 ] ASF GitHub Bot commented on DRILL-6307: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184244479 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -277,18 +286,29 @@ public boolean isRepeatedList() { /** * This is the average per entry width, used for vector allocation. */ -public int getEntryWidth() { +private int getEntryWidthForAlloc() { int width = 0; if (isVariableWidth) { -width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH; +width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH; // Subtract out the bits (is-set) vector width -if (metadata.getDataMode() == DataMode.OPTIONAL) { +if (isOptional) { width -= BIT_VECTOR_WIDTH; } + +if (isRepeated && getValueCount() == 0) { + return (safeDivide(width, STD_REPETITION_FACTOR)); +} } - return (safeDivide(width, cardinality)); + return (safeDivide(width, getEntryCardinalityForAlloc())); +} + +/** + * This is the average per entry cardinality, used for vector allocation. + */ +private float getEntryCardinalityForAlloc() { + return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) :getCardinality(); --- End diff -- Makes sense, but why would a batch be empty unless that path hit EOF? Or, the batch might be due to an empty input file. We'd just skip it and move to the next batch until we find one with data. Any reason the "get next batch" code can't just loop to be "get next non-empty batch" instead? Otherwise, we can't really do any effective batch sizing as we have no data to go on... 
> Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
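The fallback in the `getEntryCardinalityForAlloc()` diff above, restated as a sketch: when there is no observed cardinality (for example, an empty batch), assume a standard repetition factor for repeated columns and 1 otherwise. The value 10 below is an assumption for illustration; Drill defines its own `STD_REPETITION_FACTOR` constant.

```python
# Illustrative restatement of the cardinality fallback shown in the diff.
# STD_REPETITION_FACTOR = 10 is assumed here, not taken from Drill's source.

STD_REPETITION_FACTOR = 10.0


def entry_cardinality_for_alloc(cardinality, is_repeated):
    """Average per-entry cardinality to use when allocating vectors."""
    if cardinality == 0:
        # No data observed: guess a standard repetition for arrays, 1 otherwise.
        return STD_REPETITION_FACTOR if is_repeated else 1
    return cardinality
```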
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453295#comment-16453295 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1189 LGTM > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453292#comment-16453292 ] ASF GitHub Bot commented on DRILL-6242: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r184243856 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Hmm. That takes us back to the original problem, that of the date|time|timestamp field inside a complex object. ``` select t.context.date, t.context from test t; will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2. This doesn't seem like a good thing. ``` Why should that be a bad thing though? Ultimately, the object returned by getObject() is displayed to the end user thru the toString method. The string representation of Local[Date|Time|Timestamp] should be the same as that of java.sql.[Date|Time|Timestamp]. Isn't it? > Output format for nested date, time, timestamp values in an object hierarchy > > > Key: DRILL-6242 > URL: https://issues.apache.org/jira/browse/DRILL-6242 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.12.0 >Reporter: Jiang Wu >Assignee: Jiang Wu >Priority: Major > Fix For: 1.14.0 > > > Some storages (mapr db, mongo db, etc.) have hierarchical objects that > contain nested fields of date, time, timestamp types. 
When a query returns > these objects, the output format for the nested date, time, timestamp, are > showing the internal object (org.joda.time.DateTime), rather than the logical > data value. > For example. Suppose in MongoDB, we have a single object that looks like > this: > {code:java} > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { > "date" : ISODate("2018-03-13T21:52:54.940Z"), > "user" : "jack" > } > } > {code} > Then connect Drill to the above MongoDB storage, and run the following query > within Drill: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | > {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} > | > {code} > We can see that from the above output, when the date field is retrieved as a > top level column, Drill outputs a logical date value. But when the same > field is within an object hierarchy, Drill outputs the internal object used > to hold the date value. > The expected output is the same display for whether the date field is shown > as a top level column or when it is within an object hierarchy: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | {"date":"2018-03-13","user":"jack"} | > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
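For reference, the verbose joda output above is just a wrapper around its "millis" field: 1520977974940 is a UTC epoch-millisecond timestamp, and the expected display is simply its ISO rendering. A quick check of the value from the example (in Python, for illustration):

```python
# Confirm that the "millis" value from the joda dump corresponds to the
# ISODate("2018-03-13T21:52:54.940Z") shown in the MongoDB example.
# Integer division drops the 940 ms fraction for simplicity.

from datetime import datetime, timezone

millis = 1520977974940  # "millis" field from the output above
ts = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
date_str = ts.date().isoformat()   # the logical date value Drill should display
```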
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453288#comment-16453288 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184243564 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- It is not a best practice to rely on a transitive dependency for a compile scope dependency. I'd recommend specifying the dependency on `io.dropwizard.metrics` explicitly in case `java-exec` or `drill-memory-base` uses it. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453280#comment-16453280 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r184242257 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- So what do you tell the user when you get a runtime exception (any exception) that is the result of a bug? It is silly to show them a stack trace that does not help them. It is better to let the user know that there was an internal error and they should log a bug with support or look for help on the dev list. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6356) batch sizing for union all
Padma Penumarthy created DRILL-6356: --- Summary: batch sizing for union all Key: DRILL-6356 URL: https://issues.apache.org/jira/browse/DRILL-6356 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.13.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.14.0 batch sizing changes for union all operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453245#comment-16453245 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184236170 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -277,18 +286,29 @@ public boolean isRepeatedList() { /** * This is the average per entry width, used for vector allocation. */ -public int getEntryWidth() { +private int getEntryWidthForAlloc() { int width = 0; if (isVariableWidth) { -width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH; +width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH; // Subtract out the bits (is-set) vector width -if (metadata.getDataMode() == DataMode.OPTIONAL) { +if (isOptional) { width -= BIT_VECTOR_WIDTH; } + +if (isRepeated && getValueCount() == 0) { + return (safeDivide(width, STD_REPETITION_FACTOR)); --- End diff -- You are right. row count can be non-zero and valueCount can still be zero. Since this is intended for empty batch case, changed this to check for row count instead. > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. 
When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453237#comment-16453237 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1228 @paul-rogers Paul, I addressed code review comments. Can you take a look when you get a chance ? > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-3855) Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule
[ https://issues.apache.org/jira/browse/DRILL-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-3855: --- Reviewer: Aman Sinha (was: Volodymyr Vysotskyi) > Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule > --- > > Key: DRILL-3855 > URL: https://issues.apache.org/jira/browse/DRILL-3855 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.13.0 >Reporter: Sean Hsuan-Yi Chu >Assignee: Vitalii Diravka >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Because of the infinite planning issue (Calcite Volcano Planner: CALCITE-900) > reported in DRILL-3257, FilterSetOpTransposeRule and > DrillProjectSetOpTransposeRule were disabled. Once it is resolved in > Calcite, these two rules should be enabled to improve performance. > In addition, the plan validation in unit tests will be updated in response to the > newly enabled rules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-3855) Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule
[ https://issues.apache.org/jira/browse/DRILL-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-3855: --- Labels: ready-to-commit (was: ) > Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule > --- > > Key: DRILL-3855 > URL: https://issues.apache.org/jira/browse/DRILL-3855 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.13.0 >Reporter: Sean Hsuan-Yi Chu >Assignee: Vitalii Diravka >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Because of the infinite planning issue (Calcite Volcano Planner: CALCITE-900) > reported in DRILL-3257, FilterSetOpTransposeRule and > DrillProjectSetOpTransposeRule were disabled. Once it is resolved in > Calcite, these two rules should be enabled to improve performance. > In addition, the plan validation in unit tests will be updated in response to the > newly enabled rules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453194#comment-16453194 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184218887 --- Diff: pom.xml --- @@ -1333,6 +1353,12 @@ + --- End diff -- I meant that it can be useful if some new dependency brings in the older transitive `com.yammer.metrics` in the future. But I do not mind, this isn't so important. Let's leave it without dependencyManagement control. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453195#comment-16453195 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184223843 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- No, it is. Thanks for catching this. Also I have noticed that `java-exec` used `io.dropwizard.metrics` as transitive from `drill-common`. For consistency I have removed `io.dropwizard.metrics` from `drill-memory-base` (it can leverage metrics from `drill-common` too). > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6236) batch sizing for hash join
[ https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453168#comment-16453168 ] ASF GitHub Bot commented on DRILL-6236: --- Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1227 @Ben-Zvi I manually added the PR link to the JIRA. all code review comments are addressed. can you look at the latest changes ? > batch sizing for hash join > -- > > Key: DRILL-6236 > URL: https://issues.apache.org/jira/browse/DRILL-6236 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > limit output batch size for hash join based on memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6236) batch sizing for hash join
[ https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453150#comment-16453150 ] Padma Penumarthy commented on DRILL-6236: - [https://github.com/apache/drill/pull/1227] > batch sizing for hash join > -- > > Key: DRILL-6236 > URL: https://issues.apache.org/jira/browse/DRILL-6236 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > limit output batch size for hash join based on memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453122#comment-16453122 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184200281

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -277,18 +286,29 @@ public boolean isRepeatedList() {
         /**
          * This is the average per entry width, used for vector allocation.
          */
    -    public int getEntryWidth() {
    +    private int getEntryWidthForAlloc() {
           int width = 0;
           if (isVariableWidth) {
    -        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +        width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH;

             // Subtract out the bits (is-set) vector width
    -        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +        if (isOptional) {
               width -= BIT_VECTOR_WIDTH;
             }
    +
    +        if (isRepeated && getValueCount() == 0) {
    +          return (safeDivide(width, STD_REPETITION_FACTOR));
    +        }
           }

    -      return (safeDivide(width, cardinality));
    +      return (safeDivide(width, getEntryCardinalityForAlloc()));
    +    }
    +
    +    /**
    +     * This is the average per entry cardinality, used for vector allocation.
    +     */
    +    private float getEntryCardinalityForAlloc() {
    +      return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) : getCardinality();
    --- End diff --

    This is for joins. We allocate vectors based on the first batch's sizing
    information, and if that first batch is empty, we are allocating vectors with
    zero capacity. When we read the next batch with data, we will end up going
    through a realloc loop as we write values. For example, for a left outer join,
    if the right side batch is empty, we still have to include the right side
    columns as null in the outgoing batch. With the new lateral join operator, we
    also see the problem if the input has an empty array as the first record in
    the unnest column.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
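The cardinality fallback in the diff above can be condensed into a small standalone sketch. This is a simplified mirror of the logic under discussion, not the actual RecordBatchSizer class; the class and method names here are illustrative, and the factor of 10 follows the snippet above.

```java
// Sketch: fall back to a standard repetition factor when the batch is empty,
// so per-entry allocation is never divided by a zero cardinality.
public class EntryCardinalitySketch {
  static final float STD_REPETITION_FACTOR = 10f; // value taken from the diff above

  static float entryCardinalityForAlloc(float cardinality, boolean isRepeated) {
    // Empty batch: assume the standard repetition factor for repeated (array)
    // columns, and one entry per row for scalar columns.
    return cardinality == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) : cardinality;
  }

  public static void main(String[] args) {
    System.out.println(entryCardinalityForAlloc(0, true));  // empty, repeated -> 10.0
    System.out.println(entryCardinalityForAlloc(0, false)); // empty, scalar   -> 1.0
    System.out.println(entryCardinalityForAlloc(4, true));  // non-empty       -> 4.0
  }
}
```

With this fallback, an empty first batch still produces a non-zero allocation estimate, avoiding the zero-capacity vectors described in the issue.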
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453123#comment-16453123 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184202508

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * Actual row size if input is not empty; otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --

    This is not just a problem with size estimation for vector memory allocation.
    Let us say one side of a join receives an empty batch as its first batch. If
    we use a row width of 0 in the outgoing row width calculation, the number of
    rows to include in the outgoing batch will be calculated too high, and later,
    when we get a non-empty batch, we might exceed the memory limits.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
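The memory-limit concern in the discussion above can be illustrated with hypothetical numbers. This is a toy row-count calculation, not Drill's actual sizing code; the 1 MiB batch budget and the row widths are made up for illustration.

```java
// Sketch: how a zero row width from an empty first batch inflates the planned
// row count for an outgoing join batch.
public class RowCountSketch {
  // Rows that fit in the memory budget given per-side row widths (bytes).
  static int rowsThatFit(long memoryLimit, int leftRowWidth, int rightRowWidth) {
    return (int) (memoryLimit / (leftRowWidth + rightRowWidth));
  }

  public static void main(String[] args) {
    long limit = 1 << 20; // assumed 1 MiB output batch budget

    // Right side's first batch was empty, so its width was measured as 0:
    // we plan for 26214 rows.
    System.out.println(rowsThatFit(limit, 40, 0));

    // A later non-empty right batch reveals 60-byte rows: only 10485 rows
    // actually fit, so the earlier plan would overshoot the memory limit.
    System.out.println(rowsThatFit(limit, 40, 60));
  }
}
```

Using a standard (non-zero) width for empty inputs, as the JIRA proposes, keeps the planned row count conservative.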
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453106#comment-16453106 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1237

    @vrozov, I have implemented the provided suggestions.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453069#comment-16453069 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184199682

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
             throw UserException.resourceError(e)
    -          .addContext("Failed to close the Drill file system for " + getName())
    -          .build(logger);
    +            .addContext("Failed to close the Drill file system for " + getName())
    +            .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }

       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

    I'm not suggesting we use the same fs for each split, but the opposite. The
    fs object used per split/row group should be different so that we get the
    right fs wait time for every minor fragment. But this change allows more than
    one fs object per operator context, which we were explicitly preventing
    earlier. I'm not sure I understand why you needed to change that.

> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed reading Hive parquet files using the Drill native parquet reader
> but did not expose Hive data to Drill optimizations, for example filter push
> down, limit push down, and count-to-direct-scan optimizations.
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to such optimizations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
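The close() loop in the diff above raises a subtle question: throwing inside the loop leaves later file systems unclosed. A common alternative is to close everything and remember only the first failure. The sketch below illustrates that variation with generic `Closeable`s; the names are illustrative and this is not Drill's actual API or the behavior the diff actually implements.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class MultiCloseSketch {
  /** Close every resource; remember the first failure instead of aborting on it. */
  static IOException closeAll(List<? extends Closeable> resources) {
    IOException first = null;
    for (Closeable c : resources) {
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // keep the first error, keep closing the rest
        }
      }
    }
    return first; // caller rethrows (or wraps) if non-null
  }

  public static void main(String[] args) throws IOException {
    int[] closed = {0};
    Closeable ok = () -> closed[0]++;
    Closeable bad = () -> { throw new IOException("close failed"); };

    // The failing resource does not prevent the remaining ones from closing.
    IOException first = closeAll(Arrays.asList(ok, bad, ok));
    System.out.println(closed[0]);          // 2
    System.out.println(first.getMessage()); // close failed
  }
}
```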
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453056#comment-16453056 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184197379

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java ---
    @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException {
       public IterOutcome next() {
         batchLoader.resetRecordCount();
         stats.startProcessing();
    +
    +    RawFragmentBatch batch = null;
         try {
    -      RawFragmentBatch batch;
    +
    --- End diff --

    Will do. Thanks for the suggestion.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453053#comment-16453053 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184197278

    --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java ---
    @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) {
           target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name);
         }

    -    boolean overlimit = target.allocator.forceAllocate(size);
    +    // Release first to handle the case where the current and target allocators were part of the same
    +    // parent / child tree.
         allocator.releaseBytes(size);
    +    boolean allocationFit = target.allocator.forceAllocate(size);
    --- End diff --

    What about debugging: an operation fails, and yet there is a change of
    ownership? @vrozov, I'd rather not make this change as I strongly believe
    the change of ownership should happen only on success.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
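The release-before-allocate ordering debated above can be modeled with a toy shared-limit accountant. This is not Drill's AllocationManager; the class, fields, and numbers are illustrative, and "force allocate" here means the bytes are charged even when they exceed the limit, with only a fit flag reporting the overrun.

```java
public class TransferOrderSketch {
  // Toy accounting against one shared parent limit.
  static long parentUsed = 0;
  static final long PARENT_LIMIT = 100;

  static boolean allocate(long size) {
    parentUsed += size;
    return parentUsed <= PARENT_LIMIT; // fit flag; allocation is forced either way
  }

  static void release(long size) {
    parentUsed -= size;
  }

  // Release on the source first, then allocate on the target: when both ledgers
  // roll up to the same parent, the buffer is never double-counted.
  static boolean transferReleaseFirst(long size) {
    release(size);
    return allocate(size);
  }

  // Allocate-first briefly charges the parent twice for the same bytes.
  static boolean transferAllocateFirst(long size) {
    boolean fit = allocate(size);
    release(size);
    return fit;
  }

  public static void main(String[] args) {
    parentUsed = 80; // source already owns 80 bytes of a 100-byte limit
    System.out.println(transferReleaseFirst(80));  // true: usage never exceeds 100

    parentUsed = 80;
    System.out.println(transferAllocateFirst(80)); // false: transient 160 > 100
  }
}
```

This illustrates the trade-off in the review thread: release-first avoids spurious over-limit reports for intra-tree transfers, while allocate-first keeps ownership changes contingent on success.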
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453041#comment-16453041 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184195748

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java ---
    @@ -77,4 +83,46 @@ public long getByteCount() {
       public boolean isAckSent() {
         return ackSent.get();
       }
    +
    +  /**
    +   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
    +   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
    +   *
    +   * NOTES -
    +   * - This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not
    +   *   the owning allocator or b) the target allocator is already the owner
    +   * - When a transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
    +   *   DrillBuf reference count accounting
    +   * - The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
    +   *   calls should be routed to the previous DrillBuf
    +   *
    +   * @param targetAllocator target allocator
    +   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership
    +   *         has been switched to the target allocator); otherwise this operation is a NOOP (the
    +   *         current instance is returned)
    +   */
    +  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
    +    if (body == null) {
    +      return this; // NOOP
    +    }
    +
    +    if (!body.getLedger().isOwningLedger()
    +        || body.getLedger().isOwner(targetAllocator)) {
    +
    +      return this;
    +    }
    +
    +    int writerIndex = body.writerIndex();
    +    TransferResult transferResult = body.transferOwnership(targetAllocator);
    --- End diff --

    - After a call to getNext(), the operator code checks whether the transfer has
      succeeded
    - So that an OOM condition is reported within the fragment
    - When the change of ownership happens from parent to child, there should not
      be any OOM condition since this is internal accounting

    For code clarity, I can improve the javadoc to ask the caller to check for an
    OOM condition (similarly to the BaseAllocator behavior).

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-6282:
-----------------------------------
    Description:
There are three types of metrics in Drill:
1. _com.yammer.metrics_
2. _com.codahale.metrics_
3. _io.dropwizard.metrics_
Drill uses only (2) directly; (1) and (3) are pulled in as transitive dependencies only.
(2) and (3) have the same fully qualified class names, so Maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
(2) is outdated, while (3) is still being developed and updated.
Therefore the decision is:
* to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
* to remove the _com.yammer.metrics_ dependencies.

  was:
There are three types of metrics in Drill:
1. _com.yammer.metrics_
2. _com.codahale.metrics_
3. _io.dropwizard.metrics_
Drill uses only (2) directly; (1) and (3) are pulled in as transitive dependencies only.
(2) and (3) have the same fully qualified class names, so Maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
(2) is outdated, while (3) is still being developed and updated.
Therefore the decision is:
* to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
* to control the _com.yammer.metrics_ dependencies under the dependency management block.

> Update Drill's Metrics dependencies
> -----------------------------------
>
>                 Key: DRILL-6282
>                 URL: https://issues.apache.org/jira/browse/DRILL-6282
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Tools, Build & Test
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 1.14.0
>
> There are three types of metrics in Drill:
> 1. _com.yammer.metrics_
> 2. _com.codahale.metrics_
> 3. _io.dropwizard.metrics_
> Drill uses only (2) directly; (1) and (3) are pulled in as transitive
> dependencies only.
> (2) and (3) have the same fully qualified class names, so Maven doesn't know
> which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
> (2) is outdated, while (3) is still being developed and updated.
> Therefore the decision is:
> * to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
> * to remove the _com.yammer.metrics_ dependencies.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6342) Parquet filter pushdown doesn't work in case of filtering fields inside arrays of complex fields
[ https://issues.apache.org/jira/browse/DRILL-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453028#comment-16453028 ]

ASF GitHub Bot commented on DRILL-6342:
---------------------------------------

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1231

    +1. LGTM

> Parquet filter pushdown doesn't work in case of filtering fields inside
> arrays of complex fields
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6342
>                 URL: https://issues.apache.org/jira/browse/DRILL-6342
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Anton Gozhiy
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>         Attachments: Complex_data.tar.gz
>
> *Data:*
> The Complex_data data set is attached.
> *Query:*
> {code:sql}
> explain plan for select * from dfs.tmp.`Complex_data` t where
> t.list_of_complex_fields[2].nested_field is true
> {code}
> *Expected result:*
> numFiles=2
> Statistics of the file that shouldn't be scanned:
> {noformat}
> list_of_complex_fields:
> .nested_field: BOOLEAN UNCOMPRESSED DO:0 FPO:497
> SZ:41/41/1.00 VC:3 ENC:PLAIN,RLE ST:[min: false, max: false, num_nulls: 0]
> {noformat}
> *Actual result:*
> numFiles=3
> I.e., filter pushdown does not work.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453023#comment-16453023 ]

ASF GitHub Bot commented on DRILL-6327:
---------------------------------------

Github user sohami commented on the issue:

    https://github.com/apache/drill/pull/1240

    @parthchandra - please help review this PR.

> Update unary operators to handle IterOutcome.EMIT
> -------------------------------------------------
>
>                 Key: DRILL-6327
>                 URL: https://issues.apache.org/jira/browse/DRILL-6327
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Parth Chandra
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>
> IterOutcome.EMIT is a new state introduced by the Lateral Join
> implementation. All operators need to be updated to handle it.
> This JIRA is to track the subtask of updating the unary operators (derived
> from AbstractSingleRecordBatch).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia reassigned DRILL-6327: Assignee: Sorabh Hamirwasia > Update unary operators to handle IterOutcome.EMIT > - > > Key: DRILL-6327 > URL: https://issues.apache.org/jira/browse/DRILL-6327 > Project: Apache Drill > Issue Type: Task >Reporter: Parth Chandra >Assignee: Sorabh Hamirwasia >Priority: Major > > IterOutcome.EMIT is a new state introduced by the Lateral Join > implementation. All operators need to be updated to handle it. > This Jira is to track the subtask of updating the unary operators (derived > from AbstractSingleRecordBatch). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-6327: - Reviewer: Parth Chandra > Update unary operators to handle IterOutcome.EMIT > - > > Key: DRILL-6327 > URL: https://issues.apache.org/jira/browse/DRILL-6327 > Project: Apache Drill > Issue Type: Task >Reporter: Parth Chandra >Assignee: Sorabh Hamirwasia >Priority: Major > > IterOutcome.EMIT is a new state introduced by the Lateral Join > implementation. All operators need to be updated to handle it. > This Jira is to track the subtask of updating the unary operators (derived > from AbstractSingleRecordBatch). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453022#comment-16453022 ]

ASF GitHub Bot commented on DRILL-6327:
---------------------------------------

GitHub user sohami opened a pull request:

    https://github.com/apache/drill/pull/1240

    DRILL-6327: Update unary operators to handle IterOutcome.EMIT

    Note: Handles non-blocking unary operators (like Filter/Project/etc.) with
    the EMIT IterOutcome

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sohami/drill DRILL-6327

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1240.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #1240

----
commit 74b9ead99cf1f92713ab9b81f23f680bd1a9cfff
Author: Sorabh Hamirwasia
Date:   2018-04-05T00:54:58Z

    DRILL-6327: Update unary operators to handle IterOutcome.EMIT

    Note: Handles non-blocking unary operators (like Filter/Project/etc.) with
    the EMIT IterOutcome

> Update unary operators to handle IterOutcome.EMIT
> -------------------------------------------------
>
>                 Key: DRILL-6327
>                 URL: https://issues.apache.org/jira/browse/DRILL-6327
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Parth Chandra
>            Priority: Major
>
> IterOutcome.EMIT is a new state introduced by the Lateral Join
> implementation. All operators need to be updated to handle it.
> This JIRA is to track the subtask of updating the unary operators (derived
> from AbstractSingleRecordBatch).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453016#comment-16453016 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184192630

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java ---
    @@ -77,4 +83,46 @@ public long getByteCount() {
       public boolean isAckSent() {
         return ackSent.get();
       }
    +
    +  /**
    +   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
    +   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
    +   *
    +   * NOTES -
    +   * - This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not
    +   *   the owning allocator or b) the target allocator is already the owner
    +   * - When a transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
    +   *   DrillBuf reference count accounting
    +   * - The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
    +   *   calls should be routed to the previous DrillBuf
    +   *
    +   * @param targetAllocator target allocator
    +   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership
    +   *         has been switched to the target allocator); otherwise this operation is a NOOP (the
    +   *         current instance is returned)
    +   */
    +  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
    +    if (body == null) {
    +      return this; // NOOP
    +    }
    +
    +    if (!body.getLedger().isOwningLedger()
    +        || body.getLedger().isOwner(targetAllocator)) {
    +
    +      return this;
    +    }
    +
    +    int writerIndex = body.writerIndex();
    +    TransferResult transferResult = body.transferOwnership(targetAllocator);
    +
    +    // Set the index and increment reference count
    +    transferResult.buffer.writerIndex(writerIndex);
    +
    +    // Clear the current DrillBuf since the caller will perform release() on the new one
    +    body.release();
    +
    +    return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false);
    --- End diff --

    Look at IncomingBuffers.batchArrived:
    - It holds a reference to the RawFragmentBatch
    - It holds a lock on the parent collector for insertion
    - Though, the collector's enqueue method itself is not synchronized on the
      "this" object
    - Instead, it relies on the synchronization provided as part of the
      RawBatchBuffer queue
    - This means a getNext() call will be able to dequeue a RawFragmentBatch
      instance
    - Concurrently, IncomingBuffers.batchArrived will invoke release on the
      cached reference

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453013#comment-16453013 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184192443

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -50,7 +50,7 @@ public class RecordBatchSizer {
       private static final int OFFSET_VECTOR_WIDTH = UInt4Vector.VALUE_WIDTH;
       private static final int BIT_VECTOR_WIDTH = UInt1Vector.VALUE_WIDTH;
    -  private static final int STD_REPETITION_FACTOR = 10;
    +  public static final int STD_REPETITION_FACTOR = 10;
    --- End diff --

    Done. Using 5 in both places now.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452986#comment-16452986 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184188427

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
             throw UserException.resourceError(e)
    -          .addContext("Failed to close the Drill file system for " + getName())
    -          .build(logger);
    +            .addContext("Failed to close the Drill file system for " + getName())
    +            .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }

       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

    @parthchandra this is definitely a good question. I did it this way because
    in the previous code a new fs was created for each Hive table split [1]. The
    projection pusher is used to define the fs for each split; it resolves the
    path for table partitions. Frankly speaking, it worked fine for me without it
    (all tests passed), but the same approach is used in the Hive code, and
    apparently it was used in Drill for the same reasons. To be safe, I have done
    the same. If you think we can use the same fs for each row group in Hive,
    then I can adjust the changes.

    [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java#L112

> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed reading Hive parquet files using the Drill native parquet reader
> but did not expose Hive data to Drill optimizations, for example filter push
> down, limit push down, and count-to-direct-scan optimizations.
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to such optimizations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452961#comment-16452961 ] ASF GitHub Bot commented on DRILL-6272: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1225 @vrozov re-implemented using maven embedder. Had to upgrade the jmockit lib to the latest version since it caused an NPE with the maven embedder (NPE from jmockit even if no mocks are used; fixed in the newer version). Please review. > Remove binary jars files from source distribution > - > > Key: DRILL-6272 > URL: https://issues.apache.org/jira/browse/DRILL-6272 > Project: Apache Drill > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.14.0 > > > Per [~vrozov] the source distribution contains binary jar files under > exec/java-exec/src/test/resources/jars -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Description: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to replace com.codahale.metrics with the latest _io.dropwizard.metrics_ Metrics for Drill, * to control the _com.yammer.metrics_ Metrics under a dependency management block. was: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to use only one _io.dropwizard.metrics_ Metrics for Drill, * to remove _com.codahale.metrics_ metrics, * to control the _com.yammer.metrics_ Metrics under a dependency management block. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill directly uses only the 2nd; the 1st and 3rd are used as transitive > dependencies only. 
> The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2nd one is outdated, but the 3rd one is still actively developed and > updated. > Therefore the decision is: > * to replace com.codahale.metrics with the latest _io.dropwizard.metrics_ Metrics > for Drill, > * to control the _com.yammer.metrics_ Metrics under a dependency management block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
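The dependency-management approach described in this issue (pin a single maintained metrics implementation and keep the legacy coordinates under explicit control) might look like the following pom.xml fragment. The version numbers here are illustrative, not taken from Drill's actual pom:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin the maintained implementation (replaces com.codahale.metrics). -->
    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>4.0.2</version>
    </dependency>
    <!-- Keep the legacy transitive dependency under explicit control. -->
    <dependency>
      <groupId>com.yammer.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A `dependencyManagement` block only fixes the version chosen when a dependency is pulled in (directly or transitively); it does not itself add the artifact to the build, which is why it suits the "control the transitive metrics" goal stated above.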
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Description: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to use only one _io.dropwizard.metrics_ Metrics for Drill, * to remove _com.codahale.metrics_ metrics, * to control the _com.yammer.metrics_ Metrics under a dependency management block. was: There are three types of metrics-core in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill uses only the 1st and 2nd. The 3rd one is used by Hive. The 1st one has different fully qualified class names, but the 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). But I found that the 3rd library is used by Hive only for tests, therefore it is not required for Drill and could be excluded from hive-metastore and hive-exec. The dependency conflict is related not only to metrics-core, but to metrics-servlets and metrics-json as well. All these metrics should be organized with proper exclusions and dependency management blocks. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. 
_com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill directly uses only the 2nd; the 1st and 3rd are used as transitive > dependencies only. > The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2nd one is outdated, but the 3rd one is still actively developed and > updated. > Therefore the decision is: > * to use only one _io.dropwizard.metrics_ Metrics for Drill, > * to remove _com.codahale.metrics_ metrics, > * to control the _com.yammer.metrics_ Metrics under a dependency management block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Summary: Update Drill's Metrics dependencies (was: Excluding io.dropwizard.metrics dependencies) > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the 1st and 2nd. The 3rd one is used by Hive. > The 1st one has different fully qualified class names, but the 2nd and 3rd ones have the > same fully qualified class names, and maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the 3rd library is used by Hive only for tests, therefore it > is not required for Drill and could be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusions and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452883#comment-16452883 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 One other thing to highlight from an earlier comment. CPU is something that the user specifies in the DoY config file. That information is passed to YARN in container requests. This feature asks the user to modify the drill-env.sh file to enable cgroups to limit CPU. As a result, the user must specify the CPU limit in two places. DoY went to extreme lengths to unify memory configuration so it is set in one place. The assumption was that, since Apache YARN handles cgroups, we've also got unified CPU specification. But, in this "side-car" approach we don't. So, it would be good to capture the CPU amount from the YARN config as explained earlier and use that to set the Drill cgroups env vars -- but only if some "self-enforcing cgroup" flag is enabled in the config (which can be done by default for the limited MapR YARN.) > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452877#comment-16452877 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 @kkhatua, putting on my Apache hat... Apache Drill is an Apache project that must work with other Apache projects such as Apache YARN. The Apache Drill DoY support is designed to work well with Apache YARN (and has a few special additions for MapR YARN's unique limitations.) It is important that the Apache DoY work well with the generic YARN. No harm in adding tweaks (such as this one) to work with vendor-specific limitations. But, the overriding concern is that the DoY feature be useful in Apache. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452872#comment-16452872 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184169114 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- Under YARN, it is YARN that maintains the pid, not Drill. YARN expects its child processes to run in the foreground and will handle capturing the pid. This is a case in which "native" Apache YARN works differently than "MapR YARN." Since Apache YARN handles cgroups, it is the one that needs the pid. Under MapR's limited YARN, Drill is second-guessing YARN and needs the pid. It may be that MapR's YARN can handle a background process; I don't recall the details. Is there a way to run Drill in the background, get the pid, then return to the foreground so that the script does not exit until Drill itself exits? In fact, if I remember correctly, the scripts have two layers; in one layer, the script replaces itself with the Drill process. Something to check. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
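The script behavior under discussion in the comment above (briefly background the process so its pid can be captured, then `wait` on it so the wrapper stays alive exactly as long as the child) can be sketched with a stand-in command in place of `"$DRILL_HOME/bin/runbit" exec`:

```shell
#!/usr/bin/env bash
# Launch the child in the background so its pid is available to the script
# (e.g. for writing into a cgroup tasks file), then block on it with `wait`
# so the script itself does not exit before the child does.
sleep 1 &                 # stand-in for: "$DRILL_HOME/bin/runbit" exec &
child_pid=$!
echo "captured pid: $child_pid"
# ... the pid could be registered with a cgroup here ...
wait "$child_pid"         # returns only when the child exits
echo "child exit status: $?"
```

This addresses Paul's question directly: the `&` plus `wait` combination gets the pid while still making the wrapper's lifetime track the child's, though it is not identical to `exec`-style replacement, where the script's pid becomes the drillbit's pid.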
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452816#comment-16452816 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184159218 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; try{ - RawFragmentBatch batch; + --- End diff -- create a function `getNextNotEmptyBatch()` that calls `getNextBatch()` and either returns a non-empty batch or `null`. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
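The helper vrozov suggests, one that keeps calling `getNextBatch()` until it sees a non-empty batch or the stream ends, can be sketched with a plain queue standing in for the incoming batch stream. All class names here are illustrative, not Drill's:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class NotEmptyBatchDemo {
    // Stand-in batch stream: each int is a batch's record count;
    // an empty queue means the upstream is exhausted.
    private final Queue<Integer> incoming;

    public NotEmptyBatchDemo(Queue<Integer> incoming) {
        this.incoming = incoming;
    }

    // Stand-in for UnorderedReceiverBatch.getNextBatch(): null when exhausted.
    private Integer getNextBatch() {
        return incoming.poll();
    }

    // The suggested helper: loop past empty batches and hand back
    // either a non-empty batch or null.
    public Integer getNextNotEmptyBatch() {
        Integer batch;
        while ((batch = getNextBatch()) != null && batch == 0) {
            // empty batch: skip it (the real code would release it here)
        }
        return batch;
    }

    public static void main(String[] args) {
        NotEmptyBatchDemo demo =
            new NotEmptyBatchDemo(new ArrayDeque<>(Arrays.asList(0, 0, 3, 0, 7)));
        System.out.println(demo.getNextNotEmptyBatch()); // 3
        System.out.println(demo.getNextNotEmptyBatch()); // 7
        System.out.println(demo.getNextNotEmptyBatch()); // null
    }
}
```

Factoring the skip-empty loop into one method keeps `next()` free of the loop-and-retry bookkeeping that the review thread is trying to simplify.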
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452807#comment-16452807 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184156922 --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java --- @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) { target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name); } -boolean overlimit = target.allocator.forceAllocate(size); +// Release first to handle the case where the current and target allocators were part of the same +// parent / child tree. allocator.releaseBytes(size); +boolean allocationFit = target.allocator.forceAllocate(size); --- End diff -- If this happens, isn't there a problem that the old allocator has already released the memory? In any case, won't a runtime exception cancel the query anyway, so that all allocators will be closed? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
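The ordering change in `transferBalance`, releasing from the source allocator before force-allocating on the target, matters when both allocators roll up into the same parent: charging the target first would transiently double-count the transferred bytes against the shared limit. A toy model of that effect (this is not Drill's allocator API; the classes and fields are invented for illustration):

```java
public class TransferOrderDemo {
    // Toy allocator: a used-bytes counter checked against a limit, with an
    // optional shared parent that is charged alongside the child.
    static class Alloc {
        final Alloc parent;
        final long limit;
        long used;

        Alloc(Alloc parent, long limit) {
            this.parent = parent;
            this.limit = limit;
        }

        void release(long n) {
            used -= n;
            if (parent != null) parent.release(n);
        }

        // Forced allocation: the bytes are recorded either way;
        // the return value reports whether they fit within the limit.
        boolean forceAllocate(long n) {
            used += n;
            if (parent != null) parent.forceAllocate(n);
            return used <= limit;
        }
    }

    public static void main(String[] args) {
        // Parent limit exactly equals the bytes being moved between children.
        Alloc parent = new Alloc(null, 64);
        Alloc from = new Alloc(parent, 64);
        Alloc to = new Alloc(parent, 64);
        from.forceAllocate(64);

        // Release first, then charge the target: the parent never sees 128.
        from.release(64);
        boolean fit = to.forceAllocate(64);
        System.out.println("fit=" + fit + " parentUsed=" + parent.used);
        // Charging `to` before releasing `from` would have pushed the
        // parent to 128, transiently past its 64-byte limit.
    }
}
```

This is the scenario the diff's comment names ("current and target allocators were part of the same parent / child tree"); the review question of what happens when the forced allocation still does not fit is outside this sketch.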
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452797#comment-16452797 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184155724 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); --- End diff -- But what if it goes over the limit? 
> Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452793#comment-16452793 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1239 @paul-rogers DoY is no longer a MapR-only feature, and if it helps to have Drill self-enforce, this works. If YARN is able to enforce for Drill, the user need not specify the settings in their `drill-env.sh`. I see https://issues.apache.org/jira/browse/YARN-810 as being an open issue. (Source: [Issue List](https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20status%20in%20(Open%2C%20"Patch%20Available")%20AND%20text%20~%20"cgroup"%20ORDER%20BY%20key%20%2C%20status%20ASC) ). So, I probably don't need to do any explicit checks for hadoop distros, and documentation should be sufficient to address this, IMO. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452791#comment-16452791 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184154997 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); + +// Set the index and increment reference count +transferResult.buffer.writerIndex(writerIndex); + +// Clear the current Drillbuffer since caller will perform release() on 
the new one +body.release(); + +return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false); --- End diff -- I don't see where RawFragmentBatch is cached. Isn't it removed from a queue using poll()? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452774#comment-16452774 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184153040 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- The process is momentarily in the background to capture the PID. We eventually wait for it. Are you saying that a process will not continue to run because it is in the background? > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452763#comment-16452763 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184151186 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- @vrozov My bad, the highlight was on the "batch = null" code; I guess you meant why we can't move the whole release logic within the finally block. I agree with your proposal, as delaying the release phase a bit doesn't hurt us in this case. I will make the change. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
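The shape the reviewers converge on above, keeping the batch in a variable the `finally` clause can see, releasing it there, and nulling it out so the buffer is freed exactly once on both the normal and error paths, reduces to a small sketch. This is not the actual `UnorderedReceiverBatch` code; the `Batch` class and `next` method are invented to show only the release pattern:

```java
public class FinallyReleaseDemo {
    static int releaseCount = 0;

    // Stand-in for RawFragmentBatch with a release() that must run exactly once.
    static class Batch {
        void release() { releaseCount++; }
    }

    // Process one batch; whether we return normally or the load step throws,
    // the finally clause guarantees a single release.
    static String next(boolean loadFails) {
        Batch batch = null;
        try {
            batch = new Batch();
            if (loadFails) {
                throw new IllegalStateException("load failed");
            }
            return "OK";
        } catch (IllegalStateException e) {
            return "STOP";
        } finally {
            if (batch != null) {
                batch.release();
                batch = null; // guard against a second release
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(next(false) + " " + next(true)
            + " releases=" + releaseCount);
    }
}
```

The point of the `batch != null` check plus the null-out is that any early release on a success path can simply set the variable to null, and the `finally` clause then becomes a no-op instead of a double free.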
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452758#comment-16452758 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184150526 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; --- End diff -- It is not obvious that `body` is `null` when the `if` condition is met. Additionally, what if the release logic changes in the future? It is bad practice to rely on implementation details. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5927: - Labels: ready-to-commit (was: ) > Root allocator consistently Leaks a buffer in unit tests > > > Key: DRILL-5927 > URL: https://issues.apache.org/jira/browse/DRILL-5927 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > TestBsonRecordReader consistently produces this exception when running on my > laptop > {code} > 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while > closing > java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding > buffers allocated (1). > Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit) > child allocators: 0 > ledgers: 1 > ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, > life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] > holds 1 buffers. 
> DrillBuf[106], udle: [72 0..1024] > reservations: 0 > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) > [classes/:na] > at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) > [test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_144] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_144] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > [junit-4.11.jar:na] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.11.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > [junit-4.11.jar:na] > at > mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) > [jmockit-1.3.jar:na] > at > mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) > [jmockit-1.3.jar:na] > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at 
java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) > [jmockit-1.3.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) > [junit-4.11.jar:na] > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > [junit-4.11.jar:na] > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > [junit-4.11.jar:na] > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > [junit-4.11.jar:na] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5927: - Reviewer: Vlad Rozov (was: Parth Chandra) > Root allocator consistently Leaks a buffer in unit tests > > > Key: DRILL-5927 > URL: https://issues.apache.org/jira/browse/DRILL-5927 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > TestBsonRecordReader consistently produces this exception when running on my > laptop > {code} > 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while > closing > java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding > buffers allocated (1). > Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit) > child allocators: 0 > ledgers: 1 > ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, > life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] > holds 1 buffers. 
> DrillBuf[106], udle: [72 0..1024] > reservations: 0 > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) > [classes/:na] > at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) > [test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_144] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_144] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > [junit-4.11.jar:na] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.11.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > [junit-4.11.jar:na] > at > mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) > [jmockit-1.3.jar:na] > at > mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) > [jmockit-1.3.jar:na] > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at 
java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) > [jmockit-1.3.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) > [junit-4.11.jar:na] > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > [junit-4.11.jar:na] > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > [junit-4.11.jar:na] > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > [junit-4.11.jar:na] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452730#comment-16452730 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184146733 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- Why can't it be done in finally? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
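The review comment above asks why the batch release cannot be done in a finally block. A minimal sketch of that pattern, using a hypothetical `RawBatch` stand-in rather than Drill's actual `RawFragmentBatch`/`UnorderedReceiverBatch` classes: the finally clause guarantees the buffer is released on both the normal and the exceptional path of the load step.

```java
// Hypothetical sketch of releasing a batch in finally, as the reviewer
// suggests. RawBatch is a stand-in class, not Drill's actual API.
class RawBatch {
    boolean released = false;
    void release() { released = true; }
}

public class ReleaseInFinally {
    // Processes a batch; releases it on every exit path.
    static boolean load(RawBatch batch, boolean failLoad) {
        try {
            if (failLoad) {
                throw new IllegalStateException("simulated load failure");
            }
            return true;
        } finally {
            batch.release(); // runs whether load succeeded or threw
        }
    }

    public static void main(String[] args) {
        RawBatch ok = new RawBatch();
        load(ok, false);
        System.out.println("released after success: " + ok.released);

        RawBatch bad = new RawBatch();
        try {
            load(bad, true);
        } catch (IllegalStateException e) {
            // expected: the batch must still have been released
        }
        System.out.println("released after failure: " + bad.released);
    }
}
```

Either way the buffer count returns to zero; the finally form just removes the duplicated release/null-out on each early-return path.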
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452716#comment-16452716 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184107121 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); --- End diff -- Please consider `setSessionOption`. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
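The baseline values in the test above follow directly from the special-case behaviour of `java.lang.Math.log10`. A standalone sketch of those semantics (plain JDK behaviour, no Drill classes involved):

```java
// Edge cases of Math.log10 mirrored by the test's baseline values.
public class Log10EdgeCases {
    public static void main(String[] args) {
        System.out.println(Math.log10(Double.POSITIVE_INFINITY)); // Infinity
        System.out.println(Math.log10(Double.NEGATIVE_INFINITY)); // NaN (log of a negative)
        System.out.println(Math.log10(Double.NaN));               // NaN
        System.out.println(Math.log10(0.0));                      // -Infinity
        System.out.println(Math.log10(1.0));                      // 0.0
        System.out.println(Math.log10(1.5));   // ~0.17609125905568124
        System.out.println(Math.log10(-1.5));  // NaN
        System.out.println(Math.log10(10.0));  // 1.0 (exact: 10^n yields n for integer n)
    }
}
```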
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452713#comment-16452713 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184110535 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); --- End diff -- For the case when 
default value of the option is changed, disabling it after the tests may cause problems for other tests. The correct behaviour is to reset the option to its default value; the `resetSessionOption` method may be used. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
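The reset-versus-hardcode point above can be illustrated with a toy option store. This is a hypothetical stand-in, not Drill's `SystemOptionManager`: a test that hard-codes `false` in finally silently breaks if the shipped default ever becomes `true`, while a reset restores whatever the default is.

```java
import java.util.HashMap;
import java.util.Map;

// OptionStore is a stand-in illustrating the review point, not Drill's API.
class OptionStore {
    private final Map<String, Boolean> defaults = new HashMap<>();
    private final Map<String, Boolean> overrides = new HashMap<>();

    void define(String name, boolean defaultValue) { defaults.put(name, defaultValue); }
    void set(String name, boolean value) { overrides.put(name, value); }
    void reset(String name) { overrides.remove(name); } // back to the default
    boolean get(String name) { return overrides.getOrDefault(name, defaults.get(name)); }
}

public class ResetVsHardcode {
    public static void main(String[] args) {
        OptionStore options = new OptionStore();
        // Imagine a future release flips this default to true.
        options.define("store.json.read_numbers_as_double", true);

        try {
            options.set("store.json.read_numbers_as_double", true);
            // ... run the test body ...
        } finally {
            // Safe cleanup: restores the real default, whatever it is.
            // Hard-coding options.set(..., false) here would leave the
            // session in a non-default state for later tests.
            options.reset("store.json.read_numbers_as_double");
        }
        System.out.println(options.get("store.json.read_numbers_as_double")); // true
    }
}
```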
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452719#comment-16452719 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184138785 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithInt() throws Throwable { +String json = "{" + +"\"num1\": 0.0," + +"\"num3\": 1.0," + +"\"num5\": -1.0," + +"\
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452718#comment-16452718 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184106411 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + --- End diff -- Please remove empty string here and in other places. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452714#comment-16452714 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184138601 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() --- End diff -- `build().run()` -> `go()`  > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452717#comment-16452717 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184144882 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithInt() throws Throwable { +String json = "{" + +"\"num1\": 0.0," + +"\"num3\": 1.0," + +"\"num5\": -1.0," + +"\
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452715#comment-16452715 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184145427 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -20,12 +20,20 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; +import java.io.File; import java.math.BigDecimal; +import java.nio.file.Files; +import java.util.DoubleSummaryStatistics; +import java.util.List; +import com.google.common.collect.Lists; +import org.apache.commons.io.FileUtils; import org.apache.drill.categories.OperatorTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.ExecConstants; import org.apache.drill.exec.ExecTest; +import org.apache.drill.exec.compile.ExampleTemplateWithInner; --- End diff -- Is this import required? > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452711#comment-16452711 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 There may be some misunderstanding of how DoY works. The only info that users can pass to DoY is that which is in the DoY config file. We should add arguments to that file which will be passed through the DoY client to the DoY AM, and from there, as an env var, to the Drillbit containers. We already do this for memory so that both Drill and YARN agree on memory. We should do the same for CPU so that Drill, YARN and cgroups agree on the number of CPUs that Drill can use. Since this feature is MapR-only, it might be OK to require that users alter their `drill-env.sh` to set an environment variable that forces Drill to police itself under YARN. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452709#comment-16452709 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1189 Please update JIRA, PR and commit titles and squash commits. > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only 1 and 2. The last 3 one is used by Hive. > 1st one has different class full identifiers, but the 2 and 3 ones have the > same class full identifiers and maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that 3 one library is used by Hive only for tests, therefore it > is not required for Drill and could be excluded from hive-metastore and > hive-exec. > The dependencies conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper excluding and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
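The exclusions the description calls for take the standard Maven form. A sketch under assumptions (shown for `hive-exec` and `metrics-core` only; per the description, the real change would cover `hive-metastore` as well and also exclude `metrics-servlets` and `metrics-json`):

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <!-- Used by Hive only for tests; clashes with com.codahale.metrics,
         which shares the same class names. -->
    <exclusion>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```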
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452712#comment-16452712 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184144439 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- This can't be for YARN. Under YARN, Drill must be run in the foreground. The original Drill-on-YARN work ensured that all this works. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452708#comment-16452708 ] ASF GitHub Bot commented on DRILL-6331: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183919198 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java --- @@ -158,25 +159,26 @@ public void close() { } catch (RuntimeException e) { ex = ex == null ? e : ex; } -try { - if (fs != null) { + +for (DrillFileSystem fs : fileSystems) { + try { fs.close(); -fs = null; - } -} catch (IOException e) { + } catch (IOException e) { throw UserException.resourceError(e) -.addContext("Failed to close the Drill file system for " + getName()) -.build(logger); + .addContext("Failed to close the Drill file system for " + getName()) + .build(logger); + } } + if (ex != null) { throw ex; } } @Override public DrillFileSystem newFileSystem(Configuration conf) throws IOException { -Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext"); -fs = new DrillFileSystem(conf, getStats()); +DrillFileSystem fs = new DrillFileSystem(conf, getStats()); --- End diff -- I don't get why you need multiple DrillFileSystems per operator context? The reason for the DrillFileSystem abstraction (and the reason for tying it to the operator context) is to track the time a (scan) operator was waiting for a file system call to return. This is reported in the wait time for the operator in the query profile. For scans this is a critical number as the time spent waiting for a disk read determines if the query is disk bound. Associating multiple file system objects with a single operator context will throw the math out of whack. I think. 
> Parquet filter pushdown does not support the native hive reader > --- > > Key: DRILL-6331 > URL: https://issues.apache.org/jira/browse/DRILL-6331 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.13.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.14.0 > > > Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan, the > core difference between them was > that HiveDrillNativeParquetScanBatchCreator was creating ParquetRecordReader > instead of HiveReader. > This allowed reading Hive parquet files using the Drill native parquet reader but > did not expose Hive data to Drill optimizations. > For example, filter push down, limit push down, count to direct scan > optimizations. > Hive code had to be refactored to use the same interfaces as > ParquetGroupScan in order to be exposed to such optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
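The wait-time accounting described in the review comment above can be pictured with a generic wrapper: every blocking file-system call is timed and charged to a single per-operator counter, which is why spreading calls across several file-system objects tied to one operator context could skew the reported number. This is an illustrative sketch only; all names are hypothetical and this is not Drill's actual DrillFileSystem API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative sketch of per-operator wait-time accounting, as described in
// the review comment above. This is NOT Drill's DrillFileSystem; all names
// are hypothetical.
public class WaitTimedFileSystem {
    // Single counter per operator context; corresponds to the "wait time"
    // shown in a query profile.
    private final AtomicLong waitNanos = new AtomicLong();

    // Time one blocking call and charge the elapsed time to this
    // operator's wait-time counter, even if the call throws.
    public <T> T timed(Supplier<T> blockingCall) {
        long start = System.nanoTime();
        try {
            return blockingCall.get();
        } finally {
            waitNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public long waitTimeNanos() {
        return waitNanos.get();
    }
}
```

If several independent instances each kept their own counter on behalf of one operator, the profile's single wait-time figure would have to merge them, which is the accounting concern raised above.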
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452706#comment-16452706 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1239 Thanks for that pointer, @paul-rogers! I'll make the relevant changes and add to this commit. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-143: --- Labels: doc-impacting (was: doc-impacting ready-to-commit) > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452707#comment-16452707 ] ASF GitHub Bot commented on DRILL-6331: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183909850 --- Diff: common/src/main/java/org/apache/drill/common/Stopwatch.java --- @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.common; + +import com.google.common.base.Ticker; + +import java.util.concurrent.TimeUnit; + +/** + * Helper that creates stopwatch based if debug level is enabled. --- End diff -- Do we really need this? In general we have (or should have) used Stopwatch to track metrics and or performance bottlenecks in production. In neither case do we want to enable debug. Also, for debugging performance issues (I see that the places you've changed to use this Stopwatch are places where we encountered performance issues), would it be better to use ``` Stopwatch timer; if(logger.isDebugEnabled()){ timer = Stopwatch.createStarted(); } ``` More verbose, but guaranteed to be optimized away by the JVM. 
Not insisting that we change this, BTW. > Parquet filter pushdown does not support the native hive reader > --- > > Key: DRILL-6331 > URL: https://issues.apache.org/jira/browse/DRILL-6331 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.13.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.14.0 > > > Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan, the > core difference between them was > that HiveDrillNativeParquetScanBatchCreator was creating ParquetRecordReader > instead of HiveReader. > This allowed reading Hive parquet files using the Drill native parquet reader but > did not expose Hive data to Drill optimizations. > For example, filter push down, limit push down, count to direct scan > optimizations. > Hive code had to be refactored to use the same interfaces as > ParquetGroupScan in order to be exposed to such optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
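The guarded-stopwatch idea in the review comment above can be sketched as a tiny helper. This is a hypothetical illustration of the pattern under discussion, not the class proposed in the pull request and not Guava's Stopwatch:

```java
// Hypothetical sketch of a debug-gated stopwatch: when disabled it records
// nothing, so production code pays only a branch and a field read. This
// illustrates the pattern discussed above; it is not Drill's actual helper.
public class GatedStopwatch {
    private final boolean enabled;
    private long startNanos;

    private GatedStopwatch(boolean enabled) {
        this.enabled = enabled;
        if (enabled) {
            startNanos = System.nanoTime();
        }
    }

    // Mirrors Guava's Stopwatch.createStarted(), but gated on a flag such
    // as logger.isDebugEnabled().
    public static GatedStopwatch createStarted(boolean debugEnabled) {
        return new GatedStopwatch(debugEnabled);
    }

    // Elapsed milliseconds, or 0 when the watch was never started.
    public long elapsedMillis() {
        return enabled ? (System.nanoTime() - startNanos) / 1_000_000 : 0;
    }
}
```

The inline alternative shown in the comment (`if (logger.isDebugEnabled()) { timer = Stopwatch.createStarted(); }`) pushes the same branch to each call site; when debug is off the JIT can remove the dead timing code entirely, which is the optimization the reviewer is counting on. The wrapper trades that guarantee for tidier call sites.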
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452704#comment-16452704 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184144853 --- Diff: pom.xml --- @@ -1333,6 +1353,12 @@ + --- End diff -- I am not sure why it is necessary to have `com.yammer.metrics` in dependencyManagement? > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the first two. The third is used by Hive. > The first has different fully qualified class names, but the second and third have the > same fully qualified class names, and Maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the third library is used by Hive only for tests; therefore it > is not required for Drill and can be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusion and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452697#comment-16452697 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184144021 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- Is it used in logical? > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the first two. The third is used by Hive. > The first has different fully qualified class names, but the second and third have the > same fully qualified class names, and Maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the third library is used by Hive only for tests; therefore it > is not required for Drill and can be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusion and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452696#comment-16452696 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 @kkhatua, it turns out that upstream YARN has long had effective cgroup support per container. (I have the pleasure of sitting near the guy who maintains that work.) There has long been a discussion about whether the MapR version of YARN picked up those changes; we believe that MapR does *not* support this upstream work. As a result, under Apache YARN, the YARN NM itself will impose cgroup controls and Drill need not do it itself. For MapR YARN (only), Drill (and all other YARN apps) must do their own cgroup control. Please make sure that this feature is off by default to allow YARN to do the work. Only enable it for versions of YARN (such as MapR) which do not provide cgroup control in YARN itself. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452670#comment-16452670 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184117938 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- Not sure what you mean but this is the goal of the current code: - After a batch is properly set, we need to decrease the ref count by one by the end of the next() method - If an exception happens before the release call, then the finally block will be able to release the batch since it will be different than null - Otherwise, the release will be performed and the batch set to null which will disable the release within the finally block > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. 
This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
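The release discipline described in the comment above (release once on the normal path, null out the reference, and let the finally block cover only the exception path) can be sketched generically. Batch and the counter here are stand-ins for illustration, not the real RawFragmentBatch:

```java
// Generic sketch of the "release then null out" idiom described above.
// Batch is a stand-in for RawFragmentBatch; the static counter lets us
// verify that every path releases exactly once.
public class ReleaseIdiom {
    public static int releases = 0;

    static class Batch {
        void release() { releases++; }
    }

    public static void next(boolean failBeforeRelease) {
        Batch batch = null;
        try {
            batch = new Batch();
            if (failBeforeRelease) {
                throw new RuntimeException("simulated failure");
            }
            batch.release(); // normal path: release...
            batch = null;    // ...then null out so the finally block skips it
        } catch (RuntimeException ignored) {
            // swallowed for the demo
        } finally {
            if (batch != null) {
                batch.release(); // runs only when the try block never released
            }
        }
    }
}
```

Calling next(false) and next(true) each releases exactly once: the first on the normal path, the second in the finally block.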
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452671#comment-16452671 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184138876 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); + +// Set the index and increment reference count +transferResult.buffer.writerIndex(writerIndex); + +// Clear the current Drillbuffer since caller will perform release() 
on the new one +body.release(); +return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false); --- End diff -- That was my original code which failed miserably: - The RPC code has references on the DrillBuf and the RawFragmentBatch - This means we need to ensure that a release call is routed to the right DrillBuf (otherwise, the reference count logic stops working) - Creating a new RawFragmentBatch instance essentially provided just that (proper reference count accounting) > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452672#comment-16452672 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184119110 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); --- End diff -- yes, it is. 
> Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452668#comment-16452668 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184112400 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; --- End diff -- The release logic is a NOOP if the body is null; the while loop is to guard against an empty batch. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452673#comment-16452673 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184140733 --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java --- @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) { target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name); } -boolean overlimit = target.allocator.forceAllocate(size); +// Release first to handle the case where the current and target allocators were part of the same +// parent / child tree. allocator.releaseBytes(size); +boolean allocationFit = target.allocator.forceAllocate(size); --- End diff -- What if a runtime exception is thrown during forceAllocate(...)? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
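The reordering in the hunk above (release from the source allocator before force-allocating on the target) matters when both allocators roll up into one parent limit. A toy accounting model, with made-up numbers and names, shows why; this is purely illustrative, not Drill's AllocationManager:

```java
// Toy model of moving a buffer's bytes between two child allocators that
// share one parent limit. Releasing from the source before charging the
// target keeps the parent's usage from transiently double-counting the
// bytes. Purely illustrative; this is not Drill's AllocationManager.
public class TransferOrder {
    public static long parentUsed = 0;
    public static final long PARENT_LIMIT = 100;

    static void release(long size) {
        parentUsed -= size;
    }

    // Charge the target; report whether the result still fits the limit.
    static boolean forceAllocate(long size) {
        parentUsed += size;
        return parentUsed <= PARENT_LIMIT;
    }

    public static boolean transfer(long size) {
        release(size);              // credit the source first...
        return forceAllocate(size); // ...then charge the target
    }
}
```

With parentUsed at 80, transfer(80) fits; allocating first would have peaked at 160 and tripped the 100-byte limit. The reviewer's exception question applies to this sketch too: if forceAllocate threw, the source would already have been credited and the bytes accounted nowhere, so real code would need to restore the source's charge on failure.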
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452669#comment-16452669 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184114148 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; try{ - RawFragmentBatch batch; + --- End diff -- Can you clarify your ask? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452483#comment-16452483 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184061116 --- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java --- @@ -20,12 +20,17 @@ import java.io.IOException; import java.net.URL; +import com.google.common.base.Function; --- End diff -- I tested it manually. I considered 2 options: 1. The UDF has an input of the old decimal type. In this case, the function resolver adds a cast from vardecimal to the old decimal type which is used in the UDF. 2. The UDF returns the old decimal type. In this case, Drill casts the result of the UDF to vardecimal. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since the existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira is to describe the majority of them and possible ways of improving the existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452488#comment-16452488 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184062425 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetFixedWidthDictionaryReaders.java --- @@ -248,27 +227,61 @@ protected void readField(long recordsToReadInThisPass) { } } - static class DictionaryDecimal18Reader extends FixedByteAlignedReader { -DictionaryDecimal18Reader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, - ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, Decimal18Vector v, - SchemaElement schemaElement) throws ExecutionSetupException { + static class DictionaryVarDecimalReader extends FixedByteAlignedReader { + +DictionaryVarDecimalReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, +ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, VarDecimalVector v, +SchemaElement schemaElement) throws ExecutionSetupException { super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); } // this method is called by its superclass during a read loop @Override protected void readField(long recordsToReadInThisPass) { + recordsReadInThisIteration = + Math.min(pageReader.currentPageCount - pageReader.valuesRead, + recordsToReadInThisPass - valuesReadInCurrentPass); + + switch (columnDescriptor.getType()) { +case INT32: + if (usingDictionary) { +for (int i = 0; i < recordsReadInThisIteration; i++) { + byte[] bytes = Ints.toByteArray(pageReader.dictionaryValueReader.readInteger()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { +super.readField(recordsToReadInThisPass); + } + break; +case INT64: + if (usingDictionary) { +for (int i = 0; i < 
recordsReadInThisIteration; i++) { + byte[] bytes = Longs.toByteArray(pageReader.dictionaryValueReader.readLong()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { +super.readField(recordsToReadInThisPass); + } + break; + } +} - recordsReadInThisIteration = Math.min(pageReader.currentPageCount -- pageReader.valuesRead, recordsToReadInThisPass - valuesReadInCurrentPass); +/** + * Set the write Index. The next page that gets read might be a page that does not use dictionary encoding + * and we will go into the else condition below. The readField method of the parent class requires the + * writer index to be set correctly. + */ +private void setWriteIndex() { + readLengthInBits = recordsReadInThisIteration * dataTypeLengthInBits; + readLength = (int) Math.ceil(readLengthInBits / 8.0); --- End diff -- This is the number of bits in a byte, but a double value is used to avoid integer division. Thanks for pointing this, replaced by constant. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
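The reader in the diff above relies on Guava's Ints.toByteArray and Longs.toByteArray, which emit big-endian bytes, and rounds bits up to whole bytes via Math.ceil(readLengthInBits / 8.0). Both can be expressed with only the JDK; the helper names below are made up for illustration:

```java
import java.nio.ByteBuffer;

// JDK-only equivalents of the Guava helpers used in the reader above, plus
// the bits-to-bytes rounding done via Math.ceil. ByteBuffer's default byte
// order is big-endian, matching Guava. Helper names are illustrative.
public class BigEndianBytes {
    // Same result as Guava's Ints.toByteArray: 4 bytes, big-endian.
    public static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Same result as Guava's Longs.toByteArray: 8 bytes, big-endian.
    public static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Integer form of (int) Math.ceil(bits / 8.0): round up to whole bytes.
    public static int bitsToBytes(int bits) {
        return (bits + 7) / 8;
    }
}
```

For example, intToBytes(1) yields the bytes [0, 0, 0, 1], and bitsToBytes(9) yields 2, matching the readLength computation discussed in the comment.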
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452476#comment-16452476 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184009513 --- Diff: exec/vector/src/main/codegen/templates/ValueHolders.java --- @@ -32,99 +32,81 @@ /* * This class is generated using freemarker and the ${.template_name} template. */ -public final class ${className} implements ValueHolder{ +public final class ${className} implements ValueHolder { public static final MajorType TYPE = Types.${mode.name?lower_case}(MinorType.${minor.class?upper_case}); -<#if mode.name == "Repeated"> + <#if mode.name == "Repeated"> -/** The first index (inclusive) into the Vector. **/ -public int start; + /** The first index (inclusive) into the Vector. **/ + public int start; -/** The last index (exclusive) into the Vector. **/ -public int end; + /** The last index (exclusive) into the Vector. **/ + public int end; -/** The Vector holding the actual values. **/ -public ${minor.class}Vector vector; + /** The Vector holding the actual values. 
**/ + public ${minor.class}Vector vector; -<#else> -public static final int WIDTH = ${type.width}; + <#else> + public static final int WIDTH = ${type.width}; -<#if mode.name == "Optional">public int isSet; -<#assign fields = minor.fields!type.fields /> -<#list fields as field> -public ${field.type} ${field.name}; - + <#if mode.name == "Optional">public int isSet; + <#assign fields = minor.fields!type.fields /> + <#list fields as field> + public ${field.type} ${field.name}; + -<#if minor.class.startsWith("Decimal")> -public static final int maxPrecision = ${minor.maxPrecisionDigits}; -<#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> -public static final int nDecimalDigits = ${minor.nDecimalDigits}; + <#if minor.class.startsWith("Decimal")> + public static final int maxPrecision = ${minor.maxPrecisionDigits}; + <#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> --- End diff -- Thanks, good idea, marked as deprecated and added a comment to use VarDecimalHolder instead. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452478#comment-16452478 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184012035 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java --- @@ -409,4 +409,30 @@ public void testNLJoinCorrectnessRightMultipleBatches() throws Exception { setSessionOption(ExecConstants.SLICE_TARGET, 10); } } + + @Test + public void testNlJoinWithStringsInCondition() throws Exception { +try { + test(DISABLE_NLJ_SCALAR); + test(DISABLE_JOIN_OPTIMIZATION); + + final String query = --- End diff -- Before my changes, a similar query with a decimal literal in the condition passed because the literal was handled as a double. Since my changes handle decimal literals as decimals, the query uncovered a bug in nested loop join that affects types which use a DrillBuf to store their values.
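The comment above turns on the distinction between typing a literal as DOUBLE versus DECIMAL. A small plain-Java illustration (not Drill code) of why that typing matters for comparisons:

```java
import java.math.BigDecimal;

public class LiteralTyping {
  public static void main(String[] args) {
    // As doubles, 0.1 + 0.2 picks up binary floating-point rounding error...
    System.out.println(0.1 + 0.2 == 0.3); // prints "false"

    // ...while exact decimal arithmetic preserves the literal's written value.
    BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
    System.out.println(sum.compareTo(new BigDecimal("0.3")) == 0); // prints "true"
  }
}
```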
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452473#comment-16452473 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184010194 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestVarlenDecimal.java --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.parquet; + +import org.apache.commons.io.FileUtils; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.test.BaseTestQuery; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Test; + +import java.math.BigDecimal; +import java.nio.file.Paths; + +public class TestVarlenDecimal extends BaseTestQuery { + + private static final String DATAFILE = "cp.`parquet/varlenDecimal.parquet`"; + + @Test + public void testNullCount() throws Exception { +String query = String.format("select count(*) as c from %s where department_id is null", DATAFILE); --- End diff -- Agree, removed `String.format` here and in other places.
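The review asked to drop the explicit String.format call because the test builder's sqlQuery already accepts format arguments. A plain-Java sketch of that varargs-delegation pattern (the sqlQuery helper below is a hypothetical stand-in, not Drill's TestBuilder API):

```java
public class QuerySketch {
  // Hypothetical helper mirroring a builder method that takes a template plus args,
  // so callers never need String.format at the call site.
  static String sqlQuery(String query, Object... args) {
    return String.format(query, args);
  }

  public static void main(String[] args) {
    String q = sqlQuery("select count(*) as c from %s where department_id is null", "dfs.`t`");
    System.out.println(q);
  }
}
```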
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452466#comment-16452466 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184002812 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/RangeExprEvaluator.java --- @@ -219,6 +219,7 @@ private Statistics evalCastFunc(FunctionHolderExpression holderExpr, Statistics return null; // cast func between srcType and destType is NOT allowed. } + // TODO: add decimal support --- End diff -- Thanks, removed.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452484#comment-16452484 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184035111 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestVarDecimalFunctions.java --- @@ -0,0 +1,911 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.fn.impl; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.test.BaseTestQuery; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.math.RoundingMode; + +@Category(SqlFunctionTest.class) +public class TestVarDecimalFunctions extends BaseTestQuery { + + // Tests for math functions + + @Test + public void testDecimalAdd() throws Exception { +String query = +"select\n" + +// checks trimming of scale +"cast('999.92345678912' as DECIMAL(38, 11))\n" + +"+ cast('0.32345678912345678912345678912345678912' as DECIMAL(38, 38)) as s1,\n" + +// sanitary checks +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('123456789123456789123456789123456.789' as DECIMAL(36, 3)) as s2,\n" + +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('0' as DECIMAL(36, 3)) as s3,\n" + +"cast('15.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s4,\n" + +"cast('11.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s5,\n" + +"cast('0' as DECIMAL(36, 2)) - cast('12.93' as DECIMAL(36, 2)) as s6,\n" + +// check trimming (negative scale) +"cast('2345678912' as DECIMAL(38, 0))\n" + +"+ cast('32345678912345678912345678912345678912' as DECIMAL(38, 0)) as s7"; +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() +.sqlQuery(query) +.ordered() +.baselineColumns("s1", "s2", "s3", "s4", "s5", "s6", "s7") +.baselineValues( +new BigDecimal("999.92345678912") +.add(new BigDecimal("0.32345678912345678912345678912345678912")) +.round(new MathContext(38, RoundingMode.HALF_UP)), +new BigDecimal("1358024680358024680358024680358024.679"), +new BigDecimal("1234567891234567891234567891234567.890"), +new BigDecimal("2.09"), new BigDecimal("-1.91"), new BigDecimal("-12.93"), +new BigDecimal("1.3234567891234567891234567890469135782E+38")) +.build() --- End diff -- Thanks, replaced.
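The baseline for s1 above rounds the exact sum down to Drill's maximum precision of 38 with HALF_UP. The same arithmetic in plain Java, shown as a sketch of the baseline computation (not Drill code):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalAddSketch {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("999.92345678912");                              // DECIMAL(38, 11)
    BigDecimal b = new BigDecimal("0.32345678912345678912345678912345678912");     // DECIMAL(38, 38)

    // The exact sum carries 4 integer digits plus 38 fractional digits,
    // i.e. 42 significant digits; trimming keeps it within DECIMAL(38).
    BigDecimal exact = a.add(b);
    BigDecimal trimmed = exact.round(new MathContext(38, RoundingMode.HALF_UP));

    System.out.println(exact.precision());        // prints "42"
    System.out.println(trimmed.precision() <= 38); // prints "true"
  }
}
```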
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452482#comment-16452482 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184051298 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestFixedlenDecimal.java --- @@ -20,61 +20,74 @@ import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.exec.planner.physical.PlannerSettings; -import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; @Category({UnlikelyTest.class}) public class TestFixedlenDecimal extends BaseTestQuery { - // enable decimal data type - @BeforeClass - public static void enableDecimalDataType() throws Exception { -test(String.format("alter session set `%s` = true", PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY)); - } - private static final String DATAFILE = "cp.`parquet/fixedlenDecimal.parquet`"; @Test public void testNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(1L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(1L) + .build() + .run(); +} finally { + resetSessionOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY); +} } @Test public void testNotNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(106L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(106L) + .build() --- End diff -- Thanks, replaced here and in other places.
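The diff above moves the session-option change into a try/finally so the option is reset even when the test body throws. The shape of that pattern in plain Java (setOption/resetOption are stand-ins for Drill's alterSession/resetSessionOption helpers):

```java
import java.util.HashMap;
import java.util.Map;

public class SessionOptionSketch {
  // Stand-ins for Drill's session option helpers.
  static final Map<String, Object> session = new HashMap<>();
  static void setOption(String key, Object value) { session.put(key, value); }
  static void resetOption(String key) { session.remove(key); }

  public static void main(String[] args) {
    try {
      setOption("planner.enable_decimal_data_type", true);
      // ... run the query under test here; it may throw ...
    } finally {
      // Runs whether or not the body threw, so later tests see default options.
      resetOption("planner.enable_decimal_data_type");
    }
    System.out.println(session.isEmpty()); // prints "true"
  }
}
```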
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452475#comment-16452475 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184031285 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/TypeInferenceUtils.java --- @@ -382,13 +407,26 @@ public RelDataType inferReturnType(SqlOperatorBinding opBinding) { final RelDataType operandType = opBinding.getOperandType(0); final TypeProtos.MinorType inputMinorType = getDrillTypeFromCalciteType(operandType); - if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT)) + if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT)) == TypeProtos.MinorType.BIGINT) { return createCalciteTypeWithNullability( factory, SqlTypeName.BIGINT, isNullable); - } else if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT8)) + } else if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT4)) --- End diff -- Thanks, added an explanation for sum and avg return type inferences.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452465#comment-16452465 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183998600 --- Diff: contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java --- @@ -225,10 +247,10 @@ public int next() { int counter = 0; Boolean b = true; try { - while (counter < 4095 && b == true) { // loop at 4095 since nullables use one more than record count and we + while (counter < 4095 && b) { // loop at 4095 since nullables use one more than record count and we // allocate on powers of two. b = resultSet.next(); -if(b == false) { +if(!b) { --- End diff -- Thanks, done.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452487#comment-16452487 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184002128 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/output/DecimalReturnTypeInference.java --- @@ -281,20 +295,45 @@ @Override public TypeProtos.MajorType getType(List logicalExpressions, FunctionAttributes attributes) { int scale = 0; - int precision = 0; // Get the max scale and precision from the inputs for (LogicalExpression e : logicalExpressions) { scale = Math.max(scale, e.getMajorType().getScale()); -precision = Math.max(precision, e.getMajorType().getPrecision()); } - return (TypeProtos.MajorType.newBuilder() - .setMinorType(attributes.getReturnValue().getType().getMinorType()) + return TypeProtos.MajorType.newBuilder() + .setMinorType(TypeProtos.MinorType.VARDECIMAL) .setScale(scale) - .setPrecision(38) - .setMode(TypeProtos.DataMode.REQUIRED) - .build()); + .setPrecision(DRILL_REL_DATATYPE_SYSTEM.getMaxNumericPrecision()) + .setMode(TypeProtos.DataMode.OPTIONAL) + .build(); +} + } + + /** + * Return type calculation implementation for functions with return type set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.ReturnType#DECIMAL_AVG_AGGREGATE}. + */ + public static class DecimalAvgAggReturnTypeInference implements ReturnTypeInference { --- End diff -- Thanks, added. 
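The return-type inference quoted above takes the maximum scale across the inputs and pins the precision at the system maximum. A hedged plain-Java sketch of that rule (MAX_PRECISION and the method name are illustrative, not Drill's API):

```java
public class ScaleInference {
  // Drill's DRILL_REL_DATATYPE_SYSTEM caps numeric precision at 38.
  static final int MAX_PRECISION = 38;

  // Returns {precision, scale} for an aggregate over inputs with the given scales:
  // the widest fractional part wins, and precision is set to the maximum.
  static int[] inferSumType(int... inputScales) {
    int scale = 0;
    for (int s : inputScales) {
      scale = Math.max(scale, s);
    }
    return new int[] {MAX_PRECISION, scale};
  }

  public static void main(String[] args) {
    int[] t = inferSumType(2, 3, 0);
    System.out.println(t[0] + "," + t[1]); // prints "38,3"
  }
}
```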
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452486#comment-16452486 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184099659 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -559,6 +560,19 @@ public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) if (matchNullability) { return makeAbstractCast(type, exp); } + // for the case when BigDecimal literal has a scale or precision + // that differs from the value from specified RelDataType, cast cannot be removed + // TODO: remove this code when CALCITE-1468 is fixed + if (type.getSqlTypeName() == SqlTypeName.DECIMAL && exp instanceof RexLiteral) { --- End diff -- Created DRILL-6355.
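The guard above keeps a cast in place when a BigDecimal literal's scale or precision differs from the target DECIMAL type. In plain Java terms (an illustrative sketch of the condition, not Calcite's or Drill's API):

```java
import java.math.BigDecimal;

public class LiteralCastSketch {
  // A cast to DECIMAL(p, s) can only be dropped if the literal already has
  // exactly that scale and fits in that precision; otherwise the cast must rescale.
  static boolean castRemovable(BigDecimal literal, int targetPrecision, int targetScale) {
    return literal.scale() == targetScale && literal.precision() <= targetPrecision;
  }

  public static void main(String[] args) {
    BigDecimal lit = new BigDecimal("10.5");       // precision 3, scale 1
    System.out.println(castRemovable(lit, 38, 1)); // prints "true"
    System.out.println(castRemovable(lit, 38, 2)); // prints "false": the cast must rescale to 10.50
  }
}
```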
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452485#comment-16452485 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184065631 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/FrameSupportTemplate.java --- @@ -300,7 +300,7 @@ public void cleanup() { * @param index of row to aggregate */ public abstract void evaluatePeer(@Named("index") int index); - public abstract void setupEvaluatePeer(@Named("incoming") VectorAccessible incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException; + public abstract void setupEvaluatePeer(@Named("incoming") WindowDataBatch incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException; --- End diff -- At an earlier stage of these changes, a compilation error appeared in the runtime-generated code without this change. It no longer reproduces, so I reverted it.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452469#comment-16452469 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184008881 --- Diff: exec/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java --- @@ -75,12 +75,19 @@ public void endList() { <#list vv.types as type><#list type.minor as minor><#assign name = minor.class?cap_first /> <#assign fields = minor.fields!type.fields /> - <#if !minor.class?starts_with("Decimal") > + <#if minor.class?contains("VarDecimal")> + @Override + public void write${minor.class}(BigDecimal value) { +getWriter(MinorType.${name?upper_case}).write${minor.class}(value); + } + + @Override public void write(${name}Holder holder) { getWriter(MinorType.${name?upper_case}).write(holder); } + <#if !minor.class?contains("Decimal") || minor.class?contains("VarDecimal")> --- End diff -- Thanks, added.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452468#comment-16452468 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184008595 --- Diff: exec/vector/src/main/codegen/templates/AbstractFieldReader.java --- @@ -29,9 +29,9 @@ * This class is generated using freemarker and the ${.template_name} template. */ @SuppressWarnings("unused") -abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader{ +abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader { - AbstractFieldReader(){ + AbstractFieldReader() { --- End diff -- Done.