[jira] [Updated] (DRILL-6359) All-text mode in JSON still reads missing column as Nullable Int
[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-6359:
----------------------------------
    Affects Version/s: 1.14.0

> All-text mode in JSON still reads missing column as Nullable Int
> ----------------------------------------------------------------
>
>                 Key: DRILL-6359
>                 URL: https://issues.apache.org/jira/browse/DRILL-6359
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Suppose we have the following file:
> {noformat}
> {a: 0}
> {a: 1}
> ...
> {a: 70001, b: 10.5}
> {noformat}
> Where the "..." indicates another 70K records. (Chosen to force the
> appearance of {{b}} into a second or later batch.)
> Suppose we execute the following query:
> {code}
> ALTER SESSION SET `store.json.all_text_mode` = true;
> SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
> {code}
> The query should work. We have an explicit project for column {{b}} and we've
> told JSON to always use text. So, JSON should have enough information to
> create column {{b}} as {{Nullable VarChar}}.
> Yet, the result of the query in {{sqlline}} is:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (VARCHAR:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> The expected result is that the query works: even missing columns should be
> subject to the "all text mode" setting, because the JSON reader handles
> projection push-down and is responsible for filling in the missing columns.
> This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in
> the "batch size handling" JSON reader rewrite, but I've not tested it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
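The reproduction file described above is easy to generate with a short script. This is a sketch, not part of Drill; the filename `70Kmissing.json` and the row counts are taken from the report, which chose 70K+ rows so that column "b" first appears in a second or later scan batch.

```python
import json

# Sketch: build the JSON file from the report. Rows 0..70000 carry only "a";
# the final row adds "b", forcing "b" to first appear in a later batch.
with open("70Kmissing.json", "w") as f:
    for i in range(70001):
        f.write(json.dumps({"a": i}) + "\n")
    f.write(json.dumps({"a": 70001, "b": 10.5}) + "\n")
```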
[jira] [Comment Edited] (DRILL-6359) All-text mode in JSON still reads missing column as Nullable Int
[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453573#comment-16453573 ]

Khurram Faraaz edited comment on DRILL-6359 at 4/26/18 6:40 AM:
----------------------------------------------------------------

[~paul-rogers] On Apache Drill 1.14.0-SNAPSHOT, commit id: 931b43ec54bf47dcbb4aa9ae4499f37a8f21b408, we see the same error message:
{noformat}
[root@qa102-45 ~]# cd /opt/mapr/drill/drill-1.14.0/bin
[root@qa102-45 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit="
apache drill 1.14.0-SNAPSHOT
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:schema=dfs.tmp> SELECT a, b FROM `generated.json` WHERE b IS NOT NULL ORDER BY a;
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
Fragment 0:0

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a on qa102-45.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}
The data looks like this; there are over 70K rows in the JSON data file, generated.json:
{noformat}
{ "a": "5ae16f29fb675a7ed96bb532" }
{ "a": "5ae16f29fb675a7ed96bb532" }
...
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f29972377bc859056c8","b":"10.5"}
{noformat}
Stack trace from drillbit.log for the above failure:
{noformat}
2018-04-25 23:32:08,980 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] WARN  o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `b`, returning null instance.
2018-04-25 23:32:09,050 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] INFO  o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: Schema changes not supported in External Sort. Please enable Union type.
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.setupSchema(ExternalSortBatch.java:459) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:410) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:354) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:299) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:80) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.jav
[jira] [Commented] (DRILL-4824) Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.
[ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453560#comment-16453560 ]

Paul Rogers commented on DRILL-4824:
------------------------------------

This ticket has a long history of complexity. Just discovered another one. It appears that the current null handling has been optimized to make results appear nicely in {{sqlline}}. Consider this simple file:
{noformat}
{a: {b: 10}}
{a: {c: "foo"}}
{noformat}
According to our existing rules, the missing columns are stored as null values. Then, JSON omits nulls from its output. Why? So, it seems, {{sqlline}} can display the following:
{noformat}
+-------------+
|      a      |
+-------------+
| {"b":10}    |
| {"c":"foo"} |
+-------------+
{noformat}
In JDBC, the {{getObject()}} method on the Map vector creates a JSON object. That code probably omits null values. Why? So that when {{sqlline}} calls {{toString()}} on the JSON object, it gets the nice display above. This is probably not how {{sqlline}} should format its output: our JSON internals should not be dictated by how we do {{toString()}} in {{sqlline}}. But, there you have it anyway.

> Null maps / lists and non-provided state support for JSON fields. Numeric
> types promotion.
> --------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman Kulyk
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>
> There is incorrect output in case of a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>   "Field1" : {
>   }
> }
> {
>   "Field1" : {
>     "InnerField1": {"key1":"value1"},
>     "InnerField2": {"key2":"value2"}
>   }
> }
> {
>   "Field1" : {
>     "InnerField3" : ["value3", "value4"],
>     "InnerField4" : ["value5", "value6"]
>   }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +--------+
> | Field1 |
> +--------+
> | {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]} |
> | {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +--------+
> {code}
> There is no need to output missing fields. In case of a deeply nested
> structure we would get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--------+
> | Field1 |
> +--------+
> | {} |
> | {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}} |
> | {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]} |
> +--------+
> {code}
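The display rule requested above (drop never-set inner fields instead of materializing them as empty maps and lists) can be illustrated outside Drill. This is a sketch of the desired output transformation, not Drill code; it takes the three Field1 values from the "incorrect result" and prunes them into the "correct result":

```python
import json

def prune_empty(value):
    """Recursively drop empty dicts/lists so never-set inner fields
    are not rendered, mimicking the 'correct result' in the report."""
    if isinstance(value, dict):
        pruned = {k: prune_empty(v) for k, v in value.items()}
        return {k: v for k, v in pruned.items() if v != {} and v != []}
    return value

# The three Field1 values from the incorrect result above:
materialized = [
    {"InnerField1": {}, "InnerField2": {}, "InnerField3": [], "InnerField4": []},
    {"InnerField1": {"key1": "value1"}, "InnerField2": {"key2": "value2"},
     "InnerField3": [], "InnerField4": []},
    {"InnerField1": {}, "InnerField2": {},
     "InnerField3": ["value3", "value4"], "InnerField4": ["value5", "value6"]},
]
for v in materialized:
    print(json.dumps(prune_empty(v), separators=(",", ":")))
```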
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453544#comment-16453544 ]

ASF GitHub Bot commented on DRILL-6242:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184

    One additional note. We noted that JDBC does not support the idea of a nested tuple (a Drill "map"). JDBC does support columns that return a Java object. To bridge the gap, Drill returns a Map column as a Java object. But, why a JSON object? The answer seems to lie with the `sqlline` program. If we query a one-line JSON file with a nested object in `sqlline`, we get the following display:
    ```
    SELECT * FROM `json/nested.json`;
    +--------+---------------------------------+
    | custId |              name               |
    +--------+---------------------------------+
    | 101    | {"first":"John","last":"Smith"} |
    +--------+---------------------------------+
    ```
    So, it seems likely that the value of a Map object was translated to a JSON object so that when `sqlline` calls `toString()` on it, it ends up formatted nicely as above. Because of this, it may be hard to change the kind of objects returned from JDBC for a Map column.

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Assignee: Jiang Wu
>            Priority: Major
>             Fix For: 1.14.0
>
> Some storages (mapr db, mongo db, etc.) have hierarchical objects that
> contain nested fields of date, time, timestamp types. When a query returns
> these objects, the output format for the nested date, time, timestamp values is
> the internal object (org.joda.time.DateTime), rather than the logical
> data value.
> For example, suppose in MongoDB we have a single object that looks like
> this:
> {code:java}
> > db.test.findOne();
> {
>     "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>     "name" : "orange",
>     "context" : {
>         "date" : ISODate("2018-03-13T21:52:54.940Z"),
>         "user" : "jack"
>     }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t;
> +------------+---------+
> |   EXPR$0   | context |
> +------------+---------+
> | 2018-03-13 | {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} |
> {code}
> We can see from the above output that when the date field is retrieved as a
> top-level column, Drill outputs a logical date value. But when the same
> field is within an object hierarchy, Drill outputs the internal object used
> to hold the date value.
> The expected output is the same display whether the date field is shown
> as a top-level column or within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t;
> +------------+-------------------------------------+
> |   EXPR$0   |               context               |
> +------------+-------------------------------------+
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}
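The expected fix amounts to serializing the logical value of a nested date rather than dumping the holder object's internal fields. A minimal sketch of that idea (in Python, not Drill's Java serializer, using a custom JSON encoder):

```python
import json
from datetime import datetime, timezone

class DateAwareEncoder(json.JSONEncoder):
    # Serialize datetime values as their logical date string rather than
    # letting a generic object dump expose internal fields, which is the
    # analogue of the behavior requested in the report.
    def default(self, o):
        if isinstance(o, datetime):
            return o.date().isoformat()
        return super().default(o)

# The nested "context" value from the report's example:
context = {"date": datetime(2018, 3, 13, 21, 52, 54, tzinfo=timezone.utc),
           "user": "jack"}
print(json.dumps(context, cls=DateAwareEncoder, separators=(",", ":")))
# {"date":"2018-03-13","user":"jack"}
```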
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453501#comment-16453501 ]

ASF GitHub Bot commented on DRILL-6242:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184

    Sorry, coming late. There seem to be two problems.

    The original "nested column" issue is an artifact of the JDBC driver. In Drill, a Map (the thing that contains your nested column) is just a nested tuple. But JDBC does not have the idea of a nested field; there is no way to ask for, say, "myMap.ts". All you can ask for is "myMap" if it is a Map. The Drill JDBC driver has to invent a value to return. It invents a Java map. For JDBC, which does not support nested fields, you can project your field up to the top level just by naming it in the select clause:
    ```
    SELECT `context`.`date` as `context_date` ...
    ```
    The second problem that this PR seems to address is how dates are stored. Many tests have been changed to double-down on Drill's original sin: that generic dates (2015-04-15, say) are represented as a timestamp in UTC. But if April 15 is your birthday, it is your birthday in all time zones. We don't say your birthday (or order date, or newspaper issue date, or...) is one day in, say, London and another day in Los Angeles. Drill should have a "Date" type that is not associated with a time zone. But we don't, so we get tied up in the "treat the local time zone as if it were UTC" issue. The original set of Drill types did have types to handle these, but they didn't quite make it into the final version. Also, this issue has been discussed (with some vigor) on the mailing list once or twice.

    One flaw, which you seem to have spotted, is that if I read data on a server in one TZ, then display it on a client in another TZ, the dates and times are all messed up. The server reads dates in local TZ, then simply stores the ms value in a date. The client thinks that date is UTC and adjusts it to its own local TZ. All heck breaks loose. (Or something like that; the original test case was over a year ago and I may have messed up the details...)

    The ideal set of date/time types:

    * Date: a date in an unspecified time zone, such as your birthday.
    * Time: a time relative to midnight of an (unknown) Date, with no association with a TZ. For example, "we have lunch at 11:30 AM" applies regardless of TZ.
    * Timestamp: an absolute time relative to UTC.

    Then, functions can convert back and forth. Joda (and, in Java 8, the new date/time classes) work this way. This means that the many tests that were modified to turn a generic date into a timestamp are simply making the problem worse: we are saying that 2015-04-15 is midnight, April 15 in London, which is highly unexpected.

    Bottom line: we've got two very difficult issues here: how to handle maps in JDBC and how to fix Drill's date/time types.

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Assignee: Jiang Wu
>            Priority: Major
>             Fix For: 1.14.0
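The three date/time notions described in the comment above can be demonstrated concretely. This is a Python sketch (Python's `datetime` module happens to model the same split the comment asks for); it also shows the "birthday moved" surprise that results from encoding a plain date as a UTC timestamp:

```python
from datetime import date, time, datetime, timezone, timedelta

# The three distinct notions described above:
birthday = date(2015, 4, 15)      # Date: no time zone, April 15 everywhere
lunch = time(11, 30)              # Time: wall-clock time, zone-free
instant = datetime(2015, 4, 15, tzinfo=timezone.utc)  # Timestamp: UTC instant

# Encoding the date as "midnight April 15 UTC" and then viewing it from
# another zone changes the calendar day: exactly the surprise described above.
la = timezone(timedelta(hours=-8))
print(instant.astimezone(la).date())  # 2015-04-14: the date appears to move
print(birthday)                       # 2015-04-15: a true Date never moves
```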
[jira] [Commented] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453478#comment-16453478 ]

ASF GitHub Bot commented on DRILL-5927:
---------------------------------------

Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1234

    @parthchandra Please let me know if you have any comments.

> Root allocator consistently Leaks a buffer in unit tests
> --------------------------------------------------------
>
>                 Key: DRILL-5927
>                 URL: https://issues.apache.org/jira/browse/DRILL-5927
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
> TestBsonRecordReader consistently produces this exception when running on my
> laptop:
> {code}
> 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while closing
> java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding buffers allocated (1).
> Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 1
>     ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] holds 1 buffers.
>         DrillBuf[106], udle: [72 0..1024]
>   reservations: 0
>     at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) ~[classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [classes/:na]
>     at org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) ~[classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) [classes/:na]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) [classes/:na]
>     at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) [classes/:na]
>     at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) [test-classes/:na]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_144]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_144]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) [junit-4.11.jar:na]
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.11.jar:na]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) [junit-4.11.jar:na]
>     at mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) [jmockit-1.3.jar:na]
>     at mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) [jmockit-1.3.jar:na]
>     at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
>     at mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) [jmockit-1.3.jar:na]
>     at mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) [jmockit-1.3.jar:na]
>     at mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) [jmockit-1.3.jar:na]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) [junit-4.11.jar:na]
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) [junit-4.11.jar:na]
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:na]
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:160) [junit-4.11.jar:na]
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) [junit-rt.jar:na]
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
> {code}
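The failure above comes from an allocator that verifies, at close time, that every buffer it handed out has been released. A toy analogue of that check (a sketch; the class and method names are hypothetical, not Drill's BaseAllocator API):

```python
class LeakCheckingAllocator:
    """Toy analogue of a close-time leak check like the one behind
    'Allocator[ROOT] closed with outstanding buffers allocated (1)'."""

    def __init__(self, name="ROOT"):
        self.name = name
        self.outstanding = []   # stands in for the allocator's ledger

    def buffer(self, size):
        buf = bytearray(size)   # stands in for a DrillBuf
        self.outstanding.append(buf)
        return buf

    def release(self, buf):
        self.outstanding.remove(buf)

    def close(self):
        if self.outstanding:
            raise RuntimeError(
                "Allocator[%s] closed with outstanding buffers allocated (%d)."
                % (self.name, len(self.outstanding)))

alloc = LeakCheckingAllocator()
buf = alloc.buffer(1024)
try:
    alloc.close()      # buffer never released: close reports the leak
except RuntimeError as e:
    print(e)
alloc.release(buf)
alloc.close()          # clean close once everything is released
```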
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453462#comment-16453462 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184264961

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --

    I see. In this case, however, arrays (repeated values) will be empty. If we have 10 such rows, there is no reason to have 50 "inner" values. Also, for VarChar, no values will be stored; all columns will be null. (If we are handling non-null columns, then the non-null VarChar will be an empty string.)

    So, we probably need a bit of a special case: prepare data for a run of null rows (with arrays and VarChar of length 0) vs. take our best guess with no knowledge at all about lengths (which may be non-empty).

    Probably not a huge issue if you only need to handle a single row. But creating a batch with only one row will cause all kinds of performance issues downstream. (I found that out the hard way when a bug in sort produced a series of one-row batches...)

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation
> correctly for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the
> outgoing batch.
> Say the first batch is empty. Then, for outgoing, we allocate empty vectors
> with zero capacity. When we read the next batch with data, we will end up
> going through the realloc loop. If we use the right side row width as 0 in
> the outgoing row width calculation, the number of rows we calculate will be
> higher, and later when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on std size for an
> empty input batch. Use the allocation width as the width of the batch in the
> number-of-rows calculation.
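The workaround proposed in the issue can be sketched as a simple fallback rule. This is an illustrative Python sketch, not Drill's RecordBatchSizer; the 50-byte standard width and the function names are assumptions for the example:

```python
STD_ROW_WIDTH_BYTES = 50  # assumed stand-in for a per-type standard size

def row_alloc_width(measured_row_width):
    """Workaround described above: an empty incoming batch measures a row
    width of zero, so fall back to a standard width. That gives outgoing
    vectors non-zero capacity (avoiding the realloc loop) and keeps the
    row-count estimate from being inflated."""
    return measured_row_width if measured_row_width > 0 else STD_ROW_WIDTH_BYTES

def rows_per_batch(memory_budget_bytes, measured_row_width):
    # Width 0 would imply an unbounded row count; the fallback bounds it.
    return memory_budget_bytes // row_alloc_width(measured_row_width)

print(rows_per_batch(16 * 1024 * 1024, 0))    # empty batch: standard width
print(rows_per_batch(16 * 1024 * 1024, 200))  # non-empty: measured width
```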
[jira] [Created] (DRILL-6358) Null value returned from WHERE a IS NOT NULL query
Paul Rogers created DRILL-6358: -- Summary: Null value returned from WHERE a IS NOT NULL query Key: DRILL-6358 URL: https://issues.apache.org/jira/browse/DRILL-6358 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Paul Rogers Consider the following input file: {noformat} {a: null} {a: 10} {noformat} Then run the following. The result is as expected: {noformat} SELECT * FROM `json/null-int.json` WHERE a IS NOT NULL; +-+ | a | +-+ | 10 | +-+ {noformat} Now, do something similar. Create a file that repeats the first record 70,000 times. Call it {{70Knulls.json}}. Run the same query. The results are bizarre: {noformat} SELECT * FROM `gen/70Knulls.json` WHERE a IS NOT NULL; +---+ | ** | +---+ | null | +---+ {noformat} The column name of "\*\*" is probably due to DRILL-6357: the query may be treating {{a}} as a missing column and filling in the bogus "\*\*" field. Under that theory there is no column {{a}}, so no rows should have matched. But then column {{a}} finally appeared. Why does it have a null value, not 10? And why did the null value pass the {{IS NOT NULL}} test? In any case, I expected the same output as the two-line case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
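For anyone reproducing this, the 70,000-record input file can be generated with a few lines of script. The records are "relaxed" JSON with unquoted keys, matching the report, so they are written as literal text rather than via a JSON serializer; the helper name is ours, not from the report.

```python
# Generate a file of repeated relaxed-JSON records, e.g. 70,000 copies of
# {a: null} as in the DRILL-6358 report. Written as plain text because the
# records use unquoted keys.

def write_null_records(path, count=70000, record="{a: null}"):
    with open(path, "w") as f:
        for _ in range(count):
            f.write(record + "\n")

# write_null_records("70Knulls.json") produces the file queried above
```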
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453409#comment-16453409 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184258865 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) { */ private int netRowWidth; private int netRowWidthCap50; + + /** + * actual row size if input is not empty. Otherwise, standard size. + */ + private int rowAllocSize; --- End diff -- actually, this is a problem only for lateral join. In lateral join, right side will work on one row at a time from left side. If right side produces zero rows because of empty array or some other reason, for left outer join, we still have to finish cross join for that row from left side by having nulls for right side columns. Then, we go to next row on left side, continuing to work on filling the output batch till it is full. What that means is we have to allocate vectors based on that first batch on right side, which can be empty. > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. 
When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6357) Unexpected column "**" when reading a JSON file
Paul Rogers created DRILL-6357: -- Summary: Unexpected column "**" when reading a JSON file Key: DRILL-6357 URL: https://issues.apache.org/jira/browse/DRILL-6357 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Paul Rogers Consider the following JSON file, {{all-null.json}}: {noformat} {a: null} {a: null} {noformat} Do a simple query: {code} SELECT * FROM `json/all-null.json`; {code} The column name is unexpectedly reported as "**": {noformat} +---+ | ** | +---+ | null | | null | +---+ {noformat} However, change the second value to "foo" and the results are as expected: {noformat} +---+ | a | +---+ | null | | foo | +---+ {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6281) Refactor TimedRunnable
[ https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453307#comment-16453307 ] ASF GitHub Bot commented on DRILL-6281: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1238 I had the same question as Arina about the need for this change and I got info from DRILL-5908 about the problem cause and how this change will help fix it. Can you elucidate? > Refactor TimedRunnable > -- > > Key: DRILL-6281 > URL: https://issues.apache.org/jira/browse/DRILL-6281 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453305#comment-16453305 ] ASF GitHub Bot commented on DRILL-6307: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184244877 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) { */ private int netRowWidth; private int netRowWidthCap50; + + /** + * actual row size if input is not empty. Otherwise, standard size. + */ + private int rowAllocSize; --- End diff -- Unless I'm missing something, we can't move forward on a join if one side is empty: we won't know if we have the rows we need. Consider a merge join (simplest). The left gets some data, but the right is empty. We can't proceed unless the right hit EOF. Otherwise, we don't know if we have a match or not for the first left row. We need to read another right batch and keep going until we either hit EOF (no matching rows) or get some data. Once we have some data, we can go row-by-row to see if we have a left-only, right-only, or matching set of rows. If we get to EOF on either side, we know that there are no matches for the other side. What we do in the no-match case depends on whether we are doing LEFT OUTER, RIGHT OUTER or an INNER join. The point is, we can't make progress until we get that non-empty right batch (in this example). So, no reason to allocate space based on an empty batch (unless the entire input is empty) because we'll need to find a non-empty (or EOF) batch anyway. 
> Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
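The loop described in the comment above — keep reading right-side batches past empty ones until data or EOF arrives, then null-fill unmatched rows for LEFT OUTER — can be sketched as a toy merge join. This is illustrative only (sorted unique keys, in-memory lists), not Drill's merge-join operator.

```python
# Toy sketch of the reasoning above: a join cannot decide matches until the
# right side yields data or EOF, so empty batches are skipped, and LEFT
# OUTER rows with no match are null-filled. Not Drill's implementation.

def next_non_empty(batches):
    """Advance past empty batches; return the next batch with data, or None at EOF."""
    for batch in batches:
        if batch:
            return batch
    return None


def left_outer_merge(left, right_batches):
    """LEFT OUTER merge join of sorted (key, value) rows with unique keys."""
    it = iter(right_batches)
    right = []
    while True:
        batch = next_non_empty(it)   # empty batches tell us nothing; keep reading
        if batch is None:            # EOF: remaining left rows have no match
            break
        right.extend(batch)
    out, j = [], 0
    for lk, lv in left:
        while j < len(right) and right[j][0] < lk:
            j += 1
        if j < len(right) and right[j][0] == lk:
            out.append((lk, lv, right[j][1]))
        else:
            out.append((lk, lv, None))  # no match: null-fill the right columns
    return out
```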
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453304#comment-16453304 ] ASF GitHub Bot commented on DRILL-6307: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184244479 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -277,18 +286,29 @@ public boolean isRepeatedList() { /** * This is the average per entry width, used for vector allocation. */ -public int getEntryWidth() { +private int getEntryWidthForAlloc() { int width = 0; if (isVariableWidth) { -width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH; +width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH; // Subtract out the bits (is-set) vector width -if (metadata.getDataMode() == DataMode.OPTIONAL) { +if (isOptional) { width -= BIT_VECTOR_WIDTH; } + +if (isRepeated && getValueCount() == 0) { + return (safeDivide(width, STD_REPETITION_FACTOR)); +} } - return (safeDivide(width, cardinality)); + return (safeDivide(width, getEntryCardinalityForAlloc())); +} + +/** + * This is the average per entry cardinality, used for vector allocation. + */ +private float getEntryCardinalityForAlloc() { + return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) :getCardinality(); --- End diff -- Makes sense, but why would a batch be empty unless that path hit EOF? Or, the batch might be due to an empty input file. We'd just skip it and move to the next batch until we find one with data. Any reason the "get next batch" code can't just loop to be "get next non-empty batch" instead? Otherwise, we can't really do any effective batch sizing as we have no data to go on... 
> Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
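The fallback in the `getEntryCardinalityForAlloc()` diff above, restated as a sketch: when there is no observed cardinality (for example, an empty batch), assume a standard repetition factor for repeated columns and 1 otherwise. The value 10 below is an assumption for illustration; Drill defines its own `STD_REPETITION_FACTOR` constant.

```python
# Illustrative restatement of the cardinality fallback shown in the diff.
# STD_REPETITION_FACTOR = 10 is assumed here, not taken from Drill's source.

STD_REPETITION_FACTOR = 10.0


def entry_cardinality_for_alloc(cardinality, is_repeated):
    """Average per-entry cardinality to use when allocating vectors."""
    if cardinality == 0:
        # No data observed: guess a standard repetition for arrays, 1 otherwise.
        return STD_REPETITION_FACTOR if is_repeated else 1
    return cardinality
```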
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453295#comment-16453295 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1189 LGTM > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453292#comment-16453292 ] ASF GitHub Bot commented on DRILL-6242: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r184243856 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Hmm. That takes us back to the original problem, that of the date|time|timestamp field inside a complex object. ``` select t.context.date, t.context from test t; will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2. This doesn't seem like a good thing. ``` Why should that be a bad thing though? Ultimately, the object returned by getObject() is displayed to the end user thru the toString method. The string representation of Local[Date|Time|Timestamp] should be the same as that of java.sql.[Date|Time|Timestamp]. Isn't it? > Output format for nested date, time, timestamp values in an object hierarchy > > > Key: DRILL-6242 > URL: https://issues.apache.org/jira/browse/DRILL-6242 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.12.0 >Reporter: Jiang Wu >Assignee: Jiang Wu >Priority: Major > Fix For: 1.14.0 > > > Some storages (mapr db, mongo db, etc.) have hierarchical objects that > contain nested fields of date, time, timestamp types. 
When a query returns > these objects, the output format for the nested date, time, timestamp, are > showing the internal object (org.joda.time.DateTime), rather than the logical > data value. > For example. Suppose in MongoDB, we have a single object that looks like > this: > {code:java} > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { > "date" : ISODate("2018-03-13T21:52:54.940Z"), > "user" : "jack" > } > } > {code} > Then connect Drill to the above MongoDB storage, and run the following query > within Drill: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | > {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} > | > {code} > We can see that from the above output, when the date field is retrieved as a > top level column, Drill outputs a logical date value. But when the same > field is within an object hierarchy, Drill outputs the internal object used > to hold the date value. > The expected output is the same display for whether the date field is shown > as a top level column or when it is within an object hierarchy: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | {"date":"2018-03-13","user":"jack"} | > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
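For reference, the verbose joda output above is just a wrapper around its "millis" field: 1520977974940 is a UTC epoch-millisecond timestamp, and the expected display is simply its ISO rendering. A quick check of the value from the example (in Python, for illustration):

```python
# Confirm that the "millis" value from the joda dump corresponds to the
# ISODate("2018-03-13T21:52:54.940Z") shown in the MongoDB example.
# Integer division drops the 940 ms fraction for simplicity.

from datetime import datetime, timezone

millis = 1520977974940  # "millis" field from the output above
ts = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
date_str = ts.date().isoformat()   # the logical date value Drill should display
```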
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453288#comment-16453288 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184243564 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- It is not a best practice to rely on a transitive dependency for a compile scope dependency. I'd recommend specifying the dependency on `io.dropwizard.metrics` explicitly in case `java-exec` or `drill-memory-base` uses it. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453280#comment-16453280 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r184242257 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- So what do you tell the user when you get a runtime exception (any exception) that is the result of a bug? It is silly to show them a stack trace that does not help them. It is better to let the user know that there was an internal error and they should log a bug with support or look for help on the dev list. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6356) batch sizing for union all
Padma Penumarthy created DRILL-6356: --- Summary: batch sizing for union all Key: DRILL-6356 URL: https://issues.apache.org/jira/browse/DRILL-6356 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.13.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.14.0 batch sizing changes for union all operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453245#comment-16453245 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184236170 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -277,18 +286,29 @@ public boolean isRepeatedList() { /** * This is the average per entry width, used for vector allocation. */ -public int getEntryWidth() { +private int getEntryWidthForAlloc() { int width = 0; if (isVariableWidth) { -width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH; +width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH; // Subtract out the bits (is-set) vector width -if (metadata.getDataMode() == DataMode.OPTIONAL) { +if (isOptional) { width -= BIT_VECTOR_WIDTH; } + +if (isRepeated && getValueCount() == 0) { + return (safeDivide(width, STD_REPETITION_FACTOR)); --- End diff -- You are right. row count can be non-zero and valueCount can still be zero. Since this is intended for empty batch case, changed this to check for row count instead. > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. 
When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453237#comment-16453237 ] ASF GitHub Bot commented on DRILL-6307: --- Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1228 @paul-rogers Paul, I addressed code review comments. Can you take a look when you get a chance ? > Handle empty batches in record batch sizer correctly > > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-3855) Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule
[ https://issues.apache.org/jira/browse/DRILL-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-3855: --- Reviewer: Aman Sinha (was: Volodymyr Vysotskyi) > Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule > --- > > Key: DRILL-3855 > URL: https://issues.apache.org/jira/browse/DRILL-3855 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.13.0 >Reporter: Sean Hsuan-Yi Chu >Assignee: Vitalii Diravka >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Because of the infinite planning issue (Calcite Volcano Planner: CALCITE-900) > reported in DRILL-3257, FilterSetOpTransposeRule and > DrillProjectSetOpTransposeRule were disabled. Once it is resolved in > Calcite, these two rules should be enabled to improve performance. > In addition, the plan validation in unit tests will be updated in response to the > newly enabled rules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-3855) Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule
[ https://issues.apache.org/jira/browse/DRILL-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-3855: --- Labels: ready-to-commit (was: ) > Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule > --- > > Key: DRILL-3855 > URL: https://issues.apache.org/jira/browse/DRILL-3855 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.13.0 >Reporter: Sean Hsuan-Yi Chu >Assignee: Vitalii Diravka >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Because of the infinite planning issue (Calcite Volcano Planner: CALCITE-900) > reported in DRILL-3257, FilterSetOpTransposeRule and > DrillProjectSetOpTransposeRule were disabled. Once it is resolved in > Calcite, these two rules should be enabled to improve performance. > In addition, the plan validation in unit tests will be updated in response to the > newly enabled rules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453194#comment-16453194 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184218887 --- Diff: pom.xml --- @@ -1333,6 +1353,12 @@ + --- End diff -- I meant that it can be useful if some new dependency brings in the older transitive `com.yammer.metrics` in the future. But I do not mind, this isn't so important. Let's leave it without dependencyManagement control. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453195#comment-16453195 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184223843 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- No, it is. Thanks for catching this. Also I have noticed that `java-exec` used `io.dropwizard.metrics` as transitive from `drill-common`. For consistency I have removed `io.dropwizard.metrics` from `drill-memory-base` (it can leverage metrics from `drill-common` too). > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Directly Drill uses only 2. The 1 and 3 ones are used as transitive > dependencies only. > The 2 and 3 ones have the same class full identifiers and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2 one is an outdated, but the 3 one is still under developing and > updating. > Therefore the decision is: > * to replace com.codahale.metrics with last _io.dropwizard.metrics_ Metrics > for Drill, > * to remove _com.yammer.metrics_ dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6236) batch sizing for hash join
[ https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453168#comment-16453168 ] ASF GitHub Bot commented on DRILL-6236: --- Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1227 @Ben-Zvi I manually added the PR link to the JIRA. all code review comments are addressed. can you look at the latest changes ? > batch sizing for hash join > -- > > Key: DRILL-6236 > URL: https://issues.apache.org/jira/browse/DRILL-6236 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > limit output batch size for hash join based on memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6236) batch sizing for hash join
[ https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453150#comment-16453150 ] Padma Penumarthy commented on DRILL-6236: - [https://github.com/apache/drill/pull/1227] > batch sizing for hash join > -- > > Key: DRILL-6236 > URL: https://issues.apache.org/jira/browse/DRILL-6236 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > limit output batch size for hash join based on memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453122#comment-16453122 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184200281

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -277,18 +286,29 @@ public boolean isRepeatedList() {
         /**
          * This is the average per entry width, used for vector allocation.
          */
    -    public int getEntryWidth() {
    +    private int getEntryWidthForAlloc() {
           int width = 0;
           if (isVariableWidth) {
    -        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +        width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH;

             // Subtract out the bits (is-set) vector width
    -        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +        if (isOptional) {
               width -= BIT_VECTOR_WIDTH;
             }
    +
    +        if (isRepeated && getValueCount() == 0) {
    +          return (safeDivide(width, STD_REPETITION_FACTOR));
    +        }
           }

    -      return (safeDivide(width, cardinality));
    +      return (safeDivide(width, getEntryCardinalityForAlloc()));
    +    }
    +
    +    /**
    +     * This is the average per entry cardinality, used for vector allocation.
    +     */
    +    private float getEntryCardinalityForAlloc() {
    +      return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) : getCardinality();
    --- End diff --

    This is for joins. We allocate vectors based on the first batch's sizing
    information, and if that first batch is empty, we are allocating vectors with
    zero capacity. When we read the next batch with data, we will end up going
    through a realloc loop as we write values. For example, for a left outer join,
    if the right side batch is empty, we still have to include the right side
    columns as null in the outgoing batch. With the new lateral join operator, we
    also see the problem if the input has an empty array as the first record in
    the unnest column.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
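The cardinality fallback in the diff above can be condensed into a small standalone sketch. This is a simplified mirror of the logic under discussion, not the actual RecordBatchSizer class; the class and method names here are illustrative, and the factor of 10 follows the snippet above.

```java
// Sketch: fall back to a standard repetition factor when the batch is empty,
// so per-entry allocation is never divided by a zero cardinality.
public class EntryCardinalitySketch {
  static final float STD_REPETITION_FACTOR = 10f; // value taken from the diff above

  static float entryCardinalityForAlloc(float cardinality, boolean isRepeated) {
    // Empty batch: assume the standard repetition factor for repeated (array)
    // columns, and one entry per row for scalar columns.
    return cardinality == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) : cardinality;
  }

  public static void main(String[] args) {
    System.out.println(entryCardinalityForAlloc(0, true));  // empty, repeated -> 10.0
    System.out.println(entryCardinalityForAlloc(0, false)); // empty, scalar   -> 1.0
    System.out.println(entryCardinalityForAlloc(4, true));  // non-empty       -> 4.0
  }
}
```

With this fallback, an empty first batch still produces a non-zero allocation estimate, avoiding the zero-capacity vectors described in the issue.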
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453123#comment-16453123 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184202508

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * Actual row size if input is not empty; otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --

    This is not just a problem with size estimation for vector memory allocation.
    Let us say one side of a join receives an empty batch as its first batch. If
    we use a row width of 0 in the outgoing row width calculation, the number of
    rows to include in the outgoing batch will be calculated too high, and later,
    when we get a non-empty batch, we might exceed the memory limits.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
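The memory-limit concern in the discussion above can be illustrated with hypothetical numbers. This is a toy row-count calculation, not Drill's actual sizing code; the 1 MiB batch budget and the row widths are made up for illustration.

```java
// Sketch: how a zero row width from an empty first batch inflates the planned
// row count for an outgoing join batch.
public class RowCountSketch {
  // Rows that fit in the memory budget given per-side row widths (bytes).
  static int rowsThatFit(long memoryLimit, int leftRowWidth, int rightRowWidth) {
    return (int) (memoryLimit / (leftRowWidth + rightRowWidth));
  }

  public static void main(String[] args) {
    long limit = 1 << 20; // assumed 1 MiB output batch budget

    // Right side's first batch was empty, so its width was measured as 0:
    // we plan for 26214 rows.
    System.out.println(rowsThatFit(limit, 40, 0));

    // A later non-empty right batch reveals 60-byte rows: only 10485 rows
    // actually fit, so the earlier plan would overshoot the memory limit.
    System.out.println(rowsThatFit(limit, 40, 60));
  }
}
```

Using a standard (non-zero) width for empty inputs, as the JIRA proposes, keeps the planned row count conservative.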
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453106#comment-16453106 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1237

    @vrozov, I have implemented the provided suggestions.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453069#comment-16453069 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184199682

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
             throw UserException.resourceError(e)
    -          .addContext("Failed to close the Drill file system for " + getName())
    -          .build(logger);
    +            .addContext("Failed to close the Drill file system for " + getName())
    +            .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }

       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

    I'm not suggesting we use the same fs for each split, but the opposite. The
    fs object used per split/row group should be different so that we get the
    right fs wait time for every minor fragment. But this change allows more than
    one fs object per operator context, which we were explicitly preventing
    earlier. I'm not sure I understand why you needed to change that.

> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed reading Hive parquet files using the Drill native parquet reader
> but did not expose Hive data to Drill optimizations, for example filter push
> down, limit push down, and count-to-direct-scan optimizations.
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to such optimizations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
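The close() loop in the diff above raises a subtle question: throwing inside the loop leaves later file systems unclosed. A common alternative is to close everything and remember only the first failure. The sketch below illustrates that variation with generic `Closeable`s; the names are illustrative and this is not Drill's actual API or the behavior the diff actually implements.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class MultiCloseSketch {
  /** Close every resource; remember the first failure instead of aborting on it. */
  static IOException closeAll(List<? extends Closeable> resources) {
    IOException first = null;
    for (Closeable c : resources) {
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // keep the first error, keep closing the rest
        }
      }
    }
    return first; // caller rethrows (or wraps) if non-null
  }

  public static void main(String[] args) throws IOException {
    int[] closed = {0};
    Closeable ok = () -> closed[0]++;
    Closeable bad = () -> { throw new IOException("close failed"); };

    // The failing resource does not prevent the remaining ones from closing.
    IOException first = closeAll(Arrays.asList(ok, bad, ok));
    System.out.println(closed[0]);          // 2
    System.out.println(first.getMessage()); // close failed
  }
}
```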
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453056#comment-16453056 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184197379

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java ---
    @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException {
       public IterOutcome next() {
         batchLoader.resetRecordCount();
         stats.startProcessing();
    +
    +    RawFragmentBatch batch = null;
         try {
    -      RawFragmentBatch batch;
    +
    --- End diff --

    Will do. Thanks for the suggestion.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453053#comment-16453053 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184197278

    --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java ---
    @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) {
           target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name);
         }

    -    boolean overlimit = target.allocator.forceAllocate(size);
    +    // Release first to handle the case where the current and target allocators were part of the same
    +    // parent / child tree.
         allocator.releaseBytes(size);
    +    boolean allocationFit = target.allocator.forceAllocate(size);
    --- End diff --

    What about debugging: an operation fails, and yet there is a change of
    ownership? @vrozov, I'd rather not make this change as I strongly believe
    the change of ownership should happen only on success.

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
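The release-before-allocate ordering debated above can be modeled with a toy shared-limit accountant. This is not Drill's AllocationManager; the class, fields, and numbers are illustrative, and "force allocate" here means the bytes are charged even when they exceed the limit, with only a fit flag reporting the overrun.

```java
public class TransferOrderSketch {
  // Toy accounting against one shared parent limit.
  static long parentUsed = 0;
  static final long PARENT_LIMIT = 100;

  static boolean allocate(long size) {
    parentUsed += size;
    return parentUsed <= PARENT_LIMIT; // fit flag; allocation is forced either way
  }

  static void release(long size) {
    parentUsed -= size;
  }

  // Release on the source first, then allocate on the target: when both ledgers
  // roll up to the same parent, the buffer is never double-counted.
  static boolean transferReleaseFirst(long size) {
    release(size);
    return allocate(size);
  }

  // Allocate-first briefly charges the parent twice for the same bytes.
  static boolean transferAllocateFirst(long size) {
    boolean fit = allocate(size);
    release(size);
    return fit;
  }

  public static void main(String[] args) {
    parentUsed = 80; // source already owns 80 bytes of a 100-byte limit
    System.out.println(transferReleaseFirst(80));  // true: usage never exceeds 100

    parentUsed = 80;
    System.out.println(transferAllocateFirst(80)); // false: transient 160 > 100
  }
}
```

This illustrates the trade-off in the review thread: release-first avoids spurious over-limit reports for intra-tree transfers, while allocate-first keeps ownership changes contingent on success.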
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453041#comment-16453041 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184195748

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java ---
    @@ -77,4 +83,46 @@ public long getByteCount() {
       public boolean isAckSent() {
         return ackSent.get();
       }
    +
    +  /**
    +   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
    +   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
    +   *
    +   * NOTES -
    +   * - This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not
    +   *   the owning allocator or b) the target allocator is already the owner
    +   * - When a transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
    +   *   DrillBuf reference count accounting
    +   * - The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
    +   *   calls should be routed to the previous DrillBuf
    +   *
    +   * @param targetAllocator target allocator
    +   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership
    +   *         has been switched to the target allocator); otherwise this operation is a NOOP (the
    +   *         current instance is returned)
    +   */
    +  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
    +    if (body == null) {
    +      return this; // NOOP
    +    }
    +
    +    if (!body.getLedger().isOwningLedger()
    +        || body.getLedger().isOwner(targetAllocator)) {
    +
    +      return this;
    +    }
    +
    +    int writerIndex = body.writerIndex();
    +    TransferResult transferResult = body.transferOwnership(targetAllocator);
    --- End diff --

    - After a call to getNext(), the operator code checks whether the transfer has
      succeeded
    - So that an OOM condition is reported within the fragment
    - When the change of ownership happens from parent to child, there should not
      be any OOM condition since this is internal accounting

    For code clarity, I can improve the javadoc to ask the caller to check for an
    OOM condition (similarly to the BaseAllocator behavior).

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-6282:
-----------------------------------
    Description:
There are three types of metrics in Drill:
1. _com.yammer.metrics_
2. _com.codahale.metrics_
3. _io.dropwizard.metrics_
Drill uses only (2) directly; (1) and (3) are pulled in as transitive dependencies only.
(2) and (3) have the same fully qualified class names, so Maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
(2) is outdated, while (3) is still being developed and updated.
Therefore the decision is:
* to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
* to remove the _com.yammer.metrics_ dependencies.

  was:
There are three types of metrics in Drill:
1. _com.yammer.metrics_
2. _com.codahale.metrics_
3. _io.dropwizard.metrics_
Drill uses only (2) directly; (1) and (3) are pulled in as transitive dependencies only.
(2) and (3) have the same fully qualified class names, so Maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
(2) is outdated, while (3) is still being developed and updated.
Therefore the decision is:
* to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
* to control the _com.yammer.metrics_ dependencies under the dependency management block.

> Update Drill's Metrics dependencies
> -----------------------------------
>
>                 Key: DRILL-6282
>                 URL: https://issues.apache.org/jira/browse/DRILL-6282
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Tools, Build & Test
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 1.14.0
>
> There are three types of metrics in Drill:
> 1. _com.yammer.metrics_
> 2. _com.codahale.metrics_
> 3. _io.dropwizard.metrics_
> Drill uses only (2) directly; (1) and (3) are pulled in as transitive
> dependencies only.
> (2) and (3) have the same fully qualified class names, so Maven doesn't know
> which library to use ([https://github.com/dropwizard/metrics/issues/1044]).
> (2) is outdated, while (3) is still being developed and updated.
> Therefore the decision is:
> * to replace _com.codahale.metrics_ with the latest _io.dropwizard.metrics_ in Drill,
> * to remove the _com.yammer.metrics_ dependencies.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6342) Parquet filter pushdown doesn't work in case of filtering fields inside arrays of complex fields
[ https://issues.apache.org/jira/browse/DRILL-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453028#comment-16453028 ]

ASF GitHub Bot commented on DRILL-6342:
---------------------------------------

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1231

    +1. LGTM

> Parquet filter pushdown doesn't work in case of filtering fields inside
> arrays of complex fields
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6342
>                 URL: https://issues.apache.org/jira/browse/DRILL-6342
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Anton Gozhiy
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>         Attachments: Complex_data.tar.gz
>
> *Data:*
> The Complex_data data set is attached.
> *Query:*
> {code:sql}
> explain plan for select * from dfs.tmp.`Complex_data` t where
> t.list_of_complex_fields[2].nested_field is true
> {code}
> *Expected result:*
> numFiles=2
> Statistics of the file that shouldn't be scanned:
> {noformat}
> list_of_complex_fields:
> .nested_field: BOOLEAN UNCOMPRESSED DO:0 FPO:497
> SZ:41/41/1.00 VC:3 ENC:PLAIN,RLE ST:[min: false, max: false, num_nulls: 0]
> {noformat}
> *Actual result:*
> numFiles=3
> I.e., filter pushdown does not work.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453023#comment-16453023 ]

ASF GitHub Bot commented on DRILL-6327:
---------------------------------------

Github user sohami commented on the issue:

    https://github.com/apache/drill/pull/1240

    @parthchandra - please help review this PR.

> Update unary operators to handle IterOutcome.EMIT
> -------------------------------------------------
>
>                 Key: DRILL-6327
>                 URL: https://issues.apache.org/jira/browse/DRILL-6327
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Parth Chandra
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>
> IterOutcome.EMIT is a new state introduced by the Lateral Join
> implementation. All operators need to be updated to handle it.
> This JIRA is to track the subtask of updating the unary operators (derived
> from AbstractSingleRecordBatch).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia reassigned DRILL-6327: Assignee: Sorabh Hamirwasia > Update unary operators to handle IterOutcome.EMIT > - > > Key: DRILL-6327 > URL: https://issues.apache.org/jira/browse/DRILL-6327 > Project: Apache Drill > Issue Type: Task >Reporter: Parth Chandra >Assignee: Sorabh Hamirwasia >Priority: Major > > IterOutcome.EMIT is a new state introduced by the Lateral Join > implementation. All operators need to be updated to handle it. > This Jira is to track the subtask of updating the unary operators (derived > from AbstractSingleRecordBatch). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-6327: - Reviewer: Parth Chandra > Update unary operators to handle IterOutcome.EMIT > - > > Key: DRILL-6327 > URL: https://issues.apache.org/jira/browse/DRILL-6327 > Project: Apache Drill > Issue Type: Task >Reporter: Parth Chandra >Assignee: Sorabh Hamirwasia >Priority: Major > > IterOutcome.EMIT is a new state introduced by the Lateral Join > implementation. All operators need to be updated to handle it. > This Jira is to track the subtask of updating the unary operators (derived > from AbstractSingleRecordBatch). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT
[ https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453022#comment-16453022 ]

ASF GitHub Bot commented on DRILL-6327:
---------------------------------------

GitHub user sohami opened a pull request:

    https://github.com/apache/drill/pull/1240

    DRILL-6327: Update unary operators to handle IterOutcome.EMIT

    Note: Handles non-blocking unary operators (like Filter/Project/etc.) with
    the EMIT IterOutcome

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sohami/drill DRILL-6327

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1240.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #1240

----
commit 74b9ead99cf1f92713ab9b81f23f680bd1a9cfff
Author: Sorabh Hamirwasia
Date:   2018-04-05T00:54:58Z

    DRILL-6327: Update unary operators to handle IterOutcome.EMIT

    Note: Handles non-blocking unary operators (like Filter/Project/etc.) with
    the EMIT IterOutcome

> Update unary operators to handle IterOutcome.EMIT
> -------------------------------------------------
>
>                 Key: DRILL-6327
>                 URL: https://issues.apache.org/jira/browse/DRILL-6327
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Parth Chandra
>            Priority: Major
>
> IterOutcome.EMIT is a new state introduced by the Lateral Join
> implementation. All operators need to be updated to handle it.
> This JIRA is to track the subtask of updating the unary operators (derived
> from AbstractSingleRecordBatch).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453016#comment-16453016 ]

ASF GitHub Bot commented on DRILL-6348:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1237#discussion_r184192630

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java ---
    @@ -77,4 +83,46 @@ public long getByteCount() {
       public boolean isAckSent() {
         return ackSent.get();
       }
    +
    +  /**
    +   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
    +   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
    +   *
    +   * NOTES -
    +   * - This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not
    +   *   the owning allocator or b) the target allocator is already the owner
    +   * - When a transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
    +   *   DrillBuf reference count accounting
    +   * - The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
    +   *   calls should be routed to the previous DrillBuf
    +   *
    +   * @param targetAllocator target allocator
    +   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership
    +   *         has been switched to the target allocator); otherwise this operation is a NOOP (the
    +   *         current instance is returned)
    +   */
    +  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
    +    if (body == null) {
    +      return this; // NOOP
    +    }
    +
    +    if (!body.getLedger().isOwningLedger()
    +        || body.getLedger().isOwner(targetAllocator)) {
    +
    +      return this;
    +    }
    +
    +    int writerIndex = body.writerIndex();
    +    TransferResult transferResult = body.transferOwnership(targetAllocator);
    +
    +    // Set the index and increment reference count
    +    transferResult.buffer.writerIndex(writerIndex);
    +
    +    // Clear the current DrillBuf since the caller will perform release() on the new one
    +    body.release();
    +
    +    return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false);
    --- End diff --

    Look at IncomingBuffers.batchArrived:
    - It holds a reference to the RawFragmentBatch
    - It holds a lock on the parent collector for insertion
    - Though, the collector's enqueue method itself is not synchronized on the
      "this" object
    - Instead, it relies on the synchronization provided as part of the
      RawBatchBuffer queue
    - This means a getNext() call will be able to dequeue a RawFragmentBatch
      instance
    - Concurrently, IncomingBuffers.batchArrived will invoke release on the
      cached reference

> Unordered Receiver does not report its memory usage
> ---------------------------------------------------
>
>                 Key: DRILL-6348
>                 URL: https://issues.apache.org/jira/browse/DRILL-6348
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Execution - Flow
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
> The Drill profile functionality doesn't show any memory usage for the
> Unordered Receiver operator. This is problematic when analyzing OOM
> conditions since we cannot account for all of a query's memory usage. This
> JIRA is to fix memory reporting for the Unordered Receiver operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly
[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453013#comment-16453013 ]

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184192443

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -50,7 +50,7 @@ public class RecordBatchSizer {
       private static final int OFFSET_VECTOR_WIDTH = UInt4Vector.VALUE_WIDTH;
       private static final int BIT_VECTOR_WIDTH = UInt1Vector.VALUE_WIDTH;
    -  private static final int STD_REPETITION_FACTOR = 10;
    +  public static final int STD_REPETITION_FACTOR = 10;
    --- End diff --

    Done. Using 5 in both places now.

> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
> When we get an empty batch, the record batch sizer calculates the row width
> as zero. In that case, we do not do accounting and memory allocation correctly
> for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is
> empty, we still have to include the right side columns as null in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we will end
> up going through a realloc loop. If we use a right side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory
> limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch. Use the allocation width as the width of the batch
> in the row count calculation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452986#comment-16452986 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184188427

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
             throw UserException.resourceError(e)
    -          .addContext("Failed to close the Drill file system for " + getName())
    -          .build(logger);
    +            .addContext("Failed to close the Drill file system for " + getName())
    +            .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }

       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

    @parthchandra this is definitely a good question. I did it this way because
    in the previous code a new fs was created for each Hive table split [1]. The
    projection pusher is used to define the fs for each split; it resolves the
    path for table partitions. Frankly speaking, it worked fine for me without it
    (all tests passed), but the same approach is used in the Hive code, and
    apparently it was used in Drill for the same reasons. To be safe, I have done
    the same. If you think we can use the same fs for each row group in Hive,
    then I can adjust the changes.

    [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java#L112

> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed reading Hive parquet files using the Drill native parquet reader
> but did not expose Hive data to Drill optimizations, for example filter push
> down, limit push down, and count-to-direct-scan optimizations.
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to such optimizations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452961#comment-16452961 ] ASF GitHub Bot commented on DRILL-6272: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1225 @vrozov re-implemented using maven embedder. Had to upgrade the jmockit lib to the latest version since it caused an NPE with the maven embedder (NPE from jmockit even if no mocks are used; fixed in the newer version). Please review. > Remove binary jars files from source distribution > - > > Key: DRILL-6272 > URL: https://issues.apache.org/jira/browse/DRILL-6272 > Project: Apache Drill > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.14.0 > > > Per [~vrozov] the source distribution contains binary jar files under > exec/java-exec/src/test/resources/jars -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Description: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to replace com.codahale.metrics with the latest _io.dropwizard.metrics_ Metrics for Drill, * to control the _com.yammer.metrics_ Metrics under a dependency management block. was: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to use only one _io.dropwizard.metrics_ Metrics for Drill, * to remove _com.codahale.metrics_ metrics, * to control the _com.yammer.metrics_ Metrics under a dependency management block. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill directly uses only the 2nd; the 1st and 3rd are used as transitive > dependencies only. 
> The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2nd one is outdated, but the 3rd one is still actively developed and > updated. > Therefore the decision is: > * to replace com.codahale.metrics with the latest _io.dropwizard.metrics_ Metrics > for Drill, > * to control the _com.yammer.metrics_ Metrics under a dependency management block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
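The dependency-management approach described in this issue (pin a single maintained metrics implementation and keep the legacy coordinates under explicit control) might look like the following pom.xml fragment. The version numbers here are illustrative, not taken from Drill's actual pom:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin the maintained implementation (replaces com.codahale.metrics). -->
    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>4.0.2</version>
    </dependency>
    <!-- Keep the legacy transitive dependency under explicit control. -->
    <dependency>
      <groupId>com.yammer.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A `dependencyManagement` block only fixes the version chosen when a dependency is pulled in (directly or transitively); it does not itself add the artifact to the build, which is why it suits the "control the transitive metrics" goal stated above.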
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Description: There are three types of metrics in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill directly uses only the 2nd; the 1st and 3rd are used as transitive dependencies only. The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). The 2nd one is outdated, but the 3rd one is still actively developed and updated. Therefore the decision is: * to use only one _io.dropwizard.metrics_ Metrics for Drill, * to remove _com.codahale.metrics_ metrics, * to control the _com.yammer.metrics_ Metrics under a dependency management block. was: There are three types of metrics-core in Drill: 1. _com.yammer.metrics_, 2. _com.codahale.metrics_, 3. _io.dropwizard.metrics_ Drill uses only the 1st and 2nd. The 3rd one is used by Hive. The 1st one has different fully qualified class names, but the 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know which library to use ([https://github.com/dropwizard/metrics/issues/1044]). But I found that the 3rd library is used by Hive only for tests, therefore it is not required for Drill and could be excluded from hive-metastore and hive-exec. The dependency conflict is related not only to metrics-core, but to metrics-servlets and metrics-json as well. All these metrics should be organized with proper exclusions and dependency management blocks. > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics in Drill: > 1. _com.yammer.metrics_, > 2. 
_com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill directly uses only the 2nd; the 1st and 3rd are used as transitive > dependencies only. > The 2nd and 3rd ones have the same fully qualified class names, and maven doesn't know > which library to use ([https://github.com/dropwizard/metrics/issues/1044]). > The 2nd one is outdated, but the 3rd one is still actively developed and > updated. > Therefore the decision is: > * to use only one _io.dropwizard.metrics_ Metrics for Drill, > * to remove _com.codahale.metrics_ metrics, > * to control the _com.yammer.metrics_ Metrics under a dependency management block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6282) Update Drill's Metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-6282: --- Summary: Update Drill's Metrics dependencies (was: Excluding io.dropwizard.metrics dependencies) > Update Drill's Metrics dependencies > --- > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the 1st and 2nd. The 3rd one is used by Hive. > The 1st one has different fully qualified class names, but the 2nd and 3rd ones have the > same fully qualified class names, and maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the 3rd library is used by Hive only for tests, therefore it > is not required for Drill and could be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusions and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452883#comment-16452883 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 One other thing to highlight from an earlier comment. CPU is something that the user specifies in the DoY config file. That information is passed to YARN in container requests. This feature asks the user to modify the drill-env.sh file to enable cgroups to limit CPU. As a result, the user must specify the CPU limit in two places. DoY went to extreme lengths to unify memory configuration so it is set in one place. The assumption was that, since Apache YARN handles cgroups, we've also got unified CPU specification. But, in this "side-car" approach we don't. So, it would be good to capture the CPU amount from the YARN config as explained earlier and use that to set the Drill cgroups env vars -- but only if some "self-enforcing cgroup" flag is enabled in the config (which can be done by default for the limited MapR YARN.) > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452877#comment-16452877 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 @kkhatua, putting on my Apache hat... Apache Drill is an Apache project that must work with other Apache projects such as Apache YARN. The Apache Drill DoY support is designed to work well with Apache YARN (and has a few special additions for MapR YARN's unique limitations.) It is important that the Apache DoY work well with the generic YARN. No harm in adding tweaks (such as this one) to work with vendor-specific limitations. But, the overriding concern is that the DoY feature be useful in Apache. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452872#comment-16452872 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184169114 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- Under YARN, it is YARN that maintains the pid, not Drill. YARN expects its child processes to run in the foreground and will handle capturing the pid. This is a case in which "native" Apache YARN works differently than "MapR YARN." Since Apache YARN handles cgroups, it is the one that needs the pid. Under MapR's limited YARN, Drill is second-guessing YARN and needs the pid. It may be that MapR's YARN can handle a background process; I don't recall the details. Is there a way to run Drill in the background, get the pid, then return to the foreground so that the script does not exit until Drill itself exits? In fact, if I remember correctly, the scripts have two layers; in one layer, the script replaces itself with the Drill process. Something to check. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
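The script behavior under discussion in the comment above (briefly background the process so its pid can be captured, then `wait` on it so the wrapper stays alive exactly as long as the child) can be sketched with a stand-in command in place of `"$DRILL_HOME/bin/runbit" exec`:

```shell
#!/usr/bin/env bash
# Launch the child in the background so its pid is available to the script
# (e.g. for writing into a cgroup tasks file), then block on it with `wait`
# so the script itself does not exit before the child does.
sleep 1 &                 # stand-in for: "$DRILL_HOME/bin/runbit" exec &
child_pid=$!
echo "captured pid: $child_pid"
# ... the pid could be registered with a cgroup here ...
wait "$child_pid"         # returns only when the child exits
echo "child exit status: $?"
```

This addresses Paul's question directly: the `&` plus `wait` combination gets the pid while still making the wrapper's lifetime track the child's, though it is not identical to `exec`-style replacement, where the script's pid becomes the drillbit's pid.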
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452816#comment-16452816 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184159218 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; try{ - RawFragmentBatch batch; + --- End diff -- create a function `getNextNotEmptyBatch()` that calls `getNextBatch()` and either returns a non-empty batch or `null`. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
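The helper vrozov suggests, one that keeps calling `getNextBatch()` until it sees a non-empty batch or the stream ends, can be sketched with a plain queue standing in for the incoming batch stream. All class names here are illustrative, not Drill's:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class NotEmptyBatchDemo {
    // Stand-in batch stream: each int is a batch's record count;
    // an empty queue means the upstream is exhausted.
    private final Queue<Integer> incoming;

    public NotEmptyBatchDemo(Queue<Integer> incoming) {
        this.incoming = incoming;
    }

    // Stand-in for UnorderedReceiverBatch.getNextBatch(): null when exhausted.
    private Integer getNextBatch() {
        return incoming.poll();
    }

    // The suggested helper: loop past empty batches and hand back
    // either a non-empty batch or null.
    public Integer getNextNotEmptyBatch() {
        Integer batch;
        while ((batch = getNextBatch()) != null && batch == 0) {
            // empty batch: skip it (the real code would release it here)
        }
        return batch;
    }

    public static void main(String[] args) {
        NotEmptyBatchDemo demo =
            new NotEmptyBatchDemo(new ArrayDeque<>(Arrays.asList(0, 0, 3, 0, 7)));
        System.out.println(demo.getNextNotEmptyBatch()); // 3
        System.out.println(demo.getNextNotEmptyBatch()); // 7
        System.out.println(demo.getNextNotEmptyBatch()); // null
    }
}
```

Factoring the skip-empty loop into one method keeps `next()` free of the loop-and-retry bookkeeping that the review thread is trying to simplify.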
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452807#comment-16452807 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184156922 --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java --- @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) { target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name); } -boolean overlimit = target.allocator.forceAllocate(size); +// Release first to handle the case where the current and target allocators were part of the same +// parent / child tree. allocator.releaseBytes(size); +boolean allocationFit = target.allocator.forceAllocate(size); --- End diff -- If this happens, isn't there a problem that the old allocator has already released the memory? In any case, won't a runtime exception cancel the query anyway, so that all allocators will be closed? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
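The ordering change in `transferBalance`, releasing from the source allocator before force-allocating on the target, matters when both allocators roll up into the same parent: charging the target first would transiently double-count the transferred bytes against the shared limit. A toy model of that effect (this is not Drill's allocator API; the classes and fields are invented for illustration):

```java
public class TransferOrderDemo {
    // Toy allocator: a used-bytes counter checked against a limit, with an
    // optional shared parent that is charged alongside the child.
    static class Alloc {
        final Alloc parent;
        final long limit;
        long used;

        Alloc(Alloc parent, long limit) {
            this.parent = parent;
            this.limit = limit;
        }

        void release(long n) {
            used -= n;
            if (parent != null) parent.release(n);
        }

        // Forced allocation: the bytes are recorded either way;
        // the return value reports whether they fit within the limit.
        boolean forceAllocate(long n) {
            used += n;
            if (parent != null) parent.forceAllocate(n);
            return used <= limit;
        }
    }

    public static void main(String[] args) {
        // Parent limit exactly equals the bytes being moved between children.
        Alloc parent = new Alloc(null, 64);
        Alloc from = new Alloc(parent, 64);
        Alloc to = new Alloc(parent, 64);
        from.forceAllocate(64);

        // Release first, then charge the target: the parent never sees 128.
        from.release(64);
        boolean fit = to.forceAllocate(64);
        System.out.println("fit=" + fit + " parentUsed=" + parent.used);
        // Charging `to` before releasing `from` would have pushed the
        // parent to 128, transiently past its 64-byte limit.
    }
}
```

This is the scenario the diff's comment names ("current and target allocators were part of the same parent / child tree"); the review question of what happens when the forced allocation still does not fit is outside this sketch.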
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452797#comment-16452797 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184155724 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); --- End diff -- But what if it goes over the limit? 
> Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452793#comment-16452793 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1239 @paul-rogers DoY is no longer a MapR-only feature, and if it helps to have Drill self-enforce, this works. If YARN is able to enforce for Drill, the user need not specify the settings in their `drill-env.sh`. I see https://issues.apache.org/jira/browse/YARN-810 as being an open issue. (Source: [Issue List](https://issues.apache.org/jira/issues/?jql=project%20%3D%20YARN%20AND%20status%20in%20(Open%2C%20"Patch%20Available")%20AND%20text%20~%20"cgroup"%20ORDER%20BY%20key%20%2C%20status%20ASC) ). So, I probably don't need to do any explicit checks for hadoop distros, and documentation should be sufficient to address this, IMO. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452791#comment-16452791 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184154997 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); + +// Set the index and increment reference count +transferResult.buffer.writerIndex(writerIndex); + +// Clear the current Drillbuffer since caller will perform release() on 
the new one +body.release(); + +return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false); --- End diff -- I don't see where RawFragmentBatch is cached. Isn't it removed from a queue using poll()? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452774#comment-16452774 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184153040 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- The process is momentarily in the background to capture the PID. We eventually wait for it. Are you saying that a process will not continue to run because it is in the background? > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452763#comment-16452763 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184151186 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- @vrozov My bad, the highlight was on the "batch = null" code; I guess you meant why we can't move the whole release logic within the finally block. I agree with your proposal, as delaying the release phase a bit doesn't hurt us in this case. I will make the change. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
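The shape the reviewers converge on above, keeping the batch in a variable the `finally` clause can see, releasing it there, and nulling it out so the buffer is freed exactly once on both the normal and error paths, reduces to a small sketch. This is not the actual `UnorderedReceiverBatch` code; the `Batch` class and `next` method are invented to show only the release pattern:

```java
public class FinallyReleaseDemo {
    static int releaseCount = 0;

    // Stand-in for RawFragmentBatch with a release() that must run exactly once.
    static class Batch {
        void release() { releaseCount++; }
    }

    // Process one batch; whether we return normally or the load step throws,
    // the finally clause guarantees a single release.
    static String next(boolean loadFails) {
        Batch batch = null;
        try {
            batch = new Batch();
            if (loadFails) {
                throw new IllegalStateException("load failed");
            }
            return "OK";
        } catch (IllegalStateException e) {
            return "STOP";
        } finally {
            if (batch != null) {
                batch.release();
                batch = null; // guard against a second release
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(next(false) + " " + next(true)
            + " releases=" + releaseCount);
    }
}
```

The point of the `batch != null` check plus the null-out is that any early release on a success path can simply set the variable to null, and the `finally` clause then becomes a no-op instead of a double free.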
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452758#comment-16452758 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184150526 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; --- End diff -- It is not obvious that `body` is `null` when the `if` condition is met. Additionally, what if the release logic changes in the future? It is bad practice to rely on implementation details. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5927: - Labels: ready-to-commit (was: ) > Root allocator consistently Leaks a buffer in unit tests > > > Key: DRILL-5927 > URL: https://issues.apache.org/jira/browse/DRILL-5927 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > TestBsonRecordReader consistently produces this exception when running on my > laptop > {code} > 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while > closing > java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding > buffers allocated (1). > Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit) > child allocators: 0 > ledgers: 1 > ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, > life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] > holds 1 buffers. 
> DrillBuf[106], udle: [72 0..1024] > reservations: 0 > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) > [classes/:na] > at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) > [test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_144] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_144] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > [junit-4.11.jar:na] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.11.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > [junit-4.11.jar:na] > at > mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) > [jmockit-1.3.jar:na] > at > mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) > [jmockit-1.3.jar:na] > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at 
java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) > [jmockit-1.3.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) > [junit-4.11.jar:na] > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > [junit-4.11.jar:na] > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > [junit-4.11.jar:na] > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > [junit-4.11.jar:na] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-5927) Root allocator consistently Leaks a buffer in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5927: - Reviewer: Vlad Rozov (was: Parth Chandra) > Root allocator consistently Leaks a buffer in unit tests > > > Key: DRILL-5927 > URL: https://issues.apache.org/jira/browse/DRILL-5927 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > TestBsonRecordReader consistently produces this exception when running on my > laptop > {code} > 13:09:15.777 [main] ERROR o.a.d.exec.server.BootStrapContext - Error while > closing > java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding > buffers allocated (1). > Allocator(ROOT) 0/1024/10113536/4294967296 (res/actual/peak/limit) > child allocators: 0 > ledgers: 1 > ledger[79] allocator: ROOT), isOwning: true, size: 1024, references: 1, > life: 340912804170064..0, allocatorManager: [71, life: 340912803759189..0] > holds 1 buffers. 
> DrillBuf[106], udle: [72 0..1024] > reservations: 0 > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:502) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:256) > ~[classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) > [classes/:na] > at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) > [classes/:na] > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:205) > [classes/:na] > at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:315) > [test-classes/:na] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.8.0_144] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[na:1.8.0_144] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > [junit-4.11.jar:na] > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.11.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > [junit-4.11.jar:na] > at > mockit.integration.junit4.internal.JUnit4TestRunnerDecorator.invokeExplosively(JUnit4TestRunnerDecorator.java:44) > [jmockit-1.3.jar:na] > at > mockit.integration.junit4.internal.MockFrameworkMethod.invokeExplosively(MockFrameworkMethod.java:29) > [jmockit-1.3.jar:na] > at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.8.0_144] > at 
java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144] > at > mockit.internal.util.MethodReflection.invokeWithCheckedThrows(MethodReflection.java:95) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.callMock(MockMethodBridge.java:76) > [jmockit-1.3.jar:na] > at > mockit.internal.annotations.MockMethodBridge.invoke(MockMethodBridge.java:41) > [jmockit-1.3.jar:na] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java) > [junit-4.11.jar:na] > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > [junit-4.11.jar:na] > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > [junit-4.11.jar:na] > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > [junit-4.11.jar:na] > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > [junit-rt.jar:na] > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > [junit-rt.jar:na] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452730#comment-16452730 ] ASF GitHub Bot commented on DRILL-6348: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184146733 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- Why can't it be done in finally? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
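The review comment above asks why the batch release cannot be done in a finally block. A minimal sketch of that pattern, using a hypothetical `RawBatch` stand-in rather than Drill's actual `RawFragmentBatch`/`UnorderedReceiverBatch` classes: the finally clause guarantees the buffer is released on both the normal and the exceptional path of the load step.

```java
// Hypothetical sketch of releasing a batch in finally, as the reviewer
// suggests. RawBatch is a stand-in class, not Drill's actual API.
class RawBatch {
    boolean released = false;
    void release() { released = true; }
}

public class ReleaseInFinally {
    // Processes a batch; releases it on every exit path.
    static boolean load(RawBatch batch, boolean failLoad) {
        try {
            if (failLoad) {
                throw new IllegalStateException("simulated load failure");
            }
            return true;
        } finally {
            batch.release(); // runs whether load succeeded or threw
        }
    }

    public static void main(String[] args) {
        RawBatch ok = new RawBatch();
        load(ok, false);
        System.out.println("released after success: " + ok.released);

        RawBatch bad = new RawBatch();
        try {
            load(bad, true);
        } catch (IllegalStateException e) {
            // expected: the batch must still have been released
        }
        System.out.println("released after failure: " + bad.released);
    }
}
```

Either way the buffer count returns to zero; the finally form just removes the duplicated release/null-out on each early-return path.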
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452716#comment-16452716 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184107121 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); --- End diff -- Please consider `setSessionOption`. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
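The baseline values in the test above follow directly from the special-case behaviour of `java.lang.Math.log10`. A standalone sketch of those semantics (plain JDK behaviour, no Drill classes involved):

```java
// Edge cases of Math.log10 mirrored by the test's baseline values.
public class Log10EdgeCases {
    public static void main(String[] args) {
        System.out.println(Math.log10(Double.POSITIVE_INFINITY)); // Infinity
        System.out.println(Math.log10(Double.NEGATIVE_INFINITY)); // NaN (log of a negative)
        System.out.println(Math.log10(Double.NaN));               // NaN
        System.out.println(Math.log10(0.0));                      // -Infinity
        System.out.println(Math.log10(1.0));                      // 0.0
        System.out.println(Math.log10(1.5));   // ~0.17609125905568124
        System.out.println(Math.log10(-1.5));  // NaN
        System.out.println(Math.log10(10.0));  // 1.0 (exact: 10^n yields n for integer n)
    }
}
```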
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452713#comment-16452713 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184110535 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); --- End diff -- For the case when 
default value of the option is changed, disabling it after the tests may cause problems for other tests. The correct behaviour is to reset the option to its default value; the `resetSessionOption` method may be used. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
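The reset-versus-hardcode point above can be illustrated with a toy option store. This is a hypothetical stand-in, not Drill's `SystemOptionManager`: a test that hard-codes `false` in finally silently breaks if the shipped default ever becomes `true`, while a reset restores whatever the default is.

```java
import java.util.HashMap;
import java.util.Map;

// OptionStore is a stand-in illustrating the review point, not Drill's API.
class OptionStore {
    private final Map<String, Boolean> defaults = new HashMap<>();
    private final Map<String, Boolean> overrides = new HashMap<>();

    void define(String name, boolean defaultValue) { defaults.put(name, defaultValue); }
    void set(String name, boolean value) { overrides.put(name, value); }
    void reset(String name) { overrides.remove(name); } // back to the default
    boolean get(String name) { return overrides.getOrDefault(name, defaults.get(name)); }
}

public class ResetVsHardcode {
    public static void main(String[] args) {
        OptionStore options = new OptionStore();
        // Imagine a future release flips this default to true.
        options.define("store.json.read_numbers_as_double", true);

        try {
            options.set("store.json.read_numbers_as_double", true);
            // ... run the test body ...
        } finally {
            // Safe cleanup: restores the real default, whatever it is.
            // Hard-coding options.set(..., false) here would leave the
            // session in a non-default state for later tests.
            options.reset("store.json.read_numbers_as_double");
        }
        System.out.println(options.get("store.json.read_numbers_as_double")); // true
    }
}
```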
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452719#comment-16452719 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184138785 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithInt() throws Throwable { +String json = "{" + +"\"num1\": 0.0," + +"\"num3\": 1.0," + +"\"num5\": -1.0," + +"\
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452718#comment-16452718 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184106411 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + --- End diff -- Please remove empty string here and in other places. > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452714#comment-16452714 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184138601 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() --- End diff -- `build().run()` -> `go()`  > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452717#comment-16452717 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184144882 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -132,4 +148,200 @@ public void testIsNumeric() throws Throwable{ final Object [] expected = new Object[] {1, 1, 1, 0}; runTest(expected, "functions/testIsNumericFunction.json"); } + + @Test + public void testLog10WithDouble() throws Throwable { +String json = "{" + + "\"positive_infinity\" : Infinity," + + "\"negative_infinity\" : -Infinity," + + "\"nan\" : NaN," + + "\"num1\": 0.0," + + "\"num2\": 0.1," + + "\"num3\": 1.0," + + "\"num4\": 1.5," + + "\"num5\": -1.5," + + "\"num6\": 10.0" + + "}"; +String query = "select " + +"log10(positive_infinity) as pos_inf, " + +"log10(negative_infinity) as neg_inf, " + +"log10(nan) as nan, " + +"log10(num1) as num1, " + +"log10(num2) as num2, " + +"log10(num3) as num3, " + +"log10(num4) as num4, " + +"log10(num5) as num5, " + +"log10(num6) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -1.0d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READ_NUMBERS_AS_DOUBLE); + test("alter session set `%s` = 
false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithFloat() throws Throwable { +String json = "{" + +"\"positive_infinity\" : Infinity," + +"\"negative_infinity\" : -Infinity," + +"\"nan\" : NaN," + +"\"num1\": 0.0," + +"\"num2\": 0.1," + +"\"num3\": 1.0," + +"\"num4\": 1.5," + +"\"num5\": -1.5," + +"\"num6\": 10.0" + +"}"; +String query = "select " + +"log10(cast(positive_infinity as float)) as pos_inf, " + +"log10(cast(negative_infinity as float)) as neg_inf, " + +"log10(cast(nan as float)) as nan, " + +"log10(cast(num1 as float)) as num1, " + +"log10(cast(num2 as float)) as num2, " + +"log10(cast(num3 as float)) as num3, " + +"log10(cast(num4 as float)) as num4, " + +"log10(cast(num5 as float)) as num5, " + +"log10(cast(num6 as float)) as num6 " + +"" + +"from dfs.`data.json`"; +File file = new File(dirTestWatcher.getRootDir(), "data.json"); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `%s` = true", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + testBuilder() + .sqlQuery(query) + .ordered() + .baselineColumns("pos_inf", "neg_inf", "nan", "num1", "num2", "num3", "num4", "num5", "num6") + .baselineValues(Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, Double.NEGATIVE_INFINITY, + -0.3528508d, 0d, 0.17609125905568124d, Double.NaN, 1.0d) + .build() + .run(); +} finally { + test("alter session set `%s` = false", ExecConstants.JSON_READER_NAN_INF_NUMBERS); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testLog10WithInt() throws Throwable { +String json = "{" + +"\"num1\": 0.0," + +"\"num3\": 1.0," + +"\"num5\": -1.0," + +"\
[jira] [Commented] (DRILL-6345) Add LOG10 function implementation
[ https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452715#comment-16452715 ] ASF GitHub Bot commented on DRILL-6345: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r184145427 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -20,12 +20,20 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; +import java.io.File; import java.math.BigDecimal; +import java.nio.file.Files; +import java.util.DoubleSummaryStatistics; +import java.util.List; +import com.google.common.collect.Lists; +import org.apache.commons.io.FileUtils; import org.apache.drill.categories.OperatorTest; import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.ExecConstants; import org.apache.drill.exec.ExecTest; +import org.apache.drill.exec.compile.ExampleTemplateWithInner; --- End diff -- Is this import required? > Add LOG10 function implementation > - > > Key: DRILL-6345 > URL: https://issues.apache.org/jira/browse/DRILL-6345 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Volodymyr Tkach >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.14.0 > > > Add LOG10 function implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452711#comment-16452711 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 There may be some misunderstanding of how DoY works. The only info that users can pass to DoY is that which is in the DoY config file. We should add arguments to that file which will be passed through the DoY client to the DoY AM, and from there, as an env var, to the Drillbit containers. We already do this for memory so that both Drill and YARN agree on memory. We should do the same for CPU so that Drill, YARN and cgroups agree on the number of CPUs that Drill can use. Since this feature is MapR-only, it might be OK to require that users alter their `drill-env.sh` to set an environment variable that forces Drill to police itself under YARN. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452709#comment-16452709 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1189 Please update JIRA, PR and commit titles and squash commits. > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only 1 and 2. The last 3 one is used by Hive. > 1st one has different class full identifiers, but the 2 and 3 ones have the > same class full identifiers and maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that 3 one library is used by Hive only for tests, therefore it > is not required for Drill and could be excluded from hive-metastore and > hive-exec. > The dependencies conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper excluding and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
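The exclusions the description calls for take the standard Maven form. A sketch under assumptions (shown for `hive-exec` and `metrics-core` only; per the description, the real change would cover `hive-metastore` as well and also exclude `metrics-servlets` and `metrics-json`):

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <!-- Used by Hive only for tests; clashes with com.codahale.metrics,
         which shares the same class names. -->
    <exclusion>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```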
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452712#comment-16452712 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r184144439 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -175,4 +209,11 @@ fi echo "`date` Starting drillbit on `hostname` under YARN, logging to $DRILLBIT_LOG_PATH" echo "`ulimit -a`" >> "$DRILLBIT_LOG_PATH" 2>&1 -"$DRILL_HOME/bin/runbit" exec +# Run in background +"$DRILL_HOME/bin/runbit" exec & --- End diff -- This can't be for YARN. Under YARN, Drill must be run in the foreground. The original Drill-on-YARN work ensured that all this works. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452708#comment-16452708 ] ASF GitHub Bot commented on DRILL-6331: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183919198 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java --- @@ -158,25 +159,26 @@ public void close() { } catch (RuntimeException e) { ex = ex == null ? e : ex; } -try { - if (fs != null) { + +for (DrillFileSystem fs : fileSystems) { + try { fs.close(); -fs = null; - } -} catch (IOException e) { + } catch (IOException e) { throw UserException.resourceError(e) -.addContext("Failed to close the Drill file system for " + getName()) -.build(logger); + .addContext("Failed to close the Drill file system for " + getName()) + .build(logger); + } } + if (ex != null) { throw ex; } } @Override public DrillFileSystem newFileSystem(Configuration conf) throws IOException { -Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext"); -fs = new DrillFileSystem(conf, getStats()); +DrillFileSystem fs = new DrillFileSystem(conf, getStats()); --- End diff -- I don't get why you need multiple DrillFileSystems per operator context? The reason for the DrillFileSystem abstraction (and the reason for tying it to the operator context) is to track the time a (scan) operator was waiting for a file system call to return. This is reported in the wait time for the operator in the query profile. For scans this is a critical number as the time spent waiting for a disk read determines if the query is disk bound. Associating multiple file system objects with a single operator context will throw the math out of whack. I think. 
> Parquet filter pushdown does not support the native hive reader > --- > > Key: DRILL-6331 > URL: https://issues.apache.org/jira/browse/DRILL-6331 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.13.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.14.0 > > > Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan, the > core difference between them was > that HiveDrillNativeParquetScanBatchCreator was creating ParquetRecordReader > instead of HiveReader. > This allowed reading Hive parquet files using the Drill native parquet reader but > did not expose Hive data to Drill optimizations. > For example, filter push down, limit push down, count to direct scan > optimizations. > Hive code had to be refactored to use the same interfaces as > ParquetGroupScan in order to be exposed to such optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
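The wait-time accounting described in the review comment above can be pictured with a generic wrapper: every blocking file-system call is timed and charged to a single per-operator counter, which is why spreading calls across several file-system objects tied to one operator context could skew the reported number. This is an illustrative sketch only; all names are hypothetical and this is not Drill's actual DrillFileSystem API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative sketch of per-operator wait-time accounting, as described in
// the review comment above. This is NOT Drill's DrillFileSystem; all names
// are hypothetical.
public class WaitTimedFileSystem {
    // Single counter per operator context; corresponds to the "wait time"
    // shown in a query profile.
    private final AtomicLong waitNanos = new AtomicLong();

    // Time one blocking call and charge the elapsed time to this
    // operator's wait-time counter, even if the call throws.
    public <T> T timed(Supplier<T> blockingCall) {
        long start = System.nanoTime();
        try {
            return blockingCall.get();
        } finally {
            waitNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public long waitTimeNanos() {
        return waitNanos.get();
    }
}
```

If several independent instances each kept their own counter on behalf of one operator, the profile's single wait-time figure would have to merge them, which is the accounting concern raised above.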
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452706#comment-16452706 ] ASF GitHub Bot commented on DRILL-143: -- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1239 Thanks for that pointer, @paul-rogers! I'll make the relevant changes and add to this commit. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-143: --- Labels: doc-impacting (was: doc-impacting ready-to-commit) > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader
[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452707#comment-16452707 ] ASF GitHub Bot commented on DRILL-6331: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183909850 --- Diff: common/src/main/java/org/apache/drill/common/Stopwatch.java --- @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.common; + +import com.google.common.base.Ticker; + +import java.util.concurrent.TimeUnit; + +/** + * Helper that creates stopwatch based if debug level is enabled. --- End diff -- Do we really need this? In general we have (or should have) used Stopwatch to track metrics and or performance bottlenecks in production. In neither case do we want to enable debug. Also, for debugging performance issues (I see that the places you've changed to use this Stopwatch are places where we encountered performance issues), would it be better to use ``` Stopwatch timer; if(logger.isDebugEnabled()){ timer = Stopwatch.createStarted(); } ``` More verbose, but guaranteed to be optimized away by the JVM. 
Not insisting that we change this, BTW. > Parquet filter pushdown does not support the native hive reader > --- > > Key: DRILL-6331 > URL: https://issues.apache.org/jira/browse/DRILL-6331 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Affects Versions: 1.13.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.14.0 > > > Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan, the > core difference between them was > that HiveDrillNativeParquetScanBatchCreator was creating ParquetRecordReader > instead of HiveReader. > This allowed reading Hive parquet files using the Drill native parquet reader but > did not expose Hive data to Drill optimizations. > For example, filter push down, limit push down, count to direct scan > optimizations. > Hive code had to be refactored to use the same interfaces as > ParquetGroupScan in order to be exposed to such optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
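The guarded-stopwatch idea in the review comment above can be sketched as a tiny helper. This is a hypothetical illustration of the pattern under discussion, not the class proposed in the pull request and not Guava's Stopwatch:

```java
// Hypothetical sketch of a debug-gated stopwatch: when disabled it records
// nothing, so production code pays only a branch and a field read. This
// illustrates the pattern discussed above; it is not Drill's actual helper.
public class GatedStopwatch {
    private final boolean enabled;
    private long startNanos;

    private GatedStopwatch(boolean enabled) {
        this.enabled = enabled;
        if (enabled) {
            startNanos = System.nanoTime();
        }
    }

    // Mirrors Guava's Stopwatch.createStarted(), but gated on a flag such
    // as logger.isDebugEnabled().
    public static GatedStopwatch createStarted(boolean debugEnabled) {
        return new GatedStopwatch(debugEnabled);
    }

    // Elapsed milliseconds, or 0 when the watch was never started.
    public long elapsedMillis() {
        return enabled ? (System.nanoTime() - startNanos) / 1_000_000 : 0;
    }
}
```

The inline alternative shown in the comment (`if (logger.isDebugEnabled()) { timer = Stopwatch.createStarted(); }`) pushes the same branch to each call site; when debug is off the JIT can remove the dead timing code entirely, which is the optimization the reviewer is counting on. The wrapper trades that guarantee for tidier call sites.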
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452704#comment-16452704 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184144853 --- Diff: pom.xml --- @@ -1333,6 +1353,12 @@ + --- End diff -- I am not sure why it is necessary to have `com.yammer.metrics` in dependencyManagement? > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the first two. The third is used by Hive. > The first has different fully qualified class names, but the second and third have the > same fully qualified class names, and Maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the third library is used by Hive only for tests; therefore it > is not required for Drill and can be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusion and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6282) Excluding io.dropwizard.metrics dependencies
[ https://issues.apache.org/jira/browse/DRILL-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452697#comment-16452697 ] ASF GitHub Bot commented on DRILL-6282: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1189#discussion_r184144021 --- Diff: logical/pom.xml --- @@ -85,14 +85,12 @@ - com.codahale.metrics + io.dropwizard.metrics --- End diff -- Is it used in logical? > Excluding io.dropwizard.metrics dependencies > > > Key: DRILL-6282 > URL: https://issues.apache.org/jira/browse/DRILL-6282 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > There are three types of metrics-core in Drill: > 1. _com.yammer.metrics_, > 2. _com.codahale.metrics_, > 3. _io.dropwizard.metrics_ > Drill uses only the first two. The third is used by Hive. > The first has different fully qualified class names, but the second and third have the > same fully qualified class names, and Maven doesn't know which library to use > ([https://github.com/dropwizard/metrics/issues/1044]). > But I found that the third library is used by Hive only for tests; therefore it > is not required for Drill and can be excluded from hive-metastore and > hive-exec. > The dependency conflict is related not only to metrics-core, but to > metrics-servlets and metrics-json as well. > All these metrics should be organized with proper exclusion and dependency > management blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-143) Support CGROUPs resource management
[ https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452696#comment-16452696 ] ASF GitHub Bot commented on DRILL-143: -- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1239 @kkhatua, it turns out that upstream YARN has long had effective cgroup support per container. (I have the pleasure of sitting near the guy who maintains that work.) There has long been a discussion about whether the MapR version of YARN picked up those changes; we believe that MapR does *not* support this upstream work. As a result, under Apache YARN, the YARN NM itself will impose cgroup controls and Drill need not do it itself. For MapR YARN (only), Drill (and all other YARN apps) must do their own cgroup control. Please make sure that this feature is off by default to allow YARN to do the work. Only enable it for versions of YARN (such as MapR) which do not provide cgroup control in YARN itself. > Support CGROUPs resource management > --- > > Key: DRILL-143 > URL: https://issues.apache.org/jira/browse/DRILL-143 > Project: Apache Drill > Issue Type: New Feature >Reporter: Jacques Nadeau >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.14.0 > > Attachments: 253ce178-ddeb-e482-cd64-44ab7284ad1c.sys.drill > > > For the purpose of playing nice on clusters that don't have YARN, we should > write up configuration and scripts to allow users to run Drill next to > existing workloads without sharing resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452670#comment-16452670 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184117938 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +184,18 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); + final RecordBatchDef rbd = batch.getHeader().getDef(); final boolean schemaChanged = batchLoader.load(rbd, batch.getBody()); // TODO: Clean: DRILL-2933: That load(...) no longer throws // SchemaChangeException, so check/clean catch clause below. stats.addLongStat(Metric.BYTES_RECEIVED, batch.getByteCount()); batch.release(); + batch = null; --- End diff -- Not sure what you mean but this is the goal of the current code: - After a batch is properly set, we need to decrease the ref count by one by the end of the next() method - If an exception happens before the release call, then the finally block will be able to release the batch since it will be different than null - Otherwise, the release will be performed and the batch set to null which will disable the release within the finally block > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. 
This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
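The release discipline described in the comment above (release once on the normal path, null out the reference, and let the finally block cover only the exception path) can be sketched generically. Batch and the counter here are stand-ins for illustration, not the real RawFragmentBatch:

```java
// Generic sketch of the "release then null out" idiom described above.
// Batch is a stand-in for RawFragmentBatch; the static counter lets us
// verify that every path releases exactly once.
public class ReleaseIdiom {
    public static int releases = 0;

    static class Batch {
        void release() { releases++; }
    }

    public static void next(boolean failBeforeRelease) {
        Batch batch = null;
        try {
            batch = new Batch();
            if (failBeforeRelease) {
                throw new RuntimeException("simulated failure");
            }
            batch.release(); // normal path: release...
            batch = null;    // ...then null out so the finally block skips it
        } catch (RuntimeException ignored) {
            // swallowed for the demo
        } finally {
            if (batch != null) {
                batch.release(); // runs only when the try block never released
            }
        }
    }
}
```

Calling next(false) and next(true) each releases exactly once: the first on the normal path, the second in the finally block.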
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452671#comment-16452671 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184138876 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); + +// Set the index and increment reference count +transferResult.buffer.writerIndex(writerIndex); + +// Clear the current Drillbuffer since caller will perform release() 
on the new one +body.release(); +return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false); --- End diff -- That was my original code which failed miserably: - The RPC code has references on the DrillBuf and the RawFragmentBatch - This means we need to ensure that a release call is routed to the right DrillBuf (otherwise, the reference count logic stops working) - Creating a new RawFragmentBatch instance essentially provided just that (proper reference count accounting) > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452672#comment-16452672 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184119110 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,46 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { +if (body == null) { + return this; // NOOP +} + +if (!body.getLedger().isOwningLedger() + || body.getLedger().isOwner(targetAllocator)) { + + return this; +} + +int writerIndex = body.writerIndex(); +TransferResult transferResult = body.transferOwnership(targetAllocator); --- End diff -- yes, it is. 
> Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452668#comment-16452668 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184112400 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; --- End diff -- The release logic is a NOOP if the body is null; the while loop is to guard against an empty batch. > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452673#comment-16452673 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184140733 --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java --- @@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) { target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name); } -boolean overlimit = target.allocator.forceAllocate(size); +// Release first to handle the case where the current and target allocators were part of the same +// parent / child tree. allocator.releaseBytes(size); +boolean allocationFit = target.allocator.forceAllocate(size); --- End diff -- What if a runtime exception is thrown during forceAllocate(...)? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
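The reordering in the hunk above (release from the source allocator before force-allocating on the target) matters when both allocators roll up into one parent limit. A toy accounting model, with made-up numbers and names, shows why; this is purely illustrative, not Drill's AllocationManager:

```java
// Toy model of moving a buffer's bytes between two child allocators that
// share one parent limit. Releasing from the source before charging the
// target keeps the parent's usage from transiently double-counting the
// bytes. Purely illustrative; this is not Drill's AllocationManager.
public class TransferOrder {
    public static long parentUsed = 0;
    public static final long PARENT_LIMIT = 100;

    static void release(long size) {
        parentUsed -= size;
    }

    // Charge the target; report whether the result still fits the limit.
    static boolean forceAllocate(long size) {
        parentUsed += size;
        return parentUsed <= PARENT_LIMIT;
    }

    public static boolean transfer(long size) {
        release(size);              // credit the source first...
        return forceAllocate(size); // ...then charge the target
    }
}
```

With parentUsed at 80, transfer(80) fits; allocating first would have peaked at 160 and tripped the 100-byte limit. The reviewer's exception question applies to this sketch too: if forceAllocate threw, the source would already have been credited and the bytes accounted nowhere, so real code would need to restore the source's charge on failure.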
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452669#comment-16452669 ] ASF GitHub Bot commented on DRILL-6348: --- Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r184114148 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -153,8 +153,10 @@ private RawFragmentBatch getNextBatch() throws IOException { public IterOutcome next() { batchLoader.resetRecordCount(); stats.startProcessing(); + +RawFragmentBatch batch = null; try{ - RawFragmentBatch batch; + --- End diff -- Can you clarify your ask? > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query's memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452483#comment-16452483 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184061116 --- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java --- @@ -20,12 +20,17 @@ import java.io.IOException; import java.net.URL; +import com.google.common.base.Function; --- End diff -- I tested it manually. I considered 2 options: 1. The UDF has an input of the old decimal type. In this case, the function resolver adds a cast from vardecimal to the old decimal type which is used in the UDF. 2. The UDF returns the old decimal type. In this case, Drill casts the result of the UDF to vardecimal. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since the existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira is to describe the majority of them and possible ways of improving the existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452488#comment-16452488 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184062425 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetFixedWidthDictionaryReaders.java --- @@ -248,27 +227,61 @@ protected void readField(long recordsToReadInThisPass) { } } - static class DictionaryDecimal18Reader extends FixedByteAlignedReader { -DictionaryDecimal18Reader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, - ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, Decimal18Vector v, - SchemaElement schemaElement) throws ExecutionSetupException { + static class DictionaryVarDecimalReader extends FixedByteAlignedReader { + +DictionaryVarDecimalReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, +ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, VarDecimalVector v, +SchemaElement schemaElement) throws ExecutionSetupException { super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); } // this method is called by its superclass during a read loop @Override protected void readField(long recordsToReadInThisPass) { + recordsReadInThisIteration = + Math.min(pageReader.currentPageCount - pageReader.valuesRead, + recordsToReadInThisPass - valuesReadInCurrentPass); + + switch (columnDescriptor.getType()) { +case INT32: + if (usingDictionary) { +for (int i = 0; i < recordsReadInThisIteration; i++) { + byte[] bytes = Ints.toByteArray(pageReader.dictionaryValueReader.readInteger()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { +super.readField(recordsToReadInThisPass); + } + break; +case INT64: + if (usingDictionary) { +for (int i = 0; i < 
recordsReadInThisIteration; i++) { + byte[] bytes = Longs.toByteArray(pageReader.dictionaryValueReader.readLong()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { +super.readField(recordsToReadInThisPass); + } + break; + } +} - recordsReadInThisIteration = Math.min(pageReader.currentPageCount -- pageReader.valuesRead, recordsToReadInThisPass - valuesReadInCurrentPass); +/** + * Set the write Index. The next page that gets read might be a page that does not use dictionary encoding + * and we will go into the else condition below. The readField method of the parent class requires the + * writer index to be set correctly. + */ +private void setWriteIndex() { + readLengthInBits = recordsReadInThisIteration * dataTypeLengthInBits; + readLength = (int) Math.ceil(readLengthInBits / 8.0); --- End diff -- This is the number of bits in a byte, but a double value is used to avoid integer division. Thanks for pointing this, replaced by constant. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
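The reader in the diff above relies on Guava's Ints.toByteArray and Longs.toByteArray, which emit big-endian bytes, and rounds bits up to whole bytes via Math.ceil(readLengthInBits / 8.0). Both can be expressed with only the JDK; the helper names below are made up for illustration:

```java
import java.nio.ByteBuffer;

// JDK-only equivalents of the Guava helpers used in the reader above, plus
// the bits-to-bytes rounding done via Math.ceil. ByteBuffer's default byte
// order is big-endian, matching Guava. Helper names are illustrative.
public class BigEndianBytes {
    // Same result as Guava's Ints.toByteArray: 4 bytes, big-endian.
    public static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Same result as Guava's Longs.toByteArray: 8 bytes, big-endian.
    public static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Integer form of (int) Math.ceil(bits / 8.0): round up to whole bytes.
    public static int bitsToBytes(int bits) {
        return (bits + 7) / 8;
    }
}
```

For example, intToBytes(1) yields the bytes [0, 0, 0, 1], and bitsToBytes(9) yields 2, matching the readLength computation discussed in the comment.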
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452476#comment-16452476 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184009513 --- Diff: exec/vector/src/main/codegen/templates/ValueHolders.java --- @@ -32,99 +32,81 @@ /* * This class is generated using freemarker and the ${.template_name} template. */ -public final class ${className} implements ValueHolder{ +public final class ${className} implements ValueHolder { public static final MajorType TYPE = Types.${mode.name?lower_case}(MinorType.${minor.class?upper_case}); -<#if mode.name == "Repeated"> + <#if mode.name == "Repeated"> -/** The first index (inclusive) into the Vector. **/ -public int start; + /** The first index (inclusive) into the Vector. **/ + public int start; -/** The last index (exclusive) into the Vector. **/ -public int end; + /** The last index (exclusive) into the Vector. **/ + public int end; -/** The Vector holding the actual values. **/ -public ${minor.class}Vector vector; + /** The Vector holding the actual values. 
**/ + public ${minor.class}Vector vector; -<#else> -public static final int WIDTH = ${type.width}; + <#else> + public static final int WIDTH = ${type.width}; -<#if mode.name == "Optional">public int isSet; -<#assign fields = minor.fields!type.fields /> -<#list fields as field> -public ${field.type} ${field.name}; - + <#if mode.name == "Optional">public int isSet; + <#assign fields = minor.fields!type.fields /> + <#list fields as field> + public ${field.type} ${field.name}; + -<#if minor.class.startsWith("Decimal")> -public static final int maxPrecision = ${minor.maxPrecisionDigits}; -<#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> -public static final int nDecimalDigits = ${minor.nDecimalDigits}; + <#if minor.class.startsWith("Decimal")> + public static final int maxPrecision = ${minor.maxPrecisionDigits}; + <#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> --- End diff -- Thanks, good idea, marked as deprecated and added a comment to use VarDecimalHolder instead. > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452478#comment-16452478 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184012035 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java --- @@ -409,4 +409,30 @@ public void testNLJoinCorrectnessRightMultipleBatches() throws Exception { setSessionOption(ExecConstants.SLICE_TARGET, 10); } } + + @Test + public void testNlJoinWithStringsInCondition() throws Exception { +try { + test(DISABLE_NLJ_SCALAR); + test(DISABLE_JOIN_OPTIMIZATION); + + final String query = --- End diff -- Before my changes, a similar query with a decimal literal in the condition passed because the literal was handled as a double. Since my changes handle decimal literals as decimals, the query uncovered a bug in nested loop join that affects types which use a DrillBuf to store their values.
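The comment above turns on the distinction between typing a literal as DOUBLE versus DECIMAL. A small plain-Java illustration (not Drill code) of why that typing matters for comparisons:

```java
import java.math.BigDecimal;

public class LiteralTyping {
  public static void main(String[] args) {
    // As doubles, 0.1 + 0.2 picks up binary floating-point rounding error...
    System.out.println(0.1 + 0.2 == 0.3); // prints "false"

    // ...while exact decimal arithmetic preserves the literal's written value.
    BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
    System.out.println(sum.compareTo(new BigDecimal("0.3")) == 0); // prints "true"
  }
}
```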
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452473#comment-16452473 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184010194 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestVarlenDecimal.java --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.parquet; + +import org.apache.commons.io.FileUtils; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.test.BaseTestQuery; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Test; + +import java.math.BigDecimal; +import java.nio.file.Paths; + +public class TestVarlenDecimal extends BaseTestQuery { + + private static final String DATAFILE = "cp.`parquet/varlenDecimal.parquet`"; + + @Test + public void testNullCount() throws Exception { +String query = String.format("select count(*) as c from %s where department_id is null", DATAFILE); --- End diff -- Agree, removed `String.format` here and in other places.
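The review asked to drop the explicit String.format call because the test builder's sqlQuery already accepts format arguments. A plain-Java sketch of that varargs-delegation pattern (the sqlQuery helper below is a hypothetical stand-in, not Drill's TestBuilder API):

```java
public class QuerySketch {
  // Hypothetical helper mirroring a builder method that takes a template plus args,
  // so callers never need String.format at the call site.
  static String sqlQuery(String query, Object... args) {
    return String.format(query, args);
  }

  public static void main(String[] args) {
    String q = sqlQuery("select count(*) as c from %s where department_id is null", "dfs.`t`");
    System.out.println(q);
  }
}
```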
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452466#comment-16452466 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184002812 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/RangeExprEvaluator.java --- @@ -219,6 +219,7 @@ private Statistics evalCastFunc(FunctionHolderExpression holderExpr, Statistics return null; // cast func between srcType and destType is NOT allowed. } + // TODO: add decimal support --- End diff -- Thanks, removed.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452484#comment-16452484 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184035111 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestVarDecimalFunctions.java --- @@ -0,0 +1,911 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.fn.impl; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.test.BaseTestQuery; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.math.RoundingMode; + +@Category(SqlFunctionTest.class) +public class TestVarDecimalFunctions extends BaseTestQuery { + + // Tests for math functions + + @Test + public void testDecimalAdd() throws Exception { +String query = +"select\n" + +// checks trimming of scale +"cast('999.92345678912' as DECIMAL(38, 11))\n" + +"+ cast('0.32345678912345678912345678912345678912' as DECIMAL(38, 38)) as s1,\n" + +// sanitary checks +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('123456789123456789123456789123456.789' as DECIMAL(36, 3)) as s2,\n" + +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('0' as DECIMAL(36, 3)) as s3,\n" + +"cast('15.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s4,\n" + +"cast('11.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s5,\n" + +"cast('0' as DECIMAL(36, 2)) - cast('12.93' as DECIMAL(36, 2)) as s6,\n" + +// check trimming (negative scale) +"cast('2345678912' as DECIMAL(38, 0))\n" + +"+ cast('32345678912345678912345678912345678912' as DECIMAL(38, 0)) as s7"; +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() +.sqlQuery(query) +.ordered() +.baselineColumns("s1", "s2", "s3", "s4", "s5", "s6", "s7") +.baselineValues( +new BigDecimal("999.92345678912") +.add(new BigDecimal("0.32345678912345678912345678912345678912")) +.round(new MathContext(38, RoundingMode.HALF_UP)), +new BigDecimal("1358024680358024680358024680358024.679"), +new BigDecimal("1234567891234567891234567891234567.890"), +new BigDecimal("2.09"), new BigDecimal("-1.91"), new BigDecimal("-12.93"), +new BigDecimal("1.3234567891234567891234567890469135782E+38")) +.build() --- End diff -- Thanks, replaced.
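The baseline for s1 above rounds the exact sum down to Drill's maximum precision of 38 with HALF_UP. The same arithmetic in plain Java, shown as a sketch of the baseline computation (not Drill code):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalAddSketch {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("999.92345678912");                              // DECIMAL(38, 11)
    BigDecimal b = new BigDecimal("0.32345678912345678912345678912345678912");     // DECIMAL(38, 38)

    // The exact sum carries 4 integer digits plus 38 fractional digits,
    // i.e. 42 significant digits; trimming keeps it within DECIMAL(38).
    BigDecimal exact = a.add(b);
    BigDecimal trimmed = exact.round(new MathContext(38, RoundingMode.HALF_UP));

    System.out.println(exact.precision());        // prints "42"
    System.out.println(trimmed.precision() <= 38); // prints "true"
  }
}
```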
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452482#comment-16452482 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184051298 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestFixedlenDecimal.java --- @@ -20,61 +20,74 @@ import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.exec.planner.physical.PlannerSettings; -import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; @Category({UnlikelyTest.class}) public class TestFixedlenDecimal extends BaseTestQuery { - // enable decimal data type - @BeforeClass - public static void enableDecimalDataType() throws Exception { -test(String.format("alter session set `%s` = true", PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY)); - } - private static final String DATAFILE = "cp.`parquet/fixedlenDecimal.parquet`"; @Test public void testNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(1L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(1L) + .build() + .run(); +} finally { + resetSessionOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY); +} } @Test public void testNotNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(106L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(106L) + .build() --- End diff -- Thanks, replaced here and in other places.
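The diff above moves the session-option change into a try/finally so the option is reset even when the test body throws. The shape of that pattern in plain Java (setOption/resetOption are stand-ins for Drill's alterSession/resetSessionOption helpers):

```java
import java.util.HashMap;
import java.util.Map;

public class SessionOptionSketch {
  // Stand-ins for Drill's session option helpers.
  static final Map<String, Object> session = new HashMap<>();
  static void setOption(String key, Object value) { session.put(key, value); }
  static void resetOption(String key) { session.remove(key); }

  public static void main(String[] args) {
    try {
      setOption("planner.enable_decimal_data_type", true);
      // ... run the query under test here; it may throw ...
    } finally {
      // Runs whether or not the body threw, so later tests see default options.
      resetOption("planner.enable_decimal_data_type");
    }
    System.out.println(session.isEmpty()); // prints "true"
  }
}
```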
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452475#comment-16452475 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184031285 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/TypeInferenceUtils.java --- @@ -382,13 +407,26 @@ public RelDataType inferReturnType(SqlOperatorBinding opBinding) { final RelDataType operandType = opBinding.getOperandType(0); final TypeProtos.MinorType inputMinorType = getDrillTypeFromCalciteType(operandType); - if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT)) + if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT)) == TypeProtos.MinorType.BIGINT) { return createCalciteTypeWithNullability( factory, SqlTypeName.BIGINT, isNullable); - } else if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT8)) + } else if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT4)) --- End diff -- Thanks, added an explanation for sum and avg return type inferences.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452465#comment-16452465 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183998600 --- Diff: contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java --- @@ -225,10 +247,10 @@ public int next() { int counter = 0; Boolean b = true; try { - while (counter < 4095 && b == true) { // loop at 4095 since nullables use one more than record count and we + while (counter < 4095 && b) { // loop at 4095 since nullables use one more than record count and we // allocate on powers of two. b = resultSet.next(); -if(b == false) { +if(!b) { --- End diff -- Thanks, done.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452487#comment-16452487 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184002128 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/output/DecimalReturnTypeInference.java --- @@ -281,20 +295,45 @@ @Override public TypeProtos.MajorType getType(List logicalExpressions, FunctionAttributes attributes) { int scale = 0; - int precision = 0; // Get the max scale and precision from the inputs for (LogicalExpression e : logicalExpressions) { scale = Math.max(scale, e.getMajorType().getScale()); -precision = Math.max(precision, e.getMajorType().getPrecision()); } - return (TypeProtos.MajorType.newBuilder() - .setMinorType(attributes.getReturnValue().getType().getMinorType()) + return TypeProtos.MajorType.newBuilder() + .setMinorType(TypeProtos.MinorType.VARDECIMAL) .setScale(scale) - .setPrecision(38) - .setMode(TypeProtos.DataMode.REQUIRED) - .build()); + .setPrecision(DRILL_REL_DATATYPE_SYSTEM.getMaxNumericPrecision()) + .setMode(TypeProtos.DataMode.OPTIONAL) + .build(); +} + } + + /** + * Return type calculation implementation for functions with return type set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.ReturnType#DECIMAL_AVG_AGGREGATE}. + */ + public static class DecimalAvgAggReturnTypeInference implements ReturnTypeInference { --- End diff -- Thanks, added. 
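The return-type inference quoted above takes the maximum scale across the inputs and pins the precision at the system maximum. A hedged plain-Java sketch of that rule (MAX_PRECISION and the method name are illustrative, not Drill's API):

```java
public class ScaleInference {
  // Drill's DRILL_REL_DATATYPE_SYSTEM caps numeric precision at 38.
  static final int MAX_PRECISION = 38;

  // Returns {precision, scale} for an aggregate over inputs with the given scales:
  // the widest fractional part wins, and precision is set to the maximum.
  static int[] inferSumType(int... inputScales) {
    int scale = 0;
    for (int s : inputScales) {
      scale = Math.max(scale, s);
    }
    return new int[] {MAX_PRECISION, scale};
  }

  public static void main(String[] args) {
    int[] t = inferSumType(2, 3, 0);
    System.out.println(t[0] + "," + t[1]); // prints "38,3"
  }
}
```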
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452486#comment-16452486 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184099659 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -559,6 +560,19 @@ public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) if (matchNullability) { return makeAbstractCast(type, exp); } + // for the case when BigDecimal literal has a scale or precision + // that differs from the value from specified RelDataType, cast cannot be removed + // TODO: remove this code when CALCITE-1468 is fixed + if (type.getSqlTypeName() == SqlTypeName.DECIMAL && exp instanceof RexLiteral) { --- End diff -- Created DRILL-6355.
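The guard above keeps a cast in place when a BigDecimal literal's scale or precision differs from the target DECIMAL type. In plain Java terms (an illustrative sketch of the condition, not Calcite's or Drill's API):

```java
import java.math.BigDecimal;

public class LiteralCastSketch {
  // A cast to DECIMAL(p, s) can only be dropped if the literal already has
  // exactly that scale and fits in that precision; otherwise the cast must rescale.
  static boolean castRemovable(BigDecimal literal, int targetPrecision, int targetScale) {
    return literal.scale() == targetScale && literal.precision() <= targetPrecision;
  }

  public static void main(String[] args) {
    BigDecimal lit = new BigDecimal("10.5");       // precision 3, scale 1
    System.out.println(castRemovable(lit, 38, 1)); // prints "true"
    System.out.println(castRemovable(lit, 38, 2)); // prints "false": the cast must rescale to 10.50
  }
}
```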
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452485#comment-16452485 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184065631 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/FrameSupportTemplate.java --- @@ -300,7 +300,7 @@ public void cleanup() { * @param index of row to aggregate */ public abstract void evaluatePeer(@Named("index") int index); - public abstract void setupEvaluatePeer(@Named("incoming") VectorAccessible incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException; + public abstract void setupEvaluatePeer(@Named("incoming") WindowDataBatch incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException; --- End diff -- At an earlier stage of these changes, a compilation error appeared in the runtime-generated code without this change. It no longer reproduces, so I reverted it.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452469#comment-16452469 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184008881 --- Diff: exec/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java --- @@ -75,12 +75,19 @@ public void endList() { <#list vv.types as type><#list type.minor as minor><#assign name = minor.class?cap_first /> <#assign fields = minor.fields!type.fields /> - <#if !minor.class?starts_with("Decimal") > + <#if minor.class?contains("VarDecimal")> + @Override + public void write${minor.class}(BigDecimal value) { +getWriter(MinorType.${name?upper_case}).write${minor.class}(value); + } + + @Override public void write(${name}Holder holder) { getWriter(MinorType.${name?upper_case}).write(holder); } + <#if !minor.class?contains("Decimal") || minor.class?contains("VarDecimal")> --- End diff -- Thanks, added.
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452468#comment-16452468 ] ASF GitHub Bot commented on DRILL-6094: --- Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r184008595 --- Diff: exec/vector/src/main/codegen/templates/AbstractFieldReader.java --- @@ -29,9 +29,9 @@ * This class is generated using freemarker and the ${.template_name} template. */ @SuppressWarnings("unused") -abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader{ +abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader { - AbstractFieldReader(){ + AbstractFieldReader() { --- End diff -- Done.