[jira] [Created] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-17 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-5725:
--

 Summary: Update Jackson version to 2.7.8
 Key: DRILL-5725
 URL: https://issues.apache.org/jira/browse/DRILL-5725
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Volodymyr Vysotskyi


Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update Jackson 
version to 2.7.8



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-17 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi reassigned DRILL-5725:
--

Assignee: Volodymyr Vysotskyi

> Update Jackson version to 2.7.8
> ---
>
> Key: DRILL-5725
> URL: https://issues.apache.org/jira/browse/DRILL-5725
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>
> Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update 
> Jackson version to 2.7.8





[jira] [Commented] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130239#comment-16130239
 ] 

ASF GitHub Bot commented on DRILL-5725:
---

GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/908

DRILL-5725: Update Jackson version to 2.7.8



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-5725

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #908


commit 283d1c73cbc2a72aa55396dc11a221b9380f091d
Author: Volodymyr Vysotskyi 
Date:   2017-08-16T14:16:55Z

DRILL-5725: Update Jackson version to 2.7.8




> Update Jackson version to 2.7.8
> ---
>
> Key: DRILL-5725
> URL: https://issues.apache.org/jira/browse/DRILL-5725
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>
> Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update 
> Jackson version to 2.7.8





[jira] [Updated] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-17 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-5725:
---
Description: 
Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update Jackson 
version to 2.7.8.

All Jackson versions 2.7.x before 2.7.8 have [CVE-2016-7051 
vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
The problem was with the {{jackson-dataformat-xml}} module 
([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
Drill does not use this module yet, but we want to update the version for the 
case when we will start to use this module.
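For reference, a dependency bump like this is normally made in one place in the parent `pom.xml`. A minimal sketch of what the change could look like (the property name `jackson.version` and the module shown are assumptions; Drill's actual pom may organize this differently):

```xml
<!-- Sketch: pin the Jackson version via a shared property so all
     Jackson modules resolve to 2.7.8. Property and module names are
     illustrative, not necessarily Drill's actual pom layout. -->
<properties>
  <jackson.version>2.7.8</jackson.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${jackson.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```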

  was:Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update 
Jackson version to 2.7.8


> Update Jackson version to 2.7.8
> ---
>
> Key: DRILL-5725
> URL: https://issues.apache.org/jira/browse/DRILL-5725
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>
> Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update 
> Jackson version to 2.7.8.
> All Jackson versions 2.7.x before 2.7.8 have [CVE-2016-7051 
> vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
> The problem was with the {{jackson-dataformat-xml}} module 
> ([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
> Drill does not use this module yet, but we want to update the version for the 
> case when we will start to use this module.





[jira] [Updated] (DRILL-5725) Update Jackson version to 2.7.8

2017-08-17 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-5725:
---
Description: 
Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update Jackson 
version to 2.7.8.

All Jackson versions 2.7.x before 2.7.8 have [CVE-2016-7051 
vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
The problem was with the {{jackson-dataformat-xml}} module 
([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
Drill does not use this module yet, but we want to update the version for the 
case when we start to use this module.

  was:
Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update Jackson 
version to 2.7.8.

All Jackson versions 2.7.x before 2.7.8 have [CVE-2016-7051 
vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
The problem was with the {{jackson-dataformat-xml}} module 
([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
Drill does not use this module yet, but we want to update the version for the 
case when we will start to use this module.


> Update Jackson version to 2.7.8
> ---
>
> Key: DRILL-5725
> URL: https://issues.apache.org/jira/browse/DRILL-5725
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>
> Currently, Drill uses Jackson 2.7.1. The goal of this Jira is to update 
> Jackson version to 2.7.8.
> All Jackson versions 2.7.x before 2.7.8 have [CVE-2016-7051 
> vulnerability|https://nvd.nist.gov/vuln/detail/CVE-2016-7051]. 
> The problem was with the {{jackson-dataformat-xml}} module 
> ([issue-211|https://github.com/FasterXML/jackson-dataformat-xml/issues/211]). 
> Drill does not use this module yet, but we want to update the version for the 
> case when we start to use this module.





[jira] [Updated] (DRILL-4264) Allow field names to include dots

2017-08-17 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-4264:
---
Summary: Allow field names to include dots  (was: Dots in identifier are 
not escaped correctly)

> Allow field names to include dots
> -
>
> Key: DRILL-4264
> URL: https://issues.apache.org/jira/browse/DRILL-4264
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Alex
>Assignee: Volodymyr Vysotskyi
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> If you have some json data like this...
> {code:javascript}
> {
>   "0.0.1":{
> "version":"0.0.1",
> "date_created":"2014-03-15"
>   },
>   "0.1.2":{
> "version":"0.1.2",
> "date_created":"2014-05-21"
>   }
> }
> {code}
> ... there is no way to select any of the rows since their identifiers contain 
> dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed since there are many json data files containing dots in 
> some of the keys (e.g. when specifying version numbers etc)
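Once such names are allowed, a dotted key would presumably be addressable with Drill's backtick identifier quoting, so the whole key is treated as one field name rather than a nested path. A hypothetical query against the JSON above (the file path is illustrative):

```sql
-- Hypothetical usage after the fix: backticks quote the dotted key
-- "0.0.1" as a single field name instead of a qualified name a.b.c.
SELECT t.`0.0.1`.`version`      AS version,
       t.`0.0.1`.`date_created` AS date_created
FROM dfs.`/tmp/versions.json` t;
```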





[jira] [Commented] (DRILL-4264) Allow field names to include dots

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130668#comment-16130668
 ] 

ASF GitHub Bot commented on DRILL-4264:
---

GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/909

DRILL-4264: Allow field names to include dots

1. Removed the check that rejected dots in field names.
2. Replaced uses of `SchemaPath.getAsUnescapedPath()` with 
`SchemaPath.getRootSegmentPath()` and 
`SchemaPathUtil.getMaterializedFieldFromSchemaPath()` where needed.
3. Replaced uses of the `MaterializedField.getPath()` and 
`MaterializedField.getLastName()` methods with `MaterializedField.getName()` 
and verified the correctness of the behaviour.
4. Added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-4264

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #909


commit 4ba59488a96fb79455b192ed960a728481ceaf93
Author: Volodymyr Vysotskyi 
Date:   2017-07-05T19:08:59Z

DRILL-4264: Allow field names to include dots




> Allow field names to include dots
> -
>
> Key: DRILL-4264
> URL: https://issues.apache.org/jira/browse/DRILL-4264
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Alex
>Assignee: Volodymyr Vysotskyi
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> If you have some json data like this...
> {code:javascript}
> {
>   "0.0.1":{
> "version":"0.0.1",
> "date_created":"2014-03-15"
>   },
>   "0.1.2":{
> "version":"0.1.2",
> "date_created":"2014-05-21"
>   }
> }
> {code}
> ... there is no way to select any of the rows since their identifiers contain 
> dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed since there are many json data files containing dots in 
> some of the keys (e.g. when specifying version numbers etc)





[jira] [Assigned] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5507:


Assignee: Timothy Farkas

> Millions of "Failure finding Drillbit running on host" info messages in 
> foreman logs
> 
>
> Key: DRILL-5507
> URL: https://issues.apache.org/jira/browse/DRILL-5507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Veera Naranammalpuram
>Assignee: Timothy Farkas
>
> When foreman tries to execute a query with data files that reside on nodes 
> other than the nodes that are running drillbits, there are millions of 
> messages in the log files like this:
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> After 200 MB of these messages, 960K of them and 7 seconds of planning, it 
> continues to execute this query. Is there a way to disable / suppress these 
> messages? What's causing this? Should this be printed with INFO logging as 
> we're seeing or only with DEBUG logging? Is there a way to turn off this 
> check? Below is a snippet from foreman logs: 
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1402.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0929.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1098.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1230.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1388.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1291.mydomain.com.  Skipping affinity to that host.
> The hostname and domain are scrubbed. 





[jira] [Created] (DRILL-5726) Support Impersonation without authentication for REST API

2017-08-17 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5726:
---

 Summary: Support Impersonation without authentication for REST API
 Key: DRILL-5726
 URL: https://issues.apache.org/jira/browse/DRILL-5726
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.11.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.12.0


Today, if a user is not authenticated via the REST API, there is no way to 
provide a user name for executing queries; by default a query is executed as 
the "anonymous" user. This doesn't work when impersonation without 
authentication is enabled on the Drill server side: since the anonymous user 
doesn't exist, the query will fail. We need a way to provide a user name when 
impersonation is enabled on the Drill side and a query is executed from the 
REST API.

_Implementation details:_
When only impersonation is enabled, form-based authentication will be used. 
On the Web UI the user will be prompted to enter only a login; a session will 
then be created for that user and the user will be treated as admin. 
Form-based authentication will cache the user information, so the user won't 
need to re-enter the username for each query. Log in / out options will also 
be available. A screenshot of the login page is attached.
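For context, query submission over REST goes through Drill's `POST /query.json` endpoint; with this feature the impersonated user would come from the form-login session rather than from the request itself. A minimal request body (the example query is illustrative):

```json
{
  "queryType": "SQL",
  "query": "SELECT * FROM sys.version"
}
```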





[jira] [Updated] (DRILL-5726) Support Impersonation without authentication for REST API

2017-08-17 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5726:

Attachment: login_page.JPG

> Support Impersonation without authentication for REST API
> -
>
> Key: DRILL-5726
> URL: https://issues.apache.org/jira/browse/DRILL-5726
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
> Attachments: login_page.JPG
>
>
> Today, if a user is not authenticated via the REST API, there is no way to 
> provide a user name for executing queries; by default a query is executed as 
> the "anonymous" user. This doesn't work when impersonation without 
> authentication is enabled on the Drill server side: since the anonymous user 
> doesn't exist, the query will fail. We need a way to provide a user name when 
> impersonation is enabled on the Drill side and a query is executed from the 
> REST API.
> _Implementation details:_
> When only impersonation is enabled, form-based authentication will be used. 
> On the Web UI the user will be prompted to enter only a login; a session will 
> then be created for that user and the user will be treated as admin. 
> Form-based authentication will cache the user information, so the user won't 
> need to re-enter the username for each query. Log in / out options will also 
> be available. A screenshot of the login page is attached.





[jira] [Commented] (DRILL-5726) Support Impersonation without authentication for REST API

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130800#comment-16130800
 ] 

ASF GitHub Bot commented on DRILL-5726:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/910

DRILL-5726: Support Impersonation without authentication for REST API

Details in [DRILL-5726](https://issues.apache.org/jira/browse/DRILL-5726).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5726

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #910


commit f3986726f601ed31f93209b2daec497a7fbd2870
Author: Arina Ielchiieva 
Date:   2017-08-17T12:08:12Z

DRILL-5726: Support Impersonation without authentication for REST API




> Support Impersonation without authentication for REST API
> -
>
> Key: DRILL-5726
> URL: https://issues.apache.org/jira/browse/DRILL-5726
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
> Attachments: login_page.JPG
>
>
> Today, if a user is not authenticated via the REST API, there is no way to 
> provide a user name for executing queries; by default a query is executed as 
> the "anonymous" user. This doesn't work when impersonation without 
> authentication is enabled on the Drill server side: since the anonymous user 
> doesn't exist, the query will fail. We need a way to provide a user name when 
> impersonation is enabled on the Drill side and a query is executed from the 
> REST API.
> _Implementation details:_
> When only impersonation is enabled, form-based authentication will be used. 
> On the Web UI the user will be prompted to enter only a login; a session will 
> then be created for that user and the user will be treated as admin. 
> Form-based authentication will cache the user information, so the user won't 
> need to re-enter the username for each query. Log in / out options will also 
> be available. A screenshot of the login page is attached.





[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131289#comment-16131289
 ] 

Timothy Farkas commented on DRILL-5507:
---

Many of these messages get produced because there can be many Blocks for each 
file. This message doesn't indicate an error and should be debug level instead 
info.
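The suggested fix (demote the message to debug level and emit it less often) can be sketched as a simple counter-based throttle. This is an illustrative sketch, not Drill's actual code: the class and constant names are invented, and `java.util.logging` is used only to keep it self-contained (Drill itself logs through slf4j/logback).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: log the per-block "no Drillbit on host" message at debug
// (FINE) level, and only once every LOG_EVERY occurrences, so millions
// of blocks no longer produce millions of log lines.
public class ThrottledHostWarning {
    private static final Logger logger =
            Logger.getLogger(ThrottledHostWarning.class.getName());
    private static final long LOG_EVERY = 1000;

    private final AtomicLong misses = new AtomicLong();

    // True for the 1st, (LOG_EVERY+1)th, (2*LOG_EVERY+1)th ... call.
    public boolean shouldLog() {
        return misses.incrementAndGet() % LOG_EVERY == 1;
    }

    public void hostNotFound(String host) {
        if (shouldLog() && logger.isLoggable(Level.FINE)) {
            logger.fine("Failure finding Drillbit running on host " + host
                    + ". Skipping affinity to that host.");
        }
    }
}
```

With `LOG_EVERY = 1000`, the 200 MB / 960 K-message scenario from the report would shrink to roughly a thousand debug-level lines.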

> Millions of "Failure finding Drillbit running on host" info messages in 
> foreman logs
> 
>
> Key: DRILL-5507
> URL: https://issues.apache.org/jira/browse/DRILL-5507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Veera Naranammalpuram
>Assignee: Timothy Farkas
>
> When foreman tries to execute a query with data files that reside on nodes 
> other than the nodes that are running drillbits, there are millions of 
> messages in the log files like this:
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> After 200 MB of these messages, 960K of them and 7 seconds of planning, it 
> continues to execute this query. Is there a way to disable / suppress these 
> messages? What's causing this? Should this be printed with INFO logging as 
> we're seeing or only with DEBUG logging? Is there a way to turn off this 
> check? Below is a snippet from foreman logs: 
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1402.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0929.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1098.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1230.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1388.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1291.mydomain.com.  Skipping affinity to that host.
> The hostname and domain are scrubbed. 





[jira] [Comment Edited] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131289#comment-16131289
 ] 

Timothy Farkas edited comment on DRILL-5507 at 8/17/17 9:04 PM:


Many of these messages get produced because there can be many Blocks for each 
file. This message doesn't indicate an error and should be debug level instead 
of info.


was (Author: timothyfarkas):
Many of these messages get produced because there can be many Blocks for each 
file. This message doesn't indicate an error and should be debug level instead 
info.

> Millions of "Failure finding Drillbit running on host" info messages in 
> foreman logs
> 
>
> Key: DRILL-5507
> URL: https://issues.apache.org/jira/browse/DRILL-5507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Veera Naranammalpuram
>Assignee: Timothy Farkas
>
> When foreman tries to execute a query with data files that reside on nodes 
> other than the nodes that are running drillbits, there are millions of 
> messages in the log files like this:
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> After 200 MB of these messages, 960K of them and 7 seconds of planning, it 
> continues to execute this query. Is there a way to disable / suppress these 
> messages? What's causing this? Should this be printed with INFO logging as 
> we're seeing or only with DEBUG logging? Is there a way to turn off this 
> check? Below is a snippet from foreman logs: 
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1402.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0929.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1098.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1230.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1388.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1291.mydomain.com.  Skipping affinity to that host.
> The hostname and domain are scrubbed. 





[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131331#comment-16131331
 ] 

ASF GitHub Bot commented on DRILL-5507:
---

GitHub user ilooner-mapr opened a pull request:

https://github.com/apache/drill/pull/911

 - DRILL-5507 Made verbose info logging message debug level and printed it less frequently

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner-mapr/drill DRILL-5507

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #911


commit 34d5fe6e215526176a876fd076b6110ffa8c829d
Author: Timothy Farkas 
Date:   2017-08-17T21:29:38Z

 - DRILL-5507 Made verbose info logging message debug level and printed it 
less frequently




> Millions of "Failure finding Drillbit running on host" info messages in 
> foreman logs
> 
>
> Key: DRILL-5507
> URL: https://issues.apache.org/jira/browse/DRILL-5507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Veera Naranammalpuram
>Assignee: Timothy Farkas
>
> When foreman tries to execute a query with data files that reside on nodes 
> other than the nodes that are running drillbits, there are millions of 
> messages in the log files like this:
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> After 200 MB of these messages, 960K of them and 7 seconds of planning, it 
> continues to execute this query. Is there a way to disable / suppress these 
> messages? What's causing this? Should this be printed with INFO logging as 
> we're seeing or only with DEBUG logging? Is there a way to turn off this 
> check? Below is a snippet from foreman logs: 
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1402.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0929.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1098.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1230.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1388.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1291.mydomain.com.  Skipping affinity to that host.
> The hostname and domain are scrubbed. 





[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131334#comment-16131334
 ] 

ASF GitHub Bot commented on DRILL-5507:
---

Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/911
  
@paul-rogers 


> Millions of "Failure finding Drillbit running on host" info messages in 
> foreman logs
> 
>
> Key: DRILL-5507
> URL: https://issues.apache.org/jira/browse/DRILL-5507
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Veera Naranammalpuram
>Assignee: Timothy Farkas
>
> When foreman tries to execute a query with data files that reside on nodes 
> other than the nodes that are running drillbits, there are millions of 
> messages in the log files like this:
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> After 200 MB of these messages, 960K of them and 7 seconds of planning, it 
> continues to execute this query. Is there a way to disable / suppress these 
> messages? What's causing this? Should this be printed with INFO logging as 
> we're seeing or only with DEBUG logging? Is there a way to turn off this 
> check? Below is a snippet from foreman logs: 
> 2017-05-10 13:22:10,916 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1402.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0929.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1098.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1230.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1388.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp0535.mydomain.com.  Skipping affinity to that host.
> 2017-05-10 13:22:10,923 [26eca5b2-a847-b7fe-adff-1685de29bb7a:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host 
> hostlp1291.mydomain.com.  Skipping affinity to that host.
> The hostname and domain are scrubbed. 





[jira] [Created] (DRILL-5727) Update release profile to generate SHA-512 checksum.

2017-08-17 Thread Parth Chandra (JIRA)
Parth Chandra created DRILL-5727:


 Summary: Update release profile to generate SHA-512 checksum.
 Key: DRILL-5727
 URL: https://issues.apache.org/jira/browse/DRILL-5727
 Project: Apache Drill
  Issue Type: Bug
Reporter: Parth Chandra


Per the latest release guidelines, we should generate a SHA-512 checksum with the 
release artifacts.
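
The Jira change itself is to the Maven release profile, but for reference, a SHA-512 checksum of the form the guidelines require can be produced with the JDK alone. A minimal sketch (class and method names are illustrative, not part of the release tooling):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustration of the checksum format the release guidelines ask for:
// SHA-512 produces a 64-byte digest, conventionally written as 128 hex chars.
public class Sha512Checksum {

    static String sha512Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));  // unsigned two-digit hex per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String digest = sha512Hex("apache-drill".getBytes(StandardCharsets.UTF_8));
        System.out.println(digest.length());  // 128 hex characters
    }
}
```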





[jira] [Updated] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-5507:
--
Reviewer: Paul Rogers






[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131362#comment-16131362
 ] 

ASF GitHub Bot commented on DRILL-5507:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/911#discussion_r133838104
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java
 ---
@@ -228,6 +230,7 @@ public EndpointByteMap getEndpointByteMap(FileWork 
work) throws IOException {
 
 // Find submap of ranges that intersect with the rowGroup
 ImmutableRangeMap subRangeMap = 
blockMap.subRangeMap(rowGroupRange);
+Set noDrillbitHosts = Sets.newHashSet();
--- End diff --

Consider `final Set noDrillbitHosts = logger.isDebugEnabled() ? 
Sets.newHashSet() : null;`







[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131363#comment-16131363
 ] 

ASF GitHub Bot commented on DRILL-5507:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/911#discussion_r133838389
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java
 ---
@@ -246,12 +249,16 @@ public EndpointByteMap getEndpointByteMap(FileWork 
work) throws IOException {
 DrillbitEndpoint endpoint = getDrillBitEndpoint(host);
 if (endpoint != null) {
   endpointByteMap.add(endpoint, bytes);
-} else {
-  logger.info("Failure finding Drillbit running on host {}.  
Skipping affinity to that host.", host);
+} else if (logger.isDebugEnabled()) {
--- End diff --

and `} else if (noDrillbitHosts != null && noDrillbitHosts.add(host)) { 
logger.debug(...); }`
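
The two review suggestions combine into a simple pattern: allocate the dedup set only when debug logging is enabled, and let `Set.add` decide whether a host has already been reported. A self-contained sketch (class and method names are mine, not Drill's `BlockMapBuilder`):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the reviewed pattern: log each unknown host at most once, and
// skip the bookkeeping entirely when debug logging is disabled.
// AffinityLogSketch/countDebugLines are illustrative names, not Drill code.
public class AffinityLogSketch {

    static int countDebugLines(String[] hosts, boolean debugEnabled) {
        // Allocate the set only when it will be used (first suggestion);
        // debugEnabled stands in for logger.isDebugEnabled().
        final Set<String> noDrillbitHosts = debugEnabled ? new HashSet<>() : null;
        int logCalls = 0;
        for (String host : hosts) {
            // Set.add returns true only for the first occurrence of a host,
            // so the debug line fires once per distinct host, not per block.
            if (noDrillbitHosts != null && noDrillbitHosts.add(host)) {
                logCalls++;  // stands in for logger.debug(...)
            }
        }
        return logCalls;
    }

    public static void main(String[] args) {
        String[] blocks = {"hostA", "hostB", "hostA", "hostA", "hostB"};
        System.out.println(countDebugLines(blocks, true));   // one line per distinct host
        System.out.println(countDebugLines(blocks, false));  // nothing when debug is off
    }
}
```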







[jira] [Commented] (DRILL-5507) Millions of "Failure finding Drillbit running on host" info messages in foreman logs

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131455#comment-16131455
 ] 

ASF GitHub Bot commented on DRILL-5507:
---

Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/911
  
Applied comments







[jira] [Created] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-5728:
---

 Summary: Hash Aggregate: Useless bigint value vector in the values 
batch
 Key: DRILL-5728
 URL: https://issues.apache.org/jira/browse/DRILL-5728
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Affects Versions: 1.11.0
Reporter: Boaz Ben-Zvi
Priority: Minor


 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (the underlying column is non-nullable) and wastes 
considerable memory (8 bytes * 64K entries = 512 KB per aggregated value in a batch!)

Example query:

select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":


public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
    throws SchemaChangeException
{
    {
        IntHolder out11 = new IntHolder();
        {
            out11.value = vv8.getAccessor().get((incomingRowIdx));
        }
        IntHolder in = out11;
        work0.value = vv1.getAccessor().get((htRowIdx));
        BigIntHolder value = work0;
        work4.value = vv5.getAccessor().get((htRowIdx));
        BigIntHolder nonNullCount = work4;

        SumFunctions$IntSum_add: {
            nonNullCount.value = 1;
            value.value += in.value;
        }

        work0 = value;
        vv1.getMutator().set((htRowIdx), work0.value);
        work4 = nonNullCount;
        vv5.getMutator().set((htRowIdx), work4.value);
    }
}
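
The memory figure in the report (a bigint value vector holds 8 bytes per entry, and a batch has 64K entries) can be checked directly; the class below is plain arithmetic, not Drill code:

```java
// Verifies the claimed waste: one redundant bigint value vector per
// aggregated value costs 8 bytes * 64K entries = 512 KiB per batch.
public class VectorMemoryCheck {

    static long bigintVectorBytes(int batchEntries) {
        return 8L * batchEntries;  // 8 bytes per bigint entry
    }

    public static void main(String[] args) {
        System.out.println(bigintVectorBytes(64 * 1024));  // 524288 bytes = 512 KiB
    }
}
```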


 





[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{{public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}}}


 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":


public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}


 



[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{quote}public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}{quote}


 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{quote}{{public void updateAggrValuesInternal(int incomingRowIdx, int 
htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}}}{quote}


 



[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{quote}{{public void updateAggrValuesInternal(int incomingRowIdx, int 
htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}}}{quote}


 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkry;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{{public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}}}


 



[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

bq. public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
bq. throws SchemaChangeException
bq. {
bq. {
bq. IntHolder out11 = new IntHolder();
bq. {
bq. out11 .value = vv8 .getAccessor().get((incomingRowIdx));
bq. }
bq. IntHolder in = out11;
bq. work0 .value = vv1 .getAccessor().get((htRowIdx));
bq. BigIntHolder value = work0;
bq. work4 .value = vv5 .getAccessor().get((htRowIdx));
bq. BigIntHolder nonNullCount = work4;
bq.  
bq. SumFunctions$IntSum_add: {
bq. nonNullCount.value = 1;
bq. value.value += in.value;
bq. }
bq.  
bq. work0 = value;
bq. vv1 .getMutator().set((htRowIdx), work0 .value);
bq. work4 = nonNullCount;
bq. vv5 .getMutator().set((htRowIdx), work4 .value);
bq. }
bq. }


 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{quote}public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}{quote}


 


> Hash Aggregate: Useless bigint value vector in the values batch
> ---
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the 
> code generation creates an extra value vector (in addition to the actual 
> "sum" vector) which is used as a "nonNullCount".
>This is useless (as the underlying column is non-nullable), and wastes 
> considerable memory ( 8 * 64K = 512K per each value in a batch !!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
> l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* 
> is only used to hold a *1* flag to note "not null":
> bq. public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
> bq. throws SchemaChangeException
> bq. {
> bq. {
> bq. IntHolder out11 = new IntHolder();
> bq. {
> bq. out11 .value = vv8 
> .getAccessor().get((incomingRowIdx));
> bq. }
> bq. IntHolder in = out11;
> bq. work0 .value = vv1 .getAccessor().get((htRowIdx));
> bq. BigIntHolder value = work0;
> bq. work4 .value = vv5 .getAccessor().get((htRowIdx));
> bq. BigIntHolder nonNullCount = work4;
> bq.

[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":




 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

bq. public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
bq. throws SchemaChangeException
bq. {
bq. {
bq. IntHolder out11 = new IntHolder();
bq. {
bq. out11 .value = vv8 .getAccessor().get((incomingRowIdx));
bq. }
bq. IntHolder in = out11;
bq. work0 .value = vv1 .getAccessor().get((htRowIdx));
bq. BigIntHolder value = work0;
bq. work4 .value = vv5 .getAccessor().get((htRowIdx));
bq. BigIntHolder nonNullCount = work4;
bq.  
bq. SumFunctions$IntSum_add: {
bq. nonNullCount.value = 1;
bq. value.value += in.value;
bq. }
bq.  
bq. work0 = value;
bq. vv1 .getMutator().set((htRowIdx), work0 .value);
bq. work4 = nonNullCount;
bq. vv5 .getMutator().set((htRowIdx), work4 .value);
bq. }
bq. }


 


> Hash Aggregate: Useless bigint value vector in the values batch
> ---
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the 
> code generation creates an extra value vector (in addition to the actual 
> "sum" vector) which is used as a "nonNullCount".
>This is useless (as the underlying column is non-nullable), and wastes 
> considerable memory ( 8 * 64K = 512K per each value in a batch !!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
> l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* 
> is only used to hold a *1* flag to note "not null":
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5728:

Description: 
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":

{code}
public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}

{code}
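The generated method above can be modeled in plain Java to see why the second vector is redundant for a NOT NULL column. This is a minimal sketch with arrays standing in for Drill's value vectors (vv1 = sums, vv5 = nonNullCount); the class and method names are illustrative, not Drill's actual codegen API:

```java
// Simplified model of updateAggrValuesInternal() for one NOT NULL int column:
// the nonNullCount slot is unconditionally set to 1, so for a non-nullable
// input it never carries any information, yet costs 8 bytes per row.
public class SumUpdateSketch {

    static void update(long[] sums, long[] nonNullCount, int in, int htRowIdx) {
        nonNullCount[htRowIdx] = 1;   // redundant when the column is NOT NULL
        sums[htRowIdx] += in;         // the actual SUM accumulation
    }

    public static void main(String[] args) {
        long[] sums = new long[4];
        long[] flags = new long[4];   // 8 bytes per row just to hold 0 or 1
        update(sums, flags, 10, 0);
        update(sums, flags, 5, 0);
        System.out.println(sums[0] + " " + flags[0]);  // 15 1
    }
}
```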


 

  was:
 When aggregating a non-nullable column (like *sum(l_partkey)* below), the code 
generation creates an extra value vector (in addition to the actual "sum" 
vector) which is used as a "nonNullCount".
   This is useless (as the underlying column is non-nullable), and wastes 
considerable memory ( 8 * 64K = 512K per each value in a batch !!)

Example query:

{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
l_orderkey;}}


And as can be seen in the generated code below, the bigint value vector *vv5* 
is only used to hold a *1* flag to note "not null":




 


> Hash Aggregate: Useless bigint value vector in the values batch
> ---
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the 
> code generation creates an extra value vector (in addition to the actual 
> "sum" vector) which is used as a "nonNullCount".
>This is useless (as the underlying column is non-nullable), and wastes 
> considerable memory ( 8 * 64K = 512K per each value in a batch !!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
> l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* 
> is only used to hold a *1* flag to note "not null":
> {code}
> public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
> throws SchemaChangeException
> {
> {
> IntHolder out11 = new IntHolder();
> {
> out11 .value = vv8 .getAccessor().get((incomingRowIdx));
> }
> IntHolder in = out11;
> work0 .value = vv1 .getAccessor().get((htRowIdx));
> BigIntHolder value = work0;
> work4 .value = vv5 .getAccessor().get((htRowIdx));
> BigIntHolder nonNullCount = work4;
>  
> SumFunctions$IntSum_add: {
> nonNullCount.value = 1;
> value.value += in.value;
> }
>  
> work0 = value;
> vv1 .getMutator().set((htRowIdx), work0 .value);
> work4 = nonNullCount;
> vv5 .getMutator().set((htRowIdx), work4 .value);
> }
> }
> {code}
>  





[jira] [Commented] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131478#comment-16131478
 ] 

Boaz Ben-Zvi commented on DRILL-5728:
-

Similar code is used when the underlying value column is nullable (see below). 
In this case the additional value vector may be needed, but it could perhaps be 
replaced by a bitset instead of a bigint vector to save memory.

{code}
public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
NullableBigIntHolder out11 = new NullableBigIntHolder();
{
out11 .isSet = vv8 .getAccessor().isSet((incomingRowIdx));
if (out11 .isSet == 1) {
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
}
NullableBigIntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
 
SumFunctions$NullableBigIntSum_add: {
sout:
{
if (in.isSet == 0) {
break sout;
}
nonNullCount.value = 1;
value.value += in.value;
}
}
 
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}
{code}
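As a rough illustration of the bitset idea: track "a non-null value was seen" per hash-table row with one bit instead of an 8-byte bigint slot, i.e. 64K bits = 8KB per batch versus 8 * 64K = 512KB. This is a sketch with plain arrays and java.util.BitSet, not Drill's value-vector API; the Long-as-nullable-input convention is illustrative only:

```java
import java.util.BitSet;

// Sketch of a nullable SUM where a BitSet replaces the bigint nonNullCount
// vector; a null input models a SQL NULL and leaves both sum and bit alone.
public class NullableSumSketch {

    static void add(long[] sums, BitSet nonNull, Long in, int htRowIdx) {
        if (in == null) {
            return;                   // a SQL NULL does not touch the sum
        }
        nonNull.set(htRowIdx);        // one bit replaces the bigint nonNullCount
        sums[htRowIdx] += in;
    }

    public static void main(String[] args) {
        long[] sums = new long[65536];
        BitSet nonNull = new BitSet(65536);
        add(sums, nonNull, 7L, 3);
        add(sums, nonNull, null, 3);
        add(sums, nonNull, 2L, 3);
        System.out.println(sums[3] + " " + nonNull.get(3) + " " + nonNull.get(4));
        // 9 true false
    }
}
```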


> Hash Aggregate: Useless bigint value vector in the values batch
> ---
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the 
> code generation creates an extra value vector (in addition to the actual 
> "sum" vector) which is used as a "nonNullCount".
>This is useless (as the underlying column is non-nullable), and wastes 
> considerable memory ( 8 * 64K = 512K per each value in a batch !!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
> l_orderkry;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* 
> is only used to hold a *1* flag to note "not null":
> {code}
> public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
> throws SchemaChangeException
> {
> {
> IntHolder out11 = new IntHolder();
> {
> out11 .value = vv8 .getAccessor().get((incomingRowIdx));
> }
> IntHolder in = out11;
> work0 .value = vv1 .getAccessor().get((htRowIdx));
> BigIntHolder value = work0;
> work4 .value = vv5 .getAccessor().get((htRowIdx));
> BigIntHolder nonNullCount = work4;
>  
> SumFunctions$IntSum_add: {
> nonNullCount.value = 1;
> value.value += in.value;
> }
>  
> work0 = value;
> vv1 .getMutator().set((htRowIdx), work0 .value);
> work4 = nonNullCount;
> vv5 .getMutator().set((htRowIdx), work4 .value);
> }
> }
> {code}
>  





[jira] [Created] (DRILL-5729) Fix Travis Checks

2017-08-17 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-5729:
-

 Summary: Fix Travis Checks
 Key: DRILL-5729
 URL: https://issues.apache.org/jira/browse/DRILL-5729
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas
 Fix For: 1.12.0


Currently the Travis checks are failing. The failures are happening because 
Travis recently switched their default build containers from Ubuntu Precise to 
Ubuntu Trusty, and we do not explicitly define the dist we build on in our 
.travis.yml.





[jira] [Commented] (DRILL-5729) Fix Travis Checks

2017-08-17 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131482#comment-16131482
 ] 

Timothy Farkas commented on DRILL-5729:
---

The fix would be to explicitly use Ubuntu Precise until we understand why 
Ubuntu Trusty has issues.
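A minimal sketch of what pinning the container might look like in .travis.yml. The `dist` key is standard Travis CI configuration; the language/jdk entries here are assumptions for illustration, not Drill's actual file:

```yaml
# Hypothetical .travis.yml fragment: pin the build container to Ubuntu
# Precise instead of the new Trusty default, as proposed above.
dist: precise
language: java
jdk:
  - oraclejdk7
```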

> Fix Travis Checks
> -
>
> Key: DRILL-5729
> URL: https://issues.apache.org/jira/browse/DRILL-5729
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
> Fix For: 1.12.0
>
>
> Currently the Travis Checks are failing. The failures are happening because 
> Travis recently switched their default build containers from Ubuntu precise 
> to Ubuntu Trusty and we do not explicitly define the dist we build on in our 
> travis.yml





[jira] [Commented] (DRILL-5729) Fix Travis Checks

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131484#comment-16131484
 ] 

ASF GitHub Bot commented on DRILL-5729:
---

GitHub user ilooner-mapr opened a pull request:

https://github.com/apache/drill/pull/913

 - DRILL-5729 Fix Travis Build



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner-mapr/drill DRILL-5729

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/913.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #913


commit b52188f8091a3b22021cb1efaa42f7ebb1b2792d
Author: Timothy Farkas 
Date:   2017-08-17T23:37:24Z

 - DRILL-5729 explicitly made the travis container ubuntu precise to fix 
build errors caused by ubuntu trusty.




> Fix Travis Checks
> -
>
> Key: DRILL-5729
> URL: https://issues.apache.org/jira/browse/DRILL-5729
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
> Fix For: 1.12.0
>
>
> Currently the Travis Checks are failing. The failures are happening because 
> Travis recently switched their default build containers from Ubuntu precise 
> to Ubuntu Trusty and we do not explicitly define the dist we build on in our 
> travis.yml





[jira] [Commented] (DRILL-5588) Hash Aggregate: Avoid copy on output of aggregate columns

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131486#comment-16131486
 ] 

Boaz Ben-Zvi commented on DRILL-5588:
-

See DRILL-5728 : The generated code allocates a special bigint value vector to 
hold the "nullable" bits for the "real" values value vector. This 
implementation would need to be changed (to a nullable value vector for the 
values) so we could return the values value vector as is.
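For the AVG case mentioned in DRILL-5588, finalizing in place means overwriting the SUM slots with SUM/COUNT so the same buffer can be handed downstream without a copy. A sketch under the assumption that plain arrays stand in for Drill's value vectors (names are illustrative):

```java
// Finalize AVG in place: divide each SUM slot by its COUNT and return the
// same array, avoiding the new allocation and copy done today on output.
public class AvgInPlaceSketch {

    static double[] finalizeAvg(double[] sums, long[] counts) {
        for (int i = 0; i < sums.length; i++) {
            if (counts[i] != 0) {
                sums[i] = sums[i] / counts[i];  // reuse the SUM slot for AVG
            }
        }
        return sums;                            // same array, no new allocation
    }

    public static void main(String[] args) {
        double[] sums = {10.0, 9.0};
        long[] counts = {2, 3};
        System.out.println(java.util.Arrays.toString(finalizeAvg(sums, counts)));
        // [5.0, 3.0]
    }
}
```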
  

> Hash Aggregate: Avoid copy on output of aggregate columns
> -
>
> Key: DRILL-5588
> URL: https://issues.apache.org/jira/browse/DRILL-5588
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Boaz Ben-Zvi
>
>  When the Hash Aggregate operator outputs its result batches downstream, the 
> key columns (value vectors) are returned as is, but for the aggregate columns 
> new value vectors are allocated and the values are copied. This has an impact 
> on performance. (see the method allocateOutgoing() ). A second effect is on 
> memory management (as this allocation is not planned for by the code that 
> controls spilling, etc).
>For some simple aggregate functions (e.g. SUM), the stored value vectors 
> for the aggregate values can be returned as is. For functions like AVG, there 
> is a need to divide the SUM values by the COUNT values. Still this can be 
> done in-place (of the SUM values) and avoid new allocation and copy. 
>For VarChar type aggregate values (only used by MAX or MIN), there is 
> another issue -- currently any such value vector is allocated as an 
> ObjectVector (see BatchHolder()) (and on the JVM heap, not in direct memory). 
> This is to manage the sizes of the values, which could change as the 
> aggregation progresses (e.g., for MAX(name) -- first record has 'abe', but 
> the next record has 'benjamin' which is both bigger ('b' > 'a') and longer). 
> For the final output, this requires a new allocation and a copy in order to 
> have a compact value vector in direct memory. Maybe the ObjectVector could be 
> replaced with some direct memory implementation that is optimized for "good" 
> values (e.g., all are of similar size), but penalized "bad" values (e.g., 
> reallocates or moves values, when needed) ?





[jira] [Commented] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch

2017-08-17 Thread Boaz Ben-Zvi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131488#comment-16131488
 ] 

Boaz Ben-Zvi commented on DRILL-5728:
-

This Jira has some relation to DRILL-5588 -- we'd need to change the "values" 
value vector ( *vv1* ) to a nullable value vector so it could be returned as is 
with no need to iterate over the whole vector when producing an output.


> Hash Aggregate: Useless bigint value vector in the values batch
> ---
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the 
> code generation creates an extra value vector (in addition to the actual 
> "sum" vector) which is used as a "nonNullCount".
>This is useless (as the underlying column is non-nullable), and wastes 
> considerable memory ( 8 * 64K = 512K per each value in a batch !!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by 
> l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* 
> is only used to hold a *1* flag to note "not null":
> {code}
> public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
> throws SchemaChangeException
> {
> {
> IntHolder out11 = new IntHolder();
> {
> out11 .value = vv8 .getAccessor().get((incomingRowIdx));
> }
> IntHolder in = out11;
> work0 .value = vv1 .getAccessor().get((htRowIdx));
> BigIntHolder value = work0;
> work4 .value = vv5 .getAccessor().get((htRowIdx));
> BigIntHolder nonNullCount = work4;
>  
> SumFunctions$IntSum_add: {
> nonNullCount.value = 1;
> value.value += in.value;
> }
>  
> work0 = value;
> vv1 .getMutator().set((htRowIdx), work0 .value);
> work4 = nonNullCount;
> vv5 .getMutator().set((htRowIdx), work4 .value);
> }
> }
> {code}
>  





[jira] [Commented] (DRILL-5729) Fix Travis Checks

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131506#comment-16131506
 ] 

ASF GitHub Bot commented on DRILL-5729:
---

Github user ilooner-mapr commented on the issue:

https://github.com/apache/drill/pull/913
  
@parthchandra 


> Fix Travis Checks
> -
>
> Key: DRILL-5729
> URL: https://issues.apache.org/jira/browse/DRILL-5729
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
> Fix For: 1.12.0
>
>
> Currently the Travis Checks are failing. The failures are happening because 
> Travis recently switched their default build containers from Ubuntu precise 
> to Ubuntu Trusty and we do not explicitly define the dist we build on in our 
> travis.yml





[jira] [Created] (DRILL-5730) Fix Unit Test failures on JDK 8 And Some JDK 7 versions

2017-08-17 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-5730:
-

 Summary: Fix Unit Test failures on JDK 8 And Some JDK 7 versions
 Key: DRILL-5730
 URL: https://issues.apache.org/jira/browse/DRILL-5730
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Tests fail on JDK 8 and Oracle JDK 7 on my Mac.

Failed tests: 
  TestMetadataProvider.tables:153 expected: but was:
  TestMetadataProvider.tablesWithTableNameFilter:212 expected: but 
was:
  TestMetadataProvider.tablesWithSystemTableFilter:187 expected: but 
was:
  TestMetadataProvider.tablesWithTableFilter:176 expected: but was:

Tests in error: 
  TestInfoSchema.selectFromAllTables » UserRemote SYSTEM ERROR: 
URISyntaxExcepti...
  TestCustomUserAuthenticator.positiveUserAuth » UserRemote SYSTEM ERROR: 
URISyn...
  TestCustomUserAuthenticator.positiveUserAuthAfterNegativeUserAuth » UserRemote
  TestViewSupport.infoSchemaWithView:350->BaseTestQuery.testRunAndReturn:344 » 
Rpc
  TestParquetScan.testSuccessFile:58->BaseTestQuery.testRunAndReturn:344 » Rpc 
o...








[jira] [Resolved] (DRILL-5268) SYSTEM ERROR: UnsupportedOperationException: Unable to get size for minor type [MAP] and mode [REQUIRED]

2017-08-17 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5268.

   Resolution: Fixed
Fix Version/s: 1.12.0

> SYSTEM ERROR: UnsupportedOperationException: Unable to get size for minor 
> type [MAP] and mode [REQUIRED]
> 
>
> Key: DRILL-5268
> URL: https://issues.apache.org/jira/browse/DRILL-5268
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
> Fix For: 1.12.0
>
> Attachments: drill5268.tgz
>
>
> git.commit.id.abbrev=300e934
> With the managed external sort turned on, I get the below error
> {code}
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.memory.max_query_memory_per_node` = 52428800;
> select * from (select d1.type, d1.evnt, d1.transaction from (select d.type 
> type, flatten(d.events) evnt, flatten(d.transactions) transaction from 
> dfs.`/drill/testdata/resource-manager/10rows/data.json` d) d1 order by 
> d1.evnt.event_time, d1.transaction.trans_time) d2 where d2.type='web' and 
> d2.evnt.evnt_type = 'cmpgn4';
> Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get size for 
> minor type [MAP] and mode [REQUIRED]
> Fragment 0:0
> [Error Id: a9dc1de5-2ff7-44db-bdd2-b166b1f0cea8 on qa-node183.qa.lab:31010] 
> (state=,code=0)
> {code}
> If we do not enable the managed sort, then we end up with DRILL-5234





[jira] [Assigned] (DRILL-5268) SYSTEM ERROR: UnsupportedOperationException: Unable to get size for minor type [MAP] and mode [REQUIRED]

2017-08-17 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5268:
--

Assignee: Paul Rogers

> SYSTEM ERROR: UnsupportedOperationException: Unable to get size for minor 
> type [MAP] and mode [REQUIRED]
> 
>
> Key: DRILL-5268
> URL: https://issues.apache.org/jira/browse/DRILL-5268
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: drill5268.tgz
>
>
> git.commit.id.abbrev=300e934
> With the managed external sort turned on, I get the below error
> {code}
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.memory.max_query_memory_per_node` = 52428800;
> select * from (select d1.type, d1.evnt, d1.transaction from (select d.type 
> type, flatten(d.events) evnt, flatten(d.transactions) transaction from 
> dfs.`/drill/testdata/resource-manager/10rows/data.json` d) d1 order by 
> d1.evnt.event_time, d1.transaction.trans_time) d2 where d2.type='web' and 
> d2.evnt.evnt_type = 'cmpgn4';
> Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get size for 
> minor type [MAP] and mode [REQUIRED]
> Fragment 0:0
> [Error Id: a9dc1de5-2ff7-44db-bdd2-b166b1f0cea8 on qa-node183.qa.lab:31010] 
> (state=,code=0)
> {code}
> If we do not enable the managed sort, then we end up with DRILL-5234





[jira] [Resolved] (DRILL-5253) External sort fails with OOM error (Fails to allocate sv2)

2017-08-17 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5253.

   Resolution: Fixed
Fix Version/s: 1.12.0

> External sort fails with OOM error (Fails to allocate sv2)
> --
>
> Key: DRILL-5253
> URL: https://issues.apache.org/jira/browse/DRILL-5253
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 2762f36d-a2e7-5582-922d-3c4626be18c0.sys.drill
>
>
> git.commit.id.abbrev=2af709f
> The data set used in the below query has the same value for every column in 
> every row. The query fails with an OOM as it exceeds the allocated memory
> {code}
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 104857600;
>  select count(*) from (select * from identical order by col1, col2, col3, 
> col4, col5, col6, col7, col8, col9, col10);
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
> buffer after repeated attempts
> Fragment 2:0
> [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Exception from the logs
> {code}
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
> buffer after repeated attempts
> [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
> buffer after repeated attempts
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:371)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)

[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131756#comment-16131756
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r133859807
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java 
---
@@ -96,20 +145,46 @@ public static String sqlToRegexLike(
 || (nextChar == '%')
 || (nextChar == escapeChar)) {
   javaPattern.append(nextChar);
+  simplePattern.append(nextChar);
   i++;
 } else {
   throw invalidEscapeSequence(sqlPattern, i);
 }
   } else if (c == '_') {
+// if we find _, it is not simple pattern, we are looking for only %
+notSimple = true;
 javaPattern.append('.');
   } else if (c == '%') {
+if (i == 0) {
+  // % at the start could potentially be one of the simple cases i.e. ENDS_WITH.
+  endsWith = true;
+} else if (i == (len-1)) {
+  // % at the end could potentially be one of the simple cases i.e. STARTS_WITH
+  startsWith = true;
+} else {
+  // If we find % anywhere other than start or end, it is not a simple case.
--- End diff --

Consider ABC%XYZ.
It might be worthwhile to decide whether to use a simple pattern or fall back 
to Java's regex util based on the number of occurrences of '%' as the criterion.


> Improve performance of filter operator for pattern matching
> ---
>
> Key: DRILL-5697
> URL: https://issues.apache.org/jira/browse/DRILL-5697
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library for 
> pattern matching. However, for cases like %abc (ends with abc), abc% (starts 
> with abc), and %abc% (contains abc), implementing the match with simple 
> string code instead of the regex library gives a good performance boost 
> (4-6x). The idea is to use special-case code for these simple, common cases 
> and fall back to the Java regex library for complicated ones, which provides 
> a good performance benefit for the most common cases.
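
The special-casing described in this Jira can be sketched as follows. This is
a minimal illustration only; the class name `SimpleLikeMatcher` and its
structure are hypothetical, not Drill's actual implementation (which lives in
`RegexpUtil` and the pattern-matcher classes of the PR):

```java
// Hypothetical sketch of the special-case idea; not Drill's actual code.
public class SimpleLikeMatcher {

  // Matches input against a SQL LIKE pattern, handling the common shapes
  // abc%, %abc and %abc% with plain string operations and falling back to
  // java.util.regex for anything more complex.
  // (Regex metacharacter escaping is omitted for brevity.)
  public static boolean like(String input, String sqlPattern) {
    if (sqlPattern.equals("%")) {
      return true;                       // bare % matches everything
    }
    boolean leading = sqlPattern.startsWith("%");
    boolean trailing = sqlPattern.endsWith("%");
    String body = sqlPattern.substring(leading ? 1 : 0,
        trailing ? sqlPattern.length() - 1 : sqlPattern.length());
    // Simple cases: no wildcards left inside the pattern body.
    if (!body.contains("%") && !body.contains("_")) {
      if (leading && trailing) {
        return input.contains(body);     // %abc% : CONTAINS
      } else if (trailing) {
        return input.startsWith(body);   // abc%  : STARTS_WITH
      } else if (leading) {
        return input.endsWith(body);     // %abc  : ENDS_WITH
      }
      return input.equals(body);         // no wildcard at all
    }
    // Complicated pattern, e.g. ABC%XYZ or anything with '_':
    // fall back to the regex library.
    String regex = sqlPattern.replace("%", ".*").replace("_", ".");
    return input.matches(regex);
  }
}
```

The simple branches avoid regex compilation and backtracking entirely, which
is where the reported 4-6x speedup on common patterns would come from.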





[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131755#comment-16131755
 ] 

ASF GitHub Bot commented on DRILL-5697:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/907#discussion_r133859377
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java 
---
@@ -47,18 +47,55 @@
   "[:alnum:]", "\\p{Alnum}"
   };
 
+  // type of pattern string.
+  public enum sqlPatternType {
+STARTS_WITH, // Starts with a constant string followed by any string values (ABC%)
+ENDS_WITH, // Ends with a constant string, starts with any string values (%ABC)
+CONTAINS, // Contains a constant string, starts and ends with any string values (%ABC%)
--- End diff --

You should add a pattern of the form 'Starts with a constant, ends with 
another constant, and has any string in between' (ABC%XYZ).


> Improve performance of filter operator for pattern matching
> ---
>
> Key: DRILL-5697
> URL: https://issues.apache.org/jira/browse/DRILL-5697
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library for 
> pattern matching. However, for cases like %abc (ends with abc), abc% (starts 
> with abc), and %abc% (contains abc), implementing the match with simple 
> string code instead of the regex library gives a good performance boost 
> (4-6x). The idea is to use special-case code for these simple, common cases 
> and fall back to the Java regex library for complicated ones, which provides 
> a good performance benefit for the most common cases.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131762#comment-16131762
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/914

DRILL-5657: Size-aware vector writer structure

This large PR provides another two levels of foundation for size-aware 
vector writers in the Drill record readers. It combines code from two previous 
PRs:

* PR 866 - DRILL-5657: Implement size-aware result set loader
* PR 887 - DRILL-5688: Add repeated map support to column accessors

The PR then goes on to integrate the two prior PRs and provide additional 
functionality.

Like the two previous PRs, this one is divided into commits to group the 
work.

1. Accessor layer
2. Row Set layer
3. Tuple and Column Model layer
4. Row Set Loader layer
5. Secondary changes

Much of the material below appears in Javadoc throughout the code. The 
material here is not meant to replace that documentation. Instead, it is meant 
to provide the “big picture”: placing the bits and pieces in context and 
pointing out interesting functionality to explore in each layer.

## Commit 1: The Accessor Layer

The first commit provides the core of the mechanism: the writers that put 
data into vectors, and the readers that retrieve that data. The version here is 
an evolution of the version provided in an earlier PR a few months ago.

### Overview of the Drill Vector Data Model

The code will make much more sense if we start with a review of Drill’s 
complex vector data model. Drill has 38+ data (“minor”) types as defined in the 
[proto 
buf](https://github.com/apache/drill/blob/master/protocol/src/main/protobuf/Types.proto)
 definition. Drill also has three cardinalities (“modes”) defined in the same 
file. The result is more than 120 different vector types. Then, when you add maps, 
repeated maps, lists and repeated lists, you rapidly get an explosion of types 
that the writer code must handle.

Vectors can be categorized along multiple dimensions:

* By data (minor) type
* By cardinality (mode)
* By fixed or variable width

A repeated map, a list, a repeated list, and any array (repeated) scalar are all 
array-like. Nullable and required modes are identical (single values), but a 
nullable vector has an additional is-set (“bit”) vector.

A key contribution of this PR is the data model used to organize vectors.

* Both the top-level row, and a Drill map are “tuples” and are treated 
similarly in the model.
* All non-map, non-list (that is, scalar) data types are treated uniformly.
* All arrays (whether a list, a repeated list, a repeated map, or a 
repeated scalar) are treated uniformly.

### Accessor Data Model

The above leads to a very simple, JSON-like data model, introduced in this 
PR.

* A tuple reader or writer models a row (usually via a subclass). Columns 
are accessible by name or position.
* Every column is modeled as an object.
* The object can have an object type: scalar, tuple or array.
* An array has a single element type (but many run-time elements).
* A scalar can be nullable or not, and provides a uniform get/set interface.
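
The points above can be pictured with a toy model. This is a conceptual
sketch only; the real interfaces live in
`org.apache.drill.exec.vector.accessor`, and the names here
(`ObjectType`, `ColumnObject`, `AccessorModelDemo`) are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the JSON-like object structure: every column is an object
// that is a scalar, a tuple, or an array. Not Drill's actual accessors.
enum ObjectType { SCALAR, TUPLE, ARRAY }

class ColumnObject {
  final ObjectType type;
  Object scalarValue;                                            // SCALAR
  final Map<String, ColumnObject> tuple = new LinkedHashMap<>(); // TUPLE
  final List<ColumnObject> array = new ArrayList<>();            // ARRAY

  ColumnObject(ObjectType type) { this.type = type; }

  static ColumnObject scalar(Object v) {
    ColumnObject c = new ColumnObject(ObjectType.SCALAR);
    c.scalarValue = v;
    return c;
  }
}

public class AccessorModelDemo {
  public static void main(String[] args) {
    // A row is a tuple; a Drill map column is also a tuple,
    // so both are handled uniformly.
    ColumnObject row = new ColumnObject(ObjectType.TUPLE);
    row.tuple.put("name", ColumnObject.scalar("drill"));

    // A repeated map is just an array whose element type is tuple.
    ColumnObject addresses = new ColumnObject(ObjectType.ARRAY);
    ColumnObject addr = new ColumnObject(ObjectType.TUPLE);
    addr.tuple.put("city", ColumnObject.scalar("San Jose"));
    addresses.array.add(addr);
    row.tuple.put("addresses", addresses);

    System.out.println(row.tuple.get("name").scalarValue);  // prints "drill"
  }
}
```

The uniform treatment means client code asks each object for its type and
then drills down, exactly as a JSON reader walks objects and arrays.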

This data model is similar to, but has important differences from, the 
prior generated readers and writers. 

The object layer is new: it is the simplest way to model the three “object 
types.” An app using this code would use just the leaf scalar readers and 
writers.

Although there is quite a bit of code change here to provide the new 
structure, the core functionality of reading and writing to vectors has not 
changed much. And this code has extensive unit tests, which should avoid the 
need to "mentally execute" each line of code.

See the classes in `org.apache.drill.exec.vector.accessor` for details. In 
particular, please see the `package-info.java` file in that package for more 
information.

As before, the top package provides a set of interfaces; the inner packages 
provide the implementation. The `ColumnAccessors.java` template generates the 
per-vector code. Warning: this template has become quite cryptic: the best bet 
for review is to generate the Java code and review that.

### Writer Performance

During a previous review, we discussed ways to optimize writer performance. 
This PR has two improvements:

* Completely rework the writers to minimize code steps
* Rework the “column loaders” to eliminate them: instead of two additional 
method calls, the “loader” now uses the column writers directly.

Much behind-the-scenes rework was needed to accomplish the above.
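
The "column loader" elimination can be pictured with a toy before/after
sketch. All names here (`ScalarWriter`, `IntVectorWriter`, `ColumnLoader`)
are hypothetical stand-ins, not the actual Drill classes:

```java
// Toy illustration of removing the "loader" indirection described above.
public class WriterPathDemo {
  interface ScalarWriter { void setInt(int v); }

  // Stand-in for a per-vector writer that stores values.
  static class IntVectorWriter implements ScalarWriter {
    int lastValue;
    public void setInt(int v) { lastValue = v; }
  }

  // Before: an extra "loader" layer added method calls per value
  // without adding behavior.
  static class ColumnLoader {
    private final ScalarWriter writer;
    ColumnLoader(ScalarWriter writer) { this.writer = writer; }
    void load(int v) { writer.setInt(v); }   // pure pass-through
  }

  public static void main(String[] args) {
    IntVectorWriter writer = new IntVectorWriter();
    new ColumnLoader(writer).load(41);   // old path: loader -> writer
    writer.setInt(42);                   // new path: writer called directly
    System.out.println(writer.lastValue);
  }
}
```

Removing the pass-through layer matters because these calls sit on the
per-value hot path of every record reader.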

Column readers, however, were left with their existing structure.

[jira] [Commented] (DRILL-5688) Add repeated map support to column accessors

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131764#comment-16131764
 ] 

ASF GitHub Bot commented on DRILL-5688:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/887
  
Closing as this PR is now superseded by #914.


> Add repeated map support to column accessors
> 
>
> Key: DRILL-5688
> URL: https://issues.apache.org/jira/browse/DRILL-5688
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> DRILL-5211 describes how Drill runs into OOM issues due to Drill's two 
> allocators: Netty and Unsafe. That JIRA also describes the solution: limit 
> vectors to 16 MB in length (with the eventual goal of limiting overall batch 
> size.) DRILL-5517 added "size-aware" support to the column accessors created 
> to parallel Drill's existing readers and writers. (The parallel 
> implementation ensures that we don't break existing code that uses the 
> existing mechanism; same as we did for the external sort.)
> This ticket describes work to extend the column accessors to handle repeated 
> maps and lists. Key themes:
> * Define a common metadata schema for use in this layer and the "result set 
> loader" of DRILL-5657. This schema layer builds on top of the existing schema 
> to add the kind of metadata needed here and by the "sizer" created for the 
> external sort.
> * Define a JSON-like reader and writer structure that supports the full Drill 
> data model semantics. (The earlier version focused on the scalar types and 
> arrays of scalars to prove the concept of limiting vector sizes.)
> * Revising test code to use the revised column writer structure.
> Implementation details appear in the PR.





[jira] [Commented] (DRILL-5688) Add repeated map support to column accessors

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131765#comment-16131765
 ] 

ASF GitHub Bot commented on DRILL-5688:
---

Github user paul-rogers closed the pull request at:

https://github.com/apache/drill/pull/887


> Add repeated map support to column accessors
> 
>
> Key: DRILL-5688
> URL: https://issues.apache.org/jira/browse/DRILL-5688
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> DRILL-5211 describes how Drill runs into OOM issues due to Drill's two 
> allocators: Netty and Unsafe. That JIRA also describes the solution: limit 
> vectors to 16 MB in length (with the eventual goal of limiting overall batch 
> size.) DRILL-5517 added "size-aware" support to the column accessors created 
> to parallel Drill's existing readers and writers. (The parallel 
> implementation ensures that we don't break existing code that uses the 
> existing mechanism; same as we did for the external sort.)
> This ticket describes work to extend the column accessors to handle repeated 
> maps and lists. Key themes:
> * Define a common metadata schema for use in this layer and the "result set 
> loader" of DRILL-5657. This schema layer builds on top of the existing schema 
> to add the kind of metadata needed here and by the "sizer" created for the 
> external sort.
> * Define a JSON-like reader and writer structure that supports the full Drill 
> data model semantics. (The earlier version focused on the scalar types and 
> arrays of scalars to prove the concept of limiting vector sizes.)
> * Revising test code to use the revised column writer structure.
> Implementation details appear in the PR.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131766#comment-16131766
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/866
  
Closing as this PR is now superseded by #914.


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
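
The size-aware contract described above can be sketched roughly as follows.
The class name and the exact overflow behavior are illustrative assumptions;
Drill's real `setScalar()`/`setArray()` methods from DRILL-5517 may differ:

```java
import java.nio.charset.StandardCharsets;

// Rough sketch of a size-capped vector writer; not Drill's actual API.
public class SizeAwareWriter {
  private final byte[] buffer;   // stands in for a value vector's memory
  private int writeOffset;

  // Drill's cap is 16 MB; the cap is a parameter here so overflow is
  // easy to demonstrate.
  public SizeAwareWriter(int maxBytes) {
    this.buffer = new byte[maxBytes];
  }

  // Returns false when the value would push the vector past its cap,
  // signaling the caller to close out the batch -- possibly mid-row,
  // the case the "row mutator" must handle.
  public boolean setScalar(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    if (writeOffset + bytes.length > buffer.length) {
      return false;                      // vector full: start a new batch
    }
    System.arraycopy(bytes, 0, buffer, writeOffset, bytes.length);
    writeOffset += bytes.length;
    return true;
  }

  public int bytesUsed() { return writeOffset; }
}
```

A row-set loader built on such writers can detect overflow in the middle of
a row and roll the in-flight row over to the next batch, which is exactly
the hard case this Jira calls out.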





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131767#comment-16131767
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user paul-rogers closed the pull request at:

https://github.com/apache/drill/pull/866


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.


