[jira] [Updated] (DRILL-6164) Heap memory leak during parquet scan and OOM

2018-02-16 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6164:
--
Labels: ready-to-commit  (was: )

> Heap memory leak during parquet scan and OOM
> 
>
> Key: DRILL-6164
> URL: https://issues.apache.org/jira/browse/DRILL-6164
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> During a scan of a large set of parquet files, Drill iterates over the set, 
> initializing parquet readers. Such initialization may require significant 
> memory (both heap and direct). When the scan moves to the next parquet file 
> in the set, it neither removes the reference to the reader from the set it 
> iterates over nor removes references created during initialization.
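A minimal, hypothetical sketch of the leak pattern (invented names, not Drill's actual scan code): keeping every reader reachable through the iterated collection prevents garbage collection until the whole scan finishes, while clearing each slot after use releases it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the leak: the scan keeps a list of
// per-file readers and iterates over it, so every initialized reader
// stays strongly reachable until the entire scan completes.
public class ScanLeakSketch {

    // Stand-in for a parquet reader holding large buffers.
    static class Reader {
        final byte[] buffers = new byte[1024 * 1024]; // simulated footprint
    }

    // Leaky variant: reader references accumulate for the whole scan.
    static long scanLeaky(int files) {
        List<Reader> readers = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            readers.add(new Reader());   // initialized, never released
        }
        return readers.size();           // all readers still referenced
    }

    // Fixed variant: drop the reference once the file is consumed,
    // so each reader becomes collectible before the next is opened.
    static long scanFixed(int files) {
        List<Reader> readers = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            readers.add(new Reader());
        }
        for (int i = 0; i < readers.size(); i++) {
            Reader r = readers.get(i);
            // ... read the file via r ...
            readers.set(i, null);        // release the slot
        }
        long live = 0;
        for (Reader r : readers) {
            if (r != null) live++;
        }
        return live;                     // 0: nothing left referenced
    }

    public static void main(String[] args) {
        System.out.println(scanLeaky(8)); // 8
        System.out.println(scanFixed(8)); // 0
    }
}
```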



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6024) Use grace period only in production servers

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6024:


Assignee: Venkata Jyothsna Donapati

> Use grace period only in production servers
> ---
>
> Key: DRILL-6024
> URL: https://issues.apache.org/jira/browse/DRILL-6024
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> DRILL-4286 introduces graceful shutdown. Currently it is turned off by 
> default (grace period is set to 0), since turning it on by default affects 
> non-production systems; for example, unit test run time increased 3x.
> [~Paul.Rogers] proposed the following solution: 
> {quote}
> In a production system, we do want the grace period; it is an essential part 
> of the graceful shutdown procedure.
> However, if we are doing a non-graceful shutdown, the grace is unneeded.
> Also, if the cluster contains only one node (as in most unit tests), there is 
> nothing to wait for, so the grace period is not needed. The same is true in 
> an embedded Drillbit for Sqlline.
> So, can we provide a solution that handles these cases rather than simply 
> turning off the grace period always?
> If using the local cluster coordinator, say, then no grace is needed. If 
> using ZK, but there is only one Drillbit, no grace is needed. (There is a 
> race condition, but may be OK.)
> Or, if we detect we are embedded, no grace period.
> Then, also, if we are doing a graceful shutdown, we need the grace. But, if 
> we are doing a "classic" shutdown, no grace is needed.
> The result should be that the grace period is used only in production 
> servers, only when doing a graceful shutdown.
> {quote}
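A minimal sketch of the proposed policy, with invented names (this is not Drill's actual shutdown API): the grace period is applied only for a graceful shutdown of a multi-node, non-embedded, ZK-coordinated drillbit.

```java
// Hypothetical sketch of the proposed decision: grace applies only to a
// graceful shutdown of a production (multi-node, non-embedded,
// ZK-coordinated) drillbit. All names are invented for illustration.
public class GracePolicySketch {

    static int gracePeriodMillis(boolean gracefulShutdown,
                                 boolean embedded,
                                 boolean localCoordinator,
                                 int drillbitCount,
                                 int configuredGraceMillis) {
        if (!gracefulShutdown) return 0;   // "classic" shutdown: no wait
        if (embedded) return 0;            // embedded (sqlline): nothing to drain
        if (localCoordinator) return 0;    // local coordinator: single process
        if (drillbitCount <= 1) return 0;  // single-node cluster: no peers
        return configuredGraceMillis;      // production graceful shutdown
    }

    public static void main(String[] args) {
        // Unit-test-like single node: grace skipped even if configured.
        System.out.println(gracePeriodMillis(true, false, true, 1, 5000));  // 0
        // Production 3-node graceful shutdown: grace applied.
        System.out.println(gracePeriodMillis(true, false, false, 3, 5000)); // 5000
    }
}
```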





[jira] [Updated] (DRILL-6024) Use grace period only in production servers

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6024:
-
Fix Version/s: (was: 1.13.0)

> Use grace period only in production servers
> ---
>
> Key: DRILL-6024
> URL: https://issues.apache.org/jira/browse/DRILL-6024
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> DRILL-4286 introduces graceful shutdown. Currently it is turned off by 
> default (grace period is set to 0), since turning it on by default affects 
> non-production systems; for example, unit test run time increased 3x.
> [~Paul.Rogers] proposed the following solution: 
> {quote}
> In a production system, we do want the grace period; it is an essential part 
> of the graceful shutdown procedure.
> However, if we are doing a non-graceful shutdown, the grace is unneeded.
> Also, if the cluster contains only one node (as in most unit tests), there is 
> nothing to wait for, so the grace period is not needed. The same is true in 
> an embedded Drillbit for Sqlline.
> So, can we provide a solution that handles these cases rather than simply 
> turning off the grace period always?
> If using the local cluster coordinator, say, then no grace is needed. If 
> using ZK, but there is only one Drillbit, no grace is needed. (There is a 
> race condition, but may be OK.)
> Or, if we detect we are embedded, no grace period.
> Then, also, if we are doing a graceful shutdown, we need the grace. But, if 
> we are doing a "classic" shutdown, no grace is needed.
> The result should be that the grace period is used only in production 
> servers, only when doing a graceful shutdown.
> {quote}





[jira] [Updated] (DRILL-6044) Shutdown button does not work from WebUI

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6044:
-
Fix Version/s: 1.13.0

> Shutdown button does not work from WebUI
> 
>
> Key: DRILL-6044
> URL: https://issues.apache.org/jira/browse/DRILL-6044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.13.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: Screen Shot 2017-12-19 at 10.51.16 AM.png
>
>
> git.commit.id.abbrev=eb0c403
> Nothing happens when clicking the SHUTDOWN button in the WebUI.  The 
> browser's debugger showed that the request failed due to access control 
> checks (see the attached screenshot).





[jira] [Assigned] (DRILL-6009) No drillbits on index page

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6009:


Assignee: Venkata Jyothsna Donapati

> No drillbits on index page
> --
>
> Key: DRILL-6009
> URL: https://issues.apache.org/jira/browse/DRILL-6009
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Attachments: empty_drillbits.JPG
>
>
> After DRILL-4286, I once saw the index page show no drillbits at all even 
> though the cluster was working, so at least one drillbit was online 
> (empty_drillbits.JPG). After a refresh everything was fine.





[jira] [Assigned] (DRILL-6044) Shutdown button does not work from WebUI

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6044:


Assignee: Venkata Jyothsna Donapati

> Shutdown button does not work from WebUI
> 
>
> Key: DRILL-6044
> URL: https://issues.apache.org/jira/browse/DRILL-6044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.13.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: Screen Shot 2017-12-19 at 10.51.16 AM.png
>
>
> git.commit.id.abbrev=eb0c403
> Nothing happens when clicking the SHUTDOWN button in the WebUI.  The 
> browser's debugger showed that the request failed due to access control 
> checks (see the attached screenshot).





[jira] [Updated] (DRILL-6040) Need to add usage for graceful_stop to drillbit.sh

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6040:
-
Fix Version/s: 1.13.0

> Need to add usage for graceful_stop to drillbit.sh
> --
>
> Key: DRILL-6040
> URL: https://issues.apache.org/jira/browse/DRILL-6040
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=eb0c403
> Usage for graceful_stop is missing from drillbit.sh.
> ./drillbit.sh
> Usage: drillbit.sh [--config|--site <site-dir>] 
> (start|stop|status|restart|run) [args]





[jira] [Updated] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6039:
-
Fix Version/s: 1.13.0

> drillbit.sh graceful_stop does not wait for fragments to complete before 
> stopping the drillbit
> --
>
> Key: DRILL-6039
> URL: https://issues.apache.org/jira/browse/DRILL-6039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=eb0c403
> I have a 3-node cluster with a drillbit running on each node.  I kicked off 
> a long-running query.  In the middle of the query, I ran "./drillbit.sh 
> graceful_stop" on one of the non-foreman nodes.  The node was stopped within 
> a few seconds and the query failed with the error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15





[jira] [Assigned] (DRILL-6022) Improve js part for graceful shutdown

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6022:


Assignee: Venkata Jyothsna Donapati

> Improve js part for graceful shutdown
> -
>
> Key: DRILL-6022
> URL: https://issues.apache.org/jira/browse/DRILL-6022
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
>
> DRILL-4286 introduces graceful shutdown, but its js part needs improvement:
> a. ajax calls do not handle errors, so when an error occurs it is silently 
> swallowed.
> b. there is some unused and / or unnecessary variable usage.
> c. shutdown functionality is disabled when the user is not an admin, but 
> some other ajax calls are still executed, for example, port number, number 
> of queries, grace period. All of these can also be disabled when the user is 
> not an admin.
> d. there are many ajax calls which can be factored out into a dedicated js 
> file.
> Other fixes:
> a. all shutdown functionality resides in the DrillRoot class; it can be 
> factored out into a shutdown-specific class where all shutdown functionality 
> is restricted to admins at the class level rather than at the method level 
> as it is currently (see DRILL-6019).
> b. the issue described in DRILL-6021.





[jira] [Assigned] (DRILL-6021) Show shutdown button when authentication is not enabled

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6021:


Assignee: Venkata Jyothsna Donapati

> Show shutdown button when authentication is not enabled
> ---
>
> Key: DRILL-6021
> URL: https://issues.apache.org/jira/browse/DRILL-6021
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
>
> After DRILL-6017, {{shouldShowAdminInfo}} is used to decide whether the 
> shutdown button should be displayed on the index page. But this option is 
> set to true only when authentication is enabled and the user is an admin. 
> When authentication is not enabled, the user is an admin by default. So 
> without authentication the shutdown button is absent when it should be 
> present.
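The fix reduces to a one-line predicate; a hypothetical sketch (names invented, not Drill's actual code):

```java
// Sketch of the corrected visibility check: the shutdown button should
// appear when authentication is disabled (everyone is effectively admin)
// or when the authenticated user is an admin.
public class ShutdownButtonSketch {

    static boolean showShutdownButton(boolean authEnabled, boolean isAdmin) {
        return !authEnabled || isAdmin;
    }

    public static void main(String[] args) {
        System.out.println(showShutdownButton(false, false)); // true: no auth
        System.out.println(showShutdownButton(true, false));  // false
        System.out.println(showShutdownButton(true, true));   // true
    }
}
```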





[jira] [Updated] (DRILL-6010) Working drillbit showing as in QUIESCENT state

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6010:
-
Fix Version/s: 1.13.0

> Working drillbit showing as in QUIESCENT state
> --
>
> Key: DRILL-6010
> URL: https://issues.apache.org/jira/browse/DRILL-6010
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: online_vs_quiescent.JPG
>
>
> After DRILL-4286, I once hit a situation where, after running all functional 
> tests, three drillbits were in the ONLINE state and another one in 
> QUIESCENT. Yet I could run queries from the one in the quiescent state, so 
> it was in fact online. drillbit.sh stop could not shut it down and I had to 
> kill -9 the process (online_vs_quiescent.JPG).





[jira] [Assigned] (DRILL-6010) Working drillbit showing as in QUIESCENT state

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6010:


Assignee: Venkata Jyothsna Donapati

> Working drillbit showing as in QUIESCENT state
> --
>
> Key: DRILL-6010
> URL: https://issues.apache.org/jira/browse/DRILL-6010
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: online_vs_quiescent.JPG
>
>
> After DRILL-4286, I once hit a situation where, after running all functional 
> tests, three drillbits were in the ONLINE state and another one in 
> QUIESCENT. Yet I could run queries from the one in the quiescent state, so 
> it was in fact online. drillbit.sh stop could not shut it down and I had to 
> kill -9 the process (online_vs_quiescent.JPG).





[jira] [Assigned] (DRILL-6008) Unable to shutdown Drillbit

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6008:


Assignee: Venkata Jyothsna Donapati

> Unable to shutdown Drillbit
> ---
>
> Key: DRILL-6008
> URL: https://issues.apache.org/jira/browse/DRILL-6008
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: fqdn.JPG, method_is_not_allowed.JPG, 
> response_of_undefined.JPG
>
>
> Could not shut down a drillbit on a cluster where the host name was used as 
> the drillbit's address (fqdn.JPG). Pressing shutdown resulted in 
> (response_of_undefined.JPG). I also tried using the IP address with no luck 
> (method_is_not_allowed.JPG).
> I could shut down a drillbit in embedded mode, but then I saw the following 
> errors (local_shutdown.JPG): it looks like the Web UI was trying to get the 
> drillbit status even though it was down.





[jira] [Created] (DRILL-6165) Drill should support versioning between Drill clients (JDBC/ODBC) and Drill server

2018-02-16 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6165:
-

 Summary: Drill should support versioning between Drill clients 
(JDBC/ODBC) and Drill server
 Key: DRILL-6165
 URL: https://issues.apache.org/jira/browse/DRILL-6165
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Client - ODBC
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Pritesh Maker


We need to determine which versions of JDBC/ODBC drivers can be used with which 
versions of Drill server.  Due to recent improvements in security, a newer 
client had problems working with an older server.  The current solution is to 
require drill clients and drill servers to be the same version.  In some cases, 
different versions of drill clients can work with different versions of drill 
servers, but this compatibility is being determined on a version-by-version, 
feature-by-feature basis.

We need an architecture that enables this to work automatically.  In 
particular, if a new drill client requests a feature that the older drill 
server does not support, this should be handled gracefully without returning an 
error.

This also has an impact on QA resources.  We recently had a customer issue that 
needed to be fixed on three different Drill server releases, so three new 
drivers had to be created and tested.

Note that drill clients and drill servers can be on different versions for 
various reasons:

1) A user may need to access different drill servers, but can have only one 
version of the drill client installed on their machine.

2) Many users may need to access the same drill server.  Some users may have 
one version of the drill client installed while other users may have a 
different version of the drill client installed.  In a large customer 
installation, it is difficult to get all users to upgrade their drill client at 
the same time.
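One common way to let mismatched client and server versions degrade gracefully is an explicit capability handshake: each side advertises the features it supports and both use the intersection, instead of comparing raw version numbers. The sketch below is purely illustrative (the feature names and `negotiate` API are invented, not Drill's RPC protocol):

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative capability negotiation between a client and a server of
// different versions: both sides agree on the intersection of the
// features they support, so a newer client falls back rather than
// failing against an older server.
public class HandshakeSketch {

    enum Feature { BASIC_QUERY, SASL_AUTH, ENCRYPTION, METADATA_API }

    static Set<Feature> negotiate(Set<Feature> client, Set<Feature> server) {
        EnumSet<Feature> agreed = EnumSet.copyOf(client);
        agreed.retainAll(server);   // only features both sides understand
        return agreed;
    }

    public static void main(String[] args) {
        // Newer client talking to an older server that predates encryption.
        Set<Feature> client = EnumSet.of(Feature.BASIC_QUERY,
                                         Feature.SASL_AUTH,
                                         Feature.ENCRYPTION);
        Set<Feature> server = EnumSet.of(Feature.BASIC_QUERY,
                                         Feature.SASL_AUTH);
        Set<Feature> agreed = negotiate(client, server);
        System.out.println(agreed.contains(Feature.ENCRYPTION));  // false
        System.out.println(agreed.contains(Feature.BASIC_QUERY)); // true
    }
}
```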





[jira] [Commented] (DRILL-6164) Heap memory leak during parquet scan and OOM

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367818#comment-16367818
 ] 

ASF GitHub Bot commented on DRILL-6164:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1122
  
One of the functional tests fails. I am looking into it, please don't merge 
yet.


> Heap memory leak during parquet scan and OOM
> 
>
> Key: DRILL-6164
> URL: https://issues.apache.org/jira/browse/DRILL-6164
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.13.0
>
>
> During a scan of a large set of parquet files, Drill iterates over the set, 
> initializing parquet readers. Such initialization may require significant 
> memory (both heap and direct). When the scan moves to the next parquet file 
> in the set, it neither removes the reference to the reader from the set it 
> iterates over nor removes references created during initialization.





[jira] [Updated] (DRILL-6164) Heap memory leak during parquet scan and OOM

2018-02-16 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6164:
--
Labels:   (was: ready-to-commit)

> Heap memory leak during parquet scan and OOM
> 
>
> Key: DRILL-6164
> URL: https://issues.apache.org/jira/browse/DRILL-6164
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.13.0
>
>
> During a scan of a large set of parquet files, Drill iterates over the set, 
> initializing parquet readers. Such initialization may require significant 
> memory (both heap and direct). When the scan moves to the next parquet file 
> in the set, it neither removes the reference to the reader from the set it 
> iterates over nor removes references created during initialization.





[jira] [Updated] (DRILL-5741) Automatically manage memory allocations during startup

2018-02-16 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5741:

Labels: doc-impacting  (was: )

> Automatically manage memory allocations during startup
> --
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
> Attachments: Auto Mem Allocation Proposal - Computation Logic.pdf, 
> Auto Mem Allocation Proposal - Scenarios.pdf
>
>
> Currently, during startup, a Drillbit can be assigned large values for the 
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> Combined, these can potentially exceed the available memory on a system 
> when a Drillbit is under heavy load. It would be good to have the Drillbit 
> ensure during startup that the cumulative value of these parameters does not 
> exceed a pre-defined upper limit for the Drill process.
> This JIRA is a *proposal* to allow for automatic configuration (based on 
> configuration patterns observed in production Drill clusters). It leverages 
> the capability of providing default/distribution (and user-specific) checks 
> during Drill Startup from DRILL-6068.
> The idea is to remove the need for a user to worry about managing the tuning 
> parameters, by providing optimal values. In addition, it allows the memory 
> allocation to be implicitly managed by simply providing the Drill process 
> with a single dimension of total process memory (either an absolute value, 
> or a percentage of the total system memory), while {{auto-setup.sh}} 
> provides the individual allocations.
> This allocation is then partitioned into allocations for Heap and Direct 
> Memory, with a small portion allocated for the Generated Java CodeCache as 
> well. If any of the individual allocations are also specified (via 
> {{distrib-env.sh}} or {{drill-env.sh}}), the remaining unspecified 
> allocations are adjusted to stay +within the limits+ of the total memory 
> allocation.
> The *details* of the proposal are here:
> https://docs.google.com/spreadsheets/d/1N6VYlQFiPoTV4iD46XbkIrvEQesiGFUU9-GWXYsAPXs/edit#gid=0
> For those unable to access the Google Document, PDFs are attached:
> * [^Auto Mem Allocation Proposal - Computation Logic.pdf] - Provides the 
> equation used for computing the heap, direct and code cache allocations for a 
> given input
> * [^Auto Mem Allocation Proposal - Scenarios.pdf] - Describes the various 
> inputs, and their expected allocations
> The variables that are (_optionally_) defined (in memory, {{distrib-env.sh}} 
> or {{drill-env.sh}} ) are:
> * {{DRILLBIT_MAX_PROC_MEM}} : Total Process Memory
> * {{DRILL_HEAP}} : JVM Max Heap Size
> * {{DRILL_MAX_DIRECT_MEMORY}} : JVM Max Direct Memory Size
> * {{DRILLBIT_CODE_CACHE_SIZE}} : JVM Code Cache Size
> Note: _With JDK8, MaxPermSize is no longer supported, so we do not account 
> for this any more, and will unset the variable if JDK8 or higher is detected._
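As a rough illustration of the partitioning idea only (the ratios and clamping rule below are invented for this sketch; the proposal's actual computation is in the attached PDFs):

```java
// Rough illustration of partitioning a single total-process-memory figure
// into heap, direct memory, and code cache allocations. The ratios here
// are invented for this sketch, not the proposal's actual numbers.
public class MemPartitionSketch {

    // Returns { heapMb, directMb, codeCacheMb }. A null fixed value means
    // "not specified in distrib-env.sh / drill-env.sh".
    static long[] partition(long totalMb, Long fixedHeapMb, Long fixedDirectMb) {
        // Small slice for the generated-code cache, bounded on both sides.
        long codeCache = Math.min(1024, Math.max(512, totalMb / 32));
        long remaining = totalMb - codeCache;
        long heap   = fixedHeapMb   != null ? fixedHeapMb   : remaining / 4;
        long direct = fixedDirectMb != null ? fixedDirectMb : remaining - heap;
        // Clamp so explicit settings cannot push past the total.
        if (heap + direct + codeCache > totalMb) {
            direct = Math.max(0, totalMb - codeCache - heap);
        }
        return new long[] { heap, direct, codeCache };
    }

    public static void main(String[] args) {
        long[] a = partition(32768L, null, null);      // 32 GB, nothing fixed
        System.out.println(a[0] + a[1] + a[2] <= 32768L); // true
        long[] b = partition(32768L, 16384L, 20000L);  // over-specified
        System.out.println(b[0] + b[1] + b[2] <= 32768L); // true: clamped
    }
}
```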





[jira] [Commented] (DRILL-5741) Automatically manage memory allocations during startup

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367741#comment-16367741
 ] 

ASF GitHub Bot commented on DRILL-5741:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1082
  
Good point. Let me send out a post on the user-list since this affects a 
broader audience beyond just Dev. Thanks for reviewing!


> Automatically manage memory allocations during startup
> --
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: Auto Mem Allocation Proposal - Computation Logic.pdf, 
> Auto Mem Allocation Proposal - Scenarios.pdf
>
>
> Currently, during startup, a Drillbit can be assigned large values for the 
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> Combined, these can potentially exceed the available memory on a system 
> when a Drillbit is under heavy load. It would be good to have the Drillbit 
> ensure during startup that the cumulative value of these parameters does not 
> exceed a pre-defined upper limit for the Drill process.
> This JIRA is a *proposal* to allow for automatic configuration (based on 
> configuration patterns observed in production Drill clusters). It leverages 
> the capability of providing default/distribution (and user-specific) checks 
> during Drill Startup from DRILL-6068.
> The idea is to remove the need for a user to worry about managing the tuning 
> parameters, by providing optimal values. In addition, it allows the memory 
> allocation to be implicitly managed by simply providing the Drill process 
> with a single dimension of total process memory (either an absolute value, 
> or a percentage of the total system memory), while {{auto-setup.sh}} 
> provides the individual allocations.
> This allocation is then partitioned into allocations for Heap and Direct 
> Memory, with a small portion allocated for the Generated Java CodeCache as 
> well. If any of the individual allocations are also specified (via 
> {{distrib-env.sh}} or {{drill-env.sh}}), the remaining unspecified 
> allocations are adjusted to stay +within the limits+ of the total memory 
> allocation.
> The *details* of the proposal are here:
> https://docs.google.com/spreadsheets/d/1N6VYlQFiPoTV4iD46XbkIrvEQesiGFUU9-GWXYsAPXs/edit#gid=0
> For those unable to access the Google Document, PDFs are attached:
> * [^Auto Mem Allocation Proposal - Computation Logic.pdf] - Provides the 
> equation used for computing the heap, direct and code cache allocations for a 
> given input
> * [^Auto Mem Allocation Proposal - Scenarios.pdf] - Describes the various 
> inputs, and their expected allocations
> The variables that are (_optionally_) defined (in memory, {{distrib-env.sh}} 
> or {{drill-env.sh}} ) are:
> * {{DRILLBIT_MAX_PROC_MEM}} : Total Process Memory
> * {{DRILL_HEAP}} : JVM Max Heap Size
> * {{DRILL_MAX_DIRECT_MEMORY}} : JVM Max Direct Memory Size
> * {{DRILLBIT_CODE_CACHE_SIZE}} : JVM Code Cache Size
> Note: _With JDK8, MaxPermSize is no longer supported, so we do not account 
> for this any more, and will unset the variable if JDK8 or higher is detected._





[jira] [Updated] (DRILL-6164) Heap memory leak during parquet scan and OOM

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6164:
-
Labels: ready-to-commit  (was: )

> Heap memory leak during parquet scan and OOM
> 
>
> Key: DRILL-6164
> URL: https://issues.apache.org/jira/browse/DRILL-6164
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> During a scan of a large set of parquet files, Drill iterates over the set, 
> initializing parquet readers. Such initialization may require significant 
> memory (both heap and direct). When the scan moves to the next parquet file 
> in the set, it neither removes the reference to the reader from the set it 
> iterates over nor removes references created during initialization.





[jira] [Updated] (DRILL-6164) Heap memory leak during parquet scan and OOM

2018-02-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6164:
-
Fix Version/s: 1.13.0

> Heap memory leak during parquet scan and OOM
> 
>
> Key: DRILL-6164
> URL: https://issues.apache.org/jira/browse/DRILL-6164
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> During a scan of a large set of parquet files, Drill iterates over the set, 
> initializing parquet readers. Such initialization may require significant 
> memory (both heap and direct). When the scan moves to the next parquet file 
> in the set, it neither removes the reference to the reader from the set it 
> iterates over nor removes references created during initialization.





[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367719#comment-16367719
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r167944958
  
--- Diff: contrib/storage-hive/core/pom.xml ---
@@ -58,6 +58,10 @@
           <groupId>commons-codec</groupId>
           <artifactId>commons-codec</artifactId>
         </exclusion>
+        <exclusion>
--- End diff --

Both cases can resolve it. In detail, `io.dropwizard.metrics:metrics-core` 
is not used anywhere in Drill. It is a dependency of `tephra-core` and a 
transitive dependency of `hive-metastore`, but it conflicts with Drill's 
`com.codahale.metrics`. 
`hive-metastore` uses version 3.0.1 of this dependency, while the latest 
version in the maven repository is 4.0.2.
I added this dependency to the `dependencyManagement` block with version 
4.0.2 and the conflict is resolved as well. I think this is the better 
approach, because it can help avoid similar conflicts in the future.

Also, `metrics-core` in `hive-hbase-handler` has no influence on Drill, so 
I've removed my exclusion of it.
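The `dependencyManagement` approach described above would look roughly like the following sketch (the exact placement in Drill's root pom is an assumption):

```xml
<!-- Sketch of pinning metrics-core via dependencyManagement, as described
     above; placement in the root pom is assumed for illustration. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>4.0.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```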


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to 
> perform queries on Hive. This library version can be used with both Hive 1.x 
> and Hive 2.x, but some Hive 2.x features are broken (for example, ORC 
> transactional tables). To fix that, the drill-hive library version should be 
> updated to 2.1 or newer. 
> Tasks which should be done:
> - resolving dependency conflicts;
> - investigating backward compatibility of newer drill-hive library with older 
> Hive versions (1.x);
> - updating drill-hive version for 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.





[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367721#comment-16367721
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r167947402
  
--- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
@@ -39,23 +39,28 @@
           <artifactId>log4j</artifactId>
         </exclusion>
         <exclusion>
-          <groupId>commons-codec</groupId>
-          <artifactId>commons-codec</artifactId>
-        </exclusion>
-        <exclusion>
-          <artifactId>calcite-avatica</artifactId>
-          <groupId>org.apache.calcite</groupId>
+          <groupId>org.json</groupId>
+          <artifactId>json</artifactId>
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.parquet</groupId>
+      <artifactId>parquet-column</artifactId>
+      <version>${parquet.version}</version>
--- End diff --

`hive-exec` uses only two parquet dependencies: 
[parquet-column](https://github.com/apache/hive/blob/branch-2.1/ql/pom.xml#L444)
 and 
[parquet-hadoop-bundle](https://github.com/apache/hive/blob/branch-2.1/ql/pom.xml#L109).
But Drill doesn't use its own version of `parquet-hadoop-bundle`, and 
moreover a Drill version of it is absent from the maven repository.

It appears that Hive 2.3.2 also uses `parquet-column` version 1.8.1, but the 
latest Apache Hive master has been updated to version 1.9.0.
I have added a comment about this to the `drill-hive-exec-shaded` POM so it 
can be updated in the future.







[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367722#comment-16367722
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r167946336
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +512,49 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the table is transactional and sets the 
+   * necessary properties in {@link JobConf}.
+   * If schema evolution properties aren't set in the job conf for the input 
+   * format, the method sets the column names and types from table/partition 
+   * properties or the storage descriptor.
+   *
+   * @param job the job to update
+   * @param sd storage descriptor of the table or partition
+   */
+  public static void verifyAndAddTransactionalProperties(JobConf job, 
StorageDescriptor sd) {
+
+if (AcidUtils.isTablePropertyTransactional(job)) {
+  AcidUtils.setTransactionalTableScan(job, true);
+
+  // No work is needed, if schema evolution is used
+  if (Utilities.isSchemaEvolutionEnabled(job, true) && 
job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
+  job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
+return;
+  }
+
+  String colNames;
+  String colTypes;
+
+  // Try to get column names and types from table or partition 
properties. If they are absent there, get the column
+  // data from the storage descriptor of the table
+  colNames = job.get(serdeConstants.LIST_COLUMNS);
+  colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
+
+  if (colNames == null || colTypes == null) {
+List<String> colNamesList = Lists.newArrayList();
+List<String> colTypesList = Lists.newArrayList();
+for (FieldSchema col: sd.getCols()) {
+  colNamesList.add(col.getName());
+  colTypesList.add(col.getType());
+}
+colNames = Joiner.on(",").join(colNamesList);
--- End diff --

I have changed it. But we need to call `input.getName()` and 
`input.getType()`, which is why I use two Functions and the code became 
larger. Once we move to Java 8 it can be made smaller. Or did I miss 
something here?
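For context, a minimal sketch of what the Java 8 version of this name/type joining could look like (the `Col` class here is a hypothetical stand-in for Hive's `FieldSchema`, and this is not Drill's actual code — just an illustration of the two-mapping pattern under discussion):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnJoinSketch {
    // Hypothetical stand-in for Hive's FieldSchema (name + type only).
    static class Col {
        final String name;
        final String type;
        Col(String name, String type) { this.name = name; this.type = type; }
    }

    public static void main(String[] args) {
        List<Col> cols = Arrays.asList(new Col("id", "int"), new Col("name", "string"));
        // With Java 8 streams, the two Guava Functions collapse into one-line maps:
        String colNames = cols.stream().map(c -> c.name).collect(Collectors.joining(","));
        String colTypes = cols.stream().map(c -> c.type).collect(Collectors.joining(","));
        System.out.println(colNames); // id,name
        System.out.println(colTypes); // int,string
    }
}
```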







[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367720#comment-16367720
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r167945192
  
--- Diff: common/pom.xml ---
@@ -45,6 +45,7 @@
 
   org.apache.calcite
   calcite-core
+  ${calcite.version}
--- End diff --

I have returned the calcite-core version to the `DependencyManagement` block. 
The Drill Calcite libraries are included in the "drill-hive-exec-shaded" 
module. 

Here is why it works for now:
when a user submits a query in Drill via the Hive plugin, the query is 
validated and planned via Drill's Calcite, so Hive's Calcite isn't needed for 
it. Hive's Calcite is used only during Drill unit testing, where a lot of 
Hive-specific queries are performed to set up the Hive store for testing. But 
the Drill Calcite and Avatica versions conflict with Hive's old Calcite and 
Avatica versions. That's why I have disabled the Calcite cost-based optimizer 
with `conf.set(ConfVars.HIVE_CBO_ENABLED.varname, "false");`. We can enable 
it again once Hive leverages a newer Calcite version. A comment about this 
has been added to the `drill-hive-exec-shaded` POM.
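As an illustration, the effect of that setting can be sketched with a plain map standing in for Hive's `HiveConf` (the property name `hive.cbo.enable` is the value behind `ConfVars.HIVE_CBO_ENABLED.varname` in Hive; this is a sketch, not the actual test harness code):

```java
import java.util.HashMap;
import java.util.Map;

public class HiveCboToggleSketch {
    public static void main(String[] args) {
        // Stand-in for HiveConf: disable the cost-based optimizer for tests,
        // mirroring conf.set(ConfVars.HIVE_CBO_ENABLED.varname, "false")
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.cbo.enable", "false");

        // Hive would read this back when deciding whether to run CBO
        boolean cboEnabled = Boolean.parseBoolean(conf.getOrDefault("hive.cbo.enable", "true"));
        System.out.println(cboEnabled); // false
    }
}
```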








[jira] [Commented] (DRILL-6118) Handle item star columns during project / filter push down and directory pruning

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367537#comment-16367537
 ] 

ASF GitHub Bot commented on DRILL-6118:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1104#discussion_r168801629
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import org.apache.calcite.adapter.enumerable.EnumerableTableScan;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.prepare.RelOptTableImpl;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.CorrelationId;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.logical.LogicalFilter;
+import org.apache.calcite.rel.logical.LogicalProject;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexVisitorImpl;
+import org.apache.calcite.schema.Table;
+import org.apache.drill.exec.planner.types.RelDataTypeDrillImpl;
+import org.apache.drill.exec.planner.types.RelDataTypeHolder;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.DesiredField;
+import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
+
+/**
+ * Rule that transforms a filter -> project -> scan call with item star 
+ * fields in the filter into project -> filter -> project -> scan, where 
+ * item star fields are pushed into the scan and replaced with actual field 
+ * references.
+ *
+ * This helps partition pruning and push-down rules detect fields that can 
+ * be pruned or pushed down.
+ * The item star operator appears when a sub-select or CTE with star is 
+ * used as the source.
+ */
+public class DrillFilterItemStarReWriterRule extends RelOptRule {
+
+  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
+  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
+  "DrillFilterItemStarReWriterRule");
+
+  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
+super(operand, id);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+Filter filterRel = call.rel(0);
+Project projectRel = call.rel(1);
+TableScan scanRel = call.rel(2);
+
+ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
--- End diff --

@chunhui-shi added more unit tests. Please review.


> Handle item star columns during project / filter push down and directory 
> pruning
> --
>
> Key: DRILL-6118
> URL: https://issues.apache.org/jira/browse/DRILL-6118
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arin

[jira] [Updated] (DRILL-6154) NaN, Infinity issues

2018-02-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6154:

Affects Version/s: 1.13.0

> NaN, Infinity issues
> 
>
> Key: DRILL-6154
> URL: https://issues.apache.org/jira/browse/DRILL-6154
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.13.0
>
>
> 1. Issue
> *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONS:*
>  - *sqrt*
>  - *trunc*
> *ISSUE_DESCRIPTION:* According to DRILL-5919, new JSON number literals were 
> added: *NaN, Infinity, -Infinity*. The new values must be processed 
> properly by existing functions. There are a few issues:
>  *1. SQRT function*. Run the following test query:
> {code:java}
> select sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, 
> sqrt(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
>  
>  - EXPECTED_RESULT: it was expected to get the following result: _NaN, 
> Infinity, NaN_ (expected result is based on java Math.sqrt() method)
>  - ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_
> *2. TRUNC function*. According to the Drill docs 
> ([https://drill.apache.org/docs/math-and-trig/]): _TRUNC(x, y) : Truncates x 
> to y decimal places. *Specifying y is optional. Default is 1*_. So, the 
> function must work properly without specifying *y*.
>  However, an error message appears. Run the test query:
> {code:java}
> select trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, 
> trunc(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
>  - EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, 
> NaN*
>  - ACTUAL_RESULT: the following error message appears: *Query Failed: An 
> Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: NumberFormatException Fragment 0:0 [Error Id: 
> 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*
> Please investigate and fix, test file attached PN_Inf_NaN.json
> 2. Issue 
>  *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* INNER JOIN
> *ISSUE_DESCRIPTION:* New JSON data types were added in DRILL-5919: 
> *NaN, Infinity, -Infinity*. 
>  During testing, somewhat strange behavior of the INNER JOIN operator was 
> detected: different query results for almost identical queries. 
>  *Query1*
> {code:java}
>  select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join 
> dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query2*
> {code:java}
>  select distinct t.name from dfs.tmp.`ObjsX.json` t inner join 
> dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query1* differs from *Query2* by one column only:
>  - In *Query1*, 2 columns are selected: t.name, tt.name
>  - In *Query2*, 1 column is selected: t.name
> However *Query1*/*Query2* return completely different results:
>  - *Query1* returns
> {code:java}
>   name name0
>   object2 object2
>   object2 object3
>   object2 object4
>   object3 object2
>   object3 object3
>   object3 object4
>   object4 object2
>   object4 object3
>   object4 object4
>   {code}
> This result seems to be correct.
>  - *Query2* returns _*No result found*_, which is not expected:
>  *EXPECTED_RESULT:*
> {code:java}
>   name
>   object2
>   object3
>   object4
>   {code}
> *ACTUAL_RESULT:(*
> {code:java}
> No result found{code}
> *NB!:* the issue appears only if tables are _*JOINed by a column which 
> contains the newly-added data types (NaN, Infinity, -Infinity)*_. The issue 
> is not reproducible if a user JOINs tables by a column containing other 
> data types.
> 3. Issue
>  *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* ORDER BY, DESC
> *THIS ISSUE REFERS TO:DRILL-5919*
> *ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when 
> sorting columns containing NaN values. In one case it considers NaN to be the 
> largest value, in another - the smallest one. 
>  *Steps:*
>  - Select from the attached test file (orderBy.json, attached)
> {code:java}
> SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, attr4{code}
>  - Check the attached screen shot (orderByIssue.jpg):
>  *EXPECTED_RESULT:* It was expected that the 'ORDER BY' clause would sort 
> the attr4 column data consistently (most probably with NaN as the largest 
> value, see *NB*)
>  *ACTUAL_RESULT:* the attr4 column's values were sorted in different ways: 
> for 'obj1'/'obj3' NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.
> *NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() 
> methods) treats NaN as the largest value
> 4. Issue
>  *AF
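The sqrt and ordering expectations cited in the description above follow Java's own semantics, which can be checked directly with a standalone snippet (this is a verification sketch, not Drill code):

```java
import java.util.Arrays;

public class NanSemanticsCheck {
    public static void main(String[] args) {
        // Math.sqrt edge cases: NaN stays NaN, +Infinity stays +Infinity,
        // and the square root of any negative value (including -Infinity) is NaN
        System.out.println(Math.sqrt(Double.NaN));               // NaN
        System.out.println(Math.sqrt(Double.POSITIVE_INFINITY)); // Infinity
        System.out.println(Math.sqrt(Double.NEGATIVE_INFINITY)); // NaN

        // Java's Arrays.sort treats NaN as larger than +Infinity
        double[] vals = {Double.NaN, 1.0, Double.POSITIVE_INFINITY, -3.5};
        Arrays.sort(vals);
        System.out.println(Arrays.toString(vals)); // [-3.5, 1.0, Infinity, NaN]
    }
}
```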

[jira] [Commented] (DRILL-6154) NaN, Infinity issues

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367444#comment-16367444
 ] 

ASF GitHub Bot commented on DRILL-6154:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1123#discussion_r168777355
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillConvertletTable.java
 ---
@@ -34,10 +40,17 @@
  public static HashMap<SqlOperator, SqlRexConvertlet> map = new 
HashMap<>();
 
   public static SqlRexConvertletTable INSTANCE = new 
DrillConvertletTable();
+  private static SqlRexConvertlet sqrtConvertlet = new SqlRexConvertlet() {
--- End diff --

Please add a comment explaining why we need a sqrt-specific convertlet.



[jira] [Commented] (DRILL-6154) NaN, Infinity issues

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367443#comment-16367443
 ] 

ASF GitHub Bot commented on DRILL-6154:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1123#discussion_r168777217
  
--- Diff: exec/java-exec/src/main/codegen/templates/MathFunctions.java ---
@@ -67,7 +67,11 @@ private GMathFunctions(){}
   public static class ${func.className}${type.input} implements 
DrillSimpleFunc {
 
 @Param ${type.input}Holder in;
+  <#if func.funcName == 'sqrt'>
+@Output Float8Holder out;
--- End diff --

Please add a comment explaining why we use a Float8 holder for the sqrt function.



[jira] [Commented] (DRILL-6154) NaN, Infinity issues

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367445#comment-16367445
 ] 

ASF GitHub Bot commented on DRILL-6154:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1123#discussion_r168777666
  
--- Diff: exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java 
---
@@ -102,7 +102,20 @@ public void add() {
  
 nonNullCount.value = 1;
  <#if aggrtype.funcName == "min">
-   value.value = Math.min(value.value, in.value);
+   <#if type.inputType?contains("Float4") || 
type.inputType?contains("Float8")>
--- End diff --

1. Please mind the original indentation.
2. Please add a comment describing how we choose min / max for NaN / Infinity 
values.
3. Please consider using if without nested if: `if - else if - else`.
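A hedged sketch of the kind of NaN-aware min logic this comment asks to document (illustrative only, not Drill's generated aggregate code): since `Math.min` returns NaN whenever either argument is NaN, an explicit check is needed to treat NaN as the largest value instead.

```java
public class NanAwareMin {
    // Treat NaN as the largest value, so min() skips it;
    // Math.min would instead propagate NaN to the result.
    static double min(double current, double in) {
        if (Double.isNaN(current)) return in;
        if (Double.isNaN(in)) return current;
        return Math.min(current, in);
    }

    public static void main(String[] args) {
        System.out.println(min(Double.NaN, 2.0));                // 2.0
        System.out.println(min(2.0, Double.NaN));                // 2.0
        System.out.println(min(Double.NEGATIVE_INFINITY, 2.0));  // -Infinity
    }
}
```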



[jira] [Commented] (DRILL-6158) Create a mux operator for union exchange to enable two phase merging instead of foreman merging all the batches.

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367390#comment-16367390
 ] 

ASF GitHub Bot commented on DRILL-6158:
---

GitHub user vladimirtkach opened a pull request:

https://github.com/apache/drill/pull/1123

DRILL-6158: NaN, Infinity issues

- changed comparison rules for NaN and Infinity values: NaN is now the 
largest value, with Infinity the second largest
- fixed min, max, trunc functions for NaN and Infinity values
- made Drill use the original sqrt function instead of a substitution
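The ordering the PR describes (NaN largest, Infinity second largest) matches the total order Java itself defines in `Double.compare`, which sorts `-Infinity < finite values < +Infinity < NaN`. A minimal illustration (the class name `NanOrdering` is hypothetical; this demonstrates the JDK semantics, not Drill's generated comparison code):

```java
public class NanOrdering {
    public static void main(String[] args) {
        // Double.compare orders: -Infinity < finite < +Infinity < NaN,
        // i.e. NaN is the largest value and +Infinity the second largest.
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);   // true
        System.out.println(Double.compare(Double.POSITIVE_INFINITY, Double.MAX_VALUE) > 0); // true

        // Under this total order, two NaNs compare as equal.
        System.out.println(Double.compare(Double.NaN, Double.NaN) == 0);                // true
    }
}
```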

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vladimirtkach/drill DRILL-6154

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1123


commit 6195546c58604c833c8e9134227a353884401e24
Author: vladimir tkach 
Date:   2018-02-16T14:36:49Z

DRILL-6158: NaN, Infinity issues

- changed comparison rules for NaN and Infinity values: NaN is now the 
largest value, with Infinity the second largest
- fixed min, max, trunc functions for NaN and Infinity values
- made Drill use the original sqrt function instead of a substitution




> Create a mux operator for union exchange to enable two phase merging instead 
> of foreman merging all the batches.
> 
>
> Key: DRILL-6158
> URL: https://issues.apache.org/jira/browse/DRILL-6158
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: Future
>
>
> Consider the following simple query
> {code}
> select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
> {code}
> The following plan is generated for this query
> {code}
> 00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
> 1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
> 1.56569100288E11 network, 4.64926176E7 memory}, id = 787
> 00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
> zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
> rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 
> memory}, id = 786
> 00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
> ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 
> 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id 
> = 785
> 00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
> {9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
> 4.64926176E7 memory}, id = 784
> 00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
> a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 
> cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
> 01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
> zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
> 4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 782
> 01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
> ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
> 4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 781
> 01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative 
> cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
> 4.64926176E7 memory}, id = 780
> 01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
> rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
> cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 
> 3.2460300288E10 network, 4.64926176E7 memory}, id = 779
> 01-06  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
> files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
> maprfs:/tmp/csvd1/Daamulti11random21.csv, 
> maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
> ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
> 4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
> 01-05  Broadc

[jira] [Commented] (DRILL-5578) Drill fails on date functions in 'where clause' when queried on a JDBC Storage plugin

2018-02-16 Thread Rahul Raj (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366900#comment-16366900
 ] 

Rahul Raj commented on DRILL-5578:
--

I tried on drill-java-exec-1.13.0-SNAPSHOT which has the latest calcite changes 
and the stack trace is below:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: 
SPECIAL [Error Id: d3399f0a-fac2-44d1-832a-e3c762825767 on 
rahul-Latitude-E5440:31010] at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:761)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:327)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:223)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_151] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151] Caused 
by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
during fragment initialization: class org.apache.calcite.sql.SqlSyntax$6: 
SPECIAL at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:282) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] ... 3 common frames 
omitted Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlSyntax$6: SPECIAL at 
org.apache.calcite.util.Util.needToImplement(Util.java:925) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlSyntax$6.unparse(SqlSyntax.java:116) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:332) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.dialect.MysqlSqlDialect.unparseCall(MysqlSqlDialect.java:154)
 ~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlUtil.unparseBinarySyntax(SqlUtil.java:323) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlSyntax$3.unparse(SqlSyntax.java:65) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:332) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.dialect.MysqlSqlDialect.unparseCall(MysqlSqlDialect.java:154)
 ~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlNodeList.andOrList(SqlNodeList.java:142) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlOperator.unparseListClause(SqlOperator.java:347) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlSelectOperator.unparse(SqlSelectOperator.java:197) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlSelect.unparse(SqlSelect.java:240) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:152) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:158) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.drill.exec.store.jdbc.JdbcPrel.(JdbcPrel.java:65) 
~[drill-jdbc-storage-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.store.jdbc.JdbcIntermediatePrel.finalizeRel(JdbcIntermediatePrel.java:66)
 ~[drill-jdbc-storage-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:309)
 ~[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT] at 
org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:279) 
~[calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at 
org.apache.calcite.rel.RelShuttleImpl.vi

[jira] [Updated] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6140:

Reviewer: Aman Sinha  (was: Arina Ielchiieva)

> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: 25978a1a-24cf-fb4a-17af-59e7115b4fa1.sys.drill
>
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
> = 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
> rowcount = 1.79279253E7, cumulative cost = ...
>  01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY 
> rfsSpecCode, ...
>01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType 
> = ...
>  01-05 SelectionVectorRemover : rowType = RecordType(ANY 
> schemaName, ...
>01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
>  01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 
> 'rfsSpecCode')], ...
>01-08 Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [
> {code}
> However, the profile page shows the operators as...
> ||Operator ID || Type || Metrics||
> |00-xx-00 | SCREEN | ... |
> |00-xx-01 | PROJECT | ... |
> |00-xx-02 | STREAMING_AGGREGATE | ... |
> |00-xx-03 | UNORDERED_RECEIVER | ... |
> |01-xx-00 | SINGLE_SENDER | ... |
> |01-xx-01 | STREAMING_AGGREGATE | ... |
> |01-xx-02 | PROJECT | ... |
> |01-xx-03 | SINGLE_SENDER | ... |
> |01-xx-04 | PROJECT | ... |
> |01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
> |01-xx-06 | FILTER | ... |
> |01-xx-07 | PROJECT | ... |
> |01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |
> As you can see, the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}}, 
> making the profile hard to interpret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366729#comment-16366729
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1113#discussion_r168699838
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java
 ---
@@ -125,20 +125,17 @@ public void cancel() {
   case PREPARING:
   case PLANNING:
   case ENQUEUED:
-moveToState(QueryState.CANCELLATION_REQUESTED, null);
-return;
-
   case STARTING:
   case RUNNING:
-addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
-return;
+moveToState(QueryState.CANCELLATION_REQUESTED, null);
--- End diff --

1. Your point makes sense. In this case could you please update the Javadoc 
for the `cancel` method to be consistent with the new changes?
2. Maybe we should remove the word `regression` from the commit message to 
avoid confusion?

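The diff above collapses the two cancellation branches into one: every cancellable state now falls through to the same `moveToState` call. A runnable sketch of the resulting control flow (class `CancelSketch` and its reduced `QueryState` enum are hypothetical simplifications, not the actual `QueryStateProcessor`):

```java
public class CancelSketch {
    enum QueryState { PREPARING, PLANNING, ENQUEUED, STARTING, RUNNING,
                      CANCELLATION_REQUESTED, COMPLETED }

    static QueryState state = QueryState.RUNNING;

    // After the change: all cancellable states share one fall-through path
    // instead of splitting between moveToState and addToEventQueue.
    static void cancel() {
        switch (state) {
            case PREPARING:
            case PLANNING:
            case ENQUEUED:
            case STARTING:
            case RUNNING:
                state = QueryState.CANCELLATION_REQUESTED;
                return;
            default:
                return; // terminal states: nothing to cancel
        }
    }

    public static void main(String[] args) {
        cancel();
        System.out.println(state); // CANCELLATION_REQUESTED
    }
}
```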

> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 26