[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user adeneche commented on the pull request: https://github.com/apache/drill/pull/503#issuecomment-219583987 I made a quick change to address Sudheesh's concern about deferring event processing when the query is queued. This is a WIP to check that I am not missing anything obvious; I will clean up, rename, and document once we agree this is a correct fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (DRILL-4681) ChannelClosedException causes all queries which are communicating on that channel to fail
Rahul Challapalli created DRILL-4681:

Summary: ChannelClosedException causes all queries which are communicating on that channel to fail
Key: DRILL-4681
URL: https://issues.apache.org/jira/browse/DRILL-4681
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow, Execution - RPC
Affects Versions: 1.7.0
Reporter: Rahul Challapalli

commit # : 09b262776e965ea17a6a863801f7e1ee3e5b3d5a

Below is what I am describing:
1. One of the fragments causes a channel closed exception (due to an OOM or some other condition).
2. Drill fails all other fragments which are running at that time, even though the fragments themselves eventually run to completion.

At high concurrency this could lead to a lot of query failures.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63444968

--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -37,6 +37,7 @@ public abstract class EventProcessor<T> {
   private final LinkedList<T> queuedEvents = new LinkedList<>();
   private volatile boolean isProcessing = false;
+  private volatile boolean started;
--- End diff --

Explicitly set to false as above.
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63444982

--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -57,25 +58,35 @@ public EventProcessor() {
    */
   public void sendEvent(final T newEvent) {
     synchronized (queuedEvents) {
-      if (isProcessing) {
-        queuedEvents.addLast(newEvent);
+      queuedEvents.addLast(newEvent);
+      if (!started || isProcessing) {
         return;
       }
       isProcessing = true;
     }
+    processEvents();
+  }
+
+  public void start() {
+    if (started) {
+      return;
+    }
+
+    synchronized (queuedEvents) {
+      started = true;
+      isProcessing = true;
+    }
+
+    processEvents();
+  }
+
+  private void processEvents() {
     @SuppressWarnings("resource")
     final DeferredException deferredException = new DeferredException();
-    T event = newEvent;
+    T event;
--- End diff --

Move this declaration inside the loop.
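The queue-until-started behaviour in the diff above can be illustrated with a small, self-contained sketch. This is not the Drill class: `DeferredStartProcessor` and `RecordingProcessor` are hypothetical names, the `DeferredException` bookkeeping is replaced by a plain rethrow, and both flags are guarded by the queue lock rather than declared `volatile`. It only shows the idea that `sendEvent()` buffers events until `start()` is called, after which a single thread drains the queue in order.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical, simplified stand-in for the EventProcessor under review. */
abstract class DeferredStartProcessor<T> {
  private final LinkedList<T> queuedEvents = new LinkedList<>();
  private boolean isProcessing = false; // guarded by queuedEvents
  private boolean started = false;      // guarded by queuedEvents

  /** Queues a non-null event; drains the queue only if started and idle. */
  public void sendEvent(final T newEvent) {
    synchronized (queuedEvents) {
      queuedEvents.addLast(newEvent);
      if (!started || isProcessing) {
        return; // either not started yet, or another caller is draining
      }
      isProcessing = true;
    }
    processEvents();
  }

  /** Opens the gate and drains whatever was queued before this call. */
  public void start() {
    synchronized (queuedEvents) {
      if (started) {
        return;
      }
      started = true;
      isProcessing = true;
    }
    processEvents();
  }

  private void processEvents() {
    try {
      while (true) {
        final T event; // declared inside the loop, as the review suggests
        synchronized (queuedEvents) {
          event = queuedEvents.pollFirst();
          if (event == null) {
            isProcessing = false;
            return;
          }
        }
        processEvent(event);
      }
    } catch (RuntimeException | Error e) {
      // Assumption: on failure, let a later sendEvent() resume draining.
      synchronized (queuedEvents) {
        isProcessing = false;
      }
      throw e;
    }
  }

  protected abstract void processEvent(T event);
}

/** Records events in the order they are processed. */
class RecordingProcessor extends DeferredStartProcessor<String> {
  final List<String> seen = new ArrayList<>();

  @Override
  protected void processEvent(final String event) {
    seen.add(event);
  }
}
```

With this shape, events sent before `start()` are held back and then delivered in arrival order, which is the behaviour the review comments on this diff are discussing.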
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63444973

--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -57,25 +58,35 @@ public EventProcessor() {
    */
   public void sendEvent(final T newEvent) {
     synchronized (queuedEvents) {
-      if (isProcessing) {
-        queuedEvents.addLast(newEvent);
+      queuedEvents.addLast(newEvent);
+      if (!started || isProcessing) {
--- End diff --

Update the method docs.
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user adeneche commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63444805

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -296,7 +295,7 @@ public void run() {
        * would wait on the cancelling thread to signal a resume and the cancelling thread would wait on the Foreman
        * to accept events.
        */
-      acceptExternalEvents.countDown();
+      stateSwitch.start();
--- End diff --

That is true indeed; I didn't notice that the Foreman was also using the event queue before it counted down the latch. I will update the fix to respect this behavior.
[GitHub] drill pull request: DRILL-4573 Fixed issue with regexp_replace fun...
Github user jinfengni commented on the pull request: https://github.com/apache/drill/pull/502#issuecomment-219577333 @jcmcote, I'll take a look at the patch shortly. You are right that the build issue should not be caused by your code change.
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63442039

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -296,7 +295,7 @@ public void run() {
        * would wait on the cancelling thread to signal a resume and the cancelling thread would wait on the Foreman
        * to accept events.
        */
-      acceptExternalEvents.countDown();
+      stateSwitch.start();
--- End diff --

This defers all state transitions until the thread is done. This changes the behavior when queuing is enabled: the user will not know whether the query has moved from ENQUEUED to STARTING.
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user adeneche commented on the pull request: https://github.com/apache/drill/pull/503#issuecomment-219571472 This is an alternative change that removes the blocking latch from Foreman. The foreman thread could still block forever in case of a bug, but this should no longer cause RPC threads to block as well.
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user adeneche commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63431381

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
    * to the user
    */
   public void moveToState(final QueryState newState, final Exception ex) {
+    // if the current thread is the foreman thread, throw an exception
+    // otherwise the foreman will be blocked forever on acceptExternalEvents
+    if (myThreadRef == Thread.currentThread()) {
--- End diff --

We were assuming that the Foreman thread would never call Foreman.StateListener.moveToState() and that it would be called by another (RPC) thread instead. It turns out that when the foreman is submitting remote fragments, RpcBus.send() can actually cause the foreman thread to call moveToState directly.
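The failure mode described in this comment, where the thread that is supposed to open a latch ends up waiting on it, can be reproduced in miniature. The sketch below is not Foreman code: `LatchGatedState`, `ownerThread`, and `open()` are hypothetical names, states are plain strings, and the extra `getCount() > 0` check is an assumption added here so that the owner may still call `moveToState()` once the latch is open.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical miniature of a latch-gated moveToState(). */
class LatchGatedState {
  private final CountDownLatch acceptExternalEvents = new CountDownLatch(1);
  private final Thread ownerThread; // plays the role of the foreman thread
  private volatile String state = "STARTING";

  LatchGatedState(final Thread ownerThread) {
    this.ownerThread = ownerThread;
  }

  /** Called by the owner once it is ready to accept external events. */
  void open() {
    acceptExternalEvents.countDown();
  }

  void moveToState(final String newState) {
    // Only the owner opens the latch, so letting the owner wait on a
    // closed latch would block it forever; fail fast instead.
    if (Thread.currentThread() == ownerThread
        && acceptExternalEvents.getCount() > 0) {
      throw new IllegalStateException("owner thread would wait on its own latch");
    }
    try {
      acceptExternalEvents.await();
      state = newState;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
    }
  }

  String state() {
    return state;
  }
}
```

A guard like this turns a silent hang into an immediate, diagnosable exception, but it does not remove the underlying coupling between the latch and the calling thread.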
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user hnfgns commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63430594

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
    * to the user
    */
   public void moveToState(final QueryState newState, final Exception ex) {
+    // if the current thread is the foreman thread, throw an exception
+    // otherwise the foreman will be blocked forever on acceptExternalEvents
+    if (myThreadRef == Thread.currentThread()) {
--- End diff --

Let me ask you this: what did your investigation find to be the primary cause of the deadlock?
[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...
Github user hnfgns commented on a diff in the pull request: https://github.com/apache/drill/pull/503#discussion_r63428428

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
    * to the user
    */
   public void moveToState(final QueryState newState, final Exception ex) {
+    // if the current thread is the foreman thread, throw an exception
+    // otherwise the foreman will be blocked forever on acceptExternalEvents
+    if (myThreadRef == Thread.currentThread()) {
--- End diff --

Well, Foreman does not call moveToState on the common path, but it could when it fails. My point is that instead of hacking the method to throw an exception if the thread is the foreman, we should release the latch and handle the exception in Foreman#run. What I mean is:

```
Foreman#run() {
  try {
    do sth
  } catch (ex) {
    release latch
    handleException(ex)
  } finally {
    ...
  }
}
```
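The release-then-handle pattern proposed in this comment could look roughly like the following. This is a sketch under assumptions, not the Foreman implementation: `LatchReleasingRunner`, `work`, and the string states are hypothetical, and since the comment leaves the `finally` body elided, the `countDown()` placed there is only one possible choice.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical miniature of a Foreman-like runner; all names are illustrative. */
class LatchReleasingRunner implements Runnable {
  private final CountDownLatch acceptExternalEvents = new CountDownLatch(1);
  private final Runnable work;
  private volatile String state = "STARTING";

  LatchReleasingRunner(final Runnable work) {
    this.work = work;
  }

  @Override
  public void run() {
    try {
      work.run();
      state = "RUNNING";
    } catch (RuntimeException ex) {
      // Release the latch first so the transition below cannot wait on it.
      acceptExternalEvents.countDown();
      moveToState("FAILED");
    } finally {
      // Assumption: also open the latch on the normal path. An extra
      // countDown() on a latch that is already at zero has no effect.
      acceptExternalEvents.countDown();
    }
  }

  /** Blocks until the latch is open, then records the new state. */
  void moveToState(final String newState) {
    try {
      acceptExternalEvents.await();
      state = newState;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
    }
  }

  String state() {
    return state;
  }

  boolean acceptsExternalEvents() {
    return acceptExternalEvents.getCount() == 0;
  }
}
```

Compared with throwing from `moveToState()`, this keeps the method free of thread-identity checks and puts the recovery logic in the one place that owns the latch.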
[jira] [Created] (DRILL-4680) Cancel query that is in Planing state - user RPC thread of the foreman Drillbit will block waiting for query to finish planing
Khurram Faraaz created DRILL-4680:

Summary: Cancel query that is in Planing state - user RPC thread of the foreman Drillbit will block waiting for query to finish planing
Key: DRILL-4680
URL: https://issues.apache.org/jira/browse/DRILL-4680
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.7.0
Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz

Unfortunately, if you try to cancel a query that is still in planning, the user RPC thread of the foreman Drillbit will block waiting for the query to finish planning. This causes any other query submitted to the same Drillbit to block as well. It's a known issue (actually a serious Drill limitation).
[jira] [Created] (DRILL-4678) Query HANG - SELECT DISTINCT over date data
Khurram Faraaz created DRILL-4678: - Summary: Query HANG - SELECT DISTINCT over date data Key: DRILL-4678 URL: https://issues.apache.org/jira/browse/DRILL-4678 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.7.0 Environment: 4 node cluster CentOS Reporter: Khurram Faraaz Below query hangs {noformat} 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('1993-08-18' AS DATE)), (CAST('1970-06-11' AS DATE)), (CAST('1970-06-11' AS DATE)), (CAST('1970-06-11' AS DATE)), (CAST('1970-06-11' AS DATE)), (CAST('1970-06-11' AS DATE)), (CAST('1959-10-23' AS DATE)), (CAST('1992-01-14' AS DATE)), (CAST('1994-07-24' AS DATE)), (CAST('1979-11-25' AS DATE)), (CAST('1945-01-14' AS DATE)), (CAST('1982-07-25' AS DATE)), (CAST('1966-09-06' AS DATE)), (CAST('1989-05-01' AS DATE)), (CAST('1996-03-08' AS DATE)), (CAST('1998-08-19' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('2013-08-13' AS DATE)), (CAST('1999-07-20' AS DATE)), (CAST('1962-07-03' AS DATE)), (CAST('2011-08-17' AS DATE)), (CAST('2011-05-16' AS DATE)), (CAST('1946-05-08' AS DATE)), (CAST('1994-02-13' AS DATE)), (CAST('1978-08-09' AS DATE)), (CAST('1978-08-09' AS DATE)), (CAST('1978-08-09' AS DATE)), (CAST('1978-08-09' AS DATE)), (CAST('1958-02-06' AS DATE)), (CAST('2012-06-11' AS DATE)), (CAST('2012-06-11' AS DATE)), (CAST('2012-06-11' AS DATE)), (CAST('2012-06-11' AS DATE)), (CAST('1998-03-26' AS DATE)), (CAST('1996-11-04' AS DATE)), (CAST('1953-09-25' AS DATE)), (CAST('2003-06-17' AS DATE)), (CAST('2003-06-17' AS DATE)), (CAST('2003-06-17' AS DATE)), (CAST('2003-06-17' AS DATE)), (CAST('2003-06-17' AS 
DATE)), (CAST('1980-07-05' AS DATE)), (CAST('1982-06-15' AS DATE)), (CAST('1951-05-16' AS DATE))) tbl(dt)
{noformat}

Details from the Web UI Profile tab; please note that the query is still in STARTING state:

{noformat}
Running Queries
Time | User | Query | State | Foreman
05/16/2016 10:33:57 | mapr | SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199 | STARTING | centos-01.qa.lab
{noformat}

There is no other useful information in drillbit.log. jstack output is attached here for your reference.

The same query works fine on Postgres 9.3.