[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread adeneche
Github user adeneche commented on the pull request:

https://github.com/apache/drill/pull/503#issuecomment-219583987
  
I made a quick change to address Sudheesh's concern about deferring event 
processing while the query is queued. This is a WIP to check that I am not 
missing anything obvious; I will clean up, rename, and document once we agree 
this is a correct fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4681) ChannelClosedException causes all queries which are communicating on that channel to fail

2016-05-16 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4681:


 Summary: ChannelClosedException causes all queries which are 
communicating on that channel to fail 
 Key: DRILL-4681
 URL: https://issues.apache.org/jira/browse/DRILL-4681
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Execution - RPC
Affects Versions: 1.7.0
Reporter: Rahul Challapalli


commit # : 09b262776e965ea17a6a863801f7e1ee3e5b3d5a

Below is what I am describing:
1. One of the fragments causes a channel closed exception (due to an OOM or 
some other condition).
2. Drill fails all other fragments that are running at that time, even 
though the fragments themselves eventually run to completion. At high 
concurrency this could lead to a lot of query failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63444968
  
--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -37,6 +37,7 @@
 public abstract class EventProcessor<T> {
   private final LinkedList<T> queuedEvents = new LinkedList<>();
   private volatile boolean isProcessing = false;
+  private volatile boolean started;
--- End diff --

Explicitly set this to false, as done for isProcessing above.




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63444982
  
--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -57,25 +58,35 @@ public EventProcessor() {
    */
   public void sendEvent(final T newEvent) {
     synchronized (queuedEvents) {
-      if (isProcessing) {
-        queuedEvents.addLast(newEvent);
+      queuedEvents.addLast(newEvent);
+      if (!started || isProcessing) {
         return;
       }
 
       isProcessing = true;
     }
 
+    processEvents();
+  }
+
+  public void start() {
+    if (started) {
+      return;
+    }
+
+    synchronized (queuedEvents) {
+      started = true;
+      isProcessing = true;
+    }
+
+    processEvents();
+  }
+
+  private void processEvents() {
     @SuppressWarnings("resource")
     final DeferredException deferredException = new DeferredException();
-    T event = newEvent;
+    T event;
--- End diff --

Move this declaration inside the loop.
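
For reference, the queue-then-start behavior under review could be sketched roughly as below. This is a simplified, hypothetical stand-in, not the actual EventProcessor: the class name DeferredStartProcessor is made up, both flags are guarded by the queue lock instead of being volatile, the started check sits inside the synchronized block, and the DeferredException handling is left out (processEvent is assumed not to throw).

```java
import java.util.LinkedList;

/** Hypothetical, simplified stand-in for the EventProcessor in the diff. */
abstract class DeferredStartProcessor<T> {
  private final LinkedList<T> queuedEvents = new LinkedList<>();
  private boolean isProcessing = false; // guarded by queuedEvents
  private boolean started = false;      // guarded by queuedEvents

  /** Always queue the event; drain only if started and no one else is draining. */
  public void sendEvent(final T newEvent) {
    synchronized (queuedEvents) {
      queuedEvents.addLast(newEvent);
      if (!started || isProcessing) {
        return;
      }
      isProcessing = true;
    }
    processEvents();
  }

  /** Start processing; events queued before this call are handled first, in order. */
  public void start() {
    synchronized (queuedEvents) {
      if (started) {
        return;
      }
      started = true;
      isProcessing = true;
    }
    processEvents();
  }

  /** Drain the queue on the calling thread until it is empty. */
  private void processEvents() {
    while (true) {
      final T event; // declared inside the loop
      synchronized (queuedEvents) {
        if (queuedEvents.isEmpty()) {
          isProcessing = false;
          return;
        }
        event = queuedEvents.removeFirst();
      }
      processEvent(event); // runs outside the lock
    }
  }

  /** Assumed not to throw in this sketch. */
  protected abstract void processEvent(T event);
}
```

In this sketch, an event sent before start() only sits in the queue, and a sendEvent() made from inside processEvent() is queued and picked up by the loop that is already running.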




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63444973
  
--- Diff: common/src/main/java/org/apache/drill/common/EventProcessor.java ---
@@ -57,25 +58,35 @@ public EventProcessor() {
    */
   public void sendEvent(final T newEvent) {
     synchronized (queuedEvents) {
-      if (isProcessing) {
-        queuedEvents.addLast(newEvent);
+      queuedEvents.addLast(newEvent);
+      if (!started || isProcessing) {
--- End diff --

Update the method docs.




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63444805
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -296,7 +295,7 @@ public void run() {
        * would wait on the cancelling thread to signal a resume and the cancelling thread would wait on the Foreman
        * to accept events.
        */
-      acceptExternalEvents.countDown();
+      stateSwitch.start();
--- End diff --

That is true indeed. I didn't notice that the Foreman was also using the event 
queue before it counted down the latch. I will update the fix to preserve this 
behavior.




[GitHub] drill pull request: DRILL-4573 Fixed issue with regexp_replace fun...

2016-05-16 Thread jinfengni
Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/502#issuecomment-219577333
  
@jcmcote, I'll take a look at the patch shortly. You are right that the 
build issue should not be caused by your code change.




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63442039
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -296,7 +295,7 @@ public void run() {
        * would wait on the cancelling thread to signal a resume and the cancelling thread would wait on the Foreman
        * to accept events.
        */
-      acceptExternalEvents.countDown();
+      stateSwitch.start();
--- End diff --

This defers all state transitions until the thread is done. This changes 
the behavior when queuing is enabled: the user will not know whether the query 
has moved from ENQUEUED to STARTING.




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread adeneche
Github user adeneche commented on the pull request:

https://github.com/apache/drill/pull/503#issuecomment-219571472
  
This is an alternative change that removes the blocking latch from Foreman. 
The foreman thread could still block forever in case of a bug, but this should 
no longer cause RPC threads to block as well.




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63431381
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
      *   to the user
      */
     public void moveToState(final QueryState newState, final Exception ex) {
+      // if the current thread is the foreman thread, throw an exception
+      // otherwise the foreman will be blocked forever on acceptExternalEvents
+      if (myThreadRef == Thread.currentThread()) {
--- End diff --

We were assuming that the Foreman thread would never call 
Foreman.StateListener.moveToState() and that it would be called by another 
(RPC) thread instead.
It turns out that when the foreman is submitting remote fragments, 
RpcBus.send() could actually cause the foreman thread to call moveToState 
directly.
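
To make the failure mode concrete, here is a minimal sketch of a thread waiting on a latch that only it can release. It is an illustration under assumptions, not the Foreman code: LatchSelfWait and its methods are hypothetical names, the latch is assumed to be a CountDownLatch, and a timed await is used so that the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a thread waiting on a latch it owns. */
class LatchSelfWait {
  private final CountDownLatch acceptExternalEvents = new CountDownLatch(1);

  /**
   * Stand-in for moveToState: waits until the owner accepts external events.
   * Returns false if the wait timed out or was interrupted.
   */
  boolean moveToState(final long timeoutMillis) {
    try {
      return acceptExternalEvents.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /**
   * Stand-in for run(): a callback reaches moveToState on the same thread
   * before the latch has been counted down, so the wait cannot succeed.
   */
  boolean run(final long timeoutMillis) {
    final boolean delivered = moveToState(timeoutMillis); // same thread, latch still at 1
    acceptExternalEvents.countDown();                     // only reached after the wait gives up
    return delivered;
  }
}
```

With an untimed await() in place of the timed one, the call inside run() would never return, which is the blocked-forever case described above.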


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63430594
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
      *   to the user
      */
     public void moveToState(final QueryState newState, final Exception ex) {
+      // if the current thread is the foreman thread, throw an exception
+      // otherwise the foreman will be blocked forever on acceptExternalEvents
+      if (myThreadRef == Thread.currentThread()) {
--- End diff --

Let me ask you this: what did your investigation find to be the primary cause 
of the deadlock?




[GitHub] drill pull request: DRILL-4676: Foreman.moveToState can block fore...

2016-05-16 Thread hnfgns
Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/503#discussion_r63428428
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1221,6 +1223,12 @@ public void interrupted(final InterruptedException e) {
      *   to the user
      */
     public void moveToState(final QueryState newState, final Exception ex) {
+      // if the current thread is the foreman thread, throw an exception
+      // otherwise the foreman will be blocked forever on acceptExternalEvents
+      if (myThreadRef == Thread.currentThread()) {
--- End diff --

Well, Foreman does not call moveToState in the common path, but it could when 
it fails. My point is that instead of hacking the method to throw an exception 
if the thread is the foreman, we should release the latch and handle the 
exception in Foreman#run. What I mean is:

```
Foreman#run() {
  try {
    do sth
  } catch (ex) {
    release latch
    handleException(ex)
  } finally {
    ...
  }
}
```







[jira] [Created] (DRILL-4680) Cancel query that is in Planning state - user RPC thread of the foreman Drillbit will block waiting for query to finish planning

2016-05-16 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4680:
-

 Summary: Cancel query that is in Planning state - user RPC thread 
of the foreman Drillbit will block waiting for query to finish planning
 Key: DRILL-4680
 URL: https://issues.apache.org/jira/browse/DRILL-4680
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.7.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz


Unfortunately, if you try to cancel a query that's still in planning, the user 
RPC thread of the foreman Drillbit will block waiting for the query to finish 
planning. This will cause any other query submitted to the same Drillbit to 
block as well. It's a known issue (actually a serious Drill limitation).





[jira] [Created] (DRILL-4678) Query HANG - SELECT DISTINCT over date data

2016-05-16 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4678:
-

 Summary: Query HANG - SELECT DISTINCT over date data
 Key: DRILL-4678
 URL: https://issues.apache.org/jira/browse/DRILL-4678
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.7.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz


The query below hangs.

{noformat}
2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
VALUES(CAST('1964-03-07' AS DATE)),
  (CAST('2002-03-04' AS DATE)),
  (CAST('1966-09-04' AS DATE)),
  (CAST('1993-08-18' AS DATE)),
  (CAST('1970-06-11' AS DATE)),
  (CAST('1970-06-11' AS DATE)),
  (CAST('1970-06-11' AS DATE)),
  (CAST('1970-06-11' AS DATE)),
  (CAST('1970-06-11' AS DATE)),
  (CAST('1959-10-23' AS DATE)),
  (CAST('1992-01-14' AS DATE)),
  (CAST('1994-07-24' AS DATE)),
  (CAST('1979-11-25' AS DATE)),
  (CAST('1945-01-14' AS DATE)),
  (CAST('1982-07-25' AS DATE)),
  (CAST('1966-09-06' AS DATE)),
  (CAST('1989-05-01' AS DATE)),
  (CAST('1996-03-08' AS DATE)),
  (CAST('1998-08-19' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('2013-08-13' AS DATE)),
  (CAST('1999-07-20' AS DATE)),
  (CAST('1962-07-03' AS DATE)),
  (CAST('2011-08-17' AS DATE)),
  (CAST('2011-05-16' AS DATE)),
  (CAST('1946-05-08' AS DATE)),
  (CAST('1994-02-13' AS DATE)),
  (CAST('1978-08-09' AS DATE)),
  (CAST('1978-08-09' AS DATE)),
  (CAST('1978-08-09' AS DATE)),
  (CAST('1978-08-09' AS DATE)),
  (CAST('1958-02-06' AS DATE)),
  (CAST('2012-06-11' AS DATE)),
  (CAST('2012-06-11' AS DATE)),
  (CAST('2012-06-11' AS DATE)),
  (CAST('2012-06-11' AS DATE)),
  (CAST('1998-03-26' AS DATE)),
  (CAST('1996-11-04' AS DATE)),
  (CAST('1953-09-25' AS DATE)),
  (CAST('2003-06-17' AS DATE)),
  (CAST('2003-06-17' AS DATE)),
  (CAST('2003-06-17' AS DATE)),
  (CAST('2003-06-17' AS DATE)),
  (CAST('2003-06-17' AS DATE)),
  (CAST('1980-07-05' AS DATE)),
  (CAST('1982-06-15' AS DATE)),
  (CAST('1951-05-16' AS DATE)))
tbl(dt)
{noformat}

Details from the Web UI Profile tab. Please note that the query is still in 
the STARTING state.

{noformat}
Running Queries
Time:    05/16/2016 10:33:57
User:    mapr
Query:   SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
State:   STARTING
Foreman: centos-01.qa.lab
{noformat}

There is no other useful information in drillbit.log. The jstack output is 
attached for reference.

The same query works fine on Postgres 9.3.


