[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2018-05-03 Thread Hrishikesh Gadre (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462840#comment-16462840
 ] 

Hrishikesh Gadre edited comment on SOLR-7344 at 5/3/18 5:45 PM:


[~markrmil...@gmail.com]
{quote}Is this deadlock even an issue anymore?

We are Jetty 9 now and it only offers NIO connectors (so long thread per 
request). AFAIK that means requests waiting on IO don't hold a thread.
{quote}
To fully utilize the NIO connector capability, the application needs to use 
the asynchronous servlet APIs (introduced in the Servlet 3 spec). Here is a 
good tutorial you can take a look at: 
https://docs.oracle.com/javaee/7/tutorial/servlets012.htm

Is it possible for us to use this feature in Solr? Sure, but it would take a 
major rewrite of core parts of SolrCloud (e.g. distributed querying, 
replication, remote queries etc.), since these components synchronously wait 
for the results of RPC calls. The servlet-request scheduler proposed in this 
Jira ([https://github.com/hgadre/servletrequest-scheduler]) internally uses 
the Servlet 3 async API to queue up requests that overflow the thread-pool 
capacity, ensuring that distributed deadlocks are avoided without requiring 
*any* change to the SolrCloud functionality.
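As a rough illustration of that queueing behaviour (a stdlib-only sketch, not the actual servletrequest-scheduler code; class and parameter names are made up), requests beyond the pool capacity can be parked on a bounded queue instead of each holding a container thread:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: at most poolSize requests execute at once; up to
// queueCapacity more wait in the queue WITHOUT consuming a thread, which is
// the effect the Servlet 3 async API achieves for overflowing requests.
public class OverflowScheduler {
    private final ExecutorService workers;

    public OverflowScheduler(int poolSize, int queueCapacity) {
        workers = new ThreadPoolExecutor(poolSize, poolSize,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When both the pool and the queue are full, fail fast
                // instead of spawning more threads.
                new ThreadPoolExecutor.AbortPolicy());
    }

    public Future<?> schedule(Runnable request) {
        return workers.submit(request); // returns immediately; the caller is free
    }

    public void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        OverflowScheduler s = new OverflowScheduler(2, 8);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            s.schedule(done::incrementAndGet); // 2 run, the rest wait queued
        }
        s.shutdown();
        System.out.println("completed=" + done.get()); // prints completed=10
    }
}
```

The real scheduler works at the servlet layer, suspending requests via AsyncContext rather than handing them to an ExecutorService, but the capacity arithmetic is the same.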

 


was (Author: hgadre):
[~markrmil...@gmail.com]
{quote}Is this deadlock even an issue anymore?

We are Jetty 9 now and it only offers NIO connectors (so long thread per 
request). AFAIK that means requests waiting on IO don't hold a thread.
{quote}
In order to fully utilize NIO connector capability, the application needs to 
use asynchronous servlet APIs (provided as part of Servlet 3 spec). Here is a 
good tutorial that you can take a look: 
[https://www.javacodegeeks.com/2013/08/async-servlet-feature-of-servlet-3.html]

Is it possible for us to use this feature for SOLR? Sure, but it will take a 
major rewrite of core parts of SOLR cloud (e.g. distributed querying, 
replication, remote queries etc.) as these components synchronously wait for 
the results of RPC calls. The servlet-request scheduler proposed in this Jira 
([https://github.com/hgadre/servletrequest-scheduler]) internally uses servlet 
3 async API to queue up the requests overflowing the thread-pool capacity, 
ensuring that distributed deadlocks are avoided without requiring *any* change 
in the SOLR cloud functionality.

 

> Allow Jetty thread pool limits while still avoiding distributed deadlock.
> -
>
> Key: SOLR-7344
> URL: https://issues.apache.org/jira/browse/SOLR-7344
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Priority: Major
> Attachments: SOLR-7344.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2018-05-03 Thread Hrishikesh Gadre (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462840#comment-16462840
 ] 

Hrishikesh Gadre edited comment on SOLR-7344 at 5/3/18 5:43 PM:


[~markrmil...@gmail.com]
{quote}Is this deadlock even an issue anymore?

We are Jetty 9 now and it only offers NIO connectors (so long thread per 
request). AFAIK that means requests waiting on IO don't hold a thread.
{quote}
In order to fully utilize NIO connector capability, the application needs to 
use asynchronous servlet APIs (provided as part of Servlet 3 spec). Here is a 
good tutorial that you can take a look: 
[https://www.javacodegeeks.com/2013/08/async-servlet-feature-of-servlet-3.html]

Is it possible for us to use this feature for SOLR? Sure, but it will take a 
major rewrite of core parts of SOLR cloud (e.g. distributed querying, 
replication, remote queries etc.) as these components synchronously wait for 
the results of RPC calls. The servlet-request scheduler proposed in this Jira 
([https://github.com/hgadre/servletrequest-scheduler]) internally uses servlet 
3 async API to queue up the requests overflowing the thread-pool capacity, 
ensuring that distributed deadlocks are avoided without requiring *any* change 
in the SOLR cloud functionality.

 


was (Author: hgadre):
[~markrmil...@gmail.com]
{quote}Is this deadlock even an issue anymore?

We are Jetty 9 now and it only offers NIO connectors (so long thread per 
request). AFAIK that means requests waiting on IO don't hold a thread.
{quote}
In order to fully utilize NIO connector capability, the application needs to 
use asynchronous servlet APIs (provided as part of Servlet 3 spec). Here is a 
good tutorial that you can take a look: 
[https://plumbr.io/blog/java/how-to-use-asynchronous-servlets-to-improve-performance]

Is it possible for us to use this feature for SOLR? Sure, but it will take a 
major rewrite of core parts of SOLR cloud (e.g. distributed querying, 
replication, remote queries etc.) as these components synchronously wait for 
the results of RPC calls. The servlet-request scheduler proposed in this Jira 
([https://github.com/hgadre/servletrequest-scheduler]) internally uses servlet 
3 async API to queue up the requests overflowing the thread-pool capacity, 
ensuring that distributed deadlocks are avoided without requiring *any* change 
in the SOLR cloud functionality.

 




[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-10-17 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961970#comment-14961970
 ] 

Ramkumar Aiyengar edited comment on SOLR-7344 at 10/17/15 5:10 PM:
---

[~ysee...@gmail.com], [~markrmil...@gmail.com] and I got a chance to discuss 
this at Revolution. One approach we seemed to have consensus on (correct me 
guys if I am wrong) is:

 - We use a single, small, internal queue for non-streaming APIs.
 - We leave the limit high for streaming APIs and raise a new issue for (1) 
the ability to control the amount of nesting in Streaming APIs, and (2) 
dynamically having N pools (Internal.1, Internal.2, ..., Internal.N), each 
with a small limit, where N is the maximum nesting configured for Streaming 
APIs.
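The N-pools idea in point (2) can be sketched roughly like this (illustrative names, stdlib only, not Solr code): a request nested d levels deep runs on pool d, so requests at one depth can never exhaust the threads their own sub-requests need:

```java
import java.util.concurrent.*;

// Hypothetical sketch of "N pools by nesting depth" (Internal.1 ... Internal.N):
// each depth gets its own small pool, so a depth-d request waiting on a
// depth-(d+1) sub-request cannot be starved by other depth-d requests.
public class DepthPools {
    private final ExecutorService[] pools;

    public DepthPools(int maxDepth, int threadsPerDepth) {
        pools = new ExecutorService[maxDepth];
        for (int d = 0; d < maxDepth; d++) {
            pools[d] = Executors.newFixedThreadPool(threadsPerDepth);
        }
    }

    // Run a request at the given nesting depth; it may fan out to depth + 1.
    public <T> Future<T> submit(int depth, Callable<T> request) {
        return pools[depth].submit(request);
    }

    public void shutdown() {
        for (ExecutorService p : pools) p.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        DepthPools pools = new DepthPools(3, 2);
        // With a single shared pool of size 2, two such nested requests could
        // deadlock; here the sub-request always finds a thread in its own pool.
        Future<Integer> top = pools.submit(0, () -> {
            Future<Integer> sub = pools.submit(1, () -> 21);
            return sub.get() * 2;
        });
        System.out.println(top.get()); // prints 42
        pools.shutdown();
    }
}
```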


was (Author: andyetitmoves):
[~ysee...@gmail.com], [~markrmil...@gmail.com] and me got a chance to discuss 
this at Revolution. One approach we seem to have consensus on (correct me guys 
if I am wrong) is:

 - We use a single, small, internal queue for non-streaming APIs.
 - We leave the limit high for streaming APIs and raise a new issue for (1) The 
ability to control the amount of nesting in Streaming APIs, and (2) To 
dynamically have N pools (Internal.1, Internal.2, , Internal.N), each with 
a small limit, where N is the amount of maximum nesting configured for 
Streaming APIs..




[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-06-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586324#comment-14586324
 ] 

Yonik Seeley edited comment on SOLR-7344 at 6/15/15 5:17 PM:
-

Dude (mark), you're just being obstinate now.  Relying on timeouts to break 
distributed deadlock is horrible and will cause random unrelated requests to 
also fail.  We need a *separate* queue with a high limit (or a higher limit) 
for types of requests that can cause another request to be fired off.
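A minimal sketch of that separate-queue idea (hypothetical names and limits, not Solr code): client-facing requests run on a small bounded pool, while the requests they fire off go to a separate pool with a higher limit, so saturating the external pool can never starve the internal work it waits on:

```java
import java.util.concurrent.*;

// Hypothetical sketch: external (client-facing) requests get a small bounded
// pool and queue; internal sub-requests get their own pool with a higher
// limit, so a full external pool cannot block the sub-requests it depends on.
public class SplitQueues {
    final ExecutorService external = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(16));
    final ExecutorService internal = new ThreadPoolExecutor(
            8, 8, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

    public static void main(String[] args) throws Exception {
        SplitQueues q = new SplitQueues();
        // An external request that fans out to two internal sub-requests.
        Future<Integer> f = q.external.submit(() -> {
            Future<Integer> s1 = q.internal.submit(() -> 20);
            Future<Integer> s2 = q.internal.submit(() -> 22);
            return s1.get() + s2.get();
        });
        System.out.println(f.get()); // prints 42
        q.external.shutdown();
        q.internal.shutdown();
    }
}
```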


was (Author: ysee...@gmail.com):
Dude, you're just being obstinate now.  Relying on timeouts to break 
distributed deadlock is horrible and will cause random unrelated requests to 
also fail.  We need a *separate* queue with a high limit (or a higher limit) 
for types of requests that can cause another request to be fired off.




[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585348#comment-14585348
 ] 

Yonik Seeley edited comment on SOLR-7344 at 6/15/15 12:40 AM:
--

bq. Good discussion. I hope we are still on track with this design ?

The general mechanism seems fine, yes.

But if this issue will also enable it by default, then the details still need 
to be nailed down: what requests go to what queues, which queues can safely be 
lowered to limit concurrency and increase throughput, and which queues need to 
be configured near upper bounds to work correctly.

Or we could split it up and limit this JIRA issue to the mechanism, and punt 
enabling with good defaults to another issue.

Edit: or enable it but have high limits for all queues before we figure out 
what good defaults are (that should not be worse than the current scenario). I 
just don't want to go backwards by committing something with bad defaults on a 
promise that we'll figure out the right thing later.



was (Author: ysee...@gmail.com):
bq. Good discussion. I hope we are still on track with this design ?

The general mechanism seems fine, yes.

But if this issue will also enable it by default, then the details still need 
to be nailed down.  What requests go to what queues, and which queues can be 
safely lowered to limit concurrency and increase throughput, and which queues 
need to be configured near upper bounds to work correctly.

Or we could split it up and limit this JIRA issue to the mechanism and punt 
enabling with good defaults to another issue.





[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-06-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585270#comment-14585270
 ] 

Mark Miller edited comment on SOLR-7344 at 6/14/15 9:10 PM:


bq. We don't want 10K threads today... we have to have it (a high limit) or 
we'll get into distributed deadlock. 

Yes, though these days, we have default connection timeouts and it should 
actually make progress (the timeouts are very long though).

bq. better than arbitrarily failing requests at some low level when the system 
still has resources it could use.

I don't agree with that. I think if a user wants that, fine, but by default we 
should run within something comfortable, like 500-3000 threads, and reject 
above that. I'd way rather Jetty was friendly with the system by default and 
we didn't have the thread storms we can see now. To me, that is the point of 
this issue - reasonable thread limits, reject stuff that breaks them, and 
allow a user to configure limits as high as they want. That's just a much 
nicer system IMO.

bq. it just means that we can't put a safe bound on it without risking 
distributed deadlock.

You shouldn't deadlock with this approach - the request that can't get a 
thread for its type will simply time out on the queue, and we can deny the 
request and alert the user that they are using above-normal resources. If they 
want to up the limits, fine, but by default we avoid creating shit loads of 
threads.
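A rough stdlib sketch of that submit-with-timeout behaviour (names and limits are made up): a bounded pool whose submission path waits briefly for capacity and then denies the request with an explanatory error rather than growing the thread count:

```java
import java.util.concurrent.*;

// Hypothetical sketch: a semaphore caps running + queued requests. A request
// that cannot get capacity within waitMillis is denied with a clear message,
// instead of the server silently spawning more threads.
public class BoundedSubmitter {
    private final ExecutorService pool;
    private final Semaphore capacity;   // running + queued requests
    private final long waitMillis;

    public BoundedSubmitter(int threads, int maxPending, long waitMillis) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.capacity = new Semaphore(threads + maxPending);
        this.waitMillis = waitMillis;
    }

    public <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        // Wait briefly for capacity; time out on the "queue" and reject.
        if (!capacity.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            throw new RejectedExecutionException(
                "request load above configured limits; raise them or retry");
        }
        return pool.submit(() -> {
            try { return task.call(); } finally { capacity.release(); }
        });
    }

    public void shutdown() { pool.shutdownNow(); }

    public static void main(String[] args) throws Exception {
        BoundedSubmitter s = new BoundedSubmitter(2, 2, 50);
        System.out.println(s.submit(() -> 7).get()); // prints 7
        s.shutdown();
    }
}
```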


was (Author: markrmil...@gmail.com):
bq. We don't want 10K threads today... we have to have it (a high limit) or 
we'll get into distributed deadlock. 

Yes, though these days, we have default connection timeouts and it should 
actually make progress (the timeouts are very long though).

bq. better than arbitrarily failing requests at some low level when the system 
still has resources it could use.

I don't agree with that. I think if a user wants that fine, but that by default 
we should run within something comfortable like 500-300 threads or something, 
and reject above that. I'd way rather Jetty was friendly with the system by 
default and we didn't have the thread storms we can see now. To me, that is the 
point of this issue - reasonable thread limits, reject stuff that breaks them, 
allow a user to configure limits as high as they want. That's just a much nicer 
system IMO.

bq. it just means that we can't put a safe bound on it without risking 
distributed deadlock.

You shouldn't deadlock with this approach - the request that can't get a thread 
for it's type will simply time out on the queue and we can deny the request and 
alert the user they are using above normal resources and if they want to up the 
limits fine, but by default we avoid creating shit loads of threads.




[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-06-14 Thread Hrishikesh Gadre (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585226#comment-14585226
 ] 

Hrishikesh Gadre edited comment on SOLR-7344 at 6/14/15 7:47 PM:
-

>>did this solve the distributed deadlock issue

Yes it would *solve* the distributed deadlock issue. Remember how deadlock can 
happen in the first place? 

- *All* worker threads are processing top-level requests (either request 
forwarding or scatter-gather querying)
- During the request processing, they send sub-requests and wait for the 
results of those requests.
- These sub-requests cannot be processed since there are no threads available 
for processing.

How would this approach fix the problem? By allowing top-level requests to 
consume only a (configurable) portion of the thread pool. This ensures that a 
portion of the thread pool is always available for processing sub-requests. 
This is as good as having two thread pools.
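That single-pool partitioning can be sketched roughly as follows (hypothetical names, stdlib only, not the proposed patch): of the pool's threads, at most a configured number may run top-level requests, leaving the rest free for sub-requests:

```java
import java.util.concurrent.*;

// Hypothetical sketch: one shared pool, but a semaphore caps how many workers
// may run top-level requests, so sub-requests always have threads left over.
public class PartitionedPool {
    private final ExecutorService pool;
    private final Semaphore topLevelSlots;

    public PartitionedPool(int threads, int topLevelLimit) {
        pool = Executors.newFixedThreadPool(threads);
        topLevelSlots = new Semaphore(topLevelLimit);
    }

    public <T> Future<T> submitTopLevel(Callable<T> request) {
        // Reject up front rather than blocking: a blocked acquire inside the
        // pool would itself hold a worker thread and reintroduce the deadlock.
        if (!topLevelSlots.tryAcquire()) {
            throw new RejectedExecutionException("top-level slice is full");
        }
        return pool.submit(() -> {
            try { return request.call(); } finally { topLevelSlots.release(); }
        });
    }

    public <T> Future<T> submitSub(Callable<T> request) {
        return pool.submit(request); // sub-requests may use any free worker
    }

    public void shutdown() { pool.shutdownNow(); }

    public static void main(String[] args) throws Exception {
        PartitionedPool p = new PartitionedPool(3, 2);
        // Two top-level requests each waiting on a sub-request: the third
        // worker is reserved for sub-requests, so neither can be starved.
        Future<Integer> a = p.submitTopLevel(() -> p.submitSub(() -> 1).get());
        Future<Integer> b = p.submitTopLevel(() -> p.submitSub(() -> 2).get());
        System.out.println(a.get() + b.get()); // prints 3
        p.shutdown();
    }
}
```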

>>did this address the need to limit concurrent requests without accidentally 
>>decreasing throughput for some request loads (think of the >>differences 
>>between high fanout and low fanout query request types for example).

It should, but that depends on choosing appropriate sizes for the various 
request types.

>>did it make life harder for clients

The clients are not at all aware of this change. So I don't think it would be a 
problem. 



was (Author: hgadre):
>>did this solve the distributed deadlock issue

Yes it would *solve* the distributed deadlock issue. Remember how deadlock can 
happen in the first place? 

- *All* worker threads are processing top level requests (either request 
forwarding or scatter-gather querying)
- During the request processing, they sent sub-requests and are waiting for the 
results of those requests.
- These sub requests can not be processed since there are no threads available 
for processing.

How would this approach fix the problem? - By not allowing top-level requests 
to consume only a (configurable) portion of thread pool. This ensures that a 
portion of thread pool is available for processing sub requests. This is as 
good as having two thread pools.

>>did this address the need to limit concurrent requests without accidentally 
>>decreasing throughput for some request loads (think of the >>differences 
>>between high fanout and low fanout query request types for example).

It should. But this depends upon choosing the appropriate size for various 
request types.

>>did it make life harder for clients

The clients are not at all aware of this change. So I don't think it would be a 
problem. 





[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.

2015-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585190#comment-14585190
 ] 

Yonik Seeley edited comment on SOLR-7344 at 6/14/15 7:02 PM:
-

I don't think we should place much weight on internal enforcement.  Job #1 
should be: what will actually *work* best for our existing system right now, by 
default, and be the least invasive to clients (without counting internal solr 
code as clients).  I see discussions of mechanisms for tagging requests, but I 
still don't have an understanding of if the overall problem will be solved or 
not.

To recap the problem:
 1) We want to cap the number of certain types of requests executing 
concurrently, both for flow control (see SOLR-7571) and to make more efficient 
use of resources.
 2) Solr makes requests to itself in various scenarios:
 - distributed sub-requests (currently only one)
 - distributed updates (forwards to leaders, distributed updates to replicas)
 - forwards of requests because the forwarder is not part of the target 
collection
 - Solr Streaming API: potentially unlimited nesting of requests (Solr calling 
itself)

Can someone describe what the current proposal will actually look like (by 
default, including what queues would have what limits)?

Edit: this issue is getting big enough that I had missed Hrishikesh's message 
on the proposed queue types.
{quote} 
I think the tricky part here is to identify the appropriate thread-pool size 
for each of the partition. Please take a look and let me know any feedback.
{quote}
Indeed... it seems like this is what we need to be solving (what queues, what 
limits, what behavior over the limit).  Without that I can't even tell if we've 
solved the distributed-deadlock problem or not.


was (Author: ysee...@gmail.com):
I don't think we should place much weight on internal enforcement.  Job #1 
should be: what will actually *work* best for our existing system right now, by 
default, and be the least invasive to clients (without counting internal solr 
code as clients).  I see discussions of mechanisms for tagging requests, but I 
still don't have an understanding of if the overall problem will be solved or 
not.

To recap the problem:
 1) We want to cap the number of certain types of requests executing 
concurrently for both flow control (see SOLR-7571) and to make more efficient 
use of resources.
 2) Solr makes requests to itself in various scenarios
 - distributed sub-requests (currently only one)
 - distributed updates (forwards to leaders, distributed updates
 - forwards of requests because the forwarder is not part of the target 
collection
 - Solr Streaming API: potentially unlimited nesting of requests (solr calling 
itself)

Can someone describe what the current proposal will actually look like (by 
default, including what queues would have what limits)?

