[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462840#comment-16462840 ] Hrishikesh Gadre edited comment on SOLR-7344 at 5/3/18 5:45 PM:

[~markrmil...@gmail.com]
{quote}Is this deadlock even an issue anymore? We are Jetty 9 now and it only offers NIO connectors (so long thread per request). AFAIK that means requests waiting on IO don't hold a thread.{quote}

To fully utilize the NIO connector capability, the application needs to use the asynchronous servlet APIs (introduced in the Servlet 3 spec). Here is a good tutorial worth a look: https://docs.oracle.com/javaee/7/tutorial/servlets012.htm

Is it possible for us to use this feature in Solr? Sure, but it would take a major rewrite of core parts of SolrCloud (e.g. distributed querying, replication, remote queries, etc.), since these components synchronously wait for the results of RPC calls. The servlet-request scheduler proposed in this Jira ([https://github.com/hgadre/servletrequest-scheduler]) internally uses the Servlet 3 async API to queue up the requests overflowing the thread-pool capacity, ensuring that distributed deadlocks are avoided without requiring *any* change to the SolrCloud functionality.

> Allow Jetty thread pool limits while still avoiding distributed deadlock.
> -------------------------------------------------------------------------
>
>                 Key: SOLR-7344
>                 URL: https://issues.apache.org/jira/browse/SOLR-7344
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Priority: Major
>         Attachments: SOLR-7344.patch
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
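The queueing idea behind the proposed scheduler can be sketched in plain Java. This is a hypothetical, single-threaded model of the bookkeeping only (class and method names are made up, not taken from the linked project): a request that finds all permitted slots busy is parked in a queue, much as `startAsync()` would park it without holding a thread, and is resumed when a slot frees up instead of being rejected or deadlocking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch of overflow queueing: at most maxConcurrent
// requests run at once; the rest are parked (holding no thread) and
// resumed in arrival order as running requests complete.
class AsyncRequestScheduler {
    private final int maxConcurrent;
    private int active = 0;
    private final Queue<String> parked = new ArrayDeque<>();
    final List<String> started = new ArrayList<>();

    AsyncRequestScheduler(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // A request arrives: run it if a slot is free, otherwise park it.
    void arrive(String requestId) {
        if (active < maxConcurrent) {
            active++;
            started.add(requestId);
        } else {
            parked.add(requestId); // no thread is held while parked
        }
    }

    // A running request completes: free the slot, resume one parked request.
    void complete() {
        active--;
        String next = parked.poll();
        if (next != null) {
            active++;
            started.add(next);
        }
    }

    int parkedCount() {
        return parked.size();
    }
}
```

The real scheduler would park the `AsyncContext` of the servlet request rather than a string id, but the admission/resume accounting is the same shape.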
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961970#comment-14961970 ] Ramkumar Aiyengar edited comment on SOLR-7344 at 10/17/15 5:10 PM:

[~ysee...@gmail.com], [~markrmil...@gmail.com] and I got a chance to discuss this at Revolution. One approach we seemed to have consensus on (correct me guys if I am wrong) is:
- We use a single, small, internal queue for non-streaming APIs.
- We leave the limit high for streaming APIs and raise a new issue for (1) the ability to control the amount of nesting in streaming APIs, and (2) to dynamically have N pools (Internal.1, Internal.2, ..., Internal.N), each with a small limit, where N is the maximum amount of nesting configured for streaming APIs.
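The "Internal.1 ... Internal.N" idea above can be sketched with per-depth semaphores (the `NestedPools` class and its limits are illustrative, not from any Solr patch): a request at nesting depth d draws only from pool d, so a saturated depth-1 pool can never starve the capacity its own depth-2 sub-requests need.

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch: one small pool per nesting level, so requests at
// one depth cannot exhaust the threads needed by deeper sub-requests.
class NestedPools {
    private final Semaphore[] pools;

    NestedPools(int maxNesting, int perPoolLimit) {
        pools = new Semaphore[maxNesting];
        for (int i = 0; i < maxNesting; i++) {
            pools[i] = new Semaphore(perPoolLimit);
        }
    }

    // Try to admit a request at the given nesting depth (1-based);
    // false means "reject or queue", never "wait on a shared pool".
    boolean tryAdmit(int depth) {
        return pools[depth - 1].tryAcquire();
    }

    void release(int depth) {
        pools[depth - 1].release();
    }
}
```

The nesting depth would have to be carried on the internal request (e.g. as a header) so each hop knows which pool to draw from.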
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586324#comment-14586324 ] Yonik Seeley edited comment on SOLR-7344 at 6/15/15 5:17 PM:

Dude (mark), you're just being obstinate now. Relying on timeouts to break distributed deadlock is horrible and will cause random unrelated requests to also fail. We need a *separate* queue with a high limit (or a higher limit) for types of requests that can cause another request to be fired off.
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585348#comment-14585348 ] Yonik Seeley edited comment on SOLR-7344 at 6/15/15 12:40 AM:

bq. Good discussion. I hope we are still on track with this design?

The general mechanism seems fine, yes. But if this issue will also enable it by default, then the details still need to be nailed down: what requests go to what queues, which queues can be safely lowered to limit concurrency and increase throughput, and which queues need to be configured near upper bounds to work correctly. Or we could split it up, limit this JIRA issue to the mechanism, and punt enabling with good defaults to another issue.

edit: or enable it but have high limits for all queues before we figure out what good defaults are (that should not be worse than the current scenario). I just don't want to go backwards by committing something with bad defaults on a promise that we'll figure out the right thing later.
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585270#comment-14585270 ] Mark Miller edited comment on SOLR-7344 at 6/14/15 9:10 PM:

bq. We don't want 10K threads today... we have to have it (a high limit) or we'll get into distributed deadlock.

Yes, though these days we have default connection timeouts, and it should actually make progress (the timeouts are very long though).

bq. better than arbitrarily failing requests at some low level when the system still has resources it could use.

I don't agree with that. I think if a user wants that, fine, but by default we should run within something comfortable like 500-3000 threads or so, and reject above that. I'd way rather Jetty was friendly with the system by default and we didn't have the thread storms we can see now. To me, that is the point of this issue: reasonable thread limits, reject stuff that breaks them, and allow a user to configure limits as high as they want. That's just a much nicer system IMO.

bq. it just means that we can't put a safe bound on it without risking distributed deadlock.

You shouldn't deadlock with this approach - the request that can't get a thread for its type will simply time out on the queue, and we can deny the request and alert the user that they are using above-normal resources. If they want to up the limits, fine, but by default we avoid creating shitloads of threads.
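The admission behavior Mark describes - a hard thread cap plus a bounded wait on the queue, after which the request is rejected rather than left to deadlock - might be sketched like this (the class name, limit, and timeout are made-up values for illustration):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a request either gets a worker permit within the
// queue timeout or is rejected with a "you are over the configured
// limits" style error, instead of waiting indefinitely.
class BoundedAdmission {
    private final Semaphore threads;
    private final long queueTimeoutMs;

    BoundedAdmission(int maxThreads, long queueTimeoutMs) {
        this.threads = new Semaphore(maxThreads);
        this.queueTimeoutMs = queueTimeoutMs;
    }

    // Returns true if admitted; false means reject the request and
    // tell the user to raise the limits if they really want more.
    boolean admit() {
        try {
            return threads.tryAcquire(queueTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void release() {
        threads.release();
    }
}
```

Yonik's objection in the next comment is exactly about this timeout path: under sustained overload the timeout fires on arbitrary requests, so a bound alone is not enough without separate queues per request type.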
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585226#comment-14585226 ] Hrishikesh Gadre edited comment on SOLR-7344 at 6/14/15 7:47 PM:

>> did this solve the distributed deadlock issue

Yes, it would *solve* the distributed deadlock issue. Remember how the deadlock can happen in the first place?
- *All* worker threads are processing top-level requests (either request forwarding or scatter-gather querying).
- During request processing, they send sub-requests and wait for the results of those requests.
- These sub-requests cannot be processed since there are no threads available for processing.

How would this approach fix the problem? By allowing top-level requests to consume only a (configurable) portion of the thread pool. This ensures that a portion of the thread pool is always available for processing sub-requests. This is as good as having two thread pools.

>> did this address the need to limit concurrent requests without accidentally decreasing throughput for some request loads (think of the differences between high fanout and low fanout query request types for example).

It should. But this depends upon choosing the appropriate size for the various request types.

>> did it make life harder for clients

The clients are not at all aware of this change, so I don't think it would be a problem.
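The partitioning described above can be sketched with two semaphores (the `PartitionedPool` class and the 3-thread/2-top-level split are arbitrary illustration, not proposed defaults): top-level requests must clear both a top-level cap and the total cap, while sub-requests draw only on the total, so some headroom always remains for the sub-requests that top-level work fans out.

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch: top-level requests are capped below the total
// pool size, so sub-requests can always make progress.
class PartitionedPool {
    private final Semaphore total;
    private final Semaphore topLevel;

    PartitionedPool(int totalThreads, int topLevelCap) {
        this.total = new Semaphore(totalThreads);
        this.topLevel = new Semaphore(topLevelCap);
    }

    boolean tryAdmitTopLevel() {
        if (!topLevel.tryAcquire()) return false;
        if (!total.tryAcquire()) {    // pool exhausted by sub-requests
            topLevel.release();
            return false;
        }
        return true;
    }

    boolean tryAdmitSubRequest() {
        return total.tryAcquire();
    }

    void releaseTopLevel() {
        total.release();
        topLevel.release();
    }

    void releaseSubRequest() {
        total.release();
    }
}
```

With `topLevelCap` strictly below `totalThreads`, the deadlock scenario in the bullets above cannot occur: even when every top-level slot is busy, `totalThreads - topLevelCap` threads remain for sub-requests.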
[jira] [Comment Edited] (SOLR-7344) Allow Jetty thread pool limits while still avoiding distributed deadlock.
[ https://issues.apache.org/jira/browse/SOLR-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585190#comment-14585190 ] Yonik Seeley edited comment on SOLR-7344 at 6/14/15 7:02 PM:

I don't think we should place much weight on internal enforcement. Job #1 should be: what will actually *work* best for our existing system right now, by default, and be the least invasive to clients (without counting internal Solr code as clients). I see discussions of mechanisms for tagging requests, but I still don't have an understanding of whether the overall problem will be solved or not.

To recap the problem:
1) We want to cap the number of certain types of requests executing concurrently, for both flow control (see SOLR-7571) and to make more efficient use of resources.
2) Solr makes requests to itself in various scenarios:
- distributed sub-requests (currently only one)
- distributed updates (forwards to leaders, distributed updates)
- forwards of requests because the forwarder is not part of the target collection
- Solr Streaming API: potentially unlimited nesting of requests (Solr calling itself)

Can someone describe what the current proposal will actually look like (by default, including what queues would have what limits)?

Edit: this issue is getting big enough that I had missed Hrishikesh's message on the proposed queue types.
{quote}I think the tricky part here is to identify the appropriate thread-pool size for each of the partitions. Please take a look and let me know any feedback.{quote}
Indeed... it seems like this is what we need to be solving (what queues, what limits, what behavior over the limit). Without that I can't even tell if we've solved the distributed-deadlock problem or not.