gus-asf commented on PR #2379:
URL: https://github.com/apache/solr/pull/2379#issuecomment-2029799436

   One thing folks may want to review is the addition of synchronization around 
manipulations of HttpShardHandler#responseCancelableMap which were necessary to 
avoid CME such as:
   ```
   {
     "responseHeader":{
       "zkConnected":true,
       "status":500,
       "QTime":18,
       "params":{
         "q":"document",
         "indent":"true",
         "q.op":"OR",
         "timeAllowed":"5",
         "allowPartialResults":"false",
         "useParams":"",
         "_":"1711320289606"    }  },
     "error":{
       "trace":"java.util.ConcurrentModificationException
   at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
   at java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1625)
   at 
org.apache.solr.handler.component.HttpShardHandler.cancelAll(HttpShardHandler.java:257)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:577)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:238)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2886)
   at 
org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:876)
   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:560)
   at 
org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:254)
   at 
org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:215)
   at 
org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:247)
   at 
org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:211)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
   at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
   at 
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)
   ```
   
   This happens because the code is iterating an unsynchronized hash map when 
the request thread signals a cancel operation. I chose to also enclose 
pending.incrementAndGet() or decrementAndGet() and responses.take() and even 
though they are by themselves probablly safe, there's a loop on the value of 
pending > 0 and these three operations are logically linked so it seemed a bit 
dangerous to leave them independent.
   
   Also I noticed that ShardeResponse.code is being set, but the IDE flags it 
as "assigned but never read", and against the recommendation of the previously 
existing comments it's not volatile...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to