[ https://issues.apache.org/jira/browse/SOLR-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768110#comment-17768110 ]
Chris M. Hostetter commented on SOLR-16992: ------------------------------------------- {quote}given the proliferation of this pattern across multiple classes, does it make sense to move this to a common util class and call it everywhere? not sure if possible but it would help nail it down once for all cases. {quote} yeah ... maybe some sort of {{public static <T> List<T> submitAllAndAwaitAggregatingExceptions(ExecutorService exec, List<Callable<T>> tasks) throws SolrException}} that returns the list of all the results from the {{Future.get()}} calls or throws a {{SolrException}} if one of more {{get()}} calls throws an exception, with the first exception encountered as the {{getCause()}} and the rest as {{getSuppressed()}} {quote}it feels like there is some overlap between these 2 ideas. not sure how the impls would vary but I like waiting for everything to finish and discarding any useless results (prioritize the exception). {quote} Hmmm... i'm not really sure how they would overlap? It seems to me we either: * wait for every Future, aggregated the results (and or throw an aggregated exceptions), then shutdown OR * as soon as any {{Future.get()}} call throws an exception, call {{shutdownNow}} (in place of the current {{{}shutdown{}}}) then {{awaitTermination()}} to make sure any currently executing {{StreamOpen}} instances are done, so ther is no risk they are still running when we throw that exception we caught {quote}> Change SolrClientCache so that any method that will add to solrClients throws an IllegalStateException if isClosed -0 it doesn't feel like this class is a correct place to tackle this. {quote} I don't really understand your response (particularly given your responses to my later ideas) so i'm suspicious that i was vague to the point of confusing you.IllegalStateException what i'm suggesting is that if this sequence of code happens... {code:java} SolrClientCache cache = new SolrClientCache(); cache.close(); if (cloudMode) { cache.getCloudSolrClient(zkHost); // this should throw IllegalStateException } else { cache.getHttpSolrClient(url); // and/or this should throw IllegalStateException } {code} {quote}I think it can be legal to call close twice (treat it as an idempotent operation), but not open after close that feels like a strong breach of api contract. {quote} correct. the {{Closeable.close()}} must be idempotent, but other methods in {{Closeable}} impls are allowed to behave any way we want if they are caled after {{close()}} {quote}Another idea can we shortcut `StreamOpener` to bail early if stream is closed? so we avoid some noise in the logs. {quote} Even if we re-wrote {{StreamOpener}} to look something like this, there would still be a concurrency race condition... {code:java} @Override public TupleWrapper call() throws Exception { if (stream.isClosed()) { throw new Exception("Stream closed"); } // RACE CONDITION: at this point some other thread could call stream.close() stream.open(); TupleWrapper wrapper = new TupleWrapper(stream, comp); ... {code} checking "isClosed ?" really needs to happen internally to the logic in the {{TupleStream}} instances, and it either needs to happen in blocks {{syncrhonized}} by the same gate, or using a {{private volatile boolean isClosed}} > Non-reproducible StreamingTest failures -- suggests CloudSolrStream > concurency race condition > --------------------------------------------------------------------------------------------- > > Key: SOLR-16992 > URL: https://issues.apache.org/jira/browse/SOLR-16992 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Chris M. Hostetter > Priority: Major > Attachments: > OUTPUT-org.apache.solr.client.solrj.io.stream.StreamingTest.txt, > thetaphi_solr_Solr-main-Linux_14679.log.txt > > > Roughly 3% of all jenkins jobs that run {{StreamingTest}} wind up having > suite level failures. > These failures have historically taken the form of > {{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads > all have names like > {{"h2sc-718-thread-2"}} indicating that they come from the internal > {{ExecutorService}} of an {{{}Http2SolrClient{}}}. > In my experience, the seeds from these failures have never reproduced - > suggesting that the problem is related to concurrency. > SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in > theory should help pinpoint where {{Http2SolrClient}} instances might not be > getting closed (by causing {{ObjectReleaseTracker}} to fail with stacktraces > of when/where any unclosed instances were created - ie: which test method) > In practice, I have managed to force one failure from {{StreamingTest}} since > the SOLR-16983 changes (logs to be attached soon) - but it still didn't > indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead > indicated was a _single_ unclosed {{InputStream}} instance related to > {{Http2SolrClient}} connections (SOLR-16983 also added better tracking of > this) coming from {{StreamingTest.testExceptionStream}} - a test method that > opens _five_ very similar {{ExceptionStream}} instances, wrapping > {{CloudSolrStream}} instance, which expect to trigger server side errors. > By it's very design, {{ExceptionStream}} catches & records any exceptions > from the stream it wraps, so even in the event of these "expected" server > side errors, {{ExceptionStream.close()}} should still be correctly getting > called (and propagating down to the {{CloudStream}} it wraps). > I believe the underlying problem has to do with a concurrency race condition > between the call to {{CloudStream.close()}} and the {{ExecutorService}} used > internally by {{CloudSolrStream.openStreams()}} (details to follow) -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org