Re: ConcurrentUpdate issue of solr 8.8.1

2021-05-26 Thread Mark Miller
You can run into this fairly easily even if you are doing everything right
and none of your operations are intrinsically slow or problematic.

DeleteByQuery, regardless of its cost, tends to be a large contributor,
though not always. You can mitigate a bit with cautious, controlled use of
it.

I’m not surprised that http2 is even more prone to being involved, though I
didn’t think that client was using an http2 version yet, so that part is a
bit surprising. But these things can bleed over quite easily, even more so
in new Jetty versions.

Jetty client -> server communication (and http2 in general) can be much
pickier about connections that are not tightly managed for reuse under
http2 (which can multiplex many requests over a single connection). If you
don’t fully read input/output streams, for example, the server doesn’t know
that you don’t intend to finish dealing with your stream. It will wait some
amount of time, and then it will whack that connection. All sorts of things
can manifest from this depending on all kinds of factors, but one of them
is that your client-server communication can be hosed for a bit. Similar
things can happen even if you do always keep your connection pool
connections in shape if, say, you set a content length header that doesn’t
match the content. You can do a lot poorly all day and hardly notice a peep
unless you turn on debug logging for Jetty or monitor tcp stats. And most
of the time, things won’t be terrible as a result of it either. But every
now and then you get pretty annoying consequences. And if you have
something aggravating in the mix, maybe more than now and then.

As I said though, you can run into this dist stall issue in a variety of
ways; you can march down the list of what can cause it and, lo and behold,
there that stall is again for another reason.

I would try to tune toward what is working for you: http1 appears better,
so go with it. If you can, use deleteByQuery more sparingly, and perhaps
batch such deletes for when updates are not so common; it’s a big
instigator. Be careful mixing commits in with it.
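One way to do that batching (a sketch only, not anything from Solr itself; the flush callback here is hypothetical and stands in for whatever client call you use to issue the actual deleteByQuery) is to accumulate delete queries and hand them off as one large batch on a size threshold or during quiet periods:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulates deleteByQuery terms and hands them off in batches, so the
// cluster sees a few large deletes instead of a steady stream of small ones.
public class DeleteBatcher {
    private final int maxBatch;
    private final Consumer<List<String>> flusher; // e.g. wraps a SolrClient deleteByQuery call
    private final List<String> pending = new ArrayList<>();

    public DeleteBatcher(int maxBatch, Consumer<List<String>> flusher) {
        this.maxBatch = maxBatch;
        this.flusher = flusher;
    }

    public synchronized void add(String query) {
        pending.add(query);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    // Also call this from a timer, or during quiet periods when updates
    // are not so common, as suggested above.
    public synchronized void flush() {
        if (pending.isEmpty()) return;
        // A batch can go out as a single request, e.g. by OR-ing the queries.
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

The point is simply to trade many small deleteByQuery requests for a few large ones, issued when they are least disruptive.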

You can check and make sure your server idle timeouts are higher than the
clients’ - that can help a bit.

If you are using the cloud client, or even if you are not, you can hash on
ids client-side and send updates straight to the leader they belong on, to
help reduce extraneous zig-zag update traffic.
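As a rough illustration of the idea (this is not Solr’s actual router - the CompositeIdRouter hashes ids with MurmurHash3 over shard hash ranges - and the names here are made up), client-side routing amounts to hashing each id deterministically into a shard index and grouping updates per shard before sending:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Map a document id to a shard index. Solr really uses MurmurHash3
    // against per-shard hash ranges; a plain hashCode is used here only
    // to illustrate deterministic client-side placement.
    public int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Group a batch of ids by target shard so each sub-batch can be sent
    // directly to that shard's leader, avoiding extra forwarding hops.
    public Map<Integer, List<String>> partition(Collection<String> ids) {
        Map<Integer, List<String>> byShard = new HashMap<>();
        for (String id : ids) {
            byShard.computeIfAbsent(shardFor(id), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(3);
        Map<Integer, List<String>> batches =
            router.partition(Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4"));
        // Every id lands in exactly one shard bucket.
        int total = batches.values().stream().mapToInt(List::size).sum();
        System.out.println(total); // prints 4
    }
}
```

With the cloud client this routing happens for you; the sketch just shows why doing it client-side removes a hop, since each batch already sits on the node that owns it.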

In an intensive test world, fully eliminating this takes a large,
multi-pronged effort. In a production world under your control, you should
be able to find a reasonable result with a bit of what you have been
exploring and some acceptance of imperfection or adjustment of use.

Things can also change as Jetty dependencies are updated - though I will
say, in my experience, not often for the better. A positive if one is
working on development, perhaps less so in production. And again, even so,
http 1.1 tends to be more forgiving. Delete by query and some other issues
are a bit more persistent, but less likely to move backwards or shift
unexpectedly on you than the more ethereal connection issues and spurious
EOF parsing exceptions.

Many users are not heavily troubled by the above, so you will likely find a
workable setup. That background is essentially to say: you may find clear
skies tomorrow, but it’s not just some setup or use issue on your end, and
while a single fix might set you up, keep in mind that a single fix may be
what you’re waiting for only by chance of your situation, so be proactive.

MRM

On Wed, May 19, 2021 at 8:48 AM Ding, Lehong (external - Project)
 wrote:

> *Background:*
>
> Before moving to solr 8.8.1 from 7.7.2, we performed some performance
> tests on solr 8.8.1. We hit a lot of concurrent update errors in the solr log.
>
> *Environment:*
>
> solrCloud with 3 cluster nodes and 500 collections; 5 collections have
> about 10m documents.
>
> (1 shard, 3 replica)
>
> *Threads:*
>
> 30 update/add threads + 10 deleteByQuery threads
>
> *Results:*
>
> While the deleteByQuery threads were running, only one node (the leader
> node) had update transactions; the other two nodes had none.
>
> *Errors:*
>
> java.io.IOException: Request processing has stalled for 20091ms with 100
> remaining elements in the queue. at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:449)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:345)
> at
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:338)
> at
> org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:244)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribAdd(DistributedZkUpdateProcessor.java:300)
> at
> 

Solr/Lucene joint development workflow?

2021-05-26 Thread Michael Gibney
I'm working on some features that involve changes to both Lucene and
Solr. Post-TLP-split, I'm wondering whether anyone has recommended
techniques to handle this kind of situation.

Ideally one would work on Lucene changes, get them merged, and then
proceed with Solr development; but realistically even if this were as
straightforward in practice as it sounds in principle, there are cases
where one would still want to develop in parallel.

I haven't been able to find any documented recommendation on this
subject. It's possible to have a locally built Lucene snapshot (via
`gradlew mavenToLocalRepo`); but I was only able to actually _build_
Solr against the local Lucene artifact by adding `mavenLocal()` to the
`allprojects/repositories` block in `gradle/defaults.gradle` -- and I
have yet to figure out a way to get the local Lucene artifact on the test
classpath (so I'm as yet unable to run Solr tests that depend on
unmerged upstream changes to Lucene).
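Concretely, the workaround described above looks roughly like this (a sketch restating the change, not an endorsed setup; the exact repositories block layout in `gradle/defaults.gradle` may differ between branches):

```groovy
// gradle/defaults.gradle -- add mavenLocal() so a locally published
// Lucene snapshot (produced by `gradlew mavenToLocalRepo` in the Lucene
// checkout) is resolvable when building Solr.
allprojects {
  repositories {
    mavenLocal()
    mavenCentral()
  }
}
```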

It's also possible that the partially-functional approach described
above will have to change now that Solr main depends on a specific
Lucene snapshot version.

Is anybody doing something like this? Or perhaps I'm asking the wrong
question? I can think of solutions that involve setting up my own
maven repository, to which I publish my own pinned versions of Lucene,
and refer to such pinned versions/repo as part of a given Solr
"patch". But that feels both half-baked _and_ bloated, so I don't want
to go down that road unless I feel convinced there's no better
alternative.

Michael

-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org



Re: [JENKINS] Solr-main-Linux (64bit/jdk-16) - Build # 727 - Unstable!

2021-05-26 Thread David Smiley
I'm investigating this one; it's related to a change I made.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, May 25, 2021 at 12:06 PM Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: https://jenkins.thetaphi.de/job/Solr-main-Linux/727/
> Java: 64bit/jdk-16 -XX:+UseCompressedOops -XX:+UseSerialGC
>
> 1 tests failed.
> FAILED:
> org.apache.solr.handler.component.DistributedFacetPivotLongTailTest.test
>
> Error Message:
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
> Error from server at http://127.0.0.1:44397/in_a/js/collection1: / by zero
>
> Stack Trace:
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
> Error from server at http://127.0.0.1:44397/in_a/js/collection1: / by zero
> at
> __randomizedtesting.SeedInfo.seed([61BD6FC847FCC61F:E9E95012E900ABE7]:0)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:698)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
> at
> org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1004)
> at
> org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1019)
> at
> org.apache.solr.BaseDistributedSearchTestCase.queryServer(BaseDistributedSearchTestCase.java:625)
> at
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:677)
> at
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:655)
> at
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:634)
> at
> org.apache.solr.handler.component.DistributedFacetPivotLongTailTest.doTestDeepPivotStats(DistributedFacetPivotLongTailTest.java:273)
> at
> org.apache.solr.handler.component.DistributedFacetPivotLongTailTest.test(DistributedFacetPivotLongTailTest.java:67)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1087)
> at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1058)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at
> 

RE: Misconfigured gradle rat inputs?

2021-05-26 Thread Uwe Schindler
I opened:

https://issues.apache.org/jira/browse/LUCENE-9977

 

and for Solr:

https://issues.apache.org/jira/browse/SOLR-15436

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Uwe Schindler  
Sent: Wednesday, May 26, 2021 11:01 AM
To: d...@lucene.apache.org
Subject: RE: Misconfigured gradle rat inputs?

 

Hi,

 

I tried to fix the problem but gave up because of limited time.

 

The problem is that this task is global per project and not split into
different ones; it’s all mixed together. It works if I add an @InputDirectory
with the projectDir, but this leads to strange exceptions, because it also
tries to hash files from the build directory.

 

IMHO, the correct way to fix this is:

*   Generate a generic RatTask that extends SourceTask (do NOT extend 
DefaultTask!). This brings a source directory and include/exclude handling 
automatically, so it’s easy to configure. All you need is to use the converter 
when executing the task, which changes a FileCollection to an ANT fileset: 
this.getSource().addToAntBuilder(antTaskDeclaration, "fileset", 
FileCollection.AntType.FileSet)
*   Create a separate task for each affected sourceset, and also one for the 
base project dir, each with correct includes/excludes

 

I gave up, as my time was limited and I was not able to quickly split the task 
into one for each sourceset.
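A rough sketch of the RatTask approach described above (hypothetical and untested; the Ant report options and taskdef classpath are elided, and names are illustrative only):

```groovy
// Hypothetical sketch: a per-sourceset RAT task built on SourceTask, so
// Gradle tracks the source files as inputs and re-runs when they change.
abstract class RatTask extends SourceTask {
    @TaskAction
    void rat() {
        // Assumes the RAT Ant task is on the ant classpath (elided here).
        ant.taskdef(name: 'rat', classname: 'org.apache.rat.anttasks.Report')
        ant.rat(/* report options elided */) {
            // Convert the task's FileCollection to an Ant fileset,
            // as described in the first bullet above.
            source.addToAntBuilder(delegate, 'fileset', FileCollection.AntType.FileSet)
        }
    }
}

// One task per sourceset, each with its own includes/excludes.
sourceSets.all { ss ->
    tasks.register("rat${ss.name.capitalize()}", RatTask) {
        source = ss.allSource
    }
}
```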

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de  

 

From: Dawid Weiss <dawid.we...@gmail.com> 
Sent: Wednesday, May 26, 2021 7:29 AM
To: Lucene Dev <d...@lucene.apache.org>
Subject: Re: Misconfigured gradle rat inputs?

 

 

Yep, looks like missing inputs, so the task is skipped. Good catch, Alan. I can 
look at this later too, Uwe (just create an issue and assign it to me).

 

On Tue, May 25, 2021 at 7:40 PM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

the problem is that the RAT task has only some patterns for filenames as input, 
but no actual @InputDirectory. If the files change, nothing changes from 
Gradle's point of view.

The validation/rat-sources.gradle and its inner class "RatTask" must at least 
declare @InputDirectory with a default value of ${project.projectDir}. This 
would cause any change to retrigger the task.

To make it more professional, it should declare a FileCollection and apply the 
patterns, but that's more complicated as this task just wraps the native Ant 
RAT task.

I can fix this (must be done for Solr and Lucene, both have the problem).

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de  

> -Original Message-
> From: Alan Woodward <romseyg...@gmail.com>
> Sent: Tuesday, May 25, 2021 7:26 PM
> To: d...@lucene.apache.org  
> Subject: Misconfigured gradle rat inputs?
> 
> There’s a subject line I never thought I’d type :)
> 
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle
> build? I’ve been doing lots of small PRs for the spans-to-queries work and 
> being
> able to run checks multiple times in an extremely efficient manner has been a
> life saver.  Massive thanks to Dawid, and also to Robert for all the work on
> speeding up tests.
> 
> I think I may have found a bug in the input configuration for our license header
> checks.  Thanks to the new build, I have been running `./gradlew check` before
> pushing code, but it has let through files with missing headers a few times,
> which were subsequently caught by the GitHub action running on the PR.
> 
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> 
> This looks to me like the fileset that the rat task is looking at is not set 
> up
> correctly, but I don’t know enough gradle to actually work out what is wrong
> and what the fix should be.
> 
> - A
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org