Thanks Houston!

~ David

On Fri, Oct 27, 2023 at 2:03 PM Houston Putman <[email protected]> wrote:
> After fixing the docker tests, I believe all of the other Solr-Check and
> Solr-Smoketest errors that were the result of running Solr processes have
> gone away.
> The mTLS issue still exists, and there are other issues with the
> smoketest, but at least there is progress.
>
> We should definitely move the docker tests to use BATS so that we can have
> better control over test cleanup. But that's not going to be a very easy
> migration.
>
> - Houston
>
> On Thu, Oct 26, 2023 at 1:00 PM Houston Putman <[email protected]> wrote:
>
> > Ok, I think I fixed the docker tests. The other issues all still apply
> > though.
> >
> > - Houston
> >
> > On Thu, Oct 26, 2023 at 12:16 PM Houston Putman <[email protected]>
> > wrote:
> >
> >> The Jenkins builds aren't in a great state right now.
> >>
> >> Currently the Solr-Check-main
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Check-main> build is
> >> failing consistently because of random Solr processes being found on the
> >> box (when the integration tests expect nothing else to be running). Now
> >> that we have port randomization for the integration tests, it's a very
> >> good sign that the found Solr processes all use port 8983, meaning that
> >> we aren't leaking Solrs in the integration tests.
> >>
> >> Because of this, the culprit seems to be that the smoke tests (which
> >> still start a Solr on port 8983) are leaking processes, and looking at
> >> the logs, that seems to be the case (Solr-Smoketest-9.4
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Smoketest-9.4>,
> >> Solr-Smoketest-9.x
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Smoketest-9.x>). So
> >> fixing the smoke tests leaking Solr processes will in turn fix both the
> >> smoke test builds and the main check.
> >>
> >> As for the Solr-Check-9.x
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Check-9.x> build, it is
> >> running on Crave, so it doesn't have the same issue with leaked Solr
> >> processes. However, on Crave there seems to be an issue with the mTLS
> >> tests. (Solr-Check-main also has this issue, but only on the
> >> lucene-solr-1 machine, not lucene-solr-2, strangely.) We need to
> >> investigate why the TLS tests pass locally for everyone (and on 1/2 of
> >> the Jenkins boxes), but not on Crave.
> >>
> >> Lastly, the Docker tests are broken in a very strange way. A while ago,
> >> I added tests to make sure that the Prometheus exporter can communicate
> >> correctly in Docker. This test seems to fail on both
> >> Solr-Docker-Nightly-main
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-main>
> >> and Solr-Docker-Nightly-9.x
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-9.x>. At
> >> first I thought the issue was that the Jenkins servers had different
> >> Docker networking that didn't support these tests, and I let it be for a
> >> bit. Now we are running Solr-Docker-Nightly-9.4
> >> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-9.4>,
> >> which has the same tests included, and it passes. So it does seem like
> >> the Jenkins servers allow us to use Docker networking in the ways we
> >> want, but for some reason 9.x and 9.4 (which should be relatively
> >> identical) don't behave the same way. Looking at the err logs, the
> >> problem is
> >>
> >>> /opt/solr/docker/scripts/docker-entrypoint.sh: line 48: exec:
> >>> solr-exporter: not found
> >>>
> >> Off the top of my head, I think this might be using the slim Docker
> >> image? Because otherwise there's no reason why the Solr exporter
> >> wouldn't be there... (Also no idea why it wouldn't work the same on the
> >> 9.4 build...)
> >>
> >> Anyways, this is just a list of what's going on. I'll try to fix the
> >> docker stuff, but would love help with the other builds!
> >>
> >> - Houston
