After fixing the Docker tests, I believe all of the other Solr-Check and
Solr-Smoketest errors that were the result of running Solr processes have
gone away.
The mTLS issue still exists, and there are other issues with the
smoketest, but at least there is progress.

We should definitely move the Docker tests to use BATS so that we can have
better control over test cleanup. But that's not going to be an easy
migration.

- Houston

On Thu, Oct 26, 2023 at 1:00 PM Houston Putman <hous...@apache.org> wrote:

> Ok, I think I fixed the docker tests. The other issues all still apply
> though.
>
> - Houston
>
> On Thu, Oct 26, 2023 at 12:16 PM Houston Putman <hous...@apache.org>
> wrote:
>
>> The Jenkins builds aren't in a great state right now.
>>
>> Currently the Solr-Check-main
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Check-main> build is
>> failing consistently because of random Solr processes being found on the
>> box (when the integration tests expect nothing else to be running). Now
>> that we have port randomization for the integration tests, it's a very good
>> sign that the found Solr processes all use port 8983, meaning that we
>> aren't leaking Solrs in the integration tests.
>>
>> Because of this, the culprit seems to be that the smoke tests (which
>> still start a Solr on port 8983) are leaking processes, and looking at the
>> logs, that seems to be the case (Solr-Smoketest-9.4
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Smoketest-9.4>,
>> Solr-Smoketest-9.x
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Smoketest-9.x>). So
>> fixing the Smoketests leaking Solr processes will in turn fix both the
>> smoke test builds and the main check.
>>
>> As for the Solr-Check-9.x
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Check-9.x> build, it is
>> running on Crave, so it doesn't have the same issue with leaked Solr
>> processes. However, on Crave, there seems to be an issue with the mTLS
>> tests. (Solr-Check-main also has this issue, but strangely only on the
>> lucene-solr-1 machine, not on lucene-solr-2.) We need to investigate why the
>> mTLS tests pass locally for everyone (and on one of the two Jenkins boxes),
>> but not on Crave.
>>
>> Lastly, the Docker tests are broken in a very strange way. A while ago, I
>> added tests to make sure that the Prometheus exporter can communicate
>> correctly in Docker. This test seems to fail on both
>> Solr-Docker-Nightly-main
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-main> and
>> Solr-Docker-Nightly-9.x
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-9.x>. At
>> first I thought the issue was that the Jenkins servers had different Docker
>> networking that didn't support these tests, and I let it be for a bit. Now
>> we are running Solr-Docker-Nightly-9.4
>> <https://ci-builds.apache.org/job/Solr/job/Solr-Docker-Nightly-9.4>,
>> which includes the same tests, and it passes. So it does seem like the
>> Jenkins servers allow us to use Docker networking in the ways we want, but
>> for some reason 9.x and 9.4 (which should be nearly identical) don't
>> behave the same way. Looking at the error logs, the problem is:
>>
>>> /opt/solr/docker/scripts/docker-entrypoint.sh: line 48: exec:
>>> solr-exporter: not found
>>>
>> Off the top of my head, I think this might be using the slim Docker image?
>> Because otherwise there's no reason why the Solr exporter wouldn't be
>> there... (Also no idea why it wouldn't work the same on the 9.4 build...)
>>
>> Anyway, this is just a list of what's going on. I'll try to fix the
>> Docker stuff, but would love help with the other builds!
>>
>> - Houston
>>
>
