damccorm commented on PR #35216: URL: https://github.com/apache/beam/pull/35216#issuecomment-2997045270
> Hey @damccorm,
>
> Most of the main tasks in this PR are nearly complete. However, the `beam_PreCommit_Python_ML` job is currently failing due to an issue related to Docker container port mapping.
>
> I've already tried a few workarounds, such as ensuring there are no port conflicts, and it turned out this is not the issue. This runs fine on a typical GitHub-hosted Ubuntu runner. The runner for this `beam_PreCommit_Python_ML` job is self-hosted, which leads me to think this is likely a configuration issue specific to the self-hosted environment.
>
> Given I have little input into the setup or computing environment of the self-hosted runner for this job, how am I supposed to troubleshoot and debug this further?
>
> ```
> cls = <class 'apache_beam.ml.rag.enrichment.milvus_search_it_test.TestMilvusSearchEnrichment'>
>
>     @classmethod
>     def setUpClass(cls):
> >       cls._db = MilvusEnrichmentTestHelper.start_db_container(cls._version)
>
> apache_beam/ml/rag/enrichment/milvus_search_it_test.py:420:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> apache_beam/ml/rag/enrichment/milvus_search_it_test.py:273: in start_db_container
>     vector_db_container.start()
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/milvus/__init__.py:83: in start
>     self._connect()
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/core/waiting_utils.py:59: in wrapper
>     return wrapped(*args, **kwargs)
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/milvus/__init__.py:65: in _connect
>     self._healthcheck()
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/core/waiting_utils.py:59: in wrapper
>     return wrapped(*args, **kwargs)
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/milvus/__init__.py:74: in _healthcheck
>     healthcheck_url = self._get_healthcheck_url()
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/milvus/__init__.py:69: in _get_healthcheck_url
>     port = self.get_exposed_port(self.healthcheck_port)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> wrapped = <bound method DockerContainer.get_exposed_port of <testcontainers.milvus.MilvusContainer object at 0x7e27846ac4f0>>
> instance = <testcontainers.milvus.MilvusContainer object at 0x7e27846ac4f0>
> args = (9091,), kwargs = {}
>
>     @wrapt.decorator
>     def wrapper(wrapped: Callable, instance: Any, args: list, kwargs: dict) -> Any:
>         from testcontainers.core.container import DockerContainer
>
>         if isinstance(instance, DockerContainer):
>             logger.info("Waiting for container %s with image %s to be ready ...", instance._container, instance.image)
>         else:
>             logger.info("Waiting for %s to be ready ...", instance)
>
>         exception = None
>         for attempt_no in range(config.max_tries):
>             try:
>                 return wrapped(*args, **kwargs)
>             except transient_exceptions as e:
>                 logger.debug(
>                     f"Connection attempt '{attempt_no + 1}' of '{config.max_tries + 1}' "
>                     f"failed: {traceback.format_exc()}"
>                 )
> >               time.sleep(config.sleep_time)
> E               Failed: Timeout (>600.0s) from pytest-timeout.
>
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/core/waiting_utils.py:65: Failed
> _ ERROR at setup of TestMilvusSearchEnrichment.test_vector_search_with_inner_product_similarity _
> [gw2] linux -- Python 3.9.22 /runner/_work/beam/beam/sdks/python/test-suites/tox/py39/build/srcs/sdks/python/target/.tox-py39-ml/py39-ml/bin/python
>
> wrapped = <bound method DockerContainer.get_exposed_port of <testcontainers.milvus.MilvusContainer object at 0x7e27846ac4f0>>
> instance = <testcontainers.milvus.MilvusContainer object at 0x7e27846ac4f0>
> args = (9091,), kwargs = {}
>
>     @wrapt.decorator
>     def wrapper(wrapped: Callable, instance: Any, args: list, kwargs: dict) -> Any:
>         from testcontainers.core.container import DockerContainer
>
>         if isinstance(instance, DockerContainer):
>             logger.info("Waiting for container %s with image %s to be ready ...", instance._container, instance.image)
>         else:
>             logger.info("Waiting for %s to be ready ...", instance)
>
>         exception = None
>
>         for attempt_no in range(config.max_tries):
>             try:
> >               return wrapped(*args, **kwargs)
>
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/core/waiting_utils.py:59:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> target/.tox-py39-ml/py39-ml/lib/python3.9/site-packages/testcontainers/core/container.py:155: in get_exposed_port
>     return self.get_docker_client().port(self._container.id, port)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = <testcontainers.core.docker_client.DockerClient object at 0x7e27846acdc0>
> container_id = 'a261d8725a7cc154940828cafa5b084f506ab9a4f7c8b852c92be5d98638abdd'
> port = 9091
>
>     def port(self, container_id: str, port: int) -> int:
>         """
>         Lookup the public-facing port that is NAT-ed to :code:`port`.
>         """
>         port_mappings = self.client.api.port(container_id, port)
>         if not port_mappings:
> >           raise ConnectionError(f"Port mapping for container {container_id} and port {port} is " "not available")
> E           ConnectionError: Port mapping for container a261d8725a7cc154940828cafa5b084f506ab9a4f7c8b852c92be5d98638abdd and port 9091 is not available
> ```

It looks to me like this comes specifically from the healthcheck port: https://github.com/testcontainers/testcontainers-python/blob/f467c842b851613b9a087bd5f9a08d8c39577cb8/modules/milvus/testcontainers/milvus/__init__.py#L47

But I think the core problem is probably that we're already running inside a Docker container in our CI environment, and Docker-in-Docker can be problematic.

Here is what I'd probably recommend:

1) In `setUpClass`, if setting up the containers fails, skip the test. This isn't ideal, but it will allow us to make progress (and we can manually verify the tests for now).
2) In a future PR, we can add a workflow to execute tests which don't run correctly on self-hosted runners. This would involve:
   - Adding a marker like `no_self_hosted`: https://github.com/apache/beam/blob/b74c49602162cd752ae27d67481eed44dcfd06ea/sdks/python/pytest.ini#L32
   - Adding a Gradle task to run any tests with that marker (installing deps along the way). Initially this could just be the Milvus tests, but this is a problem I've seen elsewhere as well.
   - Adding a new GitHub workflow, which can run on `ubuntu-latest`, to execute those tests.
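For recommendation 1, a rough sketch of the skip-on-failure pattern might look like the following. This is only an illustration using plain `unittest`: `start_db_container` and `MilvusSearchSketchTest` are hypothetical stand-ins (not the actual helper or test class in the PR), and the stand-in always fails so that the skip path is exercised.

```python
import io
import unittest


def start_db_container():
    """Hypothetical stand-in for the real container start-up helper.

    It always raises here, imitating the port-mapping error in the log.
    """
    raise ConnectionError("Port mapping for container and port 9091 is not available")


class MilvusSearchSketchTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        try:
            cls._db = start_db_container()
        except Exception as e:
            # Raising SkipTest from setUpClass reports the class as skipped
            # instead of recording an error for it.
            raise unittest.SkipTest(f"Could not start the DB container: {e}")

    def test_vector_search(self):
        # Never reached in this sketch, because setUpClass skips the class.
        self.fail("should not run when the container is unavailable")


def run_sketch():
    """Run the sketch test class quietly and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MilvusSearchSketchTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_sketch()
    print("skipped:", len(result.skipped), "errors:", len(result.errors))
```

This only covers failures that surface as ordinary exceptions during setup. A setup that hangs until an external timeout fires may need separate handling, such as a shorter startup wait on the container.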