Baunsgaard commented on PR #1953: URL: https://github.com/apache/systemds/pull/1953#issuecomment-1871229023
> What exactly is the issue with running multiple workers in the same JVM? The buffer pool anyway assigns a unique cache file ID.

I was probably wrong in my original guess, and I do not know the internals of the pool well enough to say if that was it. But the workers tend to use up the resources, and having multiple workers read the same source file, and cache the reading of the same path and file, seems as likely a cause as any. Especially since, if one of the workers fails and crashes, we do not get the error out; instead, the test just runs forever until timeout. It seems like GitHub changed some settings for the runners that provoked this.

> I always found the multi-threaded federated tests to be more stable than the multi-process tests.

I agree with this statement, especially now that I am trying to convert them all to multi-process. I found out today that Microsoft reduced the memory from 8 GB to 7 GB on the default runners that we tested with. We allocate a 3 GB `-Xmx` for each worker, and many tests use 4 workers, so the workers alone can request up to 12 GB of heap, well above the 7 GB available. Nothing is written to the log if a worker crashes, and the tests are not written in ways that inform us when this happens; they just time out.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners

> I could imagine that using multiple ports for all workers in the same JVM might have been an issue in this changed test environment.

It could be that they limited the range of ports allowed to be used, but I do not think so, since we choose random ports in the allowed range, and I have disabled running multiple federated tests in parallel. I am trying to make the tests more robust to crashing workers, and hopefully this fixes the problems.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
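As a side note on the port discussion above: a common way to pick a free port without scanning a fixed range is to bind to port 0 and let the OS choose. The sketch below is a minimal, hypothetical illustration (`FreePortFinder` is not SystemDS code, and the actual test harness may select ports differently):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
    // Bind to port 0 so the kernel assigns an unused ephemeral port,
    // then release it and return the number for the worker to use.
    public static int getFreePort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            s.setReuseAddress(true);
            return s.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("picked port " + getFreePort());
    }
}
```

Note the inherent race: the port is released before the worker binds it, so another process could grab it in between; retrying on bind failure is the usual mitigation.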
