Baunsgaard commented on PR #1953:
URL: https://github.com/apache/systemds/pull/1953#issuecomment-1871229023

   > What exactly is the issue with running multiple workers in the same JVM? 
The buffer pool anyway assigns a unique cache file ID.
   
   I was probably wrong in my original guess, and I do not know the internals 
of the buffer pool well enough to say whether that was it. But the workers tend 
to use up the resources, and having multiple workers read the same source file 
and cache reads of the same path seems as likely a cause to me as any. 
This is especially a problem if one of the workers crashes: we then do not get 
the error out, and instead the test just runs until the timeout. It seems like 
GitHub changed some settings for its hosted runners that provoked this.
   
   > I always found the multi-threaded federated tests to be more stable than 
the multi-process tests.
   
   I agree with this statement, especially now that I am trying to convert them 
all to multi-process. I found out today that Microsoft reduced the memory from 
8GB to 7GB for the default nodes that we test on. We give each worker a 3g max 
heap (Xmx), and many tests use 4 workers, so the workers alone can request up 
to 12GB of heap. Nothing is written to the log if a worker crashes, and the 
tests are not written in a way that informs us when this happens; they just 
time out.
   
   
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners
   
   
   > I could imagine that using multiple ports for all workers in the same JVM 
might have been an issue in this changed test environment.
   
   It could be that they limited the range of ports that can be used, but I do 
not think so, since we choose random ports in the allowed range and I have 
disabled running multiple federated tests in parallel.
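
   For illustration only, here is a minimal sketch of choosing a random port 
in a given range and checking that it can be bound. This is not the SystemDS 
test utility; `PortPicker`, `randomFreePort`, and the range passed in are 
hypothetical names and values.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

final class PortPicker {
    /**
     * Returns a port in [lo, hi] that could be bound at the time of the check.
     * Hypothetical helper for illustration, not part of SystemDS.
     */
    static int randomFreePort(int lo, int hi, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = ThreadLocalRandom.current().nextInt(lo, hi + 1);
            // Binding briefly is a cheap way to see whether the port is usable.
            try (ServerSocket probe = new ServerSocket(candidate)) {
                return probe.getLocalPort();
            } catch (IOException e) {
                // Port is in use or not permitted; try another candidate.
            }
        }
        throw new IllegalStateException(
            "no free port found in [" + lo + ", " + hi + "]");
    }
}
```

   The check is only a snapshot: another process can still take the port 
between the probe and the worker start, so a caller would likely still want 
to retry on a bind failure.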
   
   I am trying to make the tests more robust to crashing workers, and 
hopefully, this fixes the problems.
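
   As a rough sketch of the idea (not the code in this PR), a test could poll 
both the worker's liveness and its own completion, and fail fast when the 
worker is gone instead of waiting for the full timeout. `WorkerWatchdog`, 
`Outcome`, and `await` are hypothetical names, and the polling approach is an 
assumption.

```java
import java.util.function.BooleanSupplier;

final class WorkerWatchdog {
    /** Possible results of waiting on a test that depends on a worker. */
    enum Outcome { COMPLETED, WORKER_DIED, TIMED_OUT }

    /**
     * Polls until the test is done, the worker is no longer alive,
     * or the timeout expires, whichever comes first.
     */
    static Outcome await(BooleanSupplier workerAlive, BooleanSupplier testDone,
                         long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (testDone.getAsBoolean()) return Outcome.COMPLETED;
            // A dead worker can never finish the test, so report it right away.
            if (!workerAlive.getAsBoolean()) return Outcome.WORKER_DIED;
            if (System.nanoTime() >= deadline) return Outcome.TIMED_OUT;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

   With a `java.lang.Process` for the worker and a `Future` for the test body, 
this could be called as `WorkerWatchdog.await(process::isAlive, future::isDone, 
60_000, 100)`, and a `WORKER_DIED` result turned into an explicit test failure.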


