Hi,
a quick follow-up just to clarify: I cannot prove that the stop / cont
signals caused the completion of the task; it may have been
coincidence... Right now I have a never-ending rcontrib process again,
and stop / cont does not help this time.
Cheers,
Lars.
Hi Greg, Jan,
I just observed a similar problem with rcontrib. I am running a chain
of vwrays, rtrace, awk, rfluxmtx to calculate daylight coefficients in
an image region (rtrace returns view origin, direction and modifier,
and awk filters so that rays are passed into rfluxmtx only if a
defined modifier is hit). This in general works pretty well, even with
38 processes in parallel, but I just had one rcontrib process stuck at
100% CPU (no memory effects though). Issuing a kill -stop PID; kill
-cont PID sequence on the rcontrib process made it immediately
complete the task. The ambient file can be excluded here as a cause,
since rcontrib does not utilize the ambient cache. This is all on
non-networked filesystems, Ubuntu Linux.
Cheers, Lars.
Hi Jan,
This could be an unexpected "hang" condition with one of the rtrace
processes, where a single ray evaluation is blocked waiting for
access to the ambient file, while the other processes continue
computing away, filling up the queue with results after the blocked
one. I could see this becoming a runaway memory condition, but I
don't know why a process would be blocked. NFS file locking is used
on the ambient file, and this has been known to fail on some Linux
builds, but I haven't seen it fail by refusing to unlock. (The
problem in the past has been unlocking when it shouldn't.)
If you can monitor your processes, watch for when the parent
becomes large, then stop all the child processes (kill -stop pid1
pid2 ...) and restart them one by one (kill -cont pidN).
If the parent process starts to shrink after that, or at least
doesn't continue to grow, then this would support my hypothesis.
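
The stop-and-restart test described above could be scripted roughly as follows. This is only a sketch, not part of Radiance: the function name stop_then_resume and the pause_s parameter are made up for illustration, and the child PIDs are assumed to have been collected by hand (e.g. from top or ps).

```python
import os
import signal
import time


def stop_then_resume(child_pids, pause_s=5.0):
    """Send SIGSTOP to every child, then SIGCONT to each one in turn."""
    # Freeze all children first (same effect as: kill -STOP pid1 pid2 ...).
    for pid in child_pids:
        os.kill(pid, signal.SIGSTOP)
    resumed = []
    # Wake them one at a time, leaving a gap in which to watch the
    # parent's size in top before the next child is resumed.
    for pid in child_pids:
        os.kill(pid, signal.SIGCONT)
        resumed.append(pid)
        time.sleep(pause_s)
    return resumed
```

While this runs, the parent's memory has to be watched separately; the script only sequences the signals.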
The other thing to look for is a child process with 0% CPU time. If
none of the child processes are hung, then I'm not sure why memory
would be growing in the parent.
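
One way to spot a child that is accumulating no CPU time, sketched under the assumption of a Linux /proc filesystem: sample utime + stime from /proc/<pid>/stat twice and report the PIDs whose total did not change. The helper names cpu_ticks and idle_pids are hypothetical, and a short interval can misreport a process that is merely waiting briefly.

```python
import time


def cpu_ticks(pid):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is parenthesised and may contain spaces,
    # so split after the last ')' before indexing the fields.
    fields = data.rsplit(")", 1)[1].split()
    # Fields 14 (utime) and 15 (stime) sit at indices 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def idle_pids(pids, interval_s=2.0):
    """Return the PIDs whose CPU time did not advance during interval_s."""
    before = {pid: cpu_ticks(pid) for pid in pids}
    time.sleep(interval_s)
    return [pid for pid in pids if cpu_ticks(pid) == before[pid]]
```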
There's no sense trying to fix such an unusual problem until we have
a firmer idea of the cause.
Cheers,
-Greg
From: Jan Wienold <jan.wien...@epfl.ch>
Date: April 25, 2018 1:28:44 AM PDT
Hi Greg,
While doing the renderings for the VR of Kynthia we "sometimes"
(that is, not 100% reproducibly) encountered problems with
memory.
We rendered 4 images at the same time sharing an ambient file, each
rtrace was using the -n 2 or -n 3 option.
I made a screenshot of top and some of the processes. If you look at
id 88263 it seems like the "mother-process" uses in total 41 GB
(virt) !! - since some of our machines don't have a large swap
space, some of these processes failed with "cannot allocate memory".
I know that the Virt mem is not a real indicator of what is actually
used, but from our 400 jobs we had around 10 failing with this issue.
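
To compare virtual and resident size for a given process without reading them off a top screenshot, something like the following could be used. It is a sketch that assumes Linux and reads the VmSize and VmRSS lines of /proc/<pid>/status; the function name memory_kb is made up for illustration.

```python
def memory_kb(pid):
    """Return (VmSize, VmRSS) in kB from /proc/<pid>/status (Linux only)."""
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # Lines look like "VmRSS:     1234 kB"; keep the number.
                sizes[key] = int(rest.split()[0])
    return sizes["VmSize"], sizes["VmRSS"]
```

Logging both values for the mother process and its children over the course of a rendering would show whether the growth is in resident memory or only in the virtual address space.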
The "children" use around 800-900 MB, so this is fine and what we
expected. But we don't know how to estimate the total memory usage
(let's say a single rtrace would need 500 MB; I would have expected
running -n 2 to use 1 GB, but there is also the mother
process, whose size is a bit unpredictable and sometimes explodes).
This "growth" of the mother process always happens near the end of the
images (let's say 90% finished).
Interestingly, when restarting the processes the failure never happened
again (but I have to admit I didn't restart the simulation
explicitly on the same machine, since I had a fully automated
process, where the failed ones were automatically restarted on one
of the 50 machines we had available.)
Finally we finished all 400(!) renderings with a very good quality.
So this is not an urgent issue, but we wanted to report this. Maybe
you have some rules of thumb to calculate the memory usage when
applying the -n option when the usage of a single process is known?
best
Jan
--
Dr.-Ing. Jan Wienold
_______________________________________________
Radiance-dev mailing list
Radiance-dev@radiance-online.org
https://www.radiance-online.org/mailman/listinfo/radiance-dev