Hi Lars,

Did you check to make sure that none of your surfaces has 100% or greater reflection? This can throw ray processing into a loop.
If you compile with "-g" in place of "-O" and terminate the process when it's stuck with a "kill -QUIT" signal, you should be able to get a backtrace to find out where the process was hung.

Cheers,
-Greg

P.S. I did introduce a change to the ray queuing code used by both rtrace and rcontrib that should prevent runaway memory growth, but it won't prevent an infinite loop. Those are the worst.

> From: "Lars O. Grobe" <[email protected]>
> Date: May 4, 2018 8:43:55 AM PDT
>
> Hi,
>
> a quick follow-up just to clarify - I cannot prove that the stop/cont
> signals caused the completion of the task; it may have been coincidence.
> Right now I have a never-ending rcontrib process again, and stop/cont
> does not help this time.
>
> Cheers,
> Lars.
>
>> Hi Greg, Jan,
>>
>> I just observed a similar problem with rcontrib. I am running a chain of
>> vwrays, rtrace, awk, and rfluxmtx to calculate daylight coefficients in
>> an image region (rtrace returns view origin, direction, and modifier,
>> and awk filters so that rays are passed into rfluxmtx only if a defined
>> modifier is hit). This generally works pretty well, even with 38
>> processes in parallel, but I just had one rcontrib process stuck at
>> 100% CPU (no memory effects, though). Issuing a kill -stop PID;
>> kill -cont PID sequence on the rcontrib process made it immediately
>> complete the task. The ambient file can be excluded as a cause here,
>> since rcontrib does not use the ambient cache. This is all on
>> non-networked filesystems, Ubuntu Linux.
>>
>> Cheers,
>> Lars.
>>
>>> Hi Jan,
>>>
>>> This could be an unexpected "hang" condition with one of the rtrace
>>> processes, where a single ray evaluation is blocked waiting for access
>>> to the ambient file while the other processes continue computing away,
>>> filling up the queue with results behind the blocked one. I could see
>>> this becoming a runaway memory condition, but I don't know why a
>>> process would be blocked.
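The "-g" / kill -QUIT debugging suggestion can be sketched concretely. The following is a minimal stand-in demo (an editor's sketch, not from the thread): `sleep` plays the role of the hung rcontrib process, and the core file is suppressed. With a real debug build you would instead allow core dumps and load the resulting core in gdb.

```shell
#!/bin/sh
# Sketch of the SIGQUIT approach, with `sleep` standing in for a stuck
# rcontrib.  For the real case: rebuild Radiance with -g, allow core
# dumps (ulimit -c unlimited), then after the kill inspect with
#   gdb /path/to/rcontrib core     (type `bt` for the backtrace)
ulimit -c 0          # suppress the core file for this demo
sleep 1000 &         # stand-in for the stuck process
PID=$!
kill -QUIT "$PID"    # SIGQUIT terminates (and normally core-dumps)
wait "$PID"
echo "exit status: $?"   # 128 + 3 (SIGQUIT) = 131
```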
>>> NFS file locking is used on the ambient file, and this has been known
>>> to fail on some Linux builds, but I haven't seen it fail by refusing
>>> to unlock. (The problem in the past has been unlocking when it
>>> shouldn't.)
>>>
>>> If you can monitor your processes, watch for when the parent becomes
>>> large, then stop all the child processes (kill -stop pid1 pid2 ...)
>>> and restart them one by one (kill -cont pidN). If the parent process
>>> starts to shrink after that, or at least doesn't continue to grow,
>>> then this would support my hypothesis.
>>>
>>> The other thing to look for is a child process with 0% CPU time. If
>>> none of the child processes are hung, then I'm not sure why memory
>>> would be growing in the parent.
>>>
>>> There's no sense trying to fix such an unusual problem until we have a
>>> firmer idea of the cause.
>>>
>>> Cheers,
>>> -Greg
>>>
>>>> From: Jan Wienold <[email protected]>
>>>> Date: April 25, 2018 1:28:44 AM PDT
>>>>
>>>> Hi Greg,
>>>>
>>>> While doing the renderings for the VR of Kynthia, we "sometimes"
>>>> (meaning not 100% reproducibly) encountered problems with memory.
>>>>
>>>> We rendered 4 images at the same time sharing an ambient file, each
>>>> rtrace using the -n 2 or -n 3 option.
>>>>
>>>> I made a screenshot of top and some of the processes. If you look at
>>>> PID 88263, it seems the "mother process" uses 41 GB (virt) in
>>>> total!! Since some of our machines don't have a large swap space,
>>>> some of these processes failed with "cannot allocate memory". I know
>>>> that virtual memory is not a real indicator of what is actually
>>>> used, but of our 400 jobs, around 10 failed with this issue.
>>>>
>>>> The "children" use around 800-900 MB, so this is fine and what we
>>>> expected.
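The stop/continue sequence discussed above can be demonstrated on stand-in processes (an editor's sketch, not from the thread): `sleep` replaces the rtrace children, and the `ps -o stat=` output is assumed to follow the usual convention of `T` for a stopped process and `S` for an interruptible sleep.

```shell
#!/bin/sh
# Demonstrate kill -STOP / kill -CONT on stand-in child processes.
sleep 60 & P1=$!
sleep 60 & P2=$!
kill -STOP "$P1" "$P2"       # freeze both "children"
sleep 1                      # give the signals time to land
echo "stopped: $(ps -o stat= -p "$P1")"   # 'T' = stopped
kill -CONT "$P1"             # resume them one by one, as suggested
kill -CONT "$P2"
sleep 1
echo "resumed: $(ps -o stat= -p "$P1")"   # 'S' = sleeping again
kill "$P1" "$P2"             # clean up the demo processes
```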
>>>> But we don't know how to estimate the total memory usage. Say a
>>>> single rtrace needs 500 MB; I would have expected that running with
>>>> -n 2 uses 1 GB, but there is also the mother process, whose size is
>>>> a bit unpredictable and sometimes explodes.
>>>>
>>>> This "growth" of the mother process always happens near the end of
>>>> the image (say, 90% finished).
>>>>
>>>> Interestingly, after restarting the processes the failure never
>>>> happened again (but I have to admit I didn't restart the simulation
>>>> explicitly on the same machine, since I had a fully automated
>>>> process in which the failed jobs were automatically restarted on one
>>>> of the 50 machines we had available).
>>>>
>>>> Finally, we finished all 400(!) renderings with very good quality.
>>>>
>>>> So this is not an urgent issue, but we wanted to report it. Maybe
>>>> you have some rules of thumb to calculate the memory usage when
>>>> applying the -n option, given the usage of a single process?
>>>>
>>>> best
>>>>
>>>> Jan
>>>>
>>>> --
>>>> Dr.-Ing. Jan Wienold
>>> _______________________________________________
>>> Radiance-dev mailing list
>>> [email protected]
>>> https://www.radiance-online.org/mailman/listinfo/radiance-dev
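The thread gives no rule of thumb for the parent's size; as a practical substitute, the resident memory of a parent process and its children can be summed directly (an editor's sketch; it assumes a Linux procps `ps`, whose `--ppid` selection option is not available in every `ps` implementation, and the PID used here is just the current shell so the demo is self-contained).

```shell
#!/bin/sh
# Sum the resident set size (RSS, in kB) of a process and its direct
# children -- e.g. an rtrace parent started with -n 2 or -n 3.
PARENT=$$            # placeholder: substitute the real parent PID
sleep 60 &           # create one child so the sum covers a small tree
CHILD=$!
# procps `ps` ORs the -p and --ppid selections: parent plus children.
total=$(ps -o rss= -p "$PARENT" --ppid "$PARENT" | awk '{s += $1} END {print s}')
echo "process tree RSS: ${total} kB"
kill "$CHILD"        # clean up the demo child
```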
