Hi Ruochun, this might be a question that is hard to answer; I'm more or less just looking for some ideas that might solve my issue. I've been working on modifying the code so that I can target a specific GPU device during the configuration of the DEM-Engine. If I just run a test script, the correct device is allocated and used, and the simulation runs without crashing.
You can find my fork of the project here: https://github.com/jtbreis/DEM-Engine.git
When combining it with my solver, the solver keeps crashing with the same error in the same location:
what(): GPU Assertion: an illegal memory access was encountered. This happened in /tmp/chrono-dem/src/algorithms/DEMCubWrappers.cu:69
The crash happens at the point in the simulation where the first contacts between particles occur. I've already had some ideas, but they didn't really get me anywhere. Maybe you have some ideas for me, or are aware of a reason why the code would crash in that location.

Julian

On Monday, June 10, 2024 at 5:10:03 PM UTC+2 Julian Reis wrote:
> Hi Ruochun,
>
> I've been trying to find something but haven't been successful so far.
>
> I'm working on an SPH solver that is based on OpenFPM (http://openfpm.mpi-cbg.de). Since all the CUDA calls basically run through OpenFPM, I'm assuming that there has to be an issue there somewhere.
> Also, after I removed all the explicitly set solver settings, the simulation already crashes during the setup with the following error:
> terminate called after throwing an instance of 'std::runtime_error'
> what(): GPU Assertion: an illegal memory access was encountered. This happened in /tmp/chrono-dem/src/algorithms/DEMCubContactDetection.cu:384
> Based on this error, you're probably right about some device synchronization behavior.
>
> I'm going to keep looking for a solution, but since I'm quite new to GPU computing this could take some time.
>
> Julian
>
> On Saturday, June 8, 2024 at 12:26:47 PM UTC+2 Ruochun Zhang wrote:
>
>> Hi Julian,
>>
>> That is a possibility and it's interesting. DEME automatically uses up to 2 available GPUs and the user doesn't control this behavior, yet it creates, uses, and synchronizes its own two GPU streams, so I don't think it will affect other GPU-bound applications running simultaneously.
>> However, admittedly I never tested running another GPU-bound application as part of the co-simulation, and maybe I should, in due time. It obviously can be an interplay, but I am of course more inclined to guess that it's because of some inappropriate device synchronization calls from the other application in question.
>>
>> Please let us know what you find. And if you can let us know what this application you ran alongside was, maybe we can help better.
>>
>> Thank you,
>> Ruochun
>>
>> On Friday, June 7, 2024 at 11:15:13 PM UTC+8 [email protected] wrote:
>>
>>> Hi Ruochun,
>>> Thank you for your help again.
>>> After doing some more testing and running my test case independently from my code, I found that it was not a problem with the setup.
>>>
>>> My simulation seems to crash because of an unknown interaction with my code. My code is also using the same GPU for calculations, and apparently there is a problem there.
>>> When I run the setup and exclude all the other GPU calls, the simulation runs without any problems. So there has to be a problem there somewhere...
>>> I know this is a problem that I probably have to figure out by myself. My only question would be whether you have any experience with co-simulations involving other GPU solvers.
>>>
>>> Julian
>>>
>>> On Thursday, June 6, 2024 at 5:38:17 PM UTC+2 Ruochun Zhang wrote:
>>>
>>>> Hi Julian,
>>>>
>>>> Let me get a couple of things out of the way first.
>>>>
>>>> 1. Quite often you see "illegal memory access" errors in DEME when the simulation diverges very badly. If it diverges only a bit badly, you are more likely to see "velocity too high" or "too many spheres in bin" errors. I don't fully understand the mechanism, but it is empirically so.
>>>> 2. That "233 contacts were active at time 0.192906 on dT, but they are not detected on kT, therefore being removed unexpectedly!" message is in fact a warning and you can turn it off.
>>>> But it does indicate that the simulation is already not running normally at that point.
>>>> 3. You usually don't have to worry about the bin size, or explicitly set it. It will be selected and adapted during the simulation (unless you turn this functionality off). A bin can hold at least 32768 spheres, and the solver should have enough time to adapt if the number of spheres per bin is rising alarmingly. So if the initial bin size does matter in how long your simulation can run, the simulation is probably escalating quickly from the start anyway, and you should worry about other things.
>>>>
>>>> The information you gave allows me to make some guesses about the cause, but not much more than that -- especially when the parameters you showed seem reasonable. I suspect that this is due to an invisible boundary at an unexpected location. I suggest you debug using the following procedure:
>>>> 1. Remove all analytical boundaries (no automatic box domain boundaries (the "none" option), no analytical objects, etc.) and the mesh, then run the simulation. Make sure you see the particles free-fall in space without a problem.
>>>> 2. Then add your meshed box back to the simulation. See if the particles can make contact with it normally.
>>>> 3. Revert everything back to the original and see if it runs.
>>>>
>>>> This should help you isolate the problem: which wall is causing it?
>>>>
>>>> Thank you,
>>>> Ruochun
>>>> On Wednesday, June 5, 2024 at 10:02:58 PM UTC+8 [email protected] wrote:
>>>>
>>>>> [image: Screenshot 2024-06-05 at 15.56.14.png]
>>>>> Hi Ruochun,
>>>>> Thank you for that suggestion, this implementation will work for me for now.
>>>>>
>>>>> I've been trying to run some simulations but my simulations keep crashing.
>>>>> It is difficult to share a code snippet because I've already abstracted the DEME calls in my code.
>>>>> My basic test setup right now looks as follows, though:
>>>>> - a cube represented as a mesh, which I'm planning to use as my boundaries (I also tried the same using the BoxDomainBoundaryConditions); the normals are pointing inwards accordingly (1.4 in each direction)
>>>>> - the inside of the cube is filled with clumps: simple spheres with diameter 0.04 and a spacing of 0.1
>>>>> - material properties: {"E", 1e9}, {"nu", 0.33}, {"CoR", 0.8}, {"mu", 0.3}, {"Crr", 0.00}
>>>>> - timestep: 1e-6
>>>>> - initial bin size: 1.0
>>>>> (I had problems when I was not setting the initial bin size; the simulation already crashed during initialisation. Then I saw the comment about setting the bin size to 25x the granular radius in the DEMdemo_Mixer file, and the simulation kept running for a while.)
>>>>> - max CDUpdateFreq: 20
>>>>>
>>>>> Eventually the simulation crashes with the following error:
>>>>> // 233 contacts were active at time 0.192906 on dT, but they are not detected on kT, therefore being removed unexpectedly!
>>>>> // terminate called recursively
>>>>> // terminate called after throwing an instance of 'std::runtime_error'
>>>>> // GPU Assertion: an illegal memory access was encountered. This happened in /tmp/chrono-dem/src/DEM/dT.cpp:1941 (it also happened at different locations in dT.cpp)
>>>>>
>>>>> I've been trying to tweak some of the parameters but couldn't find a set of reasonable ones. Do you maybe have any suggestions on what could be wrong?
>>>>> I've added a screenshot of the last output I could get from the simulation; to me it looks fine until then. I've also added my particle output file and mesh file. It's the output from my code, so it's an H5 file, if that is of any help.
>>>>> Also, I don't really understand how I should set the initial bin size; can you maybe give me some insight into how this parameter affects the simulation?
>>>>>
>>>>> Thank you for your help so far!
>>>>> Julian
>>>>>
>>>>> On Wednesday, May 15, 2024 at 11:22:41 AM UTC+2 Ruochun Zhang wrote:
>>>>>
>>>>>> To achieve what you need, there might be an easy way with the current code. First, know that you can change the time step size by calling UpdateStepSize. You can replace long DoDynamics calls with step-by-step calls to circumvent the problem. That is, replacing
>>>>>>
>>>>>> my_tracker->AddAcc(...);
>>>>>> DEMSim.DoDynamics(a_long_time);
>>>>>>
>>>>>> with
>>>>>>
>>>>>> DEMSim.UpdateStepSize(current_stepsize);
>>>>>> for (double t = 0.; t < a_long_time; t += current_stepsize) {
>>>>>>     my_tracker->AddAcc(...);
>>>>>>     DEMSim.DoDynamics(current_stepsize);
>>>>>> }
>>>>>>
>>>>>> You may be concerned about the performance, and indeed, transferring an array to the device at each step will take its toll, but it's probably not that bad considering how heavy each DEM step is anyway (I may add another utility that applies a persistent acceleration later on). On the other hand, splitting a DoDynamics call into multiple pieces in a for loop alone should affect the performance little, so you should not be worried. This way, it should be safe to advance the fluid simulation for several time steps and then advance the DEM simulation by one step. In fact, I do this in my co-simulations just fine.
>>>>>>
>>>>>> A note: in theory UpdateStepSize should only be used from a synchronized solver stance, meaning after a DoDynamicsThenSync call, because the step size is used to determine how proactive the contact detection has to be.
>>>>>> But if your step size change is a micro tweak, then you should be able to get away with it even if it follows asynchronous calls, i.e. DoDynamics.
>>>>>>
>>>>>> As for the call duration being smaller than the step size (but larger than 0): this is a good question. Right now it will still advance the simulation by a full time step, which puts the simulation time ahead of what you would expect. So it's better to call UpdateStepSize as needed to stay safe. This behavior might be improved later.
>>>>>>
>>>>>> Thank you,
>>>>>> Ruochun
>>>>>>
>>>>>> On Wednesday, May 15, 2024 at 6:27:26 AM UTC+8 [email protected] wrote:
>>>>>>
>>>>>>> Thank you for your fast reply, you've been very helpful already.
>>>>>>>
>>>>>>> I'm using the trackers to track granular particles inside a fluid flow.
>>>>>>> Thank you for pointing out the difference between the time step size and the time duration of the DoDynamics call; I'm pretty sure that is where my error is coming from.
>>>>>>> Since we're using adaptive time stepping for the fluid simulation, the time step for the flow can vary throughout the simulation. For this reason I'm running the DoDynamics call with the time step size of the fluid simulation. Usually the time step for the flow is much smaller than the DEM time step. (*additional question towards the end)
>>>>>>> It would be possible to add an additional time step criterion based on the DEM simulation on the fluid side, but this would probably result in unnecessarily long simulations, since we haven't fully coupled the system yet.
>>>>>>>
>>>>>>> So when I'm passing the states of my particles, I want them to move according to the forces of the fluid.
>>>>>>> The problem I observed is exactly what you described: basically I'm just applying a short acceleration in the first DEM time step, but after that the particle is not accelerated any further by that force. I was able to recreate some experimental results by pre-calculating the resulting velocities from the acceleration, but this is definitely not a long-term solution...
>>>>>>>
>>>>>>> For this particular case it would be handy if the acceleration were cleared again after a DoDynamics call, but stayed constant over the time steps within the DoDynamics call.
>>>>>>> Is this something that would be easy for me to tweak in the code? Or do you maybe have an alternative suggestion for me?
>>>>>>>
>>>>>>> * additional question: I don't know if this will ever be the case in my simulation, but what would happen if the DoDynamics duration is smaller than the DEM time step?
>>>>>>>
>>>>>>> Thank you, Julian
>>>>>>>
>>>>>>> On Tuesday, May 14, 2024 at 7:03:59 PM UTC+2 Ruochun Zhang wrote:
>>>>>>>
>>>>>>>> Hi Julian,
>>>>>>>>
>>>>>>>> Glad that you are able to move on to doing co-simulations.
>>>>>>>>
>>>>>>>> If you use a tracker to add acceleration to some owners, then it affects only the next time step. This is to be consistent with the other tracker Set methods (such as SetPos) because, well, they technically only affect the simulation once, too. This is also because setting acceleration with trackers is assumed to be used in a co-simulation, and in that case the acceleration probably changes at each step. If the acceleration modification were to take effect indefinitely, then it would be the user's responsibility to deactivate it once it is no longer needed.
>>>>>>>> Of course, this is not necessarily the best or most intuitive design choice, and I am open to suggestions.
>>>>>>>>
>>>>>>>> The acceleration prescription can only be added before initialization because it is just-in-time compiled into the CUDA kernels to make it more efficient. Prescriptions are expected not to change during the simulation and, although fixed prescribed motions are very common in DEM simulations, they are perhaps not suitable for use in co-simulations.
>>>>>>>>
>>>>>>>> If in your test case the added acceleration seems to have no effect, then it's likely that it is too small, or that DoDynamics is called with a time length that is significantly larger than the time step size. If this is not the case and you suspect it is due to a bug, please provide a minimal reproducible example so I can look into it.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Ruochun
>>>>>>>>
>>>>>>>> On Monday, May 13, 2024 at 9:05:34 PM UTC+8 [email protected] wrote:
>>>>>>>>
>>>>>>>>> Hi Ruochun,
>>>>>>>>> I've upgraded my hardware and now everything is working fine.
>>>>>>>>>
>>>>>>>>> I'm trying to run a co-simulation with the DEM-Engine where it would be necessary to pass the acceleration for each particle to the simulation.
>>>>>>>>> From the code, I've seen that there are two options: either adding an acceleration or using a prescribed force/acceleration.
>>>>>>>>>
>>>>>>>>> If I read the comments in the code correctly, the acceleration is only added for the next time step and is not constant over the DoDynamics call?
>>>>>>>>> From my tests it looks like the acceleration has no effect on the trajectory of my particle.
>>>>>>>>> On the other hand, the prescribed acceleration can only be added during the initialisation, and not between DoDynamics calls.
>>>>>>>>>
>>>>>>>>> Is there an option to add an acceleration to a particle that affects the particle over the whole DoDynamics call?
>>>>>>>>>
>>>>>>>>> Thank you for your help,
>>>>>>>>> Julian
>>>>>>>>> On Friday, March 29, 2024 at 9:23:29 PM UTC+1 Ruochun Zhang wrote:
>>>>>>>>>
>>>>>>>>>> Hi Julian,
>>>>>>>>>>
>>>>>>>>>> I see. The minimum compute capability tested was 6.1 (the 10 series). The 9 and 10 series are a big jump apart, and DEME is a new package that makes heavy use of newer CUDA features. Most likely the GTX 970 is not going to support them. Quite a good reason to get an upgrade, I would say, no?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Ruochun
>>>>>>>>>> On Saturday, March 30, 2024 at 3:38:40 AM UTC+8 [email protected] wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruochun,
>>>>>>>>>>> Thank you for your answer and for trying to help me.
>>>>>>>>>>> I have been able to run a simulation in the container using the same image on another GPU machine (a cluster with several NVIDIA RTX 2080Ti w/ 12GB).
>>>>>>>>>>> When I try to run a simulation on my local machine, which I'm using for development purposes with an NVIDIA GTX 970 w/ 4GB, the simulation crashes.
>>>>>>>>>>> I also tried to run the simulation outside of a container, and the simulation still crashes with the same error. Other projects using CUDA do run on my local machine.
>>>>>>>>>>> Both machines, the cluster and the local machine, run the exact same CUDA and NVIDIA drivers, so I'm assuming that running the simulation inside the Docker container is not the issue.
>>>>>>>>>>> I'm assuming that there is an issue with the compute capabilities of my local GPU; are there any kind of minimum hardware requirements?
>>>>>>>>>>>
>>>>>>>>>>> Julian
>>>>>>>>>>>
>>>>>>>>>>> On Friday, March 29, 2024 at 7:57:49 PM UTC+1 Ruochun Zhang wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Just to be clear, DEM-Engine runs on a single GPU as well, and there is no difference other than being (around) half as fast.
>>>>>>>>>>>>
>>>>>>>>>>>> Ruochun
>>>>>>>>>>>>
>>>>>>>>>>>> On Friday, March 29, 2024 at 10:58:18 PM UTC+8 [email protected] wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I was able to run a simulation on a different GPU setup, using 2 GPUs. Is it not possible to run the DEM-Engine on a single GPU?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thursday, March 28, 2024 at 4:55:44 PM UTC+1 Julian Reis wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've tried to set up a Docker container for the DEM-Engine using nvidia/cuda:12.0.1-devel-ubuntu22.04 as a base image.
>>>>>>>>>>>>>> I followed the compile instructions from the GitHub repo and the code compiles fine.
>>>>>>>>>>>>>> When I try to run any of the test cases though, the simulation crashes with the following error:
>>>>>>>>>>>>>> Bus error (core dumped)
>>>>>>>>>>>>>> This happens right after the following outputs for the demo file SingleSphereCollide:
>>>>>>>>>>>>>> These owners are tracked: 0,
>>>>>>>>>>>>>> Meshes' owner--offset pairs: {1, 0}, {2, 1},
>>>>>>>>>>>>>> kT received a velocity update: 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you aware of any problems like this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>
-- You received this message because you are subscribed to the Google Groups "ProjectChrono" group.
