On Jan 16, 2014, at 9:47 AM, Garth N. Wells wrote:

> On 2014-01-15 22:36, Mikael Mortensen wrote:
>> I have had a few test runs and compared my memory usage to a
>> Navier-Stokes solver from OpenFOAM. Some results thus far for a box
>> of size 40**3 are (using /proc/pid/statm to measure memory usage, see
>> https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp [2]):
>>
>> 1 CPU
>>   OpenFOAM: 324 MB RSS (443 MB VM)
>>   FEniCS:   442 MB RSS (mesh 16 MB) + 60 MB for importing dolfin (1.0 GB VM)
>>
>> 2 CPUs
>>   OpenFOAM: 447 MB (1.6 GB VM)
>>   FEniCS:   520 MB (mesh 200 MB) + 120 MB for importing dolfin (1.7 GB VM)
>>
>> 4 CPUs
>>   OpenFOAM: 540 MB RSS (2.0 GB VM)
>>   FEniCS:   624 MB (mesh 215 MB) + 250 MB for importing dolfin (3.0 GB VM)
>
> Looks pretty good.
>> This is after Garth's removal of the D--0--D connectivity, and I'm
>> not attempting to clean out topology or anything after the mesh has
>> been created. I'm actually very happy with these results :-) More
>> CPUs require more memory, but it increases more for OpenFOAM than for
>> FEniCS and the difference is not all that much. Importing dolfin is a
>> considerable part of the total cost, but the cost does not increase
>> with mesh size, so it's not of much concern (also, this depends on
>> how many dependencies you carry around). I added virtual memory in
>> parentheses, but I'm not sure what to make of it.
>>
>> Some other things I have learned (very) recently:
>>
>> 1) Clearing out memory during runtime is apparently not
>> straightforward because, in an effort to optimize speed (allocation
>> is slow), the system may choose to keep your previously allocated
>> memory close (in RAM). Even with the swap trick the operating system
>> may actually choose to keep the allocated memory available and not
>> free it (see, e.g.,
>> http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector [3]).
>> Apparently there is an environment variable, GLIBCXX_FORCE_NEW, that
>> is intended to disable this caching of memory, but I have not been
>> able to make it work. With the swap trick memory is definitely freed
>> quite efficiently, but it is frustrating trying to figure out where
>> memory goes when we are not in total control of allocation and
>> deallocation.
>>
>> One example: I use project to initialize a Function. During project,
>> RSS memory increases substantially due to, e.g., a matrix being
>> created behind the scenes. One would expect that this memory is freed
>> after finishing, but that does not always happen. The additional
>> memory may be kept in RAM and used later. If, for example, I create a
>> matrix just after project, then memory use may actually not increase
>> at all, because the OS can just reuse the memory it did not free.
>> I guess this is something one has to live with when using dynamic
>> memory allocation? I really know very little about how these things
>> work, so it would be great if someone who does could comment?
>
> I wouldn't worry about this. It can be hard to measure memory via the
> OS reporting tools because of the way the OS manages memory, as you
> describe above. Just leave it to the OS. You could try KCachegrind to
> measure and visualise where memory is being used in DOLFIN.

I'll check it out.

>> 2) Both MeshGeometry.clear() and MeshTopology.clear() should probably
>> make use of the swap trick instead of std::vector<>.clear(). At least
>> then the memory *may* be freed when clearing the mesh.
>
> We could bite the bullet and start using C++11. It has a shrink_to_fit
> function:
>
> http://en.cppreference.com/w/cpp/container/vector/shrink_to_fit
>
> This would be useful when we clear a container and when we know that
> no more entries will be added.

I think shrink_to_fit will give the same behaviour, i.e., memory may be
freed or not. At least
http://www.cplusplus.com/reference/vector/vector/shrink_to_fit/ states
that the operation may cause a reallocation.

>> 3) Memory use for creating a parallel UnitCubeMesh(40, 40, 40) goes
>> like this:
>>
>>    16 MB in creating the serial mesh
>>    56 MB in creating the LocalMeshData class
>>    41 MB in creating the cell partition in
>>          MeshPartitioning::build_distributed_mesh
>>   101 MB in creating the mesh from LocalMeshData and the cell
>>          partitioning in MeshPartitioning::build
>>
>> No memory is freed after the mesh has been created, even though it
>> looks to me like LocalMeshData goes out of scope?
>
> LocalMeshData should be destroyed. Take a look with KCachegrind. There
> is also gperftools:
>
> https://code.google.com/p/gperftools/
>
> for memory profiling.
>
>> Mikael
>>
>> PS Not really relevant to this test, but my NS solver is 20% faster
>> than OpenFOAM for this channel-flow test case :-)
>
> Nice. On the same mesh?
> Does your solver have an extra order of accuracy over OpenFOAM?

Oops, forgot to mention that OpenFOAM uses a hex mesh, which should
explain much of the difference in memory. Same number of dofs, same
order of accuracy (P1-P1) in space. The OpenFOAM solver is purely
implicit though, whereas mine is Crank-Nicolson. This test was mainly
performed to check the sanity of memory use in moving from serial to
parallel, and that looks good. The OpenFOAM solver would also benefit
from using better solvers.

M

> Garth
>
>> On Jan 15, 2014, at 10:48 AM, Garth N. Wells wrote:
>>> On 2014-01-15 07:13, Anders Logg wrote:
>>>> On Tue, Jan 14, 2014 at 08:08:50PM +0000, Garth N. Wells wrote:
>>> On 2014-01-14 19:28, Anders Logg wrote:
>>>>> On Tue, Jan 14, 2014 at 05:19:55PM +0000, Garth N. Wells wrote:
>>>>>> On 2014-01-14 16:24, Anders Logg wrote:
>>>>>>> On Mon, Jan 13, 2014 at 07:16:01PM +0000, Garth N. Wells wrote:
>>>>>>>> On 2014-01-13 18:42, Anders Logg wrote:
>>>>>>>>> On Mon, Jan 13, 2014 at 12:45:11PM +0000, Garth N. Wells wrote:
>>>>>>>>>> I've just pushed some changes to master, which for
>>>>>>>>>>
>>>>>>>>>>   from dolfin import *
>>>>>>>>>>   mesh = UnitCubeMesh(128, 128, 128)
>>>>>>>>>>   mesh.init(2)
>>>>>>>>>>   mesh.init(2, 3)
>>>>>>>>>>
>>>>>>>>>> give a factor 2 reduction in memory usage and a factor 2
>>>>>>>>>> speedup. The change is at
>>>>>>>>>> https://bitbucket.org/fenics-project/dolfin/commits/8265008 [1].
>>>>>>>>>>
>>>>>>>>>> The improvement is primarily due to d--d connectivity no
>>>>>>>>>> longer being computed. I'll submit a pull request to throw an
>>>>>>>>>> error when d--d connectivity is requested. The only remaining
>>>>>>>>>> place where d--d is (implicitly) computed is in the code for
>>>>>>>>>> finding constrained mesh entities (e.g., for periodic bcs).
>>>>>>>>>> The code in question is
>>>>>>>>>>
>>>>>>>>>>   for (MeshEntityIterator e(facet, dim); . . . .)
>>>>>>>>>>
>>>>>>>>>> when dim is the same as the topological dimension of the
>>>>>>>>>> facet.
>>>>>>>>>> As in other places, it would be consistent (and the original
>>>>>>>>>> intention of the programmer) if this just iterated over the
>>>>>>>>>> facet itself rather than all facets attached to it via a
>>>>>>>>>> vertex.
>>>>>>>>>
>>>>>>>>> I don't see why an error message is needed. Could we not just
>>>>>>>>> add the possibility to specify what d--d means? It might be
>>>>>>>>> useful for other algorithms to be able to compute that data.
>>>>>>>>
>>>>>>>> I think it should be removed because it (i) is ad hoc, (ii) is
>>>>>>>> not used/required in the library and (iii) is a memory monster.
>>>>>>>> Moreover, we have dimension-independent algorithms that work
>>>>>>>> when d--d connectivity is a connection from an entity to itself
>>>>>>>> (for which we don't need computation and memory eating). We
>>>>>>>> shouldn't have an unnecessary, memory-monster d-0-d data
>>>>>>>> structure being created opaquely for no purpose, which is what
>>>>>>>>
>>>>>>>>   for (MeshEntityIterator e(entity, dim); . . . .) { .... }
>>>>>>>>
>>>>>>>> does at present when (dimension of entity) = dim. The excessive
>>>>>>>> memory usage is an issue for big problems.
>>>>>>>>
>>>>>>>> If a user wants d-0-d it can be built explicitly, which makes
>>>>>>>> both the purpose and the intention clear.
>>>>>>>
>>>>>>> How can it be built explicitly without calling mesh.init(d, d)?
>>>>>>
>>>>>> Build d--0, then 0--d:
>>>>>>
>>>>>>   for (MeshEntityIterator e0(mesh, d); .....)
>>>>>>     for (VertexIterator v(*e0); ....)
>>>>>>       for (MeshEntityIterator e2(*v, d); .....)
>>>>>
>>>>> Yes, that's even more explicit, but since we already have a
>>>>> function that computes exactly that, I don't see the urge to
>>>>> remove it.
>>>>>
>>>>>> Note that d--0--d is no longer used in DOLFIN (except by accident
>>>>>> in finding periodic bcs because of inconsistent behind-the-scenes
>>>>>> behaviour, and it should be changed because it's expensive and
>>>>>> not required for periodic bcs).
>>>>> Not true - see line 67 of Extrapolation.cpp.
>>>>
>>>> That code is not covered by any test or demo.
>>>
>>> Yes it is, by a unit test under test/unit/adaptivity/ and all the
>>> auto-adaptive demos, including the documented demo
>>> auto-adaptive-poisson.
>>
>> OK. I don't know why it didn't trigger a failure before when I added
>> an error when d0 == d1 and ran the tests.
>>
>>>>> Having easy access to neighboring cells (however one chooses to
>>>>> define what it means to be a cell neighbor of a cell) can be
>>>>> useful for many algorithms that perform local searches or solve
>>>>> local problems.
>>>>
>>>> Sure, but again the user should decide how a connection is defined.
>>>
>>> Yes, I agree.
>>>
>>>>> Again, I don't see the urge to remove this function just because
>>>>> it consumes a lot of memory. The important point is to avoid using
>>>>> it for standard algorithms in DOLFIN, like boundary conditions.
>>>>
>>>> That's not a good argument. If poor/slow/memory-hogging code is
>>>> left in the library, it eventually gets used in the library despite
>>>> any warnings. It's happened time and time again.
>>>
>>> I have never understood this urge to remove all functionality that
>>> can potentially be misused. We are adults, so it should be enough to
>>> write a warning in the documentation.
>>
>> This has proven not to be enough in the past. Moreover, it is good to
>> make it easy for users to write good code by not providing functions
>> that are easily misused, and it's good to make it easy to write code
>> that doesn't only work well for small serial cases.
>>
>> Garth
>>
>>>>> The proper way to fix functions that should not use this
>>>>> particular function (in, for example, periodic bcs) would be to
>>>>> create an issue for its misuse.
>>>>
>>>> The periodic bc code does not misuse the function. The flaw is that
>>>> the mesh library magically and unnecessarily creates a memory
>>>> monster behind the scenes.
>>>
>>> I agree with that.
>>> --
>>> Anders
>>
>> _______________________________________________
>> fenics mailing list
>> [email protected]
>> http://fenicsproject.org/mailman/listinfo/fenics
>>
>> Links:
>> ------
>> [1] https://bitbucket.org/fenics-project/dolfin/commits/8265008
>> [2] https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp
>> [3] http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector
