On Jan 16, 2014, at 9:47 AM, Garth N. Wells wrote:

> On 2014-01-15 22:36, Mikael Mortensen wrote:
>> I have had a few test runs and compared my memory usage to a
>> Navier-Stokes solver from OpenFOAM. Some results thus far for a box
>> of size 40**3 are (using /proc/pid/statm to measure memory usage, see
>> https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp [2]):
>>
>> 1 CPU
>>   OpenFOAM: 324 MB RSS (443 MB VM)
>>   FEniCS:   442 MB RSS (mesh 16 MB) + 60 MB for importing dolfin (1.0 GB VM)
>>
>> 2 CPUs
>>   OpenFOAM: 447 MB (1.6 GB VM)
>>   FEniCS:   520 MB (mesh 200 MB) + 120 MB for importing dolfin (1.7 GB VM)
>>
>> 4 CPUs
>>   OpenFOAM: 540 MB RSS (2.0 GB VM)
>>   FEniCS:   624 MB (mesh 215 MB) + 250 MB for importing dolfin (3.0 GB VM)
>
> Looks pretty good.
>> This is after Garth's removal of the D--0--D connectivity, and I'm
>> not attempting to clean out topology or anything after the mesh has
>> been created. I'm actually very happy with these results :-) More
>> CPUs require more memory, but it increases more for OpenFOAM than for
>> FEniCS and the difference is not all that much. Importing dolfin is a
>> considerable part of the total cost, but the cost does not increase
>> with mesh size, so it's not of much concern (also, this depends on
>> how many dependencies you carry around). I added virtual memory in
>> parentheses, but I'm not sure what to make of it.
>>
>> Some other things I have learned (very) recently:
>>
>> 1) Clearing out memory during runtime is apparently not
>> straightforward because, in an effort to optimize speed (allocation
>> is slow), the system may choose to keep your previously allocated
>> memory close (in RAM). Even with the swap trick the operating system
>> may actually choose to keep the allocated memory available and not
>> free it (see, e.g.,
>> http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector [3]).
>> Apparently there is an environment variable, GLIBCXX_FORCE_NEW, that
>> is intended to disable this caching of memory, but I have not been
>> able to make it work. With the swap trick memory is definitely freed
>> quite efficiently, but it is frustrating trying to figure out where
>> memory goes when we are not in total control of allocation and
>> deallocation.
>>
>> One example: I use project to initialize a Function. During project,
>> RSS memory increases substantially due to, e.g., a matrix being
>> created behind the scenes. One would expect that this memory is freed
>> after finishing, but that does not always happen. The additional
>> memory may be kept in RAM and used later. If, for example, I create a
>> matrix just after project, then memory use may actually not increase
>> at all, because the OS can just reuse the memory it did not free.
>> I guess this is something one has to live with when using dynamic
>> memory allocation? I really know very little about how these things
>> work, so it would be great if someone who does could comment?
>
> I wouldn't worry about this. It can be hard to measure memory via the
> OS reporting tools because of the way the OS manages memory, as you
> describe above. Just leave it to the OS. You could try KCachegrind to
> measure and visualise where memory is being used in DOLFIN.

I'll check it out.

>> 2) Both MeshGeometry.clear() and MeshTopology.clear() should probably
>> make use of the swap trick instead of std::vector<>.clear(). At least
>> then the memory *may* be freed when clearing the mesh.
>
> We could bite the bullet and start using C++11. It has a shrink_to_fit
> function:
>
> http://en.cppreference.com/w/cpp/container/vector/shrink_to_fit
>
> This would be useful when we clear a container and when we know that
> no more entries will be added.

I think shrink_to_fit will give the same behaviour, i.e., memory may be
freed or not. At least
http://www.cplusplus.com/reference/vector/vector/shrink_to_fit/ states
that the operation may cause a reallocation.

>> 3) Memory use for creating a parallel UnitCubeMesh(40, 40, 40) goes
>> like this:
>>
>>    16 MB in creating the serial mesh
>>    56 MB in creating the LocalMeshData class
>>    41 MB in creating the cell partition in
>>          MeshPartitioning::build_distributed_mesh
>>   101 MB in creating the mesh from LocalMeshData and the cell
>>          partitioning in MeshPartitioning::build
>>
>> No memory is freed after the mesh has been created, even though it
>> looks to me like LocalMeshData goes out of scope?
>
> LocalMeshData should be destroyed. Take a look with KCachegrind. There
> is also gperftools:
>
> https://code.google.com/p/gperftools/
>
> for memory profiling.
>
>> Mikael
>>
>> PS Not really relevant to this test, but my NS solver is 20% faster
>> than OpenFOAM for this channel-flow test case :-)
>
> Nice. On the same mesh?
> Does your solver have an extra order of accuracy over OpenFOAM?

Oops, forgot to mention that OpenFOAM uses a hex mesh, which should
explain much of the difference in memory. Same number of dofs, same
order of accuracy (P1-P1) in space. The OpenFOAM solver is purely
implicit though, whereas mine is Crank-Nicolson. This test was mainly
performed to check the sanity of memory use in moving from serial to
parallel, and that looks good. The OpenFOAM solver would also benefit
from using better solvers.

M

> Garth
>
>> On Jan 15, 2014, at 10:48 AM, Garth N. Wells wrote:
>>> On 2014-01-15 07:13, Anders Logg wrote:
>>>> On Tue, Jan 14, 2014 at 08:08:50PM +0000, Garth N. Wells wrote:
>>> On 2014-01-14 19:28, Anders Logg wrote:
>>>>> On Tue, Jan 14, 2014 at 05:19:55PM +0000, Garth N. Wells wrote:
>>>>>> On 2014-01-14 16:24, Anders Logg wrote:
>>>>>>> On Mon, Jan 13, 2014 at 07:16:01PM +0000, Garth N. Wells wrote:
>>>>>>>> On 2014-01-13 18:42, Anders Logg wrote:
>>>>>>>>> On Mon, Jan 13, 2014 at 12:45:11PM +0000, Garth N. Wells wrote:
>>>>>>>>>> I've just pushed some changes to master, which for
>>>>>>>>>>
>>>>>>>>>>   from dolfin import *
>>>>>>>>>>   mesh = UnitCubeMesh(128, 128, 128)
>>>>>>>>>>   mesh.init(2)
>>>>>>>>>>   mesh.init(2, 3)
>>>>>>>>>>
>>>>>>>>>> give a factor 2 reduction in memory usage and a factor 2
>>>>>>>>>> speedup. The change is at
>>>>>>>>>> https://bitbucket.org/fenics-project/dolfin/commits/8265008 [1].
>>>>>>>>>>
>>>>>>>>>> The improvement is primarily due to d--d connectivity no
>>>>>>>>>> longer being computed. I'll submit a pull request to throw an
>>>>>>>>>> error when d--d connectivity is requested. The only remaining
>>>>>>>>>> place where d--d is (implicitly) computed is in the code for
>>>>>>>>>> finding constrained mesh entities (e.g., for periodic bcs).
>>>>>>>>>> The code in question is
>>>>>>>>>>
>>>>>>>>>>   for (MeshEntityIterator e(facet, dim); . . . .)
>>>>>>>>>>
>>>>>>>>>> when dim is the same as the topological dimension of the
>>>>>>>>>> facet.
>>>>>>>>>> As in other places, it would be consistent (and the original
>>>>>>>>>> intention of the programmer) if this just iterated over the
>>>>>>>>>> facet itself rather than all facets attached to it via a
>>>>>>>>>> vertex.
>>>>>>>>>
>>>>>>>>> I don't see why an error message is needed. Could we not just
>>>>>>>>> add the possibility to specify what d--d means? It might be
>>>>>>>>> useful for other algorithms to be able to compute that data.
>>>>>>>>
>>>>>>>> I think it should be removed because it (i) is ad hoc, (ii) is
>>>>>>>> not used/required in the library and (iii) is a memory monster.
>>>>>>>> Moreover, we have dimension-independent algorithms that work
>>>>>>>> when d--d connectivity is a connection from an entity to itself
>>>>>>>> (for which we don't need computation and memory eating). We
>>>>>>>> shouldn't have an unnecessary, memory-monster d-0-d data
>>>>>>>> structure being created opaquely for no purpose, which is what
>>>>>>>>
>>>>>>>>   for (MeshEntityIterator e(entity, dim); . . . .) { .... }
>>>>>>>>
>>>>>>>> does at present when (dimension of entity) = dim. The excessive
>>>>>>>> memory usage is an issue for big problems.
>>>>>>>>
>>>>>>>> If a user wants d-0-d it can be built explicitly, which makes
>>>>>>>> both the purpose and the intention clear.
>>>>>>>
>>>>>>> How can it be built explicitly without calling mesh.init(d, d)?
>>>>>>
>>>>>> Build d--0, then 0--d:
>>>>>>
>>>>>>   for (MeshEntityIterator e0(mesh, d); .....)
>>>>>>     for (VertexIterator v(*e0); ....)
>>>>>>       for (MeshEntityIterator e2(*v, d); .....)
>>>>>
>>>>> Yes, that's even more explicit, but since we already have a
>>>>> function that computes exactly that, I don't see the urge to
>>>>> remove it.
>>>>>
>>>>>> Note that d--0--d is no longer used in DOLFIN (except by accident
>>>>>> in finding periodic bcs because of inconsistent behind-the-scenes
>>>>>> behaviour, and it should be changed because it's expensive and
>>>>>> not required for periodic bcs).
>>>>> Not true - see line 67 of Extrapolation.cpp.
>>>>
>>>> That code is not covered by any test or demo.
>>>
>>> Yes it is, by a unit test under test/unit/adaptivity/ and all the
>>> auto-adaptive demos, including the documented demo
>>> auto-adaptive-poisson.
>>
>> OK. I don't know why it didn't trigger a failure before when I added
>> an error when d0 == d1 and ran the tests.
>>
>>>>> Having easy access to neighboring cells (however one chooses to
>>>>> define what it means to be a cell neighbor of a cell) can be
>>>>> useful for many algorithms that perform local searches or solve
>>>>> local problems.
>>>>
>>>> Sure, but again the user should decide how a connection is defined.
>>>
>>> Yes, I agree.
>>>
>>>>> Again, I don't see the urge to remove this function just because
>>>>> it consumes a lot of memory. The important point is to avoid using
>>>>> it for standard algorithms in DOLFIN, like boundary conditions.
>>>>
>>>> That's not a good argument. If poor/slow/memory-hogging code is
>>>> left in the library, it eventually gets used in the library despite
>>>> any warnings. It's happened time and time again.
>>>
>>> I have never understood this urge to remove all functionality that
>>> can potentially be misused. We are adults, so it should be enough to
>>> write a warning in the documentation.
>>
>> This has proven not to be enough in the past. Moreover, it is good to
>> make it easy for users to write good code by not providing functions
>> that are easily misused, and it's good to make it easy to write code
>> that doesn't only work well for small serial cases.
>>
>> Garth
>>
>>>>> The proper way to fix functions that should not use this
>>>>> particular function (in, for example, periodic bcs) would be to
>>>>> create an issue for its misuse.
>>>>
>>>> The periodic bc code does not misuse the function. The flaw is that
>>>> the mesh library magically and unnecessarily creates a memory
>>>> monster behind the scenes.
>>>
>>> I agree with that.
>>> --
>>> Anders
>>
>> _______________________________________________
>> fenics mailing list
>> [email protected]
>> http://fenicsproject.org/mailman/listinfo/fenics
>>
>> Links:
>> ------
>> [1] https://bitbucket.org/fenics-project/dolfin/commits/8265008
>> [2] https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp
>> [3] http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector
