Hello,

I also managed to reproduce this in Docker (4 GB limit):

# cat /proc/self/cgroup 
0::/
# cat /sys/fs/cgroup/memory.max
4294967296
# autotest/cpp/gdallimits 
CPLGetNumCPUs = 32
CPLGetUsablePhysicalRAM = 62 GB

(podman behaves exactly the same)

Laurentiu

On Thu, Jan 26, 2023, at 03:13, Angus Dickey wrote:
> Even,
> 
> Thanks, that was a quick turnaround! I imagine Proxmox 
> <https://www.proxmox.com/en/> or LXD 
> <https://linuxcontainers.org/lxd/introduction/> are pretty much what everyone 
> uses to create Linux containers. LXC is the underlying technology but also 
> has a set of command-line tools that can be used to create containers. In 
> your case it sounds like LXD can't choose a subnet for your Linux bridge, 
> which is mysterious, and I don't know how to fix it.
> 
> I tried your update inside a container and am still seeing the problem where 
> GDAL thinks it has the full host memory:
> 
> $ gdalinfo --version
> GDAL 3.7.0dev, released 2023/99/99 (debug build)
> $ ./get_gdal_memory
> GDAL version is 3.7.0dev
> GDAL thinks it has 135083474944 bytes of physical memory
> GDAL thinks it has 135083474944 bytes of usable physical memory
> sysinfo() thinks it has 135083474944 bytes of physical memory
> $ free -h
>                total        used        free      shared  buff/cache   available
> Mem:           2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
> Swap:          256Mi          0B       256Mi
> $ cat /proc/meminfo | grep MemTotal
> MemTotal:        2048000 kB
> 
> I wanted to dig a bit but am no expert in containerization and cgroup v2. It 
> seems that some tools show the memory the container has (free 
> <https://man7.org/linux/man-pages/man1/free.1.html> and /proc/meminfo 
> <https://man7.org/linux/man-pages/man5/proc.5.html>) and others (sysinfo 
> <https://man7.org/linux/man-pages/man2/sysinfo.2.html>) show the host memory. 
> For cgroups v2 I see your code is trying to find the max memory from a 
> specific memory.max file in /sys/fs/cgroup/. In my *containers* that file 
> (actually all the memory.max files) contains the default value "max".
> 
> $ find /sys/fs/cgroup -type f -name memory.max -exec sh -c "cat '{}'" \;
> max
> max
> max
> ... all max ...
> max
> 
> If I try the same thing on the *host*, I find it is set to the 
> expected value.
> 
> $ cat /sys/fs/cgroup/lxc/901/memory.max
> 2097152000
> 
> The cgroup values on the host appear to be what is limiting the container 
> memory. More rules can be added inside the container, but they are still 
> subject to the host rules. I am not sure how free and /proc/meminfo are getting 
> the correct available memory, but maybe I will ask the Proxmox or LXD people.
> 
> Thanks again,
> 
> Angus
> 
> 
> On Wed, Jan 25, 2023 at 4:49 AM Even Rouault <even.roua...@spatialys.com> 
> wrote:
>> Angus,
>> 
>> I'm not familiar with LXC. I tried to set up LXD following 
>> https://linuxcontainers.org/lxd/introduction/ but it fails with a mysterious 
>> "Error: Failed to create local member network "lxdbr0" in project "default": 
>> Failed generating auto config: Failed to automatically find an unused IPv4 
>> subnet, manual configuration required"
>> 
>> Anyway, in https://github.com/OSGeo/gdal/pull/7124 I've attempted to better 
>> take cgroups into account when determining the memory limit. Could you give this a try?
>> 
>> Even
>> 
>> On 25/01/2023 at 06:24, Angus Dickey wrote:
>>> Even,
>>> 
>>> 
>>> 
>>> Thanks for the reply. I went ahead and compiled the latest GDAL 3.6.2 on 
>>> Ubuntu 22.04. Unfortunately, I ended up with a similar result. GDAL thinks 
>>> it has 755GB of RAM to work with when it only has 2GB:
>>> 
>>> 
>>> 
>>> $ gdalinfo --version
>>> GDAL 3.6.2, released 2023/01/02 (debug build)
>>> 
>>> $ ./get_gdal_memory
>>> GDAL version is 3.6.2
>>> GDAL thinks is has 811526475776 bytes of physical memory
>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>> 
>>> $ free -h
>>>                total        used        free      shared  buff/cache   available
>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>> Swap:          256Mi          0B       256Mi
>>> 
>>> 
>>> My knowledge of the subject is limited, but I think Linux containers (LXC) 
>>> use cgroups and not setrlimit to limit resources, so maybe that is why the 
>>> new changes had no effect. To reproduce this issue you can create a 
>>> container using LXC, LXD, or a hypervisor like Proxmox (what I am using) 
>>> and call CPLGetUsablePhysicalRAM().
>>> 
>>> If there is any other info that might be helpful, let me know. I might try a 
>>> Docker container, which also uses cgroups and is more popular than LXC, 
>>> although it fulfills a different function.
>>> 
>>> thanks,
>>> 
>>> Angus
>>> 
>>> 
>>> On Tue, Jan 24, 2023 at 5:50 PM Even Rouault <even.roua...@spatialys.com> 
>>> wrote:
>>>> Angus,
>>>> 
>>>> A recent fix that landed in GDAL 3.6.2 might 
>>>> help: https://github.com/OSGeo/gdal/pull/6926
>>>> 
>>>> Even
>>>> 
>>>> On 25/01/2023 at 01:36, Angus Dickey wrote:
>>>>> Hi all,
>>>>> 
>>>>> I am running into an issue where GDAL overestimates the amount of 
>>>>> physical memory it has, which leads to it locking up the OS by taking 100% of 
>>>>> the memory. Here is an example program that illustrates the issue:
>>>>> 
>>>>> #include <stdio.h>
>>>>> #include "gdal.h"
>>>>> #include "cpl_vsi.h" /* declares CPLGetPhysicalRAM() and CPLGetUsablePhysicalRAM() */
>>>>> 
>>>>> int main(void) {
>>>>>    printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
>>>>>    /* GIntBig is a 64-bit integer; cast explicitly to match %lld. */
>>>>>    printf("GDAL thinks is has %lld bytes of physical memory\n",
>>>>>           (long long)CPLGetPhysicalRAM());
>>>>>    printf("GDAL thinks it has %lld bytes of usable physical memory\n",
>>>>>           (long long)CPLGetUsablePhysicalRAM());
>>>>>    return 0;
>>>>> }
>>>>> 
>>>>> When this is compiled with GDAL 3.5.1 on Ubuntu 22.04 we get:
>>>>> 
>>>>> $ ./get_gdal_memory 
>>>>> GDAL version is 3.5.1
>>>>> GDAL thinks is has 811526475776 bytes of physical memory
>>>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>>>> 
>>>>> Which is not consistent with the actual available memory:
>>>>> 
>>>>> $ free -h
>>>>>                total        used        free      shared  buff/cache   available
>>>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>>>> Swap:          256Mi          0B       256Mi
>>>>> 
>>>>> So GDAL thinks it has 755GB of memory but it only has 2GB. This causes 
>>>>> issues with the raster read cache and maybe elsewhere. I suspect this is 
>>>>> happening because it is running in a Linux container 
>>>>> <https://linuxcontainers.org/> and GDAL is getting the total physical 
>>>>> memory of the host, not the container. The strange thing is that Linux 
>>>>> containers use cgroups for memory restrictions, and the API docs mention 
>>>>> that this was fixed in GDAL 2.4.0 
>>>>> <https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv>, but I am still 
>>>>> seeing the issue in 3.5.1.
>>>>> 
>>>>> Any help or insight would be appreciated; I am happy to provide any 
>>>>> additional information or testing.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Angus
>>>>> 
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev@lists.osgeo.org
>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>> 
>>>> -- 
>>>> http://www.spatialys.com
>>>> My software is free, but my time generally not.