Hello, I also managed to reproduce this in Docker (4 GB limit):
# cat /proc/self/cgroup
0::/
# cat /sys/fs/cgroup/memory.max
4294967296
# autotest/cpp/gdallimits
CPLGetNumCPUs = 32
CPLGetUsablePhysicalRAM = 62 GB

(podman behaves exactly the same)

Laurentiu

On Thu, Jan 26, 2023, at 03:13, Angus Dickey wrote:
> Even,
>
> Thanks, that is some quick turnaround! I imagine Proxmox
> <https://www.proxmox.com/en/> or LXD
> <https://linuxcontainers.org/lxd/introduction/> are pretty much what everyone
> uses to create Linux containers. LXC is the underlying technology but also
> has a set of command-line tools that can be used to create containers. In
> your case it sounds like LXD can't choose a subnet for your Linux bridge,
> which is mysterious and I don't know how to fix that.
>
> I tried your update inside a container and am still seeing the problem where
> GDAL thinks it has the full host memory:
>
> $ gdalinfo --version
> GDAL 3.7.0dev, released 2023/99/99 (debug build)
> $ ./get_gdal_memory
> GDAL version is 3.7.0dev
> GDAL thinks it has 135083474944 bytes of physical memory
> GDAL thinks it has 135083474944 bytes of usable physical memory
> sysinfo() thinks it has 135083474944 bytes of physical memory
> $ free -h
>                total        used        free      shared  buff/cache   available
> Mem:           2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
> Swap:          256Mi          0B       256Mi
> $ cat /proc/meminfo | grep MemTotal
> MemTotal: 2048000 kB
>
> I wanted to dig a bit but am no expert in containerization and cgroup v2. It
> seems that some tools show the memory the container has (free
> <https://man7.org/linux/man-pages/man1/free.1.html> and /proc/meminfo
> <https://man7.org/linux/man-pages/man5/proc.5.html>) and others (sysinfo
> <https://man7.org/linux/man-pages/man2/sysinfo.2.html>) show the host memory.
> For cgroups v2 I see your code is trying to find the max memory from a
> specific memory.max file in /sys/fs/cgroup/. In my *containers* that file
> (actually all the memory.max files) contains the default value "max".
>
> $ find /sys/fs/cgroup -type f -name memory.max -exec sh -c "cat '{}'" \;
> max
> max
> max
> ... all max ...
> max
>
> If I try the same thing on the *host* I actually find it is set to the
> expected value.
>
> $ cat /sys/fs/cgroup/lxc/901/memory.max
> 2097152000
>
> The cgroup values on the host appear to be what is limiting the container
> memory; more rules can be added inside the container but they are still
> beholden to the host rules. I am not sure how free and /proc/meminfo are
> getting the correct available memory but maybe I will ask the Proxmox or LXD
> people.
>
> Thanks again,
>
> Angus
>
>
> On Wed, Jan 25, 2023 at 4:49 AM Even Rouault <even.roua...@spatialys.com>
> wrote:
>> Angus,
>>
>> I'm not familiar with LXC. I tried to set up LXD with
>> https://linuxcontainers.org/lxd/introduction/ but it fails with a mysterious
>> "Error: Failed to create local member network "lxdbr0" in project "default":
>> Failed generating auto config: Failed to automatically find an unused IPv4
>> subnet, manual configuration required"
>>
>> Anyway, I've attempted in https://github.com/OSGeo/gdal/pull/7124 to better
>> take cgroups into account when determining the memory limit. Could you give
>> this a try?
>>
>> Even
>>
>> On 25/01/2023 at 06:24, Angus Dickey wrote:
>>> Even,
>>>
>>> Thanks for the reply, I went ahead and compiled the latest GDAL 3.6.2 on
>>> Ubuntu 22.04.
>>> Unfortunately I ended up with a similar result: GDAL thinks it has 755GB
>>> of RAM to work with when it only has 2GB:
>>>
>>> $ gdalinfo --version
>>> GDAL 3.6.2, released 2023/01/02 (debug build)
>>>
>>> $ ./get_gdal_memory
>>> GDAL version is 3.6.2
>>> GDAL thinks is has 811526475776 bytes of physical memory
>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>>
>>> $ free -h
>>>                total        used        free      shared  buff/cache   available
>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>> Swap:          256Mi          0B       256Mi
>>>
>>> My knowledge on the subject is limited but I think Linux containers (LXC)
>>> use cgroups and not setrlimit to limit resources, so maybe that is why the
>>> new changes had no effect. To reproduce this issue you can create a
>>> container using LXC, LXD, or a hypervisor like Proxmox (what I am using)
>>> and call CPLGetUsablePhysicalRAM().
>>>
>>> If there is any other info that might be helpful let me know. I might try a
>>> Docker container (it also uses cgroups and is more popular than LXC),
>>> although it fulfills a different function.
>>>
>>> thanks,
>>>
>>> Angus
>>>
>>>
>>> On Tue, Jan 24, 2023 at 5:50 PM Even Rouault <even.roua...@spatialys.com>
>>> wrote:
>>>> Angus,
>>>>
>>>> there has been a recent extra fix that landed in GDAL 3.6.2 that might
>>>> possibly help: https://github.com/OSGeo/gdal/pull/6926
>>>>
>>>> Even
>>>>
>>>> On 25/01/2023 at 01:36, Angus Dickey wrote:
>>>>> Hi all,
>>>>>
>>>>> I am running into an issue where GDAL is overestimating the amount of
>>>>> physical memory it has, leading to it locking up the OS by taking 100% of
>>>>> the memory.
>>>>> Here is an example program that illustrates the issue:
>>>>>
>>>>> #include <stdio.h>
>>>>> #include "gdal.h"
>>>>>
>>>>> int main(void) {
>>>>>     printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
>>>>>     printf("GDAL thinks is has %lld bytes of physical memory\n",
>>>>>            CPLGetPhysicalRAM());
>>>>>     printf("GDAL thinks it has %lld bytes of usable physical memory\n",
>>>>>            CPLGetUsablePhysicalRAM());
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> When this is compiled with GDAL 3.5.1 on Ubuntu 22.04 we get:
>>>>>
>>>>> $ ./get_gdal_memory
>>>>> GDAL version is 3.5.1
>>>>> GDAL thinks is has 811526475776 bytes of physical memory
>>>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>>>>
>>>>> Which is not consistent with the actual available memory:
>>>>>
>>>>> $ free -h
>>>>>                total        used        free      shared  buff/cache   available
>>>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>>>> Swap:          256Mi          0B       256Mi
>>>>>
>>>>> So GDAL thinks it has 755GB of memory but it only has 2GB, this causes
>>>>> issues with the raster read cache and maybe elsewhere. I suspect this is
>>>>> happening because it is running in a Linux container
>>>>> <https://linuxcontainers.org/> and GDAL is getting the total physical
>>>>> memory of the host, not the container. The strange thing is Linux
>>>>> containers use cgroups for memory restrictions and the API docs mention
>>>>> it was fixed in GDAL 2.4.0
>>>>> <https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv> but I am still
>>>>> seeing the issue in 3.5.1.
>>>>>
>>>>> Any help or insight would be appreciated; I am happy to provide any
>>>>> additional information or testing.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Angus
>>>>>
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev@lists.osgeo.org
>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>
>>>> --
>>>> http://www.spatialys.com
>>>> My software is free, but my time generally not.
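For readers following the thread, here is a minimal sketch in C of the clamping idea being discussed: read the text of a cgroup v2 memory.max file, treat "max" as "no limit at this level", and otherwise cap the host RAM figure at the numeric value. This is not GDAL's actual implementation; the helper names parse_memory_max and clamp_to_cgroup are hypothetical, and the sketch assumes the caller has already read the file contents into a string.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Skip spaces, tabs and line endings. */
static const char *skip_space(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return p;
}

/* Hypothetical helper: parse the text of a cgroup v2 memory.max file.
 * Returns 1 and stores the byte limit in *limit when a numeric limit is set,
 * 0 when the text is "max" (no limit at this level),
 * -1 when the text is malformed. */
static int parse_memory_max(const char *text, uint64_t *limit)
{
    assert(text != NULL && limit != NULL);

    text = skip_space(text);
    if (strncmp(text, "max", 3) == 0)
        return *skip_space(text + 3) == '\0' ? 0 : -1;

    /* Only plain decimal digits are accepted (no sign, no prefix). */
    if (*text < '0' || *text > '9')
        return -1;

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *skip_space(end) != '\0')
        return -1;

    *limit = (uint64_t)value;
    return 1;
}

/* Hypothetical helper: return the smaller of the host RAM figure and the
 * cgroup limit; fall back to the host figure when there is no usable limit. */
static uint64_t clamp_to_cgroup(uint64_t host_ram, const char *memory_max_text)
{
    uint64_t limit = 0;
    if (parse_memory_max(memory_max_text, &limit) == 1 && limit < host_ram)
        return limit;
    return host_ram;
}
```

With the numbers reported in this thread, clamp_to_cgroup(135083474944, "2097152000\n") yields 2097152000, whereas the "max" value seen inside the container leaves the host figure of 135083474944 unchanged. That second case is the situation Angus describes: when the limit is only visible in the host-side cgroup file, reading memory.max from inside the container cannot recover it.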