On 09/12/2018 05:29 AM, Daniel Goldbach wrote:
Hi all,

We're reading from a Ceph Luminous pool using the librados asynchronous I/O API. We're seeing some concerning memory usage patterns when we read many objects in sequence.

The expected behaviour is that our memory usage stabilises at a small amount, since we're just fetching objects and ignoring their data. What we instead find is that the memory usage of our program grows linearly with the amount of data read for an interval of time, and then continues to grow at a much slower but still consistent pace. This memory is not freed until program termination. My guess is that this is an issue with Ceph's memory allocator.

To demonstrate, we create 20000 objects each of size 10KB, 100KB, and 1MB:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <rados/librados.h>

    int main() {
        rados_t cluster;
        rados_create(&cluster, "test");
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        rados_connect(cluster);

        rados_ioctx_t io;
        rados_ioctx_create(cluster, "test", &io);

        char data[1000000];
        memset(data, 'a', 1000000);

        char smallobj_name[16], mediumobj_name[16], largeobj_name[16];
        int i;
        for (i = 0; i < 20000; i++) {
            sprintf(smallobj_name, "10kobj_%d", i);
            rados_write(io, smallobj_name, data, 10000, 0);

            sprintf(mediumobj_name, "100kobj_%d", i);
            rados_write(io, mediumobj_name, data, 100000, 0);

            sprintf(largeobj_name, "1mobj_%d", i);
            rados_write(io, largeobj_name, data, 1000000, 0);

            printf("wrote %s of size 10000, %s of size 100000, %s of size 1000000\n",
                   smallobj_name, mediumobj_name, largeobj_name);
        }

        return 0;
    }

    $ gcc create.c -lrados -o create
    $ ./create
    wrote 10kobj_0 of size 10000, 100kobj_0 of size 100000, 1mobj_0 of size 1000000
    wrote 10kobj_1 of size 10000, 100kobj_1 of size 100000, 1mobj_1 of size 1000000
    [...]
    wrote 10kobj_19998 of size 10000, 100kobj_19998 of size 100000, 1mobj_19998 of size 1000000
    wrote 10kobj_19999 of size 10000, 100kobj_19999 of size 100000, 1mobj_19999 of size 1000000

Now we read each of these objects with the async API, into the same buffer. First, we read just the 10KB objects:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <rados/librados.h>

    void readobj(rados_ioctx_t* io, char objname[]);

    int main() {
        rados_t cluster;
        rados_create(&cluster, "test");
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        rados_connect(cluster);

        rados_ioctx_t io;
        rados_ioctx_create(cluster, "test", &io);

        char smallobj_name[16];
        int i, total_bytes_read = 0;

        for (i = 0; i < 20000; i++) {
            sprintf(smallobj_name, "10kobj_%d", i);
            readobj(&io, smallobj_name);

            total_bytes_read += 10000;
            printf("Read %s for total %d\n", smallobj_name, total_bytes_read);
        }

        getchar();
        return 0;
    }

    void readobj(rados_ioctx_t* io, char objname[]) {
        char data[1000000];
        unsigned long bytes_read;
        rados_completion_t completion;
        int retval;

        rados_read_op_t read_op = rados_create_read_op();
        rados_read_op_read(read_op, 0, 10000, data, &bytes_read, &retval);
        retval = rados_aio_create_completion(NULL, NULL, NULL, &completion);
        assert(retval == 0);

        retval = rados_aio_read_op_operate(read_op, *io, completion, objname, 0);
        assert(retval == 0);

        rados_aio_wait_for_complete(completion);
        rados_aio_get_return_value(completion);
    }

    $ gcc read.c -lrados -o read_small -Wall -g && ./read_small
    Read 10kobj_0 for total 10000
    Read 10kobj_1 for total 20000
    [...]
    Read 10kobj_19998 for total 199990000
    Read 10kobj_19999 for total 200000000

We read 200MB. A graph of the resident set size of the program is attached as mem-graph-10k.png, with seconds on the x axis and KB on the y axis. You can see that the memory usage increases throughout, which itself is unexpected since that memory should be freed over time and we should only hold 10KB of object data in memory at a time. The rate of growth decreases and eventually stabilises, and by the end we've used 60MB of RAM.

We repeat this experiment for the 100KB and 1MB objects and find that after all reads they use 140MB and 500MB of RAM respectively; memory usage presumably would continue to grow if there were more objects to read. This is orders of magnitude more memory than I would expect these programs to use.

  * We do not get this behaviour with the synchronous API; memory
    usage remains stable at just a few MB. (A sketch of the synchronous
    read variant is shown after this list.)
  * We've found that for some reason, this doesn't happen (or doesn't
    happen as severely) if we intersperse large reads with much
    smaller reads. In this case, the memory usage seems to stabilise
    at a reasonable number.
  * Valgrind only reports a trivial amount of unreachable memory.
  * Memory usage doesn't increase in this manner if we repeatedly read
    the same object over and over again. It hovers around 20MB.
  * In other experiments we've done, with different object data and
    distributions of object sizes, we've seen memory usage grow even
    larger in proportion to the amount of data read.
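
For reference, the synchronous readobj() variant mentioned in the first bullet is roughly the following (a sketch, not our exact code); it replaces the read_op/completion machinery with a single blocking rados_read() call:

    void readobj_sync(rados_ioctx_t* io, char objname[]) {
        char data[1000000];
        int bytes_read;

        /* blocking read of the first 10000 bytes of the object;
           rados_read() returns the number of bytes read, or a
           negative error code on failure */
        bytes_read = rados_read(*io, objname, data, 10000, 0);
        assert(bytes_read >= 0);
    }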

We maintain long-running (order of weeks) services that read objects from Ceph and send them elsewhere. Over time, the memory usage of some of these services has grown to more than 6GB, which is unreasonable.

--
Regards,
Dan G


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

It looks like the async example is missing calls to rados_aio_release() to clean up the completions. I'm not sure that would account for all of the memory growth, but that's where I would start. Past that, running the client under valgrind massif should help with further investigation.
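
Something along these lines at the end of readobj(), after the wait, should do it (untested sketch; rados_release_read_op() is the matching cleanup for rados_create_read_op()):

        rados_aio_wait_for_complete(completion);
        rados_aio_get_return_value(completion);

        /* release the completion and the read op once we're done with
           them, so librados can free the resources they hold */
        rados_aio_release(completion);
        rados_release_read_op(read_op);

For the massif run, something like:

    $ valgrind --tool=massif ./read_small
    $ ms_print massif.out.<pid>

should show where the remaining allocations come from.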

Casey
