Re: [PATCH] buffer::ptr::cmp only compares up to the smallest length
Hi Sage, My bad, indeed. I'll resubmit a patch. Cheers On 02/17/2013 03:11 AM, Sage Weil wrote: On Sun, 17 Feb 2013, Loic Dachary wrote: When running bufferptr a(A, 1); bufferptr ab(AB, 2); a.cmp(ab); it returned zero because cmp only compared up to the length of the smallest buffer. The tests comparing the length of the buffers are moved before the memcmp comparing the actual content of the buffers. http://tracker.ceph.com/issues/4170 refs #4170 Signed-off-by: Loic Dachary l...@dachary.org The problem here is that for B cmp AB, we should be 1, because 'B' 'A'. The length only matters if we reach the end and everything so far is equal. --- src/common/buffer.cc |8 +--- src/test/bufferlist.cc | 14 ++ 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/src/common/buffer.cc b/src/common/buffer.cc index e10d6c9..5a88849 100644 --- a/src/common/buffer.cc +++ b/src/common/buffer.cc @@ -368,17 +368,11 @@ bool buffer_track_alloc = get_env_bool(CEPH_BUFFER_TRACK); int buffer::ptr::cmp(const ptr o) { -int l = _len o._len ? _len : o._len; -if (l) { - int r = memcmp(c_str(), o.c_str(), l); - if (!r) -return r; I think this is the bug.. it should be if (r) return r, and fall through below only if r == 0 because the buffers are equal. sage -} if (_len o._len) return -1; if (_len o._len) return 1; -return 0; +return memcmp(c_str(), o.c_str(), _len); } bool buffer::ptr::is_zero() const diff --git a/src/test/bufferlist.cc b/src/test/bufferlist.cc index 71c2e79..aac41c6 100644 --- a/src/test/bufferlist.cc +++ b/src/test/bufferlist.cc @@ -9,6 +9,20 @@ #define MAX_TEST 100 +TEST(BufferPtr, cmp) { + bufferptr empty; + bufferptr a(A, 1); + bufferptr ab(AB, 2); + bufferptr af(AF, 2); + EXPECT_GE(-1, empty.cmp(a)); + EXPECT_LE(1, a.cmp(empty)); + EXPECT_GE(-1, a.cmp(ab)); + EXPECT_LE(1, ab.cmp(a)); + EXPECT_EQ(0, ab.cmp(ab)); + EXPECT_GE(-1, ab.cmp(af)); + EXPECT_LE(1, af.cmp(ab)); +} + TEST(BufferList, zero) { // // void zero() -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature
Re: Hit suicide timeout after adding new osd
On Thu, Jan 24, 2013 at 10:01 PM, Sage Weil s...@inktank.com wrote: On Thu, 24 Jan 2013, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 8:39 AM, Sage Weil s...@inktank.com wrote: On Thu, 24 Jan 2013, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian S?gaard j...@mermaidconsulting.dk wrote: Hi Sage, I think the problem now is just that 'osd target transaction size' is I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes! Hmm, one of the OSDs crashed again, sadly. It logs: -2 2013-01-23 18:01:23.563624 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60 -1 2013-01-23 18:01:23.563657 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180 0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677 common/HeartbeatMap.cc: 78: FAILED assert(0 == hit suicide timeout) With this stack trace: ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539) 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb] 2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae] 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8] 4: (CephContextServiceThread::entry()+0x55) [0x8e0f45] 5: /lib64/libpthread.so.0() [0x3cbc807d14] 6: (clone()+0x6d) [0x3cbc0f167d] I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50? I was able to catch this too on rejoin to very busy cluster and seems I need to lower this value at least at start time. Also c5fe0965572c074a2a33660719ce3222d18c1464 has increased overall time before restarted or new osd will join a cluster, and for 2M objects/3T of replicated data restart of the cluster was took almost a hour before it actually begins to work. The worst thing is that a single osd, if restarted, will mark as up after couple of minutes, then after almost half of hour(eating 100 percent of one cpu, ) as down and then cluster will start to redistribute data after 300s timeout, osd still doing something. Okay, something is very wrong. Can you reproduce this with a log? Or even a partial log while it is spinning? You can adjust the log level on a running process with ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_osd 20 ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_ms 1 We haven't been able to reproduce this, so I'm very much interested in any light you can shine here. Unfortunately cluster finally hit ``suicide timeout'' by every osd, so there was no logs, only some backtraces[1]. Yesterday after an osd was not able to join cluster in a hour, I decided to wait until data is remapped, then tried to restart cluster, leaving it overnight, to morning all osd processes are dead, with the same backtraces. Before it, after a silly node crash(related to deadlocks in kernel kvm code), some pgs remains to stay in peering state without any blocker in json output, so I had decided to restart osd to which primary copy belongs, because it helped before. So most interesting part is missing, but I`ll reformat cluster soon and will try to catch this again after filling some data in. [1]. http://xdel.ru/downloads/ceph-log/osd-heartbeat/ Thanks, I believe I see the problem. The peering workqueue is way behind, and it is trying to it all in one lump, timing out the work queue. The workaround is to increase the timeout. We'll put together a proper fix. sage Hi Sage, Single OSDs still not able to join a cluster after restart, osd process eats one core and reads disk by long continuous periods, about hundreds of seconds, then staying eating 100% of core, then repeat. On relatively new cluster, it is not repeatable even with almost same data commit, only week or two of writes, snapshot creation, etc. exposes that. Please see log below: 2013-02-17 12:08:17.503992 7fbe8795c780 0 ceph version 0.56.3-2-g290a352 (290a352c3f9e241deac562e980ac8c6a74033ba6), process ceph-osd, pid 29283 starting osd.26 at :/0 osd_data /var/lib/ceph/osd/26 /var/lib/ceph/osd/journal/journal26 2013-02-17 12:08:17.508193 7fbe8795c780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6803/29283 need_addr=1 2013-02-17 12:08:17.508222 7fbe8795c780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6804/29283 need_addr=1 2013-02-17 12:08:17.508244 7fbe8795c780 1 accepter.accepter.bind my_inst.addr is
Re: [0.48.3] OSD memory leak when scrubbing
+1 -- Regards, Sébastien Han. On Sat, Feb 16, 2013 at 10:09 AM, Wido den Hollander w...@42on.com wrote: On 02/16/2013 08:09 AM, Andrey Korolyov wrote: Can anyone who hit this bug please confirm that your system contains libc 2.15+? I've seen this with 0.56.2 as well on Ubuntu 12.04. Ubuntu 12.04 comes with 2.15-0ubuntu10.3 Haven't gotten around to adding a heap profiler to it. Wido On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han han.sebast...@gmail.com wrote: oh nice, the pattern also matches path :D, didn't know that thanks Greg -- Regards, Sébastien Han. On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum g...@inktank.com wrote: Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core -Greg On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han han.sebast...@gmail.com wrote: ok I finally managed to get something on my test cluster, unfortunately, the dump goes to / any idea to change the destination path? My production / won't be big enough... -- Regards, Sébastien Han. On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick dan.m...@inktank.com wrote: ...and/or do you have the corepath set interestingly, or one of the core-trapping mechanisms turned on? On 02/04/2013 11:29 AM, Sage Weil wrote: On Mon, 4 Feb 2013, S?bastien Han wrote: Hum just tried several times on my test cluster and I can't get any core dump. Does Ceph commit suicide or something? Is it expected behavior? SIGSEGV should trigger the usual path that dumps a stack trace and then dumps core. Was your ulimit -c set before the daemon was started? sage -- Regards, S?bastien Han. On Sun, Feb 3, 2013 at 10:03 PM, S?bastien Han han.sebast...@gmail.com wrote: Hi Lo?c, Thanks for bringing our discussion on the ML. I'll check that tomorrow :-). Cheer -- Regards, S?bastien Han. On Sun, Feb 3, 2013 at 10:01 PM, S?bastien Han han.sebast...@gmail.com wrote: Hi Lo?c, Thanks for bringing our discussion on the ML. I'll check that tomorrow :-). Cheers -- Regards, S?bastien Han. On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary l...@dachary.org wrote: Hi, As discussed during FOSDEM, the script you wrote to kill the OSD when it grows too much could be amended to core dump instead of just being killed restarted. The binary + core could probably be used to figure out where the leak is. You should make sure the OSD current working directory is in a file system with enough free disk space to accomodate for the dump and set ulimit -c unlimited before running it ( your system default is probably ulimit -c 0 which inhibits core dumps ). When you detect that OSD grows too much kill it with kill -SEGV $pid and upload the core found in the working directory, together with the binary in a public place. If the osd binary is compiled with -g but without changing the -O settings, you should have a larger binary file but no negative impact on performances. Forensics analysis will be made a lot easier with the debugging symbols. My 2cts On 01/31/2013 08:57 PM, Sage Weil wrote: On Thu, 31 Jan 2013, Sylvain Munaut wrote: Hi, I disabled scrubbing using ceph osd tell \* injectargs '--osd-scrub-min-interval 100' ceph osd tell \* injectargs '--osd-scrub-max-interval 1000' and the leak seems to be gone. See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory for the 12 osd processes over the last 3.5 days. Memory was rising every 24h. I did the change yesterday around 13h00 and OSDs stopped growing. OSD memory even seems to go down slowly by small blocks. Of course I assume disabling scrubbing is not a long term solution and I should re-enable it ... (how do I do that btw ? what were the default values for those parameters) It depends on the exact commit you're on. You can see the defaults if you do ceph-osd --show-config | grep osd_scrub Thanks for testing this... I have a few other ideas to try to reproduce. sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Lo?c Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to
Re: [PATCH] buffer::ptr::cmp only compares up to the smallest length
Applied, thanks! I left this in master since the only current caller I see is in the omap scrubbing code. We may want to backport it soonish, though. sage On Sun, 17 Feb 2013, Loic Dachary wrote: When running bufferptr a(A, 1); bufferptr ab(AB, 2); a.cmp(ab); it returned zero because. cmp only compared up to the length of the smallest buffer and returned if they are identical. The function is modified to compare the length of the buffers instead of returning. http://tracker.ceph.com/issues/4170 refs #4170 Signed-off-by: Loic Dachary l...@dachary.org --- src/common/buffer.cc |2 +- src/test/bufferlist.cc | 17 + 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/src/common/buffer.cc b/src/common/buffer.cc index e10d6c9..df50cfc 100644 --- a/src/common/buffer.cc +++ b/src/common/buffer.cc @@ -371,7 +371,7 @@ bool buffer_track_alloc = get_env_bool(CEPH_BUFFER_TRACK); int l = _len o._len ? _len : o._len; if (l) { int r = memcmp(c_str(), o.c_str(), l); - if (!r) + if (r) return r; } if (_len o._len) diff --git a/src/test/bufferlist.cc b/src/test/bufferlist.cc index 71c2e79..7abced1 100644 --- a/src/test/bufferlist.cc +++ b/src/test/bufferlist.cc @@ -9,6 +9,23 @@ #define MAX_TEST 100 +TEST(BufferPtr, cmp) { + bufferptr empty; + bufferptr a(A, 1); + bufferptr ab(AB, 2); + bufferptr af(AF, 2); + bufferptr acc(ACC, 3); + EXPECT_GE(-1, empty.cmp(a)); + EXPECT_LE(1, a.cmp(empty)); + EXPECT_GE(-1, a.cmp(ab)); + EXPECT_LE(1, ab.cmp(a)); + EXPECT_EQ(0, ab.cmp(ab)); + EXPECT_GE(-1, ab.cmp(af)); + EXPECT_LE(1, af.cmp(ab)); + EXPECT_GE(-1, acc.cmp(af)); + EXPECT_LE(1, af.cmp(acc)); +} + TEST(BufferList, zero) { // // void zero() -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Hit suicide timeout after adding new osd
On Sun, 17 Feb 2013, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 10:01 PM, Sage Weil s...@inktank.com wrote: On Thu, 24 Jan 2013, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 8:39 AM, Sage Weil s...@inktank.com wrote: On Thu, 24 Jan 2013, Andrey Korolyov wrote: On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian S?gaard j...@mermaidconsulting.dk wrote: Hi Sage, I think the problem now is just that 'osd target transaction size' is I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes! Hmm, one of the OSDs crashed again, sadly. It logs: -2 2013-01-23 18:01:23.563624 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60 -1 2013-01-23 18:01:23.563657 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180 0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677 common/HeartbeatMap.cc: 78: FAILED assert(0 == hit suicide timeout) With this stack trace: ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539) 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb] 2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae] 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8] 4: (CephContextServiceThread::entry()+0x55) [0x8e0f45] 5: /lib64/libpthread.so.0() [0x3cbc807d14] 6: (clone()+0x6d) [0x3cbc0f167d] I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50? I was able to catch this too on rejoin to very busy cluster and seems I need to lower this value at least at start time. Also c5fe0965572c074a2a33660719ce3222d18c1464 has increased overall time before restarted or new osd will join a cluster, and for 2M objects/3T of replicated data restart of the cluster was took almost a hour before it actually begins to work. The worst thing is that a single osd, if restarted, will mark as up after couple of minutes, then after almost half of hour(eating 100 percent of one cpu, ) as down and then cluster will start to redistribute data after 300s timeout, osd still doing something. Okay, something is very wrong. Can you reproduce this with a log? Or even a partial log while it is spinning? You can adjust the log level on a running process with ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_osd 20 ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_ms 1 We haven't been able to reproduce this, so I'm very much interested in any light you can shine here. Unfortunately cluster finally hit ``suicide timeout'' by every osd, so there was no logs, only some backtraces[1]. Yesterday after an osd was not able to join cluster in a hour, I decided to wait until data is remapped, then tried to restart cluster, leaving it overnight, to morning all osd processes are dead, with the same backtraces. Before it, after a silly node crash(related to deadlocks in kernel kvm code), some pgs remains to stay in peering state without any blocker in json output, so I had decided to restart osd to which primary copy belongs, because it helped before. So most interesting part is missing, but I`ll reformat cluster soon and will try to catch this again after filling some data in. [1]. http://xdel.ru/downloads/ceph-log/osd-heartbeat/ Thanks, I believe I see the problem. The peering workqueue is way behind, and it is trying to it all in one lump, timing out the work queue. The workaround is to increase the timeout. We'll put together a proper fix. sage Hi Sage, Single OSDs still not able to join a cluster after restart, osd process eats one core and reads disk by long continuous periods, about hundreds of seconds, then staying eating 100% of core, then repeat. On relatively new cluster, it is not repeatable even with almost same data commit, only week or two of writes, snapshot creation, etc. exposes that. Please see log below: This sounds like the fixed code chewing through PGs with very old epochs due to the (now fixed) bug in .2. Are there some pools that were completely idle on this cluster prior to the upgrade? The problem before was that we weren't persisting the new osdmap epochs for those PGs if they never saw any IO, which means they have to do catch-up on restart. They will come up
[PATCH] unit tests for src/common/buffer.{cc,h}
Implement unit tests covering most lines of code ( 92% ) and all methods as show by the output of make check-coverage : http://dachary.org/wp-uploads/2013/03/ceph-lcov/ . The following static constructors are implemented by opaque classes defined in buffer.cc ( buffer::raw_char, buffer::raw_posix_aligned etc. ). Testing the implementation of these classes is done by variations of the calls to the static constructors. copy(const char *c, unsigned len); create(unsigned len); claim_char(unsigned len, char *buf); create_malloc(unsigned len); claim_malloc(unsigned len, char *buf); create_static(unsigned len, char *buf); create_page_aligned(unsigned len); The raw_mmap_pages class cannot be tested because it is commented out in raw_posix_aligned. The raw_hack_aligned class is only tested under Cygwin. The raw_posix_aligned class is not tested under Cygwin. The unittest_bufferlist.sh script calls unittest_bufferlist with the CEPH_BUFFER_TRACK=true environment variable to enable the code tracking the memory usage. It cannot be done within the bufferlist.cc file itself because it relies on the initialization of a global variable ( buffer_track_alloc ). When raw_posix_aligned is called on DARWIN, the data is not aligned on CEPH_PAGE_SIZE because it calls valloc(size) which is the equivalent of memalign(sysconf(_SC_PAGESIZE),size) and not memalign(CEPH_PAGE_SIZE,size). For this reason the alignment test is de-activated on DARWIN. The tests are grouped in TEST(BufferPtr, ... ) for buffer::ptr TEST(BufferListIterator, ...) for buffer::list::iterator TEST(BufferList, ...) for buffer::list TEST(BufferHash, ...) for buffer::hash and each method ( and all variations of the prototype ) are included into a single TEST() function. Although most aspects of the methods are tested, including exceptions and border cases, inconsistencies are not highlighted . For instance buffer::list::iterator i; i.advance(1); would dereference a buffer::raw NULL pointer although buffer::ptr p; p.wasted() asserts instead of dereferencing the buffer::raw NULL pointer. It would be better to always assert in case a NULL pointer is about to be used. But this is a minor inconsistency that is probably not worth a test. The following buffer::list methods ssize_t read_fd(int fd, size_t len); int write_fd(int fd) const; are not fully tested because the border cases cannot be reliably reproduced. Going thru a pointer indirection when calling the ::writev or safe_read functions would allow the test to create mockups to synthetize the conditions for border cases. tracker.ceph.com/issues/4066 refs #4066 Signed-off-by: Loic Dachary l...@dachary.org --- src/Makefile.am|5 +- src/test/bufferlist.cc | 1801 +--- src/unittest_bufferlist.sh | 19 + 3 files changed, 1731 insertions(+), 94 deletions(-) create mode 100755 src/unittest_bufferlist.sh diff --git a/src/Makefile.am b/src/Makefile.am index 556de51..1725588 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -19,7 +19,8 @@ EXTRA_DIST = \ libs3/libs3.spec \ libs3/mswin \ libs3/src \ - libs3/test + libs3/test \ + unittest_bufferlist.sh CLEANFILES = bin_PROGRAMS = @@ -38,7 +39,7 @@ check_PROGRAMS = # tests to actually run on make check; if you need extra, non-test, # executables built, you need to replace this with manual assignments # target by target -TESTS = $(check_PROGRAMS) +TESTS = $(check_PROGRAMS) unittest_bufferlist.sh check-local: $(srcdir)/test/encoding/check-generated.sh diff --git a/src/test/bufferlist.cc b/src/test/bufferlist.cc index 7abced1..6f8ba19 100644 --- a/src/test/bufferlist.cc +++ b/src/test/bufferlist.cc @@ -1,77 +1,1650 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2013 Cloudwatt libre.licens...@cloudwatt.com + * + * Author: Loic Dachary l...@dachary.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Library Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Library Public License for more details. + * + */ + #include tr1/memory +#include limits.h +#include errno.h +#include sys/uio.h #include include/buffer.h #include include/encoding.h +#include common/environment.h #include gtest/gtest.h #include stdlib.h - +#include fcntl.h +#include sys/stat.h #define MAX_TEST 100 -TEST(BufferPtr, cmp) { - bufferptr empty; - bufferptr a(A, 1); - bufferptr ab(AB, 2); - bufferptr af(AF, 2); - bufferptr acc(ACC,
Re: [PATCH] unit tests for src/common/buffer.{cc,h}
Hi Loic, I merged this in, which two small changes: - the malloc ULLONG_MAX tests were succeeding and eating RAM on my box; commented them out. - the BIG_SZ buffer on teh stack was segfaulting; put it on the heap. Otherwise, looks great! I'm very pleased to have test coverage on this code. :) sage On Sun, 17 Feb 2013, Loic Dachary wrote: Implement unit tests covering most lines of code ( 92% ) and all methods as show by the output of make check-coverage : http://dachary.org/wp-uploads/2013/03/ceph-lcov/ . The following static constructors are implemented by opaque classes defined in buffer.cc ( buffer::raw_char, buffer::raw_posix_aligned etc. ). Testing the implementation of these classes is done by variations of the calls to the static constructors. copy(const char *c, unsigned len); create(unsigned len); claim_char(unsigned len, char *buf); create_malloc(unsigned len); claim_malloc(unsigned len, char *buf); create_static(unsigned len, char *buf); create_page_aligned(unsigned len); The raw_mmap_pages class cannot be tested because it is commented out in raw_posix_aligned. The raw_hack_aligned class is only tested under Cygwin. The raw_posix_aligned class is not tested under Cygwin. The unittest_bufferlist.sh script calls unittest_bufferlist with the CEPH_BUFFER_TRACK=true environment variable to enable the code tracking the memory usage. It cannot be done within the bufferlist.cc file itself because it relies on the initialization of a global variable ( buffer_track_alloc ). When raw_posix_aligned is called on DARWIN, the data is not aligned on CEPH_PAGE_SIZE because it calls valloc(size) which is the equivalent of memalign(sysconf(_SC_PAGESIZE),size) and not memalign(CEPH_PAGE_SIZE,size). For this reason the alignment test is de-activated on DARWIN. The tests are grouped in TEST(BufferPtr, ... ) for buffer::ptr TEST(BufferListIterator, ...) for buffer::list::iterator TEST(BufferList, ...) for buffer::list TEST(BufferHash, ...) for buffer::hash and each method ( and all variations of the prototype ) are included into a single TEST() function. Although most aspects of the methods are tested, including exceptions and border cases, inconsistencies are not highlighted . For instance buffer::list::iterator i; i.advance(1); would dereference a buffer::raw NULL pointer although buffer::ptr p; p.wasted() asserts instead of dereferencing the buffer::raw NULL pointer. It would be better to always assert in case a NULL pointer is about to be used. But this is a minor inconsistency that is probably not worth a test. The following buffer::list methods ssize_t read_fd(int fd, size_t len); int write_fd(int fd) const; are not fully tested because the border cases cannot be reliably reproduced. Going thru a pointer indirection when calling the ::writev or safe_read functions would allow the test to create mockups to synthetize the conditions for border cases. tracker.ceph.com/issues/4066 refs #4066 Signed-off-by: Loic Dachary l...@dachary.org --- src/Makefile.am|5 +- src/test/bufferlist.cc | 1801 +--- src/unittest_bufferlist.sh | 19 + 3 files changed, 1731 insertions(+), 94 deletions(-) create mode 100755 src/unittest_bufferlist.sh diff --git a/src/Makefile.am b/src/Makefile.am index 556de51..1725588 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -19,7 +19,8 @@ EXTRA_DIST = \ libs3/libs3.spec \ libs3/mswin \ libs3/src \ - libs3/test + libs3/test \ + unittest_bufferlist.sh CLEANFILES = bin_PROGRAMS = @@ -38,7 +39,7 @@ check_PROGRAMS = # tests to actually run on make check; if you need extra, non-test, # executables built, you need to replace this with manual assignments # target by target -TESTS = $(check_PROGRAMS) +TESTS = $(check_PROGRAMS) unittest_bufferlist.sh check-local: $(srcdir)/test/encoding/check-generated.sh diff --git a/src/test/bufferlist.cc b/src/test/bufferlist.cc index 7abced1..6f8ba19 100644 --- a/src/test/bufferlist.cc +++ b/src/test/bufferlist.cc @@ -1,77 +1,1650 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2013 Cloudwatt libre.licens...@cloudwatt.com + * + * Author: Loic Dachary l...@dachary.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Library Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Library