Bug#494221: Bug is actually in librrd4, backtrace included
Hi,

(This is a follow-up for Debian bug #498183 - see [1] for details. Please
keep [EMAIL PROTECTED] Cc'ed when replying.)

[1] http://bugs.debian.org/498183

On Sun, Sep 07, 2008 at 09:51:16PM +0100, Jurij Smakov wrote:
> I don't have any problem reproducing it on sparc, so reopening. The
> segfault occurs in rrd_open() function in librrd4, as the following gdb
> session illustrates (rebuilt rrd with debugging symbols to get it):
>
> [...]
> (gdb) list
> 363         rra_start +=
> 364             rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt *
> 365             sizeof(rrd_value_t);
> 366     }
> 367 #ifdef USE_MADVISE
> 368     madvise(rrd_file->file_start + dontneed_start,
> 369             rrd_file->file_len - dontneed_start, MADV_DONTNEED);
> 370 #endif
> 371 #ifdef HAVE_POSIX_FADVISE
> 372     posix_fadvise(rrd_file->fd, dontneed_start,
> [...]
> (gdb) print dontneed_start
> $16 = 8192
> (gdb) print rrd_file->file_len
> $17 = 972
> (gdb) print rrd_file->file_len - dontneed_start
> $18 = 4294960076
> [...]
> (gdb) n
>
> Program received signal SIGSEGV, Segmentation fault.
> rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372
> 372     posix_fadvise(rrd_file->fd, dontneed_start,
> Disabling display 5 to avoid infinite recursion.
> 5: i = Cannot access memory at address 0xffe8

(See [2] for the full session dump.)

Jurij, thanks a lot for the detailed information - that was very helpful.

[2] http://bugs.debian.org/498183#25

> I guess that the problem here is passing negative second argument to
> madvise() which makes it very unhappy and smashes the stack, but I did
> not grok the code yet to understand what's going on here.

Yes, that seems to be the problem. Roughly, what's going on here is that
we're stepping through all RRAs of the RRD file and marking cold blocks as
unused (using madvise() and posix_fadvise()). dontneed_start is used as an
offset into the file in that search. After we've stepped through all RRAs,
the last call to madvise() (which will then trigger the segfault) marks
the remainder of the file as unused as well.
Now, we might have already passed the end of the file, as dontneed_start
is increased by multiples of the page size only. E.g. this happens if the
page size is larger than the file size, as in this case. For some reason
that I don't know, amd64 and i386 don't seem to care about that. I was not
able to reproduce the segfault there, but I could verify the same
situation in the debugger.

The attached patch should solve this issue. I've simply added a check for
whether we have already passed the end of the file. Since I do not have
access to a sparc box, I'd like to get some feedback on whether that
really solves the issue. Also, I'd like Tobi (or anyone else involved in
that specific code) to comment on that, just to make sure that I did not
miss some important fact.

Thanks in advance!

Cheers,
Sebastian

-- 
Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety.         -- Benjamin Franklin

diff --git a/program/src/rrd_open.c b/program/src/rrd_open.c
index 51b621f..af91c7b 100644
--- a/program/src/rrd_open.c
+++ b/program/src/rrd_open.c
@@ -363,14 +363,19 @@ void rrd_dontneed(
             rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt *
             sizeof(rrd_value_t);
     }
+
+    if (dontneed_start < rrd_file->file_len) {
 #ifdef USE_MADVISE
-    madvise(rrd_file->file_start + dontneed_start,
-            rrd_file->file_len - dontneed_start, MADV_DONTNEED);
+        madvise(rrd_file->file_start + dontneed_start,
+                rrd_file->file_len - dontneed_start, MADV_DONTNEED);
 #endif
 #ifdef HAVE_POSIX_FADVISE
-    posix_fadvise(rrd_file->fd, dontneed_start,
-                  rrd_file->file_len - dontneed_start, POSIX_FADV_DONTNEED);
+        posix_fadvise(rrd_file->fd, dontneed_start,
+                      rrd_file->file_len - dontneed_start,
+                      POSIX_FADV_DONTNEED);
 #endif
+    }
+
 #if defined DEBUG && DEBUG > 1
     mincore_print(rrd_file, "after");
 #endif
Bug#494221: Bug is actually in librrd4, backtrace included
reopen 494221
clone 494221 -1
reassign -1 librrd4
retitle -1 librrd4: segfault in rrd_open() on sparc
block 494221 by -1
thanks

I don't have any problem reproducing it on sparc, so reopening. The
segfault occurs in rrd_open() function in librrd4, as the following gdb
session illustrates (rebuilt rrd with debugging symbols to get it):

Starting program: /usr/bin/rrdtool create zero.rrd DS:mon_25:GAUGE:600:U:U RRA:AVERAGE:0:1:1 RRA:LAST:0:1:1 RRA:MAX:0:1:1
[Thread debugging using libthread_db enabled]
[New Thread 0xf751e700 (LWP 29891)]
[Switching to Thread 0xf751e700 (LWP 29891)]

Breakpoint 1, rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:329
329     ssize_t _page_size = sysconf(_SC_PAGESIZE);
(gdb) n
336     rra_start = rrd_file->header_len;
(gdb)
337     dontneed_start = PAGE_START(rra_start) + _page_size;
(gdb)
338     for (i = 0; i < rrd->stat_head->rra_cnt; ++i) {
(gdb)
339         active_block =
(gdb) display i
5: i = 0
(gdb) n
343         if (active_block > dontneed_start) {
5: i = 0
(gdb)
355         dontneed_start = active_block;
5: i = 0
(gdb)
358         if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt -
5: i = 0
(gdb)
361             dontneed_start += _page_size;
5: i = 0
(gdb)
363         rra_start +=
5: i = 0
(gdb)
338     for (i = 0; i < rrd->stat_head->rra_cnt; ++i) {
5: i = 0
(gdb)
339         active_block =
5: i = 1
(gdb)
343         if (active_block > dontneed_start) {
5: i = 1
(gdb)
355         dontneed_start = active_block;
5: i = 1
(gdb)
358         if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt -
5: i = 1
(gdb)
361             dontneed_start += _page_size;
5: i = 1
(gdb)
363         rra_start +=
5: i = 1
(gdb)
338     for (i = 0; i < rrd->stat_head->rra_cnt; ++i) {
5: i = 1
(gdb)
339         active_block =
5: i = 2
(gdb)
343         if (active_block > dontneed_start) {
5: i = 2
(gdb)
355         dontneed_start = active_block;
5: i = 2
(gdb)
358         if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt -
5: i = 2
(gdb)
361             dontneed_start += _page_size;
5: i = 2
(gdb)
363         rra_start +=
5: i = 2
(gdb)
338     for (i = 0; i < rrd->stat_head->rra_cnt; ++i) {
5: i = 2
(gdb)
368     madvise(rrd_file->file_start + dontneed_start,
5: i = 3
(gdb) list
363         rra_start +=
364             rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt *
365             sizeof(rrd_value_t);
366     }
367 #ifdef USE_MADVISE
368     madvise(rrd_file->file_start + dontneed_start,
369             rrd_file->file_len - dontneed_start, MADV_DONTNEED);
370 #endif
371 #ifdef HAVE_POSIX_FADVISE
372     posix_fadvise(rrd_file->fd, dontneed_start,
(gdb) print rrd_file
$14 = (rrd_file_t *) 0x28680
(gdb) print rrd_file->file_start
$15 = 0xf7f48000 "RRD"
(gdb) print dontneed_start
$16 = 8192
(gdb) print rrd_file->file_len
$17 = 972
(gdb) print rrd_file->file_len - dontneed_start
$18 = 4294960076
(gdb) bt
#0  rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:368
#1  0xf7ef6134 in rrd_create_fn (file_name=0xfff0f9ad "zero.rrd",
    rrd=0xfff0f3c4) at rrd_create.c:827
#2  0xf7ef4fd0 in rrd_create_r (filename=0xfff0f9ad "zero.rrd",
    pdp_step=300, last_up=1220819559, argc=4, argv=0xfff0f8a0)
    at rrd_create.c:555
#3  0xf7ef356c in rrd_create (argc=6, argv=0xfff0f898) at rrd_create.c:100
#4  0x000133ec in HandleInputLine (argc=7, argv=0xfff0f894, out=0xf7baaaf8)
    at rrd_tool.c:622
#5  0x00012b54 in main (argc=7, argv=0xfff0f894) at rrd_tool.c:494
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372
372     posix_fadvise(rrd_file->fd, dontneed_start,
Disabling display 5 to avoid infinite recursion.
5: i = Cannot access memory at address 0xffe8

I guess that the problem here is passing negative second argument to
madvise() which makes it very unhappy and smashes the stack, but I did
not grok the code yet to understand what's going on here.

Cheers.
-- 
Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC