Bug#494221: Bug is actually in librrd4, backtrace included
Hi, (This is a follow-up for Debian bug #498183 - see [1] for details. Please keep [EMAIL PROTECTED] Cc'ed when replying.) [1] http://bugs.debian.org/498183 On Sun, Sep 07, 2008 at 09:51:16PM +0100, Jurij Smakov wrote: > I don't have any problem reproducing it on sparc, so reopening. The > segfault occurs in rrd_open() function in librrd4, as following gdb > session illustrates (rebuilt rrd with debugging symbols to get it): [...] > (gdb) list > 363 rra_start += > 364 rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt * > 365 sizeof(rrd_value_t); > 366 } > 367 #ifdef USE_MADVISE > 368 madvise(rrd_file->file_start + dontneed_start, > 369 rrd_file->file_len - dontneed_start, MADV_DONTNEED); > 370 #endif > 371 #ifdef HAVE_POSIX_FADVISE > 372 posix_fadvise(rrd_file->fd, dontneed_start, [...] > (gdb) print dontneed_start > $16 = 8192 > (gdb) print rrd_file->file_len > $17 = 972 > (gdb) print rrd_file->file_len - dontneed_start > $18 = 4294960076 [...] > (gdb) n > > Program received signal SIGSEGV, Segmentation fault. > rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372 > 372 posix_fadvise(rrd_file->fd, dontneed_start, > Disabling display 5 to avoid infinite recursion. > 5: i = Cannot access memory at address 0xffe8 (See [2] for the full session dump.) Jurij, thanks a lot for the detailed information - that was very helpful. [2] http://bugs.debian.org/498183#25 > I guess that the problem here is passing negative second argument to > madvise() which makes it very unhappy and smashes the stack, but I did > not grok the code yet to understand what's going on here. Yes, that seems to be the problem. Roughly, what's going on here is that we're stepping through all RRAs of the RRD file and mark "cold" blocks as unused (using madvise() und posix_fadvise()). dontneed_start is used as an offset into the file in that search. After we've stepped through all RRAs, the last call to madvise() (which will then trigger the segfault) marks the remainder of the file as unused as well. Now, we might have already passed the end of the file as dontneed_start is increased by multiples of the page size only. E.g. this happens if the page size is larger than the file size as in this case. For some reasons that I don't know, amd64 and i386 don't seem to care about that. I was not able to reproduce the problem but I could verify the same situation in the debugger. The attached patch should solve this issue. I've simply added a check if we're already passed the end of the file. Since I do not have access to a sparc box, I'd like to get some feedback if that really solves the issue. Also, I'd like Tobi (or anyone else involved in that specific code) to comment on that just to make sure that I did not miss some important fact. Thanks in advance! Cheers, Sebastian -- Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/ Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin diff --git a/program/src/rrd_open.c b/program/src/rrd_open.c index 51b621f..af91c7b 100644 --- a/program/src/rrd_open.c +++ b/program/src/rrd_open.c @@ -363,14 +363,19 @@ void rrd_dontneed( rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt * sizeof(rrd_value_t); } + +if (dontneed_start < rrd_file->file_len) { #ifdef USE_MADVISE -madvise(rrd_file->file_start + dontneed_start, -rrd_file->file_len - dontneed_start, MADV_DONTNEED); + madvise(rrd_file->file_start + dontneed_start, + rrd_file->file_len - dontneed_start, MADV_DONTNEED); #endif #ifdef HAVE_POSIX_FADVISE -posix_fadvise(rrd_file->fd, dontneed_start, - rrd_file->file_len - dontneed_start, POSIX_FADV_DONTNEED); + posix_fadvise(rrd_file->fd, dontneed_start, + rrd_file->file_len - dontneed_start, + POSIX_FADV_DONTNEED); #endif +} + #if defined DEBUG && DEBUG > 1 mincore_print(rrd_file, "after"); #endif signature.asc Description: Digital signature
Bug#494221: Bug is actually in librrd4, backtrace included
reopen 494221 clone 494221 -1 reassign -1 librrd4 retitle -1 librrd4: segfault in rrd_open() on sparc block 494221 by -1 thanks I don't have any problem reproducing it on sparc, so reopening. The segfault occurs in rrd_open() function in librrd4, as following gdb session illustrates (rebuilt rrd with debugging symbols to get it): Starting program: /usr/bin/rrdtool create zero.rrd DS:mon_25:GAUGE:600:U:U RRA:AVERAGE:0:1:1 RRA:LAST:0:1:1 RRA:MAX:0:1:1 [Thread debugging using libthread_db enabled] [New Thread 0xf751e700 (LWP 29891)] [Switching to Thread 0xf751e700 (LWP 29891)] Breakpoint 1, rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:329 329 ssize_t _page_size = sysconf(_SC_PAGESIZE); (gdb) n 336 rra_start = rrd_file->header_len; (gdb) 337 dontneed_start = PAGE_START(rra_start) + _page_size; (gdb) 338 for (i = 0; i < rrd->stat_head->rra_cnt; ++i) { (gdb) 339 active_block = (gdb) display i 5: i = 0 (gdb) n 343 if (active_block > dontneed_start) { 5: i = 0 (gdb) 355 dontneed_start = active_block; 5: i = 0 (gdb) 358 if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt - 5: i = 0 (gdb) 361 dontneed_start += _page_size; 5: i = 0 (gdb) 363 rra_start += 5: i = 0 (gdb) 338 for (i = 0; i < rrd->stat_head->rra_cnt; ++i) { 5: i = 0 (gdb) 339 active_block = 5: i = 1 (gdb) 343 if (active_block > dontneed_start) { 5: i = 1 (gdb) 355 dontneed_start = active_block; 5: i = 1 (gdb) 358 if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt - 5: i = 1 (gdb) 361 dontneed_start += _page_size; 5: i = 1 (gdb) 363 rra_start += 5: i = 1 (gdb) 338 for (i = 0; i < rrd->stat_head->rra_cnt; ++i) { 5: i = 1 (gdb) 339 active_block = 5: i = 2 (gdb) 343 if (active_block > dontneed_start) { 5: i = 2 (gdb) 355 dontneed_start = active_block; 5: i = 2 (gdb) 358 if (rrd->stat_head->pdp_step * rrd->rra_def[i].pdp_cnt - 5: i = 2 (gdb) 361 dontneed_start += _page_size; 5: i = 2 (gdb) 363 rra_start += 5: i = 2 (gdb) 338 for (i = 0; i < rrd->stat_head->rra_cnt; ++i) { 5: i = 2 (gdb) 368 madvise(rrd_file->file_start + dontneed_start, 5: i = 3 (gdb) list 363 rra_start += 364 rrd->rra_def[i].row_cnt * rrd->stat_head->ds_cnt * 365 sizeof(rrd_value_t); 366 } 367 #ifdef USE_MADVISE 368 madvise(rrd_file->file_start + dontneed_start, 369 rrd_file->file_len - dontneed_start, MADV_DONTNEED); 370 #endif 371 #ifdef HAVE_POSIX_FADVISE 372 posix_fadvise(rrd_file->fd, dontneed_start, (gdb) print rrd_file $14 = (rrd_file_t *) 0x28680 (gdb) print rrd_file->file_start $15 = 0xf7f48000 "RRD" (gdb) print dontneed_start $16 = 8192 (gdb) print rrd_file->file_len $17 = 972 (gdb) print rrd_file->file_len - dontneed_start $18 = 4294960076 (gdb) bt #0 rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:368 #1 0xf7ef6134 in rrd_create_fn (file_name=0xfff0f9ad "zero.rrd", rrd=0xfff0f3c4) at rrd_create.c:827 #2 0xf7ef4fd0 in rrd_create_r (filename=0xfff0f9ad "zero.rrd", pdp_step=300, last_up=1220819559, argc=4, argv=0xfff0f8a0) at rrd_create.c:555 #3 0xf7ef356c in rrd_create (argc=6, argv=0xfff0f898) at rrd_create.c:100 #4 0x000133ec in HandleInputLine (argc=7, argv=0xfff0f894, out=0xf7baaaf8) at rrd_tool.c:622 #5 0x00012b54 in main (argc=7, argv=0xfff0f894) at rrd_tool.c:494 (gdb) n Program received signal SIGSEGV, Segmentation fault. rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372 372 posix_fadvise(rrd_file->fd, dontneed_start, Disabling display 5 to avoid infinite recursion. 5: i = Cannot access memory at address 0xffe8 I guess that the problem here is passing negative second argument to madvise() which makes it very unhappy and smashes the stack, but I did not grok the code yet to understand what's going on here. Cheers. -- Jurij Smakov [EMAIL PROTECTED] Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]