Bug#494221: Bug is actually in librrd4, backtrace included

2008-09-18 Thread Sebastian Harl
Hi,

(This is a follow-up for Debian bug #498183 - see [1] for details.
Please keep [EMAIL PROTECTED] Cc'ed when replying.)

[1] http://bugs.debian.org/498183

On Sun, Sep 07, 2008 at 09:51:16PM +0100, Jurij Smakov wrote:
 I don't have any problem reproducing it on sparc, so reopening. The 
 segfault occurs in rrd_open() function in librrd4, as following gdb 
 session illustrates (rebuilt rrd with debugging symbols to get it):
[...]
 (gdb) list
 363 rra_start +=
 364 rrd-rra_def[i].row_cnt * rrd-stat_head-ds_cnt *
 365 sizeof(rrd_value_t);
 366 }
 367 #ifdef USE_MADVISE
 368 madvise(rrd_file-file_start + dontneed_start,
 369 rrd_file-file_len - dontneed_start, MADV_DONTNEED);
 370 #endif
 371 #ifdef HAVE_POSIX_FADVISE
 372 posix_fadvise(rrd_file-fd, dontneed_start,
[...]
 (gdb) print dontneed_start
 $16 = 8192
 (gdb) print rrd_file-file_len
 $17 = 972
 (gdb) print rrd_file-file_len - dontneed_start
 $18 = 4294960076
[...]
 (gdb) n
 
 Program received signal SIGSEGV, Segmentation fault.
 rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372
 372 posix_fadvise(rrd_file-fd, dontneed_start,
 Disabling display 5 to avoid infinite recursion.
 5: i = Cannot access memory at address 0xffe8

(See [2] for the full session dump.)

Jurij, thanks a lot for the detailed information - that was very
helpful.

[2] http://bugs.debian.org/498183#25

 I guess that the problem here is passing negative second argument to 
 madvise() which makes it very unhappy and smashes the stack, but I did 
 not grok the code yet to understand what's going on here.

Yes, that seems to be the problem. Roughly, what's going on here is
that we're stepping through all RRAs of the RRD file and mark cold
blocks as unused (using madvise() und posix_fadvise()). dontneed_start
is used as an offset into the file in that search.

After we've stepped through all RRAs, the last call to madvise() (which
will then trigger the segfault) marks the remainder of the file as
unused as well. Now, we might have already passed the end of the file as
dontneed_start is increased by multiples of the page size only. E.g.
this happens if the page size is larger than the file size as in this
case.

For some reasons that I don't know, amd64 and i386 don't seem to care
about that. I was not able to reproduce the problem but I could verify
the same situation in the debugger.

The attached patch should solve this issue. I've simply added a check if
we're already passed the end of the file. Since I do not have access to
a sparc box, I'd like to get some feedback if that really solves the
issue. Also, I'd like Tobi (or anyone else involved in that specific
code) to comment on that just to make sure that I did not miss some
important fact. Thanks in advance!

Cheers,
Sebastian

-- 
Sebastian tokkee Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin

diff --git a/program/src/rrd_open.c b/program/src/rrd_open.c
index 51b621f..af91c7b 100644
--- a/program/src/rrd_open.c
+++ b/program/src/rrd_open.c
@@ -363,14 +363,19 @@ void rrd_dontneed(
 rrd-rra_def[i].row_cnt * rrd-stat_head-ds_cnt *
 sizeof(rrd_value_t);
 }
+
+if (dontneed_start  rrd_file-file_len) {
 #ifdef USE_MADVISE
-madvise(rrd_file-file_start + dontneed_start,
-rrd_file-file_len - dontneed_start, MADV_DONTNEED);
+	madvise(rrd_file-file_start + dontneed_start,
+		rrd_file-file_len - dontneed_start, MADV_DONTNEED);
 #endif
 #ifdef HAVE_POSIX_FADVISE
-posix_fadvise(rrd_file-fd, dontneed_start,
-  rrd_file-file_len - dontneed_start, POSIX_FADV_DONTNEED);
+	posix_fadvise(rrd_file-fd, dontneed_start,
+			  rrd_file-file_len - dontneed_start,
+			  POSIX_FADV_DONTNEED);
 #endif
+}
+
 #if defined DEBUG  DEBUG  1
 mincore_print(rrd_file, after);
 #endif


signature.asc
Description: Digital signature


Bug#494221: Bug is actually in librrd4, backtrace included

2008-09-07 Thread Jurij Smakov
reopen 494221
clone 494221 -1
reassign -1 librrd4
retitle -1 librrd4: segfault in rrd_open() on sparc
block 494221 by -1
thanks

I don't have any problem reproducing it on sparc, so reopening. The 
segfault occurs in rrd_open() function in librrd4, as following gdb 
session illustrates (rebuilt rrd with debugging symbols to get it):

Starting program: /usr/bin/rrdtool create zero.rrd DS:mon_25:GAUGE:600:U:U 
RRA:AVERAGE:0:1:1 RRA:LAST:0:1:1 RRA:MAX:0:1:1
[Thread debugging using libthread_db enabled]
[New Thread 0xf751e700 (LWP 29891)]
[Switching to Thread 0xf751e700 (LWP 29891)]

Breakpoint 1, rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:329
329 ssize_t   _page_size = sysconf(_SC_PAGESIZE);
(gdb) n
336 rra_start = rrd_file-header_len;
(gdb) 
337 dontneed_start = PAGE_START(rra_start) + _page_size;
(gdb) 
338 for (i = 0; i  rrd-stat_head-rra_cnt; ++i) {
(gdb) 
339 active_block =
(gdb) display i
5: i = 0
(gdb) n
343 if (active_block  dontneed_start) {
5: i = 0
(gdb) 
355 dontneed_start = active_block;
5: i = 0
(gdb) 
358 if (rrd-stat_head-pdp_step * rrd-rra_def[i].pdp_cnt -
5: i = 0
(gdb) 
361 dontneed_start += _page_size;
5: i = 0
(gdb) 
363 rra_start +=
5: i = 0
(gdb) 
338 for (i = 0; i  rrd-stat_head-rra_cnt; ++i) {
5: i = 0
(gdb) 
339 active_block =
5: i = 1
(gdb) 
343 if (active_block  dontneed_start) {
5: i = 1
(gdb) 
355 dontneed_start = active_block;
5: i = 1
(gdb) 
358 if (rrd-stat_head-pdp_step * rrd-rra_def[i].pdp_cnt -
5: i = 1
(gdb) 
361 dontneed_start += _page_size;
5: i = 1
(gdb) 
363 rra_start +=
5: i = 1
(gdb) 
338 for (i = 0; i  rrd-stat_head-rra_cnt; ++i) {
5: i = 1
(gdb) 
339 active_block =
5: i = 2
(gdb) 
343 if (active_block  dontneed_start) {
5: i = 2
(gdb) 
355 dontneed_start = active_block;
5: i = 2
(gdb) 
358 if (rrd-stat_head-pdp_step * rrd-rra_def[i].pdp_cnt -
5: i = 2
(gdb) 
361 dontneed_start += _page_size;
5: i = 2
(gdb) 
363 rra_start +=
5: i = 2
(gdb) 
338 for (i = 0; i  rrd-stat_head-rra_cnt; ++i) {
5: i = 2
(gdb) 
368 madvise(rrd_file-file_start + dontneed_start,
5: i = 3
(gdb) list
363 rra_start +=
364 rrd-rra_def[i].row_cnt * rrd-stat_head-ds_cnt *
365 sizeof(rrd_value_t);
366 }
367 #ifdef USE_MADVISE
368 madvise(rrd_file-file_start + dontneed_start,
369 rrd_file-file_len - dontneed_start, MADV_DONTNEED);
370 #endif
371 #ifdef HAVE_POSIX_FADVISE
372 posix_fadvise(rrd_file-fd, dontneed_start,
(gdb) print rrd_file
$14 = (rrd_file_t *) 0x28680
(gdb) print rrd_file-file_start
$15 = 0xf7f48000 RRD
(gdb) print dontneed_start
$16 = 8192
(gdb) print rrd_file-file_len
$17 = 972
(gdb) print rrd_file-file_len - dontneed_start
$18 = 4294960076
(gdb) bt
#0  rrd_dontneed (rrd_file=0x28680, rrd=0xfff0f2c8) at rrd_open.c:368
#1  0xf7ef6134 in rrd_create_fn (file_name=0xfff0f9ad zero.rrd, 
rrd=0xfff0f3c4) at rrd_create.c:827
#2  0xf7ef4fd0 in rrd_create_r (filename=0xfff0f9ad zero.rrd, pdp_step=300, 
last_up=1220819559, argc=4, argv=0xfff0f8a0) at rrd_create.c:555
#3  0xf7ef356c in rrd_create (argc=6, argv=0xfff0f898) at rrd_create.c:100
#4  0x000133ec in HandleInputLine (argc=7, argv=0xfff0f894, out=0xf7baaaf8) at 
rrd_tool.c:622
#5  0x00012b54 in main (argc=7, argv=0xfff0f894) at rrd_tool.c:494
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
rrd_dontneed (rrd_file=Cannot access memory at address 0x44) at rrd_open.c:372
372 posix_fadvise(rrd_file-fd, dontneed_start,
Disabling display 5 to avoid infinite recursion.
5: i = Cannot access memory at address 0xffe8

I guess that the problem here is passing negative second argument to 
madvise() which makes it very unhappy and smashes the stack, but I did 
not grok the code yet to understand what's going on here.

Cheers.
-- 
Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]