[ https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874998#comment-13874998 ]
James Peach commented on TS-1648:
---------------------------------

[~briang] what is needed to test this fix?

> Segmentation fault in dir_clear_range()
> ---------------------------------------
>
>                 Key: TS-1648
>                 URL: https://issues.apache.org/jira/browse/TS-1648
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.3.0, 3.2.0
>        Environment: reverse proxy
>           Reporter: Tomasz Kuzemko
>           Assignee: John Plevyak
>             Labels: A
>            Fix For: 3.3.4
>
>         Attachments: 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, cachedir_int64-jp-1.patch
>
>
> I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 2x 10TB raw disks. I do not use cache compression. After a few days of running (this is a dev machine, not handling any traffic) ATS begins to crash with a segfault shortly after start:
> {code}
> [Jan 11 16:11:00.690] Server {0x7ffff2bb8700} DEBUG: (rusage) took rusage snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff20ad700 (LWP 17292)]
> 0x0000000000696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at CacheDir.cc:382
> 382     CacheDir.cc: No such file or directory.
>         in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377     in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x0000000000696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at CacheDir.cc:382
> #1  0x000000000068aba2 in Vol::handle_recover_from_data (this=0x16057d0, event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x00000000004e8e1c in Continuation::handleEvent (this=0x16057d0, event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x0000000000692385 in AIOCallbackInternal::io_complete (this=0x16058a0, event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x00000000004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x0000000000700fec in EThread::process_event (this=0x7ffff36c4010, e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x00000000007011ff in EThread::execute (this=0x7ffff36c4010) at UnixEThread.cc:191
> #7  0x00000000006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x00007ffff797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x00007ffff55c6b6d in clone () from /lib/libc.so.6
> #10 0x0000000000000000 in ?? ()
> {code}
> This is fixed by running "traffic_server -Kk" to clear the cache, but after a few days the issue reappears.
> I will keep the current faulty setup as-is in case you need me to provide more data. I tried to make a core dump, but it took a couple of GB even after gzip (I can however provide it on request).
> *Edit*
> OS is Debian GNU/Linux 6.0.6 with a custom-built kernel 3.2.13-grsec-xxxx-grs-ipv6-64

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)