[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-01-11 Thread Tomasz Kuzemko (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Kuzemko updated TS-1648:
---

Affects Version/s: 3.3.0
   3.2.0

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
>Reporter: Tomasz Kuzemko
>
> I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
> 2x 10TB raw disks. I do not use cache compression. After a few days of 
> running (this is a dev machine - not handling any traffic) ATS begins to 
> crash with a segfault shortly after start:
> [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
> snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x720ad700 (LWP 17292)]
> 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
> at CacheDir.cc:382
> 382   CacheDir.cc: No such file or directory.
>   in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377   in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
> vol=0x16057d0) at CacheDir.cc:382
> #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
> event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
> event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
> event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
> data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x00700fec in EThread::process_event (this=0x736c4010, 
> e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x007011ff in EThread::execute (this=0x736c4010) at 
> UnixEThread.cc:191
> #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x755c6b6d in clone () from /lib/libc.so.6
> #10 0x in ?? ()
> Running "traffic_server -Kk" to clear the cache makes the crash go away, but 
> after a few days the issue reappears.
> I will keep the current faulty setup as-is in case you need me to provide 
> more data. I tried to make a core dump, but it is a couple of GB even after 
> gzip (I can provide it on request).
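The gdb numbers above are consistent with a 32-bit overflow in the byte offset of entry i. The following is a worked check under stated assumptions, not ATS code. It assumes that a directory entry is 10 bytes (five 16-bit words, as the {w = {...}} output suggests) and that dir_index() turns i into a byte offset using 32-bit int arithmetic. Under those assumptions, i = 214748365 is the first index for which i * 10 exceeds INT32_MAX (2147483647). The base address recovered from entry i-1 is 0x7ffa17a04000. Wrapping the offset of entry i then reproduces the unreadable addresses gdb printed for i, i+1 and i+2, each exactly 2^32 bytes below where the entry should be. The helper names are hypothetical.

```cpp
#include <cstdint>

// Values copied from the gdb session above.
constexpr int64_t kIndex = 214748365;              // (gdb) p i
constexpr uint64_t kAddrPrev = 0x7ffa97a03ff8ULL;  // dir_index(vol, i-1), readable
constexpr uint64_t kAddrSeen = 0x7ff997a04002ULL;  // dir_index(vol, i), unreadable
constexpr int64_t kEntries = 1246953472;           // vol->buckets * DIR_DEPTH * vol->segments

// Assumption: one directory entry is five 16-bit words, i.e. 10 bytes.
constexpr int64_t kDirSize = 5 * sizeof(uint16_t);

// Directory base address recovered from entry i-1, whose byte offset
// (214748364 * 10 = 2147483640) still fits in a signed 32-bit int.
uint64_t base_addr() {
  return kAddrPrev - static_cast<uint64_t>((kIndex - 1) * kDirSize);
}

// Byte offset of entry i as a 32-bit computation yields it. Signed overflow
// is undefined behaviour, so the two's complement wraparound is modelled
// with unsigned arithmetic followed by a narrowing conversion.
int64_t offset_32bit(int64_t i) {
  return static_cast<int32_t>(static_cast<uint32_t>(i) *
                              static_cast<uint32_t>(kDirSize));
}

// Address of entry i with a wrapped 32-bit offset (what gdb appears to show).
uint64_t addr_32bit(int64_t i) {
  return base_addr() + static_cast<uint64_t>(offset_32bit(i));
}

// Address of entry i with a 64-bit offset (where the entry should be).
uint64_t addr_64bit(int64_t i) {
  return base_addr() + static_cast<uint64_t>(i * kDirSize);
}
```

Under the same assumption, the full directory of 1246953472 entries spans 12469534720 bytes (about 12.5 GB). Most of its entries therefore lie beyond the 2 GiB mark that a signed 32-bit byte offset can reach.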

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-01-11 Thread Tomasz Kuzemko (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Kuzemko updated TS-1648:
---

Environment: reverse proxy

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-01-14 Thread Tomasz Kuzemko (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Kuzemko updated TS-1648:
---

Description: 
I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 2x 
10TB raw disks. I do not use cache compression. After a few days of running 
(this is a dev machine - not handling any traffic) ATS begins to crash with a 
segfault shortly after start:

[Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage snap 
1357917060690487000

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x720ad700 (LWP 17292)]
0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at 
CacheDir.cc:382
382 CacheDir.cc: No such file or directory.
in CacheDir.cc
(gdb) p i
$1 = 214748365
(gdb) l
377 in CacheDir.cc
(gdb) p dir_index(vol, i)
$2 = (Dir *) 0x7ff997a04002
(gdb) p dir_index(vol, i-1)
$3 = (Dir *) 0x7ffa97a03ff8
(gdb) p *dir_index(vol, i-1)
$4 = {w = {0, 0, 0, 0, 0}}
(gdb) p *dir_index(vol, i-2)
$5 = {w = {0, 0, 52431, 52423, 0}}
(gdb) p *dir_index(vol, i)
Cannot access memory at address 0x7ff997a04002
(gdb) p *dir_index(vol, i+2)
Cannot access memory at address 0x7ff997a04016
(gdb) p *dir_index(vol, i+1)
Cannot access memory at address 0x7ff997a0400c
(gdb) p vol->buckets * DIR_DEPTH * vol->segments
$6 = 1246953472
(gdb) bt
#0  0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
at CacheDir.cc:382
#1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
event=3900, data=0x16058a0) at Cache.cc:1384
#2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
#3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
#4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
#5  0x00700fec in EThread::process_event (this=0x736c4010, 
e=0x135afc0, calling_code=1) at UnixEThread.cc:142
#6  0x007011ff in EThread::execute (this=0x736c4010) at 
UnixEThread.cc:191
#7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
#8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
#9  0x755c6b6d in clone () from /lib/libc.so.6
#10 0x in ?? ()


Running "traffic_server -Kk" to clear the cache makes the crash go away, but after 
a few days the issue reappears.

I will keep the current faulty setup as-is in case you need me to provide more 
data. I tried to make a core dump, but it is a couple of GB even after gzip (I 
can provide it on request).


*Edit*
OS is Debian GNU/Linux 6.0.6 with a custom-built kernel 
3.2.13-grsec--grs-ipv6-64


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-01-14 Thread Tomasz Kuzemko (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Kuzemko updated TS-1648:
---

Attachment: 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch

Changing the index variable's type from 'int' to 'long' seems to fix this.
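The following is a minimal sketch of why widening the index helps. It assumes 10-byte entries (five 16-bit words, per the gdb output in the report) and that the overflowing expression is the index multiplied by the entry size. The helper names are hypothetical, not functions from the patch. Signed overflow is undefined behaviour in C++, so the sketch models the wraparound explicitly instead of relying on it.

```cpp
#include <cstdint>

// Assumption: a cache directory entry is 10 bytes (five 16-bit words).
constexpr int64_t kDirSize = 10;

// Hypothetical helper: byte offset of entry i when the index is a 32-bit int.
// Signed overflow is undefined behaviour in C++, so the wraparound a typical
// two's complement build produces is modelled explicitly.
int64_t offset_with_int(int32_t i) {
  return static_cast<int32_t>(static_cast<uint32_t>(i) *
                              static_cast<uint32_t>(kDirSize));
}

// Hypothetical helper: the same offset with a 64-bit index, in the spirit
// of the 'int' -> 'long' change.
int64_t offset_with_int64(int64_t i) {
  return i * kDirSize;
}

// True if every entry of a directory with n_entries entries can be reached
// with a byte offset that fits in a signed 32-bit int.
bool fits_in_int32(int64_t n_entries) {
  return n_entries == 0 || (n_entries - 1) * kDirSize <= INT32_MAX;
}
```

On LP64 platforms such as x86-64 Linux, 'long' is 64 bits, so changing 'int' to 'long' is enough there. On LLP64 targets such as 64-bit Windows, 'long' remains 32 bits, which is one reason a fixed-width int64_t can be the more portable choice.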

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: weijin
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch
>
>


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-03-15 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-1648:
--

Fix Version/s: 3.3.2

weijin: What should we do with this bug? I'm marking it for v3.3.2 for now.

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: weijin
> Fix For: 3.3.2
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch
>
>


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-03 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-1648:
--

Labels: A  (was: )

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: weijin
>  Labels: A
> Fix For: 3.3.3
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch
>
>


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1648:
-

Attachment: cachedir_int64-jp-1.patch
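The thread does not show the contents of cachedir_int64-jp-1.patch. Its name suggests that indices in the cache directory code are widened to 64 bits. The following is a hypothetical, heavily simplified model of a range-clearing loop, not the actual dir_clear_range(). It assumes the loop walks every directory entry and tests each one against [start, end). That assumption is consistent with the backtrace, where i had reached 214748365 even though start and end were only 640 and 17024. Dir, Vol, dir_index_64() and clear_range_sketch() are illustrative names, and treating w[0] as the entry's offset is invented for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified directory entry: five 16-bit words like the
// {w = {...}} values gdb printed. Treating w[0] as the entry's data offset
// is an illustration only, not the real ATS field encoding.
struct Dir {
  uint16_t w[5];
};

// Hypothetical volume: just a flat array of directory entries.
struct Vol {
  std::vector<Dir> dir;
};

// dir_index-style lookup: byte offset from the start of the directory,
// computed entirely in 64-bit arithmetic so i * sizeof(Dir) cannot wrap.
Dir *dir_index_64(Vol *vol, int64_t i) {
  char *base = reinterpret_cast<char *>(vol->dir.data());
  return reinterpret_cast<Dir *>(base + i * static_cast<int64_t>(sizeof(Dir)));
}

// Walks every entry (not just [start, end)) and clears those whose
// illustrative offset lies in [start, end). Returns how many were cleared.
int64_t clear_range_sketch(int64_t start, int64_t end, Vol *vol) {
  const int64_t n = static_cast<int64_t>(vol->dir.size());
  int64_t cleared = 0;
  for (int64_t i = 0; i < n; i++) {  // 64-bit loop index
    Dir *e = dir_index_64(vol, i);
    if (e->w[0] >= start && e->w[0] < end) {
      e->w[0] = 0;  // mark the entry as empty (illustrative)
      cleared++;
    }
  }
  return cleared;
}
```

The entry count reported here (1246953472) still fits in a signed 32-bit int. The overflow comes from the byte offset (index times entry size), not from the loop bound. That is why the sketch computes the offset in 64 bits as well as widening the index.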

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: John Plevyak
>  Labels: A
> Fix For: 3.3.3
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, 
> cachedir_int64-jp-1.patch
>
>


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated TS-1648:


Description: 
I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 2x 
10TB raw disks. I do not use cache compression. After a few days of running 
(this is a dev machine - not handling any traffic) ATS begins to crash with a 
segfault shortly after start:

{code}
[Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage snap 
1357917060690487000

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x720ad700 (LWP 17292)]
0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at 
CacheDir.cc:382
382 CacheDir.cc: No such file or directory.
in CacheDir.cc
(gdb) p i
$1 = 214748365
(gdb) l
377 in CacheDir.cc
(gdb) p dir_index(vol, i)
$2 = (Dir *) 0x7ff997a04002
(gdb) p dir_index(vol, i-1)
$3 = (Dir *) 0x7ffa97a03ff8
(gdb) p *dir_index(vol, i-1)
$4 = {w = {0, 0, 0, 0, 0}}
(gdb) p *dir_index(vol, i-2)
$5 = {w = {0, 0, 52431, 52423, 0}}
(gdb) p *dir_index(vol, i)
Cannot access memory at address 0x7ff997a04002
(gdb) p *dir_index(vol, i+2)
Cannot access memory at address 0x7ff997a04016
(gdb) p *dir_index(vol, i+1)
Cannot access memory at address 0x7ff997a0400c
(gdb) p vol->buckets * DIR_DEPTH * vol->segments
$6 = 1246953472
(gdb) bt
#0  0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
at CacheDir.cc:382
#1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
event=3900, data=0x16058a0) at Cache.cc:1384
#2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
#3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
#4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
#5  0x00700fec in EThread::process_event (this=0x736c4010, 
e=0x135afc0, calling_code=1) at UnixEThread.cc:142
#6  0x007011ff in EThread::execute (this=0x736c4010) at 
UnixEThread.cc:191
#7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
#8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
#9  0x755c6b6d in clone () from /lib/libc.so.6
#10 0x in ?? ()
{code}

Running "traffic_server -Kk" to clear the cache makes the crash go away, but after 
a few days the issue reappears.

I will keep the current faulty setup as-is in case you need me to provide more 
data. I tried to make a core dump, but it is a couple of GB even after gzip (I 
can provide it on request).


*Edit*
OS is Debian GNU/Linux 6.0.6 with a custom-built kernel 
3.2.13-grsec--grs-ipv6-64


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2014-01-16 Thread Brian Geffon (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon updated TS-1648:
-

Fix Version/s: 4.1.2

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: John Plevyak
>  Labels: A
> Fix For: 3.3.4, 4.1.2
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, 
> cachedir_int64-jp-1.patch
>
>
> I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
> 2x 10TB raw disks. I do not use cache compression. After a few days of 
> running (this is a dev machine - not handling any traffic) ATS begins to 
> crash with a segfault shortly after start:
> {code}
> [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
> snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x720ad700 (LWP 17292)]
> 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
> at CacheDir.cc:382
> 382   CacheDir.cc: No such file or directory.
>   in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377   in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
> vol=0x16057d0) at CacheDir.cc:382
> #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
> event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
> event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
> event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
> data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x00700fec in EThread::process_event (this=0x736c4010, 
> e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x007011ff in EThread::execute (this=0x736c4010) at 
> UnixEThread.cc:191
> #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x755c6b6d in clone () from /lib/libc.so.6
> #10 0x in ?? ()
> {code}
> Running "traffic_server -Kk" to clear the cache works around this, but after 
> a few days the issue reappears.
> I will keep the current faulty setup as-is in case you need me to provide 
> more data. I tried to make a core dump, but it took a couple of GB even 
> after gzip (I can provide it on request).
> *Edit*
> The OS is Debian GNU/Linux 6.0.6 with a custom-built kernel 
> 3.2.13-grsec--grs-ipv6-64



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (TS-1648) Segmentation fault in dir_clear_range()

2014-01-16 Thread Brian Geffon (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon updated TS-1648:
-

Fix Version/s: (was: 4.1.2)

> Segmentation fault in dir_clear_range()
> ---
>
> Key: TS-1648
> URL: https://issues.apache.org/jira/browse/TS-1648
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.3.0, 3.2.0
> Environment: reverse proxy
>Reporter: Tomasz Kuzemko
>Assignee: John Plevyak
>  Labels: A
> Fix For: 3.3.4
>
> Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, 
> cachedir_int64-jp-1.patch
>
>
> I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
> 2x 10TB raw disks. I do not use cache compression. After a few days of 
> running (this is a dev machine - not handling any traffic) ATS begins to 
> crash with a segfault shortly after start:
> {code}
> [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
> snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x720ad700 (LWP 17292)]
> 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
> at CacheDir.cc:382
> 382   CacheDir.cc: No such file or directory.
>   in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377   in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
> vol=0x16057d0) at CacheDir.cc:382
> #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
> event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
> event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
> event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
> data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x00700fec in EThread::process_event (this=0x736c4010, 
> e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x007011ff in EThread::execute (this=0x736c4010) at 
> UnixEThread.cc:191
> #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x755c6b6d in clone () from /lib/libc.so.6
> #10 0x in ?? ()
> {code}
> This is fixed by running "traffic_server -Kk" to clear the cache. But after a 
> few days the issue reappears.
> I will keep the current faulty setup as-is in case you need me to provide 
> more data. I tried to make a core dump but it took a couple of GB even after 
> gzip (I can however provide it on request).
> *Edit*
> OS is Debian GNU/Linux 6.0.6 with custom built kernel 
> 3.2.13-grsec--grs-ipv6-64


