[jira] [Assigned] (TS-1039) PATCH: use pcre-config to find libpcre

2011-12-09 Thread Assigned

 [ 
https://issues.apache.org/jira/browse/TS-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Galić reassigned TS-1039:
--

Assignee: Igor Galić

 PATCH: use pcre-config to find libpcre
 --

 Key: TS-1039
 URL: https://issues.apache.org/jira/browse/TS-1039
 Project: Traffic Server
  Issue Type: Improvement
  Components: Build
Reporter: James Peach
Assignee: Igor Galić
Priority: Minor
 Attachments: 0001-Use-pcre-config-to-find-libpcre.patch


 This patch uses pcre-config to determine the compilation options needed to 
 use libpcre. This is an improvement over the existing configure arguments 
 since it will work without user intervention in more circumstances. The 
 existing configuration option still works as expected for compatibility 
 reasons.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1039) PATCH: use pcre-config to find libpcre

2011-12-09 Thread Commented

[ 
https://issues.apache.org/jira/browse/TS-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166036#comment-13166036
 ] 

Igor Galić commented on TS-1039:


Thank you very much James.

I now *really* think I should write a short wiki/doc on how to submit patches. 
Or just ask Linus to make git's default {{diff}} output *usable*, you know.. 
for {{patch}}.

By putting this in your {{~/.gitconfig}}, they can actually become usable:
{noformat}
[diff]
noprefix = true
{noformat}





[jira] [Assigned] (TS-1042) PATCH: correct debug message in FetchSM

2011-12-09 Thread Assigned

 [ 
https://issues.apache.org/jira/browse/TS-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Galić reassigned TS-1042:
--

Assignee: Igor Galić

 PATCH: correct debug message in FetchSM
 ---

 Key: TS-1042
 URL: https://issues.apache.org/jira/browse/TS-1042
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: James Peach
Assignee: Igor Galić
Priority: Minor
 Attachments: 0004-Fix-FetchSM-debugging-message.patch


 In the FetchSM module, there is a debug message that can walk off the end of 
 the buffer. This patch corrects that by limiting the printed string to the 
 known length.





[jira] [Commented] (TS-1039) PATCH: use pcre-config to find libpcre

2011-12-09 Thread Commented

[ 
https://issues.apache.org/jira/browse/TS-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166114#comment-13166114
 ] 

Igor Galić commented on TS-1039:


Tested both scenarios, worked out fine -- thanks. Patch applied!

 PATCH: use pcre-config to find libpcre
 --

 Key: TS-1039
 URL: https://issues.apache.org/jira/browse/TS-1039
 Project: Traffic Server
  Issue Type: Improvement
  Components: Build
Reporter: James Peach
Assignee: Igor Galić
Priority: Minor
 Fix For: 3.1.2

 Attachments: 0001-Use-pcre-config-to-find-libpcre.patch


 This patch uses pcre-config to determine the compilation options needed to 
 use libpcre. This is an improvement over the existing configure arguments 
 since it will work without user intervention in more circumstances. The 
 existing configuration option still works as expected for compatibility 
 reasons.





[jira] [Commented] (TS-1042) PATCH: correct debug message in FetchSM

2011-12-09 Thread Commented

[ 
https://issues.apache.org/jira/browse/TS-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166115#comment-13166115
 ] 

Igor Galić commented on TS-1042:


crawling through {{printf(3)}}, giving up, asking {{##C}} - I now know what 
{{printf(%*.*s, length, length, string);}} does!

Thank you again for the patch, I applied it in r1212343

 PATCH: correct debug message in FetchSM
 ---

 Key: TS-1042
 URL: https://issues.apache.org/jira/browse/TS-1042
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: James Peach
Assignee: Igor Galić
Priority: Minor
 Fix For: 3.1.2

 Attachments: 0004-Fix-FetchSM-debugging-message.patch


 In the FetchSM module, there is a debug message that can walk off the end of 
 the buffer. This patch corrects that by limiting the printed string to the 
 known length.





[jira] [Commented] (TS-1042) PATCH: correct debug message in FetchSM

2011-12-09 Thread Leif Hedstrom (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166222#comment-13166222
 ] 

Leif Hedstrom commented on TS-1042:
---

Hmmm, why %*.*s, length, length) ? Everywhere else we just do %.*s, length, 
string)





[jira] [Commented] (TS-1042) PATCH: correct debug message in FetchSM

2011-12-09 Thread James Peach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166330#comment-13166330
 ] 

James Peach commented on TS-1042:
-

To be honest, it's mostly from habit. However, %*.*s always prints exactly the 
given number of characters (padding with spaces if a NUL byte comes first), 
whereas %.*s prints up to that number of bytes. In this particular case we know 
the number of bytes and we want to print all of them, so %*.*s seems like the 
right choice. But %.*s will fix the bug just as well.





[jira] [Commented] (TS-1041) PATCH: guarantee to populate sockaddr length for TSHostLookupResultAddrGet

2011-12-09 Thread James Peach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166332#comment-13166332
 ] 

James Peach commented on TS-1041:
-

I was referring to the sockaddr you get from TSHostLookupResultAddrGet(), where 
you look at sa_len to figure out how many bytes to copy.

 PATCH: guarantee to populate sockaddr length for TSHostLookupResultAddrGet
 --

 Key: TS-1041
 URL: https://issues.apache.org/jira/browse/TS-1041
 Project: Traffic Server
  Issue Type: Improvement
  Components: DNS
 Environment: Mac OS X 10.7
Reporter: James Peach
Priority: Minor
 Attachments: 0003-Ensure-sockaddr-length-is-always-populated.patch


 The sockaddr returned by TSHostLookupResultAddrGet() does not always get its 
 sa_len field populated correctly. This patch guarantees to populate it to the 
 correct value so that plugin authors can rely on that field when copying the 
 TSHostLookupResultAddrGet() result.





[jira] [Commented] (TS-1040) PATCH: teach TSHostLookup to use const

2011-12-09 Thread James Peach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166334#comment-13166334
 ] 

James Peach commented on TS-1040:
-

Yep I meant to click the donate button but forgot. I didn't see a way to toggle 
it after the fact. Do you need me to attach it again?

 PATCH: teach TSHostLookup to use const
 --

 Key: TS-1040
 URL: https://issues.apache.org/jira/browse/TS-1040
 Project: Traffic Server
  Issue Type: Improvement
  Components: DNS
Reporter: James Peach
Priority: Minor
 Attachments: 
 0002-TSHostLookup-should-take-const-hostname-argument.patch


 This patch improves the TSHostLookup() API by specifying its hostname 
 argument as const. This reduces the number of casts required of plugin 
 authors.
 The new prototype is:
 tsapi TSAction TSHostLookup(TSCont contp, const char* hostname, size_t namelen)





[jira] [Commented] (TS-857) Crash Report: HttpTunnel::chain_abort_all - HttpServerSession::do_io_close - UnixNetVConnection::do_io_close

2011-12-09 Thread John Plevyak (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166354#comment-13166354
 ] 

John Plevyak commented on TS-857:
-

So, this patch throws a VC_EVENT_DO_CLOSE event at a NetVC, but it doesn't 
record the fact that the event is outstanding? What prevents the NetVC from 
being deallocated before the event is processed?

 Crash Report: HttpTunnel::chain_abort_all - HttpServerSession::do_io_close 
 - UnixNetVConnection::do_io_close
 --

 Key: TS-857
 URL: https://issues.apache.org/jira/browse/TS-857
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP, Network
Affects Versions: 3.1.0
 Environment: in my branch that is something same as 3.0.x
Reporter: Zhao Yongming
Assignee: weijin
 Fix For: 3.1.3

 Attachments: ts-857.diff, ts-857.diff


 Here is the backtrace from the crash. Some of the information is missing 
 because we did not enable the --enable-debug configure option.
 {code}
 [New process 7532]
 #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>)
     at ink_stack_trace.cc:68
 68  fp = (void **) (*fp);
 (gdb) bt
 #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>)
     at ink_stack_trace.cc:68
 #1  0x2ba641dccef1 in ink_stack_trace_dump (sighandler_frame=<value optimized out>) at ink_stack_trace.cc:114
 #2  0x004df020 in signal_handler (sig=<value optimized out>) at signals.cc:225
 #3  <signal handler called>
 #4  0x006a1ea9 in UnixNetVConnection::do_io_close (this=0x1cc9bd20, alerrno=<value optimized out>)
     at ../../iocore/eventsystem/I_Lock.h:297
 #5  0x0051f1d0 in HttpServerSession::do_io_close (this=0x2aaab0042c80, alerrno=20600) at HttpServerSession.cc:127
 #6  0x0056d1e9 in HttpTunnel::chain_abort_all (this=0x2aabeeffdd70, p=0x2aabeeffdf68) at HttpTunnel.cc:1300
 #7  0x005269ca in HttpSM::tunnel_handler_ua (this=0x2aabeeffc070, event=104, c=0x2aabeeffdda8) at HttpSM.cc:2987
 #8  0x00571dfc in HttpTunnel::consumer_handler (this=0x2aabeeffdd70, event=104, c=0x2aabeeffdda8) at HttpTunnel.cc:1232
 #9  0x00572032 in HttpTunnel::main_handler (this=0x2aabeeffdd70, event=1088608784, data=<value optimized out>)
     at HttpTunnel.cc:1456
 #10 0x006a6307 in write_to_net_io (nh=0x2b12d688, vc=0x1cc876e0, thread=<value optimized out>)
     at ../../iocore/eventsystem/I_Continuation.h:146
 #11 0x0069ce97 in NetHandler::mainNetEvent (this=0x2b12d688, event=<value optimized out>, e=0x171c1ed0) at UnixNet.cc:405
 #12 0x006cddaf in EThread::process_event (this=0x2b12c010, e=0x171c1ed0, calling_code=5) at I_Continuation.h:146
 #13 0x006ce6bc in EThread::execute (this=0x2b12c010) at UnixEThread.cc:262
 #14 0x006cd0ee in spawn_thread_internal (a=0x171b58f0) at Thread.cc:88
 #15 0x003c33c064a7 in start_thread () from /lib64/libpthread.so.0
 #16 0x003c330d3c2d in clone () from /lib64/libc.so.6
 (gdb) info f
 Stack level 0, frame at 0x40e2b790:
  rip = 0x2ba641dccdf3 in ink_stack_trace_get(void**, int, int) (ink_stack_trace.cc:68); saved rip 0x2ba641dccef1
  called by frame at 0x40e2bbe0
  source language c++.
  Arglist at 0x40e2b770, args: stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>
  Locals at 0x40e2b770, Previous frame's sp is 0x40e2b790
  Saved registers:
   rbx at 0x40e2b778, rbp at 0x40e2b780, rip at 0x40e2b788
 (gdb) 
 {code}





[jira] [Updated] (TS-949) key-volume hash table is not consistent when a disk is marked as bad or removed due to failure

2011-12-09 Thread John Plevyak (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-949:


Attachment: TS-949-jp2.patch

This patch uses a table of random numbers, selected based on the size of each 
disk partition, and selects the one closest to the center of each bucket as the 
bucket owner. This is stable across inserts and removes, and never switches a 
bucket between disks that remain present. This should address the issue.

 key-volume hash table is not consistent when a disk is marked as bad or 
 removed due to failure
 ---

 Key: TS-949
 URL: https://issues.apache.org/jira/browse/TS-949
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.0
 Environment: Multi-volume cache with apparently faulty drives
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 3.1.2

 Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch


 The method for resolving collisions when distributing hash-table space to 
 volumes for the object_key-volume hash table creates inconsistency when a 
 disk is determined to be bad, or when a failed disk is removed from the 
 volume.config.
 Background:
 The hash space is distributed by round robin draft where each volume drafts 
 a random index in the hash table until the hash space is exhausted.  The 
 random order in which a given volume drafts hash table slots is consistent 
 across reboot/crash/disk-failure, however when a volume attempts to draft a 
 slot which has already been occupied, it skips to its next random pick and 
 attempts to draft that slot until it finds an open slot.  This ensures that 
 the hash is partitioned evenly between volumes.
 The issue:
 Resolving slot contention breaks the consistency as it is dependent on the 
 order that the volumes draft.  When rebuilding the hash after disk failure or 
 reboot with fewer drives, a volume may secure an index that was previously 
 occupied by the dead-disk.  In the old hash, the surviving volume would have 
 selected another random index due to contention.  If this index is taken, by 
 the next draft round it will represent an inconsistent key-volume result.  
 The effects of one inconsistency will then cascade as whichever volume 
 occupies that index after removing a dead disk is now behind on its draft 
 sequence as well. 
 An Example:
 ||Disk||Draft Sequence||
 |A|1,4,7,5|
 |B|4,2,8,1|
 |C|3,7,5,2|
 Pre-failure Hash Table after 2 rounds of draft:
 |A|B|C|B|C|?|A|?|
 Post-failure of drive B Hash Table after 3 rounds of draft:
 |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
 Two slots have become inconsistent and more will probably follow. These 
 inconsistencies leave objects stored in a volume but unreachable from the 
 top-level cache for open/lookup.





[jira] [Commented] (TS-857) Crash Report: HttpTunnel::chain_abort_all - HttpServerSession::do_io_close - UnixNetVConnection::do_io_close

2011-12-09 Thread Alan M. Carroll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166513#comment-13166513
 ] 

Alan M. Carroll commented on TS-857:


Where would it record the pending event? The presumption is that the only other 
thread that might close it is the one that will process the event.

But the real issue is that sometimes the lock is dropped between the time the 
lock try is done and the event is scheduled, leading to scheduling on the null 
thread. If you look at TS-934 you can see how that issue is handled there, but 
even that's not sufficient because normal VC processing from 
NetHandler::mainNetEvent does not lock the VCs, so there isn't any way to do 
anything safely in this case.





[jira] [Commented] (TS-1031) reduce lock in netHandler and reduce the possiblity of acquiring expire server sessions

2011-12-09 Thread John Plevyak (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166654#comment-13166654
 ] 

John Plevyak commented on TS-1031:
--

I don't understand why this is necessary.  Nobody should call do_io_close() 
until they have cleared ALL pointers to the NetVC.  This seems like a hack to 
prevent buggy code from crashing in this particular way rather than just doing 
other bad things (including crashing in some other way).

 reduce lock in netHandler and reduce the possiblity of acquiring expire 
 server sessions
 ---

 Key: TS-1031
 URL: https://issues.apache.org/jira/browse/TS-1031
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.1.1
Reporter: Zhao Yongming
Assignee: weijin
Priority: Minor
 Attachments: ts-1031.diff


 Reduce locking in NetHandler and reduce the possibility of acquiring expired 
 server sessions. Put your patch here for review :D





[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2011-12-09 Thread John Plevyak (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1312#comment-1312
 ] 

John Plevyak commented on TS-937:
-

Let's nuke TS_HAS_PURIFY. If we want to make a valgrind target (e.g. something 
which would enable normal malloc) then that is an idea, but this macro is 
confusing.

 EThread::execute still processing cancelled event
 -

 Key: TS-937
 URL: https://issues.apache.org/jira/browse/TS-937
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.0.1, 2.1.9
 Environment: RHEL6
Reporter: Brian Geffon
 Fix For: 3.1.2

 Attachments: UnixEThread.patch


 The included GDB log shows that ATS is trying to process an event that has 
 already been cancelled. Examining the code of UnixEThread.cc line 232 shows 
 that EThread::process_event gets called without a check for the event being 
 cancelled.
 Brian
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x764fa700 (LWP 28518)]
 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
 Missing separate debuginfos, use: debuginfo-install 
 expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 
 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 
 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 
 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 
 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 
 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
 (gdb) bt
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 (gdb) bt full
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
         lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
         done_one = false
         e = 0x1db45c0
         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
         next_time = 1314647904419648000
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
         p = 0xfb7e80
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 No symbol table info available.
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 No symbol table info available.
 (gdb) f 0
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
 (gdb) p *e
 $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1},
   ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1,
   in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}





[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2011-12-09 Thread Leif Hedstrom (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166718#comment-13166718
 ] 

Leif Hedstrom commented on TS-937:
--

Sold.

I did add a --disable-freelist  configure option a while ago, which turns the 
freelist into malloc/free calls (I hope at least, unless I fucked it up :). The 
thought was that we'd use this option for memory debugging either with 
valgrind, or e.g. tcmalloc.
