[jira] [Commented] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259650#comment-14259650
 ] 

Susan Hinrichs commented on TS-3261:


Correct, I would not suggest adding the asserts in production. I just added them 
for my own tests.

Looking into the implementation of SSL_new() (for openssl 1.0.1f), the function 
does a memset to 0 after it gets the memory for the new SSL object. So even if 
the item was pulled off of a free list, the rbio and wbio members should be 
NULL.
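
For illustration, a hypothetical debug-only check of the kind described above, built only on OpenSSL's public accessors (this is not the actual test code):

{code}
/* Hypothetical sketch, not the asserts used in the tests: a fresh SSL
 * object must come back with no BIOs attached, per the memset in SSL_new(). */
#include <assert.h>
#include <openssl/ssl.h>

static SSL *
checked_ssl_new(SSL_CTX *ctx)
{
  SSL *ssl = SSL_new(ctx);
  if (ssl != NULL) {
    assert(SSL_get_rbio(ssl) == NULL); /* rbio zeroed by SSL_new() */
    assert(SSL_get_wbio(ssl) == NULL); /* wbio zeroed by SSL_new() */
  }
  return ssl;
}
{code}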

 possible slow leak in v5.2.0
 ----------------------------

                 Key: TS-3261
                 URL: https://issues.apache.org/jira/browse/TS-3261
             Project: Traffic Server
          Issue Type: Bug
          Components: Core
    Affects Versions: 5.2.0
            Reporter: Sudheer Vinukonda
            Assignee: Susan Hinrichs
            Priority: Critical
             Fix For: 5.3.0


 After fixing the iobuffer leak in TS-3257, the iobuffers seem stable on
 v5.2.0, but there still seems to be a slow memory leak. The RES memory from
 top shows 15g after running v5.2.0 in prod for more than 3 days, whereas the
 corresponding v5.0 host shows 10g after running for more than a week.
 Below is the dump of iobuffers for the v5.2.0 and v5.0 hosts. As expected,
 most iobufs are lower than on the v5.0 host (since the v5.0 host has been
 running longer), except the 32k buffer (iobuf[8]). But the leak doesn't seem
 to be explained by the difference in 32k buffers either: the 32k pools
 differ by 1189085184 - 873463808, roughly 0.3g of allocated memory, not
 nearly enough to explain the 5g difference in total memory.
 v5.2.0 host:
 {code}
  allocated  |   in-use   | type size  | free list name
 ------------|------------|------------|--------------------------
    67108864 |   25165824 |    2097152 | memory/ioBufAllocator[14]
  2013265920 | 1825570816 |    1048576 | memory/ioBufAllocator[13]
   620756992 |  549978112 |     524288 | memory/ioBufAllocator[12]
   780140544 |  593494016 |     262144 | memory/ioBufAllocator[11]
   742391808 |  574619648 |     131072 | memory/ioBufAllocator[10]
   901775360 |  735576064 |      65536 | memory/ioBufAllocator[9]
  1189085184 | 1093304320 |      32768 | memory/ioBufAllocator[8]
   474480640 |  348733440 |      16384 | memory/ioBufAllocator[7]
   269221888 |  211320832 |       8192 | memory/ioBufAllocator[6]
   156762112 |  142999552 |       4096 | memory/ioBufAllocator[5]
           0 |          0 |       2048 | memory/ioBufAllocator[4]
      131072 |          0 |       1024 | memory/ioBufAllocator[3]
       65536 |          0 |        512 | memory/ioBufAllocator[2]
       65536 |        256 |        256 | memory/ioBufAllocator[1]
       16384 |          0 |        128 | memory/ioBufAllocator[0]
 {code}
 v5.0.0 host:
 {code}
  allocated  |   in-use   | type size  | free list name
 ------------|------------|------------|--------------------------
   134217728 |   31457280 |    2097152 | memory/ioBufAllocator[14]
  2147483648 | 1843396608 |    1048576 | memory/ioBufAllocator[13]
   788529152 |  608174080 |     524288 | memory/ioBufAllocator[12]
   897581056 |  680525824 |     262144 | memory/ioBufAllocator[11]
   796917760 |  660471808 |     131072 | memory/ioBufAllocator[10]
   985661440 |  818479104 |      65536 | memory/ioBufAllocator[9]
   873463808 |  677969920 |      32768 | memory/ioBufAllocator[8]
   544735232 |  404439040 |      16384 | memory/ioBufAllocator[7]
   310902784 |  237887488 |       8192 | memory/ioBufAllocator[6]
   160956416 |  115515392 |       4096 | memory/ioBufAllocator[5]
           0 |          0 |       2048 | memory/ioBufAllocator[4]
      131072 |       2048 |       1024 | memory/ioBufAllocator[3]
       65536 |          0 |        512 | memory/ioBufAllocator[2]
       98304 |      50688 |        256 | memory/ioBufAllocator[1]
 {code}





[jira] [Comment Edited] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259650#comment-14259650
 ] 

Susan Hinrichs edited comment on TS-3261 at 12/28/14 2:29 PM:
--

Correct, I would not suggest adding the asserts in production. I just added them 
for my own tests.

Looking into the implementation of SSL_new() (for openssl 1.0.1f), the function 
does a memset to 0 after it gets the memory for the new SSL object. So even if 
the item was pulled off of a free list, the rbio and wbio members should be 
NULL.

The interesting lines from SSL_new are:

{code}
s=(SSL *)OPENSSL_malloc(sizeof(SSL));
if (s == NULL) goto err;
memset(s,0,sizeof(SSL));
{code}



[jira] [Commented] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259666#comment-14259666
 ] 

Sudheer Vinukonda commented on TS-3261:
---

Ah, thanks Susan! That makes me wonder whether setting rbio/wbio directly (like 
you already do in {{SSL_set_rbio}}) might be a better option than using 
{{SSL_set_bio()}}, which has all these additional defensive checks.
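
For reference, the defensive checks in question, lightly reformatted from memory of openssl 1.0.1's {{ssl/ssl_lib.c}} (verify against the exact source tree in use):

{code}
void SSL_set_bio(SSL *s, BIO *rbio, BIO *wbio)
{
    /* If the output buffering BIO is still in place, remove it */
    if (s->bbio != NULL) {
        if (s->wbio == s->bbio) {
            s->wbio = s->wbio->next_bio;
            s->bbio->next_bio = NULL;
        }
    }
    /* Free the old BIOs unless the caller handed the same ones back in */
    if ((s->rbio != NULL) && (s->rbio != rbio))
        BIO_free_all(s->rbio);
    if ((s->wbio != NULL) && (s->wbio != wbio) && (s->rbio != s->wbio))
        BIO_free_all(s->wbio);
    s->rbio = rbio;
    s->wbio = wbio;
}
{code}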



[jira] [Comment Edited] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259666#comment-14259666
 ] 

Sudheer Vinukonda edited comment on TS-3261 at 12/28/14 4:03 PM:
-

Ah, thanks Susan! That makes me wonder whether setting rbio/wbio directly (like 
you already do in {{SSL_set_rbio}}) might be a better option than using 
{{SSL_set_bio()}}, which has all these additional defensive checks.

On the other hand, the drawback is that it sounds a little hacky to directly 
update OpenSSL's data members without going through an interface, and it may be 
a pain point in the future in terms of maintenance, or if we consider 
alternatives to OpenSSL. So perhaps it's not worth making that little 
optimization.




[jira] [Commented] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259669#comment-14259669
 ] 

Sudheer Vinukonda commented on TS-3261:
---

Memory looks stable so far with the patch that removes the redundant 
BIO_free(). Will monitor for a few more hours to confirm and commit the fix.

top output after about 17 hours:

{code}
  PID USER     PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
14253 nobody   20   0 14.3g 9.6g 1.5g S 100.6 41.6 834:22.12 /home/y/bin/traffic_server -M --httpport 80:fd=7,81:fd=8,82:fd=9,83:fd=10,84:fd=11,85:fd=12,90:fd=13,80:fd=14:ipv6,81:fd=15:ipv6
{code}



[jira] [Comment Edited] (TS-3265) core dump in spdy_prepare_status_response_and_clean_request

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259521#comment-14259521
 ] 

Sudheer Vinukonda edited comment on TS-3265 at 12/28/14 4:09 PM:
-

Looking at the gdb output available, this seems to have been caused by an 
inconsistency between the {{SpdyClientSession}} object's {{req_map}} and the 
{{fetch_sm}} object. It's not entirely clear how this could happen 
(unfortunately, the core was lost, so it can't be analyzed further), but it 
would be better/safer to extract the {{req}} object from {{req_map}} using an 
iterator (instead of accessing it array-style) and add a null-pointer check.

I am testing the below patch that checks for a null pointer (along with an 
error log for the condition).

{code}
diff --git a/proxy/spdy/SpdyCallbacks.cc b/proxy/spdy/SpdyCallbacks.cc
index 4d1aceb..7d7605d 100644
--- a/proxy/spdy/SpdyCallbacks.cc
+++ b/proxy/spdy/SpdyCallbacks.cc
@@ -52,7 +52,11 @@ spdy_callbacks_init(spdylay_session_callbacks *callbacks)
 void
 spdy_prepare_status_response_and_clean_request(SpdyClientSession *sm, int stream_id, const char *status)
 {
-  SpdyRequest *req = sm->req_map[stream_id];
+  SpdyRequest *req = sm->find_request(stream_id);
+  if (!req) {
+    Error("spdy_prepare_status_response_and_clean_request, req object null for sm %" PRId64 ", stream_id %d", sm->sm_id, stream_id);
+    return;
+  }
   string date_str = http_date(time(0));
   const char **nv = new const char*[8+req->headers.size()*2+1];
 
@@ -328,7 +332,11 @@ spdy_on_ctrl_recv_callback(spdylay_session *session, spdylay_frame_type type,
 
   case SPDYLAY_HEADERS:
     stream_id = frame->syn_stream.stream_id;
-    req = sm->req_map[stream_id];
+    req = sm->find_request(stream_id);
+    if (!req) {
+      Error("spdy_on_ctrl_recv_callback, req object null on SPDYLAY_HEADERS for sm %" PRId64 ", stream_id %d", sm->sm_id, stream_id);
+      return;
+    }
     req->append_nv(frame->headers.nv);
     break;
{code}
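
The patch relies on a {{find_request()}} helper on {{SpdyClientSession}}. A minimal sketch of the shape such a helper takes, assuming {{req_map}} is a {{std::map<int, SpdyRequest *>}} (the actual method is defined in the ATS tree):

{code}
// Sketch only: look the stream up without mutating the map. By contrast,
// req_map[stream_id] would insert a NULL entry for a missing key.
SpdyRequest *
SpdyClientSession::find_request(int stream_id)
{
  std::map<int, SpdyRequest *>::iterator iter = this->req_map.find(stream_id);
  return (iter == this->req_map.end()) ? NULL : iter->second;
}
{code}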



 core dump in spdy_prepare_status_response_and_clean_request
 -----------------------------------------------------------

                 Key: TS-3265
                 URL: https://issues.apache.org/jira/browse/TS-3265
             Project: Traffic Server
          Issue Type: Bug
          Components: SPDY
    Affects Versions: 5.2.0
            Reporter: Sudheer Vinukonda
            Assignee: Sudheer Vinukonda
             Fix For: 5.3.0


 Noticed the below core dump in v5.2.0.
 {code}
 (gdb) bt
 #0  0x00638442 in std::vector<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::size (
     this=0x38) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_vector.h:533
 #1  0x0063667e in spdy_prepare_status_response_and_clean_request (sm=0x2ad800d4d730, stream_id=1, status=0x7b6b4d "500 Internal Server Error")
     at SpdyCallbacks.cc:57
 #2  0x0063373c in 
[jira] [Comment Edited] (TS-2780) Core dump in SpdyRequest::clear() in production testing of SPDY

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055129#comment-14055129
 ] 

Sudheer Vinukonda edited comment on TS-2780 at 12/28/14 4:20 PM:
-

I am finally able to reproduce this error message and understand the issue. 
Basically, this happens due to a race condition, when the client sends a 
{{DATA}} frame while ATS sends an error in {{SYN_REPLY}} (e.g. a POST 
transaction where the client tries to send a large data file, but ATS denies 
POST transactions from the client using an ip_allow ACL).

{code}
E.g. % spdycat -3 -v https://abc.com -d /tmp/1 > /tmp/2 (where /tmp/1 is a large data file with about 5K lines).
{code}

The issue happens due to the code in {{spdy_on_data_chunk_recv_callback()}}, 
{{spdy_on_data_recv_callback()}}, etc. Basically, when the error happens, the 
request is deleted and erased from {{SpdyClientSession->req_map}}. However, a 
data frame may still be received after the req object is deleted. The below 
code correctly ignores such data frames, since the request has already been 
deleted; but the statement {{SpdyRequest *req = sm->req_map[stream_id];}} 
itself adds an element with key {{stream_id}} and value {{NULL}} to the 
{{req_map}}. The correct way to check whether the req has been erased from the 
map is to check whether an iterator obtained via find() is valid. Will test 
and upload a patch shortly.

{code}
void
spdy_on_data_chunk_recv_callback(spdylay_session * /*session*/, uint8_t /*flags*/,
                                 int32_t stream_id, const uint8_t *data,
                                 size_t len, void *user_data)
{
  SpdyClientSession *sm = (SpdyClientSession *)user_data;
  SpdyRequest *req = sm->req_map[stream_id];
  //
  // SpdyRequest has been deleted on error, drop this data;
  //
  if (!req)
    return;
{code}
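
A minimal standalone demonstration of the {{operator[]}} side effect described above, using a hypothetical stand-in for {{SpdyRequest}} (not ATS code):

{code}
#include <cassert>
#include <cstdio>
#include <map>

struct SpdyRequest { int id; };  // stand-in for the real ATS type

int main()
{
  std::map<int, SpdyRequest *> req_map;

  // Key 25 is absent, so operator[] default-constructs a value:
  // the lookup itself inserts {25, NULL} into the map.
  SpdyRequest *req = req_map[25];
  assert(req == NULL);
  assert(req_map.size() == 1);

  // find() reports a missing key without inserting anything.
  std::map<int, SpdyRequest *>::iterator iter = req_map.find(27);
  if (iter == req_map.end())
    std::printf("stream 27 not present, map unchanged\n");
  assert(req_map.size() == 1);

  return 0;
}
{code}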





[jira] [Comment Edited] (TS-2780) Core dump in SpdyRequest::clear() in production testing of SPDY

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055248#comment-14055248
 ] 

Sudheer Vinukonda edited comment on TS-2780 at 12/28/14 4:21 PM:
-

Patch to replace the incorrect access of erased {{req_map}} elements via 
{{req_map[stream_id]}} with access through an iterator pointing to the element.



 Core dump in SpdyRequest::clear() in production testing of SPDY
 ---------------------------------------------------------------

                 Key: TS-2780
                 URL: https://issues.apache.org/jira/browse/TS-2780
             Project: Traffic Server
          Issue Type: Bug
          Components: SPDY
            Reporter: Sudheer Vinukonda
            Assignee: Bryan Call
              Labels: spdy, yahoo
             Fix For: 5.1.0

         Attachments: TS-2780.diff, ts2780.diff


 Noticed the below core dump in SpdyRequest::clear() during production testing
 on our proxy platform.
 {code}
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /home/y/bin/traffic_server - STACK TRACE:
 /lib64/libpthread.so.0(+0x3420a0f500)[0x2b47cff7d500]
 /home/y/bin/traffic_server(_ZN11SpdyRequest5clearEv+0x14)[0x5d06d4]
 /home/y/bin/traffic_server(_ZN6SpdySM5clearEv+0x34)[0x5d0b84]
 /home/y/bin/traffic_server[0x5d13d1]
 /home/y/bin/traffic_server(_ZN18UnixNetVConnection14readSignalDoneEiP10NetHandler+0x3d)[0x6efbcd]
 /home/y/bin/traffic_server(_ZN17SSLNetVConnection11net_read_ioEP10NetHandlerP7EThread+0xc76)[0x6df3e6]
 /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x6e6b12]
 /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x8f)[0x714aaf]
 /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x493)[0x715453]
 /home/y/bin/traffic_server[0x713e5a]
 /lib64/libpthread.so.0(+0x3420a07851)[0x2b47cff75851]
 /lib64/libc.so.6(clone+0x6d)[0x34202e894d]
 {code}
 gdb output shows that req may be a null pointer in SpdySM::clear()
 {code}
 (gdb) bt
 #0  SpdyRequest::clear (this=0x0) at SpdySM.cc:44
 #1  0x005d0b84 in SpdySM::clear (this=0x2aefdc4b46a0) at SpdySM.cc:116
 #2  0x005d13d1 in spdy_default_handler (contp=<value optimized out>, event=TS_EVENT_VCONN_INACTIVITY_TIMEOUT, edata=<value optimized out>) at SpdySM.cc:258
 #3  0x006eeabb in handleEvent (event=<value optimized out>, vc=0x2aefcc141ca0) at ../../iocore/eventsystem/I_Continuation.h:146
 #4  read_signal_and_update (event=<value optimized out>, vc=0x2aefcc141ca0) at UnixNetVConnection.cc:216
 #5  0x006f3cd4 in UnixNetVConnection::mainEvent (this=0x2aefcc141ca0, event=<value optimized out>, e=<value optimized out>) at UnixNetVConnection.cc:1152
 #6  0x006e7d0e in handleEvent (this=0x1634af0, event=<value optimized out>, e=0x136e850) at ../../iocore/eventsystem/I_Continuation.h:146
 #7  InactivityCop::check_inactivity (this=0x1634af0, event=<value optimized out>, e=0x136e850) at UnixNet.cc:67
 #8  0x00714aaf in handleEvent (this=0x2aef48f91010, e=0x136e850, calling_code=2) at I_Continuation.h:146
 #9  EThread::process_event (this=0x2aef48f91010, e=0x136e850, calling_code=2) at UnixEThread.cc:145
 #10 0x00715623 in EThread::execute (this=0x2aef48f91010) at UnixEThread.cc:224
 #11 0x00713e5a in spawn_thread_internal (a=0x163d000) at Thread.cc:88
 #12 0x2aed0a8a1851 in start_thread () from /lib64/libpthread.so.0
 #13 0x0034202e894d in clone () from /lib64/libc.so.6
 (gdb) print this
 $1 = (SpdyRequest * const) 0x0
 (gdb) up
 #1  0x005d0b84 in SpdySM::clear (this=0x2aefdc4b46a0) at SpdySM.cc:116
 116   in SpdySM.cc
 (gdb) print req
 $2 = (SpdyRequest *) 0x0
 (gdb) print iter
 $3 = {first = 25, second = }
 (gdb) print req_map
 $4 = std::map with 3 elements = {
   [1] = 0x2aefd81e5ae0,
   [25] = 0x0,
   [27] = 0x0
 }
 {code}





[jira] [Comment Edited] (TS-3265) core dump in spdy_prepare_status_response_and_clean_request

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259521#comment-14259521
 ] 

Sudheer Vinukonda edited comment on TS-3265 at 12/28/14 4:26 PM:
-

Looking at the gdb output available, this seems to have been caused by an 
inconsistency between the {{SpdyClientSession}} object's {{req_map}} and the 
{{fetch_sm}} object. It's not entirely clear how this could happen 
(unfortunately, the core was lost, so it can't be analyzed further), but it 
would be better/safer to extract the {{req}} object from {{req_map}} using an 
iterator (instead of accessing it array-style) and add a null-pointer check. 
Accessing elements in a {{std::map}} directly using an array index 
({{operator[]}}) may insert the element with value NULL if the key is not 
found (refer to TS-2780):
{code}
http://stackoverflow.com/questions/1639544/why-does-stdmap-operator-create-an-object-if-the-key-doesnt-exist
{code}

I am testing the below patch that checks for a null pointer (along with an 
error log for the condition).

{code}
diff --git a/proxy/spdy/SpdyCallbacks.cc b/proxy/spdy/SpdyCallbacks.cc
index 4d1aceb..7d7605d 100644
--- a/proxy/spdy/SpdyCallbacks.cc
+++ b/proxy/spdy/SpdyCallbacks.cc
@@ -52,7 +52,11 @@ spdy_callbacks_init(spdylay_session_callbacks *callbacks)
 void
 spdy_prepare_status_response_and_clean_request(SpdyClientSession *sm, int stream_id, const char *status)
 {
-  SpdyRequest *req = sm->req_map[stream_id];
+  SpdyRequest *req = sm->find_request(stream_id);
+  if (!req) {
+    Error("spdy_prepare_status_response_and_clean_request, req object null for sm %" PRId64 ", stream_id %d", sm->sm_id, stream_id);
+    return;
+  }
   string date_str = http_date(time(0));
   const char **nv = new const char*[8+req->headers.size()*2+1];
 
@@ -328,7 +332,11 @@ spdy_on_ctrl_recv_callback(spdylay_session *session, spdylay_frame_type type,
 
   case SPDYLAY_HEADERS:
     stream_id = frame->syn_stream.stream_id;
-    req = sm->req_map[stream_id];
+    req = sm->find_request(stream_id);
+    if (!req) {
+      Error("spdy_on_ctrl_recv_callback, req object null on SPDYLAY_HEADERS for sm %" PRId64 ", stream_id %d", sm->sm_id, stream_id);
+      return;
+    }
     req->append_nv(frame->headers.nv);
     break;
{code}



[jira] [Comment Edited] (TS-2780) Core dump in SpdyRequest::clear() in production testing of SPDY

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055129#comment-14055129
 ] 

Sudheer Vinukonda edited comment on TS-2780 at 12/28/14 4:27 PM:
-

I am finally able to reproduce this error message and understand the issue. 
Basically, this happens due to a race condition, when the client sends a 
{{DATA}} frame while ATS sends an error in {{SYN_REPLY}} (e.g. a POST 
transaction where the client tries to send a large data file, but ATS denies 
POST transactions from the client using an ip_allow ACL).

{code}
E.g. % spdycat -3 -v https://abc.com -d /tmp/1 > /tmp/2 (where /tmp/1 is a large data file with about 5K lines).
{code}

The issue happens due to the code in {{spdy_on_data_chunk_recv_callback()}}, 
{{spdy_on_data_recv_callback()}}, etc. Basically, when the error happens, the 
request is deleted and erased from {{SpdyClientSession->req_map}}. However, a 
data frame may still be received after the req object is deleted. The below 
code correctly ignores such data frames, since the request has already been 
deleted; but the statement {{SpdyRequest *req = sm->req_map[stream_id];}} 
itself adds an element with key {{stream_id}} and value {{NULL}} to the 
{{req_map}}, due to the way {{operator[]}} is implemented in {{std::map}}:

{code}
http://stackoverflow.com/questions/1639544/why-does-stdmap-operator-create-an-object-if-the-key-doesnt-exist
{code}

The correct way to check whether the req has been erased from the map is to 
check whether an iterator obtained via find() is valid. Will test and upload a 
patch shortly.

{code}
void
spdy_on_data_chunk_recv_callback(spdylay_session * /*session*/, uint8_t /*flags*/,
                                 int32_t stream_id, const uint8_t *data,
                                 size_t len, void *user_data)
{
  SpdyClientSession *sm = (SpdyClientSession *)user_data;
  SpdyRequest *req = sm->req_map[stream_id];
  //
  // SpdyRequest has been deleted on error, drop this data;
  //
  if (!req)
    return;
{code}





[jira] [Comment Edited] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259669#comment-14259669
 ] 

Sudheer Vinukonda edited comment on TS-3261 at 12/28/14 6:00 PM:
-

Memory looks stable so far with the patch that removes the redundant 
{{BIO_free()}}. Will monitor for a few more hours to confirm and commit the fix.

top output after about 17 hours:

{code}
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

 
14253 nobody20   0 14.3g 9.6g 1.5g S 100.6 41.6 834:22.12 
/home/y/bin/traffic_server -M --httpport 
80:fd=7,81:fd=8,82:fd=9,83:fd=10,84:fd=11,85:fd=12,90:fd=13,80:fd=14:ipv6,81:fd=15:ipv6
{code}
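For background on why the {{BIO_free()}} was redundant: once the read/write 
BIOs are handed to an SSL object via {{SSL_set_bio()}}, the SSL object owns 
them and {{SSL_free()}} releases them. A minimal sketch of the ownership rule 
(plain OpenSSL 1.0.x calls, not the actual ATS code; {{make_ssl}} is a 
hypothetical helper):

{code}
#include <openssl/ssl.h>

// Illustrative only: BIO ownership with an SSL object.
static SSL *
make_ssl(SSL_CTX *ctx)
{
  SSL *ssl = SSL_new(ctx); // SSL_new() zeroes the object, so rbio/wbio start NULL
  if (ssl == NULL)
    return NULL;
  BIO *rbio = BIO_new(BIO_s_mem());
  BIO *wbio = BIO_new(BIO_s_mem());
  SSL_set_bio(ssl, rbio, wbio); // ownership of both BIOs transfers to ssl
  return ssl;
}

// Cleanup: SSL_free(ssl) releases rbio and wbio as well, so a separate
// BIO_free() on either of them is redundant.
{code}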




[jira] [Commented] (TS-3261) possible slow leak in v5.2.0

2014-12-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259689#comment-14259689
 ] 

ASF subversion and git services commented on TS-3261:
-

Commit 0924e5d721681d77d769a6fe8a5228fcf70bfe91 in trafficserver's branch 
refs/heads/master from [~sudheerv]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=0924e5d ]

[TS-3261]: remove redundant BIO_free() for new SSL object, per Leif's suggestion







[jira] [Commented] (TS-3265) core dump in spdy_prepare_status_response_and_clean_request

2014-12-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259690#comment-14259690
 ] 

ASF subversion and git services commented on TS-3265:
-

Commit fc9f67072a2ee949894f5d38abb30e1e556e9e26 in trafficserver's branch 
refs/heads/master from [~sudheerv]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=fc9f670 ]

[TS-3265]: Fix spdy req_map access to use iterator instead of operator[]


 core dump in spdy_prepare_status_response_and_clean_request
 ---

 Key: TS-3265
 URL: https://issues.apache.org/jira/browse/TS-3265
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Affects Versions: 5.2.0
Reporter: Sudheer Vinukonda
Assignee: Sudheer Vinukonda
 Fix For: 5.3.0


 Noticed the below core dump in v5.2.0.
 {code}
 (gdb) bt
 #0  0x00638442 in std::vector<std::pair<std::basic_string<char, 
 std::char_traits<char>, std::allocator<char> >, std::basic_string<char, 
 std::char_traits<char>, std::allocator<char> > >, 
 std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, 
 std::allocator<char> >, std::basic_string<char, std::char_traits<char>, 
 std::allocator<char> > > > >::size (
 this=0x38) at 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_vector.h:533
 #1  0x0063667e in spdy_prepare_status_response_and_clean_request 
 (sm=0x2ad800d4d730, stream_id=1, status=0x7b6b4d "500 Internal Server Error") 
 at SpdyCallbacks.cc:57
 #2  0x0063373c in spdy_process_fetch (event=TS_EVENT_ERROR, 
 sm=0x2ad800d4d730, edata=0x2ad818105c50) at SpdyClientSession.cc:350
 #3  0x00633151 in SpdyClientSession::state_session_readwrite 
 (this=0x2ad800d4d730, event=3, edata=0x2ad818105c50) at 
 SpdyClientSession.cc:253
 #4  0x00502eae in Continuation::handleEvent (this=0x2ad800d4d730, 
 event=3, data=0x2ad818105c50) at ../iocore/eventsystem/I_Continuation.h:146
 #5  0x00501116 in FetchSM::InvokePluginExt (this=0x2ad818105c50, 
 fetch_event=3) at FetchSM.cc:245
 #6  0x00501e59 in FetchSM::process_fetch_write (this=0x2ad818105c50, 
 event=3) at FetchSM.cc:495
 #7  0x00501f3e in FetchSM::fetch_handler (this=0x2ad818105c50, 
 event=3, edata=0x2ad86c90b960) at FetchSM.cc:511
 #8  0x00502eae in Continuation::handleEvent (this=0x2ad818105c50, 
 event=3, data=0x2ad86c90b960) at ../iocore/eventsystem/I_Continuation.h:146
 #9  0x0053e9d5 in PluginVC::process_write_side (this=0x2ad86c90b810, 
 other_side_call=false) at PluginVC.cc:519
 #10 0x0053dbcd in PluginVC::main_handler (this=0x2ad86c90b810, 
 event=2, data=0x2ad80802c690) at PluginVC.cc:210
 #11 0x00502eae in Continuation::handleEvent (this=0x2ad86c90b810, 
 event=2, data=0x2ad80802c690) at ../iocore/eventsystem/I_Continuation.h:146
 #12 0x00773136 in EThread::process_event (this=0x2ad6cc10, 
 e=0x2ad80802c690, calling_code=2) at UnixEThread.cc:144
 #13 0x00773451 in EThread::execute (this=0x2ad6cc10) at 
 UnixEThread.cc:223
 #14 0x007726f1 in spawn_thread_internal (a=0x161fd50) at Thread.cc:88
 #15 0x2ad6c6d30851 in start_thread () from /lib64/libpthread.so.0
 #16 0x003296ee890d in clone () from /lib64/libc.so.6
 {code}
 Looking at the core in gdb, it seems the {{req}} object extracted from 
 {{sm->req_map}} based on {{stream_id}} is null. However, the {{req}} object 
 (extracted via {{TSFetchUserDataGet}}) in the caller {{spdy_process_fetch}} 
 in frame #2 seems valid. Unfortunately, the core was deleted before I could 
 examine more info (like the sm object's req_map or the req object details).
 {code}
 (gdb) frame 1
 #1  0x0063667e in spdy_prepare_status_response_and_clean_request 
 (sm=0x2ad800d4d730, stream_id=1, status=0x7b6b4d "500 Internal Server Error") 
 at SpdyCallbacks.cc:57
 57      SpdyCallbacks.cc: No such file or directory.
   in SpdyCallbacks.cc
 (gdb) print req
 $1 = (SpdyRequest *) 0x0
 (gdb) up
 #2  0x0063373c in spdy_process_fetch (event=TS_EVENT_ERROR, 
 sm=0x2ad800d4d730, edata=0x2ad818105c50) at SpdyClientSession.cc:350
 350   SpdyClientSession.cc: No such file or directory.
   in SpdyClientSession.cc
 (gdb) print req
 $2 = (SpdyRequest *) 0x2ad7c4e8be90
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3265) core dump in spdy_prepare_status_response_and_clean_request

2014-12-28 Thread Sudheer Vinukonda (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheer Vinukonda updated TS-3265:
--
Backport to Version: 5.2.0






[jira] [Commented] (TS-3263) Segmentation fault about MIOBuffer

2014-12-28 Thread Qiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259799#comment-14259799
 ] 

Qiang Li commented on TS-3263:
--

I saw it on 5.0.0 and 5.2.0, with some simple plugins that add/delete headers.

 Segmentation fault about MIOBuffer
 --

 Key: TS-3263
 URL: https://issues.apache.org/jira/browse/TS-3263
 Project: Traffic Server
  Issue Type: Bug
Reporter: Qiang Li
  Labels: crash
 Fix For: 5.3.0


 traffic.out
 {code}
 traffic_server: Segmentation fault (Address not mapped to object [(nil)])
 traffic_server - STACK TRACE: 
 /usr/bin/traffic_server(_Z19crash_logger_invokeiP7siginfoPv+0xc3)[0x4fea52]
 /lib64/libpthread.so.0(+0xf710)[0x2aef4c9ea710]
 /lib64/libc.so.6(memcpy+0x11)[0x2aef4d97a681]
 /usr/bin/traffic_server(_ZN9MIOBuffer5writeEPKvl+0xb9)[0x7b8a4b]
 /usr/bin/traffic_server(_ZN8PluginVC14transfer_bytesEP9MIOBufferP14IOBufferReaderl+0xb8)[0x53ce2c]
 /usr/bin/traffic_server(_ZN8PluginVC17process_read_sideEb+0x4bf)[0x53da59]
 /usr/bin/traffic_server(_ZN8PluginVC18process_write_sideEb+0x6ca)[0x53d546]
 /usr/bin/traffic_server(_ZN8PluginVC12main_handlerEiPv+0x5a9)[0x53c409]
 /usr/bin/traffic_server(_ZN12Continuation11handleEventEiPv+0x6c)[0x50192c]
 /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0xc6)[0x7bb6ba]
 /usr/bin/traffic_server(_ZN7EThread7executeEv+0xa0)[0x7bb888]
 /usr/bin/traffic_server[0x7bac75]
 /lib64/libpthread.so.0(+0x79d1)[0x2aef4c9e29d1]
 /lib64/libc.so.6(clone+0x6d)[0x2aef4d9d99dd]
 {code}
 core dump
 {code}
 (gdb) bt
 #0  0x2aef4d97a681 in memcpy () from /lib64/libc.so.6
 #1  0x007b8a4b in MIOBuffer::write (this=0x2aad417c50e0, 
 abuf=0x2aac99d20ffb, alen=5) at IOBuffer.cc:93
 #2  0x0053ce2c in PluginVC::transfer_bytes (this=0x2aaae43cd360, 
 transfer_to=0x2aad417c50e0, transfer_from=0x2aac6c94a2c8, act_on=16848)
 at PluginVC.cc:452
 #3  0x0053da59 in PluginVC::process_read_side (this=0x2aaae43cd360, 
 other_side_call=true) at PluginVC.cc:653
 #4  0x0053d546 in PluginVC::process_write_side (this=0x2aaae43cd550, 
 other_side_call=false) at PluginVC.cc:565
 #5  0x0053c409 in PluginVC::main_handler (this=0x2aaae43cd550, 
 event=1, data=0x2aad55853420) at PluginVC.cc:210
 #6  0x0050192c in Continuation::handleEvent (this=0x2aaae43cd550, 
 event=1, data=0x2aad55853420) at ../iocore/eventsystem/I_Continuation.h:146
 #7  0x007bb6ba in EThread::process_event (this=0x2aef5511f010, 
 e=0x2aad55853420, calling_code=1) at UnixEThread.cc:144
 #8  0x007bb888 in EThread::execute (this=0x2aef5511f010) at 
 UnixEThread.cc:195
 #9  0x007bac75 in spawn_thread_internal (a=0x2c6ce00) at Thread.cc:88
 #10 0x2aef4c9e29d1 in start_thread () from /lib64/libpthread.so.0
 #11 0x2aef4d9d99dd in clone () from /lib64/libc.so.6
 (gdb) f 1
 #1  0x007b8a4b in MIOBuffer::write (this=0x2aad417c50e0, 
 abuf=0x2aac99d20ffb, alen=5) at IOBuffer.cc:93
 93              ::memcpy(_writer->end(), buf, f);
 (gdb) l
 88            if (!_writer)
 89              add_block();
 90            int64_t f = _writer->write_avail();
 91            f = f < len ? f : len;
 92            if (f > 0) {
 93              ::memcpy(_writer->end(), buf, f);
 94              _writer->fill(f);
 95              buf += f;
 96              len -= f;
 97            }
 (gdb) p *this
 $1 = {size_index = 46923640729072, water_mark = 32768, _writer = {m_ptr = 
 0x2aac52ec6d40}, readers = {{accessor = 0x0, mbuf = 0x0, block = {m_ptr = 
 0x0}, 
   start_offset = 0, size_limit = 9223372036854775807}, {accessor = 0x0, 
 mbuf = 0x0, block = {m_ptr = 0x0}, start_offset = 0, 
   size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = 
 {m_ptr = 0x0}, start_offset = 0, size_limit = 9223372036854775807}, {
   accessor = 0x0, mbuf = 0x0, block = {m_ptr = 0x0}, start_offset = 0, 
 size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = {
 m_ptr = 0x0}, start_offset = 0, size_limit = 9223372036854775807}}, 
 _location = 0x7e96d8 "memory/IOBuffer/HttpSM.cc:6319"}
 (gdb) p *_writer->m_ptr
 $2 = {RefCountObj = {ForceVFPTToTop = {_vptr.ForceVFPTToTop = 0x7be4b0}, 
 m_refcount = 1}, _start = 0x0, _end = 0x0, _buf_end = 0x2aad42efede1 "\256~", 
   _location = 0x7e96d8 "memory/IOBuffer/HttpSM.cc:6319", data = {m_ptr = 
 0x2aaabc356ab0}, next = {m_ptr = 0x0}}
 (gdb) p *_writer->m_ptr->data->m_ptr
 $3 = {RefCountObj = {ForceVFPTToTop = {_vptr.ForceVFPTToTop = 0x7be4f0}, 
 m_refcount = 1}, _size_index = 46923640729072, _mem_type = DEFAULT_ALLOC, 
   _data = 0x0, _location = 0x7e96d8 "memory/IOBuffer/HttpSM.cc:6319"}
 (gdb) 
 {code}
 HttpSM.cc:
 {code}
 6315:  alloc_index = find_server_buffer_size();
 6316: #ifndef USE_NEW_EMPTY_MIOBUFFER
 6317:  MIOBuffer *buf = new_MIOBuffer(alloc_index);
 6318: #else
 6319:  MIOBuffer *buf = 
 {code}
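Reading the dump above: the current write block has {{_start = 0x0, _end = 
0x0}} and its backing {{_data}} is {{0x0}}, so {{_writer->end()}} evaluates to 
NULL and the {{memcpy}} faults at address {{(nil)}}, matching the signal 
report. A hypothetical defensive guard (a sketch only, not the actual fix; 
{{write_guarded}} is an invented name reusing the {{IOBufferBlock}} accessors 
shown in the listing):

{code}
// Sketch: refuse to copy into a block whose backing buffer was never
// allocated (end() == NULL), the condition seen in the core above.
// Assumes the ATS iocore IOBuffer headers are available.
static int64_t
write_guarded(IOBufferBlock *writer, const char *buf, int64_t len)
{
  if (writer == NULL || writer->end() == NULL)
    return 0;                     // no sane destination; drop the write
  int64_t f = writer->write_avail();
  f = f < len ? f : len;          // copy at most what fits in the block
  if (f > 0) {
    ::memcpy(writer->end(), buf, f);
    writer->fill(f);              // advance the block's end pointer
  }
  return f;                       // bytes actually written
}
{code}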

[jira] [Updated] (TS-2883) core dump in TSFetchCreate()

2014-12-28 Thread Wei Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Sun updated TS-2883:

Attachment: TS-2883_2.diff

For those who may encounter the same issue: we have used the attached patch in 
production for several months and haven't observed the crash since. It also 
matches the theory - we reproduced the same stack trace with a unit test that 
frees a FetchSM multiple times in a multithreaded environment.
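A double free into a freelist is a classic way to corrupt the next allocation, 
which fits a crash inside {{ink_freelist_new()}}. A toy illustration (a 
simplified single-threaded freelist sketch, not the ATS allocator):

{code}
#include <cstdio>

// Each free node stores the pointer to the next free node in its first
// word. Freeing the same node twice makes the node point to itself, so
// every later pop hands out the same (possibly still in-use) block.
struct Node { Node *next; };

static Node *free_head = nullptr;

static void push(Node *n) { n->next = free_head; free_head = n; }
static Node *pop()
{
  Node *n = free_head;
  if (n)
    free_head = n->next;
  return n;
}

int main()
{
  static Node a;
  push(&a);
  push(&a);        // double free: a.next now points at a itself
  Node *x = pop(); // returns &a
  Node *y = pop(); // returns &a again - two owners of one block
  std::printf("%p %p\n", (void *)x, (void *)y);
  return 0;
}
{code}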

 core dump in TSFetchCreate()
 

 Key: TS-2883
 URL: https://issues.apache.org/jira/browse/TS-2883
 Project: Traffic Server
  Issue Type: Bug
  Components: Core, SPDY
Reporter: Sudheer Vinukonda
Assignee: Bryan Call
  Labels: crash, yahoo
 Fix For: 5.3.0

 Attachments: TS-2883.diff, TS-2883_2.diff


 {code}
 (gdb) bt
 #0  ink_freelist_new (f=0x2923050) at ink_queue.cc:202
 #1  0x004c0cd2 in alloc (contp=0x2b86821e2120, 
 method=TS_FETCH_METHOD_POST, 
 url=0x2b865d64f228 
 https://aa-mg5.mail.yahoo.com/ws/mail/v2.0/jsonrpc?appid=YahooMailNeom=BatchExecutewssid=nG7kmTWsJCDaction=compose_0_savedraftactionCount=88debugmid=2_0_0_3_126623_AMNwimIAAC82U5ZXLwAAAFWGrR8deb;...,
  version=0x2b865da5ace8 "HTTP/1.1", client_addr=<value optimized out>, 
 flags=<value optimized out>) at ../lib/ts/Allocator.h:117
 #2  TSFetchCreate (contp=0x2b86821e2120, method=TS_FETCH_METHOD_POST, 
 url=0x2b865d64f228 
 https://aa-mg5.mail.yahoo.com/ws/mail/v2.0/jsonrpc?appid=YahooMailNeom=BatchExecutewssid=nG7kmTWsJCDaction=compose_0_savedraftactionCount=88debugmid=2_0_0_3_126623_AMNwimIAAC82U5ZXLwAAAFWGrR8deb;...,
  version=0x2b865da5ace8 "HTTP/1.1", client_addr=<value optimized out>, 
 flags=<value optimized out>) at InkAPI.cc:7289
 #3  0x005f117e in spdy_fetcher_launch (req=0x2b871c2fa900, 
 method=TS_FETCH_METHOD_POST) at SpdyCallbacks.cc:187
 #4  0x005f1faa in spdy_process_syn_stream_frame (session=<value 
 optimized out>, type=<value optimized out>, frame=<value optimized out>, 
 user_data=0x2b86821e2120)
 at SpdyCallbacks.cc:295
 #5  spdy_on_ctrl_recv_callback (session=<value optimized out>, type=<value 
 optimized out>, frame=<value optimized out>, user_data=0x2b86821e2120) at 
 SpdyCallbacks.cc:335
 #6  0x00738050 in spdylay_session_on_syn_stream_received 
 (session=0x2b865defce10, frame=0x2b858f987a20) at spdylay_session.c:1782
 #7  0x00738d57 in spdylay_session_process_ctrl_frame 
 (session=0x2b865defce10, in=0x2b858f987a90 "\200\003", inlen=<value optimized 
 out>) at spdylay_session.c:2246
 #8  spdylay_session_mem_recv (session=0x2b865defce10, in=0x2b858f987a90 
 "\200\003", inlen=<value optimized out>) at spdylay_session.c:2805
 #9  0x00738f89 in spdylay_session_recv (session=0x2b865defce10) at 
 spdylay_session.c:2828
 #10 0x005ef17b in spdy_process_read (this=0x2b86821e2120, event=100, 
 edata=<value optimized out>) at SpdyClientSession.cc:279
 #11 SpdyClientSession::state_session_readwrite (this=0x2b86821e2120, 
 event=100, edata=<value optimized out>) at SpdyClientSession.cc:236
 #12 0x0070dbd7 in handleEvent (this=0x2b86fc1d2cf0, event=<value 
 optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
 #13 read_signal_and_update (this=0x2b86fc1d2cf0, event=<value optimized out>) 
 at UnixNetVConnection.cc:138
 #14 UnixNetVConnection::readSignalAndUpdate (this=0x2b86fc1d2cf0, 
 event=<value optimized out>) at UnixNetVConnection.cc:914
 #15 0x006fe66f in SSLNetVConnection::net_read_io 
 (this=0x2b86fc1d2cf0, nh=0x2b858d46bbf0, lthread=0x2b858d468010) at 
 SSLNetVConnection.cc:294
 #16 0x00705a72 in NetHandler::mainNetEvent (this=0x2b858d46bbf0, 
 event=<value optimized out>, e=<value optimized out>) at UnixNet.cc:399
 #17 0x007323ef in handleEvent (this=0x2b858d468010, e=0x2a7db30, 
 calling_code=5) at I_Continuation.h:146
 #18 EThread::process_event (this=0x2b858d468010, e=0x2a7db30, calling_code=5) 
 at UnixEThread.cc:145
 #19 0x00732d93 in EThread::execute (this=0x2b858d468010) at 
 UnixEThread.cc:269
 #20 0x0073179a in spawn_thread_internal (a=0x2e404e0) at Thread.cc:88
 #21 0x2b835bf28851 in start_thread () from /lib64/libpthread.so.0
 #22 0x00361a0e894d in clone () from /lib64/libc.so.6
 {code}





[jira] [Comment Edited] (TS-3029) ats-5.0.1 crash when SPDY enabled

2014-12-28 Thread Wei Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259879#comment-14259879
 ] 

Wei Sun edited comment on TS-3029 at 12/29/14 6:28 AM:
---

We observed the following error logs when the crash happened:
{code}
[Aug 19 16:00:16.381] Server {0x2b8fa6f5e700} ERROR: spdy_process_fetch fetch
error, fetch_sm 0x2b907bb63d10, ret -1 for sm_id 6086046, stream_id 103, req
time 1408464016381152420, url
https://s.yimg.com/bt/api/res/1.2/MzHNr6ctgTgvHxpIc2CQQw--/YXBwaWQ9eW5ld3M7Zmk9ZmlsbDtoPTk4O3B5b2ZmPTE1O3E9NzU7dz0xODA-/http://media.zenfs.com/en_GB/Sports/Eurosport/1139793-18355072-2560-1440.jpg
[Aug 19 16:00:16.381] Server {0x2b8fa6f5e700} ERROR: spdy_process_fetch sending
STATUS_500, fetch_sm (nil), ret -1 for sm_id 6086046, stream_id 103, req time
1408464016381152420, url
https://s.yimg.com/bt/api/res/1.2/MzHNr6ctgTgvHxpIc2CQQw--/YXBwaWQ9eW5ld3M7Zmk9ZmlsbDtoPTk4O3B5b2ZmPTE1O3E9NzU7dz0xODA-/http://media.zenfs.com/en_GB/Sports/Eurosport/1139793-18355072-2560-1440.jpg
[Aug 19 16:00:16.429] Server {0x2b8fa6f5e700} ERROR: spdy_process_fetch fetch
error, fetch_sm (nil), ret -1 for sm_id 6086046, stream_id 103, req time
1408464016381152420, url 
[Aug 19 16:00:16.429] Server {0x2b8fa6f5e700} ERROR: spdy_process_fetch sending
STATUS_500, fetch_sm (nil), ret -1 for sm_id 6086046, stream_id 103, req time
1408464016381152420, url 
{code}

Looks like the stream request (SpdyRequest* req) was erased from the req_map 
due to a fetch error, but the same stream request was accessed again.
{code}
(gdb) f 1
#1  0x0061e386 in spdy_prepare_status_response (sm=0x2b90b68a5d90,
stream_id=103, status=0x792cad "500 Internal Server Error") at
SpdyCallbacks.cc:57
57      SpdyCallbacks.cc: No such file or directory.
in SpdyCallbacks.cc
(gdb) p req
$10 = (SpdyRequest *) 0x0
(gdb) p sm->req_map
$11 = std::map with 1 elements = {
  [103] = 0x0
}
(gdb) f 2
#2  0x0061b4ef in spdy_process_fetch (event=TS_EVENT_VCONN_READ_READY,
sm=0x2b90b68a5d90, edata=0x2b907bb63d10) at SpdyClientSession.cc:381
381     SpdyClientSession.cc: No such file or directory.
in SpdyClientSession.cc
(gdb) p req
$12 = (SpdyRequest *) 0x2b932f141d70
{code}
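The {{$11}} output is the tell-tale side effect of {{std::map::operator[]}}: 
looking up an erased key value-initializes and re-inserts it. A standalone 
demonstration:

{code}
#include <cassert>
#include <map>

int main()
{
  std::map<int, int *> m;         // stands in for req_map
  assert(m.find(103) == m.end()); // find() never inserts
  int *p = m[103];                // operator[] inserts {103, NULL}
  assert(p == 0);                 // and hands back the NULL value
  assert(m.size() == 1);          // the erased stream is "back" in the map
  return 0;
}
{code}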

We ran the fix below (a one-line change) for several months and didn't observe 
the same issue.
{code}
diff --git a/proxy/spdy/SpdyClientSession.cc b/proxy/spdy/SpdyClientSession.cc
index 79fed77..58e544f 100644
--- a/proxy/spdy/SpdyClientSession.cc
+++ b/proxy/spdy/SpdyClientSession.cc
@@ -371,7 +371,6 @@ spdy_process_fetch(TSEvent event, SpdyClientSession *sm, void *edata)
       ret = 0; // Ignore fetch errors after FETCH BODY DONE
     else {
       Error("spdy_process_fetch fetch error, fetch_sm %p, ret %d for sm_id %" PRId64 ", stream_id %u, req time %" PRId64 ", url %s",
             req->fetch_sm, ret, sm->sm_id, req->stream_id, req->start_time, req->url.c_str());
-      req->fetch_sm = NULL;
     }
     break;
   }
{code}




[jira] [Commented] (TS-3265) core dump in spdy_prepare_status_response_and_clean_request

2014-12-28 Thread Yunkai Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259902#comment-14259902
 ] 

Yunkai Zhang commented on TS-3265:
--

Cool, nice catch!



