[jira] [Created] (TS-3329) ATS shouldn't start if SSL is configured and certificate can't be loaded
kang li created TS-3329:
---
Summary: ATS shouldn't start if SSL is configured and certificate can't be loaded
Key: TS-3329
URL: https://issues.apache.org/jira/browse/TS-3329
Project: Traffic Server
Issue Type: Improvement
Components: SSL
Reporter: kang li

Requirement from [~dcarlin]:
{quote}
It seems ATS will start up even if the certificate file isn't present.

ATS settings in records.config:
CONFIG proxy.config.ssl.server.cert_chain.filename STRING digicert.pem
CONFIG proxy.config.ssl.server.cert.path STRING conf/yts/ssl

ATS settings in ssl_multicert.config:
dest_ip=* ssl_cert_name=ycpi_ssl_cert.pem

What happened was that the volume /home/y/conf/yts/ssl wasn't mounted, so the SSL cert and chain cert were inaccessible. ATS started anyway and just returned errors on 443. Health checks were served on port 80 via HTTP, so it appeared that the site was OK.
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3329) ATS shouldn't start if SSL is configured and certificate can't be loaded
[ https://issues.apache.org/jira/browse/TS-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-3329:
Attachment: patch.diff

The patch simply makes ATS exit when loading an SSL certificate fails.
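The behaviour requested in TS-3329 (refuse to start when a configured certificate cannot be loaded) can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: `all_certs_readable` is a hypothetical helper, not part of ATS or of the attached patch.diff, and it only tests that each file can be opened and is non-empty, not that it parses as a valid certificate.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not ATS code): returns true only if every configured
// certificate file under cert_dir can be opened and is non-empty.
// On failure, *first_failure is set to the first path that could not be read.
bool
all_certs_readable(const std::string &cert_dir, const std::vector<std::string> &cert_names, std::string *first_failure)
{
  assert(first_failure != nullptr);
  for (const std::string &name : cert_names) {
    const std::string path = cert_dir + "/" + name;
    std::ifstream in(path, std::ios::binary);
    // peek() yields EOF for an empty file; is_open() is false for a missing one.
    if (!in.is_open() || in.peek() == std::ifstream::traits_type::eof()) {
      *first_failure = path;
      return false;
    }
  }
  return true;
}
```

A startup path could call this with the directory from proxy.config.ssl.server.cert.path and the names listed in ssl_multicert.config, then log the failing path and exit with a non-zero status when it returns false, so that an unmounted volume is noticed immediately instead of surfacing as errors on 443.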
[jira] [Commented] (TS-3235) PluginVC crashed with unrecognized event
[ https://issues.apache.org/jira/browse/TS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251285#comment-14251285 ]

kang li commented on TS-3235:
-

This turned out to be a race condition. It seems two threads race to create a {{sm_lock_retry_event}}. This event may be created by {{PluginVC::do_io_read}}, {{PluginVC::do_io_write}}, {{PluginVC::reenable}} and {{PluginVC::do_io_close}} through {{PluginVC::setup_event_cb}}.
{code}
void PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)
{
  ink_assert(magic == PLUGIN_VC_MAGIC_ALIVE);

  if (*e_ptr == NULL) {
    // We locked the pointer so we can now allocate an event
    // to call us back
    if (in == 0) {
      if (this_ethread()->tt == REGULAR) {
        *e_ptr = this_ethread()->schedule_imm_local(this);
      }
{code}
The core dump seems to be caused by this scenario: thread 1 and thread 2 both see *e_ptr as NULL, so both go on to create a {{sm_lock_retry_event}}. Thread 2 creates event2, which overwrites event1 created by thread 1, so PluginVC's {{sm_lock_retry_event}} points to event2. When event1 is later delivered to the PluginVC, it is not recognized as a valid event, which causes the core dump.

We made a simple fix that uses a mutex to double-check {{e_ptr}} in {{PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)}}. The core dump has not happened again since.

I am not sure whether we are using the intercept plugin in the best way. Our plugin calls InterceptPlugin::produce in a separate thread to write back the response, and the core dump may be related to this pattern.

PluginVC crashed with unrecognized event
Key: TS-3235
URL: https://issues.apache.org/jira/browse/TS-3235
Project: Traffic Server
Issue Type: Bug
Components: CPP API, HTTP, Plugins
Reporter: kang li
Assignee: Brian Geffon
Fix For: 5.3.0

We are using atscppapi to create an Intercept plugin. From the core dump, it seems that the Continuation of the InterceptPlugin had already been destroyed.
{code}
#0 0x00375ac32925 in raise () from /lib64/libc.so.6
#1 0x00375ac34105 in abort () from /lib64/libc.so.6
#2 0x2b21eeae3458 in ink_die_die_die (retval=1) at ink_error.cc:43
#3 0x2b21eeae3525 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`", ap=0x2b21f4913ad0) at ink_error.cc:65
#4 0x2b21eeae35ee in ink_fatal (return_code=1, message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`") at ink_error.cc:73
#5 0x2b21eeae2160 in _ink_assert (expression=0x76ddb8 "call_event == core_lock_retry_event", file=0x76dd04 "PluginVC.cc", line=203) at ink_assert.cc:37
#6 0x00530217 in PluginVC::main_handler (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at PluginVC.cc:203
#7 0x004f5854 in Continuation::handleEvent (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at ../iocore/eventsystem/I_Continuation.h:146
#8 0x00755d26 in EThread::process_event (this=0x309b250, e=0xe0f5b80, calling_code=1) at UnixEThread.cc:145
#9 0x0075610a in EThread::execute (this=0x309b250) at UnixEThread.cc:239
#10 0x00755284 in spawn_thread_internal (a=0x2849330) at Thread.cc:88
#11 0x2b21ef05f9d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00375ace8b7d in clone () from /lib64/libc.so.6
(gdb) p sm_lock_retry_event
$13 = (Event *) 0x2b2496146e90
(gdb) p core_lock_retry_event
$14 = (Event *) 0x0
(gdb) p active_event
$15 = (Event *) 0x0
(gdb) p inactive_event
$16 = (Event *) 0x0
(gdb) p *(INKContInternal*)this->core_obj->connect_to
Cannot access memory at address 0x2b269cd46c10
{code}
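The mutex-based fix mentioned in the TS-3235 comment is not shown in the thread, so the following is only a minimal sketch of the general idea: do the NULL check and the assignment under one lock, so that two threads cannot both create a retry event. `Event`, `RetrySlot`, `setup_event_once` and `race_to_create` are hypothetical stand-ins for illustration, not ATS types or functions, and the real patch may differ.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the scheduled event (not the ATS Event class).
struct Event {
  int owner;
};

// Hypothetical holder for one retry-event pointer plus the lock guarding it.
struct RetrySlot {
  std::mutex lock;
  Event *event = nullptr;
};

// Checks and fills the slot under the lock, so only the first caller creates
// an event. Returns true if this call created it.
bool
setup_event_once(RetrySlot &slot, int owner)
{
  std::lock_guard<std::mutex> guard(slot.lock);
  if (slot.event != nullptr) {
    return false; // another thread already created the event
  }
  slot.event = new Event{owner};
  return true;
}

// Starts n_threads threads that all try to create the event and returns how
// many of them actually did.
int
race_to_create(int n_threads)
{
  assert(n_threads > 0);
  RetrySlot slot;
  std::atomic<int> created{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) {
    threads.emplace_back([&slot, &created, i] {
      if (setup_event_once(slot, i)) {
        ++created;
      }
    });
  }
  for (std::thread &t : threads) {
    t.join();
  }
  delete slot.event;
  return created.load();
}
```

With the check and the assignment inside the same critical section, the overwrite scenario described in the comment (event2 replacing event1) cannot occur in this sketch, because the second thread sees a non-NULL pointer and returns without creating anything.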
[jira] [Comment Edited] (TS-3235) PluginVC crashed with unrecognized event
[ https://issues.apache.org/jira/browse/TS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251285#comment-14251285 ]

kang li edited comment on TS-3235 at 12/18/14 6:35 AM:
---

This turned out to be a race condition. It seems two threads race to create a {{sm_lock_retry_event}}. This event may be created by {{PluginVC::do_io_read}}, {{PluginVC::do_io_write}}, {{PluginVC::reenable}} and {{PluginVC::do_io_close}} through {{PluginVC::setup_event_cb}}.
{code}
void PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)
{
  ink_assert(magic == PLUGIN_VC_MAGIC_ALIVE);

  if (*e_ptr == NULL) {
    // We locked the pointer so we can now allocate an event
    // to call us back
    if (in == 0) {
      if (this_ethread()->tt == REGULAR) {
        *e_ptr = this_ethread()->schedule_imm_local(this);
      }
{code}
The core dump seems to be caused by this scenario: thread 1 and thread 2 both see *e_ptr as NULL, so both go on to create a {{sm_lock_retry_event}}. Thread 2 creates event2, which overwrites event1 created by thread 1, so PluginVC's {{sm_lock_retry_event}} points to event2. When event1 is later delivered to the PluginVC, it is not recognized as a valid event, which causes the core dump.

We made a simple fix that uses a mutex to double-check {{e_ptr}} in {{PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)}}. The core dump has not happened again since.

I am not sure whether this is a proper fix, as we may not be using the intercept plugin in the best way. Our plugin calls InterceptPlugin::produce in a separate thread to write back the response. The core dump may be related to this pattern.
[jira] [Commented] (TS-3235) PluginVC crashed with unrecognized event
[ https://issues.apache.org/jira/browse/TS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246368#comment-14246368 ]

kang li commented on TS-3235:
-

Hi [~amc],

These are the reentrancy_count and deletable flags of the PASSIVE_VC. They come from another core dump, in which the INKContInternal appears to have been marked as closed.
{code}
(gdb) p this->vc_type
$8 = PLUGIN_VC_PASSIVE
(gdb) p this->deletable
$9 = false
(gdb) p this->reentrancy_count
$10 = 1
(gdb) p this->other_side->deletable
$11 = false
(gdb) p this->other_side->reentrancy_count
$12 = 0
(gdb) p *(INKContInternal*)this->core_obj->connect_to
$13 = {<DummyVConnection> = {<VConnection> = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x769690},
        handler = (int (Continuation::*)(Continuation *, int, void *)) 0x50a236 <INKContInternal::handle_event(int, void*)>,
        mutex = {m_ptr = 0x2b47eee4b0d0}, link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}}, lerrno = 0}, <No data fields>},
  mdata = 0x2b47cce65e10, m_event_func = 0x2b4356aaa620 <(anonymous namespace)::handleEvents(TSCont, TSEvent, void*)>,
  m_event_count = 0, m_closed = 1, m_deletable = 0, m_deleted = 0, m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}
{code}
Regards,
Kang
[jira] [Created] (TS-3235) PluginVC crashed with unrecognized event
kang li created TS-3235:
---
Summary: PluginVC crashed with unrecognized event
Key: TS-3235
URL: https://issues.apache.org/jira/browse/TS-3235
Project: Traffic Server
Issue Type: Bug
Components: CPP API, HTTP, Plugins
Reporter: kang li
Assignee: Brian Geffon

We are using atscppapi to create an Intercept plugin. From the core dump, it seems that the Continuation of the InterceptPlugin had already been destroyed.
{code}
#0 0x00375ac32925 in raise () from /lib64/libc.so.6
#1 0x00375ac34105 in abort () from /lib64/libc.so.6
#2 0x2b21eeae3458 in ink_die_die_die (retval=1) at ink_error.cc:43
#3 0x2b21eeae3525 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`", ap=0x2b21f4913ad0) at ink_error.cc:65
#4 0x2b21eeae35ee in ink_fatal (return_code=1, message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`") at ink_error.cc:73
#5 0x2b21eeae2160 in _ink_assert (expression=0x76ddb8 "call_event == core_lock_retry_event", file=0x76dd04 "PluginVC.cc", line=203) at ink_assert.cc:37
#6 0x00530217 in PluginVC::main_handler (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at PluginVC.cc:203
#7 0x004f5854 in Continuation::handleEvent (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at ../iocore/eventsystem/I_Continuation.h:146
#8 0x00755d26 in EThread::process_event (this=0x309b250, e=0xe0f5b80, calling_code=1) at UnixEThread.cc:145
#9 0x0075610a in EThread::execute (this=0x309b250) at UnixEThread.cc:239
#10 0x00755284 in spawn_thread_internal (a=0x2849330) at Thread.cc:88
#11 0x2b21ef05f9d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00375ace8b7d in clone () from /lib64/libc.so.6
(gdb) p sm_lock_retry_event
$13 = (Event *) 0x2b2496146e90
(gdb) p core_lock_retry_event
$14 = (Event *) 0x0
(gdb) p active_event
$15 = (Event *) 0x0
(gdb) p inactive_event
$16 = (Event *) 0x0
(gdb) p *(INKContInternal*)this->core_obj->connect_to
Cannot access memory at address 0x2b269cd46c10
{code}
[jira] [Commented] (TS-3119) Add option to support SO_LINGER to origin server
[ https://issues.apache.org/jira/browse/TS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168085#comment-14168085 ]

kang li commented on TS-3119:
-

Hi [~jpe...@apache.org],

The timeout should be 0 if we turn on the SO_LINGER option, because we are using non-blocking sockets. I tried a non-zero timeout, but the setsockopt call failed.

If we add a new socket option
{code}
static uint32_t const SOCK_OPT_LINGER_ON = 4;
{code}
then we could configure
{code}
proxy.config.net.sock_option_flag_in
proxy.config.net.sock_option_flag_out
{code}
to make client or server connections linger for 0 ms as needed.

Add option to support SO_LINGER to origin server
Key: TS-3119
URL: https://issues.apache.org/jira/browse/TS-3119
Project: Traffic Server
Issue Type: Improvement
Components: Network
Reporter: kang li
Fix For: 5.2.0
Attachments: linger.diff

When ATS and Apache are installed on the same box to do SSL termination, we saw port exhaustion, a performance drop and missing requests through ATS. Before the migration we used stunnel for SSL termination and had no such problems. After investigation we found that adding the SO_LINGER option on connections to the origin resolves these problems.
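The proposal in this comment (a flag bit that turns on SO_LINGER with a zero timeout) could look roughly like the sketch below. It uses only the POSIX `setsockopt` call and `struct linger`. The name `SOCK_OPT_LINGER_ON` and the value 4 come from the comment, while `apply_linger_flag` is a hypothetical helper and not the code in linger.diff.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/socket.h>
#include <unistd.h>

// Flag bit proposed in the TS-3119 comment.
static uint32_t const SOCK_OPT_LINGER_ON = 4;

// Hypothetical helper (not ATS code): enables SO_LINGER with a zero timeout
// when the flag bit is set in sockopt_flags.
// Returns 0 on success or when the bit is not set, -1 if setsockopt fails.
int
apply_linger_flag(int fd, uint32_t sockopt_flags)
{
  assert(fd >= 0);
  if ((sockopt_flags & SOCK_OPT_LINGER_ON) == 0) {
    return 0; // linger not requested for this connection
  }
  struct linger l;
  l.l_onoff  = 1; // enable lingering
  l.l_linger = 0; // zero timeout, as suggested for non-blocking sockets
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
}
```

With `l_onoff = 1` and `l_linger = 0`, closing the socket typically resets the connection instead of leaving the local end in TIME_WAIT, which is a plausible reason the option helped with the port exhaustion reported here. The trade-off is that any unsent data is discarded on close.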
[jira] [Updated] (TS-3119) Add option to support SO_LINGER to origin server
[ https://issues.apache.org/jira/browse/TS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-3119:
Attachment: linger.diff
[jira] [Commented] (TS-3119) Add option to support SO_LINGER to origin server
[ https://issues.apache.org/jira/browse/TS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166573#comment-14166573 ]

kang li commented on TS-3119:
-

While merging this patch I noticed that we do not need to set linger_timeout, since ATS uses non-blocking sockets. Would it be better to just add a new flag to sockopt_flags?
{code}
static uint32_t const SOCK_OPT_LINGER_OUT = 4;
{code}
This would make the code cleaner.
[jira] [Comment Edited] (TS-3119) Add option to support SO_LINGER to origin server
[ https://issues.apache.org/jira/browse/TS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166573#comment-14166573 ]

kang li edited comment on TS-3119 at 10/10/14 9:03 AM:
---

When merging this patch I noticed that we do not need to set linger_timeout, since ATS uses non-blocking sockets. Would it be better to just add a new flag to sockopt_flags?
{code}
static uint32_t const SOCK_OPT_LINGER_OUT = 4;
{code}
This would make the code cleaner.
[jira] [Commented] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162992#comment-14162992 ]

kang li commented on TS-3078:
-

Yeah, that makes more sense. But should we be able to detect and respond to the disk issue more gracefully? This core dump took me quite a lot of time to figure out, because it looks more like a memory issue.

MIMEHdrImpl::unmarshal crash caused by cache corruption
---
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Affects Versions: 4.0.2
Reporter: kang li
Labels: crash

{code}
(gdb) bt
#0 0x005c22c6 in unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3534
#1 MIMEHdrImpl::unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3590
#2 0x005b4bcb in HdrHeap::unmarshal (this=0x2aaed05f6140, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:926
#3 0x005b8be1 in HTTPInfo::unmarshal (buf=0x2aaed05f6048 "\355\336\315\253", len=3280, block_ref=0x2aae080feec0) at HTTP.cc:1948
#4 0x00635e43 in unmarshal_helper (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#5 CacheVC::handleReadDone (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#6 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#7 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#8 0x006a990f in handleEvent (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at I_Continuation.h:146
#9 EThread::process_event (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at UnixEThread.cc:141
#10 0x006aa48b in EThread::execute (this=0x2aadf0809010) at UnixEThread.cc:192
#11 0x006a87aa in spawn_thread_internal (a=0x13e0a80) at Thread.cc:88
#12 0x2aadeaf3c9d1 in start_thread () from /lib64/libpthread.so.0
#13 0x0034cfce8b6d in clone () from /lib64/libc.so.6
(gdb) p m_freetop
$1 = 892220471
(gdb) p m_field_slots
$2 = {{m_ptr_name = 0x33203a2274616c22 <Address 0x33203a2274616c22 out of bounds>, m_ptr_value = 0x393938312e33 <Address 0x393938312e33 out of bounds>, m_next_dup = 0x3a226e6f6c22202c, m_wks_idx = 11552, m_len_name = 14137, m_len_value = 3289390, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 3 '\003', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x766c4ccefc939174 <Address 0x766c4ccefc939174 out of bounds>, m_ptr_value = 0x222056def09983ac <Address 0x222056def09983ac out of bounds>, m_next_dup = 0x203a4d1444c0d5b3, m_wks_idx = 21538, m_len_name = 9048, m_len_value = 7890260, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x656f7722202c2273 <Address 0x656f7722202c2273 out of bounds>, m_ptr_value = 0x3039373231203a22 <Address 0x3039373231203a22 out of bounds>, m_next_dup = 0x697a22202c393739, m_wks_idx = 8816, m_len_name = 8250, m_len_value = 3553058, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x7d5d4b2bf0819670 <Address 0x7d5d4b2bf0819670 out of bounds>, m_ptr_value = 0x796b8e1844d2836c <Address 0x796b8e1844d2836c out of bounds>, m_next_dup = 0x756c8c24f2da9b62, m_wks_idx = 8805, m_len_name = 31546, m_len_value = 7152160, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x22207b203a227370 <Address 0x22207b203a227370 out of bounds>, m_ptr_value = 0x22203a2272646461 <Address 0x22203a2272646461 out of bounds>, m_next_dup = 0x626d452034323332, m_wks_idx = 29285, m_len_name = 22304, m_len_value = 6582127, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 1 '\001'},
  {m_ptr_name = 0x6c654b202c724420 <Address 0x6c654b202c724420 out of bounds>,
{code}
This only happened on one machine out of four nodes, about once every several days. After disabling plugins and restarting ATS the problem still exists. There are also some warnings in traffic.out. One interesting thing is that the warnings all happen after 776 bytes.
{code}
WARNING: Unmarshal failed due to unknow obj type 173 after 776 bytes
Dumping header heap @ 0x2b37b251e140 - len 2250
{code}
[jira] [Created] (TS-3119) Add option to support SO_LINGER to origin server
kang li created TS-3119:
---
Summary: Add option to support SO_LINGER to origin server
Key: TS-3119
URL: https://issues.apache.org/jira/browse/TS-3119
Project: Traffic Server
Issue Type: Improvement
Components: Network
Reporter: kang li

When ATS and Apache are installed on the same box to do SSL termination, we saw port exhaustion, a performance drop and missing requests through ATS. Before the migration we used stunnel for SSL termination and had no such problems. After investigation we found that adding the SO_LINGER option on connections to the origin resolves these problems.
[jira] [Commented] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152848#comment-14152848 ]

kang li commented on TS-3078:
-

This turned out to be a hardware issue. The disk is an old SSD. From the Doc heap memory in the core dump, it seems some field in the HttpHdr metadata was changed, so it pointed to an invalid address and caused the core dump.

But if
{code}
proxy.config.cache.enable_checksum
{code}
is enabled, ATS can tolerate the disk error. In that case diags.log contains warnings about checksum errors and magic check errors.

There are also core dumps like this:
{code}
#0 0x0034cfc32925 in raise () from /lib64/libc.so.6
#1 0x0034cfc34105 in abort () from /lib64/libc.so.6
#2 0x2b726c052ce9 in ink_die_die_die (retval=15745) at ink_error.cc:43
#3 0x2b726c052f13 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=<value optimized out>, ap=0x2b726f442ad0) at ink_error.cc:65
#4 0x2b726c053048 in ink_fatal (return_code=15745, message_format=0x3d87 <Address 0x3d87 out of bounds>) at ink_error.cc:73
#5 0x2b726c05152f in _ink_assert (expression=0x0, file=0x6 <Address 0x6 out of bounds>, line=-1) at ink_assert.cc:37
#6 0x005b4c19 in HdrHeap::unmarshal (this=0x2b72d7bf89f0, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:880
#7 0x005b8c32 in HTTPInfo::unmarshal (buf=0x2b72d7bf8048 "\355\336\315\253", len=1056, block_ref=0x2b72a0105d00) at HTTP.cc:1962
#8 0x00635e43 in unmarshal_helper (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#9 CacheVC::handleReadDone (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#10 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#11 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#12 0x006a990f in handleEvent (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at I_Continuation.h:146
#13 EThread::process_event (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at UnixEThread.cc:141
#14 0x006aa48b in EThread::execute (this=0x2b726dc2c010) at UnixEThread.cc:192
#15 0x006a87aa in spawn_thread_internal (a=0x2229050) at Thread.cc:88
#16 0x2b726c5d59d1 in start_thread () from /lib64/libpthread.so.0
#17 0x0034cfce8b6d in clone () from /lib64/libc.so.6
{code}

MIMEHdrImpl::unmarshal crash caused by cache corruption
---
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Affects Versions: 4.0.2
Reporter: kang li
Labels: crash
Fix For: 5.2.0
[jira] [Comment Edited] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152848#comment-14152848 ]

kang li edited comment on TS-3078 at 9/30/14 6:27 AM:
--

This turned out to be a hardware issue. The disk is an old SSD. Judging from the Doc heap memory in the core dump, some field in the HttpHdr metadata had been changed, so it pointed to an invalid address and caused the core dump.

If
{code}
proxy.config.cache.enable_checksum
{code}
is enabled, ATS can tolerate the disk error, but diags.log will then contain warnings about checksum errors and magic check errors.

There are also core dumps like this:
{code}
#0 0x0034cfc32925 in raise () from /lib64/libc.so.6
#1 0x0034cfc34105 in abort () from /lib64/libc.so.6
#2 0x2b726c052ce9 in ink_die_die_die (retval=15745) at ink_error.cc:43
#3 0x2b726c052f13 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=<value optimized out>, ap=0x2b726f442ad0) at ink_error.cc:65
#4 0x2b726c053048 in ink_fatal (return_code=15745, message_format=0x3d87 <Address 0x3d87 out of bounds>) at ink_error.cc:73
#5 0x2b726c05152f in _ink_assert (expression=0x0, file=0x6 <Address 0x6 out of bounds>, line=-1) at ink_assert.cc:37
#6 0x005b4c19 in HdrHeap::unmarshal (this=0x2b72d7bf89f0, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:880
#7 0x005b8c32 in HTTPInfo::unmarshal (buf=0x2b72d7bf8048 "\355\336\315\253", len=1056, block_ref=0x2b72a0105d00) at HTTP.cc:1962
#8 0x00635e43 in unmarshal_helper (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#9 CacheVC::handleReadDone (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#10 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#11 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#12 0x006a990f in handleEvent (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at I_Continuation.h:146
#13 EThread::process_event (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at UnixEThread.cc:141
#14 0x006aa48b in EThread::execute (this=0x2b726dc2c010) at UnixEThread.cc:192
#15 0x006a87aa in spawn_thread_internal (a=0x2229050) at Thread.cc:88
#16 0x2b726c5d59d1 in start_thread () from /lib64/libpthread.so.0
#17 0x0034cfce8b6d in clone () from /lib64/libc.so.6
{code}
[jira] [Resolved] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li resolved TS-3078.
-
Resolution: Fixed

After the disk was replaced, the problem was resolved.

MIMEHdrImpl::unmarshal crash caused by cache corruption
---
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Affects Versions: 4.0.2
Reporter: kang li
Labels: crash
Fix For: 5.2.0

{code}
(gdb) bt
#0 0x005c22c6 in unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3534
#1 MIMEHdrImpl::unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3590
#2 0x005b4bcb in HdrHeap::unmarshal (this=0x2aaed05f6140, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:926
#3 0x005b8be1 in HTTPInfo::unmarshal (buf=0x2aaed05f6048 "\355\336\315\253", len=3280, block_ref=0x2aae080feec0) at HTTP.cc:1948
#4 0x00635e43 in unmarshal_helper (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#5 CacheVC::handleReadDone (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#6 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#7 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#8 0x006a990f in handleEvent (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at I_Continuation.h:146
#9 EThread::process_event (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at UnixEThread.cc:141
#10 0x006aa48b in EThread::execute (this=0x2aadf0809010) at UnixEThread.cc:192
#11 0x006a87aa in spawn_thread_internal (a=0x13e0a80) at Thread.cc:88
#12 0x2aadeaf3c9d1 in start_thread () from /lib64/libpthread.so.0
#13 0x0034cfce8b6d in clone () from /lib64/libc.so.6
(gdb) p m_freetop
$1 = 892220471
(gdb) p m_field_slots
$2 = {{m_ptr_name = 0x33203a2274616c22 <Address 0x33203a2274616c22 out of bounds>, m_ptr_value = 0x393938312e33 <Address 0x393938312e33 out of bounds>, m_next_dup = 0x3a226e6f6c22202c, m_wks_idx = 11552, m_len_name = 14137, m_len_value = 3289390, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 3 '\003', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x766c4ccefc939174 <Address 0x766c4ccefc939174 out of bounds>, m_ptr_value = 0x222056def09983ac <Address 0x222056def09983ac out of bounds>, m_next_dup = 0x203a4d1444c0d5b3, m_wks_idx = 21538, m_len_name = 9048, m_len_value = 7890260, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x656f7722202c2273 <Address 0x656f7722202c2273 out of bounds>, m_ptr_value = 0x3039373231203a22 <Address 0x3039373231203a22 out of bounds>, m_next_dup = 0x697a22202c393739, m_wks_idx = 8816, m_len_name = 8250, m_len_value = 3553058, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x7d5d4b2bf0819670 <Address 0x7d5d4b2bf0819670 out of bounds>, m_ptr_value = 0x796b8e1844d2836c <Address 0x796b8e1844d2836c out of bounds>, m_next_dup = 0x756c8c24f2da9b62, m_wks_idx = 8805, m_len_name = 31546, m_len_value = 7152160, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x22207b203a227370 <Address 0x22207b203a227370 out of bounds>, m_ptr_value = 0x22203a2272646461 <Address 0x22203a2272646461 out of bounds>, m_next_dup = 0x626d452034323332, m_wks_idx = 29285, m_len_name = 22304, m_len_value = 6582127, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 1 '\001'},
  {m_ptr_name = 0x6c654b202c724420 <Address 0x6c654b202c724420 out of bounds>,
{code}

This happened on only one machine out of four nodes, about once every several days. The problem persisted after disabling plugins and restarting ATS. There are also some warnings in traffic.out. Interestingly, every warning occurred after 776 bytes.

{code}
WARNING: Unmarshal failed due to unknow obj type 173 after 776 bytes
Dumping header heap @ 0x2b37b251e140 - len 2250
--
0x2b37b251e140: 0xdcbafeed 0x0 0xb251e4b8 0x2b37
0x2b37b251e150: 0xb251e1c8 0x2b37 0x378 0x0
0x2b37b251e160: 0x0 0x0 0x0 0x0
0x2b37b251e170: 0x0 0x0 0x956b2af0
[jira] [Comment Edited] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152848#comment-14152848 ]

kang li edited comment on TS-3078 at 9/30/14 6:29 AM:
--

This turned out to be a hardware issue. The disk is an old SSD with no ECC mechanism. Judging from the Doc heap memory in the core dump, some field in the HttpHdr metadata had been changed, so it pointed to an invalid address and caused the core dump.

If
{code}
proxy.config.cache.enable_checksum
{code}
is enabled, ATS can tolerate the disk error, but diags.log will then contain warnings about checksum errors and magic check errors.

There are also core dumps like this:
{code}
#0 0x0034cfc32925 in raise () from /lib64/libc.so.6
#1 0x0034cfc34105 in abort () from /lib64/libc.so.6
#2 0x2b726c052ce9 in ink_die_die_die (retval=15745) at ink_error.cc:43
#3 0x2b726c052f13 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=<value optimized out>, ap=0x2b726f442ad0) at ink_error.cc:65
#4 0x2b726c053048 in ink_fatal (return_code=15745, message_format=0x3d87 <Address 0x3d87 out of bounds>) at ink_error.cc:73
#5 0x2b726c05152f in _ink_assert (expression=0x0, file=0x6 <Address 0x6 out of bounds>, line=-1) at ink_assert.cc:37
#6 0x005b4c19 in HdrHeap::unmarshal (this=0x2b72d7bf89f0, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:880
#7 0x005b8c32 in HTTPInfo::unmarshal (buf=0x2b72d7bf8048 "\355\336\315\253", len=1056, block_ref=0x2b72a0105d00) at HTTP.cc:1962
#8 0x00635e43 in unmarshal_helper (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#9 CacheVC::handleReadDone (this=0x2b72f8044f10, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#10 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#11 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#12 0x006a990f in handleEvent (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at I_Continuation.h:146
#13 EThread::process_event (this=0x2b726dc2c010, e=0x2b728415c110, calling_code=1) at UnixEThread.cc:141
#14 0x006aa48b in EThread::execute (this=0x2b726dc2c010) at UnixEThread.cc:192
#15 0x006a87aa in spawn_thread_internal (a=0x2229050) at Thread.cc:88
#16 0x2b726c5d59d1 in start_thread () from /lib64/libpthread.so.0
#17 0x0034cfce8b6d in clone () from /lib64/libc.so.6
{code}
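The effect of proxy.config.cache.enable_checksum described in this comment can be illustrated with a small sketch. This is only a toy model: the thread does not show the checksum ATS actually uses, so CRC32 is a stand-in, and the names store and load are hypothetical. The point is that a mismatching record is rejected before anything tries to interpret its corrupted contents.

```python
import zlib

def store(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 (a stand-in for the Doc checksum)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def load(record: bytes):
    """Return the payload, or None if the stored checksum does not match."""
    stored, payload = int.from_bytes(record[:4], "big"), record[4:]
    if zlib.crc32(payload) != stored:
        return None  # reject the record instead of parsing corrupted data
    return payload

payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
record = store(payload)

corrupted = bytearray(record)
corrupted[10] ^= 0x01  # flip a single bit, as a failing disk might
```

Here load(record) returns the original payload, while load(bytes(corrupted)) returns None, which corresponds to logging a checksum warning rather than crashing in unmarshal.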
[jira] [Commented] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145759#comment-14145759 ]

kang li commented on TS-3078:
-

I have enabled proxy.config.cache.enable_checksum on this machine, and now I see the following in diags.log:

[Sep 23 23:14:21.876] Server {0x2b3d26228700} NOTE: cache: checksum error for [9879981592683297697 13154904490352685041] len 3952, hlen 2968, disk /home/y/var/ats_cache/cache.db, offset 98520955904 size 4096
[Sep 23 23:14:21.876] Server {0x2b3d26228700} WARNING: Head: Doc checksum does not match for A13B4983E9BE1C89F1AFA9B1EF9B8FB6

There are also many more log lines like this one:

[Sep 22 03:24:41.729] Server {0x2ab39d312700} WARNING: Head : Doc magic does not match for 706642A0B2C73F8CFEFB597947395B39

I also checked the corrupted Doc data in RAM: core dumps with the same key contain the same corrupted data. So could this have been caused by a cache write failure of some kind?

MIMEHdrImpl::unmarshal crash caused by cache corruption
---
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Affects Versions: 4.0.2
Reporter: kang li
[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139915#comment-14139915 ]

kang li commented on TS-3085:
-

Hi [~sudheerv],

I think the corruption of the SSL error stack may be related to [TS:2986|https://issues.apache.org/jira/browse/TS-2986], because it removed SSLErrorVC to eliminate the SSL error logging in diags.log. SSLErrorVC called ERR_get_error_line_data, which cleaned the error stack.

Large POSTs over (relatively) slower connections failing in ats5
Key: TS-3085
URL: https://issues.apache.org/jira/browse/TS-3085
Project: Traffic Server
Issue Type: Bug
Components: SSL
Affects Versions: 5.0.1
Reporter: Sudheer Vinukonda
Assignee: Sudheer Vinukonda
Labels: yahoo
Fix For: 5.2.0

We ran into a production issue where large POSTs (30MB or higher) fail over slower connections after the ats5 rollout (the problem can be easily reproduced using a Charles proxy with throttling enabled). Further debugging isolated the issue to uploads over SSL connections, and after a lot of debugging the issue appears to be the following:

ATS calls SSL_read() followed by SSL_get_error() to check whether there was any error in the read. This is repeated until either the complete data is read or an error occurs. However, according to the OpenSSL documentation, it is recommended to call ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors. It's not clear what might be corrupting the error queue of the SSL context in a tight loop; possibly some new feature in ats5. In any case, calling ERR_clear_error() is a good idea, and adding it seems to resolve the POST failures.

Documentation from OpenSSL and some related notes on Stack Overflow:

https://www.openssl.org/docs/ssl/SSL_get_error.html
http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

{code}
SSL_get_error() returns a result code (suitable for the C ``switch'' statement) for a preceding call to SSL_connect(), SSL_accept(), SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value returned by that TLS/SSL I/O function must be passed to SSL_get_error() in parameter ret.

In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue. You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading an old error that occurred previously in the current thread.
{code}
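The failure mode described in this issue can be sketched with a toy model of a per-thread error queue. This is not the OpenSSL API: the class ToyTLS and its methods are hypothetical stand-ins for ERR_clear_error(), SSL_read() and SSL_get_error(), and the sketch assumes that any entry left in the queue makes the error check report a failure even when the read merely needs to be retried. A slow upload makes such retries frequent, which would fit the symptom.

```python
class ToyTLS:
    """Toy model of a per-thread error queue; not the OpenSSL API."""

    def __init__(self):
        # Stands in for the current thread's OpenSSL error queue.
        self.error_queue = []

    def clear_error(self):
        """Models ERR_clear_error(): empty the queue."""
        self.error_queue.clear()

    def read_would_block(self):
        """Models SSL_read() returning -1 because no data is available yet."""
        return -1

    def get_error(self, ret):
        """Models SSL_get_error(): a queued error takes precedence."""
        if ret > 0:
            return "SSL_ERROR_NONE"
        if self.error_queue:
            return "SSL_ERROR_SSL"
        return "SSL_ERROR_WANT_READ"


tls = ToyTLS()
tls.error_queue.append("stale error left by an earlier operation")

# Without clearing first, a harmless retry looks like a fatal error.
without_clear = tls.get_error(tls.read_would_block())

# Clearing the queue before the read gives the expected result.
tls.clear_error()
with_clear = tls.get_error(tls.read_would_block())
```

In this model without_clear is "SSL_ERROR_SSL" and with_clear is "SSL_ERROR_WANT_READ", which mirrors why clearing the queue before each read resolves the failures.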
[jira] [Created] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
kang li created TS-3078:
---
Summary: MIMEHdrImpl::unmarshal crash caused by cache corruption
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Reporter: kang li

{code}
(gdb) bt
#0 0x005c22c6 in unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3534
#1 MIMEHdrImpl::unmarshal (this=0x2aaed05f61f8, offset=46930308587840) at MIME.cc:3590
#2 0x005b4bcb in HdrHeap::unmarshal (this=0x2aaed05f6140, buf_length=<value optimized out>, obj_type=<value optimized out>, found_obj=<value optimized out>, block_ref=<value optimized out>) at HdrHeap.cc:926
#3 0x005b8be1 in HTTPInfo::unmarshal (buf=0x2aaed05f6048 "\355\336\315\253", len=3280, block_ref=0x2aae080feec0) at HTTP.cc:1948
#4 0x00635e43 in unmarshal_helper (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2066
#5 CacheVC::handleReadDone (this=0x2aae10081b90, event=<value optimized out>, e=<value optimized out>) at Cache.cc:2195
#6 0x005f1845 in handleEvent (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
#7 AIOCallbackInternal::io_complete (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at ../../iocore/aio/P_AIO.h:123
#8 0x006a990f in handleEvent (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at I_Continuation.h:146
#9 EThread::process_event (this=0x2aadf0809010, e=0x2aae3812b710, calling_code=1) at UnixEThread.cc:141
#10 0x006aa48b in EThread::execute (this=0x2aadf0809010) at UnixEThread.cc:192
#11 0x006a87aa in spawn_thread_internal (a=0x13e0a80) at Thread.cc:88
#12 0x2aadeaf3c9d1 in start_thread () from /lib64/libpthread.so.0
#13 0x0034cfce8b6d in clone () from /lib64/libc.so.6
(gdb) p m_freetop
$1 = 892220471
(gdb) p m_field_slots
$2 = {{m_ptr_name = 0x33203a2274616c22 <Address 0x33203a2274616c22 out of bounds>, m_ptr_value = 0x393938312e33 <Address 0x393938312e33 out of bounds>, m_next_dup = 0x3a226e6f6c22202c, m_wks_idx = 11552, m_len_name = 14137, m_len_value = 3289390, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 3 '\003', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x766c4ccefc939174 <Address 0x766c4ccefc939174 out of bounds>, m_ptr_value = 0x222056def09983ac <Address 0x222056def09983ac out of bounds>, m_next_dup = 0x203a4d1444c0d5b3, m_wks_idx = 21538, m_len_name = 9048, m_len_value = 7890260, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x656f7722202c2273 <Address 0x656f7722202c2273 out of bounds>, m_ptr_value = 0x3039373231203a22 <Address 0x3039373231203a22 out of bounds>, m_next_dup = 0x697a22202c393739, m_wks_idx = 8816, m_len_name = 8250, m_len_value = 3553058, m_n_v_raw_printable = 0 '\000', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 0 '\000'},
  {m_ptr_name = 0x7d5d4b2bf0819670 <Address 0x7d5d4b2bf0819670 out of bounds>, m_ptr_value = 0x796b8e1844d2836c <Address 0x796b8e1844d2836c out of bounds>, m_next_dup = 0x756c8c24f2da9b62, m_wks_idx = 8805, m_len_name = 31546, m_len_value = 7152160, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 0 '\000', m_readiness = 2 '\002', m_flags = 1 '\001'},
  {m_ptr_name = 0x22207b203a227370 <Address 0x22207b203a227370 out of bounds>, m_ptr_value = 0x22203a2272646461 <Address 0x22203a2272646461 out of bounds>, m_next_dup = 0x626d452034323332, m_wks_idx = 29285, m_len_name = 22304, m_len_value = 6582127, m_n_v_raw_printable = 1 '\001', m_n_v_raw_printable_pad = 1 '\001', m_readiness = 3 '\003', m_flags = 1 '\001'},
  {m_ptr_name = 0x6c654b202c724420 <Address 0x6c654b202c724420 out of bounds>,
{code}

This happened on only one machine out of four nodes, about once every several days. The problem persisted after disabling plugins and restarting ATS. There are also some warnings in traffic.out. Interestingly, every warning occurred after 776 bytes.

{code}
WARNING: Unmarshal failed due to unknow obj type 173 after 776 bytes
Dumping header heap @ 0x2b37b251e140 - len 2250
--
0x2b37b251e140: 0xdcbafeed 0x0 0xb251e4b8 0x2b37
0x2b37b251e150: 0xb251e1c8 0x2b37 0x378 0x0
0x2b37b251e160: 0x0 0x0 0x0 0x0
0x2b37b251e170: 0x0 0x0 0x956b2af0 0x2b39
0x2b37b251e180: 0xb251e4b8 0x2b37 0x552 0x312e312f
0x2b37b251e190: 0x31323420 0x63655220 0x0 0x0
0x2b37b251e1a0: 0xd646e75 0x6361430a 0x432d6568 0x72746e6f
0x2b37b251e1b0: 0x0 0x0 0xd657461 0x6e6f430a
0x2b37b251e1c0: 0x89 0x7079542d 0x3003 0x1
0x2b37b251e1d0: 0x10001 0x480004 0xb251e448 0x2b37
0x2b37b251e1e0: 0xb251e4b8 0x2b37 0x6e0003 0x0
0x2b37b251e1f0: 0xb251e1f8 0x2b37 0x25004 0x2b35
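One way to look at the gdb values above is to read them as bytes rather than as numbers. The sketch below assumes a little-endian host (the /lib64 paths suggest x86-64, but the report does not say so) and uses a hypothetical helper, le_ascii. It decodes m_freetop and the first field slot as ASCII.

```python
def le_ascii(value: int, width: int = 8) -> str:
    """Render an integer as the ASCII text of its little-endian bytes."""
    raw = value.to_bytes(width, "little")
    # Replace non-printable bytes with "." so the result stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Values copied from the gdb session in the report.
m_freetop = 892220471
first_slot = {
    "m_ptr_name": 0x33203A2274616C22,
    "m_ptr_value": 0x393938312E33,
    "m_next_dup": 0x3A226E6F6C22202C,
}

freetop_text = le_ascii(m_freetop, width=4)
decoded = {name: le_ascii(value) for name, value in first_slot.items()}
```

Under that assumption, freetop_text is `74.5`, m_ptr_name decodes to `"lat": 3` and m_next_dup to `, "lon":`. In other words, the supposed pointers hold fragments of JSON-like text, which suggests the header heap was overwritten with unrelated data rather than holding valid addresses.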
[jira] [Updated] (TS-3078) MIMEHdrImpl::unmarshal crash caused by cache corruption
[ https://issues.apache.org/jira/browse/TS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-3078:
Affects Version/s: 4.0.2

MIMEHdrImpl::unmarshal crash caused by cache corruption
---
Key: TS-3078
URL: https://issues.apache.org/jira/browse/TS-3078
Project: Traffic Server
Issue Type: Bug
Components: Cache, MIME
Affects Versions: 4.0.2
Reporter: kang li
[jira] [Commented] (TS-306) enable log rotation for diags.log
[ https://issues.apache.org/jira/browse/TS-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131326#comment-14131326 ]

kang li commented on TS-306:

I'm now working on this issue. There are two common ways to do log rotation.

1. Use a pipe for log rotation
   Pro: Easy to configure. Can take advantage of several logging tools, e.g. cronolog, multilog.
   Con: Needs additional processes to do the logging.

2. Use logrotate for log rotation
   Pro: System-level log rotation management.
   Con: Needs extra configuration. Need to figure out how to reload log files.

I prefer using a pipe for log rotation, as it's easy to use. Are there any concerns about the pipe file? Any suggestions would be appreciated.

Comment from [~bcall]:
I would lean towards using inotify/kqueue to see if the file has been removed or renamed and then reopen the file. There is more overhead and more complexity with the external tools with the pipe approach.
-Bryan

enable log rotation for diags.log
-
Key: TS-306
URL: https://issues.apache.org/jira/browse/TS-306
Project: Traffic Server
Issue Type: Improvement
Components: Logging
Reporter: Miles Libbey
Priority: Critical
Fix For: 5.3.0

(from yahoo bug 913896)

Original description by Leif Hedstrom 3 years ago at 2006-12-04 12:42
There might be reasons why this file might get filled up, e.g. libraries used by plugins producing output on STDOUT/STDERR. A few suggestions have been made, to somehow rotate traffic.out. One possible solution (suggested by Ryan) is to use cronolog (http://cronolog.org/), which seems like a fine idea.

Comment 1 by Joseph Rothrock 2 years ago at 2007-10-17 09:13:24
Maybe consider rolling diags.log as well. -Feature enhancement.

Comment 2 by Kevin Dalley 13 months ago at 2009-03-04 15:32:18
When traffic.out gets filled up, error.log stops filling up, even though rotation is turned on. This is counter-intuitive. Rotation does not control traffic.out, but a large traffic.out will stop error.log from being written.
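The "detect that the file was removed or renamed, then reopen it" idea from Bryan's comment can be sketched without inotify/kqueue. The sketch below swaps in a simpler polling check: it compares the device and inode of the path on disk with those of the open descriptor. The function names needs_reopen and write_line are hypothetical, and this is not how ATS implements it.

```python
import os

def needs_reopen(path: str, fd: int) -> bool:
    """True if `path` no longer refers to the file behind the open descriptor."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True  # the log file was removed or renamed away
    opened = os.fstat(fd)
    return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)

def write_line(path: str, fd: int, line: str) -> int:
    """Append a line, reopening `path` first if it was rotated away.

    Returns the descriptor to use for the next write.
    """
    if needs_reopen(path, fd):
        os.close(fd)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.write(fd, line.encode() + b"\n")
    return fd
```

With this approach an external tool such as logrotate can rename the log file, and the next write goes to a freshly created file. An event-based watcher would avoid the stat call on every write, at the cost of platform-specific code.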
[jira] [Commented] (TS-2653) SSL Error message cleanup
[ https://issues.apache.org/jira/browse/TS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112019#comment-14112019 ]

kang li commented on TS-2653:
-

Hi [~amc],

One thing I want to mention is that Brian has moved almost all of these SSL errors into the debug log in [TS:2986|https://issues.apache.org/jira/browse/TS-2986] and added a stat for the number of errors.

SSL Error message cleanup
-
Key: TS-2653
URL: https://issues.apache.org/jira/browse/TS-2653
Project: Traffic Server
Issue Type: Bug
Components: Logging, SSL
Reporter: Bryan Call
Assignee: Susan Hinrichs
Fix For: 5.2.0

We see a lot of SSL error messages in production. It would be good to determine whether these are really errors, or to remove the logging of some of them:

{code}
-bash-4.1$ tail -10 diags.log | cut -f4-20 -d : | grep SSL | sort | uniq -c | sort -rn
3108 SSL::36:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3079 SSL::32:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3068 SSL::27:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3051 SSL::44:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3043 SSL::24:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3041 SSL::47:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3041 SSL::38:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3040 SSL::46:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3025 SSL::34:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3025 SSL::25:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3021 SSL::31:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3011 SSL::42:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3006 SSL::39:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3004 SSL::29:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
3000 SSL::30:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2996 SSL::43:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2993 SSL::45:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2977 SSL::40:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2976 SSL::33:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2974 SSL::41:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2974 SSL::28:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2958 SSL::37:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2947 SSL::35:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
2922 SSL::26:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0
28 SSL::36:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45
26 SSL::24:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45
25 SSL::44:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45
25 SSL::27:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45
24 SSL::34:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45
24 SSL::30:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 
alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::39:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::33:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::32:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 22 SSL::44:error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca:s3_pkt.c:1256:SSL alert number 48 21
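For readers decoding the hex codes in the log above: in OpenSSL 1.0.x an error code packs an 8-bit library number, a 12-bit function number and a 12-bit reason, and a reason of 1000 or more is a TLS alert received from the peer (the offset 1000 is the value of SSL_AD_REASON_OFFSET in that release line). The sketch below is only an illustration of that layout; the names PackedSslError, unpack_ssl_error and alert_number are made up here and are not OpenSSL or ATS functions.

```cpp
#include <cstdint>

// Fields of a packed OpenSSL 1.0.x error code:
// 8-bit library, 12-bit function, 12-bit reason.
// (Hypothetical helper type, not part of OpenSSL.)
struct PackedSslError {
  unsigned lib;
  unsigned func;
  unsigned reason;
};

// Assumption: matches SSL_AD_REASON_OFFSET in OpenSSL 1.0.x.
constexpr unsigned kAlertReasonOffset = 1000;

// Split a packed error code into its three fields.
inline PackedSslError unpack_ssl_error(std::uint32_t code) {
  return PackedSslError{(code >> 24) & 0xFFu, (code >> 12) & 0xFFFu, code & 0xFFFu};
}

// Returns the TLS alert number carried by the error,
// or -1 if the reason is not a peer alert.
inline int alert_number(std::uint32_t code) {
  const unsigned reason = unpack_ssl_error(code).reason;
  return reason >= kAlertReasonOffset ? static_cast<int>(reason - kAlertReasonOffset) : -1;
}
```

With this layout, 140943E8 decodes to alert 0 (close_notify), 14094415 to alert 45 (certificate_expired) and 14094418 to alert 48 (unknown_ca), which agrees with the "SSL alert number" suffix on each log line.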
[jira] [Created] (TS-3018) Client Request's default port can't be changed after scheme was changed
kang li created TS-3018:
---------------------------

Summary: Client Request's default port can't be changed after scheme was changed
Key: TS-3018
URL: https://issues.apache.org/jira/browse/TS-3018
Project: Traffic Server
Issue Type: Bug
Components: HTTP
Reporter: kang li

Suppose I receive a client request with scheme http, and no port is set in the Host header or in an absolute URI in the request. I then change the scheme to https. During URL remap, the port is still reported as 80, the default for the previous scheme.

{code}
} else if (0 != (m_host_mime = const_cast<HTTPHdr*>(this)->get_host_port_values(0, &m_host_length, &port_ptr, 0))) {
  if (port_ptr) {
    m_port = 0;
    for ( ; is_digit(*port_ptr) ; ++port_ptr )
      m_port = m_port * 10 + *port_ptr - '0';
    m_port_in_header = (0 != m_port);
  }
  m_port = url_canonicalize_port(url->m_url_impl->m_url_type, m_port);
{code}

It seems that m_port is not reset to 0 the second time the port is derived from the URL type (which was reset in UrlSchemeSet), so the lookup always returns the port computed the first time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
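The behaviour described in this report can be modelled with a small standalone sketch. Everything below is hypothetical and only mirrors the report's reading of the code: canonical_port stands in for url_canonicalize_port, cached_port plays the role of m_port, and port_fresh shows one possible reset, which is not necessarily the fix that was applied in ATS.

```cpp
#include <string>

// Hypothetical stand-in for url_canonicalize_port():
// keep an explicit port, otherwise use the scheme default.
inline int canonical_port(const std::string &scheme, int port) {
  if (port != 0) {
    return port;
  }
  return scheme == "https" ? 443 : 80;
}

// Toy request model; not the ATS HTTPHdr class.
struct Request {
  std::string scheme = "http";
  int header_port = 0; // 0 means the Host header carried no port
  int cached_port = 0; // plays the role of m_port

  // Stale variant: the value cached by an earlier call is fed back in,
  // so a later scheme change cannot alter the result.
  int port_stale() {
    if (header_port != 0) {
      cached_port = header_port;
    }
    cached_port = canonical_port(scheme, cached_port);
    return cached_port;
  }

  // Reset variant: start from the header value (possibly 0) every time.
  int port_fresh() {
    cached_port = canonical_port(scheme, header_port);
    return cached_port;
  }
};
```

Calling port_stale() once with scheme http caches 80; after switching the scheme to https it still returns 80, while port_fresh() returns 443.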
[jira] [Commented] (TS-3018) Client Request's default port can't be changed after scheme was changed
[ https://issues.apache.org/jira/browse/TS-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098571#comment-14098571 ]

kang li commented on TS-3018:
-----------------------------

There is another issue: when I use TSUrlPortSet to change the port of a client request, the change does not take effect unless I also set the host in the URL. I'm not sure whether this is by design, but it confused me that calling the interface had no effect. Only after reading the code did I see that the port is read from the URL only when the host can also be read from the URL.

{code}
if (0 != url->host_get(&m_host_length)) {
  m_target_in_url = true;
  m_port = url->port_get();
  m_port_in_header = 0 != url->port_get_raw();
  m_host_mime = NULL;
{code}

Client Request's default port can't be changed after scheme was changed
-----------------------------------------------------------------------

Key: TS-3018
URL: https://issues.apache.org/jira/browse/TS-3018
Project: Traffic Server
Issue Type: Bug
Components: HTTP
Reporter: kang li
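The lookup order described in this comment can be sketched as follows. This is a simplified model under stated assumptions, not ATS code: Url, HostHeader and effective_port are hypothetical names, and the real logic lives in the excerpt quoted above.

```cpp
#include <string>

// Hypothetical, simplified types; not the ATS URL or MIME classes.
struct Url {
  std::string host;
  int port = 0; // 0 means no explicit port
};

struct HostHeader {
  std::string host;
  int port = 0; // 0 means no explicit port
};

// The URL's port is consulted only when the URL also carries a host.
// Otherwise the Host header is used, then the scheme default.
inline int effective_port(const Url &url, const HostHeader &hdr, int scheme_default) {
  if (!url.host.empty()) {
    return url.port != 0 ? url.port : scheme_default;
  }
  if (!hdr.host.empty() && hdr.port != 0) {
    return hdr.port;
  }
  return scheme_default;
}
```

In this model, setting only url.port (the analogue of calling TSUrlPortSet on a URL without a host) leaves the result at the scheme default; the port takes effect once url.host is set as well.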
[jira] [Commented] (TS-2580) SSL Connection reset by peer errors in 4.2.0-rc0
[ https://issues.apache.org/jira/browse/TS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013364#comment-14013364 ]

kang li commented on TS-2580:
-----------------------------

This log message was introduced by the patch for [TS-2096|https://issues.apache.org/jira/browse/TS-2096]. Before that patch:
{quote}
We were passing a format string to SSLDiagnostic(), but never actually using it when we logged the error.
{quote}

SSL Connection reset by peer errors in 4.2.0-rc0
------------------------------------------------

Key: TS-2580
URL: https://issues.apache.org/jira/browse/TS-2580
Project: Traffic Server
Issue Type: Bug
Components: Network, SSL
Affects Versions: 4.2.0
Reporter: David Carlin
Assignee: Bryan Call
Priority: Blocker
Fix For: 5.0.0

This error is filling /var/log/messages when using 4.2.0-rc0:

{noformat}
Feb 20 04:44:33 l1 traffic_server[28428]: {0x2adcd9029700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer
{noformat}

Did something change in the logging level, and are the connection resets normal?

What's interesting is that this problem comes and goes as I progress through the git bisect for TS-2564, so I may need to do another git bisect for this issue. TS-2548 would help me troubleshoot this via tcpdump.
[jira] [Commented] (TS-2580) SSL Connection reset by peer errors in 4.2.0-rc0
[ https://issues.apache.org/jira/browse/TS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013482#comment-14013482 ]

kang li commented on TS-2580:
-----------------------------

I ran into this issue in https://issues.apache.org/jira/browse/TS-2548?focusedCommentId=13966382&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13966382. It may be related to an unexpected close from the peer side. Could we just change the log level to debug?

SSL Connection reset by peer errors in 4.2.0-rc0
------------------------------------------------

Key: TS-2580
URL: https://issues.apache.org/jira/browse/TS-2580
Project: Traffic Server
Issue Type: Bug
Components: Network, SSL
Affects Versions: 4.2.0
Reporter: David Carlin
Assignee: Bryan Call
Priority: Blocker
Fix For: 5.0.0
[jira] [Updated] (TS-2580) SSL Connection reset by peer errors in 4.2.0-rc0
[ https://issues.apache.org/jira/browse/TS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-2580:
------------------------
    Attachment: TS-2580.diff

Suppress verbose error log of SSL_ERROR_SYSCALL in ssl_read_from_net.

SSL Connection reset by peer errors in 4.2.0-rc0
------------------------------------------------

Key: TS-2580
URL: https://issues.apache.org/jira/browse/TS-2580
Project: Traffic Server
Issue Type: Bug
Components: Network, SSL
Affects Versions: 4.2.0
Reporter: David Carlin
Assignee: Bryan Call
Priority: Blocker
Fix For: 5.0.0
Attachments: TS-2580.diff
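One way to picture the suppression proposed here is a small classifier that downgrades routine peer disconnects to debug level and keeps everything else as an error. This is a sketch only and does not reproduce the contents of TS-2580.diff; LogLevel and level_for_syscall_errno are hypothetical names, and treating EPIPE the same as ECONNRESET is an assumption made for illustration.

```cpp
#include <cerrno>

// Hypothetical log levels; ATS has its own diagnostics macros.
enum class LogLevel { Debug, Error };

// Decide how loudly to log an SSL_ERROR_SYSCALL based on the underlying errno.
// Peer-initiated disconnects are routine on a busy proxy, so they go to debug.
inline LogLevel level_for_syscall_errno(int err) {
  switch (err) {
  case ECONNRESET: // "Connection reset by peer", the case in this report
  case EPIPE:      // assumption: write to a connection the peer already closed
    return LogLevel::Debug;
  default:
    return LogLevel::Error;
  }
}
```

A caller in the read path would look up the level first and emit the message only through the matching channel, so /var/log/messages no longer fills with resets while unexpected I/O errors stay visible.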
[jira] [Updated] (TS-2837) Dangling pointer in URLImpl which may cause core dump
[ https://issues.apache.org/jira/browse/TS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-2837:
------------------------
    Attachment: ts-2837.diff

Dangling pointer in URLImpl which may cause core dump
-----------------------------------------------------

Key: TS-2837
URL: https://issues.apache.org/jira/browse/TS-2837
Project: Traffic Server
Issue Type: Bug
Components: HTTP, Logging
Affects Versions: 4.0.2
Reporter: kang li
Labels: crash
Attachments: ts-2837.diff
[jira] [Created] (TS-2837) Dangling pointer in URLImpl which may cause core dump
kang li created TS-2837:
---------------------------

Summary: Dangling pointer in URLImpl which may cause core dump
Key: TS-2837
URL: https://issues.apache.org/jira/browse/TS-2837
Project: Traffic Server
Issue Type: Bug
Components: HTTP, Logging
Reporter: kang li

A core dump shows that URLImpl::m_ptr_printed_string pointed to an out-of-bounds address.

{code}
#0 ink_strlcpy (dst=<value optimized out>, src=0x2b64901a79a4 <Address 0x2b64901a79a4 out of bounds>, siz=<value optimized out>) at ink_string.cc:226
#1 0x0058d820 in LogAccessHttp::init (this=0x2b631752da20) at LogAccessHttp.cc:96
#2 0x0058a96f in resolve_logfield_string (context=0x2b631752da20, format_str=0x3468e10 "<!DOCTYPE html>\n<html lang=\"en-us\"><head>\n<meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">\n<meta charset=\"utf-8\">\n<title>Yahoo</title>\n<meta name=\"viewport\" content=\"wid"...) at LogAccess.cc:1396
#3 0x00508fed in HttpBodyTemplate::build_instantiated_buffer (this=0x3468de0, context=<value optimized out>, buflen_return=0x2b63e2c42068) at HttpBodyFactory.cc:1015
#4 0x00509ea4 in HttpBodyFactory::fabricate (this=<value optimized out>, acpt_language_list=0x2b631752e130, acpt_charset_list=0x2b631752dfd0, type=0x6d2c3d "connect#failed_connect", context=0x2b63e2c416d8, buffer_length_return=0x2b63e2c42068, content_language_return=0x2b631752e2a8, content_charset_return=0x2b631752e2a0, set_return=0x2b631752e298) at HttpBodyFactory.cc:451
#5 0x0050b1b7 in HttpBodyFactory::fabricate_with_old_api(const char *, HttpTransact::State *, int64_t, int64_t *, char *, size_t, char *, size_t, const char *, typedef __va_list_tag __va_list_tag *) (this=0x344e850, type=0x6d2c3d "connect#failed_connect", context=0x2b63e2c416d8, max_buffer_length=8192, resulting_buffer_length=0x2b63e2c42068, content_language_out_buf=0x2b631752e460 "en", content_language_buf_size=256, content_type_out_buf=0x2b631752e360 "text/html", content_type_buf_size=256, format=0x6ce288 "internal error - server connection terminated", ap=0x2b631752e590) at HttpBodyFactory.cc:137
#6 0x0054cd74 in HttpTransact::build_error_response (s=0x2b63e2c416d8, status_code=HTTP_STATUS_BAD_GATEWAY, reason_phrase_or_null=0x6ce288 "internal error - server connection terminated", error_body_type=<value optimized out>, format=0x6ce288 "internal error - server connection terminated") at HttpTransact.cc:7998
#7 0x00551b42 in HttpTransact::handle_server_connection_not_open (s=0x2b63e2c416d8) at HttpTransact.cc:3719
#8 0x00562f31 in HttpTransact::HandleResponse (s=0x2b63e2c416d8) at HttpTransact.cc:3180
#9 0x0051c588 in HttpSM::call_transact_and_set_next_state (this=0x2b63e2c41670, f=<value optimized out>) at HttpSM.cc:6817
#10 0x00531f89 in HttpSM::state_http_server_open (this=0x2b63e2c41670, event=201, data=0xff9d) at HttpSM.cc:1712
#11 0x00530b38 in HttpSM::main_handler (this=0x2b63e2c41670, event=201, data=0xff9d) at HttpSM.cc:2548
#12 0x006878ec in handleEvent (this=0x2b64ac19f870, t=0x2b6315111010) at ../../iocore/eventsystem/I_Continuation.h:146
#13 UnixNetVConnection::connectUp (this=0x2b64ac19f870, t=0x2b6315111010) at UnixNetVConnection.cc:1104
#14 0x006849ca in UnixNetProcessor::connect_re_internal (this=<value optimized out>, cont=<value optimized out>, target=<value optimized out>, opt=0x2b631752f990) at UnixNetProcessor.cc:250
#15 0x005314ae in connect_re (this=0x2b63e2c41670, raw=<value optimized out>) at ../../iocore/net/P_UnixNetProcessor.h:89
#16 HttpSM::do_http_server_open (this=0x2b63e2c41670, raw=<value optimized out>) at HttpSM.cc:4676
#17 0x00536446 in HttpSM::set_next_state (this=0x2b63e2c41670) at HttpSM.cc:7006
#18 0x005371ef in HttpSM::state_send_server_request_header (this=0x2b63e2c41670, event=104, data=0x2b64144c7e28) at HttpSM.cc:2008
#19 0x00530b38 in HttpSM::main_handler (this=0x2b63e2c41670, event=104, data=0x2b64144c7e28) at HttpSM.cc:2548
#20 0x006872a1 in handleEvent (event=<value optimized out>, nh=0x2b6315114e20, vc=0x2b64144c7d20) at ../../iocore/eventsystem/I_Continuation.h:146
#21 read_signal_and_update (event=<value optimized out>, nh=0x2b6315114e20, vc=0x2b64144c7d20) at UnixNetVConnection.cc:138
#22 read_signal_done (event=<value optimized out>, nh=0x2b6315114e20, vc=0x2b64144c7d20) at UnixNetVConnection.cc:168
#23 0x00689ab6 in read_from_net (nh=0x2b6315114e20, vc=0x2b64144c7d20, thread=<value optimized out>) at UnixNetVConnection.cc:291
#24 0x0068 in NetHandler::mainNetEvent (this=0x2b6315114e20, event=<value optimized out>, e=<value optimized out>) at UnixNet.cc:386
#25 0x006a95cf in handleEvent (this=0x2b6315111010, e=0x2bdb270, calling_code=5) at I_Continuation.h:146
#26 EThread::process_event (this=0x2b6315111010, e=0x2bdb270, calling_code=5) at
[jira] [Updated] (TS-2837) Dangling pointer in URLImpl which may cause core dump
[ https://issues.apache.org/jira/browse/TS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kang li updated TS-2837:
------------------------
    Affects Version/s: 4.0.2

Dangling pointer in URLImpl which may cause core dump
-----------------------------------------------------

Key: TS-2837
URL: https://issues.apache.org/jira/browse/TS-2837
Project: Traffic Server
Issue Type: Bug
Components: HTTP, Logging
Affects Versions: 4.0.2
Reporter: kang li
[jira] [Comment Edited] (TS-2837) Dangling pointer in URLImpl which may cause core dump
[ https://issues.apache.org/jira/browse/TS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007134#comment-14007134 ]

kang li edited comment on TS-2837 at 5/23/14 1:00 PM:
------------------------------------------------------

Core dumps still occur after the patch in TS-1411, and they seem to be the same issue.

was (Author: kang li):
There are still core dump after the patch in TS-1411.

Dangling pointer in URLImpl which may cause core dump
-----------------------------------------------------

Key: TS-2837
URL: https://issues.apache.org/jira/browse/TS-2837
Project: Traffic Server
Issue Type: Bug
Components: HTTP, Logging
Affects Versions: 4.0.2
Reporter: kang li
[jira] [Commented] (TS-2837) Dangling pointer in URLImpl which may cause core dump
[ https://issues.apache.org/jira/browse/TS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007134#comment-14007134 ]

kang li commented on TS-2837:
-----------------------------

Core dumps still occur after the patch in TS-1411.

Dangling pointer in URLImpl which may cause core dump
-----------------------------------------------------

Key: TS-2837
URL: https://issues.apache.org/jira/browse/TS-2837
Project: Traffic Server
Issue Type: Bug
Components: HTTP, Logging
Affects Versions: 4.0.2
Reporter: kang li
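The report only establishes that URLImpl::m_ptr_printed_string pointed to an out-of-bounds address when the log code copied from it; the root cause is not stated in this thread. As a generic illustration of the hazard class named in the title, the sketch below shows a cached "printed" string that owns its storage and is invalidated whenever the underlying data changes, so it cannot outlive the buffer it was built from. PrintedUrl and its members are hypothetical and do not reflect the ATS implementation or the attached patch.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class; only illustrates cache invalidation for a printed form.
class PrintedUrl {
public:
  explicit PrintedUrl(std::string host) : host_(std::move(host)) {}

  // Any mutation drops the cached printed form, so no stale pointer survives.
  void set_host(std::string host) {
    host_ = std::move(host);
    printed_.reset();
  }

  // Lazily builds the printed form; the cache owns its own storage.
  const std::string &printed() {
    if (!printed_) {
      printed_.reset(new std::string("http://" + host_ + "/"));
    }
    return *printed_;
  }

private:
  std::string host_;
  std::unique_ptr<std::string> printed_;
};
```

The design choice is that the cache never borrows memory from another allocator or heap whose lifetime it does not control; a raw pointer into such memory is what typically ends up dangling.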
[jira] [Commented] (TS-2653) SSL Error message cleanup
[ https://issues.apache.org/jira/browse/TS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003197#comment-14003197 ] kang li commented on TS-2653: - Hi [~bcall], I investigated the alert 0 error. Based on the tcpdump results and code analysis, it occurs under two conditions. 1. libsecurity_ssl hits an error while reading an SSL record and then sends a fatal alert 0 to the server. This case is hard to avoid because it is triggered on the client side. One simple fix would be to ignore this CLOSE_NOTIFY error, which originates in libsecurity_ssl. Alternatively, it may be related to other issues that trigger libsecurity_ssl read errors. 2. ATS hits a read error and then shuts down the TCP connection without sending close notify to the client. This breaks the RFC standard, so libsecurity_ssl responds with fatal alert 0. I tried to fix this by sending close notify before closing the TCP connection, but the results show that the close notify was not sent successfully, because the TCP connection may already have been shut down before close_UnixNetVConnection is called. These alert 0 errors do not indicate a real error, since the access log always shows a successful request. I'm now working on high-priority issues and will come back to this issue when I have free time. SSL Error message cleanup - Key: TS-2653 URL: https://issues.apache.org/jira/browse/TS-2653 Project: Traffic Server Issue Type: Bug Components: Logging, SSL Reporter: Bryan Call Assignee: Bryan Call Fix For: 5.0.0 We see a lot of SSL error messages in production. 
It would be good to determine if these are really errors or remove logging of some of these errors: {code} -bash-4.1$ tail -10 diags.log | cut -f4-20 -d : | grep SSL | sort | uniq -c | sort -rn 3108 SSL::36:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3079 SSL::32:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3068 SSL::27:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3051 SSL::44:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3043 SSL::24:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3041 SSL::47:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3041 SSL::38:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3040 SSL::46:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3025 SSL::34:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3025 SSL::25:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3021 SSL::31:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3011 SSL::42:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3006 SSL::39:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3004 SSL::29:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3000 SSL::30:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2996 SSL::43:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2993 SSL::45:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2977 SSL::40:error:140943E8:SSL 
routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2976 SSL::33:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2974 SSL::41:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2974 SSL::28:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2958 SSL::37:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2947 SSL::35:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2922 SSL::26:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 28 SSL::36:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 26 SSL::24:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 25 SSL::44:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 25 SSL::27:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3
[jira] [Updated] (TS-2789) Typo in HttpSessionManger would cause ATS reuse wrong session to origin server.
[ https://issues.apache.org/jira/browse/TS-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kang li updated TS-2789: Description: There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be {code} - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) {code} This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. was: There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. Typo in HttpSessionManger would cause ATS reuse wrong session to origin server. --- Key: TS-2789 URL: https://issues.apache.org/jira/browse/TS-2789 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: kang li There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be {code} - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) {code} This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
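The effect of the self-comparison described in TS-2789 can be sketched in isolation. The following is a minimal illustration, not the Traffic Server code: `PooledSession`, `make_addr`, `port_of`, `match_buggy` and `match_fixed` are hypothetical names, and plain IPv4 `sockaddr_in` values stand in for the address types that `ats_ip_addr_eq` and `ats_ip_port_cast` operate on.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a pooled origin session; only the peer address matters here.
struct PooledSession {
  sockaddr_in server_ip;
};

// Build an IPv4 socket address from a dotted-quad string and a host-order port.
static sockaddr_in make_addr(const char *ip, uint16_t port) {
  sockaddr_in a;
  std::memset(&a, 0, sizeof(a));
  a.sin_family = AF_INET;
  a.sin_port = htons(port);
  inet_pton(AF_INET, ip, &a.sin_addr);
  return a;
}

// Illustrative helpers playing the role of the address and port accessors.
static bool addr_eq(const sockaddr_in &a, const sockaddr_in &b) {
  return a.sin_addr.s_addr == b.sin_addr.s_addr;
}
static uint16_t port_of(const sockaddr_in &a) { return a.sin_port; }

// Buggy form: the requested port is compared with itself, so the port
// check is always true and only the IP address decides the match.
static bool match_buggy(const PooledSession &s, const sockaddr_in &addr) {
  return addr_eq(s.server_ip, addr) && port_of(addr) == port_of(addr);
}

// Fixed form: the pooled session's port is compared with the requested port.
static bool match_fixed(const PooledSession &s, const sockaddr_in &addr) {
  return addr_eq(s.server_ip, addr) && port_of(s.server_ip) == port_of(addr);
}
```

With a session pooled for 192.0.2.10:8080 and a request for 192.0.2.10:9090, `match_buggy` reports a match while `match_fixed` does not, which is the wrong-session reuse the ticket describes.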
[jira] [Created] (TS-2789) Typo in HttpSessionManger would cause ATS reuse wrong session to origin server.
kang li created TS-2789: --- Summary: Typo in HttpSessionManger would cause ATS reuse wrong session to origin server. Key: TS-2789 URL: https://issues.apache.org/jira/browse/TS-2789 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: kang li There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TS-2789) Typo in HttpSessionManger would cause ATS reuse wrong session to origin server.
[ https://issues.apache.org/jira/browse/TS-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kang li updated TS-2789: Description: There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. was: There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. Typo in HttpSessionManger would cause ATS reuse wrong session to origin server. --- Key: TS-2789 URL: https://issues.apache.org/jira/browse/TS-2789 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: kang li There is a typo in HttpSessionManager: (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) The fix would be - (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(addr) == ats_ip_port_cast(addr))) + (ats_ip_addr_eq(&s->server_ip.sa, addr) && ats_ip_port_cast(&s->server_ip.sa) == ats_ip_port_cast(addr))) This typo skips the port check, so requests to the same origin server share one session even when they target different ports. This would cause ATS-5.0 to reuse the wrong session to the origin. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TS-2548) Add client IP to SSLError() calls in SSLNetVConnection
[ https://issues.apache.org/jira/browse/TS-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kang li updated TS-2548: Attachment: ssl_log_enhancement.diff Add the peer IP address to SSLError and SSLDebug. Add client IP to SSLError() calls in SSLNetVConnection --- Key: TS-2548 URL: https://issues.apache.org/jira/browse/TS-2548 Project: Traffic Server Issue Type: Improvement Components: Logging, SSL Reporter: David Carlin Fix For: 5.0.0 Attachments: ssl_log_enhancement.diff I asked on IRC if we could put the Client IP in the SSL errors that appear in diags.log and /var/log/messages - jpeach replied that it was a matter of adding client IP to SSLError() calls in SSLNetVConnection. This would be very helpful for troubleshooting. Additionally, why are the errors sent to /var/log/messages - writing them to only diags.log is preferable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2548) Add client IP to SSLError() calls in SSLNetVConnection
[ https://issues.apache.org/jira/browse/TS-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966382#comment-13966382 ] kang li commented on TS-2548: - Hi [~jpe...@apache.org], Could we just call SSLError for SSL_ERROR_SSL errors in SSLNetVConnection::sslServerHandShakeEvent and ssl_read_from_net, as SSLNetVConnection::sslClientHandShakeEvent and SSLNetVConnection::load_buffer_and_write already do? I saw that you changed SSLError to SSLDebug in serverHandshake out of concern that there could be a lot of error logs. But when I changed it back to SSLError for SSL_ERROR_SSL, I did not see a sharp increase in error logs. Most handshake errors are related to SSL23_GET_CLIENT_HELLO, which only occurs during the handshake. {code} [Apr 11 09:24:57.447] Server {0x2af704037700} ERROR: SSL::54:error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO: http request:s23_srvr.c:418:peer address is xx.xx.xx.xx [Apr 11 09:24:57.447] Server {0x2af704037700} ERROR: SSL handshake error {code} {code} -bash-4.1$ grep -R -E "Apr\s11" diags.log | grep "SSL handshake error" | wc -l 848 -bash-4.1$ grep -R -E "Apr\s11" diags.log | grep ssl_read_from_net | wc -l 2360 -bash-4.1$ grep -R -E "Apr\s11" diags.log | grep SSL23_GET_CLIENT_HELLO | wc -l 599 {code} And when we applied the patch from https://issues.apache.org/jira/browse/TS-2096 to ATS-4.0.2, we saw a lot of error logs related to SSL_ERROR_SYSCALL. {code} [Mar 25 22:48:05.325] Server {0x2aae19ce9700} ERROR: [SSL_NetVConnection::ssl_read_from_net] SSL_ERROR_SYSCALL, underlying IO error: Connection reset by peer {code} This happens when a connection is closed unexpectedly: ATS tries to retransmit the packet several times, and each attempt generates an error log, which seems too verbose. These entries make up about half of the error logs. A useful error log should include the SSL error info and when it occurred, as below. 
{code} [Apr 11 09:12:37.988] Server {0x2af702e25700} ERROR: SSL::36:error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca:s3_pkt.c:1256:SSL alert number 48:peer address is xx.xx.xx.xx [Apr 11 09:12:37.988] Server {0x2af702e25700} ERROR: [SSL_NetVConnection::ssl_read_from_net] {code} If you think this fix is reasonable, I'll submit a patch for this. Add client IP to SSLError() calls in SSLNetVConnection --- Key: TS-2548 URL: https://issues.apache.org/jira/browse/TS-2548 Project: Traffic Server Issue Type: Improvement Components: Logging, SSL Reporter: David Carlin Fix For: 5.0.0 Attachments: ssl_log_enhancement.diff I asked on IRC if we could put the Client IP in the SSL errors that appear in diags.log and /var/log/messages - jpeach replied that it was a matter of adding client IP to SSLError() calls in SSLNetVConnection. This would be very helpful for troubleshooting. Additionally, why are the errors sent to /var/log/messages - writing them to only diags.log is preferable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TS-2709) ATS don't send close notify before close connection which break rfc standard and cause some unepected results
kang li created TS-2709: --- Summary: ATS don't send close notify before close connection which break rfc standard and cause some unepected results Key: TS-2709 URL: https://issues.apache.org/jira/browse/TS-2709 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: kang li ATS sends a FIN directly to the client without sending close notify first. This breaks the RFC standard. It can be easily reproduced by setting CONFIG proxy.config.http.keep_alive_enabled_in INT 0. http://tools.ietf.org/html/rfc5246#section-7.2.1 7.2.1. Closure Alerts The client and the server must share knowledge that the connection is ending in order to avoid a truncation attack. Either party may initiate the exchange of closing messages. close_notify This message notifies the recipient that the sender will not send any more messages on this connection. Note that as of TLS 1.1, failure to properly close a connection no longer requires that a session not be resumed. This is a change from TLS 1.0 to conform with widespread implementation practice. Either party may initiate a close by sending a close_notify alert. Any data received after a closure alert is ignored. This causes Safari on Apple devices to send fatal alert 0 under some conditions, which generates a lot of error logs in diags.log. Apple's SSL library, libsecurity_ssl, sometimes treats an unexpected shutdown as a fatal error. ERROR: SSL::44:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
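To make "alert 0" concrete, here is a minimal sketch of how a close_notify alert is laid out according to RFC 5246: an alert is two bytes (level, description), carried in a record whose content type is 21, and close_notify is description 0. The names `Alert`, `plaintext_alert_record` and `is_close_notify` are hypothetical and exist only for this illustration. The record shown is the unencrypted form with the TLS 1.2 version bytes; on an established connection the two alert bytes are encrypted, so a packet capture will not show them in the clear.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Alert levels defined by RFC 5246, section 7.2.
enum class AlertLevel : uint8_t { warning = 1, fatal = 2 };

constexpr uint8_t kAlertContentType = 21; // ContentType alert(21)
constexpr uint8_t kCloseNotify = 0;       // AlertDescription close_notify(0)

// Hypothetical helper type: one alert message (level + description).
struct Alert {
  AlertLevel level;
  uint8_t description;
};

// Build the plaintext record: type, version major/minor (3,3 = TLS 1.2),
// 16-bit big-endian length (always 2 for an alert), then the alert itself.
inline std::array<uint8_t, 7> plaintext_alert_record(Alert a) {
  return {{kAlertContentType, 3, 3, 0, 2,
           static_cast<uint8_t>(a.level), a.description}};
}

// "SSL alert number 0" in the diags.log entries is this description value.
inline bool is_close_notify(Alert a) { return a.description == kCloseNotify; }
```

A peer that closes cleanly sends `{warning, close_notify}`; the log entries in this ticket correspond to a client sending description 0 at the fatal level, which is why they show up as errors even though the request itself succeeded.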
[jira] [Commented] (TS-2653) SSL Error message cleanup
[ https://issues.apache.org/jira/browse/TS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964966#comment-13964966 ] kang li commented on TS-2653: - The alert 0 error is caused by an unexpected shutdown of the connection. This error message is mostly sent by Safari on Apple devices, because Apple's SSL library, libsecurity_ssl, sometimes treats an unexpected shutdown as a fatal error. SSL Error message cleanup - Key: TS-2653 URL: https://issues.apache.org/jira/browse/TS-2653 Project: Traffic Server Issue Type: Bug Components: Logging, SSL Reporter: Bryan Call Assignee: Bryan Call Fix For: 5.0.0 We see a lot of SSL error messages in production. It would be good to determine if these are really errors or remove logging of some of these errors: {code} -bash-4.1$ tail -10 diags.log | cut -f4-20 -d : | grep SSL | sort | uniq -c | sort -rn 3108 SSL::36:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3079 SSL::32:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3068 SSL::27:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3051 SSL::44:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3043 SSL::24:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3041 SSL::47:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3041 SSL::38:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3040 SSL::46:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3025 SSL::34:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3025 SSL::25:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3021 SSL::31:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert 
number 0 3011 SSL::42:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3006 SSL::39:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3004 SSL::29:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 3000 SSL::30:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2996 SSL::43:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2993 SSL::45:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2977 SSL::40:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2976 SSL::33:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2974 SSL::41:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2974 SSL::28:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2958 SSL::37:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2947 SSL::35:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 2922 SSL::26:error:140943E8:SSL routines:SSL3_READ_BYTES:reason(1000):s3_pkt.c:1256:SSL alert number 0 28 SSL::36:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 26 SSL::24:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 25 SSL::44:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 25 SSL::27:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 24 SSL::34:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 24 SSL::30:error:14094415:SSL 
routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::39:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::33:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 23 SSL::32:error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired:s3_pkt.c:1256:SSL alert number 45 22 SSL::44:error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca:s3_pkt.c:1256:SSL alert number 48 21
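The hexadecimal codes in these log lines can be unpacked to show how they relate to the alert numbers at the end of each line. The sketch below assumes the packed layout used by OpenSSL 1.0.x (8-bit library, 12-bit function, 12-bit reason) and the convention that a fatal alert received from the peer is reported with reason 1000 plus the alert number; newer OpenSSL releases pack error codes differently, so it applies to logs of this vintage only. `DecodedError`, `decode` and `peer_alert_number` are hypothetical names for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fields packed into an OpenSSL 1.0.x error code (assumed layout, see lead-in).
struct DecodedError {
  unsigned lib;    // library number, 20 for the SSL library
  unsigned func;   // function code within that library
  unsigned reason; // reason code
};

constexpr unsigned kSslLib = 20;              // value of ERR_LIB_SSL
constexpr unsigned kAlertReasonOffset = 1000; // value of SSL_AD_REASON_OFFSET

// Split a packed code such as 0x140943E8 into its three fields.
inline DecodedError decode(uint32_t code) {
  return DecodedError{(code >> 24) & 0xFF, (code >> 12) & 0xFFF, code & 0xFFF};
}

// Return the peer's alert number encoded in the reason field, or -1 when the
// code does not come from the SSL library or is not an alert-derived reason.
inline int peer_alert_number(uint32_t code) {
  const DecodedError d = decode(code);
  if (d.lib != kSslLib || d.reason < kAlertReasonOffset) {
    return -1;
  }
  return static_cast<int>(d.reason - kAlertReasonOffset);
}
```

Under these assumptions 140943E8 carries reason 1000, which is alert 0 (close_notify) and explains the `reason(1000)` text, while 14094415 and 14094418 carry reasons 1045 and 1048, matching the "certificate expired" (45) and "unknown ca" (48) alerts in the log.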
[jira] [Commented] (TS-2210) add API to get access to the client cert in the SSL Net VC
[ https://issues.apache.org/jira/browse/TS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908112#comment-13908112 ] kang li commented on TS-2210: - Hi James, Is there any update on this Jira ticket? Do I need to make any further modifications? add API to get access to the client cert in the SSL Net VC -- Key: TS-2210 URL: https://issues.apache.org/jira/browse/TS-2210 Project: Traffic Server Issue Type: Improvement Components: SSL, TS API Reporter: Bryan Call Assignee: James Peach Labels: Review Fix For: 5.0.0 Attachments: 2210.diff, TS-2210-2.diff In SSLNetVConnection SSL_get_peer_certificate(ssl) is called and client_cert is set. There is a request from Brian France to get access to the client cert. He wants to be able to call X509_NAME_oneline(), X509_get_subject_name(), and X509_get_issuer_name() on the cert. Where the cert is set in the code: iocore/net/SSLNetVConnection.cc:499:client_cert = SSL_get_peer_certificate(ssl); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2210) add API to get access to the client cert in the SSL Net VC
[ https://issues.apache.org/jira/browse/TS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896336#comment-13896336 ] kang li commented on TS-2210: - Hi James, The new API is more concise. I also ran a small test, and the new-style API worked well. But with SSL_CTX we would still need the SSL object to get the verify result and the client certificate, and SSLNetVConnection stores the SSL object as a member. So I think returning SSL would be more convenient: {code} void *TSHttpSsnSSLConnectionGet(TSHttpSsn); // Returns SSL * {code} If the SSL_CTX is needed, we can use SSL_get_SSL_CTX to get the related SSL_CTX. If the newer API is suitable, I will send the API review request. add API to get access to the client cert in the SSL Net VC -- Key: TS-2210 URL: https://issues.apache.org/jira/browse/TS-2210 Project: Traffic Server Issue Type: Improvement Components: SSL, TS API Reporter: Bryan Call Assignee: James Peach Fix For: 5.0.0 Attachments: 2210.diff In SSLNetVConnection SSL_get_peer_certificate(ssl) is called and client_cert is set. There is a request from Brian France to get access to the client cert. He wants to be able to call X509_NAME_oneline(), X509_get_subject_name(), and X509_get_issuer_name() on the cert. Where the cert is set in the code: iocore/net/SSLNetVConnection.cc:499:client_cert = SSL_get_peer_certificate(ssl); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (TS-2210) add API to get access to the client cert in the SSL Net VC
[ https://issues.apache.org/jira/browse/TS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kang li updated TS-2210: Attachment: TS-2210-2.diff add API to get access to the client cert in the SSL Net VC -- Key: TS-2210 URL: https://issues.apache.org/jira/browse/TS-2210 Project: Traffic Server Issue Type: Improvement Components: SSL, TS API Reporter: Bryan Call Assignee: James Peach Fix For: 5.0.0 Attachments: 2210.diff, TS-2210-2.diff In SSLNetVConnection SSL_get_peer_certificate(ssl) is called and client_cert is set. There is a request from Brian France to get access to the client cert. He wants to be able to call X509_NAME_oneline(), X509_get_subject_name(), and X509_get_issuer_name() on the cert. Where the cert is set in the code: iocore/net/SSLNetVConnection.cc:499:client_cert = SSL_get_peer_certificate(ssl); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (TS-2210) add API to get access to the client cert in the SSL Net VC
[ https://issues.apache.org/jira/browse/TS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kang li updated TS-2210: Attachment: 2210.diff Add an API to export SSL client certificate information, including the client certificate verify result, issuer DN, subject DN, etc. add API to get access to the client cert in the SSL Net VC -- Key: TS-2210 URL: https://issues.apache.org/jira/browse/TS-2210 Project: Traffic Server Issue Type: Improvement Components: SSL, TS API Reporter: Bryan Call Fix For: 4.2.0 Attachments: 2210.diff In SSLNetVConnection SSL_get_peer_certificate(ssl) is called and client_cert is set. There is a request from Brian France to get access to the client cert. He wants to be able to call X509_NAME_oneline(), X509_get_subject_name(), and X509_get_issuer_name() on the cert. Where the cert is set in the code: iocore/net/SSLNetVConnection.cc:499:client_cert = SSL_get_peer_certificate(ssl); -- This message was sent by Atlassian JIRA (v6.1.5#6160)