[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344988#comment-14344988 ] Feifei Cai commented on TS-2497: For *redirect GET* use case, the test is as follows: {noformat} $ curl http://httpbin.org/redirect-to?url=http://httpbin.org/get -L -v {noformat} However, the *redirect POST* request is not support: {noformat} $ curl http://httpbin.org/redirect-to?url=http://httpbin.org/post -d xxx -L -v {noformat} References: # http://httpbin.org/ # https://github.com/Runscope/httpbin/issues/156 Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344875#comment-14344875 ] Feifei Cai commented on TS-2497: Not quite clear about the POST/PUT redirection use case. For GET methods, I have simple test case; however, I have not set up a test case for POST redirection yet. BTW, I tried reverting the patch , i.e. deallocate memory in both no origin server response and receive origin server response cases, it runs stable for ~3 days in our production server. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306129#comment-14306129 ] Susan Hinrichs commented on TS-2497: Oh, yes, the do_io_read set to 0 is only done when redirect_in_process is false. Probably want to clear the read in all cases. But are you testing with redirects turned on? [~briang] and [~jacksontj] any further thoughts on how to reproduce your original problem so our attempts to fix the memory leak do not re-introduce the crash? How frequent was the original problem? If it happened fairly often, perhaps you could try Feifei's memory leak fix in your environment? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306135#comment-14306135 ] Brian Geffon commented on TS-2497: -- [~shinrich], honestly it was so long ago I really don't remember. I believe it only reproduced for us under highload in production. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306497#comment-14306497 ] Feifei Cai commented on TS-2497: I have not test redirects yet. Will try it. Hi [~briang]], do you remember the crash case? Is it happened on redirects or not? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296851#comment-14296851 ] Feifei Cai commented on TS-2497: Yes, now I have the same question: TS-1425's patch disables read from client side - cancel read_from_net() call using MIOBuffer *post_buffer. This should have fixed the premature free issue. Why we still try to delay free time to prevent conflict with client side read? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296866#comment-14296866 ] Feifei Cai commented on TS-2497: Hi [~shinrich], the read on server_session would be scheduled on transferring server response to client, and it would use new MIOBuffer instead of post_buffer, so it would not cause issue here. I think TS-1425 may have fixed the issue here, or not doing enough yet. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289600#comment-14289600 ] Susan Hinrichs commented on TS-2497: Looking at it some more, I wonder if the solution is just to eliminate the reset, at least in the branch without the deallocate_buffer. Another producer will be added to handle the traffic from the server back to the client, but due to changes in TS-3190, it is safer to leave the original producer/consumer around because the new producer will be started up explicitly. The original logic in many cases would start all producers at once. I'm having problems getting a case to even call handle_post_failure. Will look back at this later. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289632#comment-14289632 ] Brian Geffon commented on TS-2497: -- I'd have to do some reading, I don't really remember to much about this. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289413#comment-14289413 ] Sudheer Vinukonda commented on TS-2497: --- [~ffcai]: Does the below patch fix the leak as well as address the concerns about not breaking TS-2497? diff --git a/proxy/http/HttpTunnel.cc b/proxy/http/HttpTunnel.cc index d75b5a1..75df1a5 100644 --- a/proxy/http/HttpTunnel.cc +++ b/proxy/http/HttpTunnel.cc @@ -621,6 +621,11 @@ HttpTunnel::add_producer(VConnection * vc, if ((p = alloc_producer()) != NULL) { p-vc = vc; p-nbytes = nbytes_arg; +if (p-read_buffer) { + free_MIOBuffer(p-read_buffer); + p-read_buffer = NULL; + p-buffer_start = NULL; +} p-buffer_start = reader_start; p-read_buffer = reader_start-mbuf; p-vc_handler = sm_handler; Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289436#comment-14289436 ] Susan Hinrichs commented on TS-2497: I'm a bit unclear on the original problem that was being solved. Looking at the two commits, it appears that the tunnel.deallocate_buffers(); was moved from always being called in HttpSM::handle_post_failure to only being called if (server_buffer_reader-read_avail() = 0). But tunnel.reset is called in all cases (regardless of the value of server_buffer_reader-read_avail()), so [~ffcai] is seeing a leak in the case where server_buffer_reader-read_avail() 0. But if we add tunnel.deallocate_buffers(); then we are in the original case as far as I can tell. Judging from the original stack trace, it looks like there was a lingering read or write on the tunnel buffer. TS-1425 fixed that for the user agent side by canceling the read on the ua_session. Perhaps the real solution here is to cancel the read on the server_session? And then deallocate_buffers for the tunnel in all cases. [~jacksontj] and [~briang] do you still have your notes on reproducing the original crash? Then we could experiment with adding back the deallocate_buffer with a read cancel and see if we can safely solve the memory leak. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289206#comment-14289206 ] Feifei Cai commented on TS-2497: Memory leak is noticed in our production hosts. It should be related to handling 5xx response from origin sever. The dump info is as follows, it's from 1 host with ~70% POST requests. I enabled memory dump {{proxy.config.dump_mem_info_frequency}} and track {{proxy.config.res_track_memory}}. *traffic.out:* {noformat} allocated |in-use | type size | free list name |||-- 0 | 0 |2097152 | memory/ioBufAllocator[14] 0 | 0 |1048576 | memory/ioBufAllocator[13] 0 | 0 | 524288 | memory/ioBufAllocator[12] 0 | 0 | 262144 | memory/ioBufAllocator[11] 0 | 0 | 131072 | memory/ioBufAllocator[10] 0 | 0 | 65536 | memory/ioBufAllocator[9] 1266679808 | 1262354432 | 32768 | memory/ioBufAllocator[8] 600309760 | 599703552 | 16384 | memory/ioBufAllocator[7] 395051008 | 391086080 | 8192 | memory/ioBufAllocator[6] 229113856 | 224432128 | 4096 | memory/ioBufAllocator[5] 342622208 | 342503424 | 2048 | memory/ioBufAllocator[4] 245104640 | 245042176 | 1024 | memory/ioBufAllocator[3] 2228224 |2176512 |512 | memory/ioBufAllocator[2] 622592 | 607232 |256 | memory/ioBufAllocator[1] 2375680 |2370176 |128 | memory/ioBufAllocator[0] Location | Size In-use ---+ memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 | 66768896 memory/IOBuffer/HttpClientSession.cc:230 |0 memory/IOBuffer/HttpSM.cc:3314 |0 memory/IOBuffer/HttpSM.cc:5349 | 3003506816 memory/IOBuffer/HttpSM.cc:5668 |0 memory/IOBuffer/HttpSM.cc:5874 |0 memory/IOBuffer/HttpSM.cc:5976 |0 memory/IOBuffer/HttpSM.cc:6267 |0 memory/IOBuffer/HttpServerSession.cc:87 |0 memory/IOBuffer/HttpTunnel.cc:95 |0 memory/IOBuffer/HttpTunnel.cc:100 |0 TOTAL | 3070275712 {noformat} I take a refer to [~shaunmcginnity]'s node.js with some changes, and reproduce the memory leak in my local environment. # origin-server.js This origin server responses a 503 when receives more than one single byte, so the post would not complete at most cases. I change [~shaunmcginnity]'s code, make origin server responses to ats, which would make ats hits another code path. # client.js We create a new client per second, and each client try to post 32K bytes data. # ATS *remap.config*: remap all to local port 5000 {quote}map / http://127.0.0.1:5000{quote} *records.config*: listen on 80 {quote}CONFIG proxy.config.http.server_ports STRING 80{quote} Then we can get dump info as follows, and in-use number of MIOBuffer with index=8 (size=32K) would increase 1 per second. {noformat} allocated |in-use | type size | free list name |||-- 1048576 | 32768 | 32768 | memory/ioBufAllocator[8] {noformat} We can also try change the Content-Length in client.js to a smaller size, and MIOBuffer with the corresponding index(0-7) would also increase. I add this simple patch to prevent the memory leak in the case above, just like last commit, and it's verified in 1 test host. free.diff {code} diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc index 932ef97..123b97a 100644 --- a/proxy/http/HttpSM.cc +++ b/proxy/http/HttpSM.cc @@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure() t_state.current.server-keep_alive = HTTP_NO_KEEPALIVE; if (server_buffer_reader-read_avail() 0) { +tunnel.deallocate_buffers(); tunnel.reset(); // There's data from the server so try to read the header setup_server_read_response_header(); {code} *traffic.out* {noformat} allocated |in-use | type size | free list name
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289453#comment-14289453 ] Sudheer Vinukonda commented on TS-2497: --- I wonder if the issue that was originally fixed by [~briang] is similar to the issue resolved in TS-3285 (freeing the MIOBuffer while there's a write/read in progress, which could eventually corrupt the buffer on the free list). Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036672#comment-14036672 ] ASF subversion and git services commented on TS-2497: - Commit 3590612398a1e458a056541a5ca5d0dc44405183 in trafficserver's branch refs/heads/4.2.x from [~briang] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=3590612 ] TS-2497: Memory leak with failed posts Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001605#comment-14001605 ] Shaun McGinnity commented on TS-2497: - Hi Brian, is this the final patch? Also, does this replace the original patch that removed the call to deallocate_buffers? Thanks, Shaun Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001613#comment-14001613 ] Shaun McGinnity commented on TS-2497: - One more comment, we have also applied the patch for https://issues.apache.org/jira/browse/TS-1425 - do you think we still need this patch? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995893#comment-13995893 ] Brian Geffon commented on TS-2497: -- Shaun, I'm glad that patched resolved the leak but this patch appears to introduce another issue. I'll update here when I have more info. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996615#comment-13996615 ] ASF subversion and git services commented on TS-2497: - Commit bcdd0494b7c9dccba13a2c541a5a24cee3315451 in trafficserver's branch refs/heads/master from [~briang] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=bcdd049 ] TS-2497: Memory leak with failed posts Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, repro.js When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989885#comment-13989885 ] Brian Geffon commented on TS-2497: -- [~shaunmcginnity], I was able to reproduce the leak, I've attached a new patch TS-2497 that appears to fix the issue, can you please verify? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988307#comment-13988307 ] Brian Geffon commented on TS-2497: -- Shaun, just out of curiosity does the memory stabilize at some point? Because HttpSM::kill_this() calls tunnel.deallocate_buffers(), so I'm wondering if you're just seeing the memory increase as the freelist grows? Or does it continue to grow unbounded? Brian Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988340#comment-13988340 ] Brian Geffon commented on TS-2497: -- [~shaunmcginnity] I believe I see the issue, tunnel.reset() calls memset on producers/consumers making the call to tunnel.deallocate_buffers() do nothing later on, can you please try the following patch: {code} diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc index 3ff0397..e0357f8 100644 --- a/proxy/http/HttpSM.cc +++ b/proxy/http/HttpSM.cc @@ -4940,7 +4940,7 @@ HttpSM::handle_post_failure() // have the full post and it's deallocating the post buffers here enable_redirection = false; tunnel.deallocate_redirect_postdata_buffers(); - tunnel.reset(); + //tunnel.reset(); // Don't even think about doing keep-alive after this debacle t_state.client_info.keep_alive = HTTP_NO_KEEPALIVE; {code} Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988345#comment-13988345 ] Brian Geffon commented on TS-2497: -- this patch may not apply cleanly to 3.2.x (it's a pretty old verison) but you can just modify HttpSM.cc directly and comment out the tunnel.reset() in handle_post_failure(). I believe by not doing the reset and keeping references to the consumers and producers HttpSM::kill_this() will call deallocate_buffers() later and all should be fine. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984173#comment-13984173 ] Shaun McGinnity commented on TS-2497: - Hi Brian, yes, it is a simple test simulating a server closing the connection before reading the complete post. I am sending a post with Content-Length: 310. The server reads 10 bytes and then closes the connection. I have added some debug. On a successful post HttpSM::tunnel_handler_post_or_put deallocates the buffers created for the post tunnel: Apr 29 11:04:10 - INFO - HttpSM::attach_client_session 0x10d5b980 0x10d5d598 Apr 29 11:04:10 - INFO - free_MIOBuffer 0x5001350 Apr 29 11:04:10 - INFO - free_MIOBuffer 0x5001350 DONE Apr 29 11:04:10 - INFO - HttpSM::do_setup_post_tunnel 0x10d5b980 add_producer : buffer 0x5001368 Apr 29 11:04:10 - INFO - HttpTunnel::add_producer : 0x10d5d598 buffer_start = 0x5001368 read_buffer = 0x5001350 Apr 29 11:05:29 - INFO - HttpSM::tunnel_handler_post_or_put : HTTP_SM_POST_SUCCESS deallocate_buffers 0x10d5d598 Apr 29 11:05:29 - INFO - 0x10d5d598 :: deallocate_buffers Apr 29 11:05:29 - INFO - 0x10d5d598 :: deallocate_buffers : free producers[0].read_buffer 0x5001350 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001350 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001350 DONE Apr 29 11:05:29 - INFO - HttpTunnel::reset 0x10d5d598 Apr 29 11:05:29 - INFO - HttpSM::setup_server_transfer 0x10d5b980 add_producer : buffer 0x5001368 Apr 29 11:05:29 - INFO - HttpTunnel::add_producer : 0x10d5d598 buffer_start = 0x5001368 read_buffer = 0x5001350 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 DONE Apr 29 11:05:29 - INFO - HttpSM::kill_this : deallocate_buffers 0x10d5d598 Apr 29 11:05:29 - INFO - 0x10d5d598 :: deallocate_buffers Apr 29 11:05:29 - INFO - 0x10d5d598 :: deallocate_buffers : free producers[0].read_buffer 0x5001350 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001350 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001350 DONE Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001710 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001710 DONE On a failed post the buffer used in the post tunnel is not freed: Apr 29 11:03:54 - INFO - HttpSM::attach_client_session 0x10d5b980 0x10d5d598 Apr 29 11:03:54 - INFO - free_MIOBuffer 0x5001440 Apr 29 11:03:54 - INFO - free_MIOBuffer 0x5001440 DONE Apr 29 11:03:54 - INFO - HttpSM::do_setup_post_tunnel 0x10d5b980 add_producer : buffer 0x5001458 Apr 29 11:03:54 - INFO - HttpTunnel::add_producer : 0x10d5d598 buffer_start = 0x5001458 read_buffer = 0x5001440 Apr 29 11:03:57 - INFO - HttpSM::tunnel_handler_post_or_put : HTTP_SM_POST_SERVER_FAIL 0x10d5d598 Apr 29 11:03:57 - INFO - HttpSM::handle_post_failure : 0x10d5d598 Apr 29 11:03:57 - INFO - ua_session-do_io_read Apr 29 11:03:57 - INFO - HttpTunnel::reset 0x10d5d598 Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001620 Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001620 DONE Apr 29 11:03:57 - INFO - HttpSM::setup_internal_transfer 0x10d5b980 add_producer : buffer 0x5001638 Apr 29 11:03:57 - INFO - HttpTunnel::add_producer : 0x10d5d598 buffer_start = 0x5001638 read_buffer = 0x5001620 Apr 29 11:03:57 - INFO - HttpSM::kill_this : deallocate_buffers 0x10d5d598 Apr 29 11:03:57 - INFO - 0x10d5d598 :: deallocate_buffers Apr 29 11:03:57 - INFO - 0x10d5d598 :: deallocate_buffers : free producers[0].read_buffer 0x5001620 Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001620 Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001620 DONE Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001710 Apr 29 11:03:57 - INFO - free_MIOBuffer 0x5001710 DONE Shaun Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984357#comment-13984357 ] James Peach commented on TS-2497: - [~shaunmcginnity] can you contribute the test to our regression suite? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984574#comment-13984574 ] Brian Geffon commented on TS-2497: -- [~shaunmcginnity] I'll try to reproduce and get back to you. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984585#comment-13984585 ] Brian Geffon commented on TS-2497: -- [~shaunmcginnity], something is a little unclear to me, the only difference between the two is: Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 DONE Which happens after add_producer, can you provide a back trace or something for those two frees? It's not clear where they are coming from, also, are you sure this behavior didn't exist before? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984753#comment-13984753 ] Shaun McGinnity commented on TS-2497: - hi Brian, here is the debug with the buffer allocation locations recorded: Apr 29 21:13:29 - INFO - HttpSM::attach_client_session 0x10613980 0x10615598 Apr 29 21:13:29 - INFO - free_MIOBuffer 0x49b5530 memory/IOBuffer/HttpSM.cc:5575 -- setup_server_send_request Apr 29 21:13:29 - INFO - free_MIOBuffer 0x49b5530 DONE Apr 29 21:13:29 - INFO - HttpSM::do_setup_post_tunnel 0x10613980 add_producer : buffer 0x49b5548 Apr 29 21:13:29 - INFO - HttpTunnel::add_producer : 0x10615598 buffer_start = 0x49b5548 read_buffer = 0x49b5530 Apr 29 21:14:47 - INFO - HttpSM::tunnel_handler_post_or_put : HTTP_SM_POST_SUCCESS deallocate_buffers 0x10615598 Apr 29 21:14:47 - INFO - 0x10615598 :: deallocate_buffers Apr 29 21:14:47 - INFO - 0x10615598 :: deallocate_buffers : free producers[0].read_buffer 0x49b5530 Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5530 memory/IOBuffer/HttpSM.cc:5259 -- do_setup_post_tunnel Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5530 DONE Apr 29 21:14:47 - INFO - HttpTunnel::reset 0x10615598 Apr 29 21:14:47 - INFO - HttpSM::setup_server_transfer 0x10613980 add_producer : buffer 0x49b5548 Apr 29 21:14:47 - INFO - HttpTunnel::add_producer : 0x10615598 buffer_start = 0x49b5548 read_buffer = 0x49b5530 Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5620 memory/IOBuffer/HttpServerSession.cc:87-- HttpServerSession::new_connection Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5620 DONE Apr 29 21:14:47 - INFO - HttpSM::kill_this : deallocate_buffers 0x10615598 Apr 29 21:14:47 - INFO - 0x10615598 :: deallocate_buffers Apr 29 21:14:47 - INFO - 0x10615598 :: deallocate_buffers : free producers[0].read_buffer 0x49b5530 Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5530 memory/IOBuffer/HttpSM.cc:6189 -- setup_server_transfer Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5530 DONE Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5710 memory/IOBuffer/HttpClientSession.cc:237 -- HttpClientSession::new_connection Apr 29 21:14:47 - INFO - free_MIOBuffer 0x49b5710 DONE Post failure: Apr 29 21:32:08 - INFO - HttpSM::attach_client_session 0x11869980 0x1186b598 Apr 29 21:32:08 - INFO - free_MIOBuffer 0x5c37530 memory/IOBuffer/HttpSM.cc:5575 -- setup_server_send_request Apr 29 21:32:08 - INFO - free_MIOBuffer 0x5c37530 DONE Apr 29 21:32:08 - INFO - HttpSM::do_setup_post_tunnel 0x11869980 add_producer : buffer 0x5c37548 Apr 29 21:32:08 - INFO - HttpTunnel::add_producer : 0x1186b598 buffer_start = 0x5c37548 read_buffer = 0x5c37530 Apr 29 21:32:11 - INFO - HttpSM::tunnel_handler_post_or_put : HTTP_SM_POST_SERVER_FAIL 0x1186b598 Apr 29 21:32:11 - INFO - HttpSM::handle_post_failure : 0x1186b598 Apr 29 21:32:11 - INFO - ua_session-do_io_read Apr 29 21:32:11 - INFO - HttpTunnel::reset 0x1186b598 Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37620 memory/IOBuffer/HttpServerSession.cc:87 -- HttpServerSession::new_connection Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37620 DONE Apr 29 21:32:11 - INFO - HttpSM::setup_internal_transfer 0x11869980 add_producer : buffer 0x5c37638 Apr 29 21:32:11 - INFO - HttpTunnel::add_producer : 0x1186b598 buffer_start = 0x5c37638 read_buffer = 0x5c37620 Apr 29 21:32:11 - INFO - HttpSM::kill_this : deallocate_buffers 0x1186b598 Apr 29 21:32:11 - INFO - 0x1186b598 :: deallocate_buffers Apr 29 21:32:11 - INFO - 0x1186b598 :: deallocate_buffers : free producers[0].read_buffer 0x5c37620 Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37620 memory/IOBuffer/HttpSM.cc:5889 -- HttpSM::setup_internal_transfer Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37620 DONE Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37710 memory/IOBuffer/HttpClientSession.cc:237 -- HttpClientSession::new_connection Apr 29 21:32:11 - INFO - free_MIOBuffer 0x5c37710 DONE There are 4 frees compared to 5 in the successful case. The buffer that is not freed is created in HttpSM::do_setup_post_tunnel. With the change removed i.e. call made to HttpTunnel deallocate_buffers in handle_post_failure then we get 5 frees. Apr 29 21:21:51 - INFO - HttpSM::attach_client_session 0x10613980 0x10615598 Apr 29 21:21:51 - INFO - free_MIOBuffer 0x49b5620 memory/IOBuffer/HttpSM.cc:5575 -- setup_server_send_request Apr 29 21:21:51 - INFO - free_MIOBuffer 0x49b5620 DONE Apr 29 21:21:51 - INFO - HttpSM::do_setup_post_tunnel 0x10613980 add_producer : buffer 0x49b5638 Apr 29 21:21:51 - INFO - HttpTunnel::add_producer : 0x10615598 buffer_start = 0x49b5638 read_buffer = 0x49b5620 Apr 29 21:21:54 - INFO - HttpSM::tunnel_handler_post_or_put : HTTP_SM_POST_SERVER_FAIL 0x10615598 Apr 29 21:21:54 - INFO - HttpSM::handle_post_failure : 0x10615598 Apr 29 21:21:54 - INFO - ua_session-do_io_read Apr 29 21:21:54 - INFO - 0x10615598 :: deallocate_buffers Apr 29 21:21:54 - INFO - 0x10615598
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983192#comment-13983192 ] Shaun McGinnity commented on TS-2497: - I've applied this patch to 3.2.X but I am seeing memory growth from unreleased IO Buffers. Is there another patch that needs to be applied along with this? I am running a test against a server that deliberately closes the connection before reading the full post content. Polling the memory statistics every 30s: Apr 28 17:41:10 - INFO - 0 | 0 | 32768 | memory/ioBufAllocator[8] Apr 28 17:41:40 - INFO - 6291456 |5472256 | 32768 | memory/ioBufAllocator[8] Apr 28 17:42:10 - INFO -11534336 | 10616832 | 32768 | memory/ioBufAllocator[8] Apr 28 17:42:39 - INFO -18874368 | 18022400 | 32768 | memory/ioBufAllocator[8] Apr 28 17:43:10 - INFO -24117248 | 22970368 | 32768 | memory/ioBufAllocator[8] Apr 28 17:43:40 - INFO -30408704 | 30048256 | 32768 | memory/ioBufAllocator[8] Apr 28 17:44:10 - INFO -36700160 | 35454976 | 32768 | memory/ioBufAllocator[8] Apr 28 17:44:40 - INFO -42991616 | 41451520 | 32768 | memory/ioBufAllocator[8] Apr 28 17:45:10 - INFO -49283072 | 48103424 | 32768 | memory/ioBufAllocator[8] Apr 28 17:45:40 - INFO -54525952 | 53444608 | 32768 | memory/ioBufAllocator[8] Apr 28 17:46:09 - INFO -61865984 | 60358656 | 32768 | memory/ioBufAllocator[8] Apr 28 17:46:40 - INFO -61865984 | 61472768 | 32768 | memory/ioBufAllocator[8] Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983370#comment-13983370 ] Brian Geffon commented on TS-2497: -- Hi Shaun, do you have a test that we can use to try to reproduce this, I have verified that the buffers were freed on 3.2.4, but it's possible I missed an edge case. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946956#comment-13946956 ] ASF GitHub Bot commented on TS-2497: Github user asfgit closed the pull request at: https://github.com/apache/trafficserver/pull/66 Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.1.3, 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945835#comment-13945835 ] ASF GitHub Bot commented on TS-2497: GitHub user jacksontj opened a pull request: https://github.com/apache/trafficserver/pull/66 Change default to keep_alive_post_out Keepalive post out used to cause issues, but we have since found and fixed the root cause (TS-2497) and we are now running this on in production. This entire feature should be cleaned out-- since it was created to workaround this problem that is now fixed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jacksontj/trafficserver master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/trafficserver/pull/66.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #66 commit e210eb0a35452215c3da9cd32ff916128c73f80f Author: Thomas Jackson jacksontj...@gmail.com Date: 2014-03-24T23:01:21Z Keepalive post out used to cause issues, but we have since found and fixed the root cause (TS-2497) and we are now running this on in production. This entire feature should be cleaned out-- since it was created to workaround this problem that is now fixed. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.1.3, 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877469#comment-13877469 ] ASF subversion and git services commented on TS-2497: - Commit 1de3269d83e915d0aa8b6f44e9e2a5f9cdf02727 in branch refs/heads/4.1.x from [~briang] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1de3269 ] [TS-2497] Failed post results in tunnel buffers being returned to freelist prematurely, Reporter: Thomas Jackson thjac...@linkedin.com Conflicts: CHANGES Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.1.3, 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873814#comment-13873814 ] Brian Geffon commented on TS-2497: -- Closing the loop on this, yes, I verified that the buffers will correctly be deallocated at the appropriate time. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872731#comment-13872731 ] ASF subversion and git services commented on TS-2497: - Commit 5eab1b838c0bc449f6e0ed881da40e6e832c8ee5 in branch refs/heads/master from [~briang] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=5eab1b8 ] [TS-2497] Failed post results in tunnel buffers being returned to freelist prematurely, Reporter: Thomas Jackson thjac...@linkedin.com Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872954#comment-13872954 ] Leif Hedstrom commented on TS-2497: --- ALso, just naively looking at the patch, you wonder if we now might leak the memory? You verified that it is released elsewhere, right? Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872960#comment-13872960 ] Leif Hedstrom commented on TS-2497: --- Feel free to add it :) Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 When a post fails to an origin server either the server died or the server returned a response without reading all of the post data, in either case, TS will destroy buffers too early. This normally does not result in a crash because the MIOBuffers are returned to the freelist and only with sufficient load will the race happen causing a crash. Additionally, even if a crash doesn't happen you might have data corruption across post requests from the buffers being used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace, while we've seen other crashes in write_avail too. #0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362 #1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946 #2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986 #3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994 #4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002 #5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048 #6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234 #7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816 #8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380 #9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0 #14 0x0033728e68ed in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.1.5#6160)