[jira] [Assigned] (TS-3573) Connection leak caused by TS-3522

2015-04-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3573:
--

Assignee: Susan Hinrichs

> Connection leak caused by TS-3522
> -
>
> Key: TS-3573
> URL: https://issues.apache.org/jira/browse/TS-3573
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> [~degreane] and [~dcarlin] observed significant connection leaks in code with 
> the fix for TS-3522.  @sudheerv identified the problem in the comments of 
> that bug.  Need to clean up handling in the case where the write.vio._cont 
> might be NULL.
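
The cleanup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MockVC`, `MockContinuation`, and `signal_writer_or_close` are hypothetical stand-ins, not the real UnixNetVConnection code or the actual patch. The idea shown is that when an event has to be delivered but the write continuation is NULL, the connection is closed instead of being left open with nobody to release it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for illustration; not the Traffic Server types.
struct MockContinuation {
  std::function<void(int)> handler;
  void handleEvent(int event) { handler(event); }
};

struct MockVC {
  MockContinuation *write_cont = nullptr; // plays the role of write.vio._cont
  bool closed = false;
  void close() { closed = true; }
};

enum SignalResult { SIGNAL_DELIVERED, SIGNAL_CLOSED_NO_CONT };

// Deliver `event` to the writer if there is one; otherwise close the
// connection so it cannot linger without an owner (the leak scenario).
SignalResult signal_writer_or_close(MockVC &vc, int event) {
  assert(!vc.closed); // precondition: the VC is still open
  if (vc.write_cont == nullptr) {
    vc.close();
    return SIGNAL_CLOSED_NO_CONT;
  }
  vc.write_cont->handleEvent(event);
  return SIGNAL_DELIVERED;
}
```

Whether closing is the right response in every NULL case is a design decision for the real fix; the sketch only shows that the NULL path needs an explicit owner for cleanup.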



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3522) Seg Fault due to inactivity_cop after lost continuation from write_signal_and_update

2015-04-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3522:
--

Assignee: Susan Hinrichs  (was: Alan M. Carroll)

> Seg Fault due to inactivity_cop after lost continuation from 
> write_signal_and_update
> -
>
> Key: TS-3522
> URL: https://issues.apache.org/jira/browse/TS-3522
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
>Priority: Blocker
>  Labels: crash
> Fix For: 5.3.0, 6.0.0
>
> Attachments: inactivity_crash.diff, ts-3522-2.diff, ts-3522.diff
>
>
> (gdb) bt full
> #0  0x006ec51e in handleEvent (event=105, vc=0x2b1c900461e0) at 
> ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #1  write_signal_and_update (event=105, vc=0x2b1c900461e0) at 
> UnixNetVConnection.cc:154
> No locals.
> #2  0x006ec837 in UnixNetVConnection::mainEvent (this=0x2b1c900461e0, 
> event=<value optimized out>, e=<value optimized out>) at 
> UnixNetVConnection.cc:1089
> wlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
> signal_event = 105
> next_activity_timeout_at = 0
> t = 0x0
> hlock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
> rlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
> signal_timeout = 0x2b1c6b9ddc30
> reader_cont = 0x0
> writer_cont = 0x2b1d28051d48
> signal_timeout_at = 0x2b1c900463f8
> #3  0x006e5061 in handleEvent (this=0x14519d0, event=<value optimized out>, 
> e=0x15792d0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #4  InactivityCop::check_inactivity (this=0x14519d0, event=<value optimized 
> out>, e=0x15792d0) at UnixNet.cc:80
> vc = 0x2b1c900461e0
> lock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
> now = 1428965697221995775
> nh = 0x2b1c695bea30
> __func__ = "check_inactivity"
> #5  0x0070f628 in handleEvent (this=0x2b1c695bb010, e=0x15792d0, 
> calling_code=2) at I_Continuation.h:146
> No locals.
> #6  EThread::process_event (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) 
> at UnixEThread.cc:144
> c_temp = 0x14519d0
> lock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
> #7  0x007101c1 in EThread::execute (this=0x2b1c695bb010) at 
> UnixEThread.cc:223
> done_one = true
> e = <value optimized out>
> NegativeQueue = {> = {head = 0x1579330}, 
> tail = 0x1579330}
> next_time = 1428963217761407178
> #8  0x0070ea52 in spawn_thread_internal (a=0x144a330) at Thread.cc:88
> p = 0x144a330
> #9  0x00383e8079d1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #10 0x00383e0e88fd in clone () from /lib64/libc.so.6
> No symbol table info available.





[jira] [Updated] (TS-3573) Connection leak caused by TS-3522

2015-04-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3573:
---
Backport to Version: 5.3.0

> Connection leak caused by TS-3522
> -
>
> Key: TS-3573
> URL: https://issues.apache.org/jira/browse/TS-3573
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> [~degreane] and [~dcarlin] observed significant connection leaks in code with 
> the fix for TS-3522.  @sudheerv identified the problem in the comments of 
> that bug.  Need to clean up handling in the case where the write.vio._cont 
> might be NULL.





[jira] [Updated] (TS-3573) Connection leak caused by TS-3522

2015-04-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3573:
---
Description: [~degreane] and [~dcarlin] observed significant connection 
leaks in code with the fix for TS-3522.  [~sudheerv] identified the problem in 
the comments of that bug.  Need to clean up handling in the case where the 
write.vio._cont might be NULL.  (was: [~degreane] and [~dcarlin] observed 
significant connection leaks in code with the fix for TS-3522.  @sudheerv 
identified the problem in the comments of that bug.  Need to clean up handling 
in the case where the write.vio._cont might be NULL.)

> Connection leak caused by TS-3522
> -
>
> Key: TS-3573
> URL: https://issues.apache.org/jira/browse/TS-3573
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> [~degreane] and [~dcarlin] observed significant connection leaks in code with 
> the fix for TS-3522.  [~sudheerv] identified the problem in the comments of 
> that bug.  Need to clean up handling in the case where the write.vio._cont 
> might be NULL.





[jira] [Resolved] (TS-3573) Connection leak caused by TS-3522

2015-04-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3573.

Resolution: Fixed

> Connection leak caused by TS-3522
> -
>
> Key: TS-3573
> URL: https://issues.apache.org/jira/browse/TS-3573
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> [~degreane] and [~dcarlin] observed significant connection leaks in code with 
> the fix for TS-3522.  [~sudheerv] identified the problem in the comments of 
> that bug.  Need to clean up handling in the case where the write.vio._cont 
> might be NULL.





[jira] [Created] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-01 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3578:
--

 Summary: Rearrange Client Session processing to give access to 
socket at SSN_CLOSE for all protocols
 Key: TS-3578
 URL: https://issues.apache.org/jira/browse/TS-3578
 Project: Traffic Server
  Issue Type: Improvement
  Components: HTTP, HTTP/2, SPDY
Reporter: Susan Hinrichs


I wanted to use the tcpinfo plugin look at the kernel measured RTT.  
Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
SPDY.  In the H2 and SPDY cases the underlying NetVC is a PluginVC and does not 
have access to the underlying socket.

With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
hook would go off, the netVC had already been closed.

I propose making the following changes.

1.  Make SpdyClientSession a subclass of ProxyClientSession, so SSN_CLOSE_HOOK 
can be triggered there too.

2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is called 
before the net vc is closed.

I've made both changes on my dev build, and in my simple tests, the tcpinfo 
plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.

Since this involves rearranging some of the bowels of the protocol processing, 
I'll set up a pull request for broader review.
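
The ordering in change 2 can be sketched roughly as below. This is an illustration only, using hypothetical stand-in types (`MockNetVC`, `MockClientSession`) and a made-up `rtt_usec` field rather than the real session classes or the tcpinfo plugin. It shows the close hook running while the net VC is still open, so a hook can still read socket-level data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; the names echo the discussion but are not the
// real Traffic Server classes.
struct MockNetVC {
  bool open = true;
  int rtt_usec = 0; // pretend kernel-measured RTT, for illustration only
  void do_io_close() { open = false; }
};

class MockClientSession {
public:
  explicit MockClientSession(MockNetVC *vc) : netvc_(vc) {}

  // Session teardown: run the close hook first, then close the net VC.
  void destroy(std::vector<std::string> &log) {
    ssn_close_hook(log);
    netvc_->do_io_close();
    netvc_ = nullptr;
  }

private:
  // Stands in for a plugin callback that wants socket-level data.
  void ssn_close_hook(std::vector<std::string> &log) {
    assert(netvc_ != nullptr && netvc_->open); // socket must still be usable
    log.push_back("rtt_usec=" + std::to_string(netvc_->rtt_usec));
  }

  MockNetVC *netvc_;
};
```

If the two calls in `destroy` were swapped, the assertion in the hook would fire, which is the situation described above for HTTP/2.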





[jira] [Assigned] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-01 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3578:
--

Assignee: Susan Hinrichs

> Rearrange Client Session processing to give access to socket at SSN_CLOSE for 
> all protocols
> ---
>
> Key: TS-3578
> URL: https://issues.apache.org/jira/browse/TS-3578
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> I wanted to use the tcpinfo plugin look at the kernel measured RTT.  
> Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
> SPDY.  In the H2 and SPDY cases the underlying NetVC is a PluginVC and does 
> not have access to the underlying socket.
> With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
> hook would go off, the netVC had already been closed.
> I propose making the following changes.
> 1.  Make SpdyClientSession a subclass of ProxyClientSession, so 
> SSN_CLOSE_HOOK can be triggered there too.
> 2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is 
> called before the net vc is closed.
> I've made both changes on my dev build, and in my simple tests, the tcpinfo 
> plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.
> Since this involves rearranging some of the bowels of the protocol 
> processing, I'll set up a pull request for broader review.





[jira] [Updated] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-01 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3578:
---
Description: 
I wanted to use the tcpinfo plugin to look at the kernel measured RTT.  
Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
SPDY.  In the H2 and SPDY cases, the underlying NetVC is a PluginVC and does 
not have access to the underlying socket.

With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
hook would go off, the netVC had already been closed.

I propose making the following changes.

1.  Make SpdyClientSession a subclass of ProxyClientSession, so SSN_CLOSE_HOOK 
can be triggered there too.

2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is called 
before the net vc is closed.

I've made both changes on my dev build, and in my simple tests, the tcpinfo 
plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.

Since this involves rearranging some of the bowels of the protocol processing, 
I'll set up a pull request for broader review.

  was:
I wanted to use the tcpinfo plugin look at the kernel measured RTT.  
Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
SPDY.  In the H2 and SPDY cases the underlying NetVC is a PluginVC and does not 
have access to the underlying socket.

With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
hook would go off, the netVC had already been closed.

I propose making the following changes.

1.  Make SpdyClientSession a subclass of ProxyClientSession, so SSN_CLOSE_HOOK 
can be triggered there too.

2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is called 
before the net vc is closed.

I've made both changes on my dev build, and in my simple tests, the tcpinfo 
plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.

Since this involves rearranging some of the bowels of the protocol processing, 
I'll set up a pull request for broader review.


> Rearrange Client Session processing to give access to socket at SSN_CLOSE for 
> all protocols
> ---
>
> Key: TS-3578
> URL: https://issues.apache.org/jira/browse/TS-3578
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> I wanted to use the tcpinfo plugin to look at the kernel measured RTT.  
> Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
> SPDY.  In the H2 and SPDY cases, the underlying NetVC is a PluginVC and does 
> not have access to the underlying socket.
> With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
> hook would go off, the netVC had already been closed.
> I propose making the following changes.
> 1.  Make SpdyClientSession a subclass of ProxyClientSession, so 
> SSN_CLOSE_HOOK can be triggered there too.
> 2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is 
> called before the net vc is closed.
> I've made both changes on my dev build, and in my simple tests, the tcpinfo 
> plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.
> Since this involves rearranging some of the bowels of the protocol 
> processing, I'll set up a pull request for broader review.





[jira] [Resolved] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3554.

Resolution: Fixed

Closing this issue.  It did resolve some memory leaks. [~reveller] is still 
seeing some memory growth.  We will open a new issue to track the remaining 
memory issues.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3554-53-2.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.
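
To make the suspected failure mode concrete, here is a minimal sketch of a reload that releases the previous lookup table. It assumes hypothetical types (`MockCertTable`, `MockSslConfig`) and is not the actual SSL configuration code or the attached patch. A real server would probably also need reference counting so that in-flight connections can keep using the old table until they finish; that part is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a certificate lookup table. `live` counts
// instances so a leak across reloads would be visible.
struct MockCertTable {
  static int live;
  std::vector<std::string> certs;
  explicit MockCertTable(std::vector<std::string> c) : certs(std::move(c)) { ++live; }
  MockCertTable(const MockCertTable &) = delete;
  MockCertTable &operator=(const MockCertTable &) = delete;
  ~MockCertTable() { --live; }
};
int MockCertTable::live = 0;

class MockSslConfig {
public:
  // Build the new table first, then replace the old one. The unique_ptr
  // assignment destroys the previous table, so repeated reloads do not
  // accumulate tables.
  void reload(std::vector<std::string> cert_names) {
    auto fresh = std::make_unique<MockCertTable>(std::move(cert_names));
    table_ = std::move(fresh);
    assert(table_ != nullptr);
  }

  std::size_t size() const { return table_ ? table_->certs.size() : 0; }

private:
  std::unique_ptr<MockCertTable> table_;
};
```

With ownership expressed this way, a reload every few minutes leaves exactly one table alive, whatever the number of certificates.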





[jira] [Reopened] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reopened TS-3554:


Never mind.  Looks like we are holding off on this fix until 5.3.1.  Will keep 
this issue open.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3554-53-2.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Created] (TS-3584) SPDY and H2 requests should not trigger connection keep-alive processing

2015-05-05 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3584:
--

 Summary: SPDY and H2 requests should not trigger connection 
keep-alive processing
 Key: TS-3584
 URL: https://issues.apache.org/jira/browse/TS-3584
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP, HTTP/2, SPDY
Reporter: Susan Hinrichs


For HTTP 1.1 the default value for the Connection header is keep-alive.  So all 
requests coming from SPDY and H2 dutifully set up the HttpClientSession for 
potential future reuse.

However, SPDY and H2 will create a new FetchSM request (and related 
HttpClientSession) for every HTTP request, so the HttpClientSession will never 
be reused.

This results in unnecessary complexity and inefficiency.  I'm seeing some 
crashes in SPDY start up that could be related to VC freeing race conditions.  
I'd like to tidy this up to remove one element from the equation.





[jira] [Assigned] (TS-3584) SPDY and H2 requests should not trigger connection keep-alive processing

2015-05-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3584:
--

Assignee: Susan Hinrichs

> SPDY and H2 requests should not trigger connection keep-alive processing
> 
>
> Key: TS-3584
> URL: https://issues.apache.org/jira/browse/TS-3584
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> For HTTP 1.1 the default value for the Connection header is keep-alive.  So 
> all requests coming from SPDY and H2 dutifully set up the HttpClientSession 
> for potential future reuse.
> However, SPDY and H2 will create a new FetchSM request (and related 
> HttpClientSession) for every HTTP request, so the HttpClientSession will 
> never be reused.
> This results in unnecessary complexity and inefficiency.  I'm seeing some 
> crashes in SPDY start up that could be related to VC freeing race conditions. 
>  I'd like to tidy this up to remove one element from the equation.





[jira] [Resolved] (TS-3584) SPDY and H2 requests should not trigger connection keep-alive processing

2015-05-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3584.

Resolution: Fixed

I explicitly added the "Connection: close" header for both SPDY and H2.  We 
have been exercising the SPDY case in production.  I tested the H2 in my dev 
environment.

This seemed like the least code change.  We may want to revisit this solution 
as we better understand the relations between SPDY/H2 and underlying HTTP/1.x 
processing.
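
As a rough illustration of the approach, the sketch below builds the text of an internal HTTP/1.1 request that carries an explicit "Connection: close" header. The helper name `build_internal_request` and its parameters are hypothetical; this is not the FetchSM code. It relies only on the HTTP/1.1 rule that connections are persistent by default unless "Connection: close" is sent.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the HTTP/1.1 request text for a per-stream
// internal fetch. The explicit "Connection: close" tells the HTTP/1.x
// machinery not to prepare the session for reuse.
std::string build_internal_request(const std::string &method,
                                   const std::string &path,
                                   const std::string &host) {
  assert(!method.empty() && !path.empty());
  std::string req = method + " " + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";
  req += "Connection: close\r\n";
  req += "\r\n"; // blank line ends the header section
  return req;
}
```

Since each SPDY or H2 request gets its own internal session, marking it non-persistent up front avoids keep-alive bookkeeping that could never pay off.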

> SPDY and H2 requests should not trigger connection keep-alive processing
> 
>
> Key: TS-3584
> URL: https://issues.apache.org/jira/browse/TS-3584
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> For HTTP 1.1 the default value for the Connection header is keep-alive.  So 
> all requests coming from SPDY and H2 dutifully set up the HttpClientSession 
> for potential future reuse.
> However, SPDY and H2 will create a new FetchSM request (and related 
> HttpClientSession) for every HTTP request, so the HttpClientSession will 
> never be reused.
> This results in unnecessary complexity and inefficiency.  I'm seeing some 
> crashes in SPDY start up that could be related to VC freeing race conditions. 
>  I'd like to tidy this up to remove one element from the equation.





[jira] [Commented] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540472#comment-14540472
 ] 

Susan Hinrichs commented on TS-3578:


This rearrangement has worked well.  For SSN_CLOSE, the hook gets triggered for 
both the HttpClientSession and the SpdyClientSession or Http2ClientSession.  We 
only log for the Spdy/Http2 Client session since that is the only session with 
access to the netvc.

For SSN_START, the hook is only called on the HttpClientSession, so no logging 
is performed in the Http2/Spdy case since the HttpClientSession does not have 
access to the netvc.

This brings up a broader issue: when should the session start and session 
close hooks be called in the Http2/Spdy case? 
  * Only on the Http2/Spdy client sessions
  * On both the Http2/Spdy and HttpClientSessions.  In the case of the 
HttpClientSession, this would be with the same frequency as the TXN_START and 
TXN_CLOSE hooks.
  * Only on the HttpClientSession.  In this case we would need to figure out 
how to get access to the netvc from the PluginVC.  Plus the SSN hooks would not 
correspond to the real start and end of the user agent network connection.

At this point, I prefer the option of calling the SSN hooks only on the 
Http2/Spdy client sessions.  But I need feedback.  Do people rely on SSN hooks 
having access to transaction data?

Of course in the case of a native HTTP/1.x connection, nothing should change.  
The SSN and TXN hooks will be invoked against the HttpClientSession.

> Rearrange Client Session processing to give access to socket at SSN_CLOSE for 
> all protocols
> ---
>
> Key: TS-3578
> URL: https://issues.apache.org/jira/browse/TS-3578
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: review
> Fix For: 6.0.0
>
>
> I wanted to use the tcpinfo plugin to look at the kernel measured RTT.  
> Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
> SPDY.  In the H2 and SPDY cases, the underlying NetVC is a PluginVC and does 
> not have access to the underlying socket.
> With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
> hook would go off, the netVC had already been closed.
> I propose making the following changes.
> 1.  Make SpdyClientSession a subclass of ProxyClientSession, so 
> SSN_CLOSE_HOOK can be triggered there too.
> 2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is 
> called before the net vc is closed.
> I've made both changes on my dev build, and in my simple tests, the tcpinfo 
> plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.
> Since this involves rearranging some of the bowels of the protocol 
> processing, I'll set up a pull request for broader review.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541954#comment-14541954
 ] 

Susan Hinrichs commented on TS-3597:


Looks like something got messed up in the transition to openssl 1.0.2 support.  
At least that is what the identified commit added.  

Are you running against openssl 1.0.1 or 1.0.2?  I would think that not much 
would change with this commit if you were running with 1.0.1.

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541961#comment-14541961
 ] 

Susan Hinrichs commented on TS-3597:


That commit also adjusted the SSL-oriented callbacks.  Are you testing with 
any plugins that invoke SSL callbacks?

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541966#comment-14541966
 ] 

Susan Hinrichs commented on TS-3597:


Never mind.  I'm reproducing it.  For my environment, disabling the accept thread was 
the key.  I'm currently running against openssl 1.1 and already had certs 
without dest_ip set.  Should be able to track down the issue now!

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542796#comment-14542796
 ] 

Susan Hinrichs commented on TS-3597:


Ok, I spoke too soon about being able to reproduce the problem.  In my dev 
environment, I get no TCP handshake completion if I turn off the accept_thread. 
 In reverse proxy mode, I get an assert in UnixNetVConnection::set_enabled 
called from do_io_read because the lock is not held.

It is possible this is what you are seeing in production: this isn't a release 
assert, so a release build would not stop here, and you could be seeing timing 
issues because no lock is held during the accept processing.  Or this is 
something completely different and unique to my environment.  For the record, 
the identified commit has nothing to do with the TCP accept processing.

For your reading pleasure, here is my stack. Will dig more tomorrow.
{code}
#0  0x00351e4328a5 in raise () from /lib64/libc.so.6
#1  0x00351e434085 in abort () from /lib64/libc.so.6
#2  0x77dd9c51 in ink_die_die_die () at ink_error.cc:43
#3  0x77dd9d08 in ink_fatal_va(const char *, typedef __va_list_tag 
__va_list_tag *) (fmt=0x77deaa58 "%s:%d: failed assert `%s`", 
ap=0x7fffdca0)
at ink_error.cc:65
#4  0x77dd9dd9 in ink_fatal (
message_format=0x77deaa58 "%s:%d: failed assert `%s`")
at ink_error.cc:73
#5  0x77dd7876 in _ink_assert (
expression=0x83a988 "vio->mutex->thread_holding == this_ethread() && 
thread", file=0x83a6be "UnixNetVConnection.cc", line=859) at ink_assert.cc:37
#6  0x0078c4bd in UnixNetVConnection::set_enabled (this=0x3c0ed20, 
vio=0x3c0ee40) at UnixNetVConnection.cc:859
#7  0x0078bbb4 in UnixNetVConnection::reenable (this=0x3c0ed20, 
vio=0x3c0ee40) at UnixNetVConnection.cc:753
#8  0x0050d229 in VIO::reenable (this=0x3c0ee40)
at ../iocore/eventsystem/P_VIO.h:112
#9  0x0078b25c in UnixNetVConnection::do_io_read (this=0x3c0ed20, 
c=0x24a1180, nbytes=4096, buf=0x3357620) at UnixNetVConnection.cc:598
#10 0x005594bd in ProtocolProbeSessionAccept::mainEvent (
this=0x24a92c0, event=202, data=0x3c0ed20)
at ProtocolProbeSessionAccept.cc:148
#11 0x0050d1d6 in Continuation::handleEvent (this=0x24a92c0, 
event=202, data=0x3c0ed20) at ../iocore/eventsystem/I_Continuation.h:145
#12 0x007863e8 in NetAccept::acceptFastEvent (this=0x2480960, event=5, 
ep=0x1ee5160) at UnixNetAccept.cc:465
#13 0x0050d1d6 in Continuation::handleEvent (this=0x2480960, event=5, 
data=0x1ee5160) at ../iocore/eventsystem/I_Continuation.h:145
#14 0x007abcb2 in EThread::process_event (this=0x1bb, e=0x1ee5160, 
calling_code=5) at UnixEThread.cc:128
#15 0x007ac2d3 in EThread::execute (this=0x1bb)
at UnixEThread.cc:252
#16 0x0054097e in main (argv=0x7fffe398) at Main.cc:1840
{code}

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Created] (TS-3603) Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads are disabled

2015-05-14 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3603:
--

 Summary: Debug Assert occurs in UnixNetVConnection::set_enabled 
when accept_threads are disabled
 Key: TS-3603
 URL: https://issues.apache.org/jira/browse/TS-3603
 Project: Traffic Server
  Issue Type: Bug
  Components: Network
Reporter: Susan Hinrichs


This was found while tracking down TS-3597.  The assert stack is in a comment 
on that bug.

When you don't have a dedicated accept thread, the mutex is not locked before 
going into the do_io_read to process the accept event.  In the dedicated thread 
case, you end up exercising UnixNetVConnection::acceptEvent, which does grab the 
mutex.

This may be a relatively harmless error.  Since this is a newly created VC, there 
should be no race conditions on it.  But violating locking assumptions seems 
like a really bad idea, especially since grabbing a lock on a supposedly 
uncontended object should be cheap.

A 5.3.x patch is attached to this bug which solves the problem on my build.
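
A minimal sketch of the locking assumption described above, not the attached 
patch.  SketchVC, lock_held and accept_without_dedicated_thread are hypothetical 
names, and std::mutex stands in for the real mutex type.  The idea is only that 
the accept path takes the new VC's lock before calling anything that asserts 
the lock is held.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a network VC; this is not the ATS class.
struct SketchVC {
  std::mutex mutex;
  bool lock_held = false;    // bookkeeping so the sketch can assert on locking
  bool read_enabled = false;

  // Plays the role of the debug assert: the caller must already hold the lock.
  void set_enabled() {
    assert(lock_held);
    read_enabled = true;
  }
};

// Accept handling on a regular worker thread (no dedicated accept thread).
// The lock on a freshly created VC is uncontended, so taking it is cheap.
inline bool accept_without_dedicated_thread(SketchVC &vc) {
  std::lock_guard<std::mutex> guard(vc.mutex);
  vc.lock_held = true;
  vc.set_enabled();          // safe: the lock is held here
  vc.lock_held = false;
  return vc.read_enabled;
}
```

Calling set_enabled() directly on a fresh SketchVC would trip the assert in a 
debug build, which is the shape of the failure reported here.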





[jira] [Updated] (TS-3603) Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads are disabled

2015-05-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3603:
---
Attachment: TS-3603.diff

> Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads 
> are disabled
> ---
>
> Key: TS-3603
> URL: https://issues.apache.org/jira/browse/TS-3603
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
> Attachments: TS-3603.diff
>
>
> This was found while tracking down TS-3597.  The assert stack is in a comment 
> on that bug.
> When you don't have a dedicated accept thread, the mutex is not locked before 
> going into the do_io_read to process the accept event.  In the dedicated 
> thread case, you end up exercising UnixNetVConnection::acceptEvent, which does 
> grab the mutex.
> This may be a relatively harmless error.  Since this is a newly created VC, there 
> should be no race conditions on it.  But violating locking assumptions seems 
> like a really bad idea, especially since grabbing a lock on a supposedly 
> uncontended object should be cheap.
> A 5.3.x patch is attached to this bug which solves the problem on my build.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543823#comment-14543823
 ] 

Susan Hinrichs commented on TS-3597:


Filed TS-3603 to track the missing lock assert in the no accept thread case.  I 
kind of doubt that that is causing this problem, but there is a 5.3.x patch on 
TS-3603.  [~zwoop] or [~gancho] could you give it a try in your environment and 
see if it changes anything?

In the meantime, I'll push on and look at the dst_ip=* issues which seem to 
affect the expression of the problem.

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543863#comment-14543863
 ] 

Susan Hinrichs commented on TS-3597:


Never mind trying the TS-3603 patch just yet.  It makes HTTP traffic work for 
me, but still no joy for HTTPS.  The SYNs are still not getting through.

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Created] (TS-3604) Transparent mode does not work when accept_threads set to 0

2015-05-14 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3604:
--

 Summary: Transparent mode does not work when accept_threads set to 0
 Key: TS-3604
 URL: https://issues.apache.org/jira/browse/TS-3604
 Project: Traffic Server
  Issue Type: Bug
  Components: Network
Reporter: Susan Hinrichs


If you have transparency enabled on your port and you disable the dedicated 
accept_threads, the TCP connection does not complete for HTTP and HTTPS 
traffic.  Enabling the accept_thread causes traffic to flow again.  Traffic via 
a remap rule also works (once you apply the fix to TS-3603).

Looking at the packet capture, the client sends SYNs but they are never 
answered.  It appears that the listen doesn't really get set up in this 
case.





[jira] [Updated] (TS-3604) Transparent mode does not work when accept_threads set to 0

2015-05-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3604:
---
Assignee: Alan M. Carroll

> Transparent mode does not work when accept_threads set to 0
> ---
>
> Key: TS-3604
> URL: https://issues.apache.org/jira/browse/TS-3604
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Alan M. Carroll
>
> If you have transparency enabled on your port and you disable the dedicated 
> accept_threads, the TCP connection does not complete for HTTP and HTTPS 
> traffic.  Enabling the accept_thread causes traffic to flow again.  Traffic 
> via a remap rule also works (once you apply the fix to TS-3603).
> Looking at the packet capture, the client sends SYNs but they are never 
> answered.  It appears that the listen doesn't really get set up in this 
> case.





[jira] [Assigned] (TS-3603) Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads are disabled

2015-05-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3603:
--

Assignee: Susan Hinrichs

> Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads 
> are disabled
> ---
>
> Key: TS-3603
> URL: https://issues.apache.org/jira/browse/TS-3603
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: TS-3603.diff
>
>
> This was found while tracking down TS-3597.  The assert stack is in a comment 
> on that bug.
> When you don't have a dedicated accept thread, the mutex is not locked before 
> going into the do_io_read to process the accept event.  In the dedicated 
> thread case, you end up exercising UnixNetVConnection::acceptEvent, which does 
> grab the mutex.
> This may be a relatively harmless error.  Since this is a newly created VC, there 
> should be no race conditions on it.  But violating locking assumptions seems 
> like a really bad idea, especially since grabbing a lock on a supposedly 
> uncontended object should be cheap.
> A 5.3.x patch is attached to this bug which solves the problem on my build.





[jira] [Commented] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544361#comment-14544361
 ] 

Susan Hinrichs commented on TS-3597:


I'm seeing an issue in the no accept thread case where the value of 
sslHandshakeHookState is not HANDSHAKE_HOOKS_PRE in the sni callback, even 
though this appears to be the first time through the callback for that vc.

It looks like sometimes the VC pointer is reused (reallocated) without having 
its values returned to the initial value.

This means that the correct cert is not selected.  Instead the default 
certificate is used.  Not clear this is the error case that you are seeing, but 
certainly it is a bad indicator.  Must take a break for now.  Will press on 
later this evening.

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Updated] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3597:
---
Attachment: TS-3597.diff

Found the problem.  We were assuming that the SSLNetVConnection object was 
initialized when it was allocated.  With the accept thread this was 
true because the global allocator was used.  But without the accept thread, 
the thread allocator is used and THREAD_ALLOC was called instead of 
THREAD_ALLOC_INIT.  If the object came off the free list, the 
sslHandshakeHookState variable was not in the initial state, so the certificate 
selection did not occur as designed.

TS-3597.diff includes the fix for this and the related missing lock (tracked as 
a separate bug).  Will work on getting both fixes pushed.
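
To illustrate the failure mode, here is a toy free list, not the ATS 
allocator.  SketchFreeList, SketchSSLVC and SketchHookState are hypothetical 
names; SketchHookState::Pre plays the role of HANDSHAKE_HOOKS_PRE.  alloc() 
hands back a recycled object as-is, while alloc_init() resets it to its 
initial state first, which is roughly the distinction between the two macros 
as described above.

```cpp
#include <cassert>
#include <vector>

// Pre stands in for HANDSHAKE_HOOKS_PRE; Done is a placeholder "later" state.
enum class SketchHookState { Pre, Done };

// Hypothetical stand-in for SSLNetVConnection, holding only the relevant field.
struct SketchSSLVC {
  SketchHookState state = SketchHookState::Pre;
};

// Toy per-thread free list.
class SketchFreeList {
  std::vector<SketchSSLVC *> free_;

public:
  ~SketchFreeList() {
    for (SketchSSLVC *vc : free_) delete vc;
  }

  // Allocate without initializing: a recycled object keeps its old field values.
  SketchSSLVC *alloc() {
    if (free_.empty()) return new SketchSSLVC();
    SketchSSLVC *vc = free_.back();
    free_.pop_back();
    return vc;
  }

  // Allocate and reset to the default-constructed state.
  SketchSSLVC *alloc_init() {
    SketchSSLVC *vc = alloc();
    *vc = SketchSSLVC();
    return vc;
  }

  void release(SketchSSLVC *vc) { free_.push_back(vc); }
};
```

With alloc(), a VC that was released after its handshake comes back with 
state still Done, so code that expects Pre on first entry takes the wrong 
branch.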

> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
> Attachments: TS-3597.diff
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Resolved] (TS-3603) Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads are disabled

2015-05-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3603.

Resolution: Fixed

> Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads 
> are disabled
> ---
>
> Key: TS-3603
> URL: https://issues.apache.org/jira/browse/TS-3603
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: TS-3603.diff
>
>
> This was found while tracking down TS-3597.  The assert stack is in a comment 
> on that bug.
> When you don't have a dedicated accept thread, the mutex is not locked before 
> going into the do_io_read to process the accept event.  In the dedicated 
> thread case, you end up exercising UnixNetVConnection::acceptEvent, which does 
> grab the mutex.
> This may be a relatively harmless error.  Since this is a newly created VC, there 
> should be no race conditions on it.  But violating locking assumptions seems 
> like a really bad idea, especially since grabbing a lock on a supposedly 
> uncontended object should be cheap.
> A 5.3.x patch is attached to this bug which solves the problem on my build.





[jira] [Updated] (TS-3603) Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads are disabled

2015-05-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3603:
---
Backport to Version: 5.3.1

> Debug Assert occurs in UnixNetVConnection::set_enabled when accept_threads 
> are disabled
> ---
>
> Key: TS-3603
> URL: https://issues.apache.org/jira/browse/TS-3603
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: TS-3603.diff
>
>
> This was found while tracking down TS-3597.  The assert stack is in a comment 
> on that bug.
> When you don't have a dedicated accept thread, the mutex is not locked before 
> going into the do_io_read to process the accept event.  In the dedicated 
> thread case, you end up exercising UnixNetVConnection::acceptEvent, which does 
> grab the mutex.
> This may be a relatively harmless error.  Since this is a newly created VC, there 
> should be no race conditions on it.  But violating locking assumptions seems 
> like a really bad idea, especially since grabbing a lock on a supposedly 
> uncontended object should be cheap.
> A 5.3.x patch is attached to this bug which solves the problem on my build.





[jira] [Resolved] (TS-3597) TLS can fail accept / handshake since commit 2a8bb593fd

2015-05-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3597.

Resolution: Fixed

Sorry, I accidentally committed this fix together with the fix for TS-3603 in 
commit ef467a2be79fc962ae0ec042ef9f6e871d3a775f.

Both should be applied together in any case.


> TLS can fail accept / handshake since commit 2a8bb593fd
> ---
>
> Key: TS-3597
> URL: https://issues.apache.org/jira/browse/TS-3597
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
> Fix For: 6.0.0
>
> Attachments: TS-3597.diff
>
>
> At least under certain conditions (slightly unclear, but possibly a race with 
> multiple NUMA nodes), we fail to accept / TLS handshake. I've tracked this 
> down to the commit from 2a8bb593fdd7ca9125efad76e27f3f17f5bca794.
> The commit prior to this does not expose the problem. [~gancho] also 
> discovered that this problem is only triggered when accept thread is off (0).
> Also from [~gancho], when this reproduces, a command like e.g. this will fail 
> the handshake completely (no ciphers):
> {code}
> openssl s_client -connect 10.1.2.3:443 -tls1 -servername some.host.com
> {code}
> Also, since this only happens with accept thread off (0), which implies 
> accept on every ET_NET thread, maybe there's some sort of race condition 
> going on here? That's just a wild speculation though.





[jira] [Commented] (TS-3604) Transparent mode does not work when accept_threads set to 0

2015-05-15 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545733#comment-14545733
 ] 

Susan Hinrichs commented on TS-3604:


Alan found the issue.  We are not passing the transparent option to the 
do_listen in the code paths used by the combined accept and regular threads.  
Propagating that information.
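
A rough sketch of the kind of omission involved, under the assumption that 
the listen options travel in a small struct.  SketchListenOptions, 
sketch_do_listen and the two start_* helpers are hypothetical and do not 
mirror the real signatures.

```cpp
#include <cassert>

// Hypothetical option bundle; only the flag under discussion is modelled.
struct SketchListenOptions {
  bool transparent = false;
};

// Hypothetical result of setting up a listen socket.
struct SketchListener {
  bool transparent = false;
};

// Stand-in for the listen setup: it can only honour what it is given.
inline SketchListener sketch_do_listen(const SketchListenOptions &opt) {
  SketchListener l;
  l.transparent = opt.transparent;
  return l;
}

// Shape of the bug: the combined accept/regular thread path builds fresh
// default options instead of forwarding the caller's.
inline SketchListener start_combined_accept_buggy(const SketchListenOptions &) {
  SketchListenOptions defaults;
  return sketch_do_listen(defaults);
}

// Shape of the fix: forward the configured options unchanged.
inline SketchListener start_combined_accept_fixed(const SketchListenOptions &opt) {
  return sketch_do_listen(opt);
}
```

In the buggy path the transparency flag never reaches the listen setup, so 
the listener comes up without it.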

> Transparent mode does not work when accept_threads set to 0
> ---
>
> Key: TS-3604
> URL: https://issues.apache.org/jira/browse/TS-3604
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Alan M. Carroll
>
> If you have transparency enabled on your port and you disable the dedicated 
> accept_threads, the TCP connection does not complete for HTTP and HTTPS 
> traffic.  Enabling the accept_thread causes traffic to flow again.  Traffic 
> via a remap rule also works (once you apply the fix to TS-3603).
> Looking at the packet capture, the client sends SYNs but they are never 
> answered.  It appears that the listen doesn't really get set up in this 
> case.





[jira] [Closed] (TS-3604) Transparent mode does not work when accept_threads set to 0

2015-05-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3604.
--
Resolution: Fixed

> Transparent mode does not work when accept_threads set to 0
> ---
>
> Key: TS-3604
> URL: https://issues.apache.org/jira/browse/TS-3604
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Susan Hinrichs
>Assignee: Alan M. Carroll
>
> If you have transparency enabled on your port and you disable the dedicated 
> accept_threads, the TCP connection does not complete for HTTP and HTTPS 
> traffic.  Enabling the accept_thread causes traffic to flow again.  Traffic 
> via a remap rule also works (once you apply the fix to TS-3603).
> Looking at the packet capture, the client sends SYNs but they are never 
> answered.  It appears that the listen doesn't really get set up in this 
> case.





[jira] [Commented] (TS-3596) TSHttpTxnPluginTagGet() returns "fetchSM" over H2

2015-05-15 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545757#comment-14545757
 ] 

Susan Hinrichs commented on TS-3596:


I don't think there is one.  [~Eric Schwartz ] and 
[~Francios Pesce ] are working on this issue for internal 
access.  Probably better revealed as a TSHttpSsnPluginTagGet() if such a thing 
exists.

> TSHttpTxnPluginTagGet() returns "fetchSM" over H2
> -
>
> Key: TS-3596
> URL: https://issues.apache.org/jira/browse/TS-3596
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Scott Beardsley
> Fix For: 6.0.0
>
>
> This should probably return something else, right? Maybe "HTTP2" instead? We 
> would like a way to identify H2 requests from SPDY and/or H1.





[jira] [Comment Edited] (TS-3596) TSHttpTxnPluginTagGet() returns "fetchSM" over H2

2015-05-15 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545757#comment-14545757
 ] 

Susan Hinrichs edited comment on TS-3596 at 5/15/15 4:35 PM:
-

I don't think there is one. Eric Schwartz and Francious Pesce  are working on 
this issue for internal core access to this information.  Probably better 
revealed as a TSHttpSsnPluginTagGet() if such a thing exists.


was (Author: shinrich):
I don't think there is one.  [~Eric Schwartz ] and 
[~Francios Pesce ] are working on this issue for internal 
access.  Probably better revealed as a TSHttpSsnPluginTagGet() if such a thing 
exists.

> TSHttpTxnPluginTagGet() returns "fetchSM" over H2
> -
>
> Key: TS-3596
> URL: https://issues.apache.org/jira/browse/TS-3596
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Scott Beardsley
> Fix For: 6.0.0
>
>
> This should probably return something else, right? Maybe "HTTP2" instead? We 
> would like a way to identify H2 requests from SPDY and/or H1.





[jira] [Assigned] (TS-3610) Certificate loading reads certificates from file multiple times

2015-05-16 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3610:
--

Assignee: Susan Hinrichs

> Certificate loading reads certificates from file multiple times
> ---
>
> Key: TS-3610
> URL: https://issues.apache.org/jira/browse/TS-3610
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> It was observed that during the ssl_multicert.config file loading the 
> certificates were being loaded from file multiple times.  For the code on 
> main, each file could be loaded 3 times: once for validity checking, once to 
> create the SSL_CTX, and once to set up OCSP stapling.
> With a minor rearrangement, we can load in the certificate once and pass the 
> X509 structure around as needed.





[jira] [Created] (TS-3610) Certificate loading reads certificates from file multiple times

2015-05-16 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3610:
--

 Summary: Certificate loading reads certificates from file multiple 
times
 Key: TS-3610
 URL: https://issues.apache.org/jira/browse/TS-3610
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs


It was observed that during the ssl_multicert.config file loading the 
certificates were being loaded from file multiple times.  For the code on main, 
each file could be loaded 3 times: once for validity checking, once to create 
the SSL_CTX, and once to set up OCSP stapling.

With a minor rearrangement, we can load in the certificate once and pass the 
X509 structure around as needed.
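
The rearrangement could look roughly like the sketch below.  It is not the 
actual change: SketchCert stands in for a parsed X509, SketchLoader counts 
file reads, and the three step functions are hypothetical placeholders for 
validity checking, SSL_CTX creation and OCSP stapling setup.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a parsed certificate (an X509 in the real code).
struct SketchCert {
  std::string path;
};

// Counts how often a certificate file is read from disk.
struct SketchLoader {
  int reads = 0;
  std::shared_ptr<const SketchCert> load(const std::string &path) {
    ++reads;
    return std::make_shared<const SketchCert>(SketchCert{path});
  }
};

// Placeholder steps: each takes the already-parsed certificate, not a path.
inline bool check_validity(const SketchCert &c) { return !c.path.empty(); }
inline bool build_context(const SketchCert &c) { return !c.path.empty(); }
inline bool setup_stapling(const SketchCert &c) { return !c.path.empty(); }

// Read the file once, then hand the parsed object to every consumer.
inline bool configure_cert(SketchLoader &loader, const std::string &path) {
  std::shared_ptr<const SketchCert> cert = loader.load(path); // the only read
  return check_validity(*cert) && build_context(*cert) && setup_stapling(*cert);
}
```

After one configure_cert() call the loader's read count is 1 rather than 3.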





[jira] [Commented] (TS-3610) Certificate loading reads certificates from file multiple times

2015-05-16 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546869#comment-14546869
 ] 

Susan Hinrichs commented on TS-3610:


Not quite there.  SSLInitServerContext() reads the certificate files too.  
This case allows for a comma-separated list of certificate files, which the 
other use cases don't allow for.  Need to straighten that out as well.

> Certificate loading reads certificates from file multiple times
> ---
>
> Key: TS-3610
> URL: https://issues.apache.org/jira/browse/TS-3610
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> It was observed that during the ssl_multicert.config file loading the 
> certificates were being loaded from file multiple times.  For the code on 
> main, each file could be loaded 3 times: once for validity checking, once to 
> create the SSL_CTX, and once to set up OCSP stapling.
> With a minor rearrangement, we can load in the certificate once and pass the 
> X509 structure around as needed.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-17 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547230#comment-14547230
 ] 

Susan Hinrichs commented on TS-3554:


I think I found at least the remaining memory leak.  On reload,  
SSLConfigParams::initialize is called which allocates the session_cache member 
variable.  Based on my analysis, the original value of the field is not freed.  
Ultimately, we will likely only want to reinitialize the session table if the 
relevant parameters have changed, and probably put a reference count on the 
data structures, so users of the original table do not start accessing bad data 
when it is freed. [~briang] can you take a quick look and let me know if my 
analysis seems right?

In the short term [~reveller] could you try setting 
proxy.config.ssl.session_cache to 0 in records.config? This disables all 
session caching, so this probably isn't great for a production environment, but 
it would be useful for testing memory usage.  As it stands, the code will still 
allocate the base table on each reload (14KB), but no sessions should be 
populated.  I'll attach a patch that removes that allocation in the disabled 
case too.
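
For the reference-counting idea, a minimal sketch using std::shared_ptr, 
assuming the table can be held through a shared pointer.  SketchSSLConfig and 
SketchSessionCache are hypothetical names, and the real code would need its 
own reference-counting scheme.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the session cache table.
struct SketchSessionCache {
  int generation;
};

// Hypothetical stand-in for the SSL config object that owns the table.
struct SketchSSLConfig {
  std::shared_ptr<SketchSessionCache> session_cache;
  int next_generation = 0;

  // Called on every reload.  Assigning a new table drops the config's
  // reference to the old one, which is freed once its last user lets go.
  void initialize() {
    session_cache =
        std::make_shared<SketchSessionCache>(SketchSessionCache{next_generation++});
  }
};
```

A connection that copied the shared_ptr before a reload keeps the old table 
alive until it releases it, so the old table is neither leaked nor freed out 
from under its users.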

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3554-53-2.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Updated] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3554:
---
Attachment: limit_session_cache_alloc.diff

limit_session_cache_alloc.diff only allocates the session_cache if the 
session_cache config is set to 2 (use ATS implementation).

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Resolved] (TS-3518) Multiple ssl_ca_name's in ssl_multicert breaks all intermediate CAs

2015-05-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3518.

Resolution: Fixed

> Multiple ssl_ca_name's in ssl_multicert breaks all intermediate CAs
> ---
>
> Key: TS-3518
> URL: https://issues.apache.org/jira/browse/TS-3518
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Thomas Jackson
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> In ssl_multicert you can specify multiple ssl_cert_name and ssl_key_name, 
> such as:
> {code}
> dest_ip=127.0.0.2 
> ssl_cert_name=www.example.com.cert,www.example.com.ecdsa.cert 
> ssl_key_name=www.example.com.key,www.example.com.ecdsa.key
> {code}
> Sometimes you need to specify an intermediate CA (a lot of the time TBH), 
> which from the docs sounds like you should be able to do:
> {code}
> dest_ip=127.0.0.2 
> ssl_cert_name=www.example.com.cert,www.example.com.ecdsa.cert 
> ssl_key_name=www.example.com.key,www.example.com.ecdsa.key 
> ssl_ca_name=RSA_intermediate,ECDSA_intermediate
> {code}
> You can already specify ssl_ca_name for single certs, similar to cert_name and 
> key_name, but the multi-cert form currently doesn't work. In addition to not 
> working for ECDSA, this seems to actually stop *all* intermediate CAs from being served. 
> I've created a test case (https://github.com/apache/trafficserver/pull/186) 
> which shows the issue.





[jira] [Commented] (TS-3518) Multiple ssl_ca_name's in ssl_multicert breaks all intermediate CAs

2015-05-17 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547252#comment-14547252
 ] 

Susan Hinrichs commented on TS-3518:


Fixed this issue while addressing TS-3610.

> Multiple ssl_ca_name's in ssl_multicert breaks all intermediate CAs
> ---
>
> Key: TS-3518
> URL: https://issues.apache.org/jira/browse/TS-3518
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Thomas Jackson
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> In ssl_multicert you can specify multiple ssl_cert_name and ssl_key_name, 
> such as:
> {code}
> dest_ip=127.0.0.2 
> ssl_cert_name=www.example.com.cert,www.example.com.ecdsa.cert 
> ssl_key_name=www.example.com.key,www.example.com.ecdsa.key
> {code}
> Sometimes you need to specify an intermediate CA (a lot of the time TBH), 
> which from the docs sounds like you should be able to do:
> {code}
> dest_ip=127.0.0.2 
> ssl_cert_name=www.example.com.cert,www.example.com.ecdsa.cert 
> ssl_key_name=www.example.com.key,www.example.com.ecdsa.key 
> ssl_ca_name=RSA_intermediate,ECDSA_intermediate
> {code}
> You can specify ssl_ca_name for single certs, similar to cert_name and 
> key_name, so this should work, but it currently doesn't. In addition to not 
> working for ECDSA, this seems to actually prevent *all* intermediate CAs from 
> being served. I've created a test case 
> (https://github.com/apache/trafficserver/pull/186) which shows the issue.
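
To illustrate the pairing the report expects, here is a minimal Python sketch 
(not ATS code) that splits one ssl_multicert.config-style line into key=value 
fields and pairs the comma-separated certificate, key, and CA entries by 
position.  The helper name parse_multicert_line and the pair-by-position rule 
are assumptions made for illustration; the real loader may behave differently.

```python
def parse_multicert_line(line):
    """Split an ssl_multicert.config-style line into (cert, key, ca) triples.

    Hypothetical helper: values are paired by their position in each
    comma-separated list.  Missing key or CA lists are padded with None.
    """
    fields = {}
    for token in line.split():
        name, _, value = token.partition("=")
        fields[name] = value.split(",") if value else []

    certs = fields.get("ssl_cert_name", [])
    keys = fields.get("ssl_key_name", []) or [None] * len(certs)
    cas = fields.get("ssl_ca_name", []) or [None] * len(certs)

    # Assumption: when a list is given, it must have one entry per certificate.
    if len(keys) != len(certs) or len(cas) != len(certs):
        raise ValueError("list lengths must match ssl_cert_name")

    return list(zip(certs, keys, cas))
```

With the second configuration from the report, each certificate would be 
paired with its own intermediate instead of sharing a single one.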





[jira] [Resolved] (TS-3610) Certificate loading reads certificates from file multiple times

2015-05-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3610.

Resolution: Fixed

> Certificate loading reads certificates from file multiple times
> ---
>
> Key: TS-3610
> URL: https://issues.apache.org/jira/browse/TS-3610
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> It was observed that during the ssl_multicert.config file loading the 
> certificates were being loaded from file multiple times.  For the code on 
> main, each file could be loaded 3 times: once for validity checking, once to 
> create the SSL_CTX, and once to set up OCSP stapling.
> With a minor rearrangement, we can load in the certificate once and pass the 
> X509 structure around as needed.
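
The rearrangement described above can be sketched as follows.  This is a 
Python illustration of the idea only, not the ATS change: the function name 
and the four callables are hypothetical stand-ins for the file read, the 
validity check, the SSL_CTX creation, and the OCSP stapling setup.

```python
def load_and_configure(path, loader, check_validity, create_context, setup_ocsp):
    """Parse the certificate once and hand the parsed object to each consumer.

    All parameters other than `path` are hypothetical callables supplied by
    the caller; none of them is an ATS or OpenSSL function.
    """
    cert = loader(path)           # the only read of the file
    check_validity(cert)          # consumer 1: validity checking
    ctx = create_context(cert)    # consumer 2: context creation
    setup_ocsp(ctx, cert)         # consumer 3: OCSP stapling setup
    return ctx
```

Passing the parsed object around keeps the number of file reads at one per 
certificate regardless of how many consumers need it.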





[jira] [Commented] (TS-3599) Multiple dest_ip=* directives has unpredictable behavior in ssl_multicert.config

2015-05-17 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547260#comment-14547260
 ] 

Susan Hinrichs commented on TS-3599:


I think some of what you were seeing was due to flaws from TS-3597.  With that 
bug, the SNI logic was not triggering correctly if accept_threads were 
disabled.  In that case, the default context was always used.

The ssl_multicert.config loader assigns entries with dest_ip=* to the 
default_context.  If there are multiple, only one wins (last in the file I'd 
assume based on looking at the code).

The default context is used if there is no SNI (but the SNI callback in ATS 
should always be called as long as there is not a bug) or if nothing better 
matched on name or IP.

Should we issue a warning or error if there are multiple dest_ip=*?  Or other 
IP conflicts?
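
A warning of the kind suggested here could look roughly like the Python sketch 
below.  It is an illustration only, not the ATS loader: the function name is 
hypothetical, and the "last entry wins" rule is the assumption stated above 
rather than verified behavior.

```python
def find_default_context(lines):
    """Return (winning_line_number, warnings) for dest_ip=* entries.

    Assumption: the last dest_ip=* entry in the file wins.
    """
    default_lineno = None
    warnings = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if "dest_ip=*" in stripped.split():
            if default_lineno is not None:
                warnings.append(
                    "line %d: dest_ip=* overrides earlier default on line %d"
                    % (lineno, default_lineno)
                )
            default_lineno = lineno
    return default_lineno, warnings
```

Whether the loader should only warn or refuse to start is the open question; 
the sketch just collects the conflicts so either policy can be applied.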

> Multiple dest_ip=* directives has unpredictable behavior in 
> ssl_multicert.config
> 
>
> Key: TS-3599
> URL: https://issues.apache.org/jira/browse/TS-3599
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Leif Hedstrom
> Fix For: 6.0.0
>
>
> If I create an ssl_multicert.config with e.g.
> {code}
> dest_ip=* ssl_key_name=foo.key ssl_cert_name=foo.crt
> dest_ip=* ssl_key_name=bar.key ssl_cert_name=bar.crt
> {code}
> Then even with an SNI enabled client, which uses SNI in the TLS handshake, 
> ATS seems to arbitrarily pick a cert. This seems nonsensical; I get the 
> impression that dest_ip=* would only take effect if there is no SNI 
> in the handshake?
> I understand that more than one dest_ip=* is perhaps not a valid 
> configuration, but in that case we ought to either error out (fail to start), 
> or at least produce a really loud warning.  Clearly making it fail like this 
> seems unreasonable :).





[jira] [Updated] (TS-3606) Add configuration option to allow server context per thread

2015-05-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3606:
---
Attachment: TS-3606-hack-for-53.diff

TS-3606-hack-for-53.diff contains diffs against the 5.3.x branch that load the 
ssl_multicert file for each thread.  This is likely not the right long-term 
solution, and the hack does not handle reloads of ssl_multicert.config, but it 
should be functional enough to run some tests to give you an idea of whether 
moving contexts to per thread is worth the effort in reduced lock contention.

If so, then we can figure out the right solution.
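
For readers unfamiliar with the per-thread idea, here is a minimal Python 
sketch of it.  It is not the attached patch: the class name and the factory 
callable are hypothetical, and the "context" is whatever object the factory 
returns.

```python
import threading


class PerThreadContexts:
    """Give each thread its own lazily built context (hypothetical sketch)."""

    def __init__(self, factory):
        self._factory = factory            # callable() -> new context object
        self._local = threading.local()    # storage private to each thread

    def get(self):
        ctx = getattr(self._local, "ctx", None)
        if ctx is None:
            # First use on this thread: build a private copy.
            ctx = self._factory()
            self._local.ctx = ctx
        return ctx
```

Because no context is shared between threads, lookups need no lock; the cost 
is one copy of the loaded configuration per thread.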

> Add configuration option to allow server context per thread
> ---
>
> Key: TS-3606
> URL: https://issues.apache.org/jira/browse/TS-3606
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: SSL
>Reporter: Bryan Call
> Attachments: TS-3606-hack-for-53.diff
>
>
> This was recommended by John Foley (OpenSSL developer) to reduce lock 
> contention.





[jira] [Commented] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-17 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547312#comment-14547312
 ] 

Susan Hinrichs commented on TS-3578:


From the mailing list, there appears to be consensus on option 1 
(http://dev.trafficserver.apache.narkive.com/OX9XK0xn/spdy-h2-and-session-hooks).
I will open a new bug to track the general solution.

> Rearrange Client Session processing to give access to socket at SSN_CLOSE for 
> all protocols
> ---
>
> Key: TS-3578
> URL: https://issues.apache.org/jira/browse/TS-3578
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: review
> Fix For: 6.0.0
>
>
> I wanted to use the tcpinfo plugin to look at the kernel measured RTT.  
> Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
> SPDY.  In the H2 and SPDY cases, the underlying NetVC is a PluginVC and does 
> not have access to the underlying socket.
> With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
> hook would go off, the netVC had already been closed.
> I propose making the following changes.
> 1.  Make SpdyClientSession a subclass of ProxyClientSession, so 
> SSN_CLOSE_HOOK can be triggered there too.
> 2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is 
> called before the net vc is closed.
> I've made both changes on my dev build, and in my simple tests, the tcpinfo 
> plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.
> Since this involves rearranging some of the bowels of the protocol 
> processing, I'll set up a pull request for broader review.
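
The ordering in change 2 can be shown with a toy Python model.  The class and 
attribute names are hypothetical and do not correspond to ATS classes; the 
point is only that close hooks run while the socket is still reachable.

```python
class Session:
    """Toy session object (hypothetical, not an ATS class)."""

    def __init__(self, sock):
        self.sock = sock          # stands in for the underlying net VC/socket
        self.close_hooks = []     # callables invoked with the session

    def close(self):
        # Run the close hooks first, while the socket is still available...
        for hook in self.close_hooks:
            hook(self)
        # ...and only then release the socket.
        self.sock = None
```

A hook that reads per-connection data, as the tcpinfo plugin does, would see 
a live socket with this ordering and nothing with the reverse ordering.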





[jira] [Commented] (TS-3578) Rearrange Client Session processing to give access to socket at SSN_CLOSE for all protocols

2015-05-17 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547320#comment-14547320
 ] 

Susan Hinrichs commented on TS-3578:


Incorporated comments from James. 

> Rearrange Client Session processing to give access to socket at SSN_CLOSE for 
> all protocols
> ---
>
> Key: TS-3578
> URL: https://issues.apache.org/jira/browse/TS-3578
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: review
> Fix For: 6.0.0
>
>
> I wanted to use the tcpinfo plugin to look at the kernel measured RTT.  
> Unfortunately, there was really only visibility for HTTP/1.x.  Not HTTP/2 or 
> SPDY.  In the H2 and SPDY cases, the underlying NetVC is a PluginVC and does 
> not have access to the underlying socket.
> With HTTP/2, the SSN_CLOSE hook would trigger, but by the time the SSN_CLOSE 
> hook would go off, the netVC had already been closed.
> I propose making the following changes.
> 1.  Make SpdyClientSession a subclass of ProxyClientSession, so 
> SSN_CLOSE_HOOK can be triggered there too.
> 2.  Rearrange the hook calling and net vc close so the SSN_CLOSE hook is 
> called before the net vc is closed.
> I've made both changes on my dev build, and in my simple tests, the tcpinfo 
> plugin is recording times for traffic on top of HTTP/1.1, SPDY, and HTTP/2.
> Since this involves rearranging some of the bowels of the protocol 
> processing, I'll set up a pull request for broader review.





[jira] [Created] (TS-3612) Restructure Proxy Client Sessions to support transaction oriented Sessions execute transaction hooks and connection oriented Sessions execute session hooks

2015-05-17 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3612:
--

 Summary: Restructure Proxy Client Sessions to support transaction 
oriented Sessions execute transaction hooks and connection oriented Sessions 
execute session hooks
 Key: TS-3612
 URL: https://issues.apache.org/jira/browse/TS-3612
 Project: Traffic Server
  Issue Type: Improvement
  Components: HTTP, HTTP/2, SPDY
Reporter: Susan Hinrichs


In the current code, transaction and session hooks don't have access to H2 and 
SPDY session data.  This was partially addressed by TS-3578.

There was a discussion on the mailing list, and the consensus was that session 
hooks should be invoked on session-oriented sessions (H2, SPDY, and native 
HTTP/1.x) and transaction hooks should be invoked on transaction-oriented 
sessions.  

http://dev.trafficserver.apache.narkive.com/OX9XK0xn/spdy-h2-and-session-hooks





[jira] [Updated] (TS-3612) Restructure Proxy Client Sessions to support transaction oriented Sessions execute transaction hooks and connection oriented Sessions execute session hooks

2015-05-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3612:
---
Fix Version/s: 6.0.0

> Restructure Proxy Client Sessions to support transaction oriented Sessions 
> execute transaction hooks and connection oriented Sessions execute session 
> hooks
> ---
>
> Key: TS-3612
> URL: https://issues.apache.org/jira/browse/TS-3612
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, HTTP/2, SPDY
>Reporter: Susan Hinrichs
> Fix For: 6.0.0
>
>
> In the current code, transaction and session hooks don't have access to H2 
> and SPDY session data.  This was partially addressed by TS-3578.
> There was a discussion on the mailing list, and the consensus was that 
> session hooks should be invoked on session-oriented sessions (H2, SPDY, and 
> native HTTP/1.x) and transaction hooks should be invoked on 
> transaction-oriented sessions.  
> http://dev.trafficserver.apache.narkive.com/OX9XK0xn/spdy-h2-and-session-hooks





[jira] [Commented] (TS-3606) Add configuration option to allow server context per thread

2015-05-18 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548126#comment-14548126
 ] 

Susan Hinrichs commented on TS-3606:


Depends on the solution.  It will likely result in increased memory usage.  
Probably fine for small ssl_multicert.configs, but perhaps not good for really 
large sets of certificates.

> Add configuration option to allow server context per thread
> ---
>
> Key: TS-3606
> URL: https://issues.apache.org/jira/browse/TS-3606
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: SSL
>Reporter: Bryan Call
>Assignee: Bryan Call
> Fix For: 6.0.0
>
> Attachments: TS-3606-hack-for-53.diff
>
>
> This was recommended by John Foley (OpenSSL developer) to reduce lock 
> contention.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-18 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548188#comment-14548188
 ] 

Susan Hinrichs commented on TS-3554:


Agreed on ref-counting the parent session_cache object.  That should make 
accesses in the callbacks safe.  If we do the ref-counting, I don't see the 
need for rw-locks.  If you have a reference to the old session cache, you 
continue to use that to fetch out the reconstituted SSL_SESSION object.  When 
you are done, you drop the reference, and the old session cache goes away, 
taking the serialized forms of the old sessions with it.  I assume the openssl 
framework takes care of freeing up the SSL_SESSION object when it is done with 
it.  

We could try to be clever and only reinitialize if the session table 
characteristics change.  But if the certs change, perhaps that isn't safe 
either.  Easiest to just always reset I suppose.
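
The ref-counting scheme described above can be sketched in Python as follows.  
This is an illustration under assumptions, not the ATS code: the class names, 
the on_free callback, and the explicit acquire/release calls are all 
hypothetical.  The short mutex only guards the counter and the pointer swap; 
it is not the rw-lock discussed above.

```python
import threading


class RefCounted:
    """Wrap a value and free it when the last reference is released."""

    def __init__(self, value, on_free):
        self.value = value
        self._on_free = on_free
        self._refs = 1                    # reference held by the holder
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            self._refs += 1
        return self

    def release(self):
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            self._on_free(self.value)


class SessionCacheHolder:
    """Hold the current cache; a reload swaps it without blocking readers."""

    def __init__(self, cache, on_free):
        self._on_free = on_free
        self._lock = threading.Lock()
        self._current = RefCounted(cache, on_free)

    def get(self):
        # Callers must call release() on the returned object when done.
        with self._lock:
            return self._current.acquire()

    def reload(self, new_cache):
        fresh = RefCounted(new_cache, self._on_free)
        with self._lock:
            old, self._current = self._current, fresh
        old.release()                     # drop the holder's own reference
```

A callback that took its reference before the reload keeps using the old 
cache, and the old cache is freed only when that last reference is released.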





> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554377#comment-14554377
 ] 

Susan Hinrichs commented on TS-3554:


Summary of investigations from yesterday.

[~reveller] reports 6GB increase in memory usage on each reload of his 6K long 
ssl_multicert.config file.

I experimented with a 1K long ssl_multicert.config file and 5.3.x with the 
limit_session_cache_alloc.diff and the ts-3554-53-2.diff patches.  I also made 
more changes to enable the gperf tools heapchecker 
(http://gperftools.googlecode.com/svn/trunk/doc/heap_checker.html) around the 
ssl_multicert.config reload logic.

If I reloaded without any traffic passing through, the heapchecker reported no 
leaks and the VmSize increased by about 20KB sometimes.

If I passed traffic through without any reloads, the VmSize stayed stable.  
This wasn't a huge traffic load.  I ran "ab" with 4-6 concurrency to a constant 
URL.

However, if I reloaded and passed traffic, I saw a VmSize increase of 25-30 MB 
per reload.  Still no real memory leaks reported by the heap checker, but the 
number of "live" objects reported increased with each reload.  So I assume that 
the "leak" is objects left in a table somewhere (perhaps in openssl?).

So if we both reload and have some traffic passing through, I see a memory size 
increase.  Not as big as [~reveller] reported, but substantial.  He also claims 
that there was no traffic going through.  But perhaps there was a little bit of 
traffic passing through but not the full production load?

Next step is to figure out what these "leaked" but live objects are.
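
For anyone repeating these measurements, one way to sample VmSize on Linux is 
to read /proc/<pid>/status before and after each reload.  The Python sketch 
below is an assumption about tooling, not the method used for the numbers 
above; both function names are hypothetical.

```python
def parse_vmsize_kb(status_text):
    """Return the VmSize value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # Line format: "VmSize:   102400 kB"
            return int(line.split()[1])
    return None


def read_vmsize_kb(pid):
    """Linux only: read the current VmSize of a running process."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmsize_kb(f.read())
```

Sampling the value once before and once after each reload gives the 
per-reload growth directly.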



> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Comment Edited] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554377#comment-14554377
 ] 

Susan Hinrichs edited comment on TS-3554 at 5/21/15 2:59 PM:
-

Summary of investigations from yesterday.

[~reveller] reports 6GB increase in memory usage on each reload of his 6K long 
ssl_multicert.config file.

I experimented with a 1K long ssl_multicert.config file and 5.3.x with the 
limit_session_cache_alloc.diff and the ts-3554-53-2.diff patches.  I also made 
more changes to enable the gperf tools heapchecker 
(http://gperftools.googlecode.com/svn/trunk/doc/heap_checker.html) around the 
ssl_multicert.config reload logic.  Ran most of my tests with session caching 
turned off.

If I reloaded without any traffic passing through, the heapchecker reported no 
leaks and the VmSize increased by about 20KB sometimes.

If I passed traffic through without any reloads, the VmSize stayed stable.  
This wasn't a huge traffic load.  I ran "ab" with 4-6 concurrency to a constant 
URL.

However, if I reloaded and passed traffic, I saw a VmSize increase of 25-30 MB 
per reload.  Still no real memory leaks reported by the heap checker, but the 
number of "live" objects reported increased with each reload.  So I assume that 
the "leak" is objects left in a table somewhere (perhaps in openssl?).

So if we both reload and have some traffic passing through, I see a memory size 
increase.  Not as big as [~reveller] reported, but substantial.  He also claims 
that there was no traffic going through.  But perhaps there was a little bit of 
traffic passing through but not the full production load?

Next step is to figure out what these "leaked" but live objects are.




was (Author: shinrich):
Summary of investigations from yesterday.

[~reveller] reports 6GB increase in memory usage on each reload of his 6K long 
ssl_multicert.config file.

I experimented with a 1K long ssl_multicert.config file and 5.3.x with the 
limit_session_cache_alloc.diff and the ts-3554-53-2.diff patches.  I also made 
more changes to enable the gperf tools heapchecker 
(http://gperftools.googlecode.com/svn/trunk/doc/heap_checker.html) around the 
ssl_multicert.config reload logic.

If I reloaded without any traffic passing through, the heapchecker reported no 
leaks and the VmSize increased by about 20KB sometimes.

If I passed traffic through without any reloads, the VmSize stayed stable.  
This wasn't a huge traffic load.  I ran "ab" with 4-6 concurrency to a constant 
URL.

However, if I reloaded and passed traffic, I saw a VmSize increase of 25-30 MB 
per reload.  Still no real memory leaks reported by the heap checker, but the 
number of "live" objects reported increased with each reload.  So I assume that 
the "leak" is objects left in a table somewhere (perhaps in openssl?).

So if we both reload and have some traffic passing through, I see a memory size 
increase.  Not as big as [~reveller] reported, but substantial.  He also claims 
that there was no traffic going through.  But perhaps there was a little bit of 
traffic passing through but not the full production load?

Next step is to figure out what these "leaked" but live objects are.



> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554740#comment-14554740
 ] 

Susan Hinrichs commented on TS-3554:


Sigh.   This has all been a wild goose chase.  On the plus side, I'm now far 
more familiar with gperftools heap-checker and heap-profiler (very nice tools).

The traffic + reload memory growth I was seeing was due to the fact that I had 
not applied the patch associated with the first big fix on this bug (commit  
98c87ee51b2ad91787b7a9fa2827cab1c03b3d22).  

Once I started over with 5.3.x and applied the patch ts-3554-53-2.diff, the 
memory growth disappeared.  The fix was not backported to 5.3.0 because 
[~reveller] was still seeing leaks.  In your most recent tests were you running 
with the patched 5.3.0?  If not, please try again.  For 5.3.1, we should go 
ahead and port back what we have even if we do not have all of reveller's 
memory leaks fixed.  I'll go ahead and spin off another bug to track the 
potential for session leaks.  As I dug more into memory profiling, it isn't 
clear that we are really running into that case.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Updated] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-21 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3554:
---
Attachment: ts-3554-53-3.diff

ts-3554-53-3.diff is a superset of ts-3554-53-2.diff plus a Debug print to 
verify that the CertLookup object is freed on reload.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556149#comment-14556149
 ] 

Susan Hinrichs commented on TS-3554:


Spent some time working on one of [~reveller]'s machines yesterday.  With the 
same certs and ssl_multicert.config that I've been using on my system, I see 
much higher initial memory usage and the growth on reload that reveller has 
been reporting.

This morning I installed on a single processor quad core running FC17 and saw 
almost the same thing I was seeing on reveller's machine.  Now that I have a 
machine I can build on, I am in a good position to track it down.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Comment Edited] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556149#comment-14556149
 ] 

Susan Hinrichs edited comment on TS-3554 at 5/22/15 1:39 PM:
-

Spent some time working on one of [~reveller]'s machines yesterday.  With the 
same certs and ssl_multicert.config that I've been using on my system, I see 
much higher initial memory usage and the growth on reload that reveller has 
been reporting.

This morning I installed on a single processor quad core with 8GB memory 
running FC17 and saw almost the same thing I was seeing on reveller's machine.  
Now that I have a machine I can build on, I am in a good position to track it down.


was (Author: shinrich):
Spent some time working on one of [~reveller]'s machines yesterday.  With the 
same certs and ssl_multicert.config that I've been using on my system, I see 
much higher initial memory usage and the growth on reload that reveller has 
been reporting.

This morning I installed on a single processor quad core running FC17 and saw 
almost the same thing I was seeing on reveller's machine.  Now that I have a 
machine I can build on, I am in a good position to track it down.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-23 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557311#comment-14557311
 ] 

Susan Hinrichs commented on TS-3554:


We narrowed it down to the version of the openssl library.  I run with my own 
build of the openssl library fetched from openssl.org.  So the base 
openssl-1.0.1e and openssl-1.0.1f do not show the unexpected memory growth.

However, the CentOS version of openssl-1.0.1e (which is base openssl-1.0.1e 
plus a number of patches) does show the problem.  By running with the yum 
installed version of the openssl library or the rpmbuild source version of the 
CentOS package, I am able to reproduce the memory growth on reload on my VM.

Need to do some more experiments to determine what in the patches is causing 
the problem, and whether the problem goes away if we move up to native 1.0.1m 
(the latest openssl.org release in the 1.0.1 stream).

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-23 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557327#comment-14557327
 ] 

Susan Hinrichs commented on TS-3554:


More data from my VM

OpenSSL version compiled/run against => Status
1.0.1e from openssl.org => No memory growth
1.0.1m from openssl.org => No memory growth
1.0.1e provided by yum on CentOS 6.4 (30.el6.8) => Memory growth.  Cannot even load base 3K line ssl_multicert without extra traffic_cop time delay
1.0.1e built from CentOS (30.el6.8) source rpm => Same behavior as yum version
1.0.2 from openssl.org => No memory growth

Not clear what the CentOS patches are trying to provide beyond the 1.0.1m 
version (the latest in the 1.0.1 series provided by openssl, updated March 19, 
2015).

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-25 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558232#comment-14558232
 ] 

Susan Hinrichs commented on TS-3554:


No new API as far as I can tell.  Most of us don't load thousands of 
certificates, so it wouldn't be noticeable.  I would assume RHEL 6 has the same 
issue too.  I'll try that out and follow up with Red Hat support as necessary 
when I get back to the office tomorrow.

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Assigned] (TS-3378) SpdyRequest used after free()

2015-05-26 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3378:
--

Assignee: Susan Hinrichs

> SpdyRequest used after free()
> -
>
> Key: TS-3378
> URL: https://issues.apache.org/jira/browse/TS-3378
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> I see this on our docs.ts machine:
> {code}
> ==1310==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x6110004fc974 at pc 0x7c2162 bp 0x7fff97c95010 sp 0x7fff97c95008
> READ of size 1 at 0x6110004fc974 thread T0 ([ET_NET 0])
> #0 0x7c2161 in spdy_process_fetch 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:339
> #1 0x7c2161 in SpdyClientSession::state_session_readwrite(int, void*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:253
> #2 0x4f1308 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #3 0x4f1308 in FetchSM::InvokePluginExt(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:250
> #4 0x4f455a in FetchSM::fetch_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:516
> #5 0x59f737 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #6 0x59f737 in PluginVC::process_write_side(bool) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:519
> #7 0x5aa2fd in PluginVC::main_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:210
> #8 0xc6aabe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #9 0xc6aabe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:144
> #10 0xc6d0d9 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:238
> #11 0x498481 in main /usr/local/src/trafficserver/proxy/Main.cc:1759
> #12 0x2b01d58c0af4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
> #13 0x4ab124 (/opt/ats/bin/traffic_server+0x4ab124)
> 0x6110004fc974 is located 52 bytes inside of 224-byte region 
> [0x6110004fc940,0x6110004fca20)
> freed by thread T0 ([ET_NET 0]) here:
> #0 0x2b01d1d2e1c7 in __interceptor_free 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:62
> #1 0x7c8433 in ClassAllocator::free(SpdyRequest*) 
> ../../lib/ts/Allocator.h:138
> #2 0x7c8433 in SpdyClientSession::cleanup_request(int) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.h:146
> #3 0x7c8433 in 
> spdy_prepare_status_response_and_clean_request(SpdyClientSession*, int, char 
> const*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyCallbacks.cc:85
> #4 0x7c1094 in spdy_process_fetch 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:354
> #5 0x7c1094 in SpdyClientSession::state_session_readwrite(int, void*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:253
> #6 0x4f1c95 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #7 0x4f1c95 in FetchSM::InvokePluginExt(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:259
> #8 0x4f2eaa in FetchSM::process_fetch_read(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:465
> #9 0x4f4542 in FetchSM::fetch_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:514
> #10 0x59e077 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #11 0x59e077 in PluginVC::process_read_side(bool) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:640
> #12 0x5aab79 in PluginVC::main_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:206
> #13 0xc6aabe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #14 0xc6aabe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:144
> #15 0xc6d0d9 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:238
> #16 0x498481 in main /usr/local/src/trafficserver/proxy/Main.cc:1759
> #17 0x2b01d58c0af4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
> previously allocated by thread T5 ([ET_NET 4]) here:
> #0 0x2b01d1d2e93b in __interceptor_posix_memalign 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:130
> #1 0x2b01d2c18309 in ats_memalign 
> /usr/local/src/trafficserver/lib/ts/ink_memory.cc:96
> #2 0x7c89ba in ClassAllocator::alloc() 
> ../../lib/ts/Allocator.h:124
> #3 0x7c89ba in spdy_on_ctrl_recv_callback(spdylay_session*, 
> spdylay_frame_type, spdylay_frame*, void*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyCallbacks.cc:328
> #4 0x2b01d3f1afff

[jira] [Commented] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-26 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559682#comment-14559682
 ] 

Susan Hinrichs commented on TS-3554:


I confirmed that this is an issue with the official RHEL 6 openssl package 
(1.0.1e release 30.el6_6.4).

I'm going to close this issue now.  The fixes associated with this bug should 
be backported for 5.3.1.  The remaining issue is outside the ATS code base.

I'll work on getting a case open with Red Hat, but in the meantime anyone 
working with CentOS 6.x or RHEL 6 should build their own openssl package from 
the openssl.org source if they are running into this issue (loading large 
numbers of certificates).

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Resolved] (TS-3554) ATS memory leak reloading ssl_multicert.config with many ssl cert configs

2015-05-26 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3554.

Resolution: Fixed

> ATS memory leak reloading ssl_multicert.config with many ssl cert configs
> -
>
> Key: TS-3554
> URL: https://issues.apache.org/jira/browse/TS-3554
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, Core, SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: limit_session_cache_alloc.diff, ts-3554-53-2.diff, 
> ts-3554-53-3.diff, ts-3554-53.diff
>
>
> ATS will consume all available memory on a server with 128GB of RAM.  
> @shinrich suspects it may be due to CertLookup table not being freed on a 
> config reload.
> Our current process:
> - New cert comes in
> - ssl_multicert.config and remap.config updated
> - traffic_line -x
> This reload could occur as often as every 3 mins with 5000+ certs configured.





[jira] [Commented] (TS-3640) Drupal auth fails with dda6814f07ee59c over SPDY

2015-05-29 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564798#comment-14564798
 ] 

Susan Hinrichs commented on TS-3640:


The issue appears to be that the is_pipeline_possible checks only half-close 
the connection, so the EOS is not sent to the SPDY client. 

> Drupal auth fails with dda6814f07ee59c over SPDY
> 
>
> Key: TS-3640
> URL: https://issues.apache.org/jira/browse/TS-3640
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 6.0.0
>
>
> With the patch from dda6814f07ee59c, when Drupal authenticates a user, it 
> sends back a 302 redirect to that user's "page". This seems to stall the SPDY 
> session entirely (it stops dead in its track at this point). Backing out 
> dda6814f07ee59c makes it work again.
> I've emailed some potentially sensitive traces directly to [~shinrich]





[jira] [Updated] (TS-3560) Make proxy.config.http.slow.log.threshold overridable

2015-05-29 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3560:
---
Assignee: Syeda Persia Aziz  (was: Alan M. Carroll)

> Make proxy.config.http.slow.log.threshold overridable
> -
>
> Key: TS-3560
> URL: https://issues.apache.org/jira/browse/TS-3560
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: David Carlin
>Assignee: Syeda Persia Aziz
> Fix For: 6.0.0
>
>
> Please make proxy.config.http.slow.log.threshold configurable via conf_remap 
> - we want to be able to enable it on a per-remap basis.





[jira] [Updated] (TS-3647) Add support for changing transaction overrideable config vars to CPP API

2015-05-29 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3647:
---
Assignee: Syeda Persia Aziz  (was: Susan Hinrichs)

> Add support for changing transaction overrideable config vars to CPP API
> 
>
> Key: TS-3647
> URL: https://issues.apache.org/jira/browse/TS-3647
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: CPP API
>Reporter: Alan M. Carroll
>Assignee: Syeda Persia Aziz
>  Labels: newbie++
>
> The CPP API should support the equivalent of the following C API functions.
> {code}
> TSHttpTxnConfigIntSet
> TSHttpTxnConfigIntGet
> TSHttpTxnConfigFloatSet
> TSHttpTxnConfigFloatGet
> TSHttpTxnConfigStringSet
> TSHttpTxnConfigStringGet
> TSHttpTxnConfigFind
> {code}





[jira] [Resolved] (TS-3560) Make proxy.config.http.slow.log.threshold overridable

2015-05-29 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3560.

Resolution: Fixed

> Make proxy.config.http.slow.log.threshold overridable
> -
>
> Key: TS-3560
> URL: https://issues.apache.org/jira/browse/TS-3560
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: David Carlin
>Assignee: Syeda Persia Aziz
> Fix For: 6.0.0
>
>
> Please make proxy.config.http.slow.log.threshold configurable via conf_remap 
> - we want to be able to enable it on a per-remap basis.





[jira] [Created] (TS-3656) Activating follow redirection in send server response hook does not work for post

2015-06-02 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3656:
--

 Summary: Activating follow redirection in send server response 
hook does not work for post
 Key: TS-3656
 URL: https://issues.apache.org/jira/browse/TS-3656
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


If you have a plugin on the TS_HTTP_SEND_RESPONSE_HDR_HOOK that calls 
TSHttpTxnFollowRedirect(txn, 1), redirecting a POST request will fail.

In the less severe case, the POST request will be redirected to the new 
location, but the POST data will be lost.

In the worse case, ATS will crash.

The issue is that the post_redirect buffers are freed early on.  One could 
delay the post_redirect deallocation until later in the transaction.  
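
The suggestion above can be illustrated with a small standalone sketch.  This 
is not ATS code: MockTxn and all of its members are hypothetical names, and the 
buffer is modeled as a plain string.  It only shows why releasing the saved POST 
body early breaks a redirect that is enabled late (as from a send response 
hook), and how holding the buffer until the end of the transaction avoids that.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a transaction; not the real HttpSM.
struct MockTxn {
  std::unique_ptr<std::string> post_redirect_buf; // saved copy of the POST body
  bool follow_redirect = false;

  // Save the POST body once, when the request arrives.
  void receive_post(const std::string &body) {
    assert(!post_redirect_buf);
    post_redirect_buf.reset(new std::string(body));
  }

  // With release_early set, the saved body is dropped as soon as the request
  // has been sent, unless redirect following was already enabled.
  void after_request_sent(bool release_early) {
    if (release_early && !follow_redirect) {
      post_redirect_buf.reset();
    }
  }

  // Models a plugin enabling redirect following late in the transaction.
  void enable_follow_redirect() { follow_redirect = true; }

  // Body to replay to the redirect target; empty if the buffer is gone.
  std::string body_for_redirect() const {
    return post_redirect_buf ? *post_redirect_buf : std::string();
  }

  // Delayed release point: the buffer is freed here at the latest.
  void end_of_transaction() { post_redirect_buf.reset(); }
};
```

In this model, a transaction that calls after_request_sent(true) before 
enable_follow_redirect() has nothing left to replay, while one that calls 
after_request_sent(false) still has the body until end_of_transaction().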





[jira] [Updated] (TS-3476) Add a log tag for APLN/NPN negotiated protocol

2015-06-02 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3476:
---
Assignee: Eric Schwartz  (was: Susan Hinrichs)

> Add a log tag for APLN/NPN negotiated protocol
> --
>
> Key: TS-3476
> URL: https://issues.apache.org/jira/browse/TS-3476
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Logging, SSL
>Reporter: Leif Hedstrom
>Assignee: Eric Schwartz
> Fix For: 6.0.0
>
>
> It seems crucial to be able to log which protocol handler was negotiated with 
> ALPN (and perhaps NPN as long as we support it). This could simply be the 
> string that was negotiated by the client/server in the TLS handshake. For 
> example, with HTTP/2, it would be "h2" (or "h2-14" with some browsers).
> A suggested log tag name would be % which seems clear enough to me and 
> easy to remember :).





[jira] [Assigned] (TS-3656) Activating follow redirection in send server response hook does not work for post

2015-06-03 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3656:
--

Assignee: Susan Hinrichs

> Activating follow redirection in send server response hook does not work for 
> post
> -
>
> Key: TS-3656
> URL: https://issues.apache.org/jira/browse/TS-3656
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> If you have a plugin on the TS_HTTP_SEND_RESPONSE_HDR_HOOK, calls 
> TSHttpTxnFollowRedirect(txn, 1), redirecting a POST request will fail.
> In the not so bad case, the POST request will be redirected to the new 
> location, but the POST data will be lost.
> In the more bad case, ATS will crash.
> The issue is that the post_redirect buffers are freed early on.  One could 
> delay the post_redirect deallocation until later in the transaction.  





[jira] [Created] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3665:
--

 Summary: Redirect logic causing debug asserts and leaking 
cache_vc's
 Key: TS-3665
 URL: https://issues.apache.org/jira/browse/TS-3665
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs


This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
issue addressed by TS-3140 after the fixes for TS-3661 were put in place.

TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
for both 301's in cache and 301's not in cache.  

My first assert was at line 109 of HttpCacheSM.cc, 
ink_assert(cache_read_vc == NULL).  While this is only a debug assert, if we 
ignore it we will reassign cache_read_vc without freeing the previous one.

I addressed this by adding cache_sm.close_read() to the 
HttpTransact::SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
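
As a rough standalone illustration of the read-side problem (not the actual 
patch), the sketch below uses hypothetical MockCacheVC and MockCacheSM types in 
place of the real classes.  A counter tracks how many VCs are allocated, so the 
effect of closing the previous read before reopening on each redirect hop is 
visible.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the real ATS classes.
struct MockCacheVC {
  static int live; // number of VCs currently allocated
  MockCacheVC() { ++live; }
  ~MockCacheVC() { --live; }
};
int MockCacheVC::live = 0;

struct MockCacheSM {
  MockCacheVC *cache_read_vc = nullptr;

  // Mirrors the debug assert quoted above: no read may already be open.
  void open_read() {
    assert(cache_read_vc == nullptr);
    cache_read_vc = new MockCacheVC;
  }

  // Release the current read VC, if any, and clear the pointer.
  void close_read() {
    delete cache_read_vc;
    cache_read_vc = nullptr;
  }
};

// Redirect handling: close the read for the previous hop before the lookup
// for the next hop.  Without the close_read() call, open_read() would either
// trip the assert (debug build) or overwrite the pointer and leak the VC.
inline void follow_redirect(MockCacheSM &sm) {
  sm.close_read();
  sm.open_read();
}

// Run an initial lookup plus `hops` redirects; return the VCs still allocated.
inline int live_after_redirects(int hops) {
  MockCacheSM sm;
  sm.open_read();
  for (int i = 0; i < hops; ++i) {
    follow_redirect(sm);
  }
  sm.close_read();
  return MockCacheVC::live;
}
```

With the close in place, live_after_redirects(n) returns 0 for any number of 
hops in this model.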

My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
HttpSM.cc).  Before the changes for TS-3661, it showed up in the 
SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this case, 
do_cache_prepare_action will open a new cache_write_vc, overwriting the 
original and leaking the cache_vc memory.

The original fix to TS-3140 addressed this by adding a cache_sm.close_write in 
the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this caused 
the TS-3661 problem, in which the originally selected cache key is lost.  
However, if you pass through this logic, I assume that the original cache 
write vc will be lost anyway.  [~sudheerv] and [~zwoop], does this situation 
not happen in your redirect use cases?  I'm afraid that I'm not following how 
the original cache key is preserved in the second cache open only if the first 
cache write open is not cleaned up.

My test URLs are:

curl -v --proxy localhost:80 
http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png

and 

curl -v --proxy localhost:80 http://docs.trafficserver.apache.org







[jira] [Commented] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572832#comment-14572832
 ] 

Susan Hinrichs commented on TS-3665:


BTW, in all of these scenarios we are running with follow redirect enabled.  I 
have redirect retry set to 10.

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was line 109 in HttpCacheSM.cc line 109, 
> ink_assert(cache_read_vc == NULL).  I added a cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ: case of HttpSM::handle_api_return.  
> While only debug assert, if we ignore it we will reassign cache_read_vc 
> without freeing the previous.
> I addressed this by adding cache_sm.close_read() to the 
> SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it was expressing itself in 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc overwriting the 
> original and losing the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused problems of TS-3661 causing the originally selected cache key to be 
> lost, but if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop] does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned  up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Commented] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572947#comment-14572947
 ] 

Susan Hinrichs commented on TS-3665:


So in the follow redirect case, we should ignore any 30x response that happens 
to already be in the cache.  Although we would have to look it up in the cache 
(as part of normal operating procedure) to know it was a 30x.

So we do need to attempt to read from the cache for the initial request anyway. 
 After that, it seems reasonable to not attempt to write to the cache until the 
"final" response that is sent back to the client.  Again, we may want to be 
reading from the cache on the intermediates, in case this is the final location 
and has already been cached.
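
One way to read the policy described above is as two small decisions per hop.  
The sketch below is only an illustration of that reading: MockLookup and both 
functions are hypothetical names, and the set of status codes treated as 
followable redirects is an assumption made for the example, not taken from the 
ATS source.

```cpp
#include <cassert>

// Assumption for illustration: the statuses treated as followable redirects.
inline bool is_followable_redirect(int status) {
  return status == 301 || status == 302 || status == 303 || status == 307;
}

// Hypothetical summary of a cache lookup for one hop in the redirect chain.
struct MockLookup {
  bool hit;   // an object was found in cache for this URL
  int status; // status code of the cached response (unused on a miss)
};

// The lookup always happens; the question is whether the hit may be served.
// With follow redirect enabled, a cached 30x is not treated as the answer.
inline bool may_serve_from_cache(const MockLookup &l, bool follow_redirect) {
  if (!l.hit) {
    return false;
  }
  return !(follow_redirect && is_followable_redirect(l.status));
}

// Open a cache write only for the response that will go back to the client,
// that is, not for an intermediate redirect that is about to be followed.
inline bool should_open_cache_write(int response_status, bool follow_redirect) {
  return !(follow_redirect && is_followable_redirect(response_status));
}
```

Under these assumptions, a cached 200 on an intermediate hop is still served, 
while a cached 301 is passed over and no cache write is opened for it.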

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was line 109 in HttpCacheSM.cc line 109, 
> ink_assert(cache_read_vc == NULL).  I added a cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ: case of HttpSM::handle_api_return.  
> While only debug assert, if we ignore it we will reassign cache_read_vc 
> without freeing the previous.
> I addressed this by adding cache_sm.close_read() to the 
> SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it was expressing itself in 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc overwriting the 
> original and losing the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused problems of TS-3661 causing the originally selected cache key to be 
> lost, but if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop] does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned  up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Commented] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573048#comment-14573048
 ] 

Susan Hinrichs commented on TS-3665:


Backed out the changes for TS-3585 and it made no difference.  Not surprising, 
since the only functional difference in that patch applies when the cache is 
not enabled.  In our case, the cache is enabled.  The code correctly directs 
the flow to do a cache lookup on the redirected target.

Will spend a bit of time to determine what is opening the cache write vc.

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
> Fix For: 6.0.0
>
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was line 109 in HttpCacheSM.cc line 109, 
> ink_assert(cache_read_vc == NULL).  I added a cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ: case of HttpSM::handle_api_return.  
> While only debug assert, if we ignore it we will reassign cache_read_vc 
> without freeing the previous.
> I addressed this by adding cache_sm.close_read() to the 
> SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it was expressing itself in 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc overwriting the 
> original and losing the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused problems of TS-3661 causing the originally selected cache key to be 
> lost, but if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop] does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned  up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Updated] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3665:
---
Attachment: ts-3665.diff

With ts-3665.diff, I added a close_read on the cache_sm in the REDIRECT_READ 
case.

I converted the assert in do_cache_prepare_action to a test against NULL 
followed by a cache close_write.

This enables both of my URLs to be fetched without asserts. 
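
For readers without the attachment, the shape of the write-side change can be 
sketched roughly as follows.  This is not the contents of ts-3665.diff: 
MockWriteVC, MockWriteSM and prepare_cache_write are hypothetical names, and 
the real function does considerably more.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real ATS types.
struct MockWriteVC {
  int hop; // which hop in the redirect chain opened this write
};

struct MockWriteSM {
  MockWriteVC *cache_write_vc = nullptr;
  int closed = 0; // number of write VCs released so far

  void open_write(int hop) {
    assert(cache_write_vc == nullptr); // holds once the caller closes first
    cache_write_vc = new MockWriteVC{hop};
  }

  void close_write() {
    delete cache_write_vc;
    cache_write_vc = nullptr;
    ++closed;
  }
};

// Before: a bare assert that no write VC exists, which fires when a redirect
// is followed.  After: test for a leftover write VC and close it before
// opening the new one, so the old VC is neither leaked nor asserted on.
inline void prepare_cache_write(MockWriteSM &sm, int hop) {
  if (sm.cache_write_vc != nullptr) {
    sm.close_write();
  }
  sm.open_write(hop);
}
```

Calling prepare_cache_write twice on the same state machine releases the first 
write VC exactly once and leaves the second one open.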

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3665.diff
>
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was line 109 in HttpCacheSM.cc line 109, 
> ink_assert(cache_read_vc == NULL).  I added a cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ: case of HttpSM::handle_api_return.  
> While only debug assert, if we ignore it we will reassign cache_read_vc 
> without freeing the previous.
> I addressed this by adding cache_sm.close_read() to the 
> SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it was expressing itself in 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc overwriting the 
> original and losing the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused problems of TS-3661 causing the originally selected cache key to be 
> lost, but if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop] does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned  up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Updated] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3665:
---
Attachment: ts-3665-2.diff

ts-3665-2.diff contains an updated patch of the cache read vc cleanup suggested 
by [~sudheerv].  It also updates the debug asserts to correctly test for the 
non-null cache write vc's floating around in the follow redirect case.  

The cache read vc issue is not due to the TS-3661 fix.  The cache write vc 
debug asserts are due to the TS-3661 fix.

The fixes should definitely be pushed to master.

The fix for TS-3661 should definitely be backported to 5.3.1.  I think cleaning 
up the debug asserts (the fix on this bug) should also be backported, but it is 
not such a big thing.  I leave it for [~psudaemon] to decide.

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3665-2.diff, ts-3665.diff
>
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was line 109 in HttpCacheSM.cc line 109, 
> ink_assert(cache_read_vc == NULL).  I added a cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ: case of HttpSM::handle_api_return.  
> While only debug assert, if we ignore it we will reassign cache_read_vc 
> without freeing the previous.
> I addressed this by adding cache_sm.close_read() to the 
> SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it was expressing itself in 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc overwriting the 
> original and losing the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused problems of TS-3661 causing the originally selected cache key to be 
> lost, but if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop] does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned  up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org
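
The leak described in the issue above (reassigning cache_read_vc without 
closing the previous one) can be illustrated with a minimal sketch.  All names 
here (FakeCacheVC, FakeCacheSM, follow_redirect) are hypothetical stand-ins and 
not the Traffic Server classes; the sketch only assumes that a read handle must 
be closed before a new one is opened.

```cpp
#include <cassert>

// Hypothetical stand-in for a cache VC; counts live instances so a leak is observable.
struct FakeCacheVC {
  static int live;
  FakeCacheVC() { ++live; }
  ~FakeCacheVC() { --live; }
};
int FakeCacheVC::live = 0;

// Hypothetical stand-in for the read side of a cache state machine.
class FakeCacheSM {
public:
  FakeCacheSM() = default;
  FakeCacheSM(const FakeCacheSM &) = delete;
  FakeCacheSM &operator=(const FakeCacheSM &) = delete;
  ~FakeCacheSM() { close_read(); }

  // Mirrors the quoted debug assert: no read may be outstanding when a new one is opened.
  void open_read() {
    assert(read_vc == nullptr);
    read_vc = new FakeCacheVC();
  }

  // Releases the outstanding read, if any; safe to call when nothing is open.
  void close_read() {
    delete read_vc;
    read_vc = nullptr;
  }

private:
  FakeCacheVC *read_vc = nullptr;
};

// Redirect handling in this model: release the old read before re-opening.
inline void follow_redirect(FakeCacheSM &sm) {
  sm.close_read();
  sm.open_read();
}
```

Without the close_read() call in follow_redirect, the assert fires in a debug 
build; in a release build the old FakeCacheVC would simply be leaked.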



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3378) SpdyRequest used after free()

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3378.

Resolution: Fixed

This change at least narrows the crash window.  @zwoop will keep an eye on this 
on docs.trafficserver.apache.org to see if it eliminates or reduces this crash.

> SpdyRequest used after free()
> -
>
> Key: TS-3378
> URL: https://issues.apache.org/jira/browse/TS-3378
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> I see this on our docs.ts machine:
> {code}
> ==1310==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x6110004fc974 at pc 0x7c2162 bp 0x7fff97c95010 sp 0x7fff97c95008
> READ of size 1 at 0x6110004fc974 thread T0 ([ET_NET 0])
> #0 0x7c2161 in spdy_process_fetch 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:339
> #1 0x7c2161 in SpdyClientSession::state_session_readwrite(int, void*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:253
> #2 0x4f1308 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #3 0x4f1308 in FetchSM::InvokePluginExt(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:250
> #4 0x4f455a in FetchSM::fetch_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:516
> #5 0x59f737 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #6 0x59f737 in PluginVC::process_write_side(bool) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:519
> #7 0x5aa2fd in PluginVC::main_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:210
> #8 0xc6aabe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #9 0xc6aabe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:144
> #10 0xc6d0d9 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:238
> #11 0x498481 in main /usr/local/src/trafficserver/proxy/Main.cc:1759
> #12 0x2b01d58c0af4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
> #13 0x4ab124 (/opt/ats/bin/traffic_server+0x4ab124)
> 0x6110004fc974 is located 52 bytes inside of 224-byte region 
> [0x6110004fc940,0x6110004fca20)
> freed by thread T0 ([ET_NET 0]) here:
> #0 0x2b01d1d2e1c7 in __interceptor_free 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:62
> #1 0x7c8433 in ClassAllocator::free(SpdyRequest*) 
> ../../lib/ts/Allocator.h:138
> #2 0x7c8433 in SpdyClientSession::cleanup_request(int) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.h:146
> #3 0x7c8433 in 
> spdy_prepare_status_response_and_clean_request(SpdyClientSession*, int, char const*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyCallbacks.cc:85
> #4 0x7c1094 in spdy_process_fetch 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:354
> #5 0x7c1094 in SpdyClientSession::state_session_readwrite(int, void*) 
> /usr/local/src/trafficserver/proxy/spdy/SpdyClientSession.cc:253
> #6 0x4f1c95 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #7 0x4f1c95 in FetchSM::InvokePluginExt(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:259
> #8 0x4f2eaa in FetchSM::process_fetch_read(int) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:465
> #9 0x4f4542 in FetchSM::fetch_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/FetchSM.cc:514
> #10 0x59e077 in Continuation::handleEvent(int, void*) 
> ../iocore/eventsystem/I_Continuation.h:146
> #11 0x59e077 in PluginVC::process_read_side(bool) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:640
> #12 0x5aab79 in PluginVC::main_handler(int, void*) 
> /usr/local/src/trafficserver/proxy/PluginVC.cc:206
> #13 0xc6aabe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #14 0xc6aabe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:144
> #15 0xc6d0d9 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:238
> #16 0x498481 in main /usr/local/src/trafficserver/proxy/Main.cc:1759
> #17 0x2b01d58c0af4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
> previously allocated by thread T5 ([ET_NET 4]) here:
> #0 0x2b01d1d2e93b in __interceptor_posix_memalign 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:130
> #1 0x2b01d2c18309 in ats_memalign 
> /usr/local/src/trafficserver/lib/ts/ink_memory.cc:96
> #2 0x7c89ba in ClassAllocator::alloc() 
> ../../lib/ts/Allocator.h:124
> #3 0x7c89ba in spdy_on_ctrl_recv_callback(spd

[jira] [Created] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-06-04 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3667:
--

 Summary: SSL Handshake read does not correctly handle EOF and error 
cases
 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs


Reported by [~esproul] and postwait.

The return value of SSLNetVConnection::read_raw_data() is being ignored.  As a 
result, connections that hit EOF or an error are not terminated, but rather spin 
until the inactivity timeout is reached.  On EAGAIN, the read is not descheduled 
to wait until more data is available.

This results in higher CPU utilization and hitting the SSL_error() function 
much more often than necessary.
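
A minimal sketch of the kind of return-value handling described above.  It 
assumes read(2)-style conventions (a positive byte count, 0 for EOF, -1 with 
errno set); the real read_raw_data() may report results differently, and the 
ReadAction enum and classify_raw_read() are hypothetical names used only for 
illustration.

```cpp
#include <cerrno>
#include <cstdint>

// What the handshake loop should do next (hypothetical names).
enum class ReadAction { Continue, WaitForData, CloseEof, CloseError };

// r:   result of the raw read (assumed read(2)-style)
// err: errno value captured immediately after the read
inline ReadAction classify_raw_read(int64_t r, int err) {
  if (r > 0) {
    return ReadAction::Continue;    // data arrived, keep driving the handshake
  }
  if (r == 0) {
    return ReadAction::CloseEof;    // peer closed, terminate instead of spinning
  }
  if (err == EAGAIN || err == EWOULDBLOCK) {
    return ReadAction::WaitForData; // deschedule until the socket is readable again
  }
  return ReadAction::CloseError;    // hard error, terminate the connection
}
```

The point of the sketch is only that each of the three non-data outcomes gets 
its own action rather than falling through to a retry.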





[jira] [Assigned] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3667:
--

Assignee: Susan Hinrichs

> SSL Handshake read does not correctly handle EOF and error cases
> ---
>
> Key: TS-3667
> URL: https://issues.apache.org/jira/browse/TS-3667
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> Reported by [~esproul] and postwait.
> The return value of SSLNetVConnection::read_raw_data() is being ignored.  As a 
> result, connections that hit EOF or an error are not terminated, but rather 
> spin until the inactivity timeout is reached.  On EAGAIN, the read is not 
> descheduled to wait until more data is available.
> This results in higher CPU utilization and hitting the SSL_error() function 
> much more often than necessary.





[jira] [Updated] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3667:
---
Attachment: ts-3667.diff

I tried the ts-3667.diff patch against master on my dev machine.  I've 
triggered the EOF case, but not the -1 return cases.  I'll work on that some 
more, but I wanted to make the patch available sooner to those seeing the 
problem.

> SSL Handshake read does not correctly handle EOF and error cases
> ---
>
> Key: TS-3667
> URL: https://issues.apache.org/jira/browse/TS-3667
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: ts-3667.diff
>
>
> Reported by [~esproul] and postwait.
> The return value of SSLNetVConnection::read_raw_data() is being ignored.  As a 
> result, connections that hit EOF or an error are not terminated, but rather 
> spin until the inactivity timeout is reached.  On EAGAIN, the read is not 
> descheduled to wait until more data is available.
> This results in higher CPU utilization and hitting the SSL_error() function 
> much more often than necessary.





[jira] [Resolved] (TS-3382) Complaints of CRYPTO_set_id_callback while compiling against openssl 1.1

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3382.

Resolution: Fixed

> Complaints of  CRYPTO_set_id_callback while compiling against openssl 1.1
> -
>
> Key: TS-3382
> URL: https://issues.apache.org/jira/browse/TS-3382
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> CRYPTO_set_id_callback has been deprecated since openssl 1.0.0.  We should 
> update to the replacement call.





[jira] [Resolved] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-06-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3667.

Resolution: Fixed

> SSL Handshake read does not correctly handle EOF and error cases
> ---
>
> Key: TS-3667
> URL: https://issues.apache.org/jira/browse/TS-3667
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: ts-3667.diff
>
>
> Reported by [~esproul] and postwait.
> The return value of SSLNetVConnection::read_raw_data() is being ignored.  As a 
> result, connections that hit EOF or an error are not terminated, but rather 
> spin until the inactivity timeout is reached.  On EAGAIN, the read is not 
> descheduled to wait until more data is available.
> This results in higher CPU utilization and hitting the SSL_error() function 
> much more often than necessary.





[jira] [Updated] (TS-3640) Drupal auth fails with dda6814f07ee59c over SPDY

2015-06-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3640:
---
Attachment: ts-3640.diff

ts-3640.diff contains a patch that disables the close for write at the end of 
the post/redirect sequence.  It isn't clear to me when it would be beneficial 
to keep the client connection open for reads after completing the post if the 
connection is marked close.

I have not been able to exercise this code in my dev environment.  [~zwoop], 
could you give this patch a try?  If it doesn't help, I'll make a better effort 
at reproducing it.

> Drupal auth fails with dda6814f07ee59c over SPDY
> 
>
> Key: TS-3640
> URL: https://issues.apache.org/jira/browse/TS-3640
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 6.0.0
>
> Attachments: ts-3640.diff
>
>
> With the patch from dda6814f07ee59c, when Drupal authenticates a user, it 
> sends back a 302 redirect to that user's "page". This seems to stall the SPDY 
> session entirely (it stops dead in its tracks at this point). Backing out 
> dda6814f07ee59c makes it work again.
> I've emailed some potentially sensitive traces directly to [~shinrich]





[jira] [Assigned] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3665:
--

Assignee: Susan Hinrichs

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3665-2.diff, ts-3665.diff
>
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was at line 109 of HttpCacheSM.cc, 
> ink_assert(cache_read_vc == NULL).  While this is only a debug assert, if we 
> ignore it we will reassign cache_read_vc without freeing the previous one.
> I addressed this by adding cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it showed up in the 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc, overwriting the 
> original and leaking the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused the TS-3661 problem, where the originally selected cache key is lost.  
> However, if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop], does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Resolved] (TS-3640) Drupal auth fails with dda6814f07ee59c over SPDY

2015-06-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3640.

Resolution: Fixed

> Drupal auth fails with dda6814f07ee59c over SPDY
> 
>
> Key: TS-3640
> URL: https://issues.apache.org/jira/browse/TS-3640
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 6.0.0
>
> Attachments: ts-3640.diff
>
>
> With the patch from dda6814f07ee59c, when Drupal authenticates a user, it 
> sends back a 302 redirect to that user's "page". This seems to stall the SPDY 
> session entirely (it stops dead in its tracks at this point). Backing out 
> dda6814f07ee59c makes it work again.
> I've emailed some potentially sensitive traces directly to [~shinrich]





[jira] [Commented] (TS-3641) Drupal Auth does not seem to work with HTTP/2

2015-06-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578867#comment-14578867
 ] 

Susan Hinrichs commented on TS-3641:


[~zwoop] Does the fix for TS-3640 fix the HTTP/2 case as well?

> Drupal Auth does not seem to work with HTTP/2
> -
>
> Key: TS-3641
> URL: https://issues.apache.org/jira/browse/TS-3641
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Leif Hedstrom
> Fix For: 6.0.0
>
>
> Using latest chrome, when authenticating to a Drupal site behind ATS, it 
> fails to authenticate. It silently seems to just ignore the auth, and moves 
> along unauthenticated. It's possible this is similar to TS-3640, but the 
> "fix" from that Jira does not resolve the HTTP/2 issues. 
> In fact, this problem exists all the way back to 5.3.0, so the fix here would 
> also be a back port for 5.3.1 (or 5.3.2).





[jira] [Resolved] (TS-3665) Redirect logic causing debug asserts and leaking cache_vc's

2015-06-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3665.

Resolution: Fixed

> Redirect logic causing debug asserts and leaking cache_vc's
> ---
>
> Key: TS-3665
> URL: https://issues.apache.org/jira/browse/TS-3665
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3665-2.diff, ts-3665.diff
>
>
> This is related to TS-3140 and TS-3661.  I spent this morning reviewing the 
> issue addressed by TS-3140 after the fixes for TS-3661 were put in place.
> TS-3140 addresses the issue when the 301 is in cache, but I'm seeing asserts 
> for both 301's in cache and 301's not in cache.  
> My first assert was at line 109 of HttpCacheSM.cc, 
> ink_assert(cache_read_vc == NULL).  While this is only a debug assert, if we 
> ignore it we will reassign cache_read_vc without freeing the previous one.
> I addressed this by adding cache_sm.close_read() to the 
> HttpTransact::SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.
> My second assert is in HttpSM::do_cache_prepare_action (line 4446 of 
> HttpSM.cc).  Before the changes for TS-3661, it showed up in the 
> SM_ACTION_CACHE_ISSUE_WRITE case of HttpSM::cache_write_state().  In this 
> case, do_cache_prepare_action will open a new cache_write_vc, overwriting the 
> original and leaking the cache_vc memory.
> The original fix to TS-3140 addressed this by adding a cache_sm.close_write 
> in the SM_ACTION_REDIRECT_READ case of HttpSM::handle_api_return.  But this 
> caused the TS-3661 problem, where the originally selected cache key is lost.  
> However, if you pass through this logic, I assume that the original cache 
> write vc will be lost anyway.  [~sudheerv] and [~zwoop], does this situation 
> not happen in your redirect use cases?  I'm afraid that I'm not following how 
> the original cache key is preserved in the second cache open only if the 
> first cache write open is not cleaned up.
> My test URLs are:
> curl -v --proxy localhost:80 
> http://whos.amung.us/cwidget/4s62rme9/007071fecc4e.png
> and 
> curl -v --proxy localhost:80 http://docs.trafficserver.apache.org





[jira] [Created] (TS-3680) Refine options for handling post and follow redirect

2015-06-09 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3680:
--

 Summary: Refine options for handling post and follow redirect
 Key: TS-3680
 URL: https://issues.apache.org/jira/browse/TS-3680
 Project: Traffic Server
  Issue Type: Improvement
  Components: HTTP
Reporter: Susan Hinrichs


In the current code base (5.3.x), the follow redirect feature applies to POST 
methods as well as GET/HEAD methods.  For POST redirects, the code saves aside 
a copy (reference counted) of the post data, so it can resend it later if the 
original server sends a redirect and the follow redirect feature is enabled.

This logic has been in place at least throughout the 5.x code, and probably earlier.  

It has been pointed out that in some cases, replaying a POST in a redirect 
scenario may not be safe.  The POST may have already caused a change before the 
redirect occurred.

In other cases, following the redirect on a post may be just fine.  If the 
origin server makes the redirect decision before anything is done on the post 
request, replaying the post request should be just fine.

We should step back and determine if we want to reconsider how we handle follow 
redirects for POST methods.  I see the following options.

* Keep the current support and bug fix it as necessary (e.g. TS-3656)
* Remove the post redirect support.
* Add another config control to choose between following redirects only for 
GET/HEAD, following redirects for all methods, and not following redirects at all.
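
The third option could look roughly like the sketch below.  The enum and 
function names are hypothetical and do not correspond to an existing Traffic 
Server setting; the sketch only shows the three-way choice described in the 
list above.

```cpp
#include <string>

// Hypothetical three-way setting for the follow redirect feature.
enum class FollowRedirectPolicy { Never, SafeMethodsOnly, AllMethods };

// Decide whether a redirect response should be followed for this request method.
inline bool should_follow_redirect(FollowRedirectPolicy policy, const std::string &method) {
  switch (policy) {
  case FollowRedirectPolicy::Never:
    return false;
  case FollowRedirectPolicy::SafeMethodsOnly:
    // Only methods that are not expected to change state on the origin.
    return method == "GET" || method == "HEAD";
  case FollowRedirectPolicy::AllMethods:
    return true;
  }
  return false; // unreachable for valid enum values
}
```

With SafeMethodsOnly, a POST that receives a redirect would be passed back to 
the client rather than replayed.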






[jira] [Commented] (TS-3656) Activating follow redirection in send server response hook does not work for post

2015-06-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579403#comment-14579403
 ] 

Susan Hinrichs commented on TS-3656:


I filed bug TS-3680 to hold the discussion on whether we should be following 
POST redirects or not.  I think that is a broader question than this issue, 
which is just a bug fix for an existing feature.

> Activating follow redirection in send server response hook does not work for 
> post
> -
>
> Key: TS-3656
> URL: https://issues.apache.org/jira/browse/TS-3656
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> If you have a plugin on the TS_HTTP_SEND_RESPONSE_HDR_HOOK that calls 
> TSHttpTxnFollowRedirect(txn, 1), redirecting a POST request will fail.
> In the not-so-bad case, the POST request will be redirected to the new 
> location, but the POST data will be lost.
> In the worse case, ATS will crash.
> The issue is that the post_redirect buffers are freed early on.  One could 
> delay the post_redirect deallocation until later in the transaction.  





[jira] [Commented] (TS-3683) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused

2015-06-11 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582400#comment-14582400
 ] 

Susan Hinrichs commented on TS-3683:


Yes, there are cumulative metrics. And in many circumstances this is sufficient.

The motivation for providing the same information on a per-log-entry basis is 
to be able to roll up the data in different ways.  For example, you might want 
to do post analysis to determine the SSL session reuse for clients from Europe 
vs clients from South America.  Per-box statistics are not sufficient in that 
case.
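
As a rough illustration of that kind of roll-up, here is a sketch that groups 
log entries by client region and computes the session reuse rate for each.  
The LogEntry layout and the region labels are assumptions made for the 
example; a real log line would need the proposed reuse tag plus some way of 
mapping the client address to a region.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed log entry: client region plus the proposed reuse flag.
struct LogEntry {
  std::string region;
  bool ssl_session_reused;
};

// Running totals for one region.
struct ReuseStats {
  int total  = 0;
  int reused = 0;
  double rate() const { return total ? static_cast<double>(reused) / total : 0.0; }
};

// Roll the per-entry flags up into per-region statistics.
inline std::map<std::string, ReuseStats> reuse_by_region(const std::vector<LogEntry> &entries) {
  std::map<std::string, ReuseStats> stats;
  for (const LogEntry &e : entries) {
    ReuseStats &s = stats[e.region];
    ++s.total;
    if (e.ssl_session_reused) {
      ++s.reused;
    }
  }
  return stats;
}
```

A cumulative per-box counter cannot be split this way after the fact, which is 
the argument for logging the flag on each entry.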

> Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused
> 
>
> Key: TS-3683
> URL: https://issues.apache.org/jira/browse/TS-3683
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: François Pesce
>
> These tags would be useful for performance metrics collection:
> <%cqtr> The TCP reused status; indicates if this request went through an 
> already established connection.
> <%cqssr> The SSL session/ticket reused status; indicates if this request hit 
> the SSL session/ticket and avoided a full SSL handshake.
> Both of them would display 0 if not reused and 1 if reused.





[jira] [Assigned] (TS-3687) ATS Session Cache table never removes expired sessions

2015-06-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3687:
--

Assignee: Susan Hinrichs

> ATS Session Cache table never removes expired sessions
> --
>
> Key: TS-3687
> URL: https://issues.apache.org/jira/browse/TS-3687
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> While this sounds bad, it is only a performance issue.  It is not a security 
> issue.  Openssl will not allow the expired sessions to be used.
> Here are the details.
> When you use the ATS version of the ssl session cache, ATS registers
> callbacks to handle creating new sessions, getting existing sessions,
> and removing old sessions.  While debugging the new session plugin API,
> I saw that the new sessions and get session callbacks were being
> triggered but the remove session callback was never being triggered.
> At first I was concerned that we were never removing  sessions from the
> cache and reusing them forever.  I poked through the openssl 1.0.1 (and
> briefly the 1.0.2) code and set some break points, and verified that the
> stale sessions are being rejected but the code only tries to remove it
> from the openssl internal cache implementation (which failed and so the
> remove callback was never triggered).
> So I think this is only a performance problem.  Old sessions are
> never removed from the ATS session cache until we run out of space and
> the old values are evicted.





[jira] [Created] (TS-3687) ATS Session Cache table never removes expired sessions

2015-06-11 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3687:
--

 Summary: ATS Session Cache table never removes expired sessions
 Key: TS-3687
 URL: https://issues.apache.org/jira/browse/TS-3687
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs


While this sounds bad, it is only a performance issue.  It is not a security 
issue.  Openssl will not allow the expired sessions to be used.

Here are the details.

When you use the ATS version of the ssl session cache, ATS registers
callbacks to handle creating new sessions, getting existing sessions,
and removing old sessions.  While debugging the new session plugin API,
I saw that the new sessions and get session callbacks were being
triggered but the remove session callback was never being triggered.

At first I was concerned that we were never removing  sessions from the
cache and reusing them forever.  I poked through the openssl 1.0.1 (and
briefly the 1.0.2) code and set some break points, and verified that the
stale sessions are being rejected but the code only tries to remove it
from the openssl internal cache implementation (which failed and so the
remove callback was never triggered).

So I think this is only a performance problem.  Old sessions are
never removed from the ATS session cache until we run out of space and
the old values are evicted.





[jira] [Commented] (TS-3687) ATS Session Cache table never removes expired sessions

2015-06-11 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582662#comment-14582662
 ] 

Susan Hinrichs commented on TS-3687:


Hmm.  Looking more closely at the definition of SSL_CTX_sess_set_remove_cb, it 
seems that it is only called in the openssl internal cache case by design, 
unlike the other two session cache callbacks.

https://www.openssl.org/docs/ssl/SSL_CTX_sess_set_get_cb.html

I would argue, however, that ATS should proactively remove stale sessions.  
That would reduce the system's exposure of time-sensitive data and reduce the 
eviction pressure on the cache.
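
One simple form of proactive removal is to drop an entry as soon as a lookup 
finds it expired.  The sketch below shows that idea with a hypothetical 
ExpiringSessionCache keyed by session id; it is not the ATS session cache, and 
it stores only an expiry timestamp where a real cache would hold the session 
data.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical session table: session id -> absolute expiry time (seconds).
class ExpiringSessionCache {
public:
  void insert(const std::string &id, int64_t expires_at) { sessions_[id] = expires_at; }

  // Returns true only for a live session.  An expired entry is erased as a
  // side effect, so it stops occupying space before eviction would reach it.
  bool lookup(const std::string &id, int64_t now) {
    auto it = sessions_.find(id);
    if (it == sessions_.end()) {
      return false;
    }
    if (it->second <= now) {
      sessions_.erase(it);
      return false;
    }
    return true;
  }

  std::size_t size() const { return sessions_.size(); }

private:
  std::unordered_map<std::string, int64_t> sessions_;
};
```

This only cleans up sessions that are looked up again; a periodic sweep would 
be needed to catch expired sessions that no client ever presents.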

> ATS Session Cache table never removes expired sessions
> --
>
> Key: TS-3687
> URL: https://issues.apache.org/jira/browse/TS-3687
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> While this sounds bad, it is only a performance issue.  It is not a security 
> issue.  Openssl will not allow the expired sessions to be used.
> Here are the details.
> When you use the ATS version of the ssl session cache, ATS registers
> callbacks to handle creating new sessions, getting existing sessions,
> and removing old sessions.  While debugging the new session plugin API,
> I saw that the new sessions and get session callbacks were being
> triggered but the remove session callback was never being triggered.
> At first I was concerned that we were never removing  sessions from the
> cache and reusing them forever.  I poked through the openssl 1.0.1 (and
> briefly the 1.0.2) code and set some break points, and verified that the
> stale sessions are being rejected but the code only tries to remove it
> from the openssl internal cache implementation (which failed and so the
> remove callback was never triggered).
> So I think this is only a performance problem.  Old sessions are
> never removed from the ATS session cache until we run out of space and
> the old values are evicted.





[jira] [Resolved] (TS-3453) Confusion of handling SSL events in write_to_net_io in UnixNetVConnection.cc

2015-06-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3453.

Resolution: Fixed

> Confusion of handling SSL events in write_to_net_io in UnixNetVConnection.cc
> 
>
> Key: TS-3453
> URL: https://issues.apache.org/jira/browse/TS-3453
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> While tracking down differences for SSL between 5.0 and 5.2 for TS-3451 I 
> came across odd event handling code in write_to_net_io in 
> UnixNetVConnection.cc.  Looking back at the history in that code things 
> became even more confusing.
> The current version on master (same as what is in 5.2) contains the following 
> in an if/else sequence. SSL IOs can return READ events even from write 
> functions (and vice versa), which I assume is why this write function 
> is dealing with SSL read events at all.
> {code}
> if (ret == SSL_HANDSHAKE_WANT_READ || ret == SSL_HANDSHAKE_WANT_ACCEPT ||
>     ret == SSL_HANDSHAKE_WANT_CONNECT || ret == SSL_HANDSHAKE_WANT_WRITE) {
>   vc->read.triggered = 0;
>   nh->read_ready_list.remove(vc);
>   vc->write.triggered = 0;
>   nh->write_ready_list.remove(vc);
>   if (ret == SSL_HANDSHAKE_WANT_READ || ret == SSL_HANDSHAKE_WANT_ACCEPT)
>     read_reschedule(nh, vc);
>   else
>     write_reschedule(nh, vc);
> }
> {code}
> It seems odd to be clearing and removing read events if the real event was 
> only a write.  And vice versa, it seems odd to be clearing and removing write 
> events if the real event was a read.
> It seems to me that the sequence should be replaced with something like
> {code}
> if (ret == SSL_HANDSHAKE_WANT_READ || ret == SSL_HANDSHAKE_WANT_ACCEPT) {
>   vc->read.triggered = 0;
>   nh->read_ready_list.remove(vc);
>   read_reschedule(nh, vc);
> } else if (ret == SSL_HANDSHAKE_WANT_CONNECT || ret == SSL_HANDSHAKE_WANT_WRITE) {
>   vc->write.triggered = 0;
>   nh->write_ready_list.remove(vc);
>   write_reschedule(nh, vc);
> }
> {code}
> Looking back at the history shows adding and removing and re-adding of 
> reschedules.  
> * TS-3006 9/22/14 by me.  Adds in the read_reschedule case
> * TS-2815 5/16/14 by [~bcall] Removes the read_reschedule case
> * TS-2211 10/28/13 by postwait Adds read_reschedule and protects the 
> write_reschedule and read_reschedule with specific event checks.
> * TS-1921 5/17/13 by [~jpe...@apache.org] Adds in the write_reschedule
> This seems like an obvious tidy-up.  I'm not addressing a specific 
> issue here, but the current code seems wrong.  Given the history, I'm 
> hesitant to clean things up without review from those who came before.
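
The proposed split can be modeled on its own, outside the net handler.  In 
the sketch below, Want and Side are hypothetical stand-ins for the 
SSL_HANDSHAKE_WANT_* codes and for the read/write halves of the connection; 
the function only reports which half would be cleared and rescheduled.

```cpp
// Hypothetical stand-ins for the SSL_HANDSHAKE_WANT_* return codes.
enum class Want { Read, Accept, Connect, Write, None };

// Which half of the connection gets cleared and rescheduled.
enum class Side { None, Read, Write };

// Models the proposed if/else: each result touches only its own side.
inline Side side_to_reschedule(Want ret) {
  switch (ret) {
  case Want::Read:
  case Want::Accept:
    return Side::Read;
  case Want::Connect:
  case Want::Write:
    return Side::Write;
  case Want::None:
    return Side::None;
  }
  return Side::None; // unreachable for valid enum values
}
```

In the version currently on master, by contrast, all four results clear both 
sides before rescheduling one of them.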





[jira] [Resolved] (TS-3687) ATS Session Cache table never removes expired sessions

2015-06-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3687.

Resolution: Fixed

> ATS Session Cache table never removes expired sessions
> --
>
> Key: TS-3687
> URL: https://issues.apache.org/jira/browse/TS-3687
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> While this sounds bad, it is only a performance issue.  It is not a security 
> issue.  Openssl will not allow the expired sessions to be used.
> Here are the details.
> When you use the ATS version of the ssl session cache, ATS registers
> callbacks to handle creating new sessions, getting existing sessions,
> and removing old sessions.  While debugging the new session plugin API,
> I saw that the new session and get session callbacks were being
> triggered but the remove session callback was never being triggered.
> At first I was concerned that we were never removing sessions from the
> cache and reusing them forever.  I poked through the openssl 1.0.1 (and
> briefly the 1.0.2) code and set some break points, and verified that the
> stale sessions are being rejected, but the code only tries to remove them
> from the openssl internal cache implementation (which failed and so the
> remove callback was never triggered).
> So I think this is only a performance problem.  An old session is
> never removed from the ATS session cache until we run out of space and
> the old values are evicted. 
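
A toy model of the behavior described above, assuming a cache keyed by session id with a fixed timeout per entry. `ToySession` and `ToySessionCache` are hypothetical names and nothing here calls OpenSSL or ATS. It only shows why rejecting an expired session without erasing it is a space problem rather than a correctness problem.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical session record: creation time and lifetime, both in seconds.
struct ToySession {
  std::int64_t created;
  std::int64_t timeout;
};

class ToySessionCache
{
public:
  void
  insert(const std::string &id, std::int64_t now, std::int64_t timeout)
  {
    sessions_[id] = ToySession{now, timeout};
  }

  // Returns true only for a session that is present and not expired.
  // With evict_expired == false the stale entry is rejected but stays in
  // the table, which models the behavior reported in the ticket.
  bool
  lookup(const std::string &id, std::int64_t now, bool evict_expired)
  {
    auto it = sessions_.find(id);
    if (it == sessions_.end()) {
      return false;
    }
    if (now - it->second.created < it->second.timeout) {
      return true;
    }
    if (evict_expired) {
      sessions_.erase(it);
    }
    return false;
  }

  std::size_t
  size() const
  {
    return sessions_.size();
  }

private:
  std::unordered_map<std::string, ToySession> sessions_;
};
```

Either way the expired session is never handed back to the caller. The only difference is whether the stale entry keeps occupying a slot until something else evicts it.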





[jira] [Assigned] (TS-3641) Drupal Auth does not seem to work with HTTP/2

2015-06-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3641:
--

Assignee: Susan Hinrichs

> Drupal Auth does not seem to work with HTTP/2
> -
>
> Key: TS-3641
> URL: https://issues.apache.org/jira/browse/TS-3641
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> Using latest chrome, when authenticating to a Drupal site behind ATS, it 
> fails to authenticate. It silently seems to just ignore the auth, and moves 
> along unauthenticated. It's possible this is similar to TS-3640, but the 
> "fix" from that Jira does not resolve the HTTP/2 issues. 
> In fact, this problem exists all the way back to 5.3.0, so the fix here would 
> also be a back port for 5.3.1 (or 5.3.2).





[jira] [Updated] (TS-3136) Change default TLS cipher suites

2015-06-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3136:
---
Assignee: Syeda Persia Aziz  (was: Susan Hinrichs)

> Change default TLS cipher suites
> 
>
> Key: TS-3136
> URL: https://issues.apache.org/jira/browse/TS-3136
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Security, SSL
>Reporter: Leif Hedstrom
>Assignee: Syeda Persia Aziz
>  Labels: compatibility
> Fix For: 6.0.0
>
>
> In TS-3135 [~i.galic] suggested:
> {quote}
> also, recommendations for a safer ciphersuite:
> SSLCipherSuite 
> ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4
>  
> from https://cipherli.st/
> {quote}
> [~jacksontj] had responded with:
> {quote}
> [~i.galic] That cipher suite is geared towards security, but doesn't support 
> quite a few older clients. I'd recommend we use the suite from mozilla 
> (https://wiki.mozilla.org/Security/Server_Side_TLS#Recommended_Server_Configurations)
>  which is a good mix of security and compatibility:
> {code}
> ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
> {code}
> {quote}





[jira] [Commented] (TS-3136) Change default TLS cipher suites

2015-06-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583706#comment-14583706
 ] 

Susan Hinrichs commented on TS-3136:


@bcall, do you mean the yahoo security team? Or is there an apache team we 
should reach out to?

I did some initial comparisons to the old list and there isn't a huge 
difference.

The new list removes the RC4 ciphers.
It adds the ECDHE-ECDSA versions of the ECDHE-RSA ciphers that are in the list.
It adds a number of DHE-* ciphers, giving more options for PFS.
It adds the Camellia cipher.

Since we have honor server order on by default, we may want to move the 
ECDHE-ECDSA versions ahead of the ECDHE-RSA versions.
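
A comparison like the one above can be scripted. The following is only a sketch, assuming both lists are plain colon-separated OpenSSL cipher strings; `split_ciphers` and `only_in` are hypothetical helper names. It compares entries as literal text and does not expand keywords such as `HIGH` or `kEDH+AESGCM` the way OpenSSL would.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated cipher string into its entries, skipping empties.
std::vector<std::string>
split_ciphers(const std::string &list)
{
  std::vector<std::string> out;
  std::istringstream in(list);
  std::string item;
  while (std::getline(in, item, ':')) {
    if (!item.empty()) {
      out.push_back(item);
    }
  }
  return out;
}

// Entries that appear in `a` but not in `b`, in the order they appear in `a`.
std::vector<std::string>
only_in(const std::vector<std::string> &a, const std::vector<std::string> &b)
{
  std::vector<std::string> out;
  for (const std::string &name : a) {
    if (std::find(b.begin(), b.end(), name) == b.end()) {
      out.push_back(name);
    }
  }
  return out;
}
```

Calling `only_in(old_list, new_list)` gives the entries that were dropped, and swapping the arguments gives the entries that were added.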







[jira] [Comment Edited] (TS-3136) Change default TLS cipher suites

2015-06-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583706#comment-14583706
 ] 

Susan Hinrichs edited comment on TS-3136 at 6/12/15 5:16 PM:
-

[~bcall] do you mean the yahoo security team? Or is there an apache team we 
should reach out to?

I did some initial comparisons to the old list and there isn't a huge 
difference.

The new list removes the RC4 ciphers.
It adds the ECDHE-ECDSA versions of the ECDHE-RSA ciphers that are in the list.
It adds a number of DHE-* ciphers, giving more options for PFS.
It adds the Camellia cipher.

Since we have honor server order on by default, we may want to move the 
ECDHE-ECDSA versions ahead of the ECDHE-RSA versions.



was (Author: shinrich):
@bcall, do you mean the yahoo security team? Or is there an apache team we 
should reach out to?

I did some initial comparisons to the old list and there isn't a huge 
difference.

The new list removes the RC4 ciphers.
It adds the ECDHE-ECDSA versions of the ECDHE-RSA ciphers that are in the list.
It adds a number of DHE-* ciphers, giving more options for PFS.
It adds the Camellia cipher.

Since we have honor server order on by default, we may want to move the 
ECDHE-ECDSA versions ahead of the ECDHE-RSA versions.







[jira] [Commented] (TS-3136) Change default TLS cipher suites

2015-06-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583728#comment-14583728
 ] 

Susan Hinrichs commented on TS-3136:


I think we may want to consider the following string for the default, starting 
from the Mozilla string that Thomas suggested.

Removed Camellia.  Moved ECDHE-ECDSA in front of the ECDHE-RSA versions.  Moved 
AES256 in front of AES128 versions.  Still have 3DES for truly ancient clients.

ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
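
One way to sanity check the ordering claim is a small checker that walks a colon-separated string and confirms every ECDHE-RSA entry comes after its ECDHE-ECDSA twin, when that twin is present. This is only a sketch; `split_cipher_list` and `ecdsa_precedes_rsa` are hypothetical names, and the check is purely textual.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated cipher string into its entries, skipping empties.
std::vector<std::string>
split_cipher_list(const std::string &list)
{
  std::vector<std::string> out;
  std::istringstream in(list);
  std::string item;
  while (std::getline(in, item, ':')) {
    if (!item.empty()) {
      out.push_back(item);
    }
  }
  return out;
}

// True if no ECDHE-RSA-<x> entry appears before a matching ECDHE-ECDSA-<x>.
bool
ecdsa_precedes_rsa(const std::string &list)
{
  const std::string rsa   = "ECDHE-RSA-";
  const std::string ecdsa = "ECDHE-ECDSA-";
  const std::vector<std::string> names = split_cipher_list(list);

  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i].compare(0, rsa.size(), rsa) != 0) {
      continue; // not an ECDHE-RSA entry
    }
    const std::string twin = ecdsa + names[i].substr(rsa.size());
    auto it = std::find(names.begin(), names.end(), twin);
    if (it != names.end() && static_cast<std::size_t>(it - names.begin()) > i) {
      return false; // the ECDSA twin comes later than the RSA entry
    }
  }
  return true;
}
```

Since server order is honored, a check like this could run against whatever default string ends up in the config before it ships.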






[jira] [Commented] (TS-2758) Unable to get file descriptor from transaction close

2015-06-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583895#comment-14583895
 ] 

Susan Hinrichs commented on TS-2758:


Yes, I just exercised the tcpinfo plugin and verified that a straight HTTP/1.1 
request exercises the SSN_CLOSE hook and accesses the file descriptor.

> Unable to get file descriptor from transaction close
> 
>
> Key: TS-2758
> URL: https://issues.apache.org/jira/browse/TS-2758
> Project: Traffic Server
>  Issue Type: Bug
>  Components: TS API
>Reporter: James Peach
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
>
> While testing the {{tcpinfo}} plugin, I found that {{TSHttpSsnClientFdGet}} 
> fails from the {{TS_HTTP_TXN_CLOSE_HOOK}}. It would be really useful for the 
> {{tcpinfo}} plugin to be able to collect statistics at the end of 
> transactions and sessions.




