[jira] [Assigned] (TS-4991) jtest should handle Range request

2016-10-20 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming reassigned TS-4991:
-

Assignee: song

[~jasondmee] please take care of this request. thanks

> jtest should handle Range request
> -
>
> Key: TS-4991
> URL: https://issues.apache.org/jira/browse/TS-4991
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP, Tests, Tools
>Reporter: Zhao Yongming
>Assignee: song
>
> jtest is not able to generate or handle Range requests; we should make it do 
> so.
> I'd like to see the SIMPLE "Range: bytes=100-200/1000" case work first; other 
> Range syntax, such as multiple ranges, can be considered later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4991) jtest should handle Range request

2016-10-20 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-4991:
-

 Summary: jtest should handle Range request
 Key: TS-4991
 URL: https://issues.apache.org/jira/browse/TS-4991
 Project: Traffic Server
  Issue Type: Improvement
  Components: HTTP, Tests, Tools
Reporter: Zhao Yongming


jtest is not able to generate or handle Range requests; we should make it do so.

I'd like to see the SIMPLE "Range: bytes=100-200/1000" case work first; other 
Range syntax, such as multiple ranges, can be considered later.
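
For reference, a minimal sketch of the exchange jtest would need to generate 
and handle. Note that in standard HTTP the request header carries only the byte 
range ("Range: bytes=100-200"); the "100-200/1000" form with the total length 
is what the Content-Range response header carries:

{code}
GET /some/object HTTP/1.1
Host: example.com
Range: bytes=100-200

HTTP/1.1 206 Partial Content
Content-Range: bytes 100-200/1000
Content-Length: 101

<101 bytes of payload>
{code}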





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-2482) Problems with SOCKS

2016-08-26 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-2482:
--
Assignee: Oknet Xu  (was: weijin)

> Problems with SOCKS
> ---
>
> Key: TS-2482
> URL: https://issues.apache.org/jira/browse/TS-2482
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Radim Kolar
>Assignee: Oknet Xu
> Fix For: sometime
>
>
> There are several problems with using SOCKS. I am interested in the case 
> where TS is the SOCKS client: the client sends an HTTP request and TS uses a 
> SOCKS server to make the connection to the internet.
> a/ - not documented enough in the default configs
> From the default config comments it seems that for running 
> TS 4.1.2 as a SOCKS client, it is sufficient to add one line to socks.config:
> dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
> but the SOCKS proxy is not used. If I run tcpdump to sniff packets, TS never 
> tries to connect to that SOCKS server.
> From the source code - 
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc - it 
> looks like "proxy.config.socks.socks_needed" needs to be set to activate 
> SOCKS support. This should be documented in both sample files: socks.config 
> and records.config.
> b/
> After enabling SOCKS, I am hit by this assert:
> Assertion failed: (ats_is_ip4(&target_addr)), function init, file Socks.cc, 
> line 65.
> I run on a dual-stack system (IPv4, IPv6).
> Is this code setting the default destination for the SOCKS request? Could you 
> not just use 127.0.0.1 when the client is connected over IPv6?
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66
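
For anyone hitting the same issue, a minimal configuration sketch based on the 
report above (the parent address 10.0.0.7:9050 is the reporter's example; check 
the records.config documentation for the exact defaults):

{code}
# socks.config: route everything through the SOCKS parent
dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"

# records.config: SOCKS support stays disabled unless this is set
CONFIG proxy.config.socks.socks_needed INT 1
{code}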



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4396) Off-by-one error in max redirects with redirection enabled

2016-07-02 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360120#comment-15360120
 ] 

Zhao Yongming commented on TS-4396:
---

proxy.config.http.number_of_redirections = 1 does NOT work as expected; let us 
fix that first.
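
For context, the two records.config settings involved (a sketch of the intended 
setup; with the off-by-one behaviour described below, the count currently has 
to be raised to 2 to follow a single redirect):

{code}
CONFIG proxy.config.http.redirection_enabled INT 1
CONFIG proxy.config.http.number_of_redirections INT 1
{code}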

> Off-by-one error in max redirects with redirection enabled
> --
>
> Key: TS-4396
> URL: https://issues.apache.org/jira/browse/TS-4396
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, Network
>Reporter: Felix Buenemann
>Assignee: Zhao Yongming
> Fix For: 7.0.0
>
>
> There is a problem in the current stable version 6.1.1 where the setting 
> proxy.config.http.number_of_redirections = 1 is incorrectly checked when 
> following origin redirects by setting proxy.config.http.redirection_enabled = 
> 1.
> If the requested URL is not already cached, ATS returns the redirect response 
> to the client instead of storing the target into the cache and returning it 
> to the client.
> The problem can be fixed by using proxy.config.http.number_of_redirections = 
> 2, but we are only following one redirect, so this is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4396) Off-by-one error in max redirects with redirection enabled

2016-07-02 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming reassigned TS-4396:
-

Assignee: Zhao Yongming

> Off-by-one error in max redirects with redirection enabled
> --
>
> Key: TS-4396
> URL: https://issues.apache.org/jira/browse/TS-4396
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, Network
>Reporter: Felix Buenemann
>Assignee: Zhao Yongming
> Fix For: 7.0.0
>
>
> There is a problem in the current stable version 6.1.1 where the setting 
> proxy.config.http.number_of_redirections = 1 is incorrectly checked when 
> following origin redirects by setting proxy.config.http.redirection_enabled = 
> 1.
> If the requested URL is not already cached, ATS returns the redirect response 
> to the client instead of storing the target into the cache and returning it 
> to the client.
> The problem can be fixed by using proxy.config.http.number_of_redirections = 
> 2, but we are only following one redirect, so this is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4368) Segmentation fault

2016-04-21 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4368:
--
Component/s: (was: Logging)

> Segmentation fault
> --
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 6.1.2
>Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has 
> segfaults and the other doesn't.
> We are using this source 
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which produces these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to 
> /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x97)[0x2ac6b8d676d7]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
> /usr/bin/traffic_server(ink_aio_read(AIOCallback*, int)+0x36)[0x2ac6b8fe2e46]
> /usr/bin/traffic_server(CacheVC::handleRead(int, 
> Event*)+0x3a1)[0x2ac6b8f9d131]
> /usr/bin/traffic_server(Cache::open_read(Continuation*, ats::CryptoHash 
> const*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char const*, 
> int)+0x61f)[0x2ac6b8fc056f]
> /usr/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, 
> int)+0x94c)[0x2ac6b8f8fefc]
> /usr/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xf4)[0x2ac6b8f6dc84]
> /usr/bin/traffic_server(ClusterHandler::update_channels_read()+0x9b)[0x2ac6b8f7099b]
> /usr/bin/traffic_server(ClusterHandler::process_read(long)+0xae)[0x2ac6b8f7471e]
> /usr/bin/traffic_server(ClusterHandler::mainClusterEvent(int, 
> Event*)+0x158)[0x2ac6b8f75048]
> /usr/bin/traffic_server(ClusterState::doIO_read_event(int, 
> void*)+0x160)[0x2ac6b8f78d50]
> /usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
> /usr/bin/traffic_server(NetHandler::mainNetEvent(int, 
> Event*)+0x218)[0x2ac6b90005e8]
> /usr/bin/traffic_server(EThread::execute()+0xa82)[0x2ac6b9033b82]
> /usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4368) Segmentation fault

2016-04-21 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4368:
--
Description: 
We have a test trafficserver cluster of 2 nodes where the first node has 
segfaults and the other doesn't.

We are using this source 
https://github.com/researchgate/trafficserver/tree/6.1.x
which produces these packages (version 6.1.2)
https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver

{code}
[Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to 
/var/log/trafficserver/crash-2016-04-20-124752.log
traffic_server: Segmentation fault (Address not mapped to object [0x8050])
traffic_server - STACK TRACE:
/usr/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
void*)+0x97)[0x2ac6b8d676d7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
/usr/bin/traffic_server(ink_aio_read(AIOCallback*, int)+0x36)[0x2ac6b8fe2e46]
/usr/bin/traffic_server(CacheVC::handleRead(int, Event*)+0x3a1)[0x2ac6b8f9d131]
/usr/bin/traffic_server(Cache::open_read(Continuation*, ats::CryptoHash const*, 
HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char const*, 
int)+0x61f)[0x2ac6b8fc056f]
/usr/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, 
int)+0x94c)[0x2ac6b8f8fefc]
/usr/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xf4)[0x2ac6b8f6dc84]
/usr/bin/traffic_server(ClusterHandler::update_channels_read()+0x9b)[0x2ac6b8f7099b]
/usr/bin/traffic_server(ClusterHandler::process_read(long)+0xae)[0x2ac6b8f7471e]
/usr/bin/traffic_server(ClusterHandler::mainClusterEvent(int, 
Event*)+0x158)[0x2ac6b8f75048]
/usr/bin/traffic_server(ClusterState::doIO_read_event(int, 
void*)+0x160)[0x2ac6b8f78d50]
/usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
/usr/bin/traffic_server(NetHandler::mainNetEvent(int, 
Event*)+0x218)[0x2ac6b90005e8]
/usr/bin/traffic_server(EThread::execute()+0xa82)[0x2ac6b9033b82]
/usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
{code}


  was:
We have a test trafficserver cluster of 2 nodes where the first node has 
segfaults and the other doesn't.

We are using this source 
https://github.com/researchgate/trafficserver/tree/6.1.x
which produces these packages (version 6.1.2)
https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver

{code}
[Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to 
/var/log/trafficserver/crash-2016-04-20-124752.log
traffic_server: Segmentation fault (Address not mapped to object [0x8050])
traffic_server - STACK TRACE:
/usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x97)[0x2ac6b8d676d7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
/usr/bin/traffic_server(_Z12ink_aio_readP11AIOCallbacki+0x36)[0x2ac6b8fe2e46]
/usr/bin/traffic_server(_ZN7CacheVC10handleReadEiP5Event+0x3a1)[0x2ac6b8f9d131]
/usr/bin/traffic_server(_ZN5Cache9open_readEP12ContinuationPKN3ats10CryptoHashEP7HTTPHdrP21CacheLookupHttpConfig13CacheFragTypePKci+0x61f)[0x2ac6b8fc056f]
/usr/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0x94c)[0x2ac6b8f8fefc]
/usr/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xf4)[0x2ac6b8f6dc84]
/usr/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x9b)[0x2ac6b8f7099b]
/usr/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0xae)[0x2ac6b8f7471e]
/usr/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x158)[0x2ac6b8f75048]
/usr/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0x160)[0x2ac6b8f78d50]
/usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
/usr/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x218)[0x2ac6b90005e8]
/usr/bin/traffic_server(_ZN7EThread7executeEv+0xa82)[0x2ac6b9033b82]
/usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
{code}



> Segmentation fault
> --
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 6.1.2
>Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has 
> segfaults and the other doesn't.
> We are using this source 
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which produces these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to 
> /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(crash_logger_invoke(int, siginfo_

[jira] [Updated] (TS-4368) Segmentation fault

2016-04-21 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4368:
--
Affects Version/s: 6.1.2
  Component/s: Logging
   Clustering

> Segmentation fault
> --
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Clustering, Logging
>Affects Versions: 6.1.2
>Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has 
> segfaults and the other doesn't.
> We are using this source 
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which produces these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to 
> /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x97)[0x2ac6b8d676d7]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
> /usr/bin/traffic_server(_Z12ink_aio_readP11AIOCallbacki+0x36)[0x2ac6b8fe2e46]
> /usr/bin/traffic_server(_ZN7CacheVC10handleReadEiP5Event+0x3a1)[0x2ac6b8f9d131]
> /usr/bin/traffic_server(_ZN5Cache9open_readEP12ContinuationPKN3ats10CryptoHashEP7HTTPHdrP21CacheLookupHttpConfig13CacheFragTypePKci+0x61f)[0x2ac6b8fc056f]
> /usr/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0x94c)[0x2ac6b8f8fefc]
> /usr/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xf4)[0x2ac6b8f6dc84]
> /usr/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x9b)[0x2ac6b8f7099b]
> /usr/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0xae)[0x2ac6b8f7471e]
> /usr/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x158)[0x2ac6b8f75048]
> /usr/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0x160)[0x2ac6b8f78d50]
> /usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
> /usr/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x218)[0x2ac6b90005e8]
> /usr/bin/traffic_server(_ZN7EThread7executeEv+0xa82)[0x2ac6b9033b82]
> /usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-04-21 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4156:
--
Fix Version/s: (was: sometime)
   7.0.0

> remove the traffic_sac, stand alone log collation server
> 
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: 7.0.0
>
>
> the stand-alone collation server acts as a dedicated log server for ATS. It 
> is a dedicated log product dating back to the Inktomi age, and we don't need 
> it, as these functions are built into the freely distributed traffic_server 
> binary.
> It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-01-31 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125409#comment-15125409
 ] 

Zhao Yongming commented on TS-4156:
---

Orphaned logs are one point where we can improve; I think we can build some 
tools for that. Because an orphaned log falls outside the mainline log file, it 
is hard to archive a single log file for that period, even if we collect all 
the orphaned logs onto one single box.

Orphaned log files happen when the log server is down or under traffic 
pressure; we have seen very few orphaned logs since we improved the log 
collation server performance.

> remove the traffic_sac, stand alone log collation server
> 
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: sometime
>
>
> the stand-alone collation server acts as a dedicated log server for ATS. It 
> is a dedicated log product dating back to the Inktomi age, and we don't need 
> it, as these functions are built into the freely distributed traffic_server 
> binary.
> It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-01-31 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125402#comment-15125402
 ] 

Zhao Yongming edited comment on TS-4156 at 1/31/16 4:30 PM:


the log collation works in one of these modes:
1, dedicated log collation server: no http cache (or other) functions active, 
but still a traffic_server with all functions installed; we simply don't send 
any requests to this traffic_server.
   someone with very high traffic may use this mode, keeping the collation 
server out of the http cache service.

2, mixed with a cache server: both the cache and the log collation server are 
active, and we log for the other hosts (collation clients).
  most users may choose this mode; it helps you collect all the logs into one 
single place, which is easy to check or back up.

3, traffic_sac stand-alone log server: no cache/proxy function, just the log 
collation server.
  this is the duplicated binary.

  by design, log collation is meant to simplify logging by storing all the 
logs in one single place, one single file for the whole site, with just one 
timeline. And the log collation mode 'proxy.local.log.collation_mode' is a 
LOCAL directive in records.config, which makes it possible to activate a 
single host as the collation server while the others act as collation clients, 
while still keeping cluster management of the config files.

so, I think that traffic_server with log collation in client or server mode is 
a must-have builtin function if we want to keep the log collation feature, and 
keeping a completely separate dedicated log collation server binary may bring 
more code complexity.
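
For illustration, a rough records.config sketch of modes 1 and 2 above (record 
names as in the 6.x logging docs; the port, secret, and hostname values are 
placeholders and should be checked against the documentation):

{code}
# on the host acting as the log collation server
LOCAL proxy.local.log.collation_mode INT 1
CONFIG proxy.config.log.collation_port INT 8085
CONFIG proxy.config.log.collation_secret STRING foobar

# on each cache host acting as a collation client (standard formats)
LOCAL proxy.local.log.collation_mode INT 2
CONFIG proxy.config.log.collation_host STRING logger.example.com
CONFIG proxy.config.log.collation_port INT 8085
CONFIG proxy.config.log.collation_secret STRING foobar
{code}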
 


was (Author: zym):
the log collation works as:
1, dedicated log collation server: no http cache (or others) function active, 
bug still a traffic_server, with full functions installed, that way we just 
don't put any request on this traffic_server.
   someone with very high traffic may use this mode, just don't keep the 
collation server out of the http cache service.

2, mixed with cache server: both cache and logging collation server in active, 
we log for other hosts(collation clients).
  most of the users may choose this mode, it will help you collect all the logs 
into one single place, and easy for check or backup.

3, traffic_sac stand alone log server: no server function, just log collation 
server.
  this is the duplicated binary.

  by design, the log collation is going to help you simple the logging by store 
all the logs into one single place, one single file the whole site, with just 
one timeline. and the log collation mode 'poxy.local.log.collation_mode' is a 
LOCAL directive in records.config, that make it possible to active a single 
host as collation server while others as collation server, while still got the 
cluster management of config files.

so, I think that traffic_server with log collation mode in client or server is 
just a must builtin function if we want to keep the log collation feature, and 
keep a completely dedicated log collation server may bring more code complex.
 

> remove the traffic_sac, stand alone log collation server
> 
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: sometime
>
>
> the stand-alone collation server acts as a dedicated log server for ATS. It 
> is a dedicated log product dating back to the Inktomi age, and we don't need 
> it, as these functions are built into the freely distributed traffic_server 
> binary.
> It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-01-31 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125402#comment-15125402
 ] 

Zhao Yongming commented on TS-4156:
---

the log collation works in one of these modes:
1, dedicated log collation server: no http cache (or other) functions active, 
but still a traffic_server with all functions installed; we simply don't send 
any requests to this traffic_server.
   someone with very high traffic may use this mode, keeping the collation 
server out of the http cache service.

2, mixed with a cache server: both the cache and the log collation server are 
active, and we log for the other hosts (collation clients).
  most users may choose this mode; it helps you collect all the logs into one 
single place, which is easy to check or back up.

3, traffic_sac stand-alone log server: no cache/proxy function, just the log 
collation server.
  this is the duplicated binary.

  by design, log collation is meant to simplify logging by storing all the 
logs in one single place, one single file for the whole site, with just one 
timeline. And the log collation mode 'proxy.local.log.collation_mode' is a 
LOCAL directive in records.config, which makes it possible to activate a 
single host as the collation server while the others act as collation clients, 
while still keeping cluster management of the config files.

so, I think that traffic_server with log collation in client or server mode is 
a must-have builtin function if we want to keep the log collation feature, and 
keeping a completely separate dedicated log collation server binary may bring 
more code complexity.
 

> remove the traffic_sac, stand alone log collation server
> 
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: sometime
>
>
> the stand-alone collation server acts as a dedicated log server for ATS. It 
> is a dedicated log product dating back to the Inktomi age, and we don't need 
> it, as these functions are built into the freely distributed traffic_server 
> binary.
> It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-01-27 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4156:
--
 Assignee: Zhao Yongming
Fix Version/s: sometime

> remove the traffic_sac, stand alone log collation server
> 
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: sometime
>
>
> the stand-alone collation server acts as a dedicated log server for ATS. It 
> is a dedicated log product dating back to the Inktomi age, and we don't need 
> it, as these functions are built into the freely distributed traffic_server 
> binary.
> It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4156) remove the traffic_sac, stand alone log collation server

2016-01-27 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-4156:
-

 Summary: remove the traffic_sac, stand alone log collation server
 Key: TS-4156
 URL: https://issues.apache.org/jira/browse/TS-4156
 Project: Traffic Server
  Issue Type: Improvement
  Components: Logging
Reporter: Zhao Yongming


the stand-alone collation server acts as a dedicated log server for ATS. It is 
a dedicated log product dating back to the Inktomi age, and we don't need it, 
as these functions are built into the freely distributed traffic_server binary.

It is time to nuke it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4058) Logging doesn't work when TS is compiled and run w/ --with-user

2015-12-07 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046156#comment-15046156
 ] 

Zhao Yongming commented on TS-4058:
---

Good catch. traffic_cop is designed to be run as root, and --with-user=danielxu 
specifies that traffic_server runs as danielxu; that is the current setup. 
Currently an unprivileged user should not run traffic_cop; in the past it would 
even fail if you wanted to make install as non-root, haha. In most cases we 
would advise running traffic_server directly for small tests.

It would be nice if you could make traffic_cop run as an unprivileged user.
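
A rough sketch of the small-test setup suggested above (the user name and 
install prefix are only examples; a production-style setup would start 
traffic_cop/traffic_manager as root instead):

{code}
$ ./configure --with-user=danielxu --prefix=/opt/ats
$ make && sudo make install
# quick unprivileged test: run the server binary directly as danielxu,
# skipping traffic_cop
$ /opt/ats/bin/traffic_server
{code}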

> Logging doesn't work when TS is compiled and run w/ --with-user
> ---
>
> Key: TS-4058
> URL: https://issues.apache.org/jira/browse/TS-4058
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Reporter: Daniel Xu
>Assignee: Daniel Xu
>
> ie. we run this _without_ sudo. 
> traffic_cop output seems to point to permission errors that occur within 
> traffic_manager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4059) Default value for proxy.config.bin_path does not use value from config.layout

2015-12-07 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046143#comment-15046143
 ] 

Zhao Yongming commented on TS-4059:
---

I think Craig Forbes wants the build & installation to honor 
'--bindir=DIR  user executables [EPREFIX/bin]', which defaults to 
'EPREFIX/bin' if not specified. You may submit a patch if it does not work as 
you wish.

IMO, all the *_path config options should be removed, as those are 
binary-release options; now that we are open source with the whole layout 
configurable, we should remove them from records.config (or hardcode them to 
the configure-specified directories).

Patches welcome.

FYI
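
For reference, the configure-time knob being discussed (directories are 
illustrative; --bindir defaults to EPREFIX/bin when not given):

{code}
$ ./configure --prefix=/opt/ats --bindir=/opt/ats/bin
{code}

The request in this issue is that the default of proxy.config.bin_path in 
RecordsConfig.cc pick up that configure-time value (TS_BUILD_BINDIR) rather 
than the hard-coded "bin".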

> Default value for proxy.config.bin_path does not use value from config.layout
> -
>
> Key: TS-4059
> URL: https://issues.apache.org/jira/browse/TS-4059
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Craig Forbes
>
> The default value for proxy.config.bin_path defined in RecordsConfig.cc is 
> hard coded to "bin".
> The value should be TS_BUILD_BINDIR so the value specified at configure time 
> is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4056) MemLeak: ~NetAccept() do not free alloc_cache(vc)

2015-12-07 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-4056:
--
Affects Version/s: 6.1.0

> MemLeak: ~NetAccept() do not free alloc_cache(vc)
> -
>
> Key: TS-4056
> URL: https://issues.apache.org/jira/browse/TS-4056
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 6.1.0
>Reporter: Oknet Xu
>
> NetAccept::alloc_cache is a void pointer used in net_accept().
> The alloc_cache is not released after the NetAccept is cancelled.
> I have looked through all the code and believe the "alloc_cache" is a bad 
> idea here.
> I created a pull request on github: 
> https://github.com/apache/trafficserver/pull/366
> It also adds a condition check for vc==NULL after allocate_vc().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3510) header_rewrite is blocking building on raspberry pi

2015-04-08 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-3510:
-

 Summary: header_rewrite is blocking building on raspberry pi 
 Key: TS-3510
 URL: https://issues.apache.org/jira/browse/TS-3510
 Project: Traffic Server
  Issue Type: Bug
  Components: Build, Plugins
Reporter: Zhao Yongming


ARM support is mostly good; we just have the Raspberry Pi failing to build 
header_rewrite.

{code}
pi@raspberrypi ~/trafficserver/plugins/header_rewrite $ make -j 2
  CXXconditions.lo
  CXXheader_rewrite.lo
{standard input}: Assembler messages:
{standard input}:1221: Error: selected processor does not support ARM mode `dmb'
  CXXlulu.lo
  CXXmatcher.lo
  CXXoperator.lo
Makefile:689: recipe for target 'conditions.lo' failed
make: *** [conditions.lo] Error 1
make: *** Waiting for unfinished jobs
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387868#comment-14387868
 ] 

Zhao Yongming edited comment on TS-3472 at 3/31/15 2:52 AM:


A forwarding proxy has nothing to control; it tries to proxy whatever is not 
cache-able. A reverse proxy does caching on the site side, while most 
forwarding proxies work on the user side.


was (Author: zym):
the forwarding proxy have nothing to control, that means they try to proxy if 
not cache-able. while the reverse proxy do caching on the site side, most of 
the forwarding proxy works on the user site.

> SNI proxy alike feature for TS
> --
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: SSL
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> when doing a forward-proxy-only setup, sniproxy 
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort 
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> in ATS, there is already very good support in all the basic components; 
> adding an SNI blind proxy should be a very good feature, with only tiny 
> changes needed, maybe.
> SNI in TLS will extend the proxy (no caching) to all TLS-based services, 
> such as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387868#comment-14387868
 ] 

Zhao Yongming commented on TS-3472:
---

A forwarding proxy has nothing to control; it tries to proxy whatever is not 
cache-able. A reverse proxy does caching on the site side, while most 
forwarding proxies work on the user side.

> SNI proxy alike feature for TS
> --
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: SSL
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> when doing a forward-proxy-only setup, sniproxy 
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort 
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> in ATS, there is already very good support in all the basic components; 
> adding an SNI blind proxy should be a very good feature, with only tiny 
> changes needed, maybe.
> SNI in TLS will extend the proxy (no caching) to all TLS-based services, 
> such as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386944#comment-14386944
 ] 

Zhao Yongming commented on TS-3472:
---

Yes, sniproxy makes it possible to proxy (without caching) TLS-based services 
with remap-like origin routing control, rather like a layer-7 routing/proxy 
service.

Sometimes in forward proxying, proxying is more important than caching.

> SNI proxy alike feature for TS
> --
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: SSL
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> when doing a forward-proxy-only setup, sniproxy 
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort 
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> in ATS, there is already very good support in all the basic components; 
> adding an SNI blind proxy should be a very good feature, with only tiny 
> changes needed, maybe.
> SNI in TLS will extend the proxy (no caching) to all TLS-based services, 
> such as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386893#comment-14386893
 ] 

Zhao Yongming commented on TS-3472:
---

sniproxy does not need to intercept the SSL server or client; it only takes 
the SNI name and routes to the backend. It does not even need to link against 
an SSL library.

In ssl_multicert.config:
dest_ip=* action=tunnel
does not work, as we would need an SSL cert/key file to act as an SSL intercept?

> SNI proxy alike feature for TS
> --
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: SSL
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> when doing a forward-proxy-only setup, sniproxy 
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort 
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> in ATS, there is already very good support in all the basic components; 
> adding an SNI blind proxy should be a very good feature, with only tiny 
> changes needed, maybe.
> SNI in TLS will extend the proxy (no caching) to all TLS-based services, 
> such as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-2482) Problems with SOCKS

2015-03-30 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386809#comment-14386809
 ] 

Zhao Yongming commented on TS-2482:
---

no time to test, here is the rough patch:
{code}
diff --git a/proxy/http/HttpTransact.cc b/proxy/http/HttpTransact.cc
index c6f55ed..cc4ffdc 100644
--- a/proxy/http/HttpTransact.cc
+++ b/proxy/http/HttpTransact.cc
@@ -865,7 +865,7 @@ HttpTransact::EndRemapRequest(State* s)
   /
   if (s->http_config_param->reverse_proxy_enabled
       && !s->client_info.is_transparent
-      && !incoming_request->is_target_in_url()) {
+      && !(incoming_request->is_target_in_url() || incoming_request->m_host_length > 0)) {
     /
     // the url mapping failed, reverse proxy was enabled,
     // and the request contains no host:
{code}

and:

{code}
diff --git a/iocore/net/Socks.cc b/iocore/net/Socks.cc
index cfdd214..c04c0f4 100644
--- a/iocore/net/Socks.cc
+++ b/iocore/net/Socks.cc
@@ -62,7 +62,7 @@ SocksEntry::init(ProxyMutex * m, SocksNetVC * vc, unsigned char socks_support, u
   req_data.api_info = 0;
   req_data.xact_start = time(0);

-  assert(ats_is_ip4(&target_addr));
+  //assert(ats_is_ip4(&target_addr));
   ats_ip_copy(&req_data.dest_ip, &target_addr);

   //we dont have information about the source. set to destination's
{code}


The assert part of the patch may need more work, and the SOCKS server only does 
HTTP checking with no other SOCKS support; that is not a very good SOCKS server, 
indeed. I'd like to see someone take it and continue to improve the SOCKS 
server feature.

So I am pasting the patch here before it is lost to time.

> Problems with SOCKS
> ---
>
> Key: TS-2482
> URL: https://issues.apache.org/jira/browse/TS-2482
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Radim Kolar
>Assignee: weijin
> Fix For: sometime
>
>
> There are several problems with using SOCKS. I am interested in the case 
> where TS is the SOCKS client: the client sends an HTTP request and TS uses a 
> SOCKS server to make the connection to the internet.
> a/ - not documented enough in the default configs
> From the default config comments it seems that for running 
> TS 4.1.2 as a SOCKS client, it is sufficient to add one line to socks.config:
> dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
> but the SOCKS proxy is not used. If I run tcpdump to sniff packets, TS never 
> tries to connect to that SOCKS server.
> From the source code - 
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc - it 
> looks like "proxy.config.socks.socks_needed" needs to be set to activate 
> SOCKS support. This should be documented in both sample files: socks.config 
> and records.config.
> b/
> After enabling SOCKS, I am hit by this assert:
> Assertion failed: (ats_is_ip4(&target_addr)), function init, file Socks.cc, 
> line 65.
> I run on a dual-stack system (IPv4, IPv6).
> Is this code setting the default destination for the SOCKS request? Could you 
> not just use 127.0.0.1 when the client is connected over IPv6?
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3472:
--
Fix Version/s: sometime

> SNI proxy alike feature for TS
> --
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: SSL
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> when doing a forward-proxy-only setup, sniproxy 
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort 
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> in ATS, there is already very good support in all the basic components; 
> adding an SNI blind proxy should be a very good feature, with only tiny 
> changes needed, maybe.
> SNI in TLS will extend the proxy (no caching) to all TLS-based services, 
> such as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3472) SNI proxy alike feature for TS

2015-03-30 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-3472:
-

 Summary: SNI proxy alike feature for TS
 Key: TS-3472
 URL: https://issues.apache.org/jira/browse/TS-3472
 Project: Traffic Server
  Issue Type: New Feature
  Components: SSL
Reporter: Zhao Yongming


when doing a forward-proxy-only setup, sniproxy 
(https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort to 
set up a TLS-layer proxy with SNI, very good for some dirty tasks.

in ATS, there is already very good support in all the basic components; adding 
an SNI blind proxy should be a very good feature, with only tiny changes 
needed, maybe.

SNI in TLS will extend the proxy (no caching) to all TLS-based services, such 
as mail etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-2205) AIO caused system hang

2015-03-29 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386140#comment-14386140
 ] 

Zhao Yongming commented on TS-2205:
---

TS-3458 is reported as an index syncing issue; it needs more information, which 
I will try to get.

Basically we have not found anything else that needs looking into. I will leave 
this issue open for a while and close it if no further information arrives.

> AIO caused system hang
> --
>
> Key: TS-2205
> URL: https://issues.apache.org/jira/browse/TS-2205
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.1
>Reporter: Zhao Yongming
>Assignee: weijin
>Priority: Critical
> Fix For: 6.0.0
>
>
> the system may hang with AIO thread CPU usage rising:
> {code}
> top - 17:10:46 up 38 days, 22:43,  2 users,  load average: 11.34, 2.97, 2.75
> Tasks: 512 total,  55 running, 457 sleeping,   0 stopped,   0 zombie
> Cpu(s):  6.9%us, 54.8%sy,  0.0%ni, 37.3%id,  0.0%wa,  0.0%hi,  0.9%si,  0.0%st
> Mem:  65963696k total, 64318444k used,  1645252k free,   241496k buffers
> Swap: 33554424k total,20416k used, 33534008k free, 14864188k cached
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 32498 ats   20   0 59.3g  45g  25m R 65.8 72.1  24:44.15 [ET_AIO 5]
>  3213 root  20   0 000 S 15.4  0.0  13:38.32 kondemand/7
>  3219 root  20   0 000 S 15.1  0.0  16:32.78 kondemand/13
> 4 root  20   0 000 S 13.8  0.0  33:18.13 ksoftirqd/0
>13 root  20   0 000 S 13.4  0.0  21:45.18 ksoftirqd/2
>37 root  20   0 000 S 13.4  0.0  19:42.34 ksoftirqd/8
>45 root  20   0 000 S 13.4  0.0  18:31.17 ksoftirqd/10
> 32483 ats   20   0 59.3g  45g  25m R 13.4 72.1  16:47.14 [ET_AIO 6]
> 32487 ats   20   0 59.3g  45g  25m R 13.4 72.1  16:46.93 [ET_AIO 2]
>25 root  20   0 000 S 13.1  0.0  19:02.18 ksoftirqd/5
>65 root  20   0 000 S 13.1  0.0  19:24.04 ksoftirqd/15
> 32477 ats   20   0 59.3g  45g  25m R 13.1 72.1  16:32.90 [ET_AIO 0]
> 32478 ats   20   0 59.3g  45g  25m R 13.1 72.1  16:49.77 [ET_AIO 1]
> 32479 ats   20   0 59.3g  45g  25m S 13.1 72.1  16:41.77 [ET_AIO 2]
> 32481 ats   20   0 59.3g  45g  25m R 13.1 72.1  16:50.40 [ET_AIO 4]
> 32482 ats   20   0 59.3g  45g  25m R 13.1 72.1  16:47.42 [ET_AIO 5]
> 32484 ats   20   0 59.3g  45g  25m R 13.1 72.1  16:25.81 [ET_AIO 7]
> 32485 ats   20   0 59.3g  45g  25m S 13.1 72.1  16:52.71 [ET_AIO 0]
> 32486 ats   20   0 59.3g  45g  25m S 13.1 72.1  16:51.69 [ET_AIO 1]
> 32491 ats   20   0 59.3g  45g  25m S 13.1 72.1  16:50.58 [ET_AIO 6]
> 32492 ats   20   0 59.3g  45g  25m S 13.1 72.1  16:49.12 [ET_AIO 7]
> 32480 ats   20   0 59.3g  45g  25m S 12.8 72.1  16:47.39 [ET_AIO 3]
> 32488 ats   20   0 59.3g  45g  25m R 12.8 72.1  16:52.16 [ET_AIO 3]
> 32489 ats   20   0 59.3g  45g  25m S 12.8 72.1  16:50.79 [ET_AIO 4]
> 32490 ats   20   0 59.3g  45g  25m R 12.8 72.1  16:52.61 [ET_AIO 5]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3374) Issues with cache.config implementation

2015-03-08 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352039#comment-14352039
 ] 

Zhao Yongming commented on TS-3374:
---

The current matching in cache.config is such that when a URL matches rules with 
many different actions, things get very complex. For example, in your case, 
'never-cache' is a killer action: when any URL matches it, that URL will not be 
cached, regardless of whatever other rules it matches.

That is why the example in the cache.config docs applies the rules to the same 
action, which is 'revalidate='.

> Issues with cache.config implementation
> ---
>
> Key: TS-3374
> URL: https://issues.apache.org/jira/browse/TS-3374
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Dan Morgan
>  Labels: cache-control
> Fix For: sometime
>
>
> The documentation implies that entries in the cache.config file are processed 
> in 'order'.
> For example, this example in the docs:
> ---
> The following example configures Traffic Server to revalidate gif and jpeg 
> objects in the domain mydomain.com every 6 hours, and all other objects in 
> mydomain.com every hour. The rules are applied in the order listed.
> dest_domain=mydomain.com suffix=gif revalidate=6h
> dest_domain=mydomain.com suffix=jpeg revalidate=6h
> dest_domain=mydomain.com revalidate=1h
> ---
> However, running with version 5.1.2 and having the following lines:
> dest_domain=mydomain.com prefix=somepath suffix=js revalidate=7d
> dest_domain=mydomain.com suffix=js action=never-cache
> I would expect it to not cache any .js URL's from mydomain.com, except those 
> that have a prefix of 'somepath'.  However what happens is that the 
> action=never-cache is applied to all URL's having mydomain.com (even the ones 
> that have a prefix of 'somepath').



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-965) cache.config can't deal with both revalidate= and ttl-in-cache= specified

2015-03-08 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352036#comment-14352036
 ] 

Zhao Yongming commented on TS-965:
--

I have no idea of the details, as cache.config is a multi-matching rule system, 
and there are some hard-coded rules which are not explained anywhere; for 
example, if a request matches a 'no-cache' rule, it will not be cached.

I don't like the cache-control matching design, which is hard to extend and 
hard to use in the real world; maybe we should avoid using it in favor of Lua 
remapping and Lua plugins.

> cache.config can't deal with both revalidate= and ttl-in-cache= specified
> -
>
> Key: TS-965
> URL: https://issues.apache.org/jira/browse/TS-965
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Igor Galić
>Assignee: Alan M. Carroll
>  Labels: A, cache-control
> Fix For: 5.3.0
>
>
> If both of these options are specified (with the same time?), nothing is 
> cached at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style

2015-03-03 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3197:
--
Issue Type: Improvement  (was: Bug)

> dest_ip in cache.config should be expand to network style
> -
>
> Key: TS-3197
> URL: https://issues.apache.org/jira/browse/TS-3197
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Configuration, Performance
>Reporter: Luca Rea
> Fix For: sometime
>
>
> Hi,
> I'm trying to exclude a /22 netblock from the cache system but the syntax 
> "dest_ip" doesn't work, details below:
> dest_ip="x.y.84.0-x.y.87.255" action=never-cache
> I've tried to "stop,clear-cache,start" several times but every time images 
> have been put into the cache and log shows "NONE FIN FIN TCP_MEM_HIT" or 
> "NONE FIN FIN TCP_IMS_HIT".
> Other Info:
> proxy.node.version.manager.long=Apache Traffic Server - traffic_manager - 
> 5.1.0 - (build # 81013 on Sep 10 2014 at 13:13:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style

2015-03-03 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3197:
--
Priority: Minor  (was: Major)

> dest_ip in cache.config should be expand to network style
> -
>
> Key: TS-3197
> URL: https://issues.apache.org/jira/browse/TS-3197
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Configuration, Performance
>Reporter: Luca Rea
>Priority: Minor
> Fix For: sometime
>
>
> Hi,
> I'm trying to exclude a /22 netblock from the cache system but the syntax 
> "dest_ip" doesn't work, details below:
> dest_ip="x.y.84.0-x.y.87.255" action=never-cache
> I've tried to "stop,clear-cache,start" several times but every time images 
> have been put into the cache and log shows "NONE FIN FIN TCP_MEM_HIT" or 
> "NONE FIN FIN TCP_IMS_HIT".
> Other Info:
> proxy.node.version.manager.long=Apache Traffic Server - traffic_manager - 
> 5.1.0 - (build # 81013 on Sep 10 2014 at 13:13:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style

2015-03-03 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3197:
--
Summary: dest_ip in cache.config should be expand to network style  (was: 
dest_ip in cache.config doesn't work)

> dest_ip in cache.config should be expand to network style
> -
>
> Key: TS-3197
> URL: https://issues.apache.org/jira/browse/TS-3197
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, Configuration, Performance
>Reporter: Luca Rea
> Fix For: sometime
>
>
> Hi,
> I'm trying to exclude a /22 netblock from the cache system but the syntax 
> "dest_ip" doesn't work, details below:
> dest_ip="x.y.84.0-x.y.87.255" action=never-cache
> I've tried to "stop,clear-cache,start" several times but every time images 
> have been put into the cache and log shows "NONE FIN FIN TCP_MEM_HIT" or 
> "NONE FIN FIN TCP_IMS_HIT".
> Other Info:
> proxy.node.version.manager.long=Apache Traffic Server - traffic_manager - 
> 5.1.0 - (build # 81013 on Sep 10 2014 at 13:13:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3212) 200 code is returned as 304

2015-03-03 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344883#comment-14344883
 ] 

Zhao Yongming commented on TS-3212:
---

Yeah, let us start tracking the cache-control issue then.

What confuses me is why the IMS is there at all if your response includes all 
the 'no-cache' directives to inform the client that the content is not 
cache-able. That is weird.

Anyway, let us keep this issue open until we fix the cache-control handling and 
recheck it.

> 200 code is returned as 304
> ---
>
> Key: TS-3212
> URL: https://issues.apache.org/jira/browse/TS-3212
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Rea
> Fix For: sometime
>
>
> The live streaming videos from the akamaihd.net CDN cannot be watched because 
> ATS rewrites 200 responses into 304s and the videos continuously re-enter 
> buffering:
> {code}
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> HTTP/1.1 200 OK
> Server: ContactLab
> Mime-Version: 1.0
> Content-Type: video/abst
> Content-Length: 122
> Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT
> Expires: Tue, 25 Nov 2014 15:31:53 GMT
> Cache-Control: max-age=0, no-cache
> Pragma: no-cache
> Date: Tue, 25 Nov 2014 15:31:53 GMT
> access-control-allow-origin: *
> Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; 
> domain=abclive.abcnews.com
> Age: 0
> Connection: keep-alive
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT
> HTTP/1.1 304 Not Modified
> Date: Tue, 25 Nov 2014 15:31:58 GMT
> Expires: Tue, 25 Nov 2014 15:31:58 GMT
> Cache-Control: max-age=0, no-cache
> Connection: keep-alive
> Server: ContactLab
> {code}
> using the url_regex to skip cache/IMS doesn't work, the workaround is the 
> following line in records.config:
> CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3212) 200 code is returned as 304

2015-03-02 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344608#comment-14344608
 ] 

Zhao Yongming commented on TS-3212:
---

[~luca.rea] are you still following this issue? I think we have found some of 
the dark side:
1. Your client sends an IMS but does not want the proxy/cache to return a 304. 
That is really hard to do unless you make the IMS comparison fail.

2. cache.config's never-cache does not behave as the expected "no touch and 
pass-through". That is a dark side of ATS cache-control, IMO.

I am going to sort out as many of the cache-control issues as possible; I'd 
like to hear from you.

> 200 code is returned as 304
> ---
>
> Key: TS-3212
> URL: https://issues.apache.org/jira/browse/TS-3212
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Rea
> Fix For: sometime
>
>
> The live streaming videos from the akamaihd.net CDN cannot be watched because 
> ATS rewrites 200 responses into 304s and the videos continuously re-enter 
> buffering:
> {code}
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> HTTP/1.1 200 OK
> Server: ContactLab
> Mime-Version: 1.0
> Content-Type: video/abst
> Content-Length: 122
> Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT
> Expires: Tue, 25 Nov 2014 15:31:53 GMT
> Cache-Control: max-age=0, no-cache
> Pragma: no-cache
> Date: Tue, 25 Nov 2014 15:31:53 GMT
> access-control-allow-origin: *
> Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; 
> domain=abclive.abcnews.com
> Age: 0
> Connection: keep-alive
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT
> HTTP/1.1 304 Not Modified
> Date: Tue, 25 Nov 2014 15:31:58 GMT
> Expires: Tue, 25 Nov 2014 15:31:58 GMT
> Cache-Control: max-age=0, no-cache
> Connection: keep-alive
> Server: ContactLab
> {code}
> using the url_regex to skip cache/IMS doesn't work, the workaround is the 
> following line in records.config:
> CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3360) TS don't use peer IP address from icp.config

2015-03-02 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344597#comment-14344597
 ] 

Zhao Yongming commented on TS-3360:
---

I think this is just the default config file being misleading. According to the 
official doc:
https://docs.trafficserver.apache.org/en/latest/reference/configuration/icp.config.en.html#std:configfile-icp.config

only one of Hostname and HostIP needs to be specified, not both :D

can you provide an update to the default config file to make that clear?
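
For reference, the two icp.config lines from the report (the first is the form 
whose IP field fails to parse, the second is the workaround with a resolvable 
hostname); per the doc, filling in just one of the two fields should be enough:
{noformat}
peer1|192.168.0.2|2|80|3130|0|0.0.0.0|1|
google.com|192.168.0.2|2|80|3130|0|0.0.0.0|1|
{noformat}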

> TS don't use peer IP address from icp.config
> 
>
> Key: TS-3360
> URL: https://issues.apache.org/jira/browse/TS-3360
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, ICP
>Reporter: Anton Ageev
> Fix For: 5.3.0
>
>
> I use TS 5.0.1.
> I try to add peer in icp.config:
> {code}
> peer1|192.168.0.2|2|80|3130|0|0.0.0.0|1|
> {code}
> But I got in the log:
> {code}
> DEBUG: (icp_warn) ICP query send, res=90, ip=*Not IP address [0]*
> {code}
> The only way to specify peer IP is to specify *real* hostname:
> {code}
> google.com|192.168.0.2|2|80|3130|0|0.0.0.0|1|
> {code}
> ICP request to google.com in the log:
> {code}
> DEBUG: (icp) [ICP_QUEUE_REQUEST] Id=617 send query to [173.194.112.96:3130]
> {code}
> Host IP (second field) is parsed to {{\*Not IP address \[0\]\*}} always.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3412) Segmentation fault ET_CLUSTER

2015-02-25 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3412:
--
Description: 
Can anyone help me please?

2.6.32-431.el6.x86_64
mem : 16GB
cpu   : 6core * 2   HS = 24core

{noformat}
kernel: [ET_CLUSTER 1][4508]: segfault at a8 ip 006c1571 sp 
2b5b58738890 error 4 in traffic_server[40+421000]
{noformat}

traffic.out
{noformat}
traffic_server: using root directory '/opt/ats'
traffic_server: Segmentation fault (Address not mapped to object 
[0xa8])traffic_server - STACK TRACE:
/opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo*, 
void*)+0x99)[0x4aaf19]
/lib64/libpthread.so.0(+0xf710)[0x2b5a25658710]
/opt/ats/bin/traffic_server(ClusterProcessor::connect_local(Continuation*, 
ClusterVCToken*, int, int)+0xa
/opt/ats/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, 
int)+0xabd)[0x6a71cd]
/opt/ats/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xe9)[0x6ab5e9]
/opt/ats/bin/traffic_server(ClusterHandler::update_channels_read()+0x8b)[0x6b0d7b]
/opt/ats/bin/traffic_server(ClusterHandler::process_read(long)+0x138)[0x6b1528]
/opt/ats/bin/traffic_server(ClusterHandler::mainClusterEvent(int, 
Event*)+0x176)[0x6b3f56]
/opt/ats/bin/traffic_server(ClusterState::IOComplete()+0x8a)[0x6b701a]
/opt/ats/bin/traffic_server(ClusterState::doIO_read_event(int, 
void*)+0xa7)[0x6b7307]
/opt/ats/bin/traffic_server[0x72b2e7]
/opt/ats/bin/traffic_server[0x72c53d]
/opt/ats/bin/traffic_server(NetHandler::mainNetEvent(int, 
Event*)+0x1f2)[0x7213c2]
/opt/ats/bin/traffic_server(EThread::process_event(Event*, int)+0x125)[0x74d4e5]
/opt/ats/bin/traffic_server(EThread::execute()+0x4c9)[0x74de29]
/opt/ats/bin/traffic_server[0x74c92a]
/lib64/libpthread.so.0(+0x79d1)[0x2b5a256509d1]
/lib64/libc.so.6(clone+0x6d)[0x2b5a26fa38fd]
traffic_server: using root directory '/opt/ats'
traffic_server: Terminated (Signal sent by kill() 28739 0)[E. Mgmt] log ==> 
[TrafficManager] using rootats'
{noformat}

records.config
{noformat}
CONFIG proxy.config.proxy_name STRING cluster-v530
LOCAL proxy.local.cluster.type INT 1
CONFIG proxy.config.cluster.ethernet_interface STRING bond0
CONFIG proxy.config.cluster.cluster_port INT 8086
CONFIG proxy.config.cluster.rsport INT 8088
CONFIG proxy.config.cluster.mcport INT 8089
CONFIG proxy.config.cluster.mc_group_addr STRING 224.0.1.40
CONFIG proxy.config.cluster.cluster_configuration STRING cluster.config
CONFIG proxy.config.cluster.threads INT 4
{noformat}

  was:
Can anyone help me please?

2.6.32-431.el6.x86_64
mem : 16GB
cpu   : 6core * 2   HS = 24core

{noformat}
kernel: [ET_CLUSTER 1][4508]: segfault at a8 ip 006c1571 sp 
2b5b58738890 error 4 in traffic_server[40+421000]
{noformat}

traffic.out
{noformat}
traffic_server: using root directory '/opt/ats'
traffic_server: Segmentation fault (Address not mapped to object 
[0xa8])traffic_server - STACK TRACE:
/opt/ats/bin/traffic_server(_Z19crash_logger_invokeiP7siginfoPv+0x99)[0x4aaf19]
/lib64/libpthread.so.0(+0xf710)[0x2b5a25658710]
/opt/ats/bin/traffic_server(_ZN16ClusterProcessor13connect_localEP12ContinuationP14ClusterVCTokenii+0xa
/opt/ats/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0xabd)[0x6a71cd]
/opt/ats/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xe9)[0x6ab5e9]
/opt/ats/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x8b)[0x6b0d7b]
/opt/ats/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0x138)[0x6b1528]
/opt/ats/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x176)[0x6b3f56]
/opt/ats/bin/traffic_server(_ZN12ClusterState10IOCompleteEv+0x8a)[0x6b701a]
/opt/ats/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0xa7)[0x6b7307]
/opt/ats/bin/traffic_server[0x72b2e7]
/opt/ats/bin/traffic_server[0x72c53d]
/opt/ats/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x7213c2]
/opt/ats/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x125)[0x74d4e5]
/opt/ats/bin/traffic_server(_ZN7EThread7executeEv+0x4c9)[0x74de29]
/opt/ats/bin/traffic_server[0x74c92a]
/lib64/libpthread.so.0(+0x79d1)[0x2b5a256509d1]
/lib64/libc.so.6(clone+0x6d)[0x2b5a26fa38fd]
traffic_server: using root directory '/opt/ats'
traffic_server: Terminated (Signal sent by kill() 28739 0)[E. Mgmt] log ==> 
[TrafficManager] using rootats'
{noformat}

records.config
{noformat}
CONFIG proxy.config.proxy_name STRING cluster-v530
LOCAL proxy.local.cluster.type INT 1
CONFIG proxy.config.cluster.ethernet_interface STRING bond0
CONFIG proxy.config.cluster.cluster_port INT 8086
CONFIG proxy.config.cluster.rsport INT 8088
CONFIG proxy.config.cluster.mcport INT 8089
CONFIG proxy.config.cluster.mc_group_addr STRING 224.0.1.40
CONFIG proxy.config.cluster.cluster_configuration STRING cluster.config
CONFIG proxy.config.cluster.threads INT 4
{noformat}
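
For reference, the demangled frames in the updated description can be obtained 
from the original mangled trace with c++filt, e.g.:
{noformat}
$ echo '_ZN16ClusterProcessor13connect_localEP12ContinuationP14ClusterVCTokenii' | c++filt
ClusterProcessor::connect_local(Continuation*, ClusterVCToken*, int, int)
{noformat}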


> Segmentation fault ET_CLUSTER
> ---

[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331983#comment-14331983
 ] 

Zhao Yongming commented on TS-3395:
---

I am not sure what you are after: do you want to stress out what ATS can do, or 
do you want ATS to behave the way Nginx/Squid would? In both issues I have 
pointed out the ATS way of dealing with your problems, and even walked you step 
by step towards the root cause and how we handle it with ATS. You are now using 
ATS, which is very different from Squid etc.; it is powerful and designed in 
somewhat unusual ways. If you are a fresh user, learning the ATS way is a good 
start, and it turns out ATS performs well in most real-world cases.

On the testing side, please have a look at jtest (tools/jtest/) if you don't 
know it yet; it is another good stress tool, and it is suitable for stressing a 
performance monster like ATS.

anyway, welcome to the ATS Colosseum.
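
As a rough sketch of how such a run is usually driven (the option letters below 
are assumptions from memory and should be checked against the usage output of 
jtest -h before relying on them; host and port are placeholders):
{noformat}
# drive jtest's built-in clients through the proxy with a target hit rate
./jtest -P proxy.example.com -p 8080 -c 1000 -z 0.6
{noformat}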



> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330352#comment-14330352
 ] 

Zhao Yongming commented on TS-3395:
---

and in practice, the number of cache write connections will always be less than 
origin_max_connections, sounds perfect?
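
For reference, the knob in question is set in records.config; a minimal sketch 
(the value 500 is only an illustration, pick it to match what the origin can 
take):
{noformat}
CONFIG proxy.config.http.origin_max_connections INT 500
{noformat}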

> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330349#comment-14330349
 ] 

Zhao Yongming edited comment on TS-3395 at 2/21/15 5:08 PM:


good, that is the case I'd like to avoid. haha.

I am talking about the limit on the origin side, while TS-3386 is about the 
UA-side connection limit, and in your case the UA and OS limits got mixed up 
when you were dealing with 127.0.0.1 at the beginning.

in our practice, we rely heavily on the limit on the OS side, which is a very 
good solution for both the cache and the origin.
please refer to 'proxy.config.http.origin_max_connections' for the limit on the 
OS side

when cache connections are held up by waiting or other issues, the httpSM stays 
alive, which can cost you a huge amount of memory; in our production at 20k qps 
we like to keep the cache connections at about 1k-2k. that is critical for a 
busy system when you want it to stay stable in service.

and cache writes can cause a bigger performance hit than reads, so pay extra 
attention to cache writes


was (Author: zym):
good, that is the case what I'd like to avoid. haha.

I am talking about the limit on the origin side, while TS-3386 is facing on the 
UA side connection limit, and in your case, the limit on the UA and OS messed 
up when you deal with the 127.0.0.1 at the beginning.

in your practice, we strongly used the limit on the UA side, which is a very 
good solution for both cache & origin nice.
please refer to 'proxy.config.http.origin_max_connections' for the limit on the 
OS side

when cache connections holding by waiting or other issue, the httpSM will keep 
alive, which may cause you very huge memories, when in our productions with 
20kqps, we would like to keep the cache connections in about 1k-2k. that is 
critical to a busy system, when you want it stable in service.

and cache write may cause more performace decrease than read, pay more 
attention to cache writes


> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, i

[jira] [Comment Edited] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330349#comment-14330349
 ] 

Zhao Yongming edited comment on TS-3395 at 2/21/15 5:08 PM:


good, that is the case I'd like to avoid. haha.

I am talking about the limit on the origin side, while TS-3386 is about the 
UA-side connection limit, and in your case the UA and OS limits got mixed up 
when you were dealing with 127.0.0.1 at the beginning.

in our practice, we rely heavily on the limit on the UA side, which is a very 
good solution for both the cache and the origin.
please refer to 'proxy.config.http.origin_max_connections' for the limit on the 
OS side

when cache connections are held up by waiting or other issues, the httpSM stays 
alive, which can cost you a huge amount of memory; in our production at 20k qps 
we like to keep the cache connections at about 1k-2k. that is critical for a 
busy system when you want it to stay stable in service.

and cache writes can cause a bigger performance hit than reads, so pay extra 
attention to cache writes



was (Author: zym):
good, that is the case what I'd like to avoid. haha.

I am talking about the limit on the origin side, while TS-3386 is facing on the 
UA side connection limit, and in your case, the limit on the UA and OS messed 
up when you deal with the 127.0.0.1 at the beginning.

in your practice, we strongly used the limit on the UA side, which is a very 
good solution for both cache & origin nice.
please refer to 'proxy.config.http.origin_max_connections' for the limit on the 
UA side

when cache connections holding by waiting or other issue, the httpSM will keep 
alive, which may cause you very huge memories, when in our productions with 
20kqps, we would like to keep the cache connections in about 1k-2k. that is 
critical to a busy system, when you want it stable in service.

and cache write may cause more performace decrease than read, pay more 
attention to cache writes


> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, i

[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330349#comment-14330349
 ] 

Zhao Yongming commented on TS-3395:
---

good, that is the case I'd like to avoid. haha.

I am talking about the limit on the origin side, while TS-3386 is about the 
UA-side connection limit, and in your case the UA and OS limits got mixed up 
when you were dealing with 127.0.0.1 at the beginning.

in our practice, we rely heavily on the limit on the UA side, which is a very 
good solution for both the cache and the origin.
please refer to 'proxy.config.http.origin_max_connections' for the limit on the 
UA side

when cache connections are held up by waiting or other issues, the httpSM stays 
alive, which can cost you a huge amount of memory; in our production at 20k qps 
we like to keep the cache connections at about 1k-2k. that is critical for a 
busy system when you want it to stay stable in service.

and cache writes can cause a bigger performance hit than reads, so pay extra 
attention to cache writes


> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-r

[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330336#comment-14330336
 ] 

Zhao Yongming commented on TS-3395:
---

if you use ATS as a proxy, why not limit the connections on the origin side? we 
have more options for protecting the origin server than limiting the disk io.
if you use ATS as a cache, disk io and disk space are the key resources of the 
cache system, so why not add more disks if you can? a disk write bottleneck is 
a really rare case when we are talking about a cache system, right?
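
For reference, extra spindles are added as additional lines in storage.config; 
a minimal sketch (the device names are only illustrations):
{noformat}
# storage.config: one device or directory per line, each becomes cache storage
/dev/sdb
/dev/sdc
{noformat}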

> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330323#comment-14330323
 ] 

Zhao Yongming commented on TS-3395:
---

my suggestion on performance testing is always to avoid the disk IO (iops), as 
that is a hard limit on ATS performance, or on any other proxy/cache system; you 
can even calculate the real production performance if that is the bottleneck of 
the whole system.
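
A rough back-of-the-envelope version of that calculation, using the numbers 
from the test report quoted below (the spindle figures are generic assumptions, 
not measurements from this setup):
{noformat}
request rate        : ~4,300 req/s   (from the polygraph run)
offered miss ratio  : ~40%           -> ~1,700 cacheable misses/s to write
measured miss ratio : ~63%           -> ~2,700 write attempts/s in practice
mean object size    : ~12.5 KB       -> roughly 21-34 MB/s of cache writes
single SATA spindle : ~100-200 random IOPS, ~100 MB/s sequential
{noformat}
In pure bandwidth terms one disk can absorb that only while the writes stay 
aggregated and sequential; the cache reads for hits compete for the same 
spindle, and once the write side backs up the object is simply proxied without 
being stored, which matches the hit-ratio drop reported in this ticket.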

> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330321#comment-14330321
 ] 

Zhao Yongming commented on TS-3395:
---

when you hit the disk write IO bottleneck, writes reach their high-water mark 
and the remaining responses cannot be written; ATS then simply forwards those 
requests to the origin. From the user's point of view that shows up as a lower 
cache hit ratio, but you get a higher requests-per-second number than other 
systems.

this is a feature by design I think :D
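
One way to watch for this while the test runs; a sketch (the metric names are 
the standard cache write counters, assumed to be present in this 5.x build):
{noformat}
# sample the active / failed cache writes once a second
while true; do
  traffic_line -r proxy.process.cache.write.active
  traffic_line -r proxy.process.cache.write.failure
  sleep 1
done
{noformat}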

> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

2015-02-21 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330257#comment-14330257
 ] 

Zhao Yongming commented on TS-3395:
---

well, if that is the disk IO bottleneck, I think that is reasonable; can you 
please attach an iops version of the disk I/O graph?
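
Something like the following is enough to capture it; a sketch (iostat comes 
from the sysstat package, and /dev/sdb stands in for whatever device backs the 
raw cache volume):
{noformat}
# extended per-device statistics, 1-second samples: r/s and w/s are the IOPS
iostat -dxk 1 /dev/sdb
{noformat}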

> Hit ratio drops with high concurrency
> -
>
> Key: TS-3395
> URL: https://issues.apache.org/jira/browse/TS-3395
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Bruno
> Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content size vary from 5kb to 20kb 
> uniformly, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem of polygraph. I wrote my own 
> client/server test code, it works fine also with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 100 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check if with 200 connections but with longer test time hit 
> ratio also dropped, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was asked 
> twice.
> Out of 100 requests, 78600 urls were asked at least twice. An url was 
> even requested 9 times. These same url are not requested close to each other: 
> even more than 30sec can pass from one request to the other for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> Tweaked the read-while-writer option, and yet having similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url to the origin server, could it be a problem of concurrent write access to 
> the cache? So that some pages are not cached at all? The traffoc_top fresh 
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly, ask me further information in 
> case. But in summary: hit ratio drops with a high number of connections, and 
> the problem seems related to pages that are not written to the cache.
> This is some related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316482#comment-14316482
 ] 

Zhao Yongming commented on TS-3386:
---

well, things get more interesting.
q1: why do you lose the cached content when traffic_server restarts??
 q1.1: is that a cache issue?

q2: you are trying to protect the origin server, so why do you think a limit on 
the UA-side connections is a better solution than a limit on the origin side?
 q2.1 have you seen any occurrence of a connection (httpSM) hanging up?
 q2.2 what is a better way to handle the connection issue, for example a 
timeout?

when you handle tons of cache and tons of traffic, keeping it simple and robust 
always beats anything intelligent.

yes, we have fixed many cache issues we met, httpSM issues, connection timeout 
issues, connection leaking ... I think most of the important changes are 
already in the official tree. and this is how we track down the root issues in 
ATS, which may end up as some very tiny fix that only affects a very high 
traffic site with a very strict SLA requirement.
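
A minimal sketch of the kind of origin-side limit and timeout meant in q2/q2.2 
(the choice of variables and the values are illustrations only, both are 
standard records.config settings):
{noformat}
CONFIG proxy.config.http.origin_max_connections INT 500
CONFIG proxy.config.http.transaction_no_activity_timeout_out INT 30
{noformat}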

> Heartbeat failed with high load, trafficserver restarted
> 
>
> Key: TS-3386
> URL: https://issues.apache.org/jira/browse/TS-3386
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Performance
>Reporter: Luca Bruno
>
> I've been evaluating ATS for some days. I'm using it with mostly default 
> settings, except I've lowered the number of connections to the backend, I 
> have a raw storage of 500gb, and disabled ram cache.
> Working fine, then I wanted to stress it more. I've increased the test to 
> 1000 concurrent requests, then the ATS worker has been restarted and thus 
> lost the whole cache.
> /var/log/syslog:
> {noformat}
> Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR:  
> (last system error 32: Broken pipe)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status 
> signal [32985 256]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
> making sure traffic_server is dead
> Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager
> Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting 
> ---
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: 
> Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 
> 2015 at 13:05:19)
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test)

[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316376#comment-14316376
 ] 

Zhao Yongming commented on TS-3386:
---

if you want to talk about the kill, I'd say there should be more checking before 
taking the server down, but then how would you know the connection table is full 
while everything else still works well?

we have tried to put the heartbeat on a connection that is not affected by the 
connection limit, but that does not sound so good either

the heartbeat is a fake L7 service health check, which is designed to catch 
something abnormal :D
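
The check that fails in the syslog quoted below can be reproduced by hand; a 
sketch (assuming the default synthetic heartbeat port 8083 on localhost, which I 
believe is governed by proxy.config.admin.synthetic_port - verify against your 
records.config):
{noformat}
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8083/synthetic.txt
{noformat}
Anything other than 200, twice in a row, is what makes traffic_cop decide to 
kill the server, as seen in the syslog excerpt.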

> Heartbeat failed with high load, trafficserver restarted
> 
>
> Key: TS-3386
> URL: https://issues.apache.org/jira/browse/TS-3386
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Performance
>Reporter: Luca Bruno
>
> I've been evaluating ATS for some days. I'm using it with mostly default 
> settings, except I've lowered the number of connections to the backend, I 
> have a raw storage of 500gb, and disabled ram cache.
> Working fine, then I wanted to stress it more. I've increased the test to 
> 1000 concurrent requests, then the ATS worker has been restarted and thus 
> lost the whole cache.
> /var/log/syslog:
> {noformat}
> Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR:  
> (last system error 32: Broken pipe)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status 
> signal [32985 256]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
> making sure traffic_server is dead
> Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager
> Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting 
> ---
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: 
> Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 
> 2015 at 13:05:19)
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:42 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:42 tes

[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316343#comment-14316343
 ] 

Zhao Yongming commented on TS-3386:
---

well, proxy.config.net.connections_throttle = 1000? are you kidding? ATS is 
neither squid nor httpd-1.x
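
For reference, that setting lives in records.config; a minimal sketch (30000 is only an illustrative value in the range of the shipped default, not a recommendation from this thread):
{code}
CONFIG proxy.config.net.connections_throttle INT 30000
{code}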

> Heartbeat failed with high load, trafficserver restarted
> 
>
> Key: TS-3386
> URL: https://issues.apache.org/jira/browse/TS-3386
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Performance
>Reporter: Luca Bruno
>
> I've been evaluating ATS for some days. I'm using it with mostly default 
> settings, except I've lowered the number of connections to the backend, I 
> have a raw storage of 500gb, and disabled ram cache.
> Working fine, then I wanted to stress it more. I've increased the test to 
> 1000 concurrent requests, then the ATS worker has been restarted and thus 
> lost the whole cache.
> /var/log/syslog:
> {noformat}
> Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR:  
> (last system error 32: Broken pipe)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status 
> signal [32985 256]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
> making sure traffic_server is dead
> Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager
> Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting 
> ---
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: 
> Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 
> 2015 at 13:05:19)
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:42 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:44 test-cache traffic_server[59077]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:44 test-cache traffic_server[59077]: NOTE: traffic_server 
> Version: Apache Traffic Server - traff

[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316279#comment-14316279
 ] 

Zhao Yongming commented on TS-3386:
---

well, the remap matters; please don't mix up 127.0.0.1:8080 with most of the 
services, that is not how ATS is meant to work as a proxy.

use something like map http://mydomain.com:8080/ . and do your testing 
using a modified /etc/hosts or -x 127.0.0.1:8080 in curl.
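
A minimal sketch of that setup, with placeholder hostnames (mydomain.com and the origin target are hypothetical):
{code}
# remap.config — publish mydomain.com:8080 and forward it to the real origin
map http://mydomain.com:8080/ http://origin.example.com/

# test through the proxy without editing DNS or /etc/hosts
curl -x 127.0.0.1:8080 http://mydomain.com:8080/some/object
{code}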

> Heartbeat failed with high load, trafficserver restarted
> 
>
> Key: TS-3386
> URL: https://issues.apache.org/jira/browse/TS-3386
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Performance
>Reporter: Luca Bruno
>
> I've been evaluating ATS for some days. I'm using it with mostly default 
> settings, except I've lowered the number of connections to the backend, I 
> have a raw storage of 500gb, and disabled ram cache.
> Working fine, then I wanted to stress it more. I've increased the test to 
> 1000 concurrent requests, then the ATS worker has been restarted and thus 
> lost the whole cache.
> /var/log/syslog:
> {noformat}
> Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR:  
> (last system error 32: Broken pipe)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status 
> signal [32985 256]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
> making sure traffic_server is dead
> Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager
> Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting 
> ---
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: 
> Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 
> 2015 at 13:05:19)
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:42 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:44 test-cache traffic_serve

[jira] [Commented] (TS-3164) why the load of trafficserver occurrs a abrupt rise on a occasion ?

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316244#comment-14316244
 ] 

Zhao Yongming commented on TS-3164:
---

I have seen that too, but what I got was a very short lockup-like situation, 
mostly less than 15s. still don't know why

> why the load of trafficserver occurrs a abrupt rise on a occasion ?
> ---
>
> Key: TS-3164
> URL: https://issues.apache.org/jira/browse/TS-3164
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS 6.3 64bit, 8 cores, 128G mem 
>Reporter: taoyunxing
> Fix For: sometime
>
>
> I use Tsar to monitor the traffic status of the ATS 4.2.0, and come across 
> the following problem:
> {code}
> Time   ---cpu-- ---mem-- ---tcp-- -traffic --sda--- --sdb--- 
> --sdc---  ---load- 
> Time util util   retranbytin  bytout util util
>  util load1   
> 03/11/14-18:20  40.6787.19 3.3624.5M   43.9M13.0294.68
>  0.00  5.34   
> 03/11/14-18:25  40.3087.20 3.2722.5M   42.6M12.3894.87
>  0.00  5.79   
> 03/11/14-18:30  40.8484.67 3.4421.4M   42.0M13.2995.37
>  0.00  6.28   
> 03/11/14-18:35  43.6387.36 3.2123.8M   45.0M13.2393.99
>  0.00  7.37   
> 03/11/14-18:40  42.2587.37 3.0924.2M   44.8M12.8495.77
>  0.00  7.25   
> 03/11/14-18:45  42.9687.44 3.4623.3M   46.0M12.9695.84
>  0.00  7.10   
> 03/11/14-18:50  44.0087.42 3.4922.3M   43.0M14.1794.99
>  0.00  6.57   
> 03/11/14-18:55  42.2087.44 3.4622.3M   43.6M13.1996.05
>  0.00  6.09   
> 03/11/14-19:00  44.9087.53 3.6023.6M   46.5M13.6196.67
>  0.00  8.06   
> 03/11/14-19:05  46.2687.73 3.2425.8M   49.1M15.3994.05
>  0.00  9.98   
> 03/11/14-19:10  43.8587.69 3.1925.4M   50.9M12.8897.80
>  0.00  7.99   
> 03/11/14-19:15  45.2887.69 3.3625.6M   49.6M13.1096.86
>  0.00  7.47   
> 03/11/14-19:20  44.1185.20 3.2924.1M   47.8M14.2496.75
>  0.00  5.82   
> 03/11/14-19:25  45.2687.78 3.5224.4M   47.7M13.2195.44
>  0.00  7.61   
> 03/11/14-19:30  44.8387.80 3.6425.7M   50.8M13.2798.02
>  0.00  6.85   
> 03/11/14-19:35  44.8987.78 3.6123.9M   49.0M13.3497.42
>  0.00  7.04   
> 03/11/14-19:40  69.2188.88 0.5518.3M   33.7M11.3971.23
>  0.00 65.80   
> 03/11/14-19:45  72.4788.66 0.2715.4M   31.6M11.5172.31
>  0.00 11.56   
> 03/11/14-19:50  44.8788.72 4.1122.7M   46.3M12.9997.33
>  0.00  8.29
> {code}
>
> in addition, top command show
> {code}
> hi:0
> ni:0
> si:45.56
> st:0
> sy:13.92
> us:12.58
> wa:14.3
> id:15.96
> {code}
> who help me ? thanks in advance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted

2015-02-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316238#comment-14316238
 ] 

Zhao Yongming commented on TS-3386:
---

oh, please just run it without any traffic and enable debug on http.*|dns.*; I'd 
suspect this is a HostDB reverse lookup on 127.0.0.1 or a lookup-on-localhost 
issue. let us dig it out.
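
For reference, enabling those debug tags in records.config is roughly:
{code}
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING http.*|dns.*
{code}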

> Heartbeat failed with high load, trafficserver restarted
> 
>
> Key: TS-3386
> URL: https://issues.apache.org/jira/browse/TS-3386
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Performance
>Reporter: Luca Bruno
>
> I've been evaluating ATS for some days. I'm using it with mostly default 
> settings, except I've lowered the number of connections to the backend, I 
> have a raw storage of 500gb, and disabled ram cache.
> Working fine, then I wanted to stress it more. I've increased the test to 
> 1000 concurrent requests, then the ATS worker has been restarted and thus 
> lost the whole cache.
> /var/log/syslog:
> {noformat}
> Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR:  
> (last system error 32: Broken pipe)
> Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status 
> signal [32985 256]
> Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
> making sure traffic_server is dead
> Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager
> Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting 
> ---
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: 
> Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 
> 2015 at 13:05:19)
> Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server 
> Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on 
> Feb 10 2015 at 13:04:42)
> Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: 
> RLIMIT_NOFILE(7):cur(736236),max(736236)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:32 test-cache traffic_cop[32984]: server heartbeat failed [1]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: (http test) received non-200 
> status(502)
> Feb 11 10:06:42 test-cache traffic_cop[32984]: server heartbeat failed [2]
> Feb 11 10:06:42 test-cache traffic_cop[32984]: killing server
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: 
> Killed
> Feb 11 10:06:42 test-cache traffic_manager[59057]: {0x7f2c94ded720} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Feb 11 10:06:44 test-cache traffic_server[59077]: NOTE: --- traffic_server 
> Starting ---
> Feb 11 10:06:44 test-cache tr

[jira] [Updated] (TS-2482) Problems with SOCKS

2015-01-14 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-2482:
--
Assignee: weijin

> Problems with SOCKS
> ---
>
> Key: TS-2482
> URL: https://issues.apache.org/jira/browse/TS-2482
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Radim Kolar
>Assignee: weijin
> Fix For: sometime
>
>
> There are several problems with using SOCKS. I am interested in case when TF 
> is sock client. Client sends HTTP request and TF uses SOCKS server to make 
> connection to internet.
> a/ - not documented enough in default configs
> From default configs comments it seems that for running 
> TF 4.1.2 as socks client, it is sufficient to add one line to socks.config:
> dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
> but socks proxy is not used. If i run tcpdump sniffing packets  TF never 
> tries to connect to that SOCKS.
> From source code - 
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc it 
> looks that is needed to set "proxy.config.socks.socks_needed" to activate 
> socks support. This should be documented in both sample files: socks.config 
> and record.config
> b/
> after enabling socks, i am hit by this assert:
> Assertion failed: (ats_is_ip4(&target_addr)), function init, file Socks.cc, 
> line 65.
> i run on dual stack system (ip4,ip6). 
> This code is setting default destination for SOCKS request? Can not you use 
> just 127.0.0.1 for case if client gets connected over IP6?
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-2482) Problems with SOCKS

2015-01-14 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278227#comment-14278227
 ] 

Zhao Yongming commented on TS-2482:
---

we have a patch that should fix the problem, I think. it works by turning ATS 
into a SOCKS5 server, but it is still pending full testing with the parent socks 
feature. the problem here is not only the assert, but also the HTTP 
transactions.
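
For anyone reproducing the client-side setup from the description, a minimal sketch using the reporter's example parent (exact option coverage should be verified against your version):
{code}
# records.config — SOCKS support must be switched on, otherwise socks.config is ignored
CONFIG proxy.config.socks.socks_needed INT 1

# socks.config — route all destinations through the parent SOCKS server
dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
{code}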

> Problems with SOCKS
> ---
>
> Key: TS-2482
> URL: https://issues.apache.org/jira/browse/TS-2482
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Radim Kolar
> Fix For: sometime
>
>
> There are several problems with using SOCKS. I am interested in case when TF 
> is sock client. Client sends HTTP request and TF uses SOCKS server to make 
> connection to internet.
> a/ - not documented enough in default configs
> From default configs comments it seems that for running 
> TF 4.1.2 as socks client, it is sufficient to add one line to socks.config:
> dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
> but socks proxy is not used. If i run tcpdump sniffing packets  TF never 
> tries to connect to that SOCKS.
> From source code - 
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc it 
> looks that is needed to set "proxy.config.socks.socks_needed" to activate 
> socks support. This should be documented in both sample files: socks.config 
> and record.config
> b/
> after enabling socks, i am hit by this assert:
> Assertion failed: (ats_is_ip4(&target_addr)), function init, file Socks.cc, 
> line 65.
> i run on dual stack system (ip4,ip6). 
> This code is setting default destination for SOCKS request? Can not you use 
> just 127.0.0.1 for case if client gets connected over IP6?
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3088) Have ATS look at /etc/hosts

2014-12-15 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247700#comment-14247700
 ] 

Zhao Yongming commented on TS-3088:
---

it looks like some of the SplitDNS code was removed; is that feature still working 
after this commit?

> Have ATS look at /etc/hosts
> ---
>
> Key: TS-3088
> URL: https://issues.apache.org/jira/browse/TS-3088
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: DNS
>Reporter: David Carlin
>Assignee: Alan M. Carroll
>Priority: Minor
> Fix For: 5.3.0
>
> Attachments: ts-3088-3-2-x-patch.diff
>
>
> It would be nice if /etc/hosts was read when resolving hostnames - useful for 
> testing/troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3220) Update http cache stats so we can determine if a response was served from ram cache

2014-12-05 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235637#comment-14235637
 ] 

Zhao Yongming commented on TS-3220:
---

yeah, nice catch, we have seen ram cache hit rates higher than expected too.

> Update http cache stats so we can determine if a response was served from ram 
> cache
> ---
>
> Key: TS-3220
> URL: https://issues.apache.org/jira/browse/TS-3220
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Metrics
>Reporter: Bryan Call
>  Labels: A, Yahoo
> Fix For: 5.3.0
>
>
> Currently we use a combination of ram cache stats and some http ram cache 
> information to try to determine if the response was served from ram cache.  
> The ram cache stats don't know about http and the entry in ram cache might 
> not be valid.  It is possible to have a ram cache hit from the cache's point 
> of view, but not serve the response from cache at all.
> The http cache stats are missing a few stats to determine if the response was 
> served from ram.  We would need to add stat for ims responses served from ram 
> {{proxy.process.http.cache_hit_mem_ims}} and a stat if the stale response was 
> served from ram {{proxy.process.http.cache_hit_mem_stale_served}}.
> Ram cache stats for reference
> {code}
> proxy.process.cache.ram_cache.hits
> proxy.process.cache.ram_cache.misses
> {code}
> Current http cache stats for reference
> {code}
> proxy.process.http.cache_hit_fresh
> proxy.process.http.cache_hit_mem_fresh
> proxy.process.http.cache_hit_revalidated
> proxy.process.http.cache_hit_ims
> proxy.process.http.cache_hit_stale_served
> proxy.process.http.cache_miss_cold
> proxy.process.http.cache_miss_changed
> proxy.process.http.cache_miss_client_no_cache
> proxy.process.http.cache_miss_client_not_cacheable
> proxy.process.http.cache_miss_ims
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3212) 200 code is returned as 304

2014-11-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225607#comment-14225607
 ] 

Zhao Yongming commented on TS-3212:
---

well, if ATS returns you a 304, there are two cases:
1, the UA-side IMS is passed to the origin, the origin returns a 304, and that 
304 response itself is saved
2, the content is saved in cache and has expired, then ATS queries the origin with a 
self-built IMS header, the origin server responds with a 200, but ATS tries to 
respond with a 304 to the UA.

if it is case #2, please confirm that the content is saved in cache and that the 
origin response is 200. the http_ui and tcpdump, or enabling debug in records.config, may help.

I think case #2 looks cool as behavior, but the content should not have been saved here 
since it is marked 'no-cache', right?
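
One way to check case #2 from the command line (a sketch; the URL is taken from the trace below, and 127.0.0.1:8080 stands in for your ATS address):
{code}
# ask the origin directly — expect 200 with Cache-Control: max-age=0, no-cache
curl -sI 'http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap'

# ask through ATS with the same IMS header the player sends; a 304 here while the
# origin keeps answering 200 suggests ATS revalidated from its own cached copy
curl -sI -x 127.0.0.1:8080 -H 'If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT' \
  'http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap'
{code}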

> 200 code is returned as 304
> ---
>
> Key: TS-3212
> URL: https://issues.apache.org/jira/browse/TS-3212
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Rea
>
> The live streaming videos from akamaihd.net CDN cannot be watched because ATS 
> rewrite codes 200 into 304 and videos enter continuosly in buffering status:
> {code}
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> HTTP/1.1 200 OK
> Server: ContactLab
> Mime-Version: 1.0
> Content-Type: video/abst
> Content-Length: 122
> Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT
> Expires: Tue, 25 Nov 2014 15:31:53 GMT
> Cache-Control: max-age=0, no-cache
> Pragma: no-cache
> Date: Tue, 25 Nov 2014 15:31:53 GMT
> access-control-allow-origin: *
> Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; 
> domain=abclive.abcnews.com
> Age: 0
> Connection: keep-alive
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT
> HTTP/1.1 304 Not Modified
> Date: Tue, 25 Nov 2014 15:31:58 GMT
> Expires: Tue, 25 Nov 2014 15:31:58 GMT
> Cache-Control: max-age=0, no-cache
> Connection: keep-alive
> Server: ContactLab
> {code}
> using the url_regex to skip cache/IMS doesn't work, the workaround is the 
> following line in records.config:
> CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3212) 200 code is returned as 304

2014-11-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224833#comment-14224833
 ] 

Zhao Yongming commented on TS-3212:
---

the 2nd GET request adds the IMS header, I don't know why; can you make sure it 
is added by ATS?

from the #1 response, it should not be cached by ATS; I'd say it is a bug 
if it is cached with the default records.config.

any more details?

> 200 code is returned as 304
> ---
>
> Key: TS-3212
> URL: https://issues.apache.org/jira/browse/TS-3212
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Luca Rea
>
> The live streaming videos from akamaihd.net CDN cannot be watched because ATS 
> rewrite codes 200 into 304 and videos enter continuosly in buffering status:
> {code}
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> HTTP/1.1 200 OK
> Server: ContactLab
> Mime-Version: 1.0
> Content-Type: video/abst
> Content-Length: 122
> Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT
> Expires: Tue, 25 Nov 2014 15:31:53 GMT
> Cache-Control: max-age=0, no-cache
> Pragma: no-cache
> Date: Tue, 25 Nov 2014 15:31:53 GMT
> access-control-allow-origin: *
> Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; 
> domain=abclive.abcnews.com
> Age: 0
> Connection: keep-alive
> GET 
> http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124
>  HTTP/1.1
> Host: abclive.abcnews.com
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 
> Firefox/33.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate
> Referer: 
> http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf
> Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==
> Connection: keep-alive
> If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT
> HTTP/1.1 304 Not Modified
> Date: Tue, 25 Nov 2014 15:31:58 GMT
> Expires: Tue, 25 Nov 2014 15:31:58 GMT
> Cache-Control: max-age=0, no-cache
> Connection: keep-alive
> Server: ContactLab
> {code}
> using the url_regex to skip cache/IMS doesn't work, the workaround is the 
> following line in records.config:
> CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3192) implement proxy.config.config_dir

2014-11-11 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207663#comment-14207663
 ] 

Zhao Yongming commented on TS-3192:
---

that is a feature pending removal, IMO. the original TS was designed to keep the 
config files relocatable because it shipped as a binary distribution, so it accepted 
records.config and shell ENV settings. after open-sourcing, we can set the 
config dir via configure options and there is no need to keep things that 
complex.

FYI
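
For reference, a sketch of setting the config directory at build time with a standard autoconf option (the paths are illustrative):
{code}
./configure --prefix=/opt/ats --sysconfdir=/etc/trafficserver
{code}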

> implement proxy.config.config_dir
> -
>
> Key: TS-3192
> URL: https://issues.apache.org/jira/browse/TS-3192
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Configuration
>Reporter: James Peach
>Assignee: James Peach
> Fix For: 5.2.0
>
>
> {{proxy.config.config_dir}} has never been implemented, but there are various 
> scenarios where is it useful to be able to point Traffic Server to a 
> non-default set of configuration files. {{TS_ROOT}} is not always sufficient 
> for this because the system config directory is a path relative to the prefix 
> which otherwise cannot be altered (even assuming you know it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-1822) Do we still need proxy.config.system.mmap_max ?

2014-11-10 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205773#comment-14205773
 ] 

Zhao Yongming commented on TS-1822:
---

we make use of the reclaimable freelist on our 48G memory systems, handling about 
a 24-32G ram cache with roughly 32KB average content size; the default sysctl 
setting "vm.max_map_count = 65530" is not enough, and we have to raise it to 2x.

so, if we choose to keep it, I'd make this an option that raises the default 
sysctl setting, for example from the cop process.
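
A sketch of the manual workaround described above (the doubled value is illustrative):
{code}
# roughly 2x the kernel default of 65530
sysctl -w vm.max_map_count=131072

# or persist it across reboots
echo 'vm.max_map_count = 131072' >> /etc/sysctl.conf
{code}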

> Do we still need proxy.config.system.mmap_max ?
> ---
>
> Key: TS-1822
> URL: https://issues.apache.org/jira/browse/TS-1822
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Reporter: Leif Hedstrom
>Assignee: Phil Sorber
>  Labels: compatibility
> Fix For: 6.0.0
>
>
> A long time ago, we added proxy.config.system.mmap_max to let the 
> traffic_server increase the max number of mmap segments that we want to use. 
> We currently set this to 2MM.
> I'm wondering, do we really need this still ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3181) manager ports should only do local network interaction

2014-11-09 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-3181:
--
Fix Version/s: sometime

> manager ports should only do local network interaction
> --
>
> Key: TS-3181
> URL: https://issues.apache.org/jira/browse/TS-3181
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Manager
>Reporter: Zhao Yongming
> Fix For: sometime
>
>
> the manager ports, such as 8088 8089 etc shoud only accept local network 
> connections, and by ignore all the connections from outer network, we can 
> make the interactions more stable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3181) manager ports should only do local network interaction

2014-11-09 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203965#comment-14203965
 ] 

Zhao Yongming commented on TS-3181:
---

for example, with the cluster enabled we should try to filter out messages like these:
{code}
[Nov  7 15:28:21.428] Manager {0x7f277bfff700} NOTE: 
[ClusterCom::drainIncomingChannel] Unexpected message on cluster port.  
Possibly an attack
[Nov  7 15:28:57.501] Manager {0x7f277bfff700} NOTE: 
[ClusterCom::drainIncomingChannel] Unexpected message on cluster port.  
Possibly an attack
[Nov  7 15:34:09.624] Manager {0x7f277bfff700} NOTE: 
[ClusterCom::drainIncomingChannel] Unexpected message on cluster port.  
Possibly an attack
[Nov  7 15:38:36.235] Manager {0x7f277bfff700} NOTE: 
[ClusterCom::drainIncomingChannel] Unexpected message on cluster port.  
Possibly an attack
[Nov  7 15:39:45.596] Manager {0x7f277bfff700} NOTE: 
[ClusterCom::drainIncomingChannel] Unexpected message on cluster port.  
Possibly an attack
{code}
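
A sketch of one way to enforce that at the host level until the manager handles it itself (8088/8089 are the management/cluster ports mentioned above; the subnet and protocol details are placeholders to adapt to your deployment):
{code}
# drop cluster/management traffic that does not come from the local subnet
iptables -A INPUT -p udp --dport 8088 ! -s 192.168.0.0/24 -j DROP
iptables -A INPUT -p tcp --dport 8089 ! -s 192.168.0.0/24 -j DROP
{code}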

> manager ports should only do local network interaction
> --
>
> Key: TS-3181
> URL: https://issues.apache.org/jira/browse/TS-3181
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Manager
>Reporter: Zhao Yongming
>
> the manager ports, such as 8088 8089 etc shoud only accept local network 
> connections, and by ignore all the connections from outer network, we can 
> make the interactions more stable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3181) manager ports should only do local network interaction

2014-11-09 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-3181:
-

 Summary: manager ports should only do local network interaction
 Key: TS-3181
 URL: https://issues.apache.org/jira/browse/TS-3181
 Project: Traffic Server
  Issue Type: Improvement
  Components: Manager
Reporter: Zhao Yongming


the manager ports, such as 8088 8089 etc shoud only accept local network 
connections, and by ignore all the connections from outer network, we can make 
the interactions more stable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3180) Linux native aio not support disk >2T

2014-11-08 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-3180:
-

 Summary: Linux native aio not support disk >2T
 Key: TS-3180
 URL: https://issues.apache.org/jira/browse/TS-3180
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Zhao Yongming


{code}
21:47 < faysal> [Nov  8 15:45:30.080] Server {0x2ab53ff36700} WARNING: unable 
to clear cache directory '/dev/sdc 548864:366283256'
21:48 < faysal> although brw-rw 1 nobody nobody 8, 32 Nov  8 15:45 /dev/sdc
21:48 < faysal> fedora 21
21:48 < faysal> ping anyone
21:49 < ming_zym> disk fail?
21:52 < ming_zym> try to restart traffic server?
21:55 < faysal> i did restarted traffic server couple of times no luck
21:56 < faysal> by the way this is build with linux native aio enabled
21:56 < faysal> and latest master pulled today
21:56 < ming_zym> o, please don't use linux native aio in production
21:57 < ming_zym> not that ready to be used expect in testing
21:58 < ming_zym> I am sorry we don't have time to track down all those native 
aio issues here
21:59 < faysal> ok
21:59 < faysal> am compiling now without native aio
21:59 < faysal> and see what happens and inform you
22:06 < faysal> ming_zym: if you are working on native aio stuff its the issue
22:07 < faysal> i compiled without it and now its working fine
22:07 < faysal> i have noticed this on harddisks over 2T size
22:07 < faysal> smaller disks work fine with native aio
22:12 < ming_zym> ok, cool
22:13 < faysal> thats because i guess my disks are 3T each and one with 240G
22:14 < faysal> the 240 was taken no problem
22:14 < faysal> but the 3T has to be in GPT patition format
22:14 < faysal> and Fedora for some reason had issues identifying it
22:14 < ming_zym> hmm, maybe that is bug
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3174) Kill LRU Ram Cache

2014-11-08 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203330#comment-14203330
 ] 

Zhao Yongming commented on TS-3174:
---

hmm, are you sure you have the correct understanding of the CLFUS effects? In 
our use of the ram cache, CLFUS can cause some trouble with memory waste, 
especially under a heavily changing traffic pattern, because CLFUS 
will try to cache more small objects and swap out the big objects once the ram 
cache memory is full. that is a good feature, but it needs more work on the 
memory allocation/de-allocation during that step; I think more work is still needed 
so that the big objects being swapped out are actually de-allocated or reused.

I know that will not kill TS on most busy systems, but you still need to 
keep an eye on it. the TS cop process will bring back a failed server, which may 
hide most of the problems from users. :D

and it is easy to verify on a system with mixed objects, i.e. active object sizes 
ranging from 1KB-100MB:
raise the ram cache cutoff size from 4M to 100M, then follow 
doc/sdk/troubleshooting-tips/debugging-memory-leaks.en.rst to enable the memory 
dump, and compare the allocated and used memory for each size class.

FYI
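
A minimal sketch of that verification setup in records.config (option names and the memory-dump knob should be checked against the doc referenced above and your version):
{code}
# raise the per-object ram cache cutoff from the 4MB default to 100MB
CONFIG proxy.config.cache.ram_cache_cutoff INT 104857600
# periodically dump allocator usage so allocated vs. used memory can be compared per size
CONFIG proxy.config.dump_mem_info_frequency INT 3600
{code}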

> Kill LRU Ram Cache
> --
>
> Key: TS-3174
> URL: https://issues.apache.org/jira/browse/TS-3174
> Project: Traffic Server
>  Issue Type: Task
>Reporter: Susan Hinrichs
> Fix For: 6.0.0
>
>
> Comment from [~zwoop]. Now that CLFUS is both stable, and default, is there 
> even a reason to keep the old LRU cache. If no objections should remove for 
> the next major version change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin

2014-09-22 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143348#comment-14143348
 ] 

Zhao Yongming edited comment on TS-2314 at 9/22/14 4:10 PM:


yeah, I think your suggestion is great. and the current rww lacks support for 
many cases besides this one, for example:
1, how long should a waiting reader wait?
   in this case, the answer sounds like not at all; instead, use as much of the 
already-downloaded data as possible
2, should we enable rww for a file as big as 5G?
   for example, I'd like to limit it to relatively small files 
such as <30m, because the origin site is far away from the edge site.
3, should we consider the partial-object feature in 
https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ?
4, well, if a low-speed user triggered the cache write, will that be 
a speed problem for the other readers that are waiting?

well, those are some of the issues we are thinking about for rww, I just want rww 
to get more love :D


was (Author: zym):
yeah, I think your suggestion is great. and the current rww lack of support for 
many cases besides this case, for example:
1, how long should a reader in waiting should wait for?
   in this case, the answer sounds like not at all, but use as much downloaded 
data as possible
2, should we enable the rww for a file as big as 5G?
   for example I'd like to make a limited usage with relatively small files 
such as <30m, due to the origin site is far away from the edge site.
3, should we consider on the patial feature in the 
https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ?
4, well, if it is a low speed user that triggered the cache storing, will it be 
a speed problem for others readers that waiting?

well, that are some of the issues we thinking on rww, I just want rww get more 
loves :D

> New config to allow unsatifiable Range: request to go straight to Origin
> 
>
> Key: TS-2314
> URL: https://issues.apache.org/jira/browse/TS-2314
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: jaekyung oh
>  Labels: range
> Attachments: TS-2314.diff
>
>
> Basically read_while_writer works fine when ATS handles normal file.
> In progressive download and playback of mp4 in which moov atom is placed at 
> the end of the file, ATS makes and returns wrong response for range request 
> from unfulfilled cache when read_while_writer is 1.
> In origin, apache has h264 streaming module. Everything is ok whether the 
> moov atom is placed at the beginning of the file or not in origin except a 
> range request happens with read_while_writer.
> Mostly our customer’s contents placed moov atom at the end of the file and in 
> the case movie player stops playing when it seek somewhere in the movie.
> to check if read_while_writer works fine,
> 1. prepare a mp4 file whose moov atom is placed at the end of the file.
> 2. curl --range - http://www.test.com/mp4/test.mp4 1> 
> no_cache_from_origin 
> 3. wget http://www.test.com/mp4/test.mp4
> 4. right after wget, execute “curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_read_while_writer” on other terminal
> (the point is sending range request while ATS is still downloading)
> 5. after wget gets done, curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_cache
> 6. you can check compare those files by bindiff.
> The response from origin(no_cache_from_origin) for the range request is 
> exactly same to from_cache resulted from #5's range request. but 
> from_read_while_writer from #4 is totally different from others.
> i think a range request should be forwarded to origin server if it can’t find 
> the content with the offset in cache even if the read_while_writer is on, 
> instead ATS makes(from where?) and sends wrong response. (In squid.log it 
> indicates TCP_HIT)
> That’s why a movie player stops when it seeks right after the movie starts.
> Well. we turned off read_while_writer and movie play is ok but the problems 
> is read_while_writer is global options. we can’t set it differently for each 
> remap entry by conf_remap.
> So the downloading of Big file(not mp4 file) gives overhead to origin server 
> because read_while_writer is off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin

2014-09-22 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143348#comment-14143348
 ] 

Zhao Yongming commented on TS-2314:
---

yeah, I think your suggestion is great. and the current rww lacks support for 
many cases besides this one, for example:
1, how long should a waiting reader wait?
   in this case, the answer sounds like not at all; instead, use as much of the 
already-downloaded data as possible
2, should we enable rww for a file as big as 5G?
   for example, I'd like to limit it to relatively small files 
such as <30m, because the origin site is far away from the edge site.
3, should we consider the partial-object feature in 
https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ?
4, well, if a low-speed user triggered the cache write, will that be 
a speed problem for the other readers that are waiting?

well, those are some of the issues we are thinking about for rww, I just want rww 
to get more love :D
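
For reference, the global rww switch being discussed is the records.config entry below; a per-remap override through conf_remap is exactly what the reporter is asking for and is missing here:
{code}
CONFIG proxy.config.cache.enable_read_while_writer INT 1
{code}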

> New config to allow unsatifiable Range: request to go straight to Origin
> 
>
> Key: TS-2314
> URL: https://issues.apache.org/jira/browse/TS-2314
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: jaekyung oh
>  Labels: range
> Attachments: TS-2314.diff
>
>
> Basically read_while_writer works fine when ATS handles normal file.
> In progressive download and playback of mp4 in which moov atom is placed at 
> the end of the file, ATS makes and returns wrong response for range request 
> from unfulfilled cache when read_while_writer is 1.
> In origin, apache has h264 streaming module. Everything is ok whether the 
> moov atom is placed at the beginning of the file or not in origin except a 
> range request happens with read_while_writer.
> Mostly our customer’s contents placed moov atom at the end of the file and in 
> the case movie player stops playing when it seek somewhere in the movie.
> to check if read_while_writer works fine,
> 1. prepare a mp4 file whose moov atom is placed at the end of the file.
> 2. curl --range - http://www.test.com/mp4/test.mp4 1> 
> no_cache_from_origin 
> 3. wget http://www.test.com/mp4/test.mp4
> 4. right after wget, execute “curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_read_while_writer” on other terminal
> (the point is sending range request while ATS is still downloading)
> 5. after wget gets done, curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_cache
> 6. you can check compare those files by bindiff.
> The response from origin(no_cache_from_origin) for the range request is 
> exactly same to from_cache resulted from #5's range request. but 
> from_read_while_writer from #4 is totally different from others.
> i think a range request should be forwarded to origin server if it can’t find 
> the content with the offset in cache even if the read_while_writer is on, 
> instead ATS makes(from where?) and sends wrong response. (In squid.log it 
> indicates TCP_HIT)
> That’s why a movie player stops when it seeks right after the movie starts.
> Well. we turned off read_while_writer and movie play is ok but the problems 
> is read_while_writer is global options. we can’t set it differently for each 
> remap entry by conf_remap.
> So the downloading of Big file(not mp4 file) gives overhead to origin server 
> because read_while_writer is off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin

2014-09-22 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143290#comment-14143290
 ] 

Zhao Yongming commented on TS-2314:
---

looks like a fundamental bug in rww; should we take a deep look at it? 
cutting down origin traffic is always a critical feature for a cache.

> New config to allow unsatifiable Range: request to go straight to Origin
> 
>
> Key: TS-2314
> URL: https://issues.apache.org/jira/browse/TS-2314
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: jaekyung oh
>  Labels: range
> Attachments: TS-2314.diff
>
>
> Basically read_while_writer works fine when ATS handles normal file.
> In progressive download and playback of mp4 in which moov atom is placed at 
> the end of the file, ATS makes and returns wrong response for range request 
> from unfulfilled cache when read_while_writer is 1.
> In origin, apache has h264 streaming module. Everything is ok whether the 
> moov atom is placed at the beginning of the file or not in origin except a 
> range request happens with read_while_writer.
> Mostly our customer’s contents placed moov atom at the end of the file and in 
> the case movie player stops playing when it seek somewhere in the movie.
> to check if read_while_writer works fine,
> 1. prepare a mp4 file whose moov atom is placed at the end of the file.
> 2. curl --range - http://www.test.com/mp4/test.mp4 1> 
> no_cache_from_origin 
> 3. wget http://www.test.com/mp4/test.mp4
> 4. right after wget, execute “curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_read_while_writer” on other terminal
> (the point is sending range request while ATS is still downloading)
> 5. after wget gets done, curl --range - 
> http://www.test.com/mp4/test.mp4 1> from_cache
> 6. you can check compare those files by bindiff.
> The response from origin(no_cache_from_origin) for the range request is 
> exactly same to from_cache resulted from #5's range request. but 
> from_read_while_writer from #4 is totally different from others.
> i think a range request should be forwarded to origin server if it can’t find 
> the content with the offset in cache even if the read_while_writer is on, 
> instead ATS makes(from where?) and sends wrong response. (In squid.log it 
> indicates TCP_HIT)
> That’s why a movie player stops when it seeks right after the movie starts.
> Well. we turned off read_while_writer and movie play is ok but the problems 
> is read_while_writer is global options. we can’t set it differently for each 
> remap entry by conf_remap.
> So the downloading of Big file(not mp4 file) gives overhead to origin server 
> because read_while_writer is off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3083) crash

2014-09-17 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138429#comment-14138429
 ] 

Zhao Yongming commented on TS-3083:
---

hmm, can you provide more information on your configure options and environment? I 
think we may get [~yunkai] to take a look if it is the freelist issue

> crash
> -
>
> Key: TS-3083
> URL: https://issues.apache.org/jira/browse/TS-3083
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.2
>Reporter: bettydramit
>  Labels: crash
>
> c++filt  {code}
> /lib64/libpthread.so.0(+0xf710)[0x2b4c37949710]
> /usr/lib64/trafficserver/libtsutil.so.5(ink_atomiclist_pop+0x3e)[0x2b4c35abb64e]
> /usr/lib64/trafficserver/libtsutil.so.5(reclaimable_freelist_new+0x65)[0x2b4c35abc065]
> /usr/bin/traffic_server(MIOBuffer_tracker::operator()(long)+0x2b)[0x4a33db]
> /usr/bin/traffic_server(PluginVCCore::init()+0x2e3)[0x4d9903]
> /usr/bin/traffic_server(PluginVCCore::alloc()+0x11d)[0x4dcf4d]
> /usr/bin/traffic_server(TSHttpConnectWithPluginId+0x5d)[0x4b9e9d]
> /usr/bin/traffic_server(FetchSM::httpConnect()+0x74)[0x4a0224]
> /usr/bin/traffic_server(PluginVC::process_read_side(bool)+0x375)[0x4da675]
> /usr/bin/traffic_server(PluginVC::process_write_side(bool)+0x57a)[0x4dafca]
> /usr/bin/traffic_server(PluginVC::main_handler(int, void*)+0x315)[0x4dc9a5]
> /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x8f)[0x73788f]
> /usr/bin/traffic_server(EThread::execute()+0x57b)[0x7381fb]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-29 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115460#comment-14115460
 ] 

Zhao Yongming commented on TS-3032:
---

OK, it looks like memory waste in the freelist is less than 10%, that is fine. please 
raise vm.max_map_count and let us compare the two boxes' memory usage 
trends.

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
> Attachments: memory.d.png
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server[0x714a60]
> /z/bi

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-29 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115416#comment-14115416
 ] 

Zhao Yongming commented on TS-3032:
---

well, it looks like your memory usage starts from >20G. I'd guess that your cache index memory 
is nearly 20G, which indicates you may have ~20TB of storage, assuming you haven't 
changed proxy.config.cache.min_average_object_size. Is this right?
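
A back-of-the-envelope version of that estimate, for reference; the 20TB figure, the 8000-byte default for min_average_object_size, and the ~10 bytes per directory entry are all assumptions to check against your build:
{code}
# rough cache-directory memory estimate; adjust the constants for your setup
storage_bytes = 20 * 10**12   # ~20 TB of raw cache storage (assumed)
avg_object    = 8000          # assumed default proxy.config.cache.min_average_object_size
entry_bytes   = 10            # assumed size of one directory entry
index_gib = storage_bytes / avg_object * entry_bytes / 2.0**30
print("%.1f GiB of index memory" % index_gib)   # ~23 GiB with these numbers
{code}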

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
> Attachments: memory.d.png
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, voi

[jira] [Comment Edited] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-29 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115401#comment-14115401
 ] 

Zhao Yongming edited comment on TS-3032 at 8/29/14 4:26 PM:


yeah, 64k is too small for you, I'd suggest > 128K for a 48G RAM system. But is your 
RAM cache set to 10G too? Why does it still use so much memory here? 
Can you dump out the memory allocator debug info?


was (Author: zym):
yeah, 64k is too small for you, I'd suggest > 128K; you may use 256K I 
think.

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
> Attachments: memory.d.png
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_se

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-29 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115401#comment-14115401
 ] 

Zhao Yongming commented on TS-3032:
---

yeah, 64k is too small for you, I'd suggest > 128K; you may use 256K I 
think.

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
> Attachments: memory.d.png
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server[0x714a60]
> /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x1ed)[0x7077cd]
> /z/bin/t

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-28 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113762#comment-14113762
 ] 

Zhao Yongming commented on TS-3032:
---

any update? [~ngorchilov]

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server[0x714a60]
> /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x1ed)[0x7077cd]
> /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x736111]
> /z/bin/traffic_server(EThre

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-26 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110941#comment-14110941
 ] 

Zhao Yongming commented on TS-3032:
---

I'd suggest you get some tool to log the memory usage and other historical data. A 
tool we use very often when tracing issues like this is 
https://github.com/alibaba/tsar (see also 
https://blog.zymlinux.net/index.php/archives/251), but any other tool that can 
produce comparable data is great.

When we dealt with TS-1006, I even made some Excel sheets to show that 
memory was a big problem; the more data the better.
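
If tsar is not an option, here is a minimal stand-in sketch in Python that just appends the traffic_server RSS to a CSV once a minute; the pid is passed on the command line, and the output path is an assumption to adapt:
{code}
#!/usr/bin/env python
import sys, time

def rss_kb(pid):
    # read VmRSS (in kB) from /proc/<pid>/status
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

pid = int(sys.argv[1])                       # traffic_server pid
while True:
    with open("/tmp/ts-rss.csv", "a") as out:
        out.write("%d,%d\n" % (int(time.time()), rss_kb(pid)))
    time.sleep(60)
{code}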

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528

[jira] [Commented] (TS-3021) hosting.config vs volume.config

2014-08-26 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110898#comment-14110898
 ] 

Zhao Yongming commented on TS-3021:
---

do hosting and volume have the same usage? I don't think so. volume.config defines 
how the storage space is split into partitions, and hosting.config assigns those volumes to 
hostnames. Unless you want to remove the control matcher, I would not suggest 
changing their file syntax.

The config files are the end-user interface, and we should discuss carefully 
before we take any action. Changes in the UI are much more evil than function renames in 
the code.

> hosting.config vs volume.config
> ---
>
> Key: TS-3021
> URL: https://issues.apache.org/jira/browse/TS-3021
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Igor Galić
> Fix For: sometime
>
>
> it appears to me that hosting.config and volume.config have a very similar 
> purpose / use-case. perhaps it would be good to merge those two.
> ---
> n.b.: i'm not up-to-date on the plans re lua-config, but even then we'll need 
> to consider how to present.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109225#comment-14109225
 ] 

Zhao Yongming commented on TS-3032:
---

and if you have more than one box with that issue, please consider testing one 
box with the following tweaks (a records.config sketch follows below):
1. re-install with the reclaimable freelist enabled, and make sure reclaim is enabled 
in records.config
2. use the standard LRU: set proxy.config.cache.ram_cache.algorithm to 1

and if you have more systems that can do a release test, we can identify which 
release is proved to be correct. :D
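
A sketch of those two records.config tweaks; the allocator record assumes a build configured with --enable-reclaimable-freelist, so double-check the names against your own records.config:
{code}
CONFIG proxy.config.allocator.enable_reclaim INT 1
CONFIG proxy.config.cache.ram_cache.algorithm INT 1
{code}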

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffi

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109216#comment-14109216
 ] 

Zhao Yongming commented on TS-3032:
---

I'd like you to keep collecting that data for some more days, at the same time of day (to get 
the same load) if you can, to figure out which component is wasting more 
memory.

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server[0x714a60]
> /z/bin/traffic_server(NetHandler::mainNetEvent

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109214#comment-14109214
 ] 

Zhao Yongming commented on TS-3032:
---

well, you have 7368964608 bytes of memory in the freelist and 4893378608 in use, that 
is about 66% in use, with about 8000 active connections. It all sounds not so bad, except 
that ~7G is far smaller than the 19G from the pid summary. Why?
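
Just to make the ratio and the gap explicit (the 19G figure is taken as approximate):
{code}
freelist_total = 7368964608
freelist_inuse = 4893378608
process_rss    = 19 * 2**30                   # ~19G from the pid summary (approximate)
print("in use: %.0f%%" % (100.0 * freelist_inuse / freelist_total))               # ~66%
print("outside the freelist: ~%.1f GiB" % ((process_rss - freelist_total) / 2.0**30))  # ~12 GiB
{code}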


> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, 
> void*)+0x22b)[0x59270b]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_se

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109133#comment-14109133
 ] 

Zhao Yongming commented on TS-3032:
---

looks like nothing unusual. I think the 'Cached: 25975284 kB' is caused by 
the access logging, so we need more information on ATS itself:
1. your RAM cache setting, proxy.config.cache.ram_cache.size; if it is not set, please 
tell us your storage device usage and cache min_average_object_size.
2. let us dump some memory details from ATS itself (a sketch follows below): 
https://docs.trafficserver.apache.org/en/latest/sdk/troubleshooting-tips/debugging-memory-leaks.en.html

and we had better get all that data at the breaking point too :D
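
A sketch of step 2, assuming the periodic-dump approach from that page; the 60-second interval is just an example:
{code}
# have traffic_server dump its freelist/memory usage to traffic.out every 60s
# (reload or restart if the change does not take effect immediately)
traffic_line -s proxy.config.dump_mem_info_frequency -v 60
{code}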

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_call

[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes

2014-08-24 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108410#comment-14108410
 ] 

Zhao Yongming commented on TS-3032:
---

I don't know of anyone with a success story on a BIG memory system; I'd like to hear 
about one if there is any.

for the problem you have, please attach some more data, such as:
1. /proc/meminfo
2. the traffic_server process status: /proc//status
3. more system logs related to allocation and memory, such as dmesg & syslog

and please tell us your configure options when building the binary too. Hopefully 
that will help us inspect the problem.

> FATAL: ats_malloc: couldn't allocate XX bytes
> -
>
> Key: TS-3032
> URL: https://issues.apache.org/jira/browse/TS-3032
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 5.0.1
>Reporter: Nikolai Gorchilov
>Assignee: Brian Geffon
>  Labels: crash
> Fix For: 5.2.0
>
>
> ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due 
> to memory allocation issue. Happens once or twice a week.
> Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing 
> suspicious in dmesg.
> {noformat}
> FATAL: ats_malloc: couldn't allocate 155648 bytes
> /z/bin/traffic_server - STACK TRACE: 
> /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837]
> /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50]
> /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834]
> /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, 
> HdrHeap*)+0x8f)[0x62a54f]
> /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, 
> HTTPHdr*, bool, long)+0x1ae)[0x5d08de]
> /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, 
> HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c]
> /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8]
> /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x66)[0x58e356]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a]
> /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, 
> void*)+0x236)[0x2b626342b508]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_cache_open_read(int, 
> void*)+0x180)[0x59b070]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98]
> /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, 
> void*)+0x173)[0x57bbb3]
> /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, 
> CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6]
> /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, 
> HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0]
> /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, 
> CacheLookupHttpConfig*, long)+0x83)[0x57c2d3]
> /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a]
> /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51]
> /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235]
> /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2]
> /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528

[jira] [Commented] (TS-2895) memory allocation failure

2014-08-17 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099899#comment-14099899
 ] 

Zhao Yongming commented on TS-2895:
---

[~wangjun] any update on this issue?

> memory allocation failure
> -
>
> Key: TS-2895
> URL: https://issues.apache.org/jira/browse/TS-2895
> Project: Traffic Server
>  Issue Type: Test
>  Components: Cache, Clustering
>Reporter: wangjun
>Assignee: Zhao Yongming
>  Labels: crash
> Fix For: sometime
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> In this version(ats 4.0.2), I encountered a bug (memory allocation failure), 
> Look at the system log, screenshots below(screenshot-1.jpg).
> Look at the program logs, screenshots below((screenshot-2.jpg).
> Please help me, thank you.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (TS-2966) Update Feature not working

2014-08-17 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming reassigned TS-2966:
-

Assignee: Zhao Yongming

> Update Feature not working
> --
>
> Key: TS-2966
> URL: https://issues.apache.org/jira/browse/TS-2966
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, Core
>Reporter: Thomas Stinner
>Assignee: Zhao Yongming
> Fix For: sometime
>
> Attachments: traffic.out, trafficserver.patch
>
>
> I had a problem using the update feature. I recevied a SegFault in 
> do_host_db_lookup which was caused by accessing ua_session which was not 
> initialized (see attached patch). 
> After fixing that i no longer get an SegFault, but the files that are 
> retrieved by recursion are not placed into the cache. They are requested in 
> every schedule. 
> Only the starting file is placed correctly into the cache. 
> When retrieving the files with a client, caching works as expected. So i 
> don't think this is a configuration error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2903) Connections are leaked at about 1000 per hour

2014-07-01 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049568#comment-14049568
 ] 

Zhao Yongming commented on TS-2903:
---

well, 3.2.5 is definitely a very old version; can you test it on the git master 
version?

and if you find that connections are leaking, you may need to check why the 
HttpSM is hanging; please use the {http} page in http_ui to get the detailed 
information, it is the best tool for this issue.

good luck

> Connections are leaked at about 1000 per hour
> -
>
> Key: TS-2903
> URL: https://issues.apache.org/jira/browse/TS-2903
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Puneet Dhaliwal
>
> For version 3.2.5, with keep alive on for in/out and post out, connections 
> were leaked at about 1000 per hour. The limit of 
> proxy.config.net.connections_throttle was reached at 30k and at 60k after 
> enough time.
> CONFIG proxy.config.http.keep_alive_post_out INT 1
> CONFIG proxy.config.http.keep_alive_enabled_in INT 1
> CONFIG proxy.config.http.keep_alive_enabled_out INT 1
> This might also be happening for 4.2.1 and 5.0.
> Pls let me know if there is further information required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2796) Leaking CacheVConnections

2014-05-22 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006172#comment-14006172
 ] 

Zhao Yongming commented on TS-2796:
---

Any update on this issue? Do you need me to push on the code diffing on Taobao's 
side?

> Leaking CacheVConnections
> -
>
> Key: TS-2796
> URL: https://issues.apache.org/jira/browse/TS-2796
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.2, 4.2.1, 5.0.0
>Reporter: Brian Geffon
>Assignee: Brian Geffon
>  Labels: yahoo
> Fix For: 5.0.0
>
>
> It appears there is a memory leak in 4.0.x, 4.2.x, and master leaking 
> CacheVConnections resulting in IOBufAllocator leaking also, here is an 
> example:
>  allocated  |in-use  | type size  |   free list name
>67108864 |  0 |2097152 | 
> memory/ioBufAllocator[14]
>67108864 |   19922944 |1048576 | 
> memory/ioBufAllocator[13]
>  4798283776 |   14155776 | 524288 | 
> memory/ioBufAllocator[12]
>  7281311744 |   98304000 | 262144 | 
> memory/ioBufAllocator[11]
>  1115684864 |  148242432 | 131072 | 
> memory/ioBufAllocator[10]
>  497544 |  379977728 |  65536 | 
> memory/ioBufAllocator[9]
>  9902751744 | 5223546880 |  32768 | 
> memory/ioBufAllocator[8]
> 14762901504 |14762311680 |  16384 | 
> memory/ioBufAllocator[7]
>  6558056448 | 6557859840 |   8192 | 
> memory/ioBufAllocator[6]
>41418752 |   30502912 |   4096 | 
> memory/ioBufAllocator[5]
>  524288 |  0 |   2048 | 
> memory/ioBufAllocator[4]
>   0 |  0 |   1024 | 
> memory/ioBufAllocator[3]
>   0 |  0 |512 | 
> memory/ioBufAllocator[2]
>   32768 |  0 |256 | 
> memory/ioBufAllocator[1]
>   0 |  0 |128 | 
> memory/ioBufAllocator[0]
> 2138112 |2124192 |928 | 
> memory/cacheVConnection
> [~bcall] has observed this issue on 4.0.x, and we have observed this on 4.2.x.
> The code path in CacheVC that is allocating the IoBuffers is 
> memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom 
> the real issue here is the leaking CacheVC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2796) Leaking CacheVConnections

2014-05-14 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997180#comment-13997180
 ] 

Zhao Yongming commented on TS-2796:
---

Hmm, from what I know many people hit this memory issue; it is really a malloc/GC 
behaviour problem, they just don't realize it. That is why I am pushing to have the 
reclaimable freelist enabled by default. Why not test it, since you can verify the 
result within hours?

Please also look at 'allocated' - 'in-use' for the memory/ioBufAllocator rows with 
type size > 32K; if you sum those up, that is the memory you are leaking. It is the 
same as TS-1006.
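
For example, taking just two rows from the dump in the description (a quick 
back-of-the-envelope calculation, not exact accounting):
{code}
ioBufAllocator[12] (512 KB): 4798283776 - 14155776 = 4784128000  (~4.5 GB held but unused)
ioBufAllocator[11] (256 KB): 7281311744 - 98304000 = 7183007744  (~6.7 GB held but unused)
{code}
So these two buckets alone account for roughly 11 GB that the freelists are holding 
on to.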

> Leaking CacheVConnections
> -
>
> Key: TS-2796
> URL: https://issues.apache.org/jira/browse/TS-2796
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.2, 4.2.1, 5.0.0
>Reporter: Brian Geffon
>  Labels: yahoo
> Fix For: 5.0.0
>
>
> It appears there is a memory leak in 4.0.x, 4.2.x, and master leaking 
> CacheVConnections resulting in IOBufAllocator leaking also, here is an 
> example:
>  allocated  |in-use  | type size  |   free list name
>67108864 |  0 |2097152 | 
> memory/ioBufAllocator[14]
>67108864 |   19922944 |1048576 | 
> memory/ioBufAllocator[13]
>  4798283776 |   14155776 | 524288 | 
> memory/ioBufAllocator[12]
>  7281311744 |   98304000 | 262144 | 
> memory/ioBufAllocator[11]
>  1115684864 |  148242432 | 131072 | 
> memory/ioBufAllocator[10]
>  497544 |  379977728 |  65536 | 
> memory/ioBufAllocator[9]
>  9902751744 | 5223546880 |  32768 | 
> memory/ioBufAllocator[8]
> 14762901504 |14762311680 |  16384 | 
> memory/ioBufAllocator[7]
>  6558056448 | 6557859840 |   8192 | 
> memory/ioBufAllocator[6]
>41418752 |   30502912 |   4096 | 
> memory/ioBufAllocator[5]
>  524288 |  0 |   2048 | 
> memory/ioBufAllocator[4]
>   0 |  0 |   1024 | 
> memory/ioBufAllocator[3]
>   0 |  0 |512 | 
> memory/ioBufAllocator[2]
>   32768 |  0 |256 | 
> memory/ioBufAllocator[1]
>   0 |  0 |128 | 
> memory/ioBufAllocator[0]
> 2138112 |2124192 |928 | 
> memory/cacheVConnection
> [~bcall] has observed this issue on 4.0.x, and we have observed this on 4.2.x.
> The code path in CacheVC that is allocating the IoBuffers is 
> memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom 
> the real issue here is the leaking CacheVC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2796) Leaking CacheVConnections

2014-05-14 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997221#comment-13997221
 ] 

Zhao Yongming commented on TS-2796:
---

Yeah, I know you may think the reclaimable freelist is hard to manage and evil in 
the code, but if we can confirm it helps in this case, I'd like you to consider 
enabling it by default. We really should not waste so much time here, and we could 
win back some less experienced users who otherwise conclude that we have a big 
memory problem in the core.

In exchange, I'd push for whatever other enhancement you would like to see enabled 
by default. :D

> Leaking CacheVConnections
> -
>
> Key: TS-2796
> URL: https://issues.apache.org/jira/browse/TS-2796
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.2, 4.2.1, 5.0.0
>Reporter: Brian Geffon
>  Labels: yahoo
> Fix For: 5.0.0
>
>
> It appears there is a memory leak in 4.0.x, 4.2.x, and master leaking 
> CacheVConnections resulting in IOBufAllocator leaking also, here is an 
> example:
>  allocated  |in-use  | type size  |   free list name
>67108864 |  0 |2097152 | 
> memory/ioBufAllocator[14]
>67108864 |   19922944 |1048576 | 
> memory/ioBufAllocator[13]
>  4798283776 |   14155776 | 524288 | 
> memory/ioBufAllocator[12]
>  7281311744 |   98304000 | 262144 | 
> memory/ioBufAllocator[11]
>  1115684864 |  148242432 | 131072 | 
> memory/ioBufAllocator[10]
>  497544 |  379977728 |  65536 | 
> memory/ioBufAllocator[9]
>  9902751744 | 5223546880 |  32768 | 
> memory/ioBufAllocator[8]
> 14762901504 |14762311680 |  16384 | 
> memory/ioBufAllocator[7]
>  6558056448 | 6557859840 |   8192 | 
> memory/ioBufAllocator[6]
>41418752 |   30502912 |   4096 | 
> memory/ioBufAllocator[5]
>  524288 |  0 |   2048 | 
> memory/ioBufAllocator[4]
>   0 |  0 |   1024 | 
> memory/ioBufAllocator[3]
>   0 |  0 |512 | 
> memory/ioBufAllocator[2]
>   32768 |  0 |256 | 
> memory/ioBufAllocator[1]
>   0 |  0 |128 | 
> memory/ioBufAllocator[0]
> 2138112 |2124192 |928 | 
> memory/cacheVConnection
> [~bcall] has observed this issue on 4.0.x, and we have observed this on 4.2.x.
> The code path in CacheVC that is allocating the IoBuffers is 
> memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom 
> the real issue here is the leaking CacheVC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2796) Leaking CacheVConnections

2014-05-12 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995961#comment-13995961
 ] 

Zhao Yongming commented on TS-2796:
---

And if the reclaimable freelist helps you, please help me promote it to be enabled 
by default.

> Leaking CacheVConnections
> -
>
> Key: TS-2796
> URL: https://issues.apache.org/jira/browse/TS-2796
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.2, 4.2.1, 5.0.0
>Reporter: Brian Geffon
>  Labels: yahoo
> Fix For: 5.0.0
>
>
> It appears there is a memory leak in 4.0.x, 4.2.x, and master leaking 
> CacheVConnections resulting in IOBufAllocator leaking also, here is an 
> example:
>  allocated  |in-use  | type size  |   free list name
>67108864 |  0 |2097152 | 
> memory/ioBufAllocator[14]
>67108864 |   19922944 |1048576 | 
> memory/ioBufAllocator[13]
>  4798283776 |   14155776 | 524288 | 
> memory/ioBufAllocator[12]
>  7281311744 |   98304000 | 262144 | 
> memory/ioBufAllocator[11]
>  1115684864 |  148242432 | 131072 | 
> memory/ioBufAllocator[10]
>  497544 |  379977728 |  65536 | 
> memory/ioBufAllocator[9]
>  9902751744 | 5223546880 |  32768 | 
> memory/ioBufAllocator[8]
> 14762901504 |14762311680 |  16384 | 
> memory/ioBufAllocator[7]
>  6558056448 | 6557859840 |   8192 | 
> memory/ioBufAllocator[6]
>41418752 |   30502912 |   4096 | 
> memory/ioBufAllocator[5]
>  524288 |  0 |   2048 | 
> memory/ioBufAllocator[4]
>   0 |  0 |   1024 | 
> memory/ioBufAllocator[3]
>   0 |  0 |512 | 
> memory/ioBufAllocator[2]
>   32768 |  0 |256 | 
> memory/ioBufAllocator[1]
>   0 |  0 |128 | 
> memory/ioBufAllocator[0]
> 2138112 |2124192 |928 | 
> memory/cacheVConnection
> [~bcall] has observed this issue on 4.0.x, and we have observed this on 4.2.x.
> The code path in CacheVC that is allocating the IoBuffers is 
> memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom 
> the real issue here is the leaking CacheVC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2796) Leaking CacheVConnections

2014-05-12 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995956#comment-13995956
 ] 

Zhao Yongming commented on TS-2796:
---

I am not sure what this issue is focusing on; if it is only about the last line of 
the memory dump, memory/cacheVConnection, please ignore my comment.

Most of the memory "leak" in your dump is in the memory/ioBufAllocator rows with 
type size > 32K. My guess is that you are using the default CLFUS ram cache 
algorithm, which produces this effect after the system has been running for a long 
time: big objects in the ram cache get replaced by smaller ones, but the memory 
used by the big objects is never released back to the system.

That issue was already addressed in TS-1006, which resulted in the reclaimable 
freelist memory management code, shipped in the >= 4.0 releases behind a configure 
option.

So, if this is the cause, please verify whether your problem is still there with 
the reclaimable freelist enabled, and you may also test the simple LRU algorithm 
for the ram cache.

thanks
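
To be concrete, the two things to try are roughly the following (record names from 
memory, so please verify them against your version; the reclaim knob only has an 
effect when ATS was built with --enable-reclaimable-freelist):
{code}
CONFIG proxy.config.cache.ram_cache.algorithm INT 1   # 1 = simple LRU, 0 = CLFUS (default)
CONFIG proxy.config.allocator.enable_reclaim INT 1    # only effective in a reclaimable-freelist build
{code}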


> Leaking CacheVConnections
> -
>
> Key: TS-2796
> URL: https://issues.apache.org/jira/browse/TS-2796
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Affects Versions: 4.0.2, 4.2.1, 5.0.0
>Reporter: Brian Geffon
>  Labels: yahoo
> Fix For: 5.0.0
>
>
> It appears there is a memory leak in 4.0.x, 4.2.x, and master leaking 
> CacheVConnections resulting in IOBufAllocator leaking also, here is an 
> example:
>  allocated  |in-use  | type size  |   free list name
>67108864 |  0 |2097152 | 
> memory/ioBufAllocator[14]
>67108864 |   19922944 |1048576 | 
> memory/ioBufAllocator[13]
>  4798283776 |   14155776 | 524288 | 
> memory/ioBufAllocator[12]
>  7281311744 |   98304000 | 262144 | 
> memory/ioBufAllocator[11]
>  1115684864 |  148242432 | 131072 | 
> memory/ioBufAllocator[10]
>  497544 |  379977728 |  65536 | 
> memory/ioBufAllocator[9]
>  9902751744 | 5223546880 |  32768 | 
> memory/ioBufAllocator[8]
> 14762901504 |14762311680 |  16384 | 
> memory/ioBufAllocator[7]
>  6558056448 | 6557859840 |   8192 | 
> memory/ioBufAllocator[6]
>41418752 |   30502912 |   4096 | 
> memory/ioBufAllocator[5]
>  524288 |  0 |   2048 | 
> memory/ioBufAllocator[4]
>   0 |  0 |   1024 | 
> memory/ioBufAllocator[3]
>   0 |  0 |512 | 
> memory/ioBufAllocator[2]
>   32768 |  0 |256 | 
> memory/ioBufAllocator[1]
>   0 |  0 |128 | 
> memory/ioBufAllocator[0]
> 2138112 |2124192 |928 | 
> memory/cacheVConnection
> [~bcall] has observed this issue on 4.0.x, and we have observed this on 4.2.x.
> The code path in CacheVC that is allocating the IoBuffers is 
> memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom 
> the real issue here is the leaking CacheVC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2669) ATS crash, then restart with all cached objects cleared

2014-03-28 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950510#comment-13950510
 ] 

Zhao Yongming commented on TS-2669:
---

Please attach the server start log from the diags.log file, like the one I 
attached, and please tell us how your storage is configured.

> ATS crash, then restart with all cached objects cleared
> ---
>
> Key: TS-2669
> URL: https://issues.apache.org/jira/browse/TS-2669
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: AnDao
> Attachments: cachobjecs.png, storage.png
>
>
> Hi all,
> I'm using ATS 4.1.2, my ATS is just crashed and restart and clean all the 
> cached objects, cause my backend servers overload. Why ATS do clean all the 
> cached objects when crash and restart?
> The log is:
> * manager.log
>  [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: 
> [LocalManager::mgmtShutdown] Executing shutdown request.
> [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: 
> [LocalManager::processShutdown] Executing process shutdown request.
> [Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR:  (last system error 32: 
> Broken pipe)
> [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} STATUS: opened 
> /zserver/log/trafficserver/manager.log
> [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} NOTE: updated diags config
> [Mar 27 12:57:13.520] Manager {0x7ffaeec7e7e0} NOTE: [ClusterCom::ClusterCom] 
> Node running on OS: 'Linux' Release: '2.6.32-358.6.2.el6.x86_64'
> [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::listenForProxy] Listening on port: 80
> [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [TrafficManager] Setup 
> complete
> [Mar 27 12:57:14.618] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::startProxy] Launching ts process
> [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::pollMgmtProcessServer] New process connecting fd '15'
> [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [Alarms::signalAlarm] 
> Server Process born
> *** traffic.out ***
> [E. Mgmt] log ==> [TrafficManager] using root directory 
> '/zserver/trafficserver-4.1.2'
> [TrafficServer] using root directory '/zserver/trafficserver-4.1.2'
> NOTE: Traffic Server received Sig 15: Terminated
> [E. Mgmt] log ==> [TrafficManager] using root directory 
> '/zserver/trafficserver-4.1.2'
> [TrafficServer] using root directory '/zserver/trafficserver-4.1.2'
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /zserver/trafficserver-4.1.2/bin/traffic_server - STACK TRACE: 
> /lib64/libpthread.so.0(+0x35a360f500)[0x2b3b55819500]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact47change_response_header_because_of_range_requestEPNS_5StateEP7HTTPHdr+0x240)[0x54b8a0]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact28handle_content_length_headerEPNS_5StateEP7HTTPHdrS3_+0x2c8)[0x54bc38]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact14build_responseEPNS_5StateEP7HTTPHdrS3_11HTTPVersion10HTTPStatusPKc+0x3e3)[0x54c0c3]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact22handle_transform_readyEPNS_5StateE+0x70)[0x54ca40]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM32call_transact_and_set_next_stateEPFvPN12HttpTransact5StateEE+0x28)[0x51b418]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM38state_response_wait_for_transform_readEiPv+0xed)[0x52988d]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x533178]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN17TransformTerminus12handle_eventEiPv+0x1d2)[0x4e8c62]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x8f)[0x6a5a0f]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN7EThread7executeEv+0x63b)[0x6a658b]
> /zserver/trafficserver-4.1.2/bin/traffic_server[0x6a48aa]
> /lib64/libpthread.so.0(+0x35a3607851)[0x2b3b55811851]
> /lib64/libc.so.6(clone+0x6d)[0x35a32e890d]
> [E. Mgmt] log ==> [TrafficManager] using root directory 
> '/zserver/trafficserver-4.1.2'
> [TrafficServer] using root directory '/zserver/trafficserver-4.1.2'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2669) ATS crash, then restart with all cached objects cleared

2014-03-27 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950437#comment-13950437
 ] 

Zhao Yongming commented on TS-2669:
---

Well, this may need more checking. Please find the lines in diags.log that look 
like:
{code}
[Mar 27 20:31:25.948] {0x2b02748fde00} STATUS: opened 
/var/log/trafficserver/diags.log
[Mar 27 20:31:25.948] {0x2b02748fde00} NOTE: updated diags config
[Mar 27 20:31:25.954] Server {0x2b02748fde00} NOTE: cache clustering disabled
[Mar 27 20:31:25.964] Server {0x2b02748fde00} NOTE: ip_allow.config updated, 
reloading
[Mar 27 20:31:25.969] Server {0x2b02748fde00} NOTE: loading SSL certificate 
configuration from /etc/trafficserver/ssl_multicert.config
[Mar 27 20:31:25.976] Server {0x2b02748fde00} NOTE: cache clustering disabled
[Mar 27 20:31:25.977] Server {0x2b02748fde00} NOTE: logging initialized[15], 
logging_mode = 3
[Mar 27 20:31:25.978] Server {0x2b02748fde00} NOTE: loading plugin 
'/usr/lib64/trafficserver/plugins/libloader.so'
[Mar 27 20:31:25.982] Server {0x2b02748fde00} NOTE: loading plugin 
'/usr/local/ironbee/libexec/ts_ironbee.so'
[Mar 27 20:31:25.983] Server {0x2b02748fde00} NOTE: Rolling interval adjusted 
from 0 sec to 300 sec for /var/log/trafficserver/ts-ironbee.log
[Mar 27 20:31:25.992] Server {0x2b02748fde00} NOTE: traffic server running
[Mar 27 20:31:26.077] Server {0x2b0275d8e700} NOTE: cache enabled
{code}
'traffic server running' indicates that ATS is running, and 'cache enabled' shows 
that the cache is working. Since your system crashed and the cache was not enabled, 
I suspect your diags.log does not have that 'cache enabled' line.

This is often caused by something like a privilege issue, where the ATS server 
process does not have write permission on the disk block device files, or by 
anything else you may find in diags.log or even the system logs.
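
For reference, storage.config entries look roughly like this (paths and sizes below 
are only examples); whatever is listed there must be writable by the user the 
traffic_server process runs as:
{code}
# storage.config -- example entries only
/dev/sdb
/var/cache/trafficserver 256G
{code}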

The interim cache & AIO bugs could make you lose the saved data: the interim cache 
bug could lose data across a server process restart, and the AIO bug could get all 
data cleared. But all of those bugs were fixed in the v4.1.0 release.


> ATS crash, then restart with all cached objects cleared
> ---
>
> Key: TS-2669
> URL: https://issues.apache.org/jira/browse/TS-2669
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: AnDao
>
> Hi all,
> I'm using ATS 4.1.2, my ATS is just crashed and restart and clean all the 
> cached objects, cause my backend servers overload. Why ATS do clean all the 
> cached objects when crash and restart?
> The log is:
> * manager.log
>  [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} FATAL: 
> [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: 
> [LocalManager::mgmtShutdown] Executing shutdown request.
> [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: 
> [LocalManager::processShutdown] Executing process shutdown request.
> [Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR:  (last system error 32: 
> Broken pipe)
> [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} STATUS: opened 
> /zserver/log/trafficserver/manager.log
> [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} NOTE: updated diags config
> [Mar 27 12:57:13.520] Manager {0x7ffaeec7e7e0} NOTE: [ClusterCom::ClusterCom] 
> Node running on OS: 'Linux' Release: '2.6.32-358.6.2.el6.x86_64'
> [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::listenForProxy] Listening on port: 80
> [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [TrafficManager] Setup 
> complete
> [Mar 27 12:57:14.618] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::startProxy] Launching ts process
> [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: 
> [LocalManager::pollMgmtProcessServer] New process connecting fd '15'
> [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [Alarms::signalAlarm] 
> Server Process born
> *** traffic.out ***
> [E. Mgmt] log ==> [TrafficManager] using root directory 
> '/zserver/trafficserver-4.1.2'
> [TrafficServer] using root directory '/zserver/trafficserver-4.1.2'
> NOTE: Traffic Server received Sig 15: Terminated
> [E. Mgmt] log ==> [TrafficManager] using root directory 
> '/zserver/trafficserver-4.1.2'
> [TrafficServer] using root directory '/zserver/trafficserver-4.1.2'
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /zserver/trafficserver-4.1.2/bin/traffic_server - STACK TRACE: 
> /lib64/libpthread.so.0(+0x35a360f500)[0x2b3b55819500]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact47change_response_header_because_of_range_requestEPNS_5StateEP7HTTPHdr+0x240)[0x54b8a0]
> /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact28handle_content_length_he

[jira] [Assigned] (TS-2528) better bool handling in public APIs (ts / mgmt)

2014-03-26 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming reassigned TS-2528:
-

Assignee: Zhao Yongming

> better bool handling in public APIs (ts / mgmt)
> ---
>
> Key: TS-2528
> URL: https://issues.apache.org/jira/browse/TS-2528
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Management API
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
>  Labels: api-change
> Fix For: 5.0.0
>
>
> {code}
>   tsapi bool TSListIsEmpty(TSList l);
>   tsapi bool TSListIsValid(TSList l);
>   tsapi bool TSIpAddrListIsEmpty(TSIpAddrList ip_addrl);
>   tsapi bool TSIpAddrListIsValid(TSIpAddrList ip_addrl);
>   tsapi bool TSPortListIsEmpty(TSPortList portl);
>   tsapi bool TSPortListIsValid(TSPortList portl);
>   tsapi bool TSStringListIsEmpty(TSStringList strl);
>   tsapi bool TSStringListIsValid(TSStringList strl);
>   tsapi bool TSIntListIsEmpty(TSIntList intl);
>   tsapi bool TSIntListIsValid(TSIntList intl, int min, int max);
>   tsapi bool TSDomainListIsEmpty(TSDomainList domainl);
>   tsapi bool TSDomainListIsValid(TSDomainList domainl);
>   tsapi TSError TSRestart(bool cluster);
>   tsapi TSError TSBounce(bool cluster);
>   tsapi TSError TSStatsReset(bool cluster, const char *name = NULL);
>   tsapi TSError TSEventIsActive(char *event_name, bool * is_current);
> {code}
> and we have:
> {code}
> #if !defined(linux)
> #if defined (__SUNPRO_CC) || (defined (__GNUC__) || ! defined(__cplusplus))
> #if !defined (bool)
> #if !defined(darwin) && !defined(freebsd) && !defined(solaris)
> // XXX: What other platforms are there?
> #define bool int
> #endif
> #endif
> #if !defined (true)
> #define true 1
> #endif
> #if !defined (false)
> #define false 0
> #endif
> #endif
> #endif  // not linux
> {code}
> I'd like we can make it a typedef or replace bool with int completely, to 
> make things better to be parsed by SWIG tools etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (TS-2668) need a way to fetch from the cluster when doing cluster local caching

2014-03-26 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming reassigned TS-2668:
-

Assignee: weijin

Weijin and Yuqing are working on a feature related to the API change requirements. 
Please help figure out how to merge with this feature.

> need a way to fetch from the cluster when doing cluster local caching
> -
>
> Key: TS-2668
> URL: https://issues.apache.org/jira/browse/TS-2668
> Project: Traffic Server
>  Issue Type: Sub-task
>  Components: Cache, Clustering
>Reporter: Zhao Yongming
>Assignee: weijin
> Fix For: sometime
>
>
> this is the TS-2184 #2 feature subtask.
> when you want do local caching in cluster env, you must tell cache to write 
> done to the local disk when cluster hit. we need a good way to handle this.
> maybe a new API or similar API changes.
> be aware, the #2 feature may harms, and should be co-working with the other 
> features.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TS-2668) need a way to fetch from the cluster when doing cluster local caching

2014-03-26 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-2668:
-

 Summary: need a way to fetch from the cluster when doing cluster 
local caching
 Key: TS-2668
 URL: https://issues.apache.org/jira/browse/TS-2668
 Project: Traffic Server
  Issue Type: Sub-task
  Components: Cache, Clustering
Reporter: Zhao Yongming


This is the subtask for feature #2 of TS-2184.

When you want to do local caching in a cluster environment, you must tell the cache 
to write the object to the local disk on a cluster hit. We need a good way to 
handle this.

Maybe a new API, or similar API changes.

Be aware that feature #2 can be harmful on its own; it should work together with 
the other features.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-1521) Enable compression for binary log format

2014-03-04 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919388#comment-13919388
 ] 

Zhao Yongming commented on TS-1521:
---

[~bettydreamit] submitted their gzipping patch for the ASCII logging; please 
consider accepting this feature too.

> Enable compression for binary log format
> 
>
> Key: TS-1521
> URL: https://issues.apache.org/jira/browse/TS-1521
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Logging
> Environment: RHEL 6+
>Reporter: Lans Carstensen
>Assignee: Yunkai Zhang
> Fix For: 6.0.0
>
> Attachments: logcompress.patch
>
>
> As noted by in a discussion on #traffic-server, gzip can result in 90%+ 
> compression on the binary access logs.  By adding a reasonable streaming 
> compression algorithm to the binary format you could significantly reduce 
> logging-related IOPS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-727) Do we need support for "streams" partitions?

2014-02-27 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914204#comment-13914204
 ] 

Zhao Yongming commented on TS-727:
--

I think removing the streams partitions would amount to completely removing the 
MIXT cache. Someone is working on an RTMP-like streaming cache for ATS, and I'd 
like to talk to them before we nuke it.

IMO, the 'stream' cache is much more efficient than HTTP if you would like to use 
ATS for live streaming broadcasting.

> Do we need support for "streams" partitions?
> 
>
> Key: TS-727
> URL: https://issues.apache.org/jira/browse/TS-727
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: Leif Hedstrom
>Assignee: Alan M. Carroll
> Fix For: 5.0.0
>
>
> There's code in the cache related to MIXT streams volumes (caches). Since we 
> don't support streams, I'm thinking this code could be removed? Or 
> alternatively, we should expose APIs so that someone writing a plugin and 
> wish to store a different protocol (e.g. QT) can register this media type 
> with the API and core. The idea being that the core only contains protocols 
> that are in the core, but expose the cache core so that plugins can take 
> advantage of it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TS-2184) Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled

2014-02-26 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912654#comment-13912654
 ] 

Zhao Yongming commented on TS-2184:
---

[~weijin] is working with [~happy_fish100] to provide a solution for your #2 
feature; hope they can help.

> Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled
> ---
>
> Key: TS-2184
> URL: https://issues.apache.org/jira/browse/TS-2184
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Clustering
>Reporter: Scott Harris
>Assignee: Bin Chen
> Fix For: sometime
>
>
> With proxy.config.http.cache.cluster_cache_local enabled I would like cluster 
> nodes to store content locally but try to retrieve content from the cluster 
> first (if not cached locally) and if no cluster nodes have content cached 
> then retrieve from origin.
> Example - 2 Cluster nodes in Full cluster mode.
> 1. Node1 and Node2 are both empty.
> 2. Request to Node1 for "http://www.example.com/foo.html";.
> 3. Query Cluster for object
> 4. Not cached in cluster so retrieve from orgin, serve to client, object now 
> cached on Node1.
> 5. Request comes to Node2 for "http://www.example.com/foo.html";.
> 6. Node2 retrieves cached version from Node1, serves to client, stores 
> locally.
> 7. Subsequent request comes to Node1 or Node2 for 
> "http://www.example.com/foo.html";, object is served to client from local 
> cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TS-2184) Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled

2014-02-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911434#comment-13911434
 ] 

Zhao Yongming commented on TS-2184:
---

For the very HOT content in the cluster we have another solution: track the hot 
content (from the traffic point of view) and put it on the cluster_cache_local 
list dynamically. This solution needs a workaround for purging: you have to 
broadcast every purge to all machines in the cluster.

Pulling from the hashing machine in the cluster is not implemented yet either; we 
are testing to see how well it works. This function is provided by [~happy_fish100].

FYI

> Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled
> ---
>
> Key: TS-2184
> URL: https://issues.apache.org/jira/browse/TS-2184
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Clustering
>Reporter: Scott Harris
>Assignee: Bin Chen
> Fix For: 6.0.0
>
>
> With proxy.config.http.cache.cluster_cache_local enabled I would like cluster 
> nodes to store content locally but try to retrieve content from the cluster 
> first (if not cached locally) and if no cluster nodes have content cached 
> then retrieve from origin.
> Example - 2 Cluster nodes in Full cluster mode.
> 1. Node1 and Node2 are both empty.
> 2. Request to Node1 for "http://www.example.com/foo.html";.
> 3. Query Cluster for object
> 4. Not cached in cluster so retrieve from orgin, serve to client, object now 
> cached on Node1.
> 5. Request comes to Node2 for "http://www.example.com/foo.html";.
> 6. Node2 retrieves cached version from Node1, serves to client, stores 
> locally.
> 7. Subsequent request comes to Node1 or Node2 for 
> "http://www.example.com/foo.html";, object is served to client from local 
> cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TS-2184) Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled

2014-02-25 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911428#comment-13911428
 ] 

Zhao Yongming commented on TS-2184:
---

When the Cluster was designed, the original goal was to have ONLY one single valid 
copy of each object in the cluster. That is a good idea when you have a very large 
volume of content, and we have kept to that target: even when some machines are 
flapping in the cluster, we ensure that at any time there is only one valid copy in 
the cluster.

But consider the ICP protocol and others: they may keep multiple copies of the same 
content in the ICP cluster, and in a complex setup there may even be multiple 
versions of an object in the cluster at the same time. So ICP-like protocols are 
considered not so cool (safe) when you need to enforce the consistency of the 
content you serve to user agents.

Back to this requirement: we can make the cluster act like ICP, first writing to 
the cluster hashing machine, with the second and later reads pulling that content 
from the cluster and writing it to the local cache. But that introduces a 
consistency problem: you don't know who holds a local copy in the cluster when the 
object is updated on the origin side within its freshness lifetime. In most cases 
every write to the cache has to be broadcast to all machines in the cluster to 
enforce the change.

proxy.config.http.cache.cluster_cache_local is a directive to disable cluster 
hashing in cluster mode. Our original target was to use it to keep some very hot 
hostnames (or URLs) local, to reduce the intra-cluster traffic. 
proxy.config.http.cache.cluster_cache_local is overridable, and we have the same 
directive in cache.config too. When it is active, the Cluster behaves like mode=3, 
the single host mode.
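
For reference, the "hot hostname" usage looks roughly like this (the cache.config 
action name is from memory and hot.example.com is just a placeholder, so please 
double-check against your version's docs):
{code}
# records.config -- leave cluster hashing on by default
CONFIG proxy.config.http.cache.cluster_cache_local INT 0

# cache.config -- cache a very hot domain locally on every node
dest_domain=hot.example.com action=cluster-cache-local
{code}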

So, if we want to achieve an ICP-like feature in the cluster, roughly we should:
1. write the content on the hashing machine if it is a miss in the whole cluster
2. read from the cluster if it is missing on the local machine
3. write to the local cache if it is a hit in the cluster
4. broadcast the change to all machines in the cluster if it is an overwrite (i.e. revalidation etc.)
5. purge on the hashing machine and broadcast the purge to all machines in the cluster
It would be a very big change in the Cluster and the HTTP transaction.

cc [~zwoop]


> Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled
> ---
>
> Key: TS-2184
> URL: https://issues.apache.org/jira/browse/TS-2184
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Clustering
>Reporter: Scott Harris
>Assignee: Bin Chen
> Fix For: 6.0.0
>
>
> With proxy.config.http.cache.cluster_cache_local enabled I would like cluster 
> nodes to store content locally but try to retrieve content from the cluster 
> first (if not cached locally) and if no cluster nodes have content cached 
> then retrieve from origin.
> Example - 2 Cluster nodes in Full cluster mode.
> 1. Node1 and Node2 are both empty.
> 2. Request to Node1 for "http://www.example.com/foo.html";.
> 3. Query Cluster for object
> 4. Not cached in cluster so retrieve from orgin, serve to client, object now 
> cached on Node1.
> 5. Request comes to Node2 for "http://www.example.com/foo.html";.
> 6. Node2 retrieves cached version from Node1, serves to client, stores 
> locally.
> 7. Subsequent request comes to Node1 or Node2 for 
> "http://www.example.com/foo.html";, object is served to client from local 
> cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TS-2019) find out what is the problem of reporting OpenReadHead failed on vector inconsistency

2014-02-08 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895487#comment-13895487
 ] 

Zhao Yongming commented on TS-2019:
---

[~weijin] should check this issue

> find out what is the problem of reporting OpenReadHead failed on vector 
> inconsistency
> -
>
> Key: TS-2019
> URL: https://issues.apache.org/jira/browse/TS-2019
> Project: Traffic Server
>  Issue Type: Task
>  Components: Cache
>Reporter: Zhao Yongming
>Assignee: Alan M. Carroll
>Priority: Critical
> Fix For: 5.0.0
>
>
> {code}
> [Jul 10 19:40:33.170] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 44B5C68B : vector inconsistency with 4624
> [Jul 10 19:40:33.293] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 2ABA746F : vector inconsistency with 4632
> [Jul 10 19:40:33.368] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 389594A0 : vector inconsistency with 4632
> [Jul 10 19:40:33.399] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey FBC601A3 : vector inconsistency with 4632
> [Jul 10 19:40:33.506] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 1F39AD5F : vector inconsistency with 4632
> [Jul 10 19:40:33.602] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey ABFC6D97 : vector inconsistency with 4632
> [Jul 10 19:40:33.687] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 2420ABBF : vector inconsistency with 4632
> [Jul 10 19:40:33.753] Server {0x2aaf1680} NOTE: OpenReadHead failed for 
> cachekey 5DD061C8 : vector inconsistency with 4632
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (TS-2561) remove app-template from examples

2014-02-08 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-2561:
--

Affects Version/s: 5.0.0
Fix Version/s: 5.0.0
 Assignee: Zhao Yongming

> remove app-template from examples
> -
>
> Key: TS-2561
> URL: https://issues.apache.org/jira/browse/TS-2561
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cleanup
>Affects Versions: 5.0.0
>Reporter: Zhao Yongming
>Assignee: Zhao Yongming
> Fix For: 5.0.0
>
>
> due to the STANDALONE IOCORE is removed, the app-template example should not 
> be there. and most of the app-template & STANDALONE IOCORE design purpose is 
> able to satisfied with the protocol plugin.
> let us remove it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (TS-2561) remove app-template from examples

2014-02-08 Thread Zhao Yongming (JIRA)
Zhao Yongming created TS-2561:
-

 Summary: remove app-template from examples
 Key: TS-2561
 URL: https://issues.apache.org/jira/browse/TS-2561
 Project: Traffic Server
  Issue Type: Bug
  Components: Cleanup
Reporter: Zhao Yongming


Because the STANDALONE IOCORE has been removed, the app-template example should not 
be there any more, and most of what the app-template & STANDALONE IOCORE design was 
meant for can be satisfied with the protocol plugin.

Let us remove it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

