[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency

Zhao Yongming (JIRA) Sat, 21 Feb 2015 17:36:23 -0800

    [ 
https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331983#comment-14331983
 ]


Zhao Yongming commented on TS-3395:
-----------------------------------

I really don't know what you want. Do you want to stress-test what ATS can do, 
or do you want ATS to do what Nginx/Squid would do? In both issues, I have 
pointed out the ATS way of dealing with your problems, and even guided you step 
by step towards the root cause and how we deal with it in ATS. You are now 
using ATS, which is very different from Squid etc.; it is powerful and designed 
in some strange ways. If you are a new user, finding out the ATS way is a good 
start, as it turns out that ATS performs well in most real-world cases.

On the testing issue, please refer to jtest (tools/jtest/) if you are not 
already aware of it; it is another good stress tool, suitable for stressing a 
performance monster like ATS.

Anyway, welcome to the ATS Colosseum.



> Hit ratio drops with high concurrency
> -------------------------------------
>
>                 Key: TS-3395
>                 URL: https://issues.apache.org/jira/browse/TS-3395
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Luca Bruno
>             Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than 
> 300 simultaneous http connections.
> The cache is on a raw disk of 500gb and it's not filled, so no eviction. The 
> ram cache is disabled.
> The test is done with web-polygraph. Content sizes vary uniformly from 5kb to 
> 20kb, expected hit ratio 60%, 2000 http connections, documents expire 
> after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem with polygraph. I wrote my own 
> client/server test code; it also works fine with squid, varnish and nginx. I 
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 1000000 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average. Then I thought that could be a problem 
> of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit 
> ratio, but request rate is very slow compared to ATS for obvious reasons.
> Then I wanted to check whether the hit ratio also dropped with 200 
> connections and a longer test time, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So not a problem of my tests I guess.
> Then I realized by debugging the test server that the same url was being 
> requested more than once.
> Out of 1000000 requests, 78600 urls were requested at least twice. One url 
> was even requested 9 times. These repeated requests are not close to each 
> other: more than 30sec can pass between two requests for the same url.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.000000
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> I also tweaked the read-while-writer option, and still got similar results.
> Then I've enabled 1GB of ram, it is slightly better at the beginning, but 
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the 
> url from the origin server, could it be a problem of concurrent write access 
> to the cache, so that some pages are not cached at all? The traffic_top fresh 
> percentage also makes me think it can be a problem in writing to the cache.
> Not sure if I explained the problem correctly; ask me for further information 
> if needed. But in summary: hit ratio drops with a high number of connections, 
> and the problem seems related to pages that are not written to the cache.
> This is a related issue: 
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html
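
The two checks described in the quoted report (registering a hit when cR or cH 
appears in the response headers, and counting urls that reach the test origin 
server more than once) could be sketched roughly as below. This is only an 
illustration, not the reporter's actual test code: the function names are 
hypothetical, looking only at the Via header is an assumption (the report just 
says "in the headers"), and the sample Via values are made up for the example.

```python
from collections import Counter


def is_cache_hit(headers):
    """Return True if the Via header carries a cR or cH marker.

    Assumption: the marker is looked for in the Via header only;
    the report says only that it appears "in the headers".
    """
    via = headers.get("Via", "")
    return "cR" in via or "cH" in via


def measured_hit_ratio(responses):
    """Fraction of responses classified as hits (0.0 for an empty run)."""
    if not responses:
        return 0.0
    return sum(is_cache_hit(h) for h in responses) / len(responses)


def repeated_urls(origin_log):
    """Map every url the origin saw more than once to its request count."""
    counts = Counter(origin_log)
    return {url: n for url, n in counts.items() if n > 1}


# Illustrative header sets; the Via strings are invented sample values.
hit = {"Via": "http/1.1 proxy.example [uScHs f p eN:t cCHi p s ]"}
miss = {"Via": "http/1.1 proxy.example [uScMsSf pSeN:t cCMi p sS]"}
```

If every document is supposed to be fetched from the origin only once, any 
entry returned by `repeated_urls` marks an object that was requested again 
even though it should already have been in the cache.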



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
