[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329648#comment-14329648 ]
Luca Bruno edited comment on TS-3395 at 2/20/15 10:15 PM:
----------------------------------------------------------

As stated in my earlier comments, I already tried the open_read settings: max 4 retries and a retry time of 500.

was (Author: lethalman):
As stated in my earlier comments, I already tried setting open_read settings, max 4 retries and 500 time.

> Hit ratio drops with high concurrency
> -------------------------------------
>
>                 Key: TS-3395
>                 URL: https://issues.apache.org/jira/browse/TS-3395
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Luca Bruno
>             Fix For: 5.3.0
>
>
> I'm doing some tests and I've noticed that the hit ratio drops with more than
> 300 simultaneous HTTP connections.
> The cache is on a raw disk of 500GB and it's not filled, so there is no
> eviction. The RAM cache is disabled.
> The test is done with web-polygraph. Content sizes vary uniformly from 5kb to
> 20kb, expected hit ratio 60%, 2000 HTTP connections, documents expire after
> months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem with polygraph. I wrote my own
> client/server test code; it works fine with squid, varnish and nginx as well.
> I register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 1000000 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
> So similar results, 37.20% on average.
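For readers who want to reproduce the hit counting, here is a minimal sketch in Go of the "cR or cH" check described above. It is not the reporter's actual test client. It assumes that the codes are read from the bracketed section of the `Via` response header and that the sample header values below resemble what the proxy emits; the function names `viaCodes` and `isCacheHit` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// viaCodes returns the text inside the last [...] section of a Via header
// value, or "" when no bracketed section is present.
// Assumption: the proxy puts its cache status codes in that section.
func viaCodes(via string) string {
	start := strings.LastIndex(via, "[")
	end := strings.LastIndex(via, "]")
	if start < 0 || end < start {
		return ""
	}
	return via[start+1 : end]
}

// isCacheHit reports whether the bracketed codes contain cH or cR,
// the two codes the report counts as a hit.
func isCacheHit(via string) bool {
	codes := viaCodes(via)
	return strings.Contains(codes, "cH") || strings.Contains(codes, "cR")
}

func main() {
	// Hypothetical sample header values, for illustration only.
	samples := []string{
		"http/1.1 proxy.example (ApacheTrafficServer/5.2.0 [cHs f ])",
		"http/1.1 proxy.example (ApacheTrafficServer/5.2.0 [cMsSf ])",
		"http/1.1 proxy.example (ApacheTrafficServer/5.2.0 [cRs f ])",
	}
	hits := 0
	for _, via := range samples {
		if isCacheHit(via) {
			hits++
		}
	}
	fmt.Printf("hits: %d of %d responses\n", hits, len(samples))
}
```

Restricting the match to the bracketed section avoids counting a host name that happens to contain "cH" or "cR" as a hit.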
> Then I thought that it could be a problem with how I'm testing, and tried
> the nginx cache. It achieves a 60% hit ratio, but the request rate is very
> slow compared to ATS, for obvious reasons.
> Then I wanted to check whether the hit ratio also dropped with 200
> connections and a longer test time, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So it's not a problem with my tests, I guess.
> Then I realized by debugging the test server that the same URL was requested
> twice.
> Out of 1000000 requests, 78600 URLs were requested at least twice. One URL was
> even requested 9 times. These repeated requests are not close to each other:
> more than 30 seconds can pass between two requests for the same URL.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.000000
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
> And this is the result with polygraph, similar results:
> !http://i.imgur.com/YgOndhY.png!
> I tweaked the read-while-writer option and still got similar results.
> Then I enabled 1GB of RAM cache; it is slightly better at the beginning, but
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% ram hit, 37% fresh, 63% cold.
> So given that it doesn't seem to be a concurrency problem when requesting the
> URL from the origin server, could it be a problem of concurrent write access
> to the cache? So that some pages are not cached at all? The traffic_top fresh
> percentage also makes me think it can be a problem in writing the cache.
> Not sure if I explained the problem correctly; ask me for further information
> if needed. But in summary: hit ratio drops with a high number of connections,
> and the problem seems related to pages that are not written to the cache.
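The duplicate-request check on the test origin server can be sketched as follows. This is a hedged illustration in Go, not the reporter's code. The type `requestCounter` and its methods are hypothetical names. The idea is that, with long-lived documents and no eviction, each URL should reach the origin only once, so any URL seen two or more times points to an object that was not served from the cache.

```go
package main

import (
	"fmt"
	"sync"
)

// requestCounter counts how many times each URL reaches the origin.
// It is safe for use from concurrent request handlers.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[string]int)}
}

// record notes one origin request for the given URL.
func (c *requestCounter) record(url string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[url]++
}

// duplicates returns the number of distinct URLs requested at least twice
// and the highest request count seen for any single URL.
func (c *requestCounter) duplicates() (repeated, maxCount int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, n := range c.counts {
		if n >= 2 {
			repeated++
		}
		if n > maxCount {
			maxCount = n
		}
	}
	return repeated, maxCount
}

func main() {
	c := newRequestCounter()
	// Made-up request sequence, for illustration only.
	for _, u := range []string{"/a", "/b", "/a", "/c", "/a", "/b"} {
		c.record(u)
	}
	repeated, maxCount := c.duplicates()
	fmt.Printf("URLs requested at least twice: %d, max requests for one URL: %d\n",
		repeated, maxCount)
}
```

In a real origin handler, `record` would be called once per incoming request with the request URL, and `duplicates` would be reported at the end of the run.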
> This is a related issue:
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this:
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)