[ https://issues.apache.org/jira/browse/TS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sudheer Vinukonda updated TS-3979: ---------------------------------- Fix Version/s: (was: 7.0.0) 6.1.0 > Deprecate allow empty doc cache setting. > ---------------------------------------- > > Key: TS-3979 > URL: https://issues.apache.org/jira/browse/TS-3979 > Project: Traffic Server > Issue Type: Bug > Components: Cache > Affects Versions: 6.0.0 > Reporter: Sudheer Vinukonda > Fix For: 6.1.0 > > > Refer TS-3978, TS-621 > Following a discussion on the IRC, opening this jira to deprecate the Allow > empty doc cache setting. The consensus is to cache anything that's cacheable > and valid (regardless, of if it's empty or non-empty) - Just to clarify, it > is not clear there's a way to ensure that it's safe to cache anything right > now. TS-621 tried to cache empty docs in the past but ran into many > difficulties, so, opening TS-3980 to investigate how Http layer (or other > higher layers) may safely indicate the CacheVC that it's safe to cache. > {code} > Amaryllis > 7:50 > sudheerv: do any other reverse proxies refuse to cache empty documents > without content-length? it's not something i've noticed before (in varnish, > for example) > Amaryllis > 7:50 > the connection could still be broken at any point in the response and result > in broken content being cached > 7:51 > but as a compromise, what if there was another option to only cache empty > responses for 3xx, which is probably 80% of use cases? > 7:52 > (maybe, in addition, it should accept transfer-encoding: chunked, since that > can indicate end-of-body) > zwoop_ > 7:54 > Amaryllis we should talk to amc (he knows everything, we don't need Google > here) about this > 7:54 > zwoop_ is now known as woop > 7:54 > woop is now known as zwoop > 7:54 > zwoop left the room (quit: Changing host). > 7:54 > zwoop [~zwoop@apache/committer/zwoop] entered the room. > 7:54 > mode (+o zwoop) by ChanServ > zwoop > 7:55 > Amaryllis amc Maybe we could have a plugin API, that basically forces to set > f.allow_empty_doc ? > amc > 7:55 > I think you can change the CacheVC structure without problem. > Amaryllis > 7:55 > i think some sort of fix for this should be in core, at the moment TS won't > cache any of our redirects > amc > 7:56 > There's not a global config to allow that? > zwoop > 7:56 > Amaryllis does it send a Cache-Control header with the redirects ? > Amaryllis > 7:56 > amc, zwoop: the origin sends transfer-encoding: chunked instead of > content-length. > 7:56 > so even with cache-control it won't be cached, that's what this PR is to > change > 7:56 > TS-3978 > ASFBot > 7:56 > TS-3978: Allow empty document caching to follow normal logic - > https://issues.apache.org/jira/browse/TS-3978 > zwoop > 7:56 > ah > 7:57 > yeah, that's expected > 7:57 > and is why we added that option to allow empty docs to be cached > Amaryllis > 7:57 > this is with that option enabled > zwoop > 7:57 > huh > Amaryllis > 7:57 > it still won't cache any document without content-length > zwoop > 7:57 > yeah, need CL: too > 7:57 > that's a requirement > Amaryllis > 7:57 > so i'll update the PR to also recognise chunked responses as being cacheable > even if empty > zwoop > 7:57 > otherwise, it can't distinguish the cases you were worried about (a broken > connection) vs a truly empty doc > Amaryllis > 7:58 > zwoop: right, but the PR makes it optional, only if allow_empty_doc=2 > zwoop > 7:58 > heh, your redirect is Chunked ?? > Amaryllis > 7:58 > yes, an empty chunked response > zwoop > 7:58 > is that even correct? > Amaryllis > 7:58 > it's perfectly valid, if somewhat unusual > zwoop > 7:58 > doesn't the chunking require the final "0" ? > 7:58 > which means, the doc isn't empty > Amaryllis > 7:58 > i think ATS is looking at 'empty' after de-chucking, isn't it? > 7:59 > because it's definitely not cacheing these responses > zwoop > 7:59 > not sure > 7:59 > gancho_ left the room (quit: Ping timeout: 272 seconds). > amc > 7:59 > I think Amaryllis is correct. > zwoop > 7:59 > Amaryllis but, I definitely remember that the CL: header was a requirement > that can not be ignored (safely), so not sure I think having an > allow_empty_doc=2 is ok > 8:00 > unless allow_empty_doc=2 also means allow Chunked content to be empty > amc > 8:00 > I need to check to see what ATS actually caches - the chunked data or the > payload. > zwoop > 8:00 > but, you have to have some indicator telling us that the doc really is empty > before we can cache it > 8:00 > amc is caches the unchunked data > 8:00 > it dechunks it, and caches that > 8:01 > is dechunk even a proper verb? > amc > 8:01 > Ah, but it caches the encoding header. > Amaryllis > 8:01 > zwoop: https://dpaste.de/6kUT > zwoop > 8:01 > amc Hmmm, really ? That doesn't sound right > amc > 8:01 > "zwoop drank too much at the summit and dechunked himself in the hallway". > Yep, a proper verb. > Amaryllis > 8:01 > that's the actual origin response causing problems for us > zwoop > 8:01 > amc I'd expect it to change it from chunked to a Content-Length: > amc > 8:02 > I was wondering about that. > zwoop > 8:02 > amc lets dechunk big time at the Bar camp > amc > 8:02 > What happens when the content is served? Is the encoding preserved and obeyed? > zwoop > 8:03 > for the first client (the cache miss) I think it seems the chunked response > Amaryllis > 8:03 > yes, TS returns a correct thunked response > 8:03 > and never caches it, so it's always the same > zwoop > 8:03 > but subsequent requests (when served out of cache) should have a CL: > amc > 8:03 > I was thinking of what happens for chunked responses from origin that are > non-zero lenght. > zwoop > 8:03 > Amaryllis can you not convince your origin that they are crazy, and change > it to not send a Chunked empty response, and instead send CL: 0 ? > sudheerv > 8:04 > catching up on the messages…but, my concern is the same as zwoop's > Amaryllis > 8:04 > zwoop: no, it never adds a CL: https://dpaste.de/Goex > zwoop > 8:04 > Amaryllis because youa re not caching it > Amaryllis > 8:04 > yes, exactly, because it's not cacheable > zwoop > 8:04 > Amaryllis amc is asking the general question, to which my answer is > Amaryllis > 8:04 > ah > zwoop > 8:04 > amc yes, I'd expect it to send the chunked response to the first client, and > subsequent responses (out of cache) with a CL: 0 > Amaryllis > 8:04 > perhaps i can make uwsgi do some sort of output buffering and put a CL header > in > zwoop > 8:05 > amc that's at least how it's worked for the last ~10 years, we better not > ahve changed it > Amaryllis > 8:05 > zwoop: how can that ever happen? a chunked empty response will never be > cached > zwoop > 8:05 > not talking empty responses > 8:05 > general case > amc > 8:05 > I have many opinions on this. I do think it should be cacheable as an empty > doc if it's chunked with a zero length payload. > Amaryllis > 8:05 > okay, but it's not CL: 0 then > zwoop > 8:05 > your case is deranged > 8:05 > Amaryllis right, nothing to do with your case > Amaryllis > 8:05 > amc: shall i submit a separate PR to fix that? > amc > 8:06 > Yes. > zwoop > 8:06 > amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin > people over the head, cause they are crazy > sudheerv > 8:06 > amc: yes, but, you have to ensure that you received the final chunk > 8:06 > if it already is the case, by the time you are checking for allow empty doc > caching, then that's fine > amc > 8:06 > Right. That is happening in the clips Amaryllis showed us. > Amaryllis > 8:06 > zwoop: i think it's because the origin sends the headers before it's > generated the response body. so it doesn't know it's going to be empty > sudheerv > 8:06 > but, i dont know if that's the case already > zwoop > 8:07 > Amaryllis it would know it's a redirect no? > 8:07 > Amaryllis and why would a redirect have a body, ever ? > Amaryllis > 8:07 > yes, but redirects can still have bodies > sudheerv > 8:07 > you may get empy responses without content-length when we support outbound > spdy/h2 as well > zwoop > 8:07 > true > Amaryllis > 8:07 > most web servers include bodies in 3xx > 8:07 > i did have the origin fix it in one specific case but we need something more > general > zwoop > 8:07 > yeah, I forgot about that, you're right > 8:07 > niq left the room (quit: Ping timeout: 250 seconds). > amc > 8:08 > If it's chunked and ATS recieves the final payload (0 length frame) it can > reasonably presume it got valid (non-truncated) content. > sudheerv > 8:08 > Amaryllis: I think the patch should ensure it can cover non-broken empty body > cases for chunked-encoding/spdy/h2 > Amaryllis > 8:08 > sudheerv: does this even apply to spdy/h2? > Amaryllis > 8:09 > actually, there is no H2 to origins at all... > sudheerv > 8:09 > not right now - we don't support outbound spdy/h2 yet > Amaryllis > 8:09 > right > sudheerv > 8:09 > but, when we do, it will be a similar problem there > Amaryllis > 8:09 > yes > sudheerv > 8:09 > there's no chunked encoding or content-len with spdy/h2 > Amaryllis > 8:09 > but i can't really test code that's yet to be written > sudheerv > 8:09 > so, yet another case of empty body > 8:09 > no, my point is - the code we write now must not break when we have spdy/h2 > 8:10 > so, instead of "assuming" that the body is valid, it might make sense > 8:10 > to somehow indicate that it is > Amaryllis > 8:10 > well, i think it will break anyway, in the sense that empty H2 responses > won't be cached... since they have no CL? > sudheerv > 8:10 > well, but you are fixing that > Amaryllis > 8:10 > but H2 doesn't do TE either, does it? i thought it replaces that entire thing > sudheerv > 8:10 > if it's consistently broken (or not supported), that's alright > 8:10 > yes, it doesn't > 8:10 > that's my point > Amaryllis > 8:11 > all i'm adding (for now) is to make it accept CL _or_ TE > sudheerv > 8:11 > so, we need to have a generic way of notifying cache layer > 8:11 > that the response is valid > Amaryllis > 8:11 > and test that empty TE is properly checked for the empty chunk > amc > 8:12 > Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)". > sudheerv > 8:12 > yeah, something like that > amc > 8:12 > Then an implementation in CacheVC. > sudheerv > 8:12 > amc: i also want to make this setting overridable > amc > 8:12 > Then the higher layer can say "I know this was a valid zero length content, > cache monkey" > sudheerv > 8:12 > (but, that's for later) > 8:12 > yes, that makes sense > Amaryllis > 8:13 > amc: CacheVC already has allow_empty_doc flag, i think that's enough > sudheerv > 8:13 > Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem > very straightfwd > amc > 8:13 > Just bang on that directly? > sudheerv > 8:13 > but, pls go ahead and do it if you find it easy > Amaryllis > 8:13 > amc: that was the plan, yes > 8:14 > amc: > https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477 > 8:14 > amc: ... that check just needs another check for chunked as well. > sudheerv > 8:15 > Amaryllis: if we did something like what amc suggested above > amc > 8:15 > No, I meant that it would be set from (for instance) the chunked decoder or > just above that. > sudheerv > 8:15 > where the higher layer can notify cache that it can be cached > 8:15 > then we don't have to check any headers > 8:15 > that already seems hacky enough > amc > 8:15 > So there wouldn't be an addiitional value for the config, it would just work > as epexted. > 8:15 > expected. > sudheerv > 8:15 > yes > jpeach > 8:15 > there's already a function that checks whether the whole document was received > amc > 8:16 > Hmmm. Interesting. > Amaryllis > 8:16 > i think some things are getting mixed up here - i'm not going to add an > additional config value > 8:16 > that's what the rejected PR did > jpeach > 8:16 > it is used when figuring what to do when getting an EOF > amc > 8:16 > What I would want is a way for the chunk decoder or its parent logic to pass > on to the CacheVC the validity of the document. > 8:17 > Amaryllis- I thought your PR added the value '2' to the config variable. > jpeach > 8:17 > afaict as long as you can know you have the entire doc, it could be cacheable > sudheerv > 8:17 > jpeach: that makes sense > Amaryllis > 8:17 > amc: yes, but people disliked that, so instead i suggested fixing it to check > for chucked encoding > 8:17 > amc: which doesn't need an additional config setting > sudheerv > 8:17 > but, i guess the complication is that, at what ppint to do you know that you > received the whole doc > amc > 8:17 > OK, I must be reading the wrong PR. > sudheerv > 8:17 > is it before checking allow empty doc or after > Amaryllis > 8:18 > amc: there's only one PR, i didn't write the chunked version yet > sudheerv > 8:18 > if it's just a matter of changing the order to check for whole doc, without > breaking anything , then that seems like a cleaner solution > amc > 8:18 > Ah, OK. > 8:19 > Yes, I agree. I think we're all on the same page - check for getting the > whole document and if it's zero length and the config value is 1, cache it. > Amaryllis > 8:19 > (we need something for 6.0 fairly quickly, so we'll be using this patch in > production, but i'm happy to put more effort into fixing it properly) > sudheerv > 8:19 > +1 > amc > 8:19 > The brokeness is that complete chunked stuff isn't being checked correctly. > 8:19 > Amaryllis - coool. > Amaryllis > 8:19 > yes - that's the only bit that actually needs to be fixed, but it might also > make sense to refactor how the check is done > amc > 8:20 > Allright. > Amaryllis > 8:20 > i'm not sure i know enough about the TS internals to actually implement the > second bit though > amc > 8:20 > I have to go get my Macbook fixed - update took out my wireless. > 8:21 > I'll see if I have some time this weekend to take a look. > sudheerv > 8:21 > Amaryllis: if it's okay with everyone else, adding the check for TE header > just to fix the TE part alone for now is fine by me as well > 8:21 > (we can open a separate jira to improve the code for future) > 8:22 > (was just saying, that instead of adding case by case, it'd be nicer to have > a cleaner solution - but, by no means, it's immediately required) > Amaryllis > 8:23 > sudheerv: sure > jpeach > 8:23 > found it ... HttpSM::is_http_server_eos_truncation > 8:23 > dustywusty_ is now known as dustywusty > sudheerv > 8:23 > jpeach: that might be a bit late? > 8:24 > it seems to be after tunnel completes > jpeach > 8:24 > the place it is called from might be too late, but that's the logic for > knowing if we have the whole response > sudheerv > 8:24 > whereas, the tunnel is writing to UA and Cache in parallel > zwoop > 8:24 > sorry, had to reboot. > sudheerv > 8:24 > jpeach: right, but, the point is > zwoop > 8:24 > amc sudheerv I'm thinking, maybe we should just deprecate this setting > completely ? And always allow caching empty docs when we safely can ? > sudheerv > 8:24 > you may get an EOS after some body is recieved already > 8:25 > zwoop: +1 to that > zwoop > 8:25 > it was added for compatibility reasons eons ago, but we changed the default > to "1" in 5.x I think > sudheerv > 8:25 > we just need to know when is it safe > jpeach > 8:25 > sudheerv: I'm just saying that we have logic to know whether we have the > whole response; not that the logic is already used in all the right places > zwoop > 8:25 > we can deprecate it now (marking it as "don't change / use this unless you > really, really have to" and remove it for 7.0.0 > sudheerv > 8:27 > +1 > 8:27 > reveller left the room. > zwoop > 8:28 > sudheerv file a Jira on it maybe ? Or two. We should change the docs now > (marking it deprecated) and a bug for 7.0.0 for removal > sudheerv > 8:29 > sure.. > 8:30{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)