[ 
https://issues.apache.org/jira/browse/TS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheer Vinukonda updated TS-3979:
----------------------------------
    Affects Version/s: 6.0.0
        Fix Version/s: 7.0.0

> Remove allow empty doc cache setting.
> -------------------------------------
>
>                 Key: TS-3979
>                 URL: https://issues.apache.org/jira/browse/TS-3979
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 6.0.0
>            Reporter: Sudheer Vinukonda
>             Fix For: 7.0.0
>
>
> Refer TS-3978
> Following a discussion on the IRC, opening this jira to deprecate the Allow 
> empty doc cache setting. The consensus is to cache anything that's cacheable 
> and valid (regardless, of if it's empty or non-empty).
> {code}
> Amaryllis
> 7:50
> sudheerv: do any other reverse proxies refuse to cache empty documents 
> without content-length?  it's not something i've noticed before (in varnish, 
> for example)
> Amaryllis
> 7:50
> the connection could still be broken at any point in the response and result 
> in broken content being cached
> 7:51
> but as a compromise, what if there was another option to only cache empty 
> responses for 3xx, which is probably 80% of use cases?
> 7:52
> (maybe, in addition, it should accept transfer-encoding: chunked, since that 
> can indicate end-of-body)
> zwoop_
> 7:54
> Amaryllis  we should talk to amc (he knows everything, we don't need Google 
> here) about this
> 7:54
> zwoop_ is now known as woop
> 7:54
> woop is now known as zwoop
> 7:54
> zwoop left the room (quit: Changing host).
> 7:54
> zwoop [~zwoop@apache/committer/zwoop] entered the room.
> 7:54
> mode (+o zwoop) by ChanServ
> zwoop
> 7:55
> Amaryllis amc Maybe we could have a plugin API, that basically forces to set 
> f.allow_empty_doc ?
> amc
> 7:55
> I think you can change the CacheVC structure without problem.
> Amaryllis
> 7:55
> i think some sort of fix for this should be in core, at the moment TS won't 
> cache any of our redirects
> amc
> 7:56
> There's not a global config to allow that?
> zwoop
> 7:56
> Amaryllis  does it send a Cache-Control header with the redirects ?
> Amaryllis
> 7:56
> amc, zwoop: the origin sends transfer-encoding: chunked instead of 
> content-length.
> 7:56
> so even with cache-control it won't be cached, that's what this PR is to 
> change
> 7:56
> TS-3978
> ASFBot
> 7:56
> TS-3978: Allow empty document caching to follow normal logic - 
> https://issues.apache.org/jira/browse/TS-3978
> zwoop
> 7:56
> ah
> 7:57
> yeah, that's expected
> 7:57
> and is why we added that option to allow empty docs to be cached
> Amaryllis
> 7:57
> this is with that option enabled
> zwoop
> 7:57
> huh
> Amaryllis
> 7:57
> it still won't cache any document without content-length
> zwoop
> 7:57
> yeah, need CL: too
> 7:57
> that's a requirement
> Amaryllis
> 7:57
> so i'll update the PR to also recognise chunked responses as being cacheable 
> even if empty
> zwoop
> 7:57
> otherwise, it can't distinguish the cases you were worried about (a broken 
> connection) vs a truly empty doc
> Amaryllis
> 7:58
> zwoop: right, but the PR makes it optional, only if allow_empty_doc=2
> zwoop
> 7:58
> heh, your redirect is Chunked ??
> Amaryllis
> 7:58
> yes, an empty chunked response 
> zwoop
> 7:58
> is that even correct?
> Amaryllis
> 7:58
> it's perfectly valid, if somewhat unusual
> zwoop
> 7:58
> doesn't the chunking require the final "0" ?
> 7:58
> which means, the doc isn't empty
> Amaryllis
> 7:58
> i think ATS is looking at 'empty' after de-chucking, isn't it?
> 7:59
> because it's definitely not cacheing these responses
> zwoop
> 7:59
> not sure
> 7:59
> gancho_ left the room (quit: Ping timeout: 272 seconds).
> amc
> 7:59
> I think Amaryllis is correct.
> zwoop
> 7:59
> Amaryllis  but, I definitely remember that the CL: header was a requirement 
> that can not be ignored (safely), so not sure I think having an 
> allow_empty_doc=2 is ok
> 8:00
> unless allow_empty_doc=2 also means allow Chunked content to be empty
> amc
> 8:00
> I need to check to see what ATS actually caches - the chunked data or the 
> payload.
> zwoop
> 8:00
> but, you have to have some indicator telling us that the doc really is empty 
> before we can cache it
> 8:00
> amc is caches the unchunked data
> 8:00
> it dechunks it, and caches that
> 8:01
> is dechunk even a proper verb? 
> amc
> 8:01
> Ah, but it caches the encoding header.
> Amaryllis
> 8:01
> zwoop: https://dpaste.de/6kUT
> zwoop
> 8:01
> amc Hmmm, really ? That doesn't sound right
> amc
> 8:01
> "zwoop drank too much at the summit and dechunked himself in the hallway". 
> Yep, a proper verb.
> Amaryllis
> 8:01
> that's the actual origin response causing problems for us
> zwoop
> 8:01
> amc I'd expect it to change it from chunked to a Content-Length:
> amc
> 8:02
> I was wondering about that.
> zwoop
> 8:02
> amc lets dechunk big time at the Bar camp
> amc
> 8:02
> What happens when the content is served? Is the encoding preserved and obeyed?
> zwoop
> 8:03
> for the first client (the cache miss) I think it seems the chunked response
> Amaryllis
> 8:03
> yes, TS returns a correct thunked response
> 8:03
> and never caches it, so it's always the same
> zwoop
> 8:03
> but subsequent requests (when served out of cache) should have a CL:
> amc
> 8:03
> I was thinking of what happens for chunked responses from origin that are 
> non-zero lenght.
> zwoop
> 8:03
> Amaryllis  can you not convince your origin that they are crazy, and change 
> it to not send a Chunked empty response, and instead send CL: 0 ?
> sudheerv
> 8:04
> catching up on the messages…but, my concern is the same as zwoop's
> Amaryllis
> 8:04
> zwoop: no, it never adds a CL: https://dpaste.de/Goex
> zwoop
> 8:04
> Amaryllis  because youa re not caching it
> Amaryllis
> 8:04
> yes, exactly, because it's not cacheable 
> zwoop
> 8:04
> Amaryllis  amc is asking the general question, to which my answer is
> Amaryllis
> 8:04
> ah
> zwoop
> 8:04
> amc yes, I'd expect it to send the chunked response to the first client, and 
> subsequent responses (out of cache) with a CL: 0
> Amaryllis
> 8:04
> perhaps i can make uwsgi do some sort of output buffering and put a CL header 
> in
> zwoop
> 8:05
> amc that's at least how it's worked for the last ~10 years, we better not 
> ahve changed it
> Amaryllis
> 8:05
> zwoop: how can that ever happen?  a chunked empty response will never be 
> cached
> zwoop
> 8:05
> not talking empty responses
> 8:05
> general case
> amc
> 8:05
> I have many opinions on this. I do think it should be cacheable as an empty 
> doc if it's chunked with a zero length payload.
> Amaryllis
> 8:05
> okay, but it's not CL: 0 then
> zwoop
> 8:05
> your case is deranged
> 8:05
> Amaryllis  right, nothing to do with your case
> Amaryllis
> 8:05
> amc: shall i submit a separate PR to fix that?
> amc
> 8:06
> Yes.
> zwoop
> 8:06
> amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin 
> people over the head, cause they are crazy
> sudheerv
> 8:06
> amc: yes, but, you have to ensure that you received the final chunk
> 8:06
> if it already is the case, by the time you are checking for allow empty doc 
> caching, then that's fine
> amc
> 8:06
> Right. That is happening in the clips Amaryllis showed us.
> Amaryllis
> 8:06
> zwoop: i think it's because the origin sends the headers before it's 
> generated the response body.  so it doesn't know it's going to be empty
> sudheerv
> 8:06
> but, i dont know if that's the case already
> zwoop
> 8:07
> Amaryllis  it would know it's a redirect no?
> 8:07
> Amaryllis and why would a redirect have a body, ever ?
> Amaryllis
> 8:07
> yes, but redirects can still have bodies
> sudheerv
> 8:07
> you may get empy responses without content-length when we support outbound 
> spdy/h2 as well
> zwoop
> 8:07
> true
> Amaryllis
> 8:07
> most web servers include bodies in 3xx
> 8:07
> i did have the origin fix it in one specific case but we need something more 
> general
> zwoop
> 8:07
> yeah, I forgot about that, you're right
> 8:07
> niq left the room (quit: Ping timeout: 250 seconds).
> amc
> 8:08
> If it's chunked and ATS recieves the final payload (0 length frame) it can 
> reasonably presume it got valid (non-truncated) content.
> sudheerv
> 8:08
> Amaryllis: I think the patch should ensure it can cover non-broken empty body 
> cases for chunked-encoding/spdy/h2
> Amaryllis
> 8:08
> sudheerv: does this even apply to spdy/h2?
> Amaryllis
> 8:09
> actually, there is no H2 to origins at all...
> sudheerv
> 8:09
> not right now - we don't support outbound spdy/h2 yet
> Amaryllis
> 8:09
> right
> sudheerv
> 8:09
> but, when we do, it will be a similar problem there
> Amaryllis
> 8:09
> yes
> sudheerv
> 8:09
> there's no chunked encoding or content-len with spdy/h2
> Amaryllis
> 8:09
> but i can't really test code that's yet to be written 
> sudheerv
> 8:09
> so, yet another case of empty body 
> 8:09
> no, my point is - the code we write now must not break when we have spdy/h2
> 8:10
> so, instead of "assuming" that the body is valid, it might make sense
> 8:10
> to somehow indicate that it is
> Amaryllis
> 8:10
> well, i think it will break anyway, in the sense that empty H2 responses 
> won't be cached... since they have no CL?
> sudheerv
> 8:10
> well, but you are fixing that 
> Amaryllis
> 8:10
> but H2 doesn't do TE either, does it?  i thought it replaces that entire thing
> sudheerv
> 8:10
> if it's consistently broken (or not supported), that's alright
> 8:10
> yes, it doesn't
> 8:10
> that's my point
> Amaryllis
> 8:11
> all i'm adding (for now) is to make it accept CL _or_ TE
> sudheerv
> 8:11
> so, we need to have a generic way of notifying cache layer
> 8:11
> that the response is valid
> Amaryllis
> 8:11
> and test that empty TE is properly checked for the empty chunk
> amc
> 8:12
> Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)".
> sudheerv
> 8:12
> yeah, something like that
> amc
> 8:12
> Then an implementation in CacheVC.
> sudheerv
> 8:12
> amc: i also want to make this setting overridable
> amc
> 8:12
> Then the higher layer can say "I know this was a valid zero length content, 
> cache monkey"
> sudheerv
> 8:12
> (but, that's for later)
> 8:12
> yes, that makes sense
> Amaryllis
> 8:13
> amc: CacheVC already has allow_empty_doc flag, i think that's enough
> sudheerv
> 8:13
> Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem 
> very straightfwd
> amc
> 8:13
> Just bang on that directly?
> sudheerv
> 8:13
> but, pls go ahead and do it if you find it easy 
> Amaryllis
> 8:13
> amc: that was the plan, yes 
> 8:14
> amc: 
> https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477
> 8:14
> amc: ... that check just needs another check for chunked as well.
> sudheerv
> 8:15
> Amaryllis: if we did something like what amc suggested above
> amc
> 8:15
> No, I meant that it would be set from (for instance) the chunked decoder or 
> just above that.
> sudheerv
> 8:15
> where the higher layer can notify cache that it can be cached
> 8:15
> then we don't have to check any headers
> 8:15
> that already seems hacky enough
> amc
> 8:15
> So there wouldn't be an addiitional value for the config, it would just work 
> as epexted.
> 8:15
> expected.
> sudheerv
> 8:15
> yes
> jpeach
> 8:15
> there's already a function that checks whether the whole document was received
> amc
> 8:16
> Hmmm. Interesting.
> Amaryllis
> 8:16
> i think some things are getting mixed up here - i'm not going to add an 
> additional config value
> 8:16
> that's what the rejected PR did
> jpeach
> 8:16
> it is used when figuring what to do when getting an EOF
> amc
> 8:16
> What I would want is a way for the chunk decoder or its parent logic to pass 
> on to the CacheVC the validity of the document.
> 8:17
> Amaryllis-  I thought your PR added the value '2' to the config variable.
> jpeach
> 8:17
> afaict as long as you can know you have the entire doc, it could be cacheable
> sudheerv
> 8:17
> jpeach: that makes sense
> Amaryllis
> 8:17
> amc: yes, but people disliked that, so instead i suggested fixing it to check 
> for chucked encoding
> 8:17
> amc: which doesn't need an additional config setting
> sudheerv
> 8:17
> but, i guess the complication is that, at what ppint to do you know that you 
> received the whole doc
> amc
> 8:17
> OK, I must be reading the wrong PR.
> sudheerv
> 8:17
> is it before checking allow empty doc or after
> Amaryllis
> 8:18
> amc: there's only one PR, i didn't write the chunked version yet
> sudheerv
> 8:18
> if it's just a matter of changing the order to check for whole doc, without 
> breaking anything , then that seems like a cleaner solution
> amc
> 8:18
> Ah, OK.
> 8:19
> Yes, I agree. I think we're all on the same page - check for getting the 
> whole document and if it's zero length and the config value is 1, cache it.
> Amaryllis
> 8:19
> (we need something for 6.0 fairly quickly, so we'll be using this patch in 
> production, but i'm happy to put more effort into fixing it properly)
> sudheerv
> 8:19
> +1
> amc
> 8:19
> The brokeness is that complete chunked stuff isn't being checked correctly.
> 8:19
> Amaryllis - coool.
> Amaryllis
> 8:19
> yes - that's the only bit that actually needs to be fixed, but it might also 
> make sense to refactor how the check is done
> amc
> 8:20
> Allright.
> Amaryllis
> 8:20
> i'm not sure i know enough about the TS internals to actually implement the 
> second bit though
> amc
> 8:20
> I have to go get my Macbook fixed - update took out my wireless.
> 8:21
> I'll see if I have some time this weekend to take a look.
> sudheerv
> 8:21
> Amaryllis: if it's okay with everyone else, adding the check for TE header 
> just to fix the TE part alone for now is fine by me as well
> 8:21
> (we can open a separate jira to improve the code for future)
> 8:22
> (was just saying, that instead of adding case by case, it'd be nicer to have 
> a cleaner solution - but, by no means, it's immediately required)
> Amaryllis
> 8:23
> sudheerv: sure
> jpeach
> 8:23
> found it ... HttpSM::is_http_server_eos_truncation
> 8:23
> dustywusty_ is now known as dustywusty
> sudheerv
> 8:23
> jpeach: that might be a bit late?
> 8:24
> it seems to be after tunnel completes
> jpeach
> 8:24
> the place it is called from might be too late, but that's the logic for 
> knowing if we have the whole response
> sudheerv
> 8:24
> whereas, the tunnel is writing to UA and Cache in parallel
> zwoop
> 8:24
> sorry, had to reboot.
> sudheerv
> 8:24
> jpeach: right, but, the point is
> zwoop
> 8:24
> amc sudheerv  I'm thinking, maybe we should just deprecate this setting 
> completely ? And always allow caching empty docs when we safely can ?
> sudheerv
> 8:24
> you may get an EOS after some body is recieved already
> 8:25
> zwoop: +1 to that
> zwoop
> 8:25
> it was added for compatibility reasons eons ago, but we changed the default 
> to "1"  in 5.x I think
> sudheerv
> 8:25
> we just need to know when is it safe 
> jpeach
> 8:25
> sudheerv: I'm just saying that we have logic to know whether we have the 
> whole response; not that the logic is already used in all the right places
> zwoop
> 8:25
> we can deprecate it now (marking it as "don't change / use this unless you 
> really, really have to" and remove it for 7.0.0
> sudheerv
> 8:27
> +1
> 8:27
> reveller left the room.
> zwoop
> 8:28
> sudheerv  file a Jira on it maybe ? Or two. We should change the docs now 
> (marking it deprecated) and a bug for 7.0.0 for removal
> sudheerv
> 8:29
> sure..
> 8:30{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to