Re: Overhead in 404 handling
Why do we need special casing outside of the error handler? Couldn't the error handler do the same?

Regards
Carsten

On 08.09.2023 18:13, Jörg Hoh wrote:
> For these reasons I am thinking about creating an opt-in short-cut in the request error handling, which prevents the default error handling from being started if the user agent does not request an HTML resource; instead, just the status code plus a short status message in the response body is sent. For HTML requests, the regular error handling still runs.

--
Carsten Ziegeler
Adobe
cziege...@apache.org
Re: Overhead in 404 handling
Hi Jörg,

I'm curious: is your overload use case primarily about requests sent by anonymous users?

I have been handling requests from anonymous users for "sling/nonexisting" resources by using a custom filter that forces a redirect to the login page instead of sending back a 404. In other words, an anonymous user doesn't learn that the resource does not exist until they sign in. Signed-in users get the fully rendered (and branded) 404 UI.

I figured that if a logged-in user is abusing the 404 pages, I know who they are from the logs and can rate-limit or disable that account until they stop. But maybe you have some other scenarios?

Regards,
Eric

On Fri, Sep 8, 2023 at 9:13 AM Jörg Hoh wrote:
> * In such a situation, a 404 often takes as long to handle as rendering a proper HTML result. So generating a lot of 404s is an easy way to overload the system.
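The policy Eric describes can be reduced to a small decision that a request filter would make. The sketch below is plain Java and hypothetical, not Sling API: the class name AnonymousNotFoundPolicy, the login path /login.html, and treating a missing user ID or the ID "anonymous" as an anonymous session are all assumptions. The filter wiring itself (checking whether the resolved resource is a non-existing one and sending the redirect) is not shown.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not Sling API) of the policy described above:
 * anonymous users asking for a non-existing resource are sent to the
 * login page; signed-in users get the regular, fully rendered 404.
 */
final class AnonymousNotFoundPolicy {

    /** Assumed login page path; a real setup would use its own login form. */
    static final String LOGIN_PATH = "/login.html";

    private AnonymousNotFoundPolicy() {}

    /** Assumption: anonymous sessions have no user ID or the ID "anonymous". */
    static boolean isAnonymous(String userId) {
        return userId == null || userId.equals("anonymous");
    }

    /**
     * Returns the redirect target if the request should be redirected, or an
     * empty Optional if it should continue normally (the resource exists, or
     * a signed-in user gets the branded 404 page).
     */
    static Optional<String> redirectTarget(boolean resourceExists, String userId) {
        if (!resourceExists && isAnonymous(userId)) {
            return Optional.of(LOGIN_PATH); // hide the 404 from anonymous users
        }
        return Optional.empty(); // normal processing, including the full 404 page
    }
}
```

Signed-in users fall through to the normal 404 handling, which matches the reasoning that abuse by a known account can be traced in the logs and rate-limited.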
Overhead in 404 handling
Hi,

The handling of 404s in Sling can be quite resource-intensive, especially if a custom error handler is provided that renders a full-blown page. This can lead to a situation where handling a 404 is as complex and resource-intensive as rendering a normal resource. This has two consequences:

* In such a situation, a 404 often takes as long to handle as rendering a proper HTML result. So generating a lot of 404s is an easy way to overload the system.
* The user agent (browser) often does not look at the body of a 404 response, especially if the requested content type is not HTML. For example, if the browser requests a JS file but gets a 404 status code, it ignores the (costly rendered) 404 page. In this case I think an empty response body has the same effect.

For these reasons I am thinking about creating an opt-in short-cut in the request error handling, which prevents the default error handling from being started if the user agent does not request an HTML resource; instead, just the status code plus a short status message in the response body is sent. For HTML requests, the regular error handling still runs.

In a first POC this would all be hardcoded, but even in a releasable version I don't think it makes sense to distinguish between more than "HTML requests" and "everything else". The response body can also be hardcoded in this "short-cut" version.

WDYT?

--
Cheers,
Jörg Hoh

https://cqdump.joerghoh.de
Twitter: @joerghoh
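The proposed short-cut boils down to one decision made before the error handler is invoked. The following is a minimal sketch of that decision in plain Java, not Sling API: the names NotFoundShortCut, decide and SHORT_BODY are hypothetical, and it assumes an "HTML request" is recognized by a .html/.htm extension on the last path segment, falling back to the Accept header when there is no extension. The mail does not say how that distinction would actually be made.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Sling API) of the opt-in 404 short-cut:
 * decide whether a 404 goes through the full error handler or is
 * answered with a short, hardcoded body.
 */
final class NotFoundShortCut {

    /** What to do with a 404 for a given request. */
    enum Action { FULL_ERROR_HANDLER, SHORT_RESPONSE }

    /** Hardcoded body for non-HTML requests (assumption: plain text). */
    static final String SHORT_BODY = "404 Not Found";

    private NotFoundShortCut() {}

    /**
     * Assumed heuristic: a request is an "HTML request" if the last path
     * segment ends in .html or .htm; if it has no extension, check whether
     * the Accept header mentions text/html.
     */
    static boolean isHtmlRequest(String path, String acceptHeader) {
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        int dot = lastSegment.lastIndexOf('.');
        if (dot >= 0) {
            String ext = lastSegment.substring(dot + 1).toLowerCase(Locale.ROOT);
            return ext.equals("html") || ext.equals("htm");
        }
        return acceptHeader != null
                && acceptHeader.toLowerCase(Locale.ROOT).contains("text/html");
    }

    /** Only two classes of requests: "HTML requests" and "everything else". */
    static Action decide(String path, String acceptHeader, boolean shortCutEnabled) {
        if (!shortCutEnabled || isHtmlRequest(path, acceptHeader)) {
            return Action.FULL_ERROR_HANDLER; // regular, possibly expensive, error handling
        }
        return Action.SHORT_RESPONSE; // send the status code plus SHORT_BODY, no rendering
    }
}
```

With the short-cut enabled, a 404 for /etc/clientlibs/site/missing.js yields SHORT_RESPONSE, while /content/site/missing-page.html still goes to the full error handler; with the opt-in disabled, everything takes the regular path. Sling URLs can carry selectors and suffixes (e.g. /content/page.html/suffix), so a real implementation would more plausibly use the extension Sling has already resolved for the request (RequestPathInfo) instead of re-parsing the path.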