Re: Overhead in 404 handling

2023-09-09 Thread Carsten Ziegeler
Why do we need special casing outside of the error handler? Couldn't the
error handler do the same?


Regards
Carsten

--
Carsten Ziegeler
Adobe
cziege...@apache.org


Re: Overhead in 404 handling

2023-09-08 Thread Eric Norman
Hi Jörg,

I'm curious if your overload use case would be primarily for requests sent
by anonymous users?

I have been handling anonymous "sling/nonexisting" resources by using a
custom filter to force a redirect to the login page instead of sending back
a 404. In other words, an anonymous user doesn't get to know that the
resource does not exist until they sign in. Signed-in users get the fully
rendered (and branded) 404 UI. I figured that if a logged-in user is
abusing the 404 pages, then I know who they are from the logs and can rate
limit or disable that account until they stop doing that.
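
The decision such a filter makes could be sketched roughly as follows. This
is only an illustration of the logic described above, not the actual filter:
the class name, the enum and the two boolean inputs are hypothetical, and
the wiring into a real Sling filter (detecting the non-existing resource,
checking the session, sending the redirect) is left out.

```java
// Hypothetical sketch: how a filter might choose between redirecting an
// anonymous user to the login page and rendering the branded 404 page.
final class AnonymousNotFoundPolicy {

    enum Action { CONTINUE, REDIRECT_TO_LOGIN, RENDER_BRANDED_404 }

    private AnonymousNotFoundPolicy() {}

    /**
     * @param resourceExists false if the request resolved to a non-existing resource
     * @param anonymous      true if the request is not authenticated
     */
    static Action decide(boolean resourceExists, boolean anonymous) {
        if (resourceExists) {
            // Nothing special to do, normal request processing continues.
            return Action.CONTINUE;
        }
        // Anonymous users never see the expensive 404 page; they are sent
        // to the login page first. Signed-in users get the full 404 UI.
        return anonymous ? Action.REDIRECT_TO_LOGIN : Action.RENDER_BRANDED_404;
    }
}
```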

But maybe you have some other scenarios?

Regards,
Eric



Overhead in 404 handling

2023-09-08 Thread Jörg Hoh
Hi,

Handling 404s in Sling can be quite resource-intensive, especially if a
custom error handler is provided that renders a full-blown page.

This can lead to a situation where the 404 handling is as complex and
resource-intensive as handling a normal resource. This has two
consequences:

* In such a situation, handling a 404 often takes as long as rendering a
proper HTML result, so generating a lot of 404s is an easy way to overload
the system.
* The user agent (browser) often does not look at the body of a 404
response, especially if the requested content type is not HTML. For
example, if the browser requests a JS file but gets a 404 status code, it
ignores the (expensively rendered) 404 page. In this case I think that an
empty response body has the same effect.

For these reasons I am thinking about creating an opt-in short-cut in the
request error handling, which would prevent the default error handling from
being started if the user agent does not request an HTML resource; instead,
just the status code plus a short status message in the response body would
be sent. For HTML requests, the regular error handling would still be
executed.

In a first POC this would all be hardcoded, but even in a releasable
version I don't think it makes sense to distinguish between more than "HTML
requests" and "everything else". The response body can also be hardcoded in
this "short-cut" version.
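
A minimal sketch of the decision behind such a short-cut, under a few
assumptions that are not settled by the proposal: an "HTML request" is
detected via the Accept header (text/html or application/xhtml+xml), a
wildcard such as */* counts as "everything else", and only status 404 is
short-cut. The class and method names are hypothetical, and the code is
deliberately independent of the Sling and servlet APIs.

```java
import java.util.Locale;

// Hypothetical sketch of the opt-in short-cut decision; not Sling API.
final class NotFoundShortcut {

    /** Hardcoded body for the short-cut response (assumed wording). */
    static final String SHORT_BODY = "404 Not Found";

    private NotFoundShortcut() {}

    /** Assumption: a request is an "HTML request" if its Accept header lists an HTML media type. */
    static boolean isHtmlRequest(String acceptHeader) {
        if (acceptHeader == null) {
            return false;
        }
        for (String part : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.9" and normalize case.
            String mediaType = part.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml")) {
                return true;
            }
        }
        return false;
    }

    /** True if the regular (expensive) error handling should be skipped. */
    static boolean useShortcut(boolean optInEnabled, int status, String acceptHeader) {
        return optInEnabled && status == 404 && !isHtmlRequest(acceptHeader);
    }
}
```

With this, a browser navigation (Accept: text/html,...) would still get the
regular error page, while a script or image request (typically Accept: */*
or a specific non-HTML type) would only get the status code and SHORT_BODY.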

WDYT?

-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh