Re: Fixing QNetworkAccessManager use

2020-02-20 Thread Ben Cooksley
On Thu, Feb 20, 2020 at 2:09 AM Friedrich W. H. Kossebau wrote:
>
> On Wednesday, 19 February 2020, 08:05:01 CET, Ben Cooksley wrote:
> > On Mon, Feb 3, 2020 at 7:42 AM Volker Krause wrote:
> > > It would also help to know where specifically we have that problem, so we
> > > can actually solve it, and so we can figure out why we failed to fix this
> > > there earlier.
> >
> > Just bringing this up again - it seems we've not had much movement on
> > this aside from the Wiki page.
>
> The wiki page currently still just recommends setting
> "networkAccessManager->setAttribute(QNetworkRequest::FollowRedirectsAttribute,
> true);"
>
> That seems simple, but is possibly not enough in all cases.
>
> So my open questions here to be able to act on code I contribute to are:
>
> a) What about the mentioned QNetworkRequest::NoLessSafeRedirectPolicy, in
> which cases should that be used and when not?

For interacting with download.kde.org / files.kde.org, I would advise
against using this policy, as they will in virtually all instances
redirect to mirrors (which don't support HTTPS and are HTTP only). The
policy refuses redirects from HTTPS to HTTP, so those downloads would
fail.
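
To make that concrete, here is a minimal sketch of the check a "no less
safe" policy performs. It is a plain C++ model, not Qt's implementation:
schemeOf() and allowedByNoLessSafe() are hypothetical helpers, and
mirror.example.org is a made-up mirror host.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text before "://", or "" if there is none.
// Assumes the scheme is already lowercase; real Qt code would use QUrl::scheme().
std::string schemeOf(const std::string &url)
{
    const std::string::size_type pos = url.find("://");
    return pos == std::string::npos ? std::string() : url.substr(0, pos);
}

// Models a "no less safe" rule: http -> http, http -> https and
// https -> https are accepted; anything else (notably https -> http) is not.
bool allowedByNoLessSafe(const std::string &from, const std::string &to)
{
    const std::string f = schemeOf(from);
    const std::string t = schemeOf(to);
    if (t == "https")
        return f == "http" || f == "https";
    return f == "http" && t == "http";
}

// Usage examples; mirror.example.org is a made-up mirror host.
void noLessSafeExamples()
{
    // Upgrading to HTTPS is always fine.
    assert(allowedByNoLessSafe("http://download.kde.org/f", "https://download.kde.org/f"));
    // An HTTPS request bounced to an HTTP-only mirror is refused.
    assert(!allowedByNoLessSafe("https://download.kde.org/f", "http://mirror.example.org/f"));
}
```

In Qt itself the policy is selected with
QNetworkAccessManager::setRedirectPolicy() or per request through
QNetworkRequest::RedirectPolicyAttribute.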

>
> b) What about the HSTS stuff, when is that recommended?

Yes, that should be enabled.
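
As a rough illustration of what enabling HSTS buys the client, the
sketch below models the client-side store only: once a host is known to
require HTTPS, plain HTTP URLs for it are rewritten before any request
is sent. HstsStore is a hypothetical type, www.example.org is a made-up
host, and expiry, ports and includeSubDomains handling are left out. In
Qt the real feature is switched on with
QNetworkAccessManager::setStrictTransportSecurityEnabled(true).

```cpp
#include <cassert>
#include <set>
#include <string>

// Minimal model of a client-side HSTS store: the set of hosts that have
// announced a Strict-Transport-Security policy. Expiry, ports and
// includeSubDomains are deliberately ignored in this sketch.
struct HstsStore {
    std::set<std::string> knownHosts;

    // Rewrites "http://host/..." to "https://host/..." when the host is
    // known; returns the URL unchanged otherwise.
    std::string upgrade(const std::string &url) const
    {
        const std::string prefix = "http://";
        if (url.compare(0, prefix.size(), prefix) != 0)
            return url; // not plain HTTP: nothing to upgrade
        const std::string::size_type hostEnd = url.find('/', prefix.size());
        const std::string host = (hostEnd == std::string::npos)
            ? url.substr(prefix.size())
            : url.substr(prefix.size(), hostEnd - prefix.size());
        if (knownHosts.count(host) == 0)
            return url; // host never announced an HSTS policy
        return "https://" + url.substr(prefix.size());
    }
};

// Usage example with a made-up host.
void hstsExample()
{
    HstsStore store;
    store.knownHosts.insert("www.example.org");
    assert(store.upgrade("http://www.example.org/a.xml") == "https://www.example.org/a.xml");
    assert(store.upgrade("http://other.example.org/a.xml") == "http://other.example.org/a.xml");
}
```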

>
> c) What is a sane number for QNetworkRequest::maximumRedirectsAllowed?

5 to 10 redirects is a relatively sane number I would expect. At the
most I would expect our servers to issue a maximum of 3 redirects in a
given chain of URLs.
If it is longer than that then we are doing something wrong.
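
A small sketch of how such a cap behaves, assuming the redirects are
given as a simple lookup table. followRedirects() is a hypothetical
function, not a Qt API; in Qt the cap is set per request with
QNetworkRequest::setMaximumRedirectsAllowed().

```cpp
#include <cassert>
#include <map>
#include <string>

// Follows redirects from `url` using `redirects` (source URL -> target URL).
// On success stores the last URL in `finalUrl` and returns true.
// Returns false if more than `maxRedirects` hops would be needed, which
// also stops redirect loops.
bool followRedirects(const std::map<std::string, std::string> &redirects,
                     std::string url, int maxRedirects, std::string &finalUrl)
{
    assert(maxRedirects >= 0);
    for (int hops = 0; ; ++hops) {
        const auto it = redirects.find(url);
        if (it == redirects.end()) {
            finalUrl = url;   // no further redirect: this is the resource
            return true;
        }
        if (hops == maxRedirects)
            return false;     // one more hop would exceed the cap
        url = it->second;
    }
}
```

With a chain of three redirects, a cap of 5 or 10 succeeds and a cap of
2 fails, which matches the reasoning above: the cap only needs to be
comfortably larger than the longest chain a well-behaved server issues.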

>
> Both in general and when it comes to KDE servers.
>
> Personally I am still unsure what the actual issue is. Why are redirects
> needed at all. Why all the address changes all the time? The "U" in
> "URL"/"URI" is for "uniform", not "unstable", isn't it ;)

Please see my other email regarding this.

>
> Can you give some examples for URLs of resources our code uses on KDE servers,
> and why they needed to change?

Get Hot New Stuff functionality (Gen 1), originally using a static
file tree under http://download.kde.org/khotnewstuff/
This needed to change for two reasons:
1) Mandatory HTTPS
2) The benefit of having these files mirrored was negligible given their
extremely small size and declining client base (KDE 3 and parts of KDE
4), and supporting the mirroring process put more load on our systems
than the mirroring saved. We therefore transitioned to serving these
through a CDN.

Get Hot New Stuff functionality (Gen 2), originally using a dynamic
web service at http://newstuff.kde.org/ and http://data.kstuff.org/
needed to change for two reasons:
1) Mandatory HTTPS
2) The dynamic web service had not been updated in several years, and
was dependent on a very specific system setup we hadn't been able to
replicate and needed to decommission due to its age. We therefore
needed to convert it to static files, and arrange for those to be
hosted elsewhere in our systems. newstuff.kde.org now converts the
requests sent to it into redirects to specific static files to keep
applications using it working (which includes KF5 era applications
that still actively use this and in at least one case continue to be
released using this).

Get Hot New Stuff functionality (Gen 3), originally used a file at
http://download.kde.org/ocs/providers.xml (now at
https://autoconfig.kde.org/ocs/providers.xml)
This needed to change for two reasons:
1) Mandatory HTTPS
2) It was necessary for non-sysadmins (particularly those involved in
running store.kde.org) to be able to update the file directly. As the
server hosting download.kde.org is sensitive and doesn't support
deploying changes from Git when they are committed, we had to move the
file to a different subdomain which could support this.

Marble maps, originally hosted under http://download.kde.org/ and
later at http://files.kde.org/marble/maps/ and now at
https://maps.kde.org/
These needed to be moved for a couple of reasons:
1) When we transitioned download.kde.org to be a mirror redirector, it
was no longer possible for us to easily host non-mirrored resources
under the same domain (and the maps weren't mirrored), requiring they
be moved to files.kde.org (which as an added benefit also made it
possible for developers to update the maps themselves)
2) Later, it was discovered that Marble's performance when loading maps
from files.kde.org was quite poor once that also became a mirror
redirector, due to the large number of HTTP requests involved. We
therefore shifted the maps to a CDN-based resource, known as
maps.kde.org, which eliminated these performance issues.

KStars resources, originally hosted under
http://download.kde.org/apps/kstars/ needed to be moved to
https://files.kde.org/ for the following reasons:
1) Mandatory HTTPS
2) To allow developers to freely update them as needed, something
which isn't possible on download.kde.org (which is restricted due to
it hosting the master copies of tarballs)

There have also been two instances where we have been 

Re: Fixing QNetworkAccessManager use

2020-02-19 Thread Ben Cooksley
On Thu, Feb 20, 2020 at 9:58 AM Friedrich W. H. Kossebau wrote:
>
> On Wednesday, 19 February 2020, 21:01:20 CET, Johan Ouwerkerk wrote:
> > On Wed, Feb 19, 2020 at 2:09 PM Friedrich W. H. Kossebau wrote:
> > > Personally I am still unsure what the actual issue is. Why are redirects
> > > needed at all. Why all the address changes all the time?
> >
> > It is part of the HTTP spec for servers to be able to inform clients
> > that resource /foo/bar has moved to /bar/baz, either temporarily or
> > permanently.
>
> :) Thanks for that explanation, but that was not my question here (that part I
> am well aware of, done my share of web stuff).
>
> It was rather: why are subdomain names and/or access paths not once properly
> designed, but instead changed so often that redirection seems so important to
> be a default feature? Just because one can?

Things don't change extremely often.
Sometimes however requirements or other factors change, which
necessitates changing where a resource is hosted.

When this happens, it is extremely useful to have the ability to
relocate it elsewhere.

To use an example, when we first set up files.kde.org it was used by a
couple of things, including Necessitas for the Qt binaries that get
downloaded on to end user (Android) devices. When this was first
established, traffic was well within the reasonable bounds we had
expected when setting this up, and everything was served directly by
our (single) server. This went quite well for a while.

Some time later, an application that used Necessitas was released on
Google Play and proved *extremely* popular, to the extent that it
caused around a terabyte of data to be used within 48 hours or so.
Hetzner bandwidth was at this time not only limited to 100 Mbps, but
also capped, with the limit being 5 TB per month and overage after
that resulting in a charge per terabyte.
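
To put those figures in proportion, here is a back-of-the-envelope
calculation. It assumes exactly 1 TB (10^12 bytes) in exactly 48 hours
and that the pace would have continued for a 30-day month; both are
simplifications of the "around" figures above, and the function names
are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Average bit rate in Mbit/s needed to move `terabytes` (decimal, 10^12
// bytes each) within `hours`.
double averageMbps(double terabytes, double hours)
{
    assert(hours > 0.0);
    const double bits = terabytes * 1e12 * 8.0;
    return bits / (hours * 3600.0) / 1e6;
}

// Volume in TB after `days` days if traffic kept the same pace.
double projectedTerabytes(double terabytes, double hours, double days)
{
    assert(hours > 0.0);
    return terabytes * (days * 24.0 / hours);
}
```

Under those assumptions averageMbps(1, 48) comes to roughly 46 Mbit/s,
close to half of the 100 Mbps link on average, and
projectedTerabytes(1, 48, 30) comes to 15 TB, three times the 5 TB
monthly cap.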

We therefore made the decision to convert files.kde.org to a mirror
network (as was already in place for download.kde.org), with
redirection taking place using Mirrorbrain. We were able to complete
this transition quickly thanks to the generous support of some of our
mirrors who established mirrors of files.kde.org. Fortunately
Necessitas had full support for handling redirects, so this is
something we were able to accomplish without any issues.

Had redirect support not been available, we would have been left with
no way out at that time.

I also have other examples involving Marble (including where we got
bitten by QNetworkAccessManager for the very first time - back in
January 2012) and numerous other KDE Edu applications (all of which
fortunately avoided QNAM).

> When we write code, we try to keep API stable as much as possible, and only
> change API when really useful, and that means for the consumer. When doing
> references in text we try to have eternally stable pointers (thanks ISBN &
> Co.),
>
> But this request for stable URLs on the internet might be an idealistic fight
> against windmills of a web 1.0 person...
>
> Cheers
> Friedrich
>
>

Cheers,
Ben


Re: Fixing QNetworkAccessManager use

2020-02-19 Thread Johan Ouwerkerk
On Wed, Feb 19, 2020 at 2:09 PM Friedrich W. H. Kossebau wrote:
>
> Personally I am still unsure what the actual issue is. Why are redirects
> needed at all. Why all the address changes all the time?
>

It is part of the HTTP spec for servers to be able to inform clients
that resource /foo/bar has moved to /bar/baz, either temporarily or
permanently.
This can be used to do things like mapping /retrieve/document/by/alias
-> /documents/actual/document-id, or to redirect to different hosts
entirely, or to inform plain text HTTP clients to upgrade to using
HTTPS instead. (HSTS is a spec describing how a server can then ask
the client to subsequently enforce its policy preference for when to
connect over HTTPS.)

The main difference between temporary and permanent redirects is that
clients are allowed to "remember" when a resource moved in the case of
permanent redirects so they can optimise subsequent calls to the moved
resources (bypassing the redirect entirely). But as you can see, the
temporary redirect is something that could be used to do load
balancing: assume /resource is expensive to compute or retrieve, then
put a proxy in front which load balances to the actual pool of servers
using temporary redirects. (Of course you could argue that in such a
case maybe round-robin DNS is a better solution altogether.)
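
The difference between the two kinds of redirect can be sketched as a
tiny client-side cache. This is an illustrative model only:
RedirectCache and its members are hypothetical names, and it treats 301
and 308 as permanent and every other 3xx redirect as temporary.

```cpp
#include <cassert>
#include <map>
#include <string>

// True for the status codes HTTP defines as permanent redirects.
bool isPermanentRedirect(int status)
{
    return status == 301 || status == 308;
}

// Remembers permanent redirects so later requests can skip the old URL.
struct RedirectCache {
    std::map<std::string, std::string> remembered;

    // Records a redirect response; temporary ones (e.g. 302, 303, 307)
    // are deliberately not remembered.
    void observe(const std::string &from, int status, const std::string &to)
    {
        assert(status >= 300 && status < 400); // only redirect responses belong here
        if (isPermanentRedirect(status))
            remembered[from] = to;
    }

    // URL the client should contact first the next time it wants `url`.
    std::string firstRequestFor(const std::string &url) const
    {
        const auto it = remembered.find(url);
        return it == remembered.end() ? url : it->second;
    }
};
```

After observe("http://old.example.org/f", 301, "https://new.example.org/f")
the next lookup goes straight to the new address, while a 302 to a
mirror leaves the original URL as the entry point, which is what makes
temporary redirects usable for load balancing. The hosts are made up.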

Regards,

- Johan


Re: Fixing QNetworkAccessManager use

2020-02-19 Thread Friedrich W. H. Kossebau
On Wednesday, 19 February 2020, 21:01:20 CET, Johan Ouwerkerk wrote:
> On Wed, Feb 19, 2020 at 2:09 PM Friedrich W. H. Kossebau wrote:
> > Personally I am still unsure what the actual issue is. Why are redirects
> > needed at all. Why all the address changes all the time?
> 
> It is part of the HTTP spec for servers to be able to inform clients
> that resource /foo/bar has moved to /bar/baz, either temporarily or
> permanently.

:) Thanks for that explanation, but that was not my question here (that part I 
am well aware of, done my share of web stuff).

It was rather: why are subdomain names and/or access paths not once properly 
designed, but instead changed so often that redirection seems so important to 
be a default feature? Just because one can?
When we write code, we try to keep API stable as much as possible, and only 
change API when really useful, and that means for the consumer. When doing 
references in text we try to have eternally stable pointers (thanks ISBN & 
Co.),

But this request for stable URLs on the internet might be an idealistic fight 
against windmills of a web 1.0 person...

Cheers
Friedrich




Re: Fixing QNetworkAccessManager use

2020-02-19 Thread Friedrich W. H. Kossebau
On Wednesday, 19 February 2020, 08:05:01 CET, Ben Cooksley wrote:
> On Mon, Feb 3, 2020 at 7:42 AM Volker Krause wrote:
> > It would also help to know where specifically we have that problem, so we
> > can actually solve it, and so we can figure out why we failed to fix this
> > there earlier.
> 
> Just bringing this up again - it seems we've not had much movement on
> this aside from the Wiki page.

The wiki page currently still just recommends setting
"networkAccessManager->setAttribute(QNetworkRequest::FollowRedirectsAttribute,
true);"

That seems simple, but is possibly not enough in all cases.

So my open questions here to be able to act on code I contribute to are:

a) What about the mentioned QNetworkRequest::NoLessSafeRedirectPolicy, in 
which cases should that be used and when not?

b) What about the HSTS stuff, when is that recommended?

c) What is a sane number for QNetworkRequest::maximumRedirectsAllowed?

Both in general and when it comes to KDE servers.

Personally I am still unsure what the actual issue is. Why are redirects 
needed at all. Why all the address changes all the time? The "U" in 
"URL"/"URI" is for "uniform", not "unstable", isn't it ;)

Can you give some examples for URLs of resources our code uses on KDE servers, 
and why they needed to change?

And if those redirects are permanent, should the client side not also 
permanently update to the new location then, instead of continuing to poke the 
old address every time again and again, until one day it will poke into a void 
because the backward compat redirect support has been dropped?

Cheers
Friedrich