Dear Étienne Mollier,

Aha, a logical response. Thank you for shedding some light on this for me.
That is probably the case. My access could perhaps be misinterpreted as an
attack. Maybe I misunderstand the concept of a mirror, but I do not wish to
maintain a server which allows the public to download Debian repositories.
I'll look into it, in any case. If I find it is possible to simply download
the entire collection, without having to host a mirror, I may very well go
that route.

If I continue down the scraping route, would adding a wait in my loop between
downloads make my repeated access less of a problem? I would like to let it
run until it is finished. It is tedious to restart my scrape periodically.
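
For concreteness, the kind of wait I have in mind would look roughly like the
sketch below. The function names and the two-second default are placeholders
of my own, not anything Debian recommends, and I do not know what request rate
the servers actually tolerate, so a delay alone may not be enough to avoid the
block.

```python
import time
import urllib.request


def fetch_url(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def throttled_download(urls, delay_seconds=2.0, fetch=fetch_url, sleep=time.sleep):
    """Fetch each URL in order, pausing between consecutive requests.

    delay_seconds is an assumed value; the acceptable rate is unknown.
    fetch and sleep are parameters so the loop can be tried without
    touching the network.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results[url] = fetch(url)
    return results
```

Calling throttled_download(list_of_urls, delay_seconds=5.0) would then space
the requests five seconds apart instead of issuing them back to back.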

Thanks,
John

On Sat, Jun 12, 2021 at 10:35 AM Étienne Mollier <etienne.moll...@mailoo.org>
wrote:

> Hi John,
>
> John E Petersen, on 2021-06-12:
> > Hey folks, I’m developing a unique kernel based on Debian Linux, and I’ve
> > been scraping the website for repositories. After a few thousand, the
> > servers start to block my ip.
>
> I'm not too sure what you are trying to achieve.  It sounds to
> me like you wish to either develop a Debian derivative, or make
> a backup copy of Debian.  The IP blocking you see is probably
> automated, and the result of repeated access to
> a system that might not have been sized to be mirrored directly.
> I would suppose this is in place so that regular users can access
> these resources without being impacted by numerous background
> download tasks hammering the websites.
>
> Please have a look at the mirroring page[1] to assess whether
> you want to mirror the packages archive, and if so, how to do it
> with tools tailored for such a task.
>
> [1]: https://www.debian.org/mirror/ftpmirror
>
> In hope this helps!
> --
> Étienne Mollier <etienne.moll...@mailoo.org>
> Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
> Sent from /dev/pts/2, please excuse my verbosity.
>
