On Fri, Nov 17, 2023 at 12:43:29PM -0800, Keith Lofstrom wrote:
...
> I "wget-ed" a website, and was soon contacted by a
> panicked/angry sysadmin watching their website brought
> to a crawl because their 5 Mbps upload bandwidth was
> clobbered for hours by my scrape of their site. My bad.
When you connect through the internet, packets flow both
ways - ACK packets tell the sending process which packets
arrived and do not need to be re-sent.
If the data packets you request travel down the same
asymmetric, bandwidth-limited channel as the web-surfing
and email ACK packets of the employees at the Portland
EPA office, they can't do their web-work, and they will
designate your office network connection a "toxic internet
packet super-fund site". :-)
Just kidding. I hope.
This is something we should all be aware of when we access
the internet. Every process and system has constraints and
limits. Neighborly net users should not heedlessly push
too hard on those limits, because others will be impacted.
-
That said, in this PARTICULAR case,
https://www.publicdata.com/
... looks like a private company DESIGNED to provide bulk
data like you are downloading, so I am probably wrong IN
THIS PARTICULAR CASE. You are probably NOT stepping on
any toes here. However, you might learn something
helpful from the publicdata FAQ:
https://login.publicdata.com/faq.html
-
With all the high-bandwidth bots roaming the web and
guzzling data at considerable expense to all of us, the
publicdata company may have processes that limit data
rates and thwart bots, so they don't need to purchase
as much bulk bandwidth from THEIR network providers.
If wget pushes on publicdata.com limits in a bot-like
manner, publicdata server software may treat you like
a bot, and behave in frustrating (and unexplained) ways.
If they frustrate a bot, they need not say they are sorry.
There may be ways to rate-limit your bulk data request,
so it doesn't trigger their rate limits, and looks more
like an obsessed human user. I am only hypothesizing;
there are web provider process management experts reading
this who know how incoming 15 GB requests are handled,
throttled, or thriftily ignored. Please educate us!
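As a rough illustration of client-side rate limiting, here
is a minimal Python sketch. It is not anything publicdata
documents or recommends. The names pause_needed and
polite_fetch, the 250 kB/s cap, and the 16 KiB chunk size
are all assumptions made up for this example. wget itself
has --limit-rate and --wait options that aim at the same
goal; the sketch only shows the idea of sleeping between
reads so the average rate stays under a cap.

```python
import time
import urllib.request


def pause_needed(bytes_so_far, elapsed_s, max_bytes_per_s):
    """Seconds to sleep so the average rate stays at or below the cap."""
    # Time that this much data "should" have taken at the capped rate.
    earliest_ok = bytes_so_far / max_bytes_per_s
    return max(0.0, earliest_ok - elapsed_s)


def polite_fetch(url, dest_path, max_bytes_per_s=250_000, chunk_size=16_384):
    """Download url to dest_path, sleeping between chunks to cap the rate.

    The cap and chunk size are illustrative guesses, not values taken
    from any server's published policy.
    """
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
            # Sleep just long enough to fall back under the average cap.
            elapsed = time.monotonic() - start
            time.sleep(pause_needed(total, elapsed, max_bytes_per_s))
    return total
```

At the assumed 250 kB/s cap, a 15 GB download would take
about 15e9 / 250e3 = 60,000 seconds, or roughly 17 hours.
Whether that pace looks human enough to a particular server
is something only its operators can say.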
Keith L.
(who remembers 300 baud modems, and long distance
toll rates)
--
Keith Lofstrom kei...@keithl.com