Re: [PLUG] playing around with the wget command finally worked

wes Mon, 20 Nov 2023 14:35:13 -0800

I imagine the intention of the robots file (in this case set to disallow
all "automated" requests) is to reduce web crawler traffic.
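
For illustration, here is a rough Python sketch of how a well-behaved client
would read a disallow-all robots file. The robots.txt content below is an
assumed example, not the actual file served by vafri.is, and is_allowed is
just a hypothetical helper name. It uses the standard library's
urllib.robotparser, so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Assumed example of a "disallow everything" robots.txt.
# This is NOT the real file from the site discussed in this thread.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://vafri.is/quake/index.js"
    print(is_allowed(ASSUMED_ROBOTS_TXT, "Wget", url))
```

A client that honors the file would skip the download when this returns
False; passing -e robots=off tells wget to skip that check altogether.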


What's ironic is that the worst offenders already ignore it.

-wes

On Mon, Nov 20, 2023 at 2:32 PM American Citizen <[email protected]>
wrote:

> I am making a good faith effort to contact the site administrators. What
> is ironic is that anyone can use the save page command in the standard
> browser tools and get the file that way without asking at all.
>
> On 11/20/23 13:58, American Citizen wrote:
> > At the risk of being blocked by the Skalfti website, I found that the
> > following wget command grabs one and only one file
> >
> > %wget -r -A 'index.js' -e robots=off -O index.js https://vafri.is/quake/
> >
> > Notice that I had to give the file a name using the -O option, and it
> > is stored in the current working directory.
> >
> > I read that using the option -e robots=off is considered rude. Is
> > that generally so?
> >
> > Thanks for bearing with me on this question, as this is the very first
> > time I have used wget to grab one specific file, but not knowing
> > exactly where in the directory tree of the website the file is located.
> >
> > Randall
> >
> >
>
