In Firefox, there is also the Tools > Page Info option. That often lists
the real links in the rendered page. I sometimes use it to find the
mp3 file link for my favorite KBOO radio program, which normally plays
through a JavaScript player on their site; with the direct link I can wget
the mp3 and play it locally with mplayer, which gives finer control than
the web player provides.
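For anyone curious, the fetch-and-play step can be sketched like this (the URL below is a hypothetical placeholder, not the real KBOO archive path):

```shell
# Hypothetical archive URL -- substitute the real link found via Page Info
URL="https://example.org/kboo/program.mp3"

# Fetch the mp3 once so it can be replayed and seeked locally
wget -O program.mp3 "$URL"

# mplayer gives keyboard seeking (arrow keys) and playback speed control
mplayer program.mp3
```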

On Mon, Nov 20, 2023 at 3:47 PM John Moon <[email protected]> wrote:

> On 11/20/2023 2:33 PM, wes wrote:
> > I imagine the intention of the robots file (in this case set to disallow
> > all "automated" requests) is to reduce web crawler traffic.
> >
> > what's ironic is that the worst offenders already ignore it.
> >
> > -wes
> >
> > On Mon, Nov 20, 2023 at 2:32 PM American Citizen <[email protected]> wrote:
> >
> >> I am making a good faith effort to contact the site administrators. What
> >> is ironic is that anyone can use the save page command in the standard
> >> browser tools and get the file that way without asking at all.
> >>
> >> On 11/20/23 13:58, American Citizen wrote:
> >>> At the risk of being blocked by the Skalfti website, I found that the
> >>> following wget command grabs one and only one file
> >>>
> >>> %wget -r -A 'index.js' -e robots=off -O index.js https://vafri.is/quake/
> >>>
> >>> Notice that I had to give the file a name using the -O option, and it
> >>> is stored in the current working directory.
> >>>
> >>> I read that using the -e robots=off option is considered rude... is
> >>> that generally so?
> >>>
> >>> Thanks for bearing with me on this question, as this is the very first
> >>> time I have used wget to grab one specific file, but not knowing
> >>> exactly where in the directory tree of the website the file is located.
> >>>
> >>> Randall
> >>>
> >>>
> >>
>
> Maybe y'all already know this, but one tip is to use the "Network" tab
> in your browser developer tools and monitor the requests as the page
> loads. You should be able to see index.js being loaded by the browser.
> Then, you can right-click it and "Copy as cURL (POSIX)" (confirmed on
> Firefox, but I think Chrome has something similar).
>
> A curl command will be copied to your clipboard to download the file
> with the headers and user agent the same way your browser did for the
> original request.
>
> https://everything.curl.dev/usingcurl/copyas
>
> Cheers,
> John
>
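For anyone who hasn't tried John's tip: the command that "Copy as cURL" puts on the clipboard looks roughly like the sketch below. The URL is the one from this thread; the header values are illustrative stand-ins, not an actual capture, and I've added -o so the file is saved instead of printed to stdout:

```shell
# Roughly the shape of a "Copy as cURL" command; header values are illustrative
curl 'https://vafri.is/quake/index.js' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' \
  -H 'Accept: */*' \
  -H 'Referer: https://vafri.is/quake/' \
  -o index.js
```

Because the headers match what the browser sent, the server treats the request like an ordinary page load rather than a bare script fetch.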
