In Firefox, there is also the Tools / Page Info option. It often lists the real links behind the rendered page. I sometimes use it to find the MP3 file link for my favorite KBOO radio program, which normally plays through a JavaScript player on their site, but with the direct link I can wget the MP3 and play it locally with mplayer for finer control than the web tool provides.
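For what it's worth, the round trip looks something like this. This is a dry-run sketch; the URL below is hypothetical, so substitute whatever direct link Page Info actually shows you:

```shell
#!/bin/sh
# Dry-run sketch: fetch a program's direct MP3 link, then play it
# locally. The URL is made up -- use the one Page Info reveals.
url="https://example.org/kboo/show.mp3"
out="show.mp3"

# wget saves the file once; mplayer then gives seek/speed/pause
# control that the site's JavaScript player lacks.
printf 'wget -O %s %s\n' "$out" "$url"
printf 'mplayer %s\n' "$out"
```

Drop the printf wrappers to actually run the two commands.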
On Mon, Nov 20, 2023 at 3:47 PM John Moon <[email protected]> wrote:
> On 11/20/2023 2:33 PM, wes wrote:
> > I imagine the intention of the robots file (in this case set to
> > disallow all "automated" requests) is to reduce web crawler traffic.
> >
> > What's ironic is that the worst offenders already ignore it.
> >
> > -wes
> >
> > On Mon, Nov 20, 2023 at 2:32 PM American Citizen
> > <[email protected]> wrote:
> >> I am making a good faith effort to contact the site administrators.
> >> What is ironic is that anyone can use the save page command in the
> >> standard browser tools and get the file that way without asking at all.
> >>
> >> On 11/20/23 13:58, American Citizen wrote:
> >>> At the risk of being blocked by the Skalfti website, I found that
> >>> the following wget command grabs one and only one file:
> >>>
> >>> % wget -r -A 'index.js' -e robots=off -O index.js https://vafri.is/quake/
> >>>
> >>> Notice that I had to give the file a name using the -O option, and
> >>> it is stored in the current working directory.
> >>>
> >>> I read that using the option -e robots=off is considered rude. Is
> >>> that generally so?
> >>>
> >>> Thanks for bearing with me on this question, as this is the very
> >>> first time I have used wget to grab one specific file without
> >>> knowing exactly where in the directory tree of the website the
> >>> file is located.
> >>>
> >>> Randall
>
> Maybe y'all already know this, but one tip is to use the "Network" tab
> in your browser developer tools and monitor the requests as the page
> loads. You should be able to see index.js being loaded by the browser.
> Then, you can right-click it and "Copy as cURL (POSIX)" (confirmed on
> Firefox, but I think Chrome has something similar).
>
> A curl command will be copied to your clipboard to download the file
> with the headers and user agent the same way your browser did for the
> original request.
> https://everything.curl.dev/usingcurl/copyas
>
> Cheers,
> John
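In case it helps to see it: a copied command usually has roughly this shape. The header values below are invented, and I'm assuming the script lives at /quake/index.js; Firefox fills in the real headers it sent, so the server sees the same request it already answered for the browser:

```shell
#!/bin/sh
# Illustrative shape of a "Copy as cURL (POSIX)" command (dry run).
# Header values here are made up; the browser supplies the actual
# User-Agent, Accept, and Referer it used for the original request.
url="https://vafri.is/quake/index.js"
cmd="curl '$url' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' \
  -H 'Accept: */*' \
  -H 'Referer: https://vafri.is/quake/' \
  -o index.js"
printf '%s\n' "$cmd"
```

Paste the clipboard contents directly into a terminal to fetch the file; the -o option names the local copy, much like wget's -O.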
