I have used this command string successfully in the past to download
complete websites.

 $ wget --recursive --no-clobber --page-requisites \
        --html-extension --convert-links \
        --restrict-file-names=windows \
        --domains website.com --no-parent website.com
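
If you only need the data files rather than the whole site, an accept
list in place of --page-requisites should do it. Untested sketch: the
extensions below are guesses, so adjust them to whatever the study pages
actually link to, and keep the list quoted so the shell doesn't expand it:

 $ wget --recursive --no-parent --no-clobber \
        --accept 'csv,xls,xlsx,zip' \
        --domains ph-public-data.com http://ph-public-data.com/

wget still fetches the HTML pages in order to follow their links, but it
deletes anything that doesn't match the accept list afterward. Note also
that -m is just shorthand for -r -N -l inf --no-remove-listing, so
avoiding it doesn't by itself limit what gets fetched; the accept list
is what does that.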

HTH,
Michael


On Fri, Nov 17, 2023 at 8:49 AM Michael Ewan <michaelewa...@gmail.com>
wrote:

> You may be getting caught by robots.txt, try setting the user agent header,
> i.e. -U agent-string
>
> On Fri, Nov 17, 2023 at 8:31 AM Rich Shepard <rshep...@appl-ecosys.com>
> wrote:
>
> > On Fri, 17 Nov 2023, Rich Shepard wrote:
> >
> > > I need to download ~15G of data from a web site. Using a PLUG mail list
> > > thread from 2008 I tried this syntax:
> > > wget -r --accept '*.*' http://ph-public-data.com/
> >
> > To clarify, I don't think that I want to use the wget -m (mirror) command
> > because I don't think that I want to duplicate the web site, only the
> > data files in each page identified by the second column (study).
> >
> > Rich
> >
>
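
For the -U suggestion quoted above, a concrete invocation would look
something like this; the agent string is only an example, and any common
browser string should work:

 $ wget -U 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0' \
        -r --no-parent http://ph-public-data.com/

If it turns out to be robots.txt itself blocking the recursion rather
than a user-agent check, -e robots=off is the switch that tells wget to
ignore it.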
