On Tue, Dec 27, 2022 at 6:18 PM American Citizen
<[email protected]> wrote:
>
> Hi
>
> I used wget recently to try to download all 26 or 27 pages of my
> website, but it seems to miss about 40% of the pages.
>
> Does anyone have a CLI command that captures 100% of a website's
> URLs?
>
> I tried the typical
>
> %wget -r --tries=10 https://my.website.com/ -o logfile
>
> as suggested in the "man wget" page, but it did NOT capture all the
> webpages. I even tried a wait parameter, but that only slowed things
> down and did not remedy the missing subpages issue.
>
> I appreciate any tips so that ALL of the website data can be captured by
> wget. Yes, I am aware that robots.txt can restrict what is downloadable.
>
> - Randall

wget only downloads pages that are linked from my.website.com, or from
pages that are themselves linked from my.website.com, and so on
recursively. If there is no chain of links from my.website.com to a
page on your website, then that page will not be downloaded.
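
A couple of things you could try, as sketches rather than guaranteed
fixes. If I recall correctly, plain -r stops at a recursion depth of 5
by default, so a deeper mirror run may pick up more:

%wget --mirror --page-requisites --convert-links https://my.website.com/ -o logfile

And if some pages are simply not linked from anywhere, you could
enumerate the URLs yourself (for example from a sitemap) and feed wget
an explicit list with -i/--input-file. Here urls.txt is just a
hypothetical file with one URL per line:

%wget --tries=10 -i urls.txt -o logfile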

Bill

