On Tue, Dec 27, 2022 at 6:18 PM American Citizen <[email protected]> wrote:
>
> Hi
>
> I used wget recently to try to download all 26 or 27 pages of my
> website, but it seems to miss about 40% of the pages.
>
> Does anyone have the CLI command line that captures 100% of a website's
> URLs?
>
> I tried the typical
>
> %wget -r --tries=10 https://my.website.com/ -o logfile
>
> as suggested in the wget man page, but it did NOT capture all the
> web pages. I even tried a wait parameter, but that only slowed things
> down and did not remedy the missing-subpages issue.
>
> I appreciate any tips so that ALL of the website data can be captured
> by wget. Yes, I am aware of robots.txt restricting downloadable
> information.
>
> - Randall
Recursive wget only downloads pages that are linked to from my.website.com,
or linked to from pages that are themselves reachable from my.website.com,
and so on recursively. If there is no link path from my.website.com to a
page on your website, then that page will not be downloaded.

Bill
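
If the missing pages are simply not reachable by following links from the
front page, one common workaround is to hand wget an explicit list of URLs
instead of relying on recursion. The following is only a sketch and assumes
the site publishes a sitemap.xml at its root; the URL and the urls.txt file
name are placeholders, not something from this thread:

  # Fetch the sitemap and strip the <loc> markup to get one URL per line
  wget -q -O - https://my.website.com/sitemap.xml \
    | grep -oE '<loc>[^<]+</loc>' \
    | sed -e 's/<loc>//g' -e 's/<\/loc>//g' > urls.txt

  # Download every listed page, plus the images/CSS each page needs
  wget --page-requisites --convert-links -i urls.txt

The -i, --page-requisites and --convert-links options are all documented in
the wget manual. If there is no sitemap, urls.txt can just as well be typed
by hand. Note also that recursive wget honors robots.txt by default, so
pages disallowed there will be skipped unless you override that behavior.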
