Hello, I don't think that my question about the log output is all that complex. Would someone kindly get back to me about the matter?
Thanks,
David

On Wed, 15 May 2024 19:46:18 -0400
David Niklas <defere...@null.net> wrote:

> Hello,
>
> I'm a long term user of wget, and I'm trying to make the switch to
> wget2. I'm having a problem understanding what exactly is going on. It
> appears that wget2 is getting files outside of what my regex-es allow,
> but on closer inspection, the files don't exist on my FS.
>
> Aside: I would attach the complete wget2 log output to this email, but
> it's 27MB in size uncompressed and, even using xz, it still comes out to
> 1MB in size.
> I'm uncertain what your particular email list recommends. Normally I
> have to get special permission from the list admin.
>
>
> If there's some fine documentation which explains all this, I haven't
> found it, so feel free to point me to it.
>
> Normally, you'd get HTTP response 200, or 404, or something, but wget2
> says that it's 0. What does that mean?
>
> When you check something, it's normally because you have it, but wget2
> doesn't appear to have downloaded the files it then says that it's
> checking (although I may have forgotten to retain them for the purpose
> of this email).
> So what does '[3] Checking $URL ...' mean?
>
> When you add a URL, one would normally think that it's going to be
> downloaded, but that doesn't appear to be the case with wget2. What does
> "Adding URL: $URL" mean?
>
> As you probably noticed, I'm rather confused. Here's a portion of
> wget2's output followed by the command that I used.
>
> Thanks,
> David
>
>
> #############################################################################
>
> Adding URL:
> https://web.archive.org/web/20220305001008js_/https:/americasfrontlinedoctors.org/_next/static/bBQU-7wbyVqBHhpUeRiRF/_middlewareManifest.js
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Uw9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr6Ew9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCs16Ew9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Ew9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtZ6Ew9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCu170w9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCuM70w9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr70w9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvC70w9.woff
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WRhyyTh89ZNpQ.woff2
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459W1hyyTh89ZNpQ.woff2
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WZhyyTh89ZNpQ.woff2
> Adding URL:
> https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WdhyyTh89ZNpQ.woff2
>
> ###################...#######################################################
>
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/modernwisdompodcast'
> ... [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/IsaacArthur'
> ... HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MLChristiansen]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/BlacktipH]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TomAntosFilms'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/DonaldJTrumpJr]
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KimIversen'
> ...
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/Homesteadonomics'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MariaBartiromo]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/StevenCrowder]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/LifeStories]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheAdventureAgents'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/NDWoodworkingArt'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Styxhexenhammer666]
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/EarthTitan'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/NYPost]
> HTTP response 0
> [https://web.archive.org/web/20220130164746js_/https://www.americasfrontlinedoctors.org/_next/static/chunks/d0447323-9a7a3aa3a90e5cd2.js]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KenDBerryMD'
> ...
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HeresyFinancial'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RepJimBanks'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/PageSix]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/SamuelEarpArtist]
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HOTDANGSHOW'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Decider]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/JohnStossel]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ATRestoration'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ThisSouthernGirlCan'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MikhailaPeterson]
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RockFeed'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Locals]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Timcast]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/CountryCast'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ShaunAttwood'
> ...
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/diywife'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/CWLemoine]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TimcastIRL]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Entrepreneur]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RekietaLaw'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/MontyFranklin'
> ...
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksandGamers'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheBodyLanguageGuy]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Yarnhub]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/DrDrew]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/SportsWars'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nfldaily'
> ...
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nbanow'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TulsiGabbard]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MattKohrs]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GamingWithGeeks'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/HabibiPowerHour]
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/FactsChannel]
> [3] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ParkHoppin'
> ...
> [2] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksAndGamersClips'
> ...
> HTTP response 0
> [https://web.archive.org/web/20220404154147/https:/rumble.com/c/phetasy]
> [1] Checking
> 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/chiefstv'
> ...
> #############################################################################
>
>
>
> The wget2 command is as follows. I had to wrap it.
>
> wget2 -NEkrl9 -t 13 --regex-type=posix --timeout 45 --reject-regex
>
> 'http.*http.*http|\.html?.*\.html?.*\.html?|www\..*www\..*www\.|\{|url[^/]+query|data[^/]+\.(url|image)|
> /activity|/members|/groups|/%5C$|xmlrpc\.php|/phpBB/|/socialauth/|/googlebooks/|xmlrpc\.php|/admin\.php|
> /rsc/|/htsrv/|/skins/|/activate/|blogger.com/.*(profile|share-post|delete|comment|post-edit)|delete-comment.g|
> from=|target=|/public\.api/|/mshots/v1/|/public.api/|/(remote-login|press-this|wp-signup|wp-login)\.php|
> Translations:|Sandbox:|Template:|title=User_talk|/pricecompare|\?[rs]=[x0-9]+&|/likers|/following|new\?user=|
> /discussion-|\?(resize|w|h)&|signin\?|/signup|/messages|/followers|/likesandfollows|/add\?|/destroy/|
> /create\?|/Layout|/Selected_page|redirect=no|Template:|_talk:|User:|User_talk:|sign_up|sign_in|
> \.img(\.(xz|gz|bz2))?|/secured_requests|/usenet/|/rss\.php|/design-tools|/supportLink|eesimUrl|
> /reliabilityLink|/markets|/storefront.html|distributorData|/mymaxim|/samplecart|/comment-subscriptions|
> /walkthroughs|like_comment=|screenToRender=|/UserAccount|/myprofile|layout=siteinfo|/Subscribe|\?cid=|
> \+url\+|captcha|utm_(medium|source)=|bc(lid|tid)|pubdate|HQS|tid|eid|kcid|pid=|screenToView=login|
> [^[:alnum:]]search/|companies/|directory/|cat/(news|reviews|previews-unboxing)|PrintView|contentItemId|
> /[Aa]uth|comment_mail|replytocom=|[^[:alnum:]]search\?|amp$|\.(rss|atom|json)$|/maintenance|
> /lib/exe/indexer.php|dataflt|datasrt|\.iso|(show|focused)Comment(Area|s|Id)|decoration|(bookmarks|
> browsespace|changes|diffpages(byversion)?|listattachmentsforspace|login|peopledirectory|recentlyupdated|
> replycomment|report|space-bookmarks|tinyurl|view(follow|info|mailarchive|page(attachments|src)|
> previousversions|recentblogposts|spacesummary|userprofile))\.action|edit$|recentchanges|revisions|
> /WantedPages|/forum|cgi-bin/|(do|sectok|mode|action|oldid|diff|showComment|share|replyto)=|Talk:|
> Special:|wp-admin/|feed|login|/(EU|FR-FR|anp|ar|az|bg|bgn|bn|ca|cn|cs|da|de|de-de|diq|el|en-au|en-ca|
> en-gb|en-sg|en-za|eo|es|es-co|es-mx|es-es|eu|fa|fr|fr-fr|he|hi|hr|hu|hy|ia|id|it|it-it|ja|jbo|jp|kk|ko|
> lb|lt|map-bms|ml|mni|nb|ne|nl|nl-nl|no|oc|pa|pl|pl-pl|pt|pt-br|ro|ru|sco|sd|sl|si|sq|sr|sr-ec|ta|te|th|
> tr|ua|udm|ug-arab|uk|ur|vi|zh|zh_CN|zh-cn|zh_cn|zh_tw)(:|$)'
> --accept-regex
>
> '(.*\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/
> https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|
> www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|
> fonts.googleapies.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?americasfrontlinedoctors.org))'
>
> https://web.archive.org/web/20220305001008/https://americasfrontlinedoctors.org/
> |& tee -a 2wget.log
>
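
Since the full log is too large to attach, one rough way to narrow things down might be to test individual URLs from the log against the two patterns outside of wget2. The following is only a sketch and rests on several assumptions: the patterns are shortened excerpts of the ones in the command above (joined onto one line, with "*?" simplified to "*"), classify_url is a hypothetical helper name, example.org is a placeholder host, and the idea that a reject match takes priority over an accept match is a guess, not documented wget2 behaviour.

```shell
#!/bin/sh
# Sketch only: test a URL against shortened excerpts of the two patterns.
# ASSUMPTION: a reject match wins over an accept match; wget2 may combine
# --accept-regex and --reject-regex differently.

# Excerpt of the --reject-regex value (the full pattern is much longer).
reject='http.*http.*http|[^[:alnum:]]search/|/forum|cgi-bin/|wp-admin/'

# Excerpt of the --accept-regex value, joined onto one line and with "*?"
# simplified to "*", since not every grep accepts "*?" in an ERE.
accept='\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/https?://?((ssl|www|fonts).gstatic.com|(www[0-9]*\.)?americasfrontlinedoctors.org)'

# classify_url (hypothetical helper) prints "reject", "accept" or
# "no-match" for the URL passed as its first argument.
classify_url() {
    if printf '%s\n' "$1" | grep -Eq -e "$reject"; then
        echo reject
    elif printf '%s\n' "$1" | grep -Eq -e "$accept"; then
        echo accept
    else
        echo no-match
    fi
}

# One of the URLs from the "HTTP response 0" lines in the log.
classify_url 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/NYPost'
```

With these excerpts, the rumble.com URL matches neither pattern and the fonts.gstatic.com URLs match the accept excerpt. The full patterns may give a different answer, so this only helps to decide which URLs are worth asking about.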