Hello,
I don't think my question about wget2's log output should be that complex.
Would someone kindly get back to me about the matter?

Thanks,
David


On Wed, 15 May 2024 19:46:18 -0400
David Niklas <defere...@null.net> wrote:
> Hello,
>
> I'm a long-term user of wget, and I'm trying to make the switch to
> wget2. I'm having trouble understanding what exactly is going on. It
> appears that wget2 is fetching files outside of what my regexes allow,
> but on closer inspection, those files don't exist on my filesystem.
>
> Aside: I would attach the complete wget2 log output to this email, but
> it's 27MB uncompressed and, even compressed with xz, it still comes out
> to 1MB.
> I'm not sure what this mailing list's policy on attachments is. Normally
> I have to get special permission from the list admin.
>
>
> If there's some fine documentation which explains all this, I haven't
> found it, so feel free to point me to it.
>
> Normally, you'd get an HTTP response code such as 200 or 404, but wget2
> says that it's 0. What does that mean?
>
> When a program checks something, it's normally because it already has
> it, but wget2 doesn't appear to have downloaded the files it says it's
> checking (although I may have forgotten to keep them for the purpose of
> this email).
> So what does '[3] Checking $URL ...' mean?
>
> When a URL is added, one would normally expect it to be downloaded, but
> that doesn't appear to be the case with wget2. What does
> "Adding URL: $URL" mean?
>
> As you probably noticed, I'm rather confused. Here's a portion of
> wget2's output followed by the command that I used.
>
> Thanks,
> David
>
>
> #############################################################################
>
> Adding URL: https://web.archive.org/web/20220305001008js_/https:/americasfrontlinedoctors.org/_next/static/bBQU-7wbyVqBHhpUeRiRF/_middlewareManifest.js
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Uw9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr6Ew9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCs16Ew9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtr6Ew9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCtZ6Ew9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCu170w9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCuM70w9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvr70w9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUHjIg1_i6t8kCHKm4532VJOt5-QNFgpCvC70w9.woff
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WRhyyTh89ZNpQ.woff2
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459W1hyyTh89ZNpQ.woff2
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WZhyyTh89ZNpQ.woff2
> Adding URL: https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WdhyyTh89ZNpQ.woff2
>
> ###################...#######################################################
>
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/modernwisdompodcast' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/IsaacArthur' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MLChristiansen]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/BlacktipH]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TomAntosFilms' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/DonaldJTrumpJr]
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KimIversen' ...
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/Homesteadonomics' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MariaBartiromo]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/StevenCrowder]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/LifeStories]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheAdventureAgents' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/NDWoodworkingArt' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Styxhexenhammer666]
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/EarthTitan' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/NYPost]
> HTTP response 0 [https://web.archive.org/web/20220130164746js_/https://www.americasfrontlinedoctors.org/_next/static/chunks/d0447323-9a7a3aa3a90e5cd2.js]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KenDBerryMD' ...
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HeresyFinancial' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RepJimBanks' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/PageSix]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/SamuelEarpArtist]
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/HOTDANGSHOW' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Decider]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/JohnStossel]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ATRestoration' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ThisSouthernGirlCan' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MikhailaPeterson]
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RockFeed' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Locals]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Timcast]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/CountryCast' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ShaunAttwood' ...
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/diywife' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/CWLemoine]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TimcastIRL]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Entrepreneur]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/RekietaLaw' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/MontyFranklin' ...
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksandGamers' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TheBodyLanguageGuy]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/Yarnhub]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/DrDrew]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/SportsWars' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nfldaily' ...
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/nbanow' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/TulsiGabbard]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/MattKohrs]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GamingWithGeeks' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/HabibiPowerHour]
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/FactsChannel]
> [3] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/ParkHoppin' ...
> [2] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/GeeksAndGamersClips' ...
> HTTP response 0 [https://web.archive.org/web/20220404154147/https:/rumble.com/c/phetasy]
> [1] Checking 'https://web.archive.org/web/20220404154147/https:/rumble.com/c/chiefstv' ...
> #############################################################################
>
>
>
> The wget2 command is as follows. I had to wrap it to fit in this email.
>
> wget2 -NEkrl9 -t 13 --regex-type=posix --timeout 45 --reject-regex
>
> 'http.*http.*http|\.html?.*\.html?.*\.html?|www\..*www\..*www\.|\{|url[^/]+query|data[^/]+\.(url|image)|
> /activity|/members|/groups|/%5C$|xmlrpc\.php|/phpBB/|/socialauth/|/googlebooks/|xmlrpc\.php|/admin\.php|
> /rsc/|/htsrv/|/skins/|/activate/|blogger.com/.*(profile|share-post|delete|comment|post-edit)|delete-comment.g|
> from=|target=|/public\.api/|/mshots/v1/|/public.api/|/(remote-login|press-this|wp-signup|wp-login)\.php|
> Translations:|Sandbox:|Template:|title=User_talk|/pricecompare|\?[rs]=[x0-9]+&|/likers|/following|new\?user=|
> /discussion-|\?(resize|w|h)&|signin\?|/signup|/messages|/followers|/likesandfollows|/add\?|/destroy/|
> /create\?|/Layout|/Selected_page|redirect=no|Template:|_talk:|User:|User_talk:|sign_up|sign_in|
> \.img(\.(xz|gz|bz2))?|/secured_requests|/usenet/|/rss\.php|/design-tools|/supportLink|eesimUrl|
> /reliabilityLink|/markets|/storefront.html|distributorData|/mymaxim|/samplecart|/comment-subscriptions|
> /walkthroughs|like_comment=|screenToRender=|/UserAccount|/myprofile|layout=siteinfo|/Subscribe|\?cid=|
> \+url\+|captcha|utm_(medium|source)=|bc(lid|tid)|pubdate|HQS|tid|eid|kcid|pid=|screenToView=login|
> [^[:alnum:]]search/|companies/|directory/|cat/(news|reviews|previews-unboxing)|PrintView|contentItemId|
> /[Aa]uth|comment_mail|replytocom=|[^[:alnum:]]search\?|amp$|\.(rss|atom|json)$|/maintenance|
> /lib/exe/indexer.php|dataflt|datasrt|\.iso|(show|focused)Comment(Area|s|Id)|decoration|(bookmarks|
> browsespace|changes|diffpages(byversion)?|listattachmentsforspace|login|peopledirectory|recentlyupdated|
> replycomment|report|space-bookmarks|tinyurl|view(follow|info|mailarchive|page(attachments|src)|
> previousversions|recentblogposts|spacesummary|userprofile))\.action|edit$|recentchanges|revisions|
> /WantedPages|/forum|cgi-bin/|(do|sectok|mode|action|oldid|diff|showComment|share|replyto)=|Talk:|
> Special:|wp-admin/|feed|login|/(EU|FR-FR|anp|ar|az|bg|bgn|bn|ca|cn|cs|da|de|de-de|diq|el|en-au|en-ca|
> en-gb|en-sg|en-za|eo|es|es-co|es-mx|es-es|eu|fa|fr|fr-fr|he|hi|hr|hu|hy|ia|id|it|it-it|ja|jbo|jp|kk|ko|
> lb|lt|map-bms|ml|mni|nb|ne|nl|nl-nl|no|oc|pa|pl|pl-pl|pt|pt-br|ro|ru|sco|sd|sl|si|sq|sr|sr-ec|ta|te|th|
> tr|ua|udm|ug-arab|uk|ur|vi|zh|zh_CN|zh-cn|zh_cn|zh_tw)(:|$)'
> --accept-regex
>
> '(.*\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/
> https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|
> www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|
> fonts.googleapies.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?americasfrontlinedoctors.org))'
>
> https://web.archive.org/web/20220305001008/https://americasfrontlinedoctors.org/
> |& tee -a 2wget.log
>
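
A minimal sketch for checking which of the logged URLs the --accept-regex in the command above should let through. It assumes that wget2's --regex-type=posix behaves like POSIX extended regular expressions as implemented by `grep -E`; that is an approximation, not wget2's own matching code. The pattern is the accept regex from the command, re-joined onto one line, and `matches_accept` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: test URLs against the --accept-regex from the wget2 command.
# Assumption: wget2's --regex-type=posix behaves like POSIX ERE (grep -E);
# this approximates, and does not reproduce, wget2's own matcher.

# Accept pattern from the command, re-joined onto a single line.
accept_re='(.*\.(css|gif|png|jpe?g)$|https?://web\.archive\.org/web/[^ *]+/https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|fonts.googleapies.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?americasfrontlinedoctors.org))'

# matches_accept URL (hypothetical helper):
# exit status 0 if URL matches accept_re, non-zero otherwise.
matches_accept() {
  printf '%s\n' "$1" | grep -Eq -- "$accept_re"
}

# Two URLs taken from the log excerpt.
for url in \
  'https://web.archive.org/web/20220305001008im_/https://fonts.gstatic.com/s/montserrat/v23/JTUSjIg1_i6t8kCHKm459WdhyyTh89ZNpQ.woff2' \
  'https://web.archive.org/web/20220404154147/https:/rumble.com/c/KimIversen'
do
  if matches_accept "$url"; then
    echo "accepted:     $url"
  else
    echo "not accepted: $url"
  fi
done
```

Under the grep -E approximation, the fonts.gstatic.com URL matches the `(ssl|www|fonts).gstatic.com` branch, while the rumble.com URL matches neither branch of the accept pattern. The same helper could be pointed at the --reject-regex by swapping in that pattern.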

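Since the full log is 27MB, a per-message summary might be easier to share than an attachment. The sketch below assumes that each message occupies a single line in 2wget.log, in the forms `Adding URL: <url>`, `[N] Checking '<url>' ...` and `HTTP response <code> [<url>]`; the exact line layout of the real log file is an assumption. `summarize_log` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: summarize a wget2 log by message type instead of attaching it.
# Assumption: each message occupies a single line, for example
#   Adding URL: <url>
#   [N] Checking '<url>' ...
#   HTTP response <code> [<url>]

# summarize_log FILE (hypothetical helper): print message counts,
# plus one count per HTTP response code seen.
summarize_log() {
  awk '
    /^Adding URL: /          { added++ }
    /^\[[0-9]+] Checking /   { checked++ }
    /^HTTP response [0-9]+/  { codes[$3]++ }
    END {
      printf "added: %d\n", added + 0
      printf "checked: %d\n", checked + 0
      # Iteration order of "for (c in codes)" is unspecified in awk.
      for (c in codes) printf "HTTP response %s: %d\n", c, codes[c]
    }
  ' "$1"
}

# Only run when the log written by the tee in the command is present.
if [ -r 2wget.log ]; then
  summarize_log 2wget.log
fi
```

Lines that do not start with one of the three assumed prefixes are simply not counted, so interleaved or differently formatted output would need extra patterns.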