Hi,

> "Both --no-clobber and --convert-links were specified, only
> --convert-links will be used."
Right, I missed that. The combination of both options was buggy by
design (also in 1.12) and suffered from several flaws, not to say bugs.

The regex would be more like '.*/xpage=watch.*'. The exact syntax
depends on --regex-type=TYPE (posix|pcre).

What else can you do... try wget2. It allows the combination of
--no-clobber and --convert-links. And if you find bugs, they can be
fixed (unlike wget 1.x, where we would have to redesign a whole lot of
things). See https://gitlab.com/gnuwget/wget2

If you don't want to build from git, you can download a pretty recent
tarball from https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
Signature at https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.sig

Regards, Tim

On 06/05/2018 03:52 PM, CryHard wrote:
> Hey Tim,
>
> Please see http://savannah.gnu.org/bugs/?31781 where it was
> implemented, since version 1.12.1.
>
> On my personal Mac I have 1.19.5, and when I run the command with
> both arguments I get:
>
> "Both --no-clobber and --convert-links were specified, only
> --convert-links will be used."
>
> as a response.
>
> Anyway, I might make do without -nc if I can use the regex argument.
> Could you give an example of how that argument would work in my
> case? Can I just use www.mywiki.com/delete/* as an argument, for
> example? Or .*/xpage=watch.* ?
>
> Thanks!
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On June 5, 2018 2:40 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>
>> Hi,
>>
>> in this case you could try it with -X / --exclude-directories.
>>
>> E.g. wget -X /delete,/remove
>>
>> That wouldn't help with "xpage=watch..." though.
>>
>> And I can't tell you if and how well -X works with wget 1.12.
>>
>> Why (or since when) doesn't --no-clobber plus --convert-links work
>> any more?
>>
>> Please feel free to open a bug report at
>> https://savannah.gnu.org/bugs/?func=additem&group=wget with a
>> detailed description.
>> Because it works for me :-)
>>
>> Regards, Tim
>>
>> On 06/05/2018 03:11 PM, CryHard wrote:
>>
>>> Hey Tim,
>>>
>>> Thanks for the info. The wiki software we use (XWiki) appends
>>> something to wiki page URLs to express a certain behavior. For
>>> example, the "watch" button, once pressed, redirects you to
>>> "www.wiki.com/WIKI-PAGE-NAME?xpage=watch&do=adddocument",
>>> where the only thing that changes is the "WIKI-PAGE-NAME" part.
>>>
>>> Also, for actions such as "deleting" or "reverting" a wiki page,
>>> the URL changes by adding /remove/ or /delete/ "sub-folders" in
>>> the URL. These are usually in the middle, before the actual page
>>> name, for example: www.wiki.com/delete/WIKI-PAGE-NAME. So in this
>>> case the "offending" part is in the middle of the actual wiki page
>>> URL.
>>>
>>> What I would need is to keep wget from visiting any
>>> www.wiki.com/delete/ or www.wiki.com/remove/ pages. I'd also need
>>> to exclude links that end with "xpage=watch&do=adddocument", which
>>> triggers me to watch that page.
>>>
>>> I am using v1.12 because the most recent versions have disabled
>>> --no-clobber and --convert-links from working together. I need
>>> --no-clobber because, if the download stops, I need to be able to
>>> resume without re-downloading all the files. And I need
>>> --convert-links because this needs to work as a local copy.
>>>
>>> From my understanding, the options you mention were added after
>>> v1.12. Is there any way to achieve this?
>>>
>>> BTW, -N (timestamps) doesn't work, as the server on which the wiki
>>> is hosted doesn't seem to support it, hence wget keeps
>>> re-downloading the same files.
>>>
>>> Thanks a lot!
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>
>>> On June 5, 2018 1:57 PM, Tim Rühsen tim.rueh...@gmx.de wrote:
>>>
>>>> On 06/05/2018 11:53 AM, CryHard wrote:
>>>>
>>>>> Hey there,
>>>>>
>>>>> I've used the following:
>>>>>
>>>>> wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6)
>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139
>>>>> Safari/537.36" --user=myuser --ask-password --no-check-certificate
>>>>> --recursive --page-requisites --adjust-extension --span-hosts
>>>>> --restrict-file-names=windows --domains wiki.com --no-parent
>>>>> wiki.com --no-clobber --convert-links --wait=0 --quota=inf -P /home/W
>>>>>
>>>>> to download a wiki. The problem is that this will follow "button"
>>>>> links, e.g. the links that allow a user to put a page on a
>>>>> watchlist for further modifications. This has led to me watching
>>>>> hundreds of pages. Not only that, but apparently it also follows
>>>>> the links that lead to reverting changes made by others on a page.
>>>>>
>>>>> Is there a way to avoid this behavior?
>>>>
>>>> Hi,
>>>>
>>>> that depends on how these "button links" are realized.
>>>>
>>>> A button may be part of an HTML FORM tag/structure where the URL
>>>> is the value of the 'action' attribute. Wget doesn't download such
>>>> URLs, because of the problem you describe.
>>>>
>>>> A dynamic web page can realize "button links" with simple links.
>>>> Wget doesn't know about the hidden semantics and so downloads
>>>> these URLs - and maybe they trigger some changes in a database.
>>>>
>>>> If this is your issue, you have to look into the HTML files and
>>>> exclude those URLs from being downloaded, or create a whitelist.
>>>> Look at the options -A/-R and --accept-regex / --reject-regex.
>>>>
>>>>> I'm using the following version:
>>>>>
>>>>>> wget --version
>>>>>> GNU Wget 1.12 built on linux-gnu.
>>>>
>>>> OK, you should update wget if possible. The latest version is 1.19.5.
>>>> Regards, Tim
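[Editor's note] The --reject-regex approach discussed above can be sketched as follows. The URLs below are the placeholder examples from the thread (www.wiki.com is not a real host), and the grep calls only preview which URLs a POSIX-style pattern would match; nothing is downloaded.

```shell
# Example URLs of the kinds described in the thread
# (www.wiki.com is a placeholder host).
urls='https://www.wiki.com/WIKI-PAGE-NAME?xpage=watch&do=adddocument
https://www.wiki.com/delete/WIKI-PAGE-NAME
https://www.wiki.com/remove/WIKI-PAGE-NAME
https://www.wiki.com/WIKI-PAGE-NAME'

# One POSIX extended regex covering all three "offending" URL shapes:
# the watch query string plus the /delete/ and /remove/ path segments.
pattern='xpage=watch|/delete/|/remove/'

# Preview which URLs the pattern would reject...
printf '%s\n' "$urls" | grep -E  "$pattern"
# ...and which URLs would still be followed.
printf '%s\n' "$urls" | grep -Ev "$pattern"

# The corresponding wget invocation (in a wget version that supports
# --reject-regex; untested here) might look like:
# wget --recursive --convert-links --regex-type=posix \
#      --reject-regex 'xpage=watch|/delete/|/remove/' https://www.wiki.com/
```

With --regex-type=pcre the same alternation syntax works; the point is that a single pattern can cover both the query-string case and the mid-path /delete/ and /remove/ cases that -X alone cannot express.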