Follow-up Comment #8, bug #60494 (project wget):

So what KodeCharlie says is correct (regarding b)). I would reword it like this: we have arbitrary user input, which has nothing to do with what the RFCs say. This is the hard part, as we have to 'guess' what the user meant. Once the input is 'normalized' (unescaped, charset translated (into UTF-8), protocol extended, ...), the rest is straightforward: just follow the RFCs.
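
To make the normalization order concrete, here is a minimal sketch in Python (not wget's C code, and not how wget actually implements it). The function name `normalize_user_url`, its parameters, and the `http` default scheme are assumptions made purely for illustration.

```python
import re
from urllib.parse import unquote_to_bytes

# Hypothetical helper for illustration only; this is not wget's implementation.
# Matches a leading "scheme://" such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_user_url(raw: bytes,
                       input_charset: str = "utf-8",
                       default_scheme: str = "http") -> str:
    """Guess what the user meant and return a normalized URL string."""
    # 1. Unescape: turn %XX sequences into raw bytes. Note that this loses
    #    the difference between e.g. "%2F" and "/", which is exactly the
    #    kind of guess being discussed here.
    unescaped = unquote_to_bytes(raw)

    # 2. Charset translation: interpret the bytes in the user's charset.
    #    The resulting str can be serialized with .encode("utf-8").
    text = unescaped.decode(input_charset)

    # 3. Protocol extension: prepend a scheme if the user left it out.
    if not _SCHEME_RE.match(text):
        text = f"{default_scheme}://{text}"
    return text


print(normalize_user_url(b"example.com/a%20b"))  # http://example.com/a b
```

Only after these steps would the RFC-conforming parsing and re-escaping take place. A strict mode would skip the three guessing steps and reject anything that is not already a valid URL.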
@PetrPisar Regarding the filename: it is also user input. The problem is that the wget author(s) made some decisions in the past on how to treat user input. There is no black and white here, and any decision has its pros and cons. I think part of the problem is that URLs on web sites are often printed in their escaped form, and wget users explicitly wanted to use copy&paste (from web site to console).

The next aspect is that we don't want to change a long-standing (default) behavior. That would break (production) scripts and command lines. What we can possibly do is add a new '--strict-input' option that skips the 'guessing' and instead assumes a 100% valid URL. BTW, this is a good idea for wget2 ;-)

I agree that "wget [option]... [URL]..." is not 100% correct in terms of the RFCs. But wget is also a user tool, and normal users don't have the RFCs in mind when they think about URLs.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60494>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/