Follow-up Comment #10, bug #60287 (project wget):

Without converting charsets, it would be difficult to rely on certain library functions and to support certain features.
For example, locale-dependent C library functions work only with the locale's encoding, and will produce wrong results if presented with strings encoded differently. The IRI support needs to work in UTF-8 internally. And when writing Web pages to disk, Wget needs to encode the page name so that the local filesystem will accept it as a file name. That is why conversion to the locale's charset is rather necessary.

Using the original bytes might work for some operations, but not for others, so keeping the original bytes would need some logic for where they can and cannot be used, which is a complication. It is better to convert once, and then forget about it.

The 404 error is most probably because Wget does attempt to convert the encoding, but does so incorrectly when you don't tell it the actual encodings. So the re-encoded URL is garbled.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60287>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
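
To illustrate how a re-encoded URL can end up garbled, as described in the comment, here is a minimal Python sketch. It is not Wget's actual code (Wget is written in C); the function name `reencode_path` and the sample path are hypothetical, and the assumption is simply that the raw bytes are decoded with a guessed encoding and then percent-encoded as UTF-8.

```python
from urllib.parse import quote


def reencode_path(raw: bytes, assumed_encoding: str) -> str:
    """Hypothetical helper: decode raw URL bytes using the assumed
    source encoding, then percent-encode the result as UTF-8."""
    text = raw.decode(assumed_encoding)
    # quote() encodes str input as UTF-8 by default; "/" is left as-is.
    return quote(text, safe="/")


# Bytes of the path as they appear in a page that is really UTF-8.
raw = "/café.html".encode("utf-8")

# Correct guess: the UTF-8 bytes survive the round trip.
correct = reencode_path(raw, "utf-8")

# Wrong guess: each UTF-8 byte is treated as a Latin-1 character
# and encoded again, so the server is asked for a different path.
garbled = reencode_path(raw, "iso-8859-1")

print(correct)  # /caf%C3%A9.html
print(garbled)  # /caf%C3%83%C2%A9.html
```

The second request names a resource that does not exist on the server, which is consistent with a 404 response when the actual encodings are not given.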