> Date: Wed, 19 Aug 2015 01:43:51 +0200
> From: Ángel González <keis...@gmail.com>
>
> +int
> +wc_utime (unsigned char *filename, struct _utimbuf *times)
> +{
> +  wchar_t *w_filename;
> +  int buffer_size;
> +
> +  buffer_size = sizeof (wchar_t) * MultiByteToWideChar(65001, 0,
>    filename, -1,
>    w_filename, 0);
> +  w_filename = alloca (buffer_size);
> +  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
> +  return _wutime (w_filename, times);
> +}
>
> and similar for stat, open, etc. Something similar is what would be
> needed on Windows?
> Is his patch usable? Maybe I also commented a little in
> http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00081.html
> but after that nothing happened, it seems.
>
> That would probably work, but would need a review. On a quick look, some of
> the functions have memory leaks (seems he first used malloc, then changed to
> alloca just some of them).

Indeed.  Actually, there's no need to allocate memory dynamically,
neither with malloc nor with alloca, since Windows file names have a
fixed size limit that is known in advance.  So each conversion
function can use a fixed-size local wchar_t array.  Doing that will
also avoid the need for 2 calls to MultiByteToWideChar, the first one
made only to find out how much space to allocate.

> And of course, there's the question of what to do if the filename we are
> trying to convert to utf-16 is not in fact valid utf-8.

The calls to MultiByteToWideChar should pass a flag
(MB_ERR_INVALID_CHARS) in the 2nd argument, which makes the function
fail with a distinct error code in that case.  When it fails like
that, the wc_* wrappers should simply call the "normal" unibyte
functions with the original 'char *' argument.  This makes the
modified code fall back on the previous behavior when the source file
names are not in UTF-8.

And regardless, wget should convert to the locale's codeset (on all
platforms).  Once the above patches are accepted, the Windows build
will pretend that its locale's codeset is UTF-8, and that will ensure
the conversions with MultiByteToWideChar will work in most situations.