Dear all,
Preface
=======
I'd like to contribute a patch to HDF5, and this appears to be the
appropriate place to send it. If I am mistaken, I'd appreciate a
pointer to the right place. (It would be great if the website had
some prominent information about how to contribute to HDF5. Also, I
couldn't find one: does HDF5 have a public version control repository,
such as Git or SVN?)
Problem description
===================
Windows has two different representations of filenames: 8-bit "ANSI"
(a locale-dependent code page) and 16-bit "Unicode" (effectively
UTF-16). In the 8-bit representation the lower 128 values correspond
to ASCII, while the meaning of the upper 128 values depends on the
locale settings of the computer; in Germany, for example, code page
1252 is typically used (very similar, but not identical, to
ISO-8859-1).
When using standardized C / POSIX functions as HDF5 does (open, fopen,
etc.), which accept 8-bit strings, they will always assume the local
8-bit encoding. The problem is that the local 8-bit encoding can never
represent all filenames that the operating system supports, because no
fixed 8-bit encoding can cover all Unicode characters. Furthermore,
some languages have so many characters that a fixed 8-bit encoding
cannot even represent that one language completely.
This in turn means that on Windows systems it is possible to have HDF5
fail to open a file if the file name (or the directory that contains it)
contains characters that are not representable in the local 8-bit
encoding of the system. For example, on a typical US Windows
installation it is not possible to use HDF5 to store files with names
that contain e.g. Japanese characters, even though the operating system
itself does support these.
To actually access all possible files, Microsoft offers alternatives to
the standard functions that accept UTF-16 filenames in the form of
wchar_t strings. There is _wopen() instead of open(), and _wfopen()
instead of fopen().
(For reference: other operating systems, such as Linux and Mac OS X,
always represent filenames as 8-bit strings; the operating system often
does not care about the precise encoding and leaves it up to the
software itself (though in practice this most likely will be UTF-8
nowadays), which means that the standard 8-bit APIs can always be used
to access any file on disk.)
An example of the consequences: in a GUI application, the user chooses
a file from a "File Open" dialog, and the file name is converted
appropriately and passed to HDF5. HDF5 then cannot load the file (which
the user chose in the same application) because the name of the file
(or of a directory containing it) contains characters that can't be
represented in the local code page.
Rejected solutions
==================
The most obvious solution would be to simply provide additional
functions in HDF5 that also accept wchar_t filenames on Windows systems.
However, HDF5 has a large number of methods that simply pass through
file names (or maybe even manipulate them a bit) and this would lead to
a huge duplication of existing code, which I don't believe is a good
idea for the long-term maintenance of HDF5.
An alternative suggestion (see e.g. [1]) would be to always assume on
Windows systems that the filename supplied is encoded in UTF-8 (which,
due to being variable-length, can represent all possible characters) and
convert it to UTF-16 before passing it to the wide functions (_wopen,
_wfopen) directly. This has the advantage that now all filenames can be
represented. However, this has the huge disadvantage that most software
does not expect HDF5 to accept UTF-8-encoded file names. If a program
converts a string that it got from a "File Open" dialog into the local
8-bit code page (as many programs do now), any character in the local
code page beyond ASCII would cease to work (as UTF-8 encodes them
differently). For example, since the German umlauts Ä, Ö, Ü can be
represented in the local code page, file names with these characters can
actually be opened on Windows systems with HDF5 at the moment (when
using German locale settings, at least), and this change would break
existing programs if it were to be added to HDF5 itself unconditionally.
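To make the incompatibility concrete, here is a minimal sketch in plain C. The helper is_valid_utf8() and the sample file names are made up for illustration and are not part of HDF5 or of the patch, and the check is deliberately simplified. It shows that the code page 1252 spelling of a name containing "Ä" (the single byte 0xC4) is not well-formed UTF-8, whereas UTF-8 encodes the same character as the two bytes 0xC3 0x84:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of HDF5): returns 1 if the NUL-terminated
 * string s looks like well-formed UTF-8, 0 otherwise. Simplified: it checks
 * lead and continuation bytes and rejects overlong 2-byte forms, but it does
 * not reject every overlong 3/4-byte form or surrogate encoding. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p) {
        size_t extra; /* number of continuation bytes expected */

        if (*p < 0x80)                      extra = 0; /* ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF)  extra = 1; /* 2-byte lead */
        else if ((*p & 0xF0) == 0xE0)       extra = 2; /* 3-byte lead */
        else if (*p >= 0xF0 && *p <= 0xF4)  extra = 3; /* 4-byte lead */
        else return 0; /* stray continuation byte or invalid lead byte */
        p++;
        while (extra--) {
            /* A NUL terminator also fails this test, so we never overrun. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            p++;
        }
    }
    return 1;
}

/* The same (made-up) file name "Ärger.h5" in both encodings. */
static const char name_cp1252[] = "\xC4" "rger.h5";     /* code page 1252 */
static const char name_utf8[]   = "\xC3\x84" "rger.h5"; /* UTF-8 */
```

A library that unconditionally interpreted file names as UTF-8 would have to reject or misinterpret the first string, which is exactly what existing programs pass today.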
Proposed solution
=================
I'd like to propose the following solution instead. It is based on the
UTF-8 encoding idea, but keeps compatibility with existing software.
- Default behavior: HDF5 behaves as it currently does and calls the
  standard "ANSI" open(), fopen(), etc. functions. It will hence
  continue to work with characters in the local code page.
- Add a boolean to the file access property list that may be used to
  indicate that the file name is in UTF-8 on Windows systems (the
  boolean will be ignored on all other operating systems):
      H5Pset_windows_unicode_filenames(fapl, TRUE);
- Update the filesystem drivers to check for this flag, and if it
  is set to actually do a conversion from UTF-8 to UTF-16 and then
  call the corresponding wide functions.
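The dispatch described in the last bullet could look roughly like the following standalone sketch. It is not the code from the patch: fopen_utf8_aware() is a hypothetical name, the flag is passed as a plain int instead of being read from the FAPL, and the fixed-size buffers are a simplification that makes overly long paths fail instead of allocating.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not HDF5 API: open `name` with fopen() semantics.
 * On Windows, if name_is_utf8 is nonzero, the name is converted from UTF-8
 * to UTF-16 and the wide function is used. In every other case this falls
 * through to plain fopen(), i.e. the current behavior. */
static FILE *fopen_utf8_aware(const char *name, const char *mode, int name_is_utf8)
{
    assert(name != NULL && mode != NULL);
#ifdef _WIN32
    if (name_is_utf8) {
        wchar_t wname[MAX_PATH]; /* simplification: fixed-size buffers */
        wchar_t wmode[8];

        /* MultiByteToWideChar() returns 0 on failure. */
        if (!MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, MAX_PATH) ||
            !MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 8))
            return NULL;
        return _wfopen(wname, wmode);
    }
#else
    (void)name_is_utf8; /* the flag is ignored outside Windows */
#endif
    return fopen(name, mode);
}
```

Because the narrow path is taken whenever the flag is off, callers that never set it see no change in behavior.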
The advantage is that current code doesn't break, but users who want to
properly support Windows can actually do so; they just need to ensure
that they encode their filenames in UTF-8. The other main advantage is
that the patch is not very invasive.
I've attached a patch (against 1.10.1) that implements this. The
following is currently supported:
- Property list flag accessors:
      H5Pset_windows_unicode_filenames(fapl, value);
      H5Pget_windows_unicode_filenames(fapl, &value);
- SEC2/Windows driver
- Core driver
- stdio driver
I've successfully tested this in the following configurations on a
Windows 10 system with German locale (using MinGW-w64/gcc 7.2.0 as the
compiler, 64-bit):
- Flag not set, files with umlauts, calling HDF5 with the file names
  encoded in the current code page. (Compatibility check for existing
  software.)
- Flag set, lots of different test cases (file names in pure ASCII,
  German umlauts, Japanese characters, Hebrew characters, Arabic
  characters), calling HDF5 with the file names encoded in UTF-8
  and the flag set in the FAPL before calling the HDF5 functions.
I tested all three drivers (SEC2, Core, stdio) in both cases.
I also tested that the patch doesn't break on Linux (Debian 9, gcc
7.2.0, 64-bit x86) to ensure that the changes don't harm non-Windows
platforms.
What should work, but I haven't tested it:
- The FAMILY driver, as that just passes through the FAPL to the
  underlying driver, and since UTF-8 is ASCII-compatible, any
  manipulation done in the driver should be safe as well.
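The ASCII-compatibility argument can be illustrated with a small sketch. It relies on the fact that family file names carry a printf-style integer specifier such as "%05u"; the helper member_name() and the template below are made up for illustration and are not taken from the driver. Every byte of a multi-byte UTF-8 sequence is 0x80 or larger, so no part of a non-ASCII character can be mistaken for '%' or any other ASCII character during formatting:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Made-up family-style template: "Messung_Ä_%05u.h5" with Ä in UTF-8. */
static const char family_template[] = "Messung_\xC3\x84" "_%05u.h5";

/* Hypothetical helper: expand a member-name template the way a
 * printf-based driver would. Returns the snprintf() result. */
static int member_name(char *out, size_t out_size, const char *tmpl, unsigned member)
{
    assert(out != NULL && tmpl != NULL);
    return snprintf(out, out_size, tmpl, member);
}

/* Count how often the ASCII character c occurs in s. For UTF-8 input this
 * only ever matches real ASCII characters, never bytes inside a multi-byte
 * sequence. */
static size_t count_ascii(const char *s, char c)
{
    size_t n = 0;

    for (; *s; s++)
        if (*s == c)
            n++;
    return n;
}
```

Expanding the template for member 3 leaves the two UTF-8 bytes of "Ä" untouched and only replaces the "%05u" part.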
What I believe doesn't make sense to implement:
- The direct I/O driver. It appears to contain some Windows code, but
  the CMake build system will never build it on Windows, so I left
  that out. If that is wrong and the direct I/O driver should work on
  Windows, I'll be happy to update the patch.
What I didn't implement yet:
- C++, Fortran and Java wrappers for the FAPL flag getters/setters
- External File Lists (EFL) support (H5Defl.c)
- HDF5 plugin libraries (H5PL.c)
- Logging driver (H5FDlog.c)
- Cache logging (H5Clog.c)
Feedback is appreciated, and it would be fantastic if this could be
included in a future version of HDF5. I would be willing to help out
with the missing pieces. I do think that those can be added
incrementally, and the current patch already improves the state of
affairs on Windows quite a bit.
For the avoidance of doubt: my employer agrees to license these changes
under the same license that HDF5 1.10.1 is licensed under.
Best regards,
Christian
References:
[1] https://tschoonj.github.io/blog/2014/11/06/hdf5-on-windows-utf-8-filenames-support/
diff -ruNp hdf5-1.10.1.orig/src/H5FDcore.c hdf5-1.10.1/src/H5FDcore.c
--- hdf5-1.10.1.orig/src/H5FDcore.c 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5FDcore.c 2017-12-20 09:18:12.097603400 +0100
@@ -591,6 +591,11 @@ done:
* Programmer: Robb Matzke
* Thursday, July 29, 1999
*
+ * Modifications:
+ * Christian Seiler
+ * December 2017
+ * Support Windows Unicode filenames
+ *
*-------------------------------------------------------------------------
*/
static H5FD_t *
@@ -602,6 +607,7 @@ H5FD__core_open(const char *name, unsign
H5P_genplist_t *plist; /* Property list pointer */
#ifdef H5_HAVE_WIN32_API
struct _BY_HANDLE_FILE_INFORMATION fileinfo;
+ hbool_t filename_is_utf8 = FALSE;
#endif
h5_stat_t sb;
int fd = -1;
@@ -633,33 +639,69 @@ H5FD__core_open(const char *name, unsign
if(H5P_peek(plist, H5F_ACS_FILE_IMAGE_INFO_NAME, &file_image_info) < 0)
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, NULL, "can't get initial file image info")
+#ifdef H5_HAVE_WIN32_API
+ /* Retrieve the Windows Unicode filenames setting */
+ if(H5P_peek(plist, H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME, &filename_is_utf8) < 0)
+ HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, NULL, "can't get setting whether filename is Unicode")
+#endif
+
/* If the file image exists and this is an open, make sure the file doesn't exist */
HDassert(((file_image_info.buffer != NULL) && (file_image_info.size > 0)) ||
((file_image_info.buffer == NULL) && (file_image_info.size == 0)));
HDmemset(&sb, 0, sizeof(sb));
if((file_image_info.buffer != NULL) && !(H5F_ACC_CREAT & flags)) {
- if(HDopen(name, o_flags, 0666) >= 0)
+#ifdef H5_HAVE_WIN32_API
+ if(filename_is_utf8)
+ fd = HDopenw32u(name, o_flags, 0666);
+ else
+ fd = HDopen(name, o_flags, 0666);
+#else
+ fd = HDopen(name, o_flags, 0666);
+#endif
+ if(fd >= 0) {
+ HDclose(fd);
HGOTO_ERROR(H5E_FILE, H5E_FILEEXISTS, NULL, "file already exists")
+ }
/* If backing store is requested, create and stat the file
* Note: We are forcing the O_CREAT flag here, even though this is
* technically an open.
*/
if(fa->backing_store) {
- if((fd = HDopen(name, o_flags | O_CREAT, 0666)) < 0)
+#ifdef H5_HAVE_WIN32_API
+ if(filename_is_utf8)
+ fd = HDopenw32u(name, o_flags | O_CREAT, 0666);
+ else
+ fd = HDopen(name, o_flags | O_CREAT, 0666);
+#else
+ fd = HDopen(name, o_flags | O_CREAT, 0666);
+#endif
+ if(fd < 0)
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to create file")
- if(HDfstat(fd, &sb) < 0)
+ if(HDfstat(fd, &sb) < 0) {
+ HDclose(fd);
HSYS_GOTO_ERROR(H5E_FILE, H5E_BADFILE, NULL, "unable to fstat file")
+ }
} /* end if */
} /* end if */
/* Open backing store, and get stat() from file. The only case that backing
* store is off is when the backing_store flag is off and H5F_ACC_CREAT is
* on. */
else if(fa->backing_store || !(H5F_ACC_CREAT & flags)) {
- if((fd = HDopen(name, o_flags, 0666)) < 0)
+#ifdef H5_HAVE_WIN32_API
+ if(filename_is_utf8)
+ fd = HDopenw32u(name, o_flags, 0666);
+ else
+ fd = HDopen(name, o_flags, 0666);
+#else
+ fd = HDopen(name, o_flags, 0666);
+#endif
+ if(fd < 0)
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to open file")
- if(HDfstat(fd, &sb) < 0)
+ if(HDfstat(fd, &sb) < 0) {
+ HDclose(fd);
HSYS_GOTO_ERROR(H5E_FILE, H5E_BADFILE, NULL, "unable to fstat file")
+ }
} /* end if */
/* Create the new file struct */
diff -ruNp hdf5-1.10.1.orig/src/H5FDsec2.c hdf5-1.10.1/src/H5FDsec2.c
--- hdf5-1.10.1.orig/src/H5FDsec2.c 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5FDsec2.c 2017-12-20 09:18:32.820811000 +0100
@@ -304,6 +304,11 @@ done:
* Programmer: Robb Matzke
* Thursday, July 29, 1999
*
+ * Modifications:
+ * Christian Seiler
+ * December 2017
+ * Support Windows Unicode filenames
+ *
*-------------------------------------------------------------------------
*/
static H5FD_t *
@@ -312,8 +317,10 @@ H5FD_sec2_open(const char *name, unsigne
H5FD_sec2_t *file = NULL; /* sec2 VFD info */
int fd = -1; /* File descriptor */
int o_flags; /* Flags for open() call */
+ H5P_genplist_t *plist; /* Property list pointer */
#ifdef H5_HAVE_WIN32_API
struct _BY_HANDLE_FILE_INFORMATION fileinfo;
+ hbool_t filename_is_utf8 = FALSE;
#endif
h5_stat_t sb;
H5FD_t *ret_value = NULL; /* Return value */
@@ -340,8 +347,27 @@ H5FD_sec2_open(const char *name, unsigne
if(H5F_ACC_EXCL & flags)
o_flags |= O_EXCL;
+#ifdef H5_HAVE_WIN32_API
+ if(H5P_DEFAULT != fapl_id) {
+ if(NULL == (plist = (H5P_genplist_t *)H5I_object(fapl_id)))
+ HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, NULL, "not a file access property list")
+
+ /* Retrieve the Windows Unicode filenames setting */
+ if(H5P_peek(plist, H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME, &filename_is_utf8) < 0)
+ HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, NULL, "can't get setting whether filename is Unicode")
+ }
+#endif
+
/* Open the file */
- if((fd = HDopen(name, o_flags, 0666)) < 0) {
+#ifdef H5_HAVE_WIN32_API
+ if(filename_is_utf8)
+ fd = HDopenw32u(name, o_flags, 0666);
+ else
+ fd = HDopen(name, o_flags, 0666);
+#else
+ fd = HDopen(name, o_flags, 0666);
+#endif
+ if(fd < 0) {
int myerrno = errno;
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to open file: name = '%s', errno = %d, error message = '%s', flags = %x, o_flags = %x", name, myerrno, HDstrerror(myerrno), flags, (unsigned)o_flags);
} /* end if */
diff -ruNp hdf5-1.10.1.orig/src/H5FDstdio.c hdf5-1.10.1/src/H5FDstdio.c
--- hdf5-1.10.1.orig/src/H5FDstdio.c 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5FDstdio.c 2017-12-20 10:16:14.291570700 +0100
@@ -46,6 +46,7 @@
#include <windows.h>
#include <io.h>
+#include <wchar.h>
#endif /* H5_HAVE_WIN32_API */
@@ -177,6 +178,12 @@ static herr_t H5FD_stdio_truncate(H5FD_t
static herr_t H5FD_stdio_lock(H5FD_t *_file, hbool_t rw);
static herr_t H5FD_stdio_unlock(H5FD_t *_file);
+static FILE *H5FD_stdio_fopen_maybe_wide(const char *name, hbool_t use_wide_api, const char *mode);
+static FILE *H5FD_stdio_freopen_maybe_wide(const char *name, hbool_t use_wide_api, const char *mode, FILE *f);
+#ifdef H5_HAVE_WIN32_API
+static wchar_t *wide_from_utf8(const char *string);
+#endif
+
static const H5FD_class_t H5FD_stdio_g = {
"stdio", /* name */
MAXADDR, /* maxaddr */
@@ -318,7 +325,7 @@ H5Pset_fapl_stdio(hid_t fapl_id)
*-------------------------------------------------------------------------
*/
static H5FD_t *
-H5FD_stdio_open( const char *name, unsigned flags, hid_t /*UNUSED*/ fapl_id,
+H5FD_stdio_open( const char *name, unsigned flags, hid_t fapl_id,
haddr_t maxaddr)
{
FILE *f = NULL;
@@ -330,7 +337,7 @@ H5FD_stdio_open( const char *name, unsig
#else /* H5_HAVE_WIN32_API */
struct stat sb;
#endif /* H5_HAVE_WIN32_API */
-
+ hbool_t use_wide_windows_api = false;
/* Sanity check on file offsets */
assert(sizeof(file_offset_t) >= sizeof(size_t));
@@ -348,17 +355,22 @@ H5FD_stdio_open( const char *name, unsig
if (ADDR_OVERFLOW(maxaddr))
H5Epush_ret(func, H5E_ERR_CLS, H5E_ARGS, H5E_OVERFLOW, "maxaddr too large", NULL)
+#ifdef H5_HAVE_WIN32_API
+ if (H5P_DEFAULT != fapl_id)
+ H5Pget_windows_unicode_filenames(fapl_id, &use_wide_windows_api);
+#endif
+
/* Tentatively open file in read-only mode, to check for existence */
if(flags & H5F_ACC_RDWR)
- f = fopen(name, "rb+");
+ f = H5FD_stdio_fopen_maybe_wide(name, use_wide_windows_api, "rb+");
else
- f = fopen(name, "rb");
+ f = H5FD_stdio_fopen_maybe_wide(name, use_wide_windows_api, "rb");
if(!f) {
/* File doesn't exist */
if(flags & H5F_ACC_CREAT) {
assert(flags & H5F_ACC_RDWR);
- f = fopen(name, "wb+");
+ f = H5FD_stdio_fopen_maybe_wide(name, use_wide_windows_api, "wb+");
write_access = 1; /* Note the write access */
}
else
@@ -370,7 +382,7 @@ H5FD_stdio_open( const char *name, unsig
H5Epush_ret(func, H5E_ERR_CLS, H5E_IO, H5E_FILEEXISTS, "file exists but CREAT and EXCL were specified", NULL)
} else if(flags & H5F_ACC_RDWR) {
if(flags & H5F_ACC_TRUNC)
- f = freopen(name, "wb+", f);
+ f = H5FD_stdio_freopen_maybe_wide(name, use_wide_windows_api, "wb+", f);
write_access = 1; /* Note the write access */
} /* end if */
/* Note there is no need to reopen if neither TRUNC nor EXCL are specified,
@@ -1163,6 +1175,140 @@ H5FD_stdio_unlock(H5FD_t *_file)
} /* end H5FD_stdio_unlock() */
+/*-------------------------------------------------------------------------
+ * Function: H5FD_stdio_fopen_maybe_wide
+ *
+ * Purpose: fopen() a file, potentially using Unicode filename APIs on
+ * Windows systems. Will always call fopen() on non-Windows
+ * systems.
+ *
+ * Return: A file handle as returned by fopen(), or NULL on failure.
+ *
+ * Programmer: Christian Seiler; December 2017
+ *
+ *-------------------------------------------------------------------------
+ */
+static FILE *
+H5FD_stdio_fopen_maybe_wide(const char *name, hbool_t use_wide_api, const char *mode)
+{
+#ifdef H5_HAVE_WIN32_API
+ if(use_wide_api) {
+ FILE *f = NULL;
+ int error;
+ wchar_t *wide_name = wide_from_utf8(name);
+ wchar_t *wide_mode = wide_from_utf8(mode);
+ if(NULL == wide_name || NULL == wide_mode) {
+ error = errno;
+ /* free() will ignore NULL pointers and only one of the
+ * conversions may have failed.
+ */
+ free(wide_name);
+ free(wide_mode);
+ errno = error;
+ return NULL;
+ }
+ f = _wfopen(wide_name, wide_mode);
+ error = errno;
+ free(wide_name);
+ free(wide_mode);
+ errno = error;
+ return f;
+ }
+#else
+ (void) use_wide_api;
+#endif
+ return fopen(name, mode);
+} /* end H5FD_stdio_fopen_maybe_wide() */
+
+
+/*-------------------------------------------------------------------------
+ * Function: H5FD_stdio_freopen_maybe_wide
+ *
+ * Purpose: freopen() a file, potentially using Unicode filename APIs
+ * on Windows systems. Will always call freopen() on
+ * non-Windows systems.
+ *
+ * Return: A file handle as returned by freopen(), or NULL on failure.
+ *
+ * Programmer: Christian Seiler; December 2017
+ *
+ *-------------------------------------------------------------------------
+ */
+static FILE *
+H5FD_stdio_freopen_maybe_wide(const char *name, hbool_t use_wide_api, const char *mode, FILE *f)
+{
+#ifdef H5_HAVE_WIN32_API
+ if(use_wide_api) {
+ FILE *ret = NULL; /* must not shadow the stream parameter f */
+ int error;
+ wchar_t *wide_name = wide_from_utf8(name);
+ wchar_t *wide_mode = wide_from_utf8(mode);
+ if(NULL == wide_name || NULL == wide_mode) {
+ error = errno;
+ /* free() will ignore NULL pointers and only one of the
+ * conversions may have failed.
+ */
+ free(wide_name);
+ free(wide_mode);
+ errno = error;
+ return NULL;
+ }
+ ret = _wfreopen(wide_name, wide_mode, f);
+ error = errno;
+ free(wide_name);
+ free(wide_mode);
+ errno = error;
+ return ret;
+ }
+#else
+ (void) use_wide_api;
+#endif
+ return freopen(name, mode, f);
+} /* end H5FD_stdio_freopen_maybe_wide() */
+
+
+#ifdef H5_HAVE_WIN32_API
+/*-------------------------------------------------------------------------
+ * Function: wide_from_utf8
+ *
+ * Purpose: Converts a UTF-8 string to a wide string. As this driver
+ * should only use public methods it may not use
+ * Wfilename_from_utf8 from the private HDF5 API. Hence it is
+ * necessary to reimplement this functionality.
+ *
+ * Return: A freshly allocated wide string converted from the UTF-8
+ * string that was supplied to this function.
+ *
+ * Programmer: Christian Seiler; December 2017
+ *
+ *-------------------------------------------------------------------------
+ */
+static wchar_t *wide_from_utf8(const char *string)
+{
+ wchar_t *buffer;
+ int len;
+
+ if(0 >= (len = MultiByteToWideChar(CP_UTF8, 0, string, -1, NULL, 0))) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if(NULL == (buffer = (wchar_t *)malloc(len * sizeof(wchar_t)))) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ if(0 >= (len = MultiByteToWideChar(CP_UTF8, 0, string, -1, buffer, len))) {
+ free(buffer);
+ errno = EINVAL;
+ return NULL;
+ }
+
+ return buffer;
+} /* end wide_from_utf8() */
+#endif
+
+
#ifdef _H5private_H
/*
* This is not related to the functionality of the driver code.
diff -ruNp hdf5-1.10.1.orig/src/H5Fprivate.h hdf5-1.10.1/src/H5Fprivate.h
--- hdf5-1.10.1.orig/src/H5Fprivate.h 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5Fprivate.h 2017-12-19 13:05:58.570599500 +0100
@@ -499,6 +499,7 @@
#define H5F_ACS_PAGE_BUFFER_SIZE_NAME "page_buffer_size" /* the maximum size for the page buffer cache */
#define H5F_ACS_PAGE_BUFFER_MIN_META_PERC_NAME "page_buffer_min_meta_perc" /* the min metadata percentage for the page buffer cache */
#define H5F_ACS_PAGE_BUFFER_MIN_RAW_PERC_NAME "page_buffer_min_raw_perc" /* the min raw data percentage for the page buffer cache */
+#define H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME "windows_unicode_filenames" /* Setting: whether Unicode filenames are to be used on Windows */
/* ======================== File Mount properties ====================*/
#define H5F_MNT_SYM_LOCAL_NAME "local" /* Whether absolute symlinks local to file. */
diff -ruNp hdf5-1.10.1.orig/src/H5Pfapl.c hdf5-1.10.1/src/H5Pfapl.c
--- hdf5-1.10.1.orig/src/H5Pfapl.c 2017-04-25 23:57:47.000000000 +0200
+++ hdf5-1.10.1/src/H5Pfapl.c 2017-12-19 13:27:26.407584600 +0100
@@ -244,6 +244,9 @@
#define H5F_ACS_PAGE_BUFFER_MIN_RAW_PERC_DEF 0
#define H5F_ACS_PAGE_BUFFER_MIN_RAW_PERC_ENC H5P__encode_unsigned
#define H5F_ACS_PAGE_BUFFER_MIN_RAW_PERC_DEC H5P__decode_unsigned
+/* Definition for setting for Windows Unicode filenames */
+#define H5F_ACS_WINDOWS_UNICODE_FILENAMES_SIZE sizeof(hbool_t)
+#define H5F_ACS_WINDOWS_UNICODE_FILENAMES_DEF FALSE
/******************/
@@ -375,6 +378,7 @@ static const H5AC_cache_image_config_t H
static const size_t H5F_def_page_buf_size_g = H5F_ACS_PAGE_BUFFER_SIZE_DEF; /* Default page buffer size */
static const unsigned H5F_def_page_buf_min_meta_perc_g = H5F_ACS_PAGE_BUFFER_MIN_META_PERC_DEF; /* Default page buffer minimum metadata size */
static const unsigned H5F_def_page_buf_min_raw_perc_g = H5F_ACS_PAGE_BUFFER_MIN_RAW_PERC_DEF; /* Default page buffer minumum raw data size */
+static const hbool_t H5F_def_windows_unicode_filenames_g = H5F_ACS_WINDOWS_UNICODE_FILENAMES_DEF; /* Default setting for Unicode filenames on Windows */
/*-------------------------------------------------------------------------
@@ -606,6 +610,11 @@ H5P__facc_reg_prop(H5P_genclass_t *pclas
NULL, NULL, NULL, NULL) < 0)
HGOTO_ERROR(H5E_PLIST, H5E_CANTINSERT, FAIL, "can't insert property into class")
+ /* Register the property of whether Unicode filenames are to be used on Windows. */
+ if(H5P_register_real(pclass, H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME, H5F_ACS_WINDOWS_UNICODE_FILENAMES_SIZE, &H5F_def_windows_unicode_filenames_g,
+ NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL) < 0)
+ HGOTO_ERROR(H5E_PLIST, H5E_CANTINSERT, FAIL, "can't insert property into class")
+
done:
FUNC_LEAVE_NOAPI(ret_value)
} /* end H5P__facc_reg_prop() */
@@ -4807,3 +4816,71 @@ done:
FUNC_LEAVE_API(ret_value)
} /* end H5Pget_page_buffer_size() */
+
+/*-------------------------------------------------------------------------
+ * Function: H5Pset_windows_unicode_filenames
+ *
+ * Purpose: Indicates that filenames are encoded in UTF-8 on Windows
+ * systems. (This setting is ignored on other platforms.)
+ *
+ * Return: Non-negative on success/Negative on failure
+ *
+ * Programmer: Christian Seiler
+ * December 2017
+ *
+ *-------------------------------------------------------------------------
+ */
+herr_t
+H5Pset_windows_unicode_filenames(hid_t plist_id, hbool_t is_enabled)
+{
+ H5P_genplist_t *plist; /* Property list pointer */
+ herr_t ret_value = SUCCEED; /* return value */
+
+ FUNC_ENTER_API(FAIL)
+ H5TRACE2("e", "ib", plist_id, is_enabled);
+
+ /* Get the plist structure */
+ if(NULL == (plist = H5P_object_verify(plist_id, H5P_FILE_ACCESS)))
+ HGOTO_ERROR(H5E_ATOM, H5E_BADATOM, FAIL, "can't find object for ID")
+
+ /* Set the flag */
+ if(H5P_set(plist, H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME, &is_enabled) < 0)
+ HGOTO_ERROR(H5E_PLIST, H5E_CANTSET,FAIL, "can't set setting for Windows Unicode filenames")
+
+done:
+ FUNC_LEAVE_API(ret_value)
+} /* end H5Pset_windows_unicode_filenames() */
+
+
+/*-------------------------------------------------------------------------
+ * Function: H5Pget_windows_unicode_filenames
+ *
+ * Purpose: Retrieves whether filenames are assumed to be encoded as UTF-8
+ *
+ * Return: Non-negative on success/Negative on failure
+ *
+ * Programmer: Christian Seiler
+ * December 2017
+ *
+ *-------------------------------------------------------------------------
+ */
+herr_t
+H5Pget_windows_unicode_filenames(hid_t plist_id, hbool_t *is_enabled)
+{
+ H5P_genplist_t *plist; /* Property list pointer */
+ herr_t ret_value = SUCCEED; /* return value */
+
+ FUNC_ENTER_API(FAIL)
+ H5TRACE2("e", "i*b", plist_id, is_enabled);
+
+ /* Get the plist structure */
+ if(NULL == (plist = H5P_object_verify(plist_id, H5P_FILE_ACCESS)))
+ HGOTO_ERROR(H5E_ATOM, H5E_BADATOM, FAIL, "can't find object for ID")
+
+ if(is_enabled)
+ if(H5P_get(plist, H5F_ACS_WINDOWS_UNICODE_FILENAMES_NAME, is_enabled) < 0)
+ HGOTO_ERROR(H5E_PLIST, H5E_CANTGET,FAIL, "can't get setting for Windows Unicode filenames")
+
+done:
+ FUNC_LEAVE_API(ret_value)
+} /* end H5Pget_windows_unicode_filenames() */
diff -ruNp hdf5-1.10.1.orig/src/H5Ppublic.h hdf5-1.10.1/src/H5Ppublic.h
--- hdf5-1.10.1.orig/src/H5Ppublic.h 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5Ppublic.h 2017-12-19 13:11:06.834656400 +0100
@@ -369,6 +369,8 @@ H5_DLL herr_t H5Pset_mdc_image_config(hi
H5_DLL herr_t H5Pget_mdc_image_config(hid_t plist_id, H5AC_cache_image_config_t *config_ptr /*out*/);
H5_DLL herr_t H5Pset_page_buffer_size(hid_t plist_id, size_t buf_size, unsigned min_meta_per, unsigned min_raw_per);
H5_DLL herr_t H5Pget_page_buffer_size(hid_t plist_id, size_t *buf_size, unsigned *min_meta_per, unsigned *min_raw_per);
+H5_DLL herr_t H5Pset_windows_unicode_filenames(hid_t plist_id, hbool_t is_enabled);
+H5_DLL herr_t H5Pget_windows_unicode_filenames(hid_t plist_id, hbool_t *is_enabled);
/* Dataset creation property list (DCPL) routines */
H5_DLL herr_t H5Pset_layout(hid_t plist_id, H5D_layout_t layout);
diff -ruNp hdf5-1.10.1.orig/src/H5system.c hdf5-1.10.1/src/H5system.c
--- hdf5-1.10.1.orig/src/H5system.c 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5system.c 2017-12-19 16:41:05.965182100 +0100
@@ -911,6 +911,7 @@ Wflock(int fd, int operation) {
} /* end Wflock() */
+
/*--------------------------------------------------------------------------
* Function: Wnanosleep
*
@@ -931,6 +932,117 @@ Wnanosleep(const struct timespec *req, s
} /* end Wnanosleep() */
+ /*--------------------------------------------------------------------------
+ * Function: Wopen_unicode
+ *
+ * Purpose: Open a file on Windows, assuming the file name is encoded in
+ * UTF-8
+ *
+ * Return: File descriptor
+ *
+ * Programmer: Christian Seiler
+ * Winter 2017
+ *--------------------------------------------------------------------------
+ */
+int
+Wopen_unicode(const char* file_name_utf8, int flags, ...)
+{
+ wchar_t *wide_name;
+ int fd;
+ int mode;
+ int error;
+ va_list ap;
+
+ if(NULL == (wide_name = Wfilename_from_utf8(file_name_utf8)))
+ return -1;
+
+ if(0 != (flags & _O_CREAT)) {
+ va_start(ap, flags);
+ mode = va_arg(ap, int);
+ va_end(ap);
+ fd = _wopen(wide_name, flags, mode);
+ } else {
+ fd = _wopen(wide_name, flags);
+ }
+ error = errno;
+ H5MM_free(wide_name);
+ if(0 > fd) {
+ errno = error;
+ return -1;
+ }
+ return fd;
+} /* end Wopen_unicode() */
+
+
+ /*--------------------------------------------------------------------------
+ * Function: Wstat_unicode
+ *
+ * Purpose: Stat a file on Windows, assuming the file name is encoded in
+ * UTF-8
+ *
+ * Return: 0 Success
+ * -1 Failure
+ *
+ * Programmer: Christian Seiler
+ * Winter 2017
+ *--------------------------------------------------------------------------
+ */
+int
+Wstat_unicode(const char* file_name_utf8, h5_stat_t *buf)
+{
+ wchar_t *wide_name;
+ int ret;
+ int error;
+
+ if(NULL == (wide_name = Wfilename_from_utf8(file_name_utf8)))
+ return -1;
+
+ ret = _wstati64(wide_name, buf);
+ error = errno;
+ H5MM_free(wide_name);
+ errno = error;
+ return ret;
+} /* end Wstat_unicode() */
+
+
+ /*--------------------------------------------------------------------------
+ * Function: Wfilename_from_utf8
+ *
+ * Purpose: Convert a filename from UTF-8 to Windows's internal wide
+ * string representation (UTF-16 based)
+ *
+ * Return: The wide string, allocated with H5MM_malloc
+ *
+ * Programmer: Christian Seiler
+ * Winter 2017
+ *--------------------------------------------------------------------------
+ */
+wchar_t *
+Wfilename_from_utf8(const char* file_name_utf8)
+{
+ wchar_t *buffer;
+ int len;
+
+ if(0 >= (len = MultiByteToWideChar(CP_UTF8, 0, file_name_utf8, -1, NULL, 0))) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if(NULL == (buffer = (wchar_t *)H5MM_malloc(len * sizeof(wchar_t)))) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ if(0 >= (len = MultiByteToWideChar(CP_UTF8, 0, file_name_utf8, -1, buffer, len))) {
+ H5MM_free(buffer);
+ errno = EINVAL;
+ return NULL;
+ }
+
+ return buffer;
+} /* end Wfilename_from_utf8() */
+
+
/*-------------------------------------------------------------------------
* Function: Wllround, Wllroundf, Wlround, Wlroundf, Wround, Wroundf
*
diff -ruNp hdf5-1.10.1.orig/src/H5win32defs.h hdf5-1.10.1/src/H5win32defs.h
--- hdf5-1.10.1.orig/src/H5win32defs.h 2017-04-25 23:45:02.000000000 +0200
+++ hdf5-1.10.1/src/H5win32defs.h 2017-12-19 13:15:13.177111200 +0100
@@ -51,11 +51,13 @@ typedef __int64 h5_stat_size
* transformations when performing I/O.
*/
#define HDopen(S,F,M) _open(S,F|_O_BINARY,M)
+#define HDopenw32u(S,F,M) Wopen_unicode(S,F|_O_BINARY,M)
#define HDread(F,M,Z) _read(F,M,Z)
#define HDrmdir(S) _rmdir(S)
#define HDsetvbuf(F,S,M,Z) setvbuf(F,S,M,(Z>1?Z:2))
#define HDsleep(S) Sleep(S*1000)
#define HDstat(S,B) _stati64(S,B)
+#define HDstatw32u(S,B) Wstat_unicode(S,B)
#define HDstrcasecmp(A,B) _stricmp(A,B)
#define HDstrdup(S) _strdup(S)
#define HDtzset() _tzset()
@@ -115,6 +117,9 @@ extern "C" {
H5_DLL int c99_snprintf(char* str, size_t size, const char* format, ...);
H5_DLL int c99_vsnprintf(char* str, size_t size, const char* format, va_list ap);
H5_DLL int Wnanosleep(const struct timespec *req, struct timespec *rem);
+ H5_DLL int Wopen_unicode(const char* file_name_utf8, int flags, ...);
+ H5_DLL int Wstat_unicode(const char* file_name_utf8, h5_stat_t *buf);
+ H5_DLL wchar_t *Wfilename_from_utf8(const char* file_name_utf8);
/* Round functions only needed for VS2012 and earlier.
* They are always built to ensure they don't go stale and