Hello HDF people,

I am relatively new (~5 months) to the HDF concept but I am becoming a true 
enthusiast. 

I am using the Java HDF5 Interface (JHI5) 
<https://support.hdfgroup.org/products/java/JNI3/jhi5/index.html> in Java 
programs. I am interested in simple ways to use HDF5 as a binder for a 
collection of diverse datatypes, with annotations at the level of the file as 
well as at the level of individual datasets. Clearly, the HDF5 Attribute API 
provides a rich framework for annotation. However, I only want String 
annotations, so I am interested in String attributes. 

I want to use UTF-8 encoded Strings of arbitrary length. I was looking for 
some explanation of UTF-8 encoding and I found this:

https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/

In particular this document has the following snippet:
For example, the following commands could be used to create an 8-character, 
UTF-8 encoded, string datatype for use in either an attribute or dataset:
    datatype_id = H5Tcopy(H5T_C_S1) ;
    error =       H5Tset_cset(datatype_id, H5T_CSET_UTF8) ;
    error =       H5Tset_size(datatype_id, "8") ;             
This is puzzling because “set_size” can work properly only if the size in 
bytes required by the UTF-8 String is known in advance. In general this is 
not the case, because a UTF-8 character may take from 1 to 4 bytes. I understand 
that most HDF5 users use ASCII all the time, and in that case this will work. 
Still, in the general case it seems to be plain wrong.

In other words, the only proper way to create a UTF-8 encoded string datatype is 
to provide a function which computes the size in bytes from the string object 
itself. 
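
To illustrate the point, here is a minimal Java sketch of such a size 
computation. It uses only the standard library; the class and method names 
(Utf8Size, utf8Length) are hypothetical and are not part of JHI5 or HDF5. The 
sample strings are written with Unicode escapes so that the source file itself 
stays ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of JHI5 or HDF5.
final class Utf8Size {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abcdefgh",               // plain ASCII: 1 byte per character
            "r\u00e9sum\u00e9",       // two accented letters: 2 bytes each
            "\u65e5\u672c\u8a9e",     // three CJK characters: 3 bytes each
            "\uD83D\uDE00"            // one emoji (surrogate pair): 4 bytes
        };
        for (String s : samples) {
            int codePoints = s.codePointCount(0, s.length());
            System.out.println(codePoints + " code points -> "
                    + utf8Length(s) + " bytes in UTF-8");
        }
    }
}
```

The number of characters and the number of bytes agree only for the ASCII 
sample, so a fixed size of 8 holds 8 characters only in that case.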

In fact, I currently store my String attributes just as 1D byte array datasets. 
It is very easy to convert between Strings and byte arrays in Java, and this 
works fine for me. My only discomfort is that the resulting HDF5 file is not a 
“proper” HDF5 file, in the sense that a third-party reader of my HDF5 file will 
not be able to interpret these attributes without additional information. 
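
The String/byte-array conversion mentioned above can be sketched as follows. 
This is only an illustration using the Java standard library; the class and 
method names (Utf8Bytes, encode, decode) are hypothetical, and the HDF5 
read/write calls are deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of JHI5 or HDF5.
final class Utf8Bytes {

    // String -> UTF-8 bytes, suitable for storing as a 1D byte array.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes read back from the file -> String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";        // 4 characters
        byte[] stored = encode(original);     // 5 bytes in UTF-8
        String restored = decode(stored);
        System.out.println(stored.length + " bytes, round trip ok: "
                + original.equals(restored));
    }
}
```

A reader that sees only the byte array has no way to know that it holds UTF-8 
text, which is exactly the missing information.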

To summarize, I would like to write and read UTF-8 strings as attributes using 
Java and still fully preserve the “self-describing” feature of the HDF5 format. 
Please advise.

Thanks,
Alexey      

 