Hello,

I ran some tests, and there seems to be a limit on the number of
H5TBappend_records() calls in the HDF5 library.

I wrote a simple test in which a for loop keeps appending 1 record to a
table in an h5 file, flushing the file after each append. The process
always hits the same "H5SL_insert_common(): can't insert duplicate key"
error at iteration #322631, i.e., the 322631st H5TBappend_records() call
fails.

An ugly fix is to re-initialize the library at the end of each iteration,
i.e., by adding "H5close()" and "H5open()". With that change the process
runs to the end (iteration 2 million) without hitting the error.

So it seems that a limited key space (the size of some variable) inside
the library causes the duplicate key error?
The error results from the call to "H5Tget_member_type()" in
H5TBappend_records. The trace is:

H5TBappend_table() -> H5TBcreate_type() -> H5Tget_member_type() -> .... ->
H5SL_insert_common() -> ERROR
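
To make the key-space hypothesis concrete, here is a toy model in plain C.
It is not HDF5's code: the function first_failing_iteration(), the bitmap,
and the idea that keys come from a fixed-width wrapping counter with a few
keys that are never released are all assumptions made for illustration.
Each iteration registers and then releases a fixed number of temporary
keys. A duplicate key shows up only when the counter wraps around and lands
on a key that is still live.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Toy model of a key allocator (hypothetical, not the library's logic).
 *   key_bits  : width of the wrapping key counter (1..30)
 *   permanent : keys registered once at start-up and never released
 *   per_iter  : temporary keys registered and released in each iteration
 * Returns the 1-based iteration in which a duplicate key is first seen,
 * 0 if none occurs within max_iters, or -1 on bad arguments.
 */
long first_failing_iteration(unsigned key_bits, uint32_t permanent,
                             uint32_t per_iter, long max_iters)
{
    uint32_t space, next, k;
    uint8_t  *live;   /* live[key] != 0 while the key is registered */
    uint32_t *tmp;    /* keys handed out in the current iteration   */
    long     iter, failed = 0;

    if (key_bits < 1 || key_bits > 30 || per_iter == 0)
        return -1;
    space = (uint32_t)1 << key_bits;
    if (permanent > space)
        return -1;

    live = calloc(space, 1);
    tmp  = malloc((size_t)per_iter * sizeof *tmp);
    if (live == NULL || tmp == NULL) {
        free(live);
        free(tmp);
        return -1;
    }

    /* Permanent keys occupy the lowest key values. */
    for (k = 0; k < permanent; k++)
        live[k] = 1;
    next = permanent & (space - 1);

    for (iter = 1; iter <= max_iters && !failed; iter++) {
        uint32_t n = 0;

        for (k = 0; k < per_iter; k++) {
            uint32_t key = next;
            next = (next + 1) & (space - 1);   /* counter wraps here */
            if (live[key]) {                   /* "duplicate key"    */
                failed = iter;
                break;
            }
            live[key] = 1;
            tmp[n++] = key;
        }
        /* Release this iteration's temporary keys. */
        for (k = 0; k < n; k++)
            live[tmp[k]] = 0;
    }

    free(live);
    free(tmp);
    return failed;
}
```

With a 4-bit counter, 2 permanent keys and 3 temporary keys per iteration,
first_failing_iteration(4, 2, 3, 100) returns 5. With no permanent keys it
returns 0, because the wrapped counter only ever lands on released keys.
For scale: 2^24 = 16,777,216 and 16,777,216 / 322,631 is about 52, so the
failure point I see would be consistent with a 24-bit counter and roughly
52 short-lived IDs per append. Both numbers are guesses; I have not
confirmed them in the library source.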

Is there a quick fix for this?
Maybe modify the H5SL_INSERT macro a little to avoid it?

Best,

Ching-Chia




On Thu, Sep 11, 2014 at 10:51 AM, Ching-Chia Wang <[email protected]>
wrote:

>
>     Hello,
>
> I have encountered an error that seems non-deterministic. Two other
> developers have reported the same issue, but no conclusion was reached
> previously:
>
> 1.
> http://hdf-forum.184993.n3.nabble.com/H5SL-insert-common-can-t-insert-duplicate-key-td4026817.html
> 2.
> http://hdf-forum.184993.n3.nabble.com/H5SL-duplicate-key-errors-with-HDF5-1-8-13-td4027300.html
>
> My process is a single-threaded write from CSV to HDF5 under a number of
> different configurations, varying the compression algorithm (I use the LZO
> and BLOSC filters), the chunk size, and the incoming dataset size (number
> of rows). I only used the HDF5 Table API, with one minor modification to
> H5TBmake_table that lets me use other compressors and compression levels.
> No other changes to the API code.
>
> I have 100 configurations to be used sequentially, and each one creates
> one new H5 file. All configs were fed with the same input csv files. The H5
> file size will be about 2-3GB with compression enabled. The whole process
> takes hours to complete.
>
> One of the configs failed in the function H5TBappend_records(), but when
> I singled out the problematic config and ran the write again with it
> alone, the whole writing process went smoothly. In other words, the error
> appears to occur only when running the 100 configs all at once.
>
>
> The complete error message is as follows:
> -----
> Running Configuration #22/100:
>
> HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 0:
>   #000: H5Tnative.c line 122 in H5Tget_native_type(): unable to register
> data type
>     major: Datatype
>     minor: Unable to register new atom
>   #001: H5I.c line 895 in H5I_register(): can't insert ID node into skip
> list
>     major: Object atom
>     minor: Unable to insert object
>   #002: H5SL.c line 995 in H5SL_insert(): can't create new skip list node
>     major: Skip Lists
>     minor: Unable to insert object
>   #003: H5SL.c line 687 in H5SL_insert_common(): can't insert duplicate key
>     major: Skip Lists
>     minor: Unable to insert object
>
> H5TBappend_table returns negative value at file: test_file.csv
> -----
>
> I am sure that the process did not try to insert the same key to the group
> hierarchy more than once, because as I re-ran the one problematic
> configuration, I could not reproduce the error. The process instead ran
> correctly.
>
> Moreover, on different machines the process encountered the error in
> different configurations.
>
> I checked the code of the function H5SL_insert_common(). Is it possible
> that the hashval is running out of space?
>
> Any idea about the reason behind this error? What is a skip list? Is
> there any possible workaround?
>
>
>
> My HDF5 installation configuration is as follows (built with thread-safe
> already):
>
> ----------------------------------------------------------------
>
>         SUMMARY OF THE HDF5 CONFIGURATION
>         =================================
>
> General Information:
> -------------------
>            HDF5 Version: 1.8.13
>           Configured on: Mon Sep  8 10:45:51 EDT 2014
>           Configured by: hidden
>          Configure mode: production
>             Host system: x86_64-unknown-linux-gnu
>           Uname information: Linux  2.6.32-431.20.3.el6.x86_64 #1 SMP Thu
> Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>                Byte sex: little-endian
>               Libraries: static, shared
>          Installation point: hidden
>
> Compiling Options:
> ------------------
>                Compilation Mode: production
>                      C Compiler: /usr/bin/gcc ( gcc (GCC) 4.6.2 20111027 )
>                          CFLAGS:
>                       H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef
> -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align
> -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes
> -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls
> -Wnested-externs -Winline -Wno-long-long -Wfloat-equal
> -Wmissing-format-attribute -Wmissing-noreturn -Wpacked
> -Wdisabled-optimization -Wformat=2 -Wendif-labels
> -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch
> -Wvariadic-macros -Wnonnull -Winit-self -Wmissing-include-dirs
> -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations
> -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla
> -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat
> -Wstrict-aliasing -Wstrict-overflow=5 -Wjump-misses-init
> -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const
> -Wtrampolines -O3 -fomit-frame-pointer -finline-functions
>                       AM_CFLAGS:
>                        CPPFLAGS:
>                     H5_CPPFLAGS: -D_POSIX_C_SOURCE=199506L   -DNDEBUG
> -UH5_DEBUG_API
>                     AM_CPPFLAGS: -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
> -D_BSD_SOURCE
>                Shared C Library: yes
>                Static C Library: yes
>   Statically Linked Executables: no
>                         LDFLAGS:
>                      H5_LDFLAGS:
>                      AM_LDFLAGS:  -L/usr/local/szlib/lib/lib
>         Extra libraries:  -lpthread -lz -lrt -ldl -lm
>                Archiver: ar
>              Ranlib: ranlib
>           Debugged Packages:
>             API Tracing: no
>
> Languages:
> ----------
>                         Fortran: no
>
>                             C++: yes
>                    C++ Compiler: /usr/bin/c++
>                       C++ Flags:
>                    H5 C++ Flags:
>                    AM C++ Flags:
>              Shared C++ Library: yes
>              Static C++ Library: yes
>
> Features:
> ---------
>                   Parallel HDF5: no
>              High Level library: yes
>                    Threadsafety: yes
>             Default API Mapping: v18
>  With Deprecated Public Symbols: yes
>          I/O filters (external): deflate(zlib)
>          I/O filters (internal): shuffle,fletcher32,nbit,scaleoffset
>                             MPE: no
>                      Direct VFD: no
>                         dmalloc: no
> Clear file buffers before write: yes
>            Using memory checker: no
>          Function Stack Tracing: no
>       Strict File Format Checks: no
>    Optimization Instrumentation: no
>        Large File Support (LFS): yes
>
>
> -----------------------------------------------------------------------
>
> Thanks.
>
> Best,
> Ching-Chia
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
