I observe this comment in the H5Z-blosc code:

    /* Allocate an output buffer exactly as long as the input data; if
       the result is larger, we simply return 0.  The filter is flagged
       as optional, so HDF5 marks the chunk as uncompressed and
       proceeds.
    */

For this reason, my current setup registers the filter without
H5Z_FLAG_MANDATORY, i.e. as optional.  Is this comment accurate for the
collective filtered path, or is it possible that the zero return code
is being treated there as "the compressed data is zero bytes long"?
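
To make the question concrete, here is a stripped-down sketch of the
pattern that comment describes (this is not the actual H5Z-blosc source;
the blosc parameters are hard-coded where the real filter reads them from
cd_values, and 32001 is the registered blosc filter ID):

    #include <stdlib.h>
    #include "hdf5.h"
    #include "blosc.h"

    #define FILTER_BLOSC 32001              /* registered filter ID for blosc */

    /* Compression half of an H5Z_func_t callback (decompression path omitted).
       A return value of 0 tells HDF5 the filter failed; because the filter is
       applied as optional, the chunk is then stored uncompressed rather than
       failing the write. */
    static size_t blosc_filter_sketch(unsigned flags, size_t cd_nelmts,
                                      const unsigned cd_values[], size_t nbytes,
                                      size_t *buf_size, void **buf)
    {
        (void)cd_nelmts; (void)cd_values;   /* real code reads clevel/shuffle/typesize here */

        if (!(flags & H5Z_FLAG_REVERSE)) {  /* compressing */
            void *outbuf = malloc(nbytes);  /* output exactly as long as the input */
            if (outbuf == NULL)
                return 0;

            int csize = blosc_compress(5, 1, 4, nbytes, *buf, outbuf, nbytes);
            if (csize <= 0) {               /* would be larger than the input, or error */
                free(outbuf);
                return 0;                   /* optional filter: HDF5 keeps the raw chunk */
            }

            free(*buf);
            *buf      = outbuf;
            *buf_size = nbytes;
            return (size_t)csize;           /* number of valid bytes in *buf */
        }
        return 0;                           /* H5Z_FLAG_REVERSE path omitted in this sketch */
    }

    /* Applying the filter as optional, i.e. without H5Z_FLAG_MANDATORY
       (the callback itself is registered separately via H5Zregister()): */
    static herr_t add_optional_blosc(hid_t dcpl)
    {
        unsigned cd_values[7] = {0};
        return H5Pset_filter(dcpl, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 7, cd_values);
    }

In the serial path this behaves as the comment says; the question is whether
the collective filtered path interprets that 0 the same way.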



On Thu, Nov 9, 2017 at 1:37 PM, Michael K. Edwards
<[email protected]> wrote:
> Thank you for the explanation.  That's consistent with what I see when
> I add a debug printf into H5D__construct_filtered_io_info_list().  So
> I'm now looking into the filter situation.  It's possible that the
> H5Z-blosc glue is mishandling the case where the compressed data is
> larger than the uncompressed data.
>
> About to write 12 of 20
> About to write 0 of 20
> About to write 0 of 20
> About to write 8 of 20
> Rank 0 selected 12 of 20
> Rank 1 selected 8 of 20
> HDF5-DIAG: Error detected in HDF5 (1.11.0) MPI-process 0:
>   #000: H5Dio.c line 319 in H5Dwrite(): can't prepare for writing data
>     major: Dataset
>     minor: Write failed
>   #001: H5Dio.c line 395 in H5D__pre_write(): can't write data
>     major: Dataset
>     minor: Write failed
>   #002: H5Dio.c line 836 in H5D__write(): can't write data
>     major: Dataset
>     minor: Write failed
>   #003: H5Dmpio.c line 1019 in H5D__chunk_collective_write(): write error
>     major: Dataspace
>     minor: Write failed
>   #004: H5Dmpio.c line 934 in H5D__chunk_collective_io(): couldn't
> finish filtered linked chunk MPI-IO
>     major: Low-level I/O
>     minor: Can't get value
>   #005: H5Dmpio.c line 1474 in
> H5D__link_chunk_filtered_collective_io(): couldn't process chunk entry
>     major: Dataset
>     minor: Write failed
>   #006: H5Dmpio.c line 3278 in
> H5D__filtered_collective_chunk_entry_io(): couldn't unfilter chunk for
> modifying
>     major: Data filters
>     minor: Filter operation failed
>   #007: H5Z.c line 1256 in H5Z_pipeline(): filter returned failure during read
>     major: Data filters
>     minor: Read failed
>
>
>
> On Thu, Nov 9, 2017 at 1:02 PM, Jordan Henderson
> <[email protected]> wrote:
>> For the purpose of collective I/O it is true that all ranks must call
>> H5Dwrite() so that they can participate in those collective operations that
>> are necessary (the file space re-allocation and so on). However, even though
>> they called H5Dwrite() with a valid memspace, the fact that they have a NONE
>> selection in the given file space should cause their chunk-file mapping
>> struct (see lines 357-385 of H5Dpkg.h for the struct's definition and the
>> code for H5D__link_chunk_filtered_collective_io() to see how it uses this
>> built up list of chunks selected in the file) to contain no entries in the
>> "fm->sel_chunks" field. That alone should mean that during the chunk
>> redistribution, they will not actually send anything at all to any of the
>> ranks. They only participate there so that, if the method of redistribution
>> were ever modified, ranks which previously had no chunks selected could
>> potentially be given some chunks to work on.
>>
>>
>> For all practical purposes, every single chunk_entry seen in the list from
>> rank 0's perspective should correspond to a valid I/O caused by some rank
>> writing a positive number of bytes to the chunk. On rank 0's side, you should be able
>> to check the io_size field of each of the chunk_entry entries and see how
>> big the I/O is from the "original_owner" to that chunk. If any of these are
>> 0, something is likely very wrong. If that is indeed the case, you could
>> likely pull a hacky workaround by manually removing them from the list, but
>> I'd be more concerned about the root of the problem if there are zero-size
>> I/O chunk_entry entries being added to the list.
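
For anyone following along, the participation requirement Jordan describes
(every rank calls H5Dwrite(), with a NONE selection in the file space when it
has nothing to write) looks roughly like the sketch below; the function name
and the 1-D layout are illustrative, not taken from my actual test program:

    #include "hdf5.h"

    /* Sketch of a collective filtered write: all ranks make the H5Dwrite()
       call so the collective operations (file-space allocation, chunk
       redistribution) can proceed; a rank with nothing to write passes a
       NONE file-space selection and a zero-sized memory space. */
    static herr_t collective_write_sketch(hid_t dset, hsize_t start,
                                          hsize_t local_count, const int *buf)
    {
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        hid_t filespace = H5Dget_space(dset);
        if (local_count == 0)
            H5Sselect_none(filespace);      /* this rank writes no elements */
        else
            H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL,
                                &local_count, NULL);

        hid_t memspace = H5Screate_simple(1, &local_count, NULL);

        /* every rank makes this call, even when local_count == 0 */
        herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace,
                                 dxpl, buf);

        H5Sclose(memspace);
        H5Sclose(filespace);
        H5Pclose(dxpl);
        return status;
    }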
