luccadibe opened a new pull request, #2369:
URL: https://github.com/apache/systemds/pull/2369

   This PR adds new tests for the HDF5 readers, using test input datasets 
generated by a new R script. It is meant as a first step toward 
[SYSTEMDS-3929](https://issues.apache.org/jira/browse/SYSTEMDS-3929).
   
   The existing tests use three committed datasets, for example 
[src/test/scripts/functions/io/hdf5/in/transfusion_1.h5](https://github.com/apache/systemds/blob/79122eb1f3d6ea9d9b4457a8d1550e7b032a0707/src/test/scripts/functions/io/hdf5/in/transfusion_1.h5), 
but I could not find a script in the codebase that generates them.
   
   In this PR, the .h5 files generated by the script contain datasets with 
different shapes (2d, 3d, 4d) and datatypes (doubles, integers, strings), and 
the test is table-driven in a single file. All data is generated in R with the 
`rhdf5` library, which was already used for validation before.
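   A table-driven layout could look roughly like the following plain-Java sketch. The class name, file naming scheme, dataset paths, and shapes are hypothetical placeholders rather than the actual datasets produced by the R script, and a real SystemDS test would run each row through JUnit instead of a `main` method.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch: one table row per generated .h5 dataset. Names and
   // shapes are illustrative assumptions, not the PR's actual test data.
   final class HDF5ReaderTestCases {

       /** One table row: which file/dataset to read and its shape and value type. */
       record Case(String fileName, String datasetName, int[] dims, String valueType) {}

       /** Builds every shape x datatype combination (3 shapes x 3 types = 9 rows). */
       static List<Case> allCases() {
           int[][] shapes = { {10, 5}, {10, 5, 3}, {10, 5, 3, 2} }; // 2d, 3d, 4d
           String[] types = { "double", "int", "string" };
           List<Case> cases = new ArrayList<>();
           for (int[] shape : shapes) {
               for (String type : types) {
                   String name = shape.length + "d_" + type;
                   cases.add(new Case(name + ".h5", "/" + name, shape, type));
               }
           }
           return cases;
       }

       public static void main(String[] args) {
           // A real test would read each row with SystemDS and with R, then compare.
           for (Case c : allCases()) {
               System.out.println(c.fileName() + " -> " + c.datasetName());
           }
       }
   }
   ```

   Adding a new schema or datatype then only means adding one entry to `shapes` or `types`, not a new test method.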
   
   The test loads each dataset with both SystemDS and R and verifies that the 
outputs match using `TestUtils.compareMatrices`.
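   For illustration only, the kind of check the comparison step performs can be sketched as an element-wise comparison with an absolute tolerance. `MatrixComparison.firstMismatch` is a hypothetical helper, not the signature or behavior of `TestUtils.compareMatrices`, and treating two NaNs as equal is an assumption about what the test should accept.

   ```java
   // Hypothetical helper (not TestUtils.compareMatrices): compares two dense
   // matrices cell by cell using an absolute tolerance.
   final class MatrixComparison {

       /** Returns null if both matrices match within eps, else a description of the first mismatch. */
       static String firstMismatch(double[][] expected, double[][] actual, double eps) {
           if (expected.length != actual.length) {
               return "row count " + actual.length + " != " + expected.length;
           }
           for (int i = 0; i < expected.length; i++) {
               if (expected[i].length != actual[i].length) {
                   return "column count in row " + i + ": " + actual[i].length + " != " + expected[i].length;
               }
               for (int j = 0; j < expected[i].length; j++) {
                   double e = expected[i][j];
                   double a = actual[i][j];
                   // Exact equality (including matching infinities) or two NaNs count as a match.
                   if (e == a || (Double.isNaN(e) && Double.isNaN(a))) {
                       continue;
                   }
                   // A NaN on only one side makes this comparison false, so it is reported.
                   if (!(Math.abs(e - a) <= eps)) {
                       return "cell (" + i + "," + j + "): " + a + " != " + e;
                   }
               }
           }
           return null;
       }
   }
   ```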
   
   Currently, all tests fail because some header message types (11 and 12) 
are not supported by the HDF5 implementation in SystemDS.
   Message 11 is the [Filter Pipeline 
Message](https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t4.html#subsubsec_fmt4_dataobject_hdr_msg_layout:~:text=Flags%20is%20set.-,IV.A.2.l.%20The%20Data%20Storage%20%2D%20Filter%20Pipeline%20Message,-Header%20Message%20Name)
   Message 12 is the [Attribute 
Message](https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t4.html#subsubsec_fmt4_dataobject_hdr_msg_layout:~:text=in%20the%20array.-,IV.A.2.m.%20The%20Attribute%20Message,-Header%20Message%20Name)
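   A rough sketch of how a reader might treat these two message types, assuming attribute messages can be skipped because they only carry metadata, and assuming the filter pipeline contains only the deflate filter (filter ID 1 in the HDF5 specification, stored as a zlib stream). `H5MessageSketch` and its methods are hypothetical, not part of SystemDS; other filters such as shuffle (ID 2) would need their own decoding step.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.util.zip.DataFormatException;
   import java.util.zip.Inflater;

   // Hypothetical sketch, not SystemDS code. Message type IDs and the deflate
   // filter ID follow the HDF5 file format specification.
   final class H5MessageSketch {

       static final int MSG_FILTER_PIPELINE = 0x000B; // 11
       static final int MSG_ATTRIBUTE = 0x000C;       // 12
       static final int FILTER_DEFLATE = 1;

       /** Assumption: attribute messages only carry metadata, so a data reader can skip them. */
       static boolean canSkip(int messageType) {
           return messageType == MSG_ATTRIBUTE;
       }

       /** Decompresses one chunk written with the deflate filter (zlib-wrapped stream). */
       static byte[] inflateChunk(byte[] compressed, int expectedSize) throws DataFormatException {
           Inflater inflater = new Inflater(); // nowrap=false: expects the zlib header
           try {
               inflater.setInput(compressed);
               ByteArrayOutputStream out = new ByteArrayOutputStream(expectedSize);
               byte[] buf = new byte[8192];
               while (!inflater.finished()) {
                   int n = inflater.inflate(buf);
                   // No progress and more input or a dictionary required: the chunk is unusable.
                   if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                       throw new DataFormatException("truncated or invalid deflate chunk");
                   }
                   out.write(buf, 0, n);
               }
               return out.toByteArray();
           } finally {
               inflater.end(); // release native zlib resources
           }
       }
   }
   ```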
   
   These messages appear to be written by default by `rhdf5` version 2.54.0.
   
   Please correct me if I'm wrong: since ReaderHDF5 implements MatrixReader, 
only 2d datasets are supported, so this implementation should flatten 
higher-dimensional datasets into 2d.
   The SystemDS implementation currently assumes 2d datasets only:
   
   
[H5RootObject.java](https://github.com/apache/systemds/blob/79122eb1f3d6ea9d9b4457a8d1550e7b032a0707/src/main/java/org/apache/sysds/runtime/io/hdf5/H5RootObject.java#L181C2-L186C1)
   ```java
        public void setDimensions(int[] dimensions) {
                this.dimensions = dimensions;
                this.row = dimensions[0];
                this.col = dimensions[1];
        }
   ```
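   If flattening is the intended behavior, one possible convention is to keep the first dimension as rows and fold all remaining dimensions into columns. This is a hypothetical sketch (`DimensionFlattening.toMatrixShape` is not SystemDS code), not the current behavior of `setDimensions`.

   ```java
   // Hypothetical sketch of one possible convention, not current SystemDS behavior:
   // the first HDF5 dimension becomes the row count, all others are folded into columns.
   final class DimensionFlattening {

       /** Returns {rows, cols} for a dataset with the given HDF5 dimensions. */
       static int[] toMatrixShape(int[] dimensions) {
           if (dimensions == null || dimensions.length == 0) {
               throw new IllegalArgumentException("dataset has no dimensions");
           }
           int rows = dimensions[0];
           long cols = 1; // a 1-d dataset becomes a single column
           for (int i = 1; i < dimensions.length; i++) {
               cols *= dimensions[i];
           }
           // Fail loudly instead of silently overflowing the int column count.
           return new int[] { rows, Math.toIntExact(cols) };
       }
   }
   ```

   Because HDF5 lays out dataset elements in row-major (C) order, with the last dimension varying fastest, this convention keeps each row's values contiguous; for example, a `[2, 3, 4]` dataset would become a 2 x 12 matrix.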
   I would like to know what SystemDS aims to support with regard to HDF5, so 
the tests can reflect that. After that, I can start fixing bugs and possibly 
implementing the missing features.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
