luccadibe opened a new pull request, #2369: URL: https://github.com/apache/systemds/pull/2369
This PR adds new tests for the HDF5 readers, using new test input datasets generated with a new R script. It is meant as a first step for [SYSTEMDS-3929](https://issues.apache.org/jira/browse/SYSTEMDS-3929).

The existing tests use three datasets that were committed to the repository, for example [src/test/scripts/functions/io/hdf5/in/transfusion_1.h5](https://github.com/apache/systemds/blob/79122eb1f3d6ea9d9b4457a8d1550e7b032a0707/src/test/scripts/functions/io/hdf5/in/transfusion_1.h5). I could not find a generator script for these in the codebase.

In this PR, the test .h5 files generated by the script include datasets with different shapes (2d, 3d, 4d) and datatypes (doubles, integers, strings), and the test itself is table-driven and lives in a single file. All data is generated in R with the `rhdf5` library, which was already being used for validation. The test loads each dataset with both SystemDS and R and verifies that the outputs match using `TestUtils.compareMatrices`.

Currently, all tests fail because message types 11 and 12 are not supported by the HDF5 implementation in SystemDS. Message 11 is the [Filter Pipeline Message](https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t4.html#subsubsec_fmt4_dataobject_hdr_msg_layout:~:text=Flags%20is%20set.-,IV.A.2.l.%20The%20Data%20Storage%20%2D%20Filter%20Pipeline%20Message,-Header%20Message%20Name). Message 12 is the [Attribute Message](https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t4.html#subsubsec_fmt4_dataobject_hdr_msg_layout:~:text=in%20the%20array.-,IV.A.2.m.%20The%20Attribute%20Message,-Header%20Message%20Name). These messages seem to be written by default by `rhdf5` version 2.54.0.

Please correct me if I'm wrong: since ReaderHDF5 implements MatrixReader, only 2d datasets are supported, and this implementation should flatten higher-dimensional datasets into 2d.
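To make the flattening question concrete, here is a minimal sketch of one possible mapping from an N-d shape to a (rows, cols) pair. `Hdf5DimsSketch` and `flattenTo2D` are hypothetical names that do not exist in SystemDS, and the choice to keep the first dimension as rows and fold all remaining dimensions into columns is an assumption, not confirmed behavior.

```java
// Hypothetical helper, not part of SystemDS: maps an N-d dataset shape to {rows, cols}.
final class Hdf5DimsSketch {

	// Assumption: the first dimension becomes the row count and every
	// remaining dimension is multiplied into the column count.
	// A 1-d shape is treated as a column vector (cols = 1), also an assumption.
	static long[] flattenTo2D(int[] dimensions) {
		if (dimensions == null || dimensions.length == 0)
			throw new IllegalArgumentException("dataset has no dimensions");
		long rows = dimensions[0];
		long cols = 1;
		for (int i = 1; i < dimensions.length; i++)
			cols = Math.multiplyExact(cols, (long) dimensions[i]); // throws on overflow
		return new long[] {rows, cols};
	}
}
```

Under these assumptions, `Hdf5DimsSketch.flattenTo2D(new int[] {2, 3, 4})` yields `{2, 12}`. Which dimensions should be folded, and in what order, is part of what I would like to clarify.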
The SystemDS implementation currently assumes only 2d datasets: [H5RootObject.java](https://github.com/apache/systemds/blob/79122eb1f3d6ea9d9b4457a8d1550e7b032a0707/src/main/java/org/apache/sysds/runtime/io/hdf5/H5RootObject.java#L181C2-L186C1)

```java
public void setDimensions(int[] dimensions) {
	this.dimensions = dimensions;
	this.row = dimensions[0];
	this.col = dimensions[1];
}
```

I would like to know what SystemDS aims to support regarding HDF5 so that the tests can reflect it. After that, I can start fixing bugs and, potentially, implementing the missing features.
