Hi -

I've been an evangelist for HDF5 for a few years now. It is a noble and
amazing library that solves data storage problems for scientific
applications and beyond - it saves many developers from wasting time and
money, which they can spend instead on solving more original problems.
But you guys knew that already.

I think there's been a mistake, though: the lack of first-class
column-major vs. row-major storage. In a world where we are split down
the middle - which order we use depends on the application, library, and
language we work in - it is an ongoing reality that there will never be
one true standard to follow.
HDF5 chose to support only row-major, and I can back that decision up -
standardizing is a good thing. But as time has shown, it really didn't
work for a lot of folks, such as those in Matlab and Fortran: when they
read our data, it looks transposed to them! When the HDF5 utilities or
our code see their data, it looks transposed to us! These are arguably
the users you least want facing such difficulties; it's downright
embarrassing at times, and hard to work around within those languages
(ahem, Matlab again is painful here). It also doesn't scale: it always
takes some manual fixing, and there is no standardized mark for whether
a dataset is one of these column-major-masquerading datasets. So let me
assure you this is quite ugly to deal with in Matlab and friends, it
doesn't seem to be a path many people take, and it can require skills
and understanding many users simply don't have.
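
To make the "no standardized mark" point concrete, here is a minimal
sketch of the ad-hoc workaround I mean: stamping the dataset with an
attribute so readers on the other side know to transpose. Every call
below is the ordinary C API, but the attribute name "storage_order" is
purely my own convention - which is exactly the problem.

#include <hdf5.h>

int main(void)
{
    double  data[2][3] = {{1, 2, 3}, {4, 5, 6}};  /* row-major, as C sees it */
    hsize_t dims[2]    = {2, 3};

    hid_t file  = H5Fcreate("matrix.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "A", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* The ad-hoc marker: a scalar string attribute naming the layout.
     * Nothing enforces this name or its values across tools. */
    hid_t strtype = H5Tcopy(H5T_C_S1);
    H5Tset_size(strtype, sizeof "row_major");
    hid_t aspace = H5Screate(H5S_SCALAR);
    hid_t attr   = H5Acreate2(dset, "storage_order", strtype, aspace,
                              H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, strtype, "row_major");

    H5Aclose(attr); H5Sclose(aspace); H5Tclose(strtype);
    H5Dclose(dset); H5Sclose(space); H5Fclose(file);
    return 0;
}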

But then, why did we allow saving column-major data in a row-major
standard in the first place? The answer seems to be performance. Surely
it can't take that long to convert the datasets - most of the time, at
least - although there would certainly be some memory-based limitations
on transposing while HDF5 does I/O. But alas, the current state of the
library indicates otherwise, and so it is the user's job to correctly
transform the data back and forth between their application and the
other party. But wait - wasn't this exactly the kind of busywork HDF5
was built to alleviate in the first place?
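
To show what "the user's job" looks like today, here is roughly the
fixup every C reader of Fortran/Matlab-written data ends up carrying.
This is only a sketch, assuming a 2D dataset of doubles whose dims
appear reversed from the C point of view (as the Fortran and Matlab
wrappers produce); note the doubled buffer, which is the memory cost I
alluded to above.

#include <hdf5.h>
#include <stdlib.h>

/* Read a 2D double dataset written column-major and hand back a
 * row-major copy of the writer's logical nrows x ncols matrix. */
double *read_transposed(hid_t file, const char *name,
                        hsize_t *nrows, hsize_t *ncols)
{
    hid_t   dset  = H5Dopen2(file, name, H5P_DEFAULT);
    hid_t   space = H5Dget_space(dset);
    hsize_t dims[2];
    H5Sget_simple_extent_dims(space, dims, NULL);

    hsize_t nr = dims[1], nc = dims[0];   /* file dims are reversed */
    double *raw = malloc(nr * nc * sizeof(double));
    double *out = malloc(nr * nc * sizeof(double));
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, raw);

    /* The manual transpose the library could be doing for us. */
    for (hsize_t i = 0; i < nr; i++)
        for (hsize_t j = 0; j < nc; j++)
            out[i * nc + j] = raw[j * nr + i];

    free(raw);
    H5Sclose(space);
    H5Dclose(dset);
    *nrows = nr;
    *ncols = nc;
    return out;
}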

So how do we rectify the situation? Speaking as a developer who uses
HDF5 extensively and writes libraries for it, this looks to me like it
belongs in the core library, because it is exceedingly messy to handle
on the user side every time. I think the interpretation of a dataset and
its dimensions should be governed by dataset creation properties. That
would give us an official marking of how the raw storage of the data
(and the dimensions?) is to be interpreted. However, this is only half
the battle. We would also need something like the type conversion system
to permute the order in all the right places whenever a user does I/O
against the opposing storage layout - and it should be fast and light on
memory. Perhaps it could simply operate in place, as a new utility
subroutine taking the mem_type and the user's memory. There is still one
problem this does not address: compound types mixing philosophies, with
fields laid out opposite to the dataset - that case has me completely
stumped, since it suggests the marking belongs at the type level as
well. The compound case is a sticky situation, but I'd still motion that
the dataset creation property covers most of what occurs in practice.
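
To pin down what I'm asking for, here's how it might look at the call
site. To be perfectly clear: H5Pset_storage_order, H5Pset_memory_order,
the H5D_STORAGE_* constants, and H5Dpermute_order below do not exist -
they are names I made up to sketch the proposal.

#include <hdf5.h>

void proposed_usage(hid_t file, hid_t space, double *buf)
{
    /* 1. Stamp the layout at creation time, so the file carries an
     *    official, queryable marking.  (Hypothetical property.) */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_storage_order(dcpl, H5D_STORAGE_COLUMN_MAJOR);    /* proposed */
    hid_t dset = H5Dcreate2(file, "A", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* 2. On I/O, a transfer property declares which order the caller's
     *    memory uses; the library permutes in all the right places,
     *    much as the type conversion system does.  (Hypothetical.) */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_memory_order(dxpl, H5D_STORAGE_ROW_MAJOR);        /* proposed */
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);

    /* 3. Or, the lighter-weight variant: an in-place utility taking the
     *    mem_type and the user's memory.  (Hypothetical.)
     * H5Dpermute_order(H5T_NATIVE_DOUBLE, space, buf);
     */

    H5Pclose(dxpl);
    H5Dclose(dset);
    H5Pclose(dcpl);
}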

So... has The HDF Group tried to deal with this wart yet? Let me know
if anything is on the drawing board.


-Jason