Hi Quincey,

On Tue, Oct 12, 2010 at 02:28:41PM -0500, Quincey Koziol wrote:
> On Oct 12, 2010, at 2:24 PM, Jens Thoms Toerring wrote:
> >>> Finally, there's another thing perhaps someone can help me
> >>> with: I tried to create some 120,000 1D data sets, about
> >>> 200 bytes large and each in its own group. This resulted
> >>> in a huge overhead in the file: instead of the expected file
> >>> size of around 24 MB (plus a bit for overhead, of course) the
> >>> files were about 10 times larger than expected. Using a number
> >>> (30) of 2D data sets (with 4000 rows) took care of this, but I
> >>> am curious why this makes such a big difference.
> >> 
> >>    Did you create them as chunked datasets?  And, what were the dimensions
> >> of the chunk sizes you used?
> > 
> > No, those were simple 1-dimensional data sets, written out in a
> > single call immediately after creation and then closed. Perhaps
> > having them all in their own group makes a difference? What I
> > noticed was that h5dump on the resulting file told me under
> > Storage information/Groups that for B-tree/List about 140 MB
> > were used...
> 
>   This is very weird; can you send a sample program that shows this result?

Here's a stripped-down version of my original program: it now
just creates 100,000 datasets with 5 doubles each, every one in
its own group. The amount of "real" data, including the strings
for group and dataset names, should be about 5 MB, but the file
I get with HDF5 version 1.8.5 is nearly 144 MB. I expect a
certain amount of overhead, of course, but that ratio was a
bit astonishing ;-)
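
For comparison, the workaround mentioned above (collecting the
values into a few 2D datasets instead of one tiny dataset per
group) boils down to something like the following untested
sketch; the dataset name "all" and the helper function are made
up just for illustration:

#include <vector>
#include "H5Cpp.h"
using namespace H5;

// Sketch only: store all rows in a single 2D dataset instead of
// creating 100,000 one-row datasets, each in its own group.
void writeAll( H5File & file, std::vector< double > const & data,
               hsize_t nrows, hsize_t ncols )
{
    hsize_t dim[ ] = { nrows, ncols };    // data is row-major in memory
    DataSpace dataspace( 2, dim );
    DataSet dataset( file.createDataSet( "all", PredType::IEEE_F64LE,
                                         dataspace ) );
    dataset.write( &data.front( ), PredType::NATIVE_DOUBLE );
}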

If I leave out the creation of the datasets (i.e. just create
100,000 groups), the size of the file drops to about 80 MB,
so creating a single group seems to "cost" about 800 bytes.
Creating just 100,000 datasets (without groups) seems to be
less expensive; there the overhead seems to be on the order
of 350 bytes per dataset. Does that seem reasonable to you?
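
One thing I haven't tried yet, but which might reduce the
per-dataset cost, is asking for a compact layout so that the 40
bytes of raw data go straight into the object header instead of
a separate contiguous block. Whether that actually helps here is
just a guess on my part; an untested sketch of what I mean (the
helper function name is made up):

#include <vector>
#include "H5Cpp.h"
using namespace H5;

// Sketch only: request compact storage via a dataset creation
// property list before creating the dataset.
void writeVectorCompact( Group & group, H5std_string const & name,
                         std::vector< double > const & data )
{
    hsize_t dim[ ] = { data.size( ) };
    DataSpace dataspace( 1, dim );
    DSetCreatPropList plist;
    plist.setLayout( H5D_COMPACT );     // raw data kept in the object header
    DataSet dataset( group.createDataSet( name, PredType::IEEE_F64LE,
                                          dataspace, plist ) );
    dataset.write( &data.front( ), PredType::NATIVE_DOUBLE );
}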

                            Best regards, Jens

------------- h5_test.cpp ----------------------------------------

#include <iostream>
#include <sstream>
#include <stack>
#include <vector>
#include <string>
#include "H5Cpp.h"

using namespace std;
using namespace H5;

// Minimal RAII-style wrapper: holds the file and a stack of the
// currently open groups, closing everything in the destructor.
class HDF5Writer {

  public:

    HDF5Writer( H5std_string const & fileName )
    {
        m_file = new H5File( fileName, H5F_ACC_TRUNC );
        m_group = new Group( m_file->openGroup( "/" ) );
    }

    ~HDF5Writer( )
    {
        while ( ! m_group_stack.empty( ) )
            closeGroup( );
        m_group->close( );
        delete m_group;
        m_file->close( );
        delete m_file;
    }

    // Create a subgroup of the current group and make it the new
    // current group.
    void createGroup( H5std_string const & name )
    {
        m_group_stack.push( m_group );
        m_group = new Group( m_group->createGroup( name ) );
    }

    void closeGroup( )
    {
        m_group->close( );
        delete m_group;
        m_group = m_group_stack.top( );
        m_group_stack.pop( );
    }

    // Create a 1D dataset of doubles in the current group and write
    // all values to it in a single call.
    void writeVector( H5std_string const     & name,
                      vector< double > const & data )
    {
        hsize_t dim[ ] = { data.size( ) };
        DataSpace dataspace( 1, dim );
        DataSet dataset( m_group->createDataSet( name, PredType::IEEE_F64LE,
                                                 dataspace ) );
        dataset.write( &data.front( ), PredType::NATIVE_DOUBLE );
        dataset.close( );
        dataspace.close( );
    }

  private:

    H5File * m_file;
    Group * m_group;
    stack< Group * > m_group_stack;
};

// Create 100,000 groups "g0" ... "g99999", each holding a single
// 5-element dataset "d".
int main( )
{
    HDF5Writer w( "test.h5" );
    vector< double > arr( 5, 0 );
                
    for ( size_t i = 0; i < 100000; i++ )
    {
        ostringstream cname;
        cname << "g" << i;
        w.createGroup( cname.str( ) );
        w.writeVector( "d", arr );
        w.closeGroup( );
    }
}

-- 
  \   Jens Thoms Toerring  ________      [email protected]
   \_______________________________      http://toerring.de
