On 5/30/2012 5:27 PM, chrisyeshi wrote:
The documentation of the system I am using only describes how to change the stripe size and stripe count. It doesn't provide guidelines on what the values should be. What would be common stripe count and stripe size values for a ~1 TB file?
I would go with the maximum available for the stripe count. You can experiment with the stripe size; maybe 32 MB would be good. Increasing ROMIO's cb_buffer_size through an MPI Info hint is also worth trying.
Mohamad
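For reference, a minimal sketch of passing such hints through the file access property list, along the lines of the suggestion above. cb_buffer_size and romio_cb_read are standard ROMIO hint names; the 32 MB value is just the figure mentioned above, and the function name open_with_hints is made up, so treat this as an illustration rather than a recipe:

#include <hdf5.h>
#include <mpi.h>

/* Open a file for collective parallel reading with ROMIO hints set.
 * Hint values are illustrative and should be tuned for the system. */
hid_t open_with_hints(const char* filename)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "33554432");   /* 32 MB collective buffer */
    MPI_Info_set(info, "romio_cb_read", "enable");      /* force collective buffering on reads */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file_id = H5Fopen(filename, H5F_ACC_RDONLY, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file_id;
}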
On Wed, May 30, 2012 at 1:32 PM, Mohamad Chaarawi [via hdf-forum] wrote:

Hi Yucong,

On 5/30/2012 3:00 PM, Yucong Ye wrote:

Ok, the total data size is constant, and I am dividing it into 4096 parts no matter how many processes I use, so the dataset is fully read only with 4096 processes. If I am using only 16 processes, only 16 of the 4096 parts are read. Does that clarify what I am doing here?

Ok, I understand now; thanks for clarifying. But again, since you are reading more data as you scale, you will probably get slower performance, especially if the selections of all processes are non-contiguous in the file. The stripe size and count are also major issues you need to address, as I mentioned in my previous email.

Mohamad

On May 30, 2012 12:49 PM, "Mohamad Chaarawi" wrote:

Yucong Ye wrote:

The selection of each process actually stays the same size, since the region_count is not changing.

Ok, let me understand this again. Your dataset size is constant (no matter what process count you execute with), and processes are reading parts of the dataset. When you execute your program with, say, 16 processes, is the dataset divided equally (to some extent) among the 16 processes? When you increase the process count to 36, is the dataset divided equally among the 36 processes, meaning that the amount of data each process reads decreases as you scale, since the file size stays the same? If not, then you are reading parts of the dataset multiple times as you scale, which makes the performance degradation expected; it is like comparing, in the serial case, 1 read operation to n read operations. If yes, then move on to the second part.

Yucong Ye wrote:

The result of running "lfs getstripe filename | grep stripe" is:

lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_stripe_offset:  286

The stripe count is way too small for a ~1 TB file. Your system administrator should have guidelines on what the stripe count and size should be for certain file sizes. I would check that and readjust those parameters accordingly.

Thanks,
Mohamad

Yucong Ye wrote:

Let me confirm with the second question.

On Wed, May 30, 2012 at 11:01 AM, Mohamad Chaarawi [via hdf-forum] wrote:

Hi Yucong,

On 5/30/2012 12:33 PM, Yucong Ye wrote:

The region_index changes according to the MPI rank, while the region_count stays the same, which is 16,16,16.

Ok, I just needed to make sure that the selections for each process are set up in a way that is compatible with the scaling being done (as the number of processes increases, the selection of each process decreases accordingly). The performance numbers you provided are indeed troubling, but there could be several reasons, some being:

* The stripe size and count of your file on Lustre could be too small. Although this is a read operation (no file locking is done by the OSTs), increasing the number of I/O processes puts too much burden on the OSTs. Could you check those two parameters of your file? You can do that by running this on the command line:

    lfs getstripe filename | grep stripe

* The MPI-I/O implementation is not doing aggregation. If you are using ROMIO, two-phase I/O should do this for you; the default number of aggregators is the number of nodes (not processes). I would also try increasing cb_buffer_size (the default is 4 MB).

Thanks,
Mohamad
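For reference, a minimal sketch of the kind of decomposition Mohamad describes above, where the whole dataset is split across exactly the processes in the job so that each selection shrinks as the job grows. It assumes the process count is a perfect cube and that the result feeds the region_index/region_count arguments of the readData function below; the helper name rank_to_region is made up for illustration:

#include <math.h>

/* Map an MPI rank to a 3-D region index on a q x q x q process grid.
 * Assumes nprocs is a perfect cube (q^3 == nprocs); illustrative only. */
void rank_to_region(int rank, int nprocs, int region_index[3], int region_count[3])
{
    int q = (int)(cbrt((double)nprocs) + 0.5);
    region_count[0] = region_count[1] = region_count[2] = q;
    region_index[0] =  rank / (q * q);
    region_index[1] = (rank / q) % q;
    region_index[2] =  rank % q;
}

With such a mapping, a 64-process run would use a 4 x 4 x 4 region grid and each process would read 1/64 of the dataset, instead of a fixed 1/4096 part.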
On May 30, 2012 8:19 AM, "Mohamad Chaarawi" wrote:

Hi Chrisyeshi,

Are the region_index & region_count the same on all processes? I.e., are you just reading the same data on all processes?

Mohamad

On 5/29/2012 3:02 PM, chrisyeshi wrote:

Hi,

I am having trouble reading from a 721 GB file using 4096 nodes. When I test with a few nodes it works, but when I test with more nodes it takes significantly more time. All the test program does is read in the data and then delete it. Here is the timing information:

Nodes | Time for running entire program
   16 |  4:28
   32 |  6:55
   64 |  8:56
  128 | 11:22
  256 | 13:25
  512 | 15:34
  768 | 28:34
  800 | 29:04

I am running the program on a Cray XK6 system, and the file system is Lustre.

*There is a big gap after 512 nodes, and with 4096 nodes it couldn't finish in 6 hours. Is this normal? Shouldn't it be a lot faster?*

Here is my reading function; it's similar to the sample HDF5 parallel program:

#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void readData(const char* filename, int region_index[3], int region_count[3], float* flow_field[6])
{
    char attributes[6][50];
    sprintf(attributes[0], "/uvel");
    sprintf(attributes[1], "/vvel");
    sprintf(attributes[2], "/wvel");
    sprintf(attributes[3], "/pressure");
    sprintf(attributes[4], "/temp");
    sprintf(attributes[5], "/OH");

    herr_t status;
    hid_t file_id;
    hid_t dset_id;
    hid_t dset_plist;

    // open file spaces
    hid_t acc_tpl = H5Pcreate(H5P_FILE_ACCESS);
    status = H5Pset_fapl_mpio(acc_tpl, MPI_COMM_WORLD, MPI_INFO_NULL);
    file_id = H5Fopen(filename, H5F_ACC_RDONLY, acc_tpl);
    status = H5Pclose(acc_tpl);

    for (int i = 0; i < 6; ++i)
    {
        // open dataset
        dset_id = H5Dopen(file_id, attributes[i], H5P_DEFAULT);

        // get dataset space
        hid_t spac_id = H5Dget_space(dset_id);
        hsize_t htotal_size3[3];
        status = H5Sget_simple_extent_dims(spac_id, htotal_size3, NULL);
        hsize_t region_size3[3] = {htotal_size3[0] / region_count[0],
                                   htotal_size3[1] / region_count[1],
                                   htotal_size3[2] / region_count[2]};

        // hyperslab
        hsize_t start[3] = {region_index[0] * region_size3[0],
                            region_index[1] * region_size3[1],
                            region_index[2] * region_size3[2]};
        hsize_t count[3] = {region_size3[0], region_size3[1], region_size3[2]};
        status = H5Sselect_hyperslab(spac_id, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(3, count, NULL);

        // read
        hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
        status = H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);
        flow_field[i] = (float *) malloc(count[0] * count[1] * count[2] * sizeof(float));
        status = H5Dread(dset_id, H5T_NATIVE_FLOAT, memspace, spac_id, xfer_plist, flow_field[i]);

        // clean up
        H5Dclose(dset_id);
        H5Sclose(spac_id);
        H5Pclose(xfer_plist);
    }

    H5Fclose(file_id);
}

*Do you see any problem with this function? I am new to parallel HDF5.*

Thanks in advance!
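For context, a minimal sketch of how readData appears to be driven, based on the description earlier in the thread (region_count fixed at 16 x 16 x 16, i.e. 4096 parts, with region_index derived from the MPI rank). The main() below and the exact rank-to-index mapping are assumptions made for illustration, not the poster's actual code:

#include <mpi.h>
#include <stdlib.h>

void readData(const char* filename, int region_index[3], int region_count[3], float* flow_field[6]);

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4096 parts regardless of how many processes are running (as described above) */
    int region_count[3] = {16, 16, 16};
    /* hypothetical mapping: one part per rank */
    int region_index[3] = { rank / (16 * 16), (rank / 16) % 16, rank % 16 };

    float* flow_field[6];
    readData(argv[1], region_index, region_count, flow_field);  /* file name from the command line */

    for (int i = 0; i < 6; ++i)
        free(flow_field[i]);

    MPI_Finalize();
    return 0;
}

With fewer than 4096 processes only some of the parts are read, and each process's selection stays the same size no matter how many processes run, so the total amount of data read grows with the process count; that is the scaling behaviour Mohamad points to above.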