Let me confirm with the second question.
On Wed, May 30, 2012 at 11:01 AM, Mohamad Chaarawi [via hdf-forum] wrote:
Hi Yucong,
On 5/30/2012 12:33 PM, Yucong Ye wrote:
The region_index changes according to the MPI rank, while the
region_count stays the same: 16, 16, 16.
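For reference, one way region_index could be derived from the MPI rank is sketched below. The thread does not show the actual mapping, so the function name rank_to_region_index and the x-fastest ordering are assumptions for illustration only.

```c
#include <assert.h>

/* Map a linear rank to a 3D block coordinate, with the first
   coordinate varying fastest. The ordering is an assumption;
   the original program may use a different one. */
static void rank_to_region_index(int rank, const int region_count[3],
                                 int region_index[3])
{
    /* the rank must address one of the available blocks */
    assert(rank >= 0 &&
           rank < region_count[0] * region_count[1] * region_count[2]);

    region_index[0] = rank % region_count[0];
    region_index[1] = (rank / region_count[0]) % region_count[1];
    region_index[2] = rank / (region_count[0] * region_count[1]);
}
```

With region_count = {16, 16, 16} this yields 4096 distinct blocks, one for each rank in a 4096-process run.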
OK, I just needed to make sure that the selection for each
process is compatible with the scaling being done (as the
number of processes increases, the selection of each process
decreases accordingly). The performance numbers you provided
are indeed troubling, but there could be several reasons,
including:
* The stripe size and stripe count of your file on Lustre could
  be too small. Although this is a read operation (no file
  locking is done by the OSTs), increasing the number of I/O
  processes puts too much burden on the OSTs. Could you check
  those two parameters for your file? You can do that by running
  this on the command line:
      lfs getstripe filename | grep stripe
* The MPI-I/O implementation is not doing aggregation. If you
  are using ROMIO, two-phase I/O should do this for you; by
  default it sets the number of aggregators to the number of
  nodes (not processes). I would also try increasing
  cb_buffer_size (the default is 4 MB).
Thanks,
Mohamad
On May 30, 2012 8:19 AM, "Mohamad Chaarawi" wrote:
Hi Chrisyeshi,
Are region_index and region_count the same on all processes?
I.e., are you just reading the same data on all processes?
Mohamad
On 5/29/2012 3:02 PM, chrisyeshi wrote:
Hi,
I am having trouble reading a 721 GB file using 4096 nodes.
When I test with a few nodes it works, but when I test with
more nodes it takes significantly more time.
All the test program does is read in the data and then delete it.
Here's the timing information:
Nodes | Time for running entire program
------+--------------------------------
   16 |  4:28
   32 |  6:55
   64 |  8:56
  128 | 11:22
  256 | 13:25
  512 | 15:34
  768 | 28:34
  800 | 29:04
I am running the program on a Cray XK6 system, and the file
system is Lustre.
*There is a big gap after 512 nodes, and with 4096 nodes it
couldn't finish in 6 hours.
Is this normal? Shouldn't it be a lot faster?*
Here is my reading function; it's similar to the sample
parallel HDF5 program:
#include <assert.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

// Collectively read this process's block of each of the six datasets.
// flow_field receives six malloc'ed buffers that the caller must free.
void readData(const char* filename, int region_index[3],
              int region_count[3], float* flow_field[6])
{
    const char* attributes[6] = {"/uvel", "/vvel", "/wvel",
                                 "/pressure", "/temp", "/OH"};
    herr_t status;
    hid_t file_id;
    hid_t dset_id;

    // open the file with the MPI-IO driver
    hid_t acc_tpl = H5Pcreate(H5P_FILE_ACCESS);
    status = H5Pset_fapl_mpio(acc_tpl, MPI_COMM_WORLD, MPI_INFO_NULL);
    assert(status >= 0);
    file_id = H5Fopen(filename, H5F_ACC_RDONLY, acc_tpl);
    assert(file_id >= 0);
    status = H5Pclose(acc_tpl);

    for (int i = 0; i < 6; ++i)
    {
        // open dataset
        dset_id = H5Dopen(file_id, attributes[i], H5P_DEFAULT);
        assert(dset_id >= 0);

        // get dataset space (the datasets are assumed to be 3D)
        hid_t spac_id = H5Dget_space(dset_id);
        hsize_t htotal_size3[3];
        status = H5Sget_simple_extent_dims(spac_id, htotal_size3, NULL);
        assert(status >= 0);

        // block size per process; integer division, so each dimension
        // is assumed to be a multiple of region_count
        hsize_t region_size3[3] = {htotal_size3[0] / region_count[0],
                                   htotal_size3[1] / region_count[1],
                                   htotal_size3[2] / region_count[2]};

        // hyperslab
        hsize_t start[3] = {region_index[0] * region_size3[0],
                            region_index[1] * region_size3[1],
                            region_index[2] * region_size3[2]};
        hsize_t count[3] = {region_size3[0], region_size3[1],
                            region_size3[2]};
        status = H5Sselect_hyperslab(spac_id, H5S_SELECT_SET,
                                     start, NULL, count, NULL);
        assert(status >= 0);
        hid_t memspace = H5Screate_simple(3, count, NULL);

        // read collectively
        hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
        status = H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);
        assert(status >= 0);
        flow_field[i] = (float *) malloc(count[0] * count[1] * count[2] *
                                         sizeof(float));
        assert(flow_field[i] != NULL);
        status = H5Dread(dset_id, H5T_NATIVE_FLOAT, memspace, spac_id,
                         xfer_plist, flow_field[i]);
        assert(status >= 0);

        // clean up
        H5Pclose(xfer_plist);
        H5Sclose(memspace);
        H5Sclose(spac_id);
        H5Dclose(dset_id);
    }
    H5Fclose(file_id);
}
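One thing to watch in the function above: the integer division htotal_size3[d] / region_count[d] drops the remainder when a dimension is not a multiple of region_count[d], so the last few elements along that dimension would never be read. Below is a small sketch of a split that covers every element. The helper name block_range is hypothetical, and unsigned long long stands in for hsize_t so the snippet compiles without HDF5.

```c
#include <assert.h>

/* Split `total` elements into `nblocks` contiguous blocks and return
   the start offset and length of block `index`. The first
   (total % nblocks) blocks get one extra element, so the blocks
   cover the whole range without gaps or overlap. */
static void block_range(unsigned long long total, int nblocks, int index,
                        unsigned long long *start,
                        unsigned long long *count)
{
    assert(nblocks > 0 && index >= 0 && index < nblocks);

    unsigned long long n = (unsigned long long) nblocks;
    unsigned long long idx = (unsigned long long) index;
    unsigned long long base = total / n;   /* minimum block length */
    unsigned long long extra = total % n;  /* blocks that get one more */

    *count = base + (idx < extra ? 1 : 0);
    *start = idx * base + (idx < extra ? idx : extra);
}
```

Called once per dimension, the results would fill the start and count arrays passed to H5Sselect_hyperslab. When every dimension divides evenly, this gives the same values as the original arithmetic.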
*Do you see any problem with this function? I am new to
parallel HDF5.*
Thanks in advance!
--
View this message in context:
http://hdf-forum.184993.n3.nabble.com/Slow-Reading-721GB-File-in-Parallel-tp4021429.html
Sent from the hdf-forum mailing list archive at Nabble.com.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
<http://user/SendEmail.jtp?type=node&node=4023015&i=1>
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org