On 5/30/2012 5:27 PM, chrisyeshi wrote:
The documentation of the system I am using only describes how to change the stripe size and stripe count. It doesn't provide guidelines on what the values should be. What would be common stripe count and stripe size values for a ~1 TB file?
I would go with the maximum available for the stripe count. You can experiment with the stripe size; maybe 32 MB would be good. Increasing ROMIO's cb_buffer_size through an MPI Info hint is also worth trying.
Mohamad
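For reference, a minimal sketch of passing such hints through the file access property list, along the lines of the suggestion above. cb_buffer_size and romio_cb_read are standard ROMIO hint names; the 32 MB value is just the figure mentioned above, and the function name open_with_hints is made up, so treat this as an illustration rather than a recipe:

#include <hdf5.h>
#include <mpi.h>

/* Open a file for collective parallel reading with ROMIO hints set.
 * Hint values are illustrative and should be tuned for the system. */
hid_t open_with_hints(const char* filename)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "33554432");   /* 32 MB collective buffer */
    MPI_Info_set(info, "romio_cb_read", "enable");      /* force collective buffering on reads */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file_id = H5Fopen(filename, H5F_ACC_RDONLY, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file_id;
}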
On Wed, May 30, 2012 at 1:32 PM, Mohamad Chaarawi [via hdf-forum] wrote:

Hi Yucong,

On 5/30/2012 3:00 PM, Yucong Ye wrote:

Ok, the total data size is constant, and I am dividing it into 4096 parts no matter how many processes I use, so the dataset is fully read only with 4096 processes. If I am using only 16 processes, only 16 of the 4096 parts are read. Does that clarify what I am doing here?

Ok, I understand now; thanks for clarifying. But again, since you are reading more data as you scale, you will probably get slower performance, especially if the selections of all processes are non-contiguous in the file. The stripe size and count are also major issues you need to address, as I mentioned in my previous email.

Mohamad

On May 30, 2012 12:49 PM, "Mohamad Chaarawi" wrote:

Yucong Ye wrote:

The selection of each process actually stays the same size, since the region_count is not changing.

Ok, let me understand this again. Your dataset size is constant (no matter what process count you execute with), and processes are reading parts of the dataset. When you execute your program with, say, 16 processes, is the dataset divided equally (to some extent) among the 16 processes? When you increase the process count to 36, is the dataset divided equally among the 36 processes, meaning that the amount of data each process reads decreases as you scale, since the file size stays the same? If not, then you are reading parts of the dataset multiple times as you scale, which makes the performance degradation expected; it is like comparing, in the serial case, 1 read operation to n read operations. If yes, then move on to the second part.

Yucong Ye wrote:

The result of running "lfs getstripe filename | grep stripe" is:

lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_stripe_offset:  286

The stripe count is way too small for a ~1 TB file. Your system administrator should have guidelines on what the stripe count and size should be for certain file sizes. I would check that and readjust those parameters accordingly.

Thanks,
Mohamad

Yucong Ye wrote:

Let me confirm with the second question.

On Wed, May 30, 2012 at 11:01 AM, Mohamad Chaarawi [via hdf-forum] wrote:

Hi Yucong,

On 5/30/2012 12:33 PM, Yucong Ye wrote:

The region_index changes according to the MPI rank, while the region_count stays the same, which is 16,16,16.

Ok, I just needed to make sure that the selections for each process are set up in a way that is compatible with the scaling being done (as the number of processes increases, the selection of each process decreases accordingly). The performance numbers you provided are indeed troubling, but there could be several reasons, some being:

* The stripe size and count of your file on Lustre could be too small. Although this is a read operation (no file locking is done by the OSTs), increasing the number of I/O processes puts too much burden on the OSTs. Could you check those two parameters of your file? You can do that by running this on the command line:

    lfs getstripe filename | grep stripe

* The MPI-I/O implementation is not doing aggregation. If you are using ROMIO, two-phase I/O should do this for you; the default number of aggregators is the number of nodes (not processes). I would also try increasing cb_buffer_size (the default is 4 MB).

Thanks,
Mohamad
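For reference, a minimal sketch of the kind of decomposition Mohamad describes above, where the whole dataset is split across exactly the processes in the job so that each selection shrinks as the job grows. It assumes the process count is a perfect cube and that the result feeds the region_index/region_count arguments of the readData function below; the helper name rank_to_region is made up for illustration:

#include <math.h>

/* Map an MPI rank to a 3-D region index on a q x q x q process grid.
 * Assumes nprocs is a perfect cube (q^3 == nprocs); illustrative only. */
void rank_to_region(int rank, int nprocs, int region_index[3], int region_count[3])
{
    int q = (int)(cbrt((double)nprocs) + 0.5);
    region_count[0] = region_count[1] = region_count[2] = q;
    region_index[0] =  rank / (q * q);
    region_index[1] = (rank / q) % q;
    region_index[2] =  rank % q;
}

With such a mapping, a 64-process run would use a 4 x 4 x 4 region grid and each process would read 1/64 of the dataset, instead of a fixed 1/4096 part.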
On May 30, 2012 8:19 AM, "Mohamad Chaarawi" wrote:

Hi Chrisyeshi,

Are the region_index & region_count the same on all processes? I.e., are you just reading the same data on all processes?

Mohamad

On 5/29/2012 3:02 PM, chrisyeshi wrote:

Hi,

I am having trouble reading from a 721 GB file using 4096 nodes. When I test with a few nodes it works, but when I test with more nodes it takes significantly more time. All the test program does is read in the data and then delete it. Here is the timing information:

Nodes | Time for running entire program
   16 |  4:28
   32 |  6:55
   64 |  8:56
  128 | 11:22
  256 | 13:25
  512 | 15:34
  768 | 28:34
  800 | 29:04

I am running the program on a Cray XK6 system, and the file system is Lustre.

*There is a big gap after 512 nodes, and with 4096 nodes it couldn't finish in 6 hours. Is this normal? Shouldn't it be a lot faster?*

Here is my reading function; it's similar to the sample HDF5 parallel program:

#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void readData(const char* filename, int region_index[3], int region_count[3], float* flow_field[6])
{
    char attributes[6][50];
    sprintf(attributes[0], "/uvel");
    sprintf(attributes[1], "/vvel");
    sprintf(attributes[2], "/wvel");
    sprintf(attributes[3], "/pressure");
    sprintf(attributes[4], "/temp");
    sprintf(attributes[5], "/OH");

    herr_t status;
    hid_t file_id;
    hid_t dset_id;
    hid_t dset_plist;

    // open file spaces
    hid_t acc_tpl = H5Pcreate(H5P_FILE_ACCESS);
    status = H5Pset_fapl_mpio(acc_tpl, MPI_COMM_WORLD, MPI_INFO_NULL);
    file_id = H5Fopen(filename, H5F_ACC_RDONLY, acc_tpl);
    status = H5Pclose(acc_tpl);

    for (int i = 0; i < 6; ++i)
    {
        // open dataset
        dset_id = H5Dopen(file_id, attributes[i], H5P_DEFAULT);

        // get dataset space
        hid_t spac_id = H5Dget_space(dset_id);
        hsize_t htotal_size3[3];
        status = H5Sget_simple_extent_dims(spac_id, htotal_size3, NULL);
        hsize_t region_size3[3] = {htotal_size3[0] / region_count[0],
                                   htotal_size3[1] / region_count[1],
                                   htotal_size3[2] / region_count[2]};

        // hyperslab
        hsize_t start[3] = {region_index[0] * region_size3[0],
                            region_index[1] * region_size3[1],
                            region_index[2] * region_size3[2]};
        hsize_t count[3] = {region_size3[0], region_size3[1], region_size3[2]};
        status = H5Sselect_hyperslab(spac_id, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(3, count, NULL);

        // read
        hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
        status = H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);
        flow_field[i] = (float *) malloc(count[0] * count[1] * count[2] * sizeof(float));
        status = H5Dread(dset_id, H5T_NATIVE_FLOAT, memspace, spac_id, xfer_plist, flow_field[i]);

        // clean up
        H5Dclose(dset_id);
        H5Sclose(spac_id);
        H5Pclose(xfer_plist);
    }

    H5Fclose(file_id);
}

*Do you see any problem with this function? I am new to parallel HDF5.*

Thanks in advance!
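For context, a minimal sketch of how readData appears to be driven, based on the description earlier in the thread (region_count fixed at 16 x 16 x 16, i.e. 4096 parts, with region_index derived from the MPI rank). The main() below and the exact rank-to-index mapping are assumptions made for illustration, not the poster's actual code:

#include <mpi.h>
#include <stdlib.h>

void readData(const char* filename, int region_index[3], int region_count[3], float* flow_field[6]);

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4096 parts regardless of how many processes are running (as described above) */
    int region_count[3] = {16, 16, 16};
    /* hypothetical mapping: one part per rank */
    int region_index[3] = { rank / (16 * 16), (rank / 16) % 16, rank % 16 };

    float* flow_field[6];
    readData(argv[1], region_index, region_count, flow_field);  /* file name from the command line */

    for (int i = 0; i < 6; ++i)
        free(flow_field[i]);

    MPI_Finalize();
    return 0;
}

With fewer than 4096 processes only some of the parts are read, and each process's selection stays the same size no matter how many processes run, so the total amount of data read grows with the process count; that is the scaling behaviour Mohamad points to above.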