Hello,
We have modified an existing application to call libpvfs2 directly. Our pvfs2 setup has 6 servers and is set up to run pvfs2 over OpenIB verbs. We borrowed the code more or less from pvfs2-cp. This seems to work, and we have had several successful runs. However, we have also had a couple of hangs on one node. The traceback for the hang is:

#0  0x00002ab9874a34bf in poll () from /lib/libc.so.6
#1  0x0000000001cbea67 in BMI_ib_testcontext ()
#2  0x0000000001c8feb4 in BMI_testcontext ()
#3  0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
#4  0x0000000001c950d3 in do_one_work_cycle_all ()
#5  0x0000000001c95883 in job_testcontext ()
#6  0x0000000001ca37e4 in PINT_client_state_machine_test ()
#7  0x0000000001ca3c00 in PINT_client_wait_internal ()
#8  0x0000000001c7df71 in PVFS_sys_io ()
#9  0x0000000001c6e253 in flushBuffer ()
    at /afs/.scl.ameslab.gov/project/nodeimg/amd64.test/usr/src/gamess-pvfs/bypassIO-pvfs.c:355
#10 0x0000000005eb27b0 in userFilePos ()

Eventually we time out and die. So the first question is: do you have any suggestions as to where to look for the cause of the hang? The trace above is from a write, but I have now seen it fail during a read as well (it died on the 12th pass after reading the complete file 11 times).

We also have several usage and tuning questions. First, when the file is created there are options for "dfile_count" and "strip_size". So far I have left them at their defaults. Can you comment on what values would be optimal for large, sequentially accessed files? Would it help to tune the I/O buffer size the application passes so that it matches the strip size?

We also have a problem with too many memory registrations when running on our IBM EHCAs. The odd part is that I use the same 1MB buffer all the time, so I don't see why it seems to be re-registered at each write. My write code looks like this:

    file_req = PVFS_BYTE;
    ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
    if (ret < 0) {
        PVFS_perror("PVFS_Request_contiguous", ret);
        return;
    }
    ret = PVFS_sys_write(target_object.ref, file_req,
                         bufferedFilePos, myBuffer, mem_req,
                         &credentials, &resp_io);
    if (ret == 0) {
        PVFS_Request_free(&mem_req);
        /* return(resp_io.total_completed); */
    } else
        PVFS_perror("PVFS_sys_write", ret);

One question is: what does PVFS_Request_contiguous actually do? Since I am using the same buffer all the time, would it be OK to set up the request once and then reuse it as long as the I/O size stays the same?

Thanks for any help you can provide,

Brett


____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011              (515) 294-9192
[EMAIL PROTECTED]  FAX: (515) 294-4491
____________________________________________



_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
