Yves: The timeouts that you listed below are in the configuration file.
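
For orientation, here is a rough sketch of how these four entries might look in the file, using the values from your message. I am assuming they sit in the Defaults section of the server config file, so please check against your own copy rather than taking this layout as given.

```
<Defaults>
	# Assumed placement; values copied from the message quoted below.
	ServerJobBMITimeoutSecs 30
	ServerJobFlowTimeoutSecs 30
	ClientJobBMITimeoutSecs 300
	ClientJobFlowTimeoutSecs 300
</Defaults>
```

All four values are in seconds, as the "Secs" suffix indicates.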

ClientJobBMITimeoutSecs 300 - The client's job scheduler limits each "job" sent across the network to this timeout. If a job exceeds the limit, it is cancelled. Depending on the request, the job may be retried. Keep in mind that one PVFS request can be made up of many jobs.

ClientJobFlowTimeoutSecs - This value limits the time spent on a particular kind of job called a flow. A flow transfers data across the network, either from the client to a server or from a server to the client. Again, if a flow exceeds this timeout, it is cancelled.

The server counterparts for these settings are rarely used, since the server doesn't normally initiate reads or writes.

I think your real problem has something to do with IB, but I am not an expert in that area. I have cc'd Kyle Schochenmaier to see if he can help.

Becky

On Thu, Oct 18, 2012 at 4:07 PM, Yves Revaz <[email protected]> wrote:

> Dear list,
>
> I sometimes see the following errors in my pvfs server log.
>
> [E 10/18/2012 20:59:50] Warning: encourage_recv_incoming: mop_id 150c320 in RTS_DONE message not found.
> [E 10/18/2012 21:00:50] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 33307291.
> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: flow proto cancel called on 0xf18c80
> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: I/O error occurred
> [E 10/18/2012 21:00:50] handle_io_error: flow proto error cleanup started on 0xf18c80: Operation cancelled (possibly due to timeout)
> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 canceled 1 operations, will clean up.
> [E 10/18/2012 21:00:50] bmi_recv_callback_fn: I/O error occurred
> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 error cleanup finished: Operation cancelled (possibly due to time
>
> Looking at the mailing list, I've found a suggestion to increase these default values (300)
>
> ServerJobBMITimeoutSecs 30
> ServerJobFlowTimeoutSecs 30
> ClientJobBMITimeoutSecs 300
> ClientJobFlowTimeoutSecs 300
>
> to 600.
>
> What is the origin of these timeouts?
>
> Thanks,
>
> yves
>
> --
>                  (o o)
> --------------------------------------------oOO--(_)--OOo-------
> Dr. Yves Revaz
> Laboratory of Astrophysics EPFL
>
> Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
> 51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
> 1290 Sauverny                e-mail : [email protected]
> SWITZERLAND                  Web : http://www.lunix.ch/revaz/
> ----------------------------------------------------------------

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina

_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
