Thanks for the patch.  I'll merge it in with my patch.

-Randy

On 2/7/13 11:25 AM, "Michael Robbert" <[email protected]> wrote:

>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>Randy,
>As long as you're working on IB patches, I just remembered that I had
>to apply a patch before I could get 2.8.7 to build on my CentOS 5
>machines running their stock IB stack.
>
>--- src/io/bmi/bmi_ib/openib.c.orig     2013-01-10 15:47:52.000000000 -0700
>+++ src/io/bmi/bmi_ib/openib.c  2013-01-10 15:37:59.000000000 -0700
>@@ -745,7 +745,9 @@
> #ifdef HAVE_IBV_EVENT_CLIENT_REREGISTER
>        CASE(IBV_EVENT_CLIENT_REREGISTER);
> #endif
>+#ifdef HAVE_IBV_EVENT_GID_CHANGE
>        CASE(IBV_EVENT_GID_CHANGE);
>+#endif
>     }
>     return s;
> }
>
>The issue was brought up in a thread on this list last summer, but I
>never saw a final resolution, and if there was one it apparently didn't
>make it into 2.8.7.
>
>Thanks,
>Mike Robbert
>Colorado School of Mines
>
>On 2/7/13 8:20 AM, Randall Martin wrote:
>> I'm working on a set of patches for the IB support.  There are
>> several issues I'm working through on the patches before I commit
>> them.  I'll send you a copy when I have them ready for release so
>> you can test them.
>> 
>> 
>> -Randy
>> 
>> 
>> On 2/7/13 8:54 AM, "Yves Revaz" <[email protected]> wrote:
>> 
>>> On 10/18/2012 11:41 PM, Kyle Schochenmaier wrote:
>>>> Hi Yves -
>>>> 
>>>> How frequently do you see these warnings?  Does it cause any
>>>> servers/clients to hang?
>>> 
>>> Hi Kyle and the list,
>>> 
>>> In a previous mail, I was mentioning the following errors:
>>> 
>>> [E 02/07/2013 14:39:24] Warning: encourage_recv_incoming: mop_id d0e680 in RTS_DONE message not found.
>>> [E 02/07/2013 14:39:54] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 17549115350.
>>> [E 02/07/2013 14:39:54] fp_multiqueue_cancel: flow proto cancel called on 0x1bce5e0
>>> [E 02/07/2013 14:39:54] fp_multiqueue_cancel: I/O error occurred
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto error cleanup started on 0x1bce5e0: Operation cancelled (possibly due to timeout)
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 canceled 1 operations, will clean up.
>>> [E 02/07/2013 14:39:54] bmi_recv_callback_fn: I/O error occurred
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 error cleanup finished: Operation cancelled (possibly due to timeout)
>>> 
>>> In fact, I'm trying to move 10 TB of data in our pvfs using rsync.
>>> When a lot of data is transferred, those errors occur very
>>> frequently, about every 5 minutes, which is very annoying.
>>> 
>>> I've checked our IB network, which is perfectly sane. I'm
>>> currently using orangefs-2.8.6. Should I move to 2.8.7? Looking
>>> at the changelog of the 2.8.7 release, I don't think any IB-related
>>> problems have been fixed.
>>> 
>>> Thanks,
>>> 
>>> yves
>>> 
>>>> If it is not common or destructive, this could simply be an error
>>>> on the InfiniBand fabric that caused the operation to time out in
>>>> pvfs; it can be safely ignored, as the operation would eventually
>>>> be retransmitted.
>>>> 
>>>> If you see this a lot, it may be one of a few issues that we've
>>>> fixed in recent releases. Which version of orangefs/pvfs are
>>>> you using?
>>>> 
>>>> ~Kyle
>>>> 
>>>> Kyle Schochenmaier
>>>> 
>>>> 
>>>> On Thu, Oct 18, 2012 at 4:31 PM, Becky
>>>> Ligon<[email protected]>  wrote:
>>>>> Yves:
>>>>> 
>>>>> The timeouts that you listed below are in the configuration
>>>>> file.
>>>>> 
>>>>> ClientJobBMITimeoutSecs 300 - The client's job scheduler
>>>>> limits each "job" sent across the network to this timeout.
>>>>> If the job exceeds this limit, the job is cancelled.
>>>>> Depending on the request, the job may be retried. Keep in
>>>>> mind that one PVFS request can be made up of many jobs.
>>>>> 
>>>>> ClientJobFlowTimeoutSecs - This value limits the time spent
>>>>> on a particular job called a flow.  A flow is used to
>>>>> transfer data across the network to a server or to transfer
>>>>> data from a server to the client.    Again, if the flow
>>>>> exceeds this timeout, then the flow is cancelled.
>>>>> 
>>>>> The server counterparts for these settings are rarely used,
>>>>> since the server doesn't normally initiate reads or writes.
>>>>> 
>>>>> I think your real problem has something to do with IB, but I
>>>>> am not an expert in that area.  I have cc'd Kyle
>>>>> Schochenmaier to see if he can help.
>>>>> 
>>>>> Becky
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Oct 18, 2012 at 4:07 PM, Yves
>>>>> Revaz<[email protected]>  wrote:
>>>>>> 
>>>>>> Dear list,
>>>>>> 
>>>>>> I sometimes have the following error occurring in my pvfs
>>>>>> server log.
>>>>>> 
>>>>>> [E 10/18/2012 20:59:50] Warning: encourage_recv_incoming: mop_id 150c320 in RTS_DONE message not found.
>>>>>> [E 10/18/2012 21:00:50] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 33307291.
>>>>>> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: flow proto cancel called on 0xf18c80
>>>>>> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: I/O error occurred
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto error cleanup started on 0xf18c80: Operation cancelled (possibly due to timeout)
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 canceled 1 operations, will clean up.
>>>>>> [E 10/18/2012 21:00:50] bmi_recv_callback_fn: I/O error occurred
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 error cleanup finished: Operation cancelled (possibly due to time
>>>>>> 
>>>>>> 
>>>>>> Looking at the mailing list, I've found the suggestion to
>>>>>> increase these default values (300)
>>>>>> 
>>>>>> ServerJobBMITimeoutSecs 30
>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>> ClientJobBMITimeoutSecs 300
>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>> 
>>>>>> to 600.
>>>>>> 
>>>>>> What is the origin of these timeouts?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> 
>>>>>> yves
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>>                                                 (o o)
>>>>>> --------------------------------------------oOO--(_)--OOo-------
>>>>>> Dr. Yves Revaz
>>>>>> Laboratory of Astrophysics EPFL
>>>>>> Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
>>>>>> 51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
>>>>>> 1290 Sauverny             e-mail : [email protected]
>>>>>> SWITZERLAND                  Web : http://www.lunix.ch/revaz/
>>>>>> ----------------------------------------------------------------
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Pvfs2-users mailing list
>>>>>> [email protected]
>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>> 
>>>>> --
>>>>> Becky Ligon
>>>>> OrangeFS Support and Development
>>>>> Omnibond Systems
>>>>> Anderson, South Carolina
>>>>> 
>>>>> 
>>> 
>>> 
>>> --
>>> 
>>> ----------------------------------------------------------------
>>> Dr. Yves Revaz
>>> Laboratory of Astrophysics
>>> Ecole Polytechnique Fédérale de Lausanne (EPFL)
>>> Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
>>> 51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
>>> 1290 Sauverny             e-mail : [email protected]
>>> SWITZERLAND                  Web : http://www.lunix.ch/revaz/
>>> ----------------------------------------------------------------
>>> 
>> 
>> 
>> 
>> 
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>Comment: GPGTools - http://gpgtools.org
>
>iQEcBAEBAgAGBQJRE9VcAAoJEFmgPOBxQDtBEYMIAJtgo1LMWxVtyWPa2PNvWr2c
>NMUw30GNJ2llhwJVdefpmNqPLdou0Sqr7moAPseA2qYBguER1jqSH0rnXg7yE5TX
>CNERJwaL4+99y+tRsvKukrEvegrS/CQ5tUPsiuFaqqcTlQRGYeGPtqJV3JuAsEa2
>bu49sN7yWFtM2fY0ZaFa2ouya6PR2mFAdH0ZnpcWr4OTY1Uf4py8njWvvWrMCB/2
>I3//H5RoOxhCBIe85RCdXbMh4LMQbwBeTYFePlutE7YplbrQwDLg/K4/ctswRl3T
>oKpRy5GJ83LJQomhwWWjAAnWWXe6zNlbiGe/B5APrlgZfV960shxFPeWwej3EEk=
>=iXn7
>-----END PGP SIGNATURE-----
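
For anyone applying the quoted openib.c patch by hand: the new guard follows the same pattern as the existing HAVE_IBV_EVENT_CLIENT_REREGISTER one visible in the diff context. Below is a minimal sketch of that pattern, not the actual OrangeFS code. It uses a made-up enum so that it builds without the libibverbs headers; demo_event, DEMO_NEW_STACK, HAVE_DEMO_EVENT_GID_CHANGE, demo_event_name and demo_event_is_known are all hypothetical names, and the shape of the CASE macro is a guess based on the lines shown in the patch. It also assumes that the real HAVE_IBV_EVENT_GID_CHANGE is defined by the build's configure step only when the installed headers declare that enumerator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for enum ibv_event_type. An "old stack" build
 * (DEMO_NEW_STACK undefined) does not declare the GID_CHANGE value. */
enum demo_event {
    DEMO_EVENT_PORT_ACTIVE,
    DEMO_EVENT_CLIENT_REREGISTER
#ifdef DEMO_NEW_STACK
    , DEMO_EVENT_GID_CHANGE
#endif
};

/* Assumption: a configure-time check would normally define this macro. */
#ifdef DEMO_NEW_STACK
#define HAVE_DEMO_EVENT_GID_CHANGE 1
#endif

/* Map an event value to its name. Without the #ifdef, the old-stack build
 * would fail to compile because the enumerator is undeclared. */
const char *demo_event_name(int ev)
{
    const char *s = "(unknown event)";
#define CASE(e) case e: s = #e; break
    switch (ev) {
        CASE(DEMO_EVENT_PORT_ACTIVE);
        CASE(DEMO_EVENT_CLIENT_REREGISTER);
#ifdef HAVE_DEMO_EVENT_GID_CHANGE
        CASE(DEMO_EVENT_GID_CHANGE);
#endif
    }
#undef CASE
    assert(s != NULL);
    return s;
}

/* Returns nonzero when the event value has a name in this build. */
int demo_event_is_known(int ev)
{
    return strcmp(demo_event_name(ev), "(unknown event)") != 0;
}
```

Compiled as is, the sketch behaves like the older stack: values without a guarded case fall through to the "(unknown event)" string. Building with -DDEMO_NEW_STACK enables both the enumerator and its case.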

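On the timeout settings discussed further down the thread, here is a sketch of what the raised values might look like in the server configuration file. Two assumptions: that the four options sit in the Defaults section of the config file, and that only the client-side values (the ones defaulting to 300) are the ones being raised to 600. The thread does not say which of the four were changed, so treat this as an illustration rather than a recommendation.

```
<Defaults>
    ServerJobBMITimeoutSecs 30
    ServerJobFlowTimeoutSecs 30
    ClientJobBMITimeoutSecs 600
    ClientJobFlowTimeoutSecs 600
</Defaults>
```

As noted in the quoted discussion, the timeouts look like a symptom of the IB problem, so raising them may only make the errors less frequent.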

