Good Afternoon,

   I've been seeing a great deal of slowness from clients on an OPA (Omni-Path) 
network accessing Lustre through LNet routers.  The nodes take a very long time 
to complete commands such as lfs df, and they show many dropped and 
re-established connections.  The OSS systems show this as well, and occasionally 
report that all routes to a host on the Omni-Path fabric are down.  They also 
show large numbers of bulk callback errors.  The LNet routers show large numbers 
of PUT_NACK messages, as well as "Abort reconnection" messages for nodes on the 
OPA fabric.

w/r, 
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
