In my experience, lock callback timer expirations are often symptoms of network problems. The client is either unable to deliver the expected lock cancellation to the server, or unable to perform I/O under the lock (which would extend the timer). Are there any error messages that indicate communication failures between the evicted client and the server hosting demo-OST0002?
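As a starting point for that search, here is a minimal Python sketch that pulls the timer duration, client NID, and namespace out of the OSS eviction message, then filters other log lines for that NID. The regular expression is inferred only from the single sample line quoted in this thread, not from any Lustre log format specification, and the helper names (`parse_eviction`, `lines_mentioning`) are made up for illustration.

```python
import re

# Sample OSS message copied from this thread (truncated after the namespace).
SAMPLE = (
    "Oct 13 08:54:40 oss0 kernel: LustreError: "
    "0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer "
    "expired after 101s: evicting client at 10.55.100.31@tcp "
    "ns: filter-demo-OST0002_UUID lock: ffff88044e6cfe00/0x3e6cc8dc71d21112"
)

# Assumption: field layout is taken from the one sample line above.
EVICTION_RE = re.compile(
    r"lock callback timer expired after (?P<seconds>\d+)s: "
    r"evicting client at (?P<nid>\S+)"
    r".*?ns: (?P<namespace>\S+)"
)


def parse_eviction(line):
    """Return timer seconds, client NID and namespace, or None if no match."""
    match = EVICTION_RE.search(line)
    if match is None:
        return None
    return {
        "seconds": int(match.group("seconds")),
        "nid": match.group("nid"),
        "namespace": match.group("namespace"),
    }


def lines_mentioning(nid, lines):
    """Return the log lines that mention the given NID, in original order."""
    return [line for line in lines if nid in line]


if __name__ == "__main__":
    info = parse_eviction(SAMPLE)
    print(info)
```

Running `lines_mentioning(info["nid"], ...)` over the server's /var/log/messages (and the equivalent with the server NID on the client) narrows the logs to entries involving that peer, which can then be read for signs of communication failures around the eviction time.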
Chris Horn

On 10/13/17, 1:44 PM, "lustre-discuss on behalf of John Casu" <lustre-discuss-boun...@lists.lustre.org on behalf of j...@chiraldynamics.com> wrote:

    client, server = 2.8.0, connected via 40GbE

    running IOR & trying to write large files (40TB/file)

    I get the following in /var/log/messages on my client

    Oct 13 12:07:19 c3 kernel: Lustre: Evicted from demo-OST0002_UUID (at 10.55.100.20@tcp) after server handle changed from 0x3e6cc8dc71d19edb to 0x3e6cc8dc71d2130a
    Oct 13 12:07:19 c3 kernel: LustreError: 167-0: demo-OST0002-osc-ffff887f229fa800: This client was evicted by demo-OST0002; in progress operations using this service will fail.

    and the following on the oss:

    Oct 13 08:54:40 oss0 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.55.100.31@tcp ns: filter-demo-OST0002_UUID lock: ffff88044e6cfe00/0x3e6cc8dc71d21112 lrc: 3/0,0 mode: PW/PW res: [0x275a:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4194303) flags: 0x60000400010020 nid: 10.55.100.31@tcp remote: 0xb696c0d49b95953c expref: 16214 pid: 109581 timeout: 8078643085 lvb_type: 0

    wondering why the lock callback timer might expire.
    Only have 3 clients & pair of mds & pair of oss.

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org