Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11308



The patch tested out just fine.

One question I have:

When a client gets a NAK back from the server, it doesn't seem to return EIO
up the stack. Instead, the reply seems to get dropped on the floor, and we
let the ptlrpc-level timeouts expire before the RPC is resent. With longer
timeouts (300s), this could mean a long gap between seeing the NAK and
actually resending the RPC.

What is the expected behavior here?

Example syslog traffic when "lctl --net ptl del_peer" was run on an OST
(nid00028) while a dd was running on nid00007:
Dec 19 17:14:32 nid00028 kernel: Lustre: 3737:0:(ptllnd_rx_buf.c:569:kptllnd_rx_parse()) NAK [EMAIL PROTECTED]: no connection; peer must reconnect
Dec 19 17:14:32 nid00007 kernel: Lustre: 4015:0:(ptllnd_rx_buf.c:539:kptllnd_rx_parse()) NAK from [EMAIL PROTECTED] (ptlid:9-28)
Dec 19 17:14:32 nid00007 kernel: Lustre: 4016:0:(router.c:184:lnet_notify()) Upcall: NID [EMAIL PROTECTED] is dead
Dec 19 17:14:32 nid00007 kernel: Lustre: 4:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked portals upcall /usr/lib/lustre/lnet_upcall ROUTER_NOTIFY,[EMAIL PROTECTED],down,1166570069
Dec 19 17:14:43 nid00028 kernel: Lustre: 4890:0:(ldlm_lib.c:489:target_handle_reconnect()) ost_svc: 93f03b41-ebd5-4daa-8f75-eb981390f46e reconnecting
Dec 19 17:14:43 nid00003 kernel: LustreError: 15069:0:(client.c:955:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166570064, 15s ago) [out 1166570064.782602, in 0.000000] [EMAIL PROTECTED] x2292/t0 o400->[EMAIL PROTECTED]:28 lens 64/64 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 19 17:14:43 nid00028 kernel: Lustre: 4891:0:(filter.c:2985:filter_set_info_async()) ost_svc: received MDS connection from [EMAIL PROTECTED]
Dec 19 17:14:43 nid00003 kernel: LustreError: Connection to service ost_svc via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Dec 19 17:14:43 nid00028 kernel: Lustre: 4891:0:(filter.c:2985:filter_set_info_async()) previously skipped 3 similar messages
Dec 19 17:14:43 nid00003 kernel: Lustre: OSC_eelc0-0c0s0n3_ost_svc_mds_svc: Connection restored to service ost_svc using nid [EMAIL PROTECTED]
Dec 19 17:14:43 nid00003 kernel: Lustre: 15101:0:(mds_lov.c:530:__mds_lov_syncronize()) MDS mds_svc: ost_svc_UUID now active, resetting orphans
Dec 19 17:14:43 nid00003 kernel: Lustre: 15101:0:(mds_lov.c:530:__mds_lov_syncronize()) previously skipped 2 similar messages
Dec 19 17:14:43 nid00028 kernel: Lustre: 4892:0:(recov_thread.c:580:llog_repl_connect()) llcd 0000010072c03000:00000100711da9c0 not empty
Dec 19 17:14:43 nid00028 kernel: Lustre: 4893:0:(filter.c:2364:filter_destroy_precreated()) ost_svc: deleting orphan objects from 886857 to 886981
Dec 19 17:14:43 nid00028 kernel: Lustre: 4893:0:(filter.c:2364:filter_destroy_precreated()) previously skipped 3 similar messages
Dec 19 17:14:43 nid00028 kernel: Lustre: 4989:0:(llog_cat.c:352:llog_cat_process_cb()) processing log 0x11050006:37fdedf6 at index 57 of catalog 0x11050002
Dec 19 17:14:43 nid00028 kernel: Lustre: 4989:0:(llog_cat.c:352:llog_cat_process_cb()) previously skipped 1 similar messages
Dec 19 17:14:43 nid00028 kernel: Lustre: 4989:0:(filter_log.c:227:filter_recov_log_mds_ost_cb()) fetch generation log, send cookie
Dec 19 17:14:43 nid00028 kernel: Lustre: 4989:0:(llog.c:294:llog_process()) recovery from log: 0x11050004:c5056065 stopped

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
