On Thu, Jun 18, 2009 at 9:48 PM, Isaac Huang <[email protected]> wrote:
> On Thu, Jun 18, 2009 at 09:11:50PM -0400, Michael Di Domenico wrote:
>> I cannot figure out what exactly has happened here and how to recover from
>> it.
>>
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError:
>> 2722:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading
>> HELLO from 192.168.0.248
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError: 11b-b: Connection to
>> 192.168.0....@tcp at host 192.168.0.248 on port 988 was reset: is it
>> running a compatible version of Lustre and is 192.168.0....@tcp one of
>> its NIDs?
>
> Lustre asked lnet to connect to 192.168.0....@tcp.
>
>> for some reason when i mount the OST on the above node it's trying to
>> connect to itself on eth0, even though i have networks=tcp0(eth1) in
>> my modprobe.conf and the NID is set to 192.168.1.248
>>
>> Jun 18 21:02:52 node0-eth1 kernel: Lustre: Client data1-client has started
>> Jun 18 21:02:52 node7-eth0 kernel: LustreError: 120-3: Refusing
>> connection from 192.168.0.50 for 192.168.0....@tcp: No matching NI
>
> But the connection was rejected because the server didn't have
> 192.168.0....@tcp as one of its NIDs.
>
> What was your mount command line? What does 'lctl list_nids' say on
> the nodes?
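For reference, the networks setting mentioned above is the standard LNet module-option line shown below. This is a generic excerpt of that one line, not a copy of the full modprobe.conf on these nodes:

```
# /etc/modprobe.conf (excerpt)
# Bind LNet's tcp0 network to eth1 only, so the node's NID
# is its eth1 address followed by @tcp.
options lnet networks=tcp0(eth1)
```
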
'lctl list_nids' shows the right NID on all the nodes:

192.168....@tcp

192.168.0.x does exist on all the nodes, but Lustre shouldn't ever be trying to use it.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
