On Thu, Jun 18, 2009 at 9:48 PM, Isaac Huang <[email protected]> wrote:
> On Thu, Jun 18, 2009 at 09:11:50PM -0400, Michael Di Domenico wrote:
>> I cannot figure out what exactly has happened here and how to recover
>> from it.
>>
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError:
>> 2722:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading
>> HELLO from 192.168.0.248
>> Jun 18 21:02:52 node0-eth1 kernel: LustreError: 11b-b: Connection to
>> 192.168.0....@tcp at host 192.168.0.248 on port 988 was reset: is it
>> running a compatible version of Lustre and is 192.168.0....@tcp one of
>> its NIDs?
>
> Lustre asked lnet to connect to 192.168.0....@tcp.
>
>> for some reason when i mount the OST on the above node it's trying to
>> connect to itself on eth0, even though i have networks=tcp0(eth1) in
>> my modprobe.conf and the NID is set to 192.168.1.248
>>
>> Jun 18 21:02:52 node0-eth1 kernel: Lustre: Client data1-client has started
>> Jun 18 21:02:52 node7-eth0 kernel: LustreError: 120-3: Refusing
>> connection from 192.168.0.50 for 192.168.0....@tcp: No matching NI
>
> But the connection was rejected because the server didn't have
> 192.168.0....@tcp as one of its NIDs.
>
> What was your mount command line? What does 'lctl list_nids' say on
> the nodes?

'lctl list_nids' shows the correct NID on every node: 192.168....@tcp

The 192.168.0.x network does exist on all the nodes, but Lustre should
never be trying to use it.
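
To make the mismatch concrete, here is a minimal sketch that checks whether
the address part of an NID falls inside the subnet of the interface LNet is
configured to use. The /24 netmasks are an assumption (the thread does not
state them), nid_on_network is a hypothetical helper name, and the second
NID is an illustrative value, not one taken from the logs.

```python
import ipaddress


def nid_on_network(nid: str, cidr: str) -> bool:
    """Return True if the address part of an LNet NID lies inside cidr."""
    # An NID looks like "<address>@<network>", e.g. "192.168.1.248@tcp".
    addr, _, _net = nid.partition("@")
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr)


# Assumed subnet for eth1; the netmask is not given in the thread.
ETH1_NET = "192.168.1.0/24"

# NID built from the address the thread says is configured on eth1.
print(nid_on_network("192.168.1.248@tcp", ETH1_NET))

# Hypothetical NID on the eth0 side, which should not match eth1.
print(nid_on_network("192.168.0.10@tcp", ETH1_NET))
```

Any NID for which this returns False is on a network other than the one
named in networks=tcp0(eth1), which is the situation the log messages above
describe.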
