Re: [Lustre-discuss] Problems with MDS Crashing
We have had another hang, but this time we had KVM access to the machine (and the screen blanker wasn't on). I took some screenshots: the first one is an error I got after reboot, the BMP one is what I saw when I first logged in to the KVM, and the other ones are what I saw while trying to type 'root' - it started printing traces.

http://amber.leeware.com/wi/lustre-death/

After the reboot there was a command timeout message from the RAID card. While it was hung, it reported "too little hardware resources".

-- 
Andrew
http://CloudAccess.net/
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre client bug (?)
Hi,

I'm not sure where I should report this, but I couldn't find the error text in Google, so I guess it's not in the bug tracker yet. It appeared on a CentOS 64-bit client under light traffic: Lustre 1.8.2 patchless client from Sun, Linux 2.6.28.10 #4 SMP, both without custom patches. I'm not sure what more details I could supply.

mx1 kernel: LustreError: 20716:0:(statahead.c:149:ll_sai_entry_cleanup()) ASSERTION(list_empty(&entry->se_list)) failed
Message from syslogd@ at Thu Apr 22 04:31:50 2010 ...
mx1 kernel: LustreError: 20716:0:(statahead.c:149:ll_sai_entry_cleanup()) LBUG

-- 
Andrzej Godziuk
http://CloudAccess.net/
Re: [Lustre-discuss] DRBD + active/active OST, again
On Tue, Mar 2, 2010 at 2:31 PM, Johann Lombardi wrote:
> On Tue, Mar 02, 2010 at 02:01:06PM +0100, Andrew Godziuk wrote:
>> Then I guess this part of the manual should be changed: ...
>> to state explicitly that the active/active scenario is only possible when
>> an OSS is active for some OSTs and passive for some others.
>
> Yes, I think this is explained in the next section:
> "For OST failover, multiple OSS nodes are configured to be able to serve the
> same OST. However, only one OSS node can serve the OST at a time. An OST can be
> moved between OSS nodes that have access to the same storage device using
> umount/mount commands."

It sounded like a contradiction to me, which is what made me ask the question here. Now that I know, it sounds logical.

> BTW, in your case, since you did not specify a failover node for the OST at
> mkfs time, the lustre clients are not aware of the alternative path and thus
> won't try to reach the OST through the 2nd OSS. So your filesystem should
> still be safe since the 2nd mount instance should never receive any client
> connection. However, I would still recommend to umount the OST on the 2nd
> OSS asap.

This was just a test setup; I'll be specifying --failover in the live setup for sure.

Again, thank you very much for your help.

-- 
Andrzej Godziuk
http://CloudAccess.net/
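For readers hitting this thread later, declaring the failover node at format time and the umount/mount failover step might look roughly like the sketch below. This is only an illustration: the filesystem name, hostnames, NIDs, device, and mount points are made-up placeholders, not taken from the thread.

```shell
# Hypothetical example - fsname, NIDs, device, and paths are placeholders.
# Format the OST with the second OSS declared as a failover node, so
# clients learn the alternative path at mount time:
mkfs.lustre --ost --fsname=testfs \
    --mgsnode=mgs@tcp0 \
    --failnode=oss2@tcp0 \
    /dev/drbd0

# Normal operation: only oss1 serves the OST.
#   oss1$ mount -t lustre /dev/drbd0 /mnt/ost0
# On failover, unmount on oss1 (if it is still reachable), then mount
# the same OST on oss2; clients reconnect via the failover NID:
#   oss1$ umount /mnt/ost0
#   oss2$ mount -t lustre /dev/drbd0 /mnt/ost0
```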
Re: [Lustre-discuss] DRBD + active/active OST, again
Johann,

Thank you for your detailed answer; it made the picture much clearer.

Then I guess this part of the manual should be changed:

"The active/passive configuration is seldom used for OST servers as it doubles hardware costs without improving performance. On the other hand, an active/active cluster configuration can improve performance by serving and providing arbitrary failover protection to a number of OSTs."

to state explicitly that the active/active scenario is only possible when an OSS is active for some OSTs and passive for some others.

-- 
Andrzej Godziuk
http://CloudAccess.net/
[Lustre-discuss] DRBD + active/active OST, again
Hi,

I know this topic has been discussed here many times, but all the messages seem to be about Lustre 1.6. Has anything changed in Lustre 1.8 that would make it possible to set up two OSS nodes with an OST shared using DRBD, in an active-active configuration?

I have mounted a shared OST on two OSS nodes, neither of them marked with "--failover", and it looked as if it was working, but I didn't do any stress tests for reliability. Does such a setup ever have a chance of working for real, or did it only look as if everything was OK?

-- 
Andrzej Godziuk
http://CloudAccess.net/
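As later replies in the thread clarify, Lustre's "active/active" means each OST is served by exactly one OSS at a time, with the nodes crossed over each other's OSTs. A sketch of that crossed layout with DRBD might look like the following; the resource names, hostnames, and devices are placeholders I made up for illustration.

```shell
# Hypothetical crossed active/active layout - names and devices are
# placeholders. Each OSS is DRBD primary for one OST and standby for
# the other, so both nodes do useful work, while any single OST is
# still mounted on only one OSS at a time.

# oss1: primary for resource ost0, standby for ost1
#   oss1$ drbdadm primary ost0
#   oss1$ mount -t lustre /dev/drbd0 /mnt/ost0

# oss2: primary for resource ost1, standby for ost0
#   oss2$ drbdadm primary ost1
#   oss2$ mount -t lustre /dev/drbd1 /mnt/ost1

# Mounting the same DRBD-backed OST on both nodes simultaneously is
# exactly the scenario the thread warns against.
```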