On Nov 07, 2006  07:49 -0800, Zhe Zhang wrote:
> I am doing research in parallel computing and our cluster is using Lustre. I 
> recently found that if one OST is down, I cannot read any part of a file, 
> regardless of whether that part is located on the failed OST. 
> 
> One extreme example is when I have 3 OSTs: ost1, ost2 and ost3, and one file 
> f is stored on those 3 nodes, beginning with ost1 (I used #lfs setstripe 
> /mnt/lustre 65536 0 3). When I tried to read the very first character from f 
> with fgetc(f), I found that my Lustre client still tries to read from all 
> 3 OSTs. And when I shut down ost3, the fgetc(f) call never 
> finishes (the program hangs).
> 
> 
> So is this normal behavior, or have I misconfigured Lustre?  Thanks!
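
For the layout in the question (stripe size 65536, 3 stripes, starting at
OST index 0), the stripe that holds a given byte offset can be worked out
with plain round-robin arithmetic.  The following is only a rough sketch:
stripe_index is a hypothetical helper, not a Lustre tool, and it assumes
simple RAID-0 style striping with stripes numbered from 0, so stripe 0
corresponds to ost1 in the example.

```shell
# Hypothetical helper (not part of Lustre): print the 0-based stripe that
# holds a byte offset, assuming plain round-robin (RAID-0 style) striping.
stripe_index() {
    offset=$1
    stripe_size=$2
    stripe_count=$3
    echo $(( (offset / stripe_size) % stripe_count ))
}

# Values from the question: 65536-byte stripes over 3 OSTs.
stripe_index 0 65536 3        # first byte of the file
stripe_index 65536 65536 3    # first byte of the second stripe
stripe_index 196608 65536 3   # wraps back around to the first stripe
```

Under these assumptions the first byte of f falls in stripe 0, so its data
is on ost1 and not on the OST that was shut down.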

If you want to allow partial file access then you need to enable "failout"
of the OSTs.  Otherwise Lustre will block access to the file until the
OST is restored, to avoid application errors during failover.  This can be
done permanently by adding "--failout" to the OST definition, or at runtime,
if you know an OST will be down for a long time and will not be recovered, with:

        lctl --device {OST device number} deactivate

on the MDS and all of the clients (note that the specific OST device number
will differ between the MDS and the clients).
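
As a hedged sketch of how this might look on one node: "lctl dl" lists the
local devices with the device number in the first column, and that number
is what goes into the deactivate command.  The two function names below
are hypothetical wrappers added for illustration; which entry in the list
corresponds to the failed OST depends on your configuration and has to be
checked by hand on each node.

```shell
# Sketch only; run as root on the MDS and on each client.

# List the local Lustre devices so the device number can be looked up.
list_lustre_devices() {
    command -v lctl >/dev/null 2>&1 || { echo "lctl not found" >&2; return 1; }
    lctl dl
}

# Deactivate one device by its local device number
# (the same command as given above, wrapped in a function).
deactivate_device() {
    command -v lctl >/dev/null 2>&1 || { echo "lctl not found" >&2; return 1; }
    lctl --device "$1" deactivate
}
```

Usage would be list_lustre_devices first, then deactivate_device with the
number found for the failed OST on that particular node.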

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Lustre-devel mailing list
[EMAIL PROTECTED]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
