On Tuesday, 06.10.2009, at 09:33 -0600, Andreas Dilger wrote:
> > ... bla bla ...
> > Is there a reason why an immediate read after a write on the same node
> > from/to a shared file is slow? Is there any additional communication,
> > e.g. is the client flushing the buffer cache before the first read? The
> > statistics show that the average time to complete a 1.44MB read request
> > is increasing during the runtime of our program. At some point it hits
> > an upper limit or a saturation point and stays there. Is there some kind
> > of queue or something that is getting full in this kind of
> > write/read-scenario? Maybe there is something tunable in /proc/fs/lustre?
>
> One possible issue is that you don't have enough extra RAM to cache 1.5GB
> of the checkpoint, so during the write it is being flushed to the OSTs
> and evicted from cache. When you immediately restart there is still dirty
> data being written from the clients that is contending with the reads to
> restart.
>
> Cheers, Andreas
Well, I do call fsync() after the write is finished. During the write process I see a constant stream of 4 GB/s running from the Lustre servers to the RAID controllers, which stops when the write process terminates. When I start reading, there are no more writes going that way, so I suspect it might be something else ...

Even if I wait 5 minutes between the writes and the reads (all dirty pages should have been flushed by then), the picture does not change.

Michael

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW: http://www.tu-dresden.de/zih
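
For anyone who wants to look at the pattern discussed above (write, fsync(), then time each read request), here is a minimal single-node sketch in Python. It is not the program from this thread: the function name `write_then_read`, the file path, and the request size are hypothetical, and it exercises neither a shared file nor parallel clients, so it only illustrates how per-request read latency could be recorded.

```python
import os
import tempfile
import time

# Hypothetical request size, roughly the 1.44 MB mentioned in the thread.
CHUNK = 1_440_000


def write_then_read(path, n_chunks, chunk_size=CHUNK):
    """Write n_chunks blocks, fsync, then read them back.

    Returns the time in seconds that each read request took.
    """
    payload = b"\0" * chunk_size

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(n_chunks):
            view = memoryview(payload)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
        # Ask the OS to push dirty pages to storage before reading.
        os.fsync(fd)
    finally:
        os.close(fd)

    latencies = []
    # buffering=0 avoids Python-level buffering; the OS page cache
    # may still serve the reads.
    with open(path, "rb", buffering=0) as f:
        for _ in range(n_chunks):
            start = time.perf_counter()
            remaining = chunk_size
            while remaining > 0:
                piece = f.read(remaining)
                if not piece:
                    raise IOError("unexpected end of file")
                remaining -= len(piece)
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        times = write_then_read(os.path.join(tmp, "checkpoint.bin"), n_chunks=8)
        for i, t in enumerate(times):
            print(f"read {i}: {t * 1e3:.3f} ms")
```

On a Lustre client the path would have to point into the mounted file system, and the latency list could then be compared between early and late requests to see whether the average really grows over the run.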
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss