Re: [OpenAFS] Re: bonnie++ on OpenAFS
Hi Matt,

On Nov 23, 2010, at 16:14 , Matt W. Benjamin wrote:
> Is "write-on-close" an expectation which could be broken? It is the case
> that AFS has strongly expressed that its semantics are (in general)
> _sync-on-close_, and its meaning is that an application which has closed an
> AFS file may consider any writes it made to be stable, and visible to other
> clients. Write stability is not assured only on close; an application may
> explicitly sync when convenient. I'm not sure, first, how a client can ever
> have been assured that its writes were not stabilised before it closed its
> corresponding file(s), nor how it would benefit from this. For example, the
> client may not revoke its own operations on the file prior to closing it.
> Which is to say, I think the CM is free to stabilise ordinary writes when
> any convenient opportunity arises. Feel free to flame now...

As an AFS user, I never expected a guarantee that no data is actually written out to the fileserver before the file is closed on the client. After all, I'd like to be able to write files larger than the cache without having to close and reopen them in append mode.

But it's a huge advantage of AFS over other filesystems that write-on-close is the usual case for files of modest size compared to the cache, because it helps avoid fragmentation on the fileserver. Imagine 1500 batch jobs on 200 clients dribbling into 3000 files in the same volume: I don't think that today's typical backend filesystems for the namei fileserver would be able to limit fragmentation as well as in the write-on-close case. But as long as data is flushed in consecutive chunks as large as possible (or at least reasonably large if possible), it's perfectly OK to do it before the file is closed, at least for our use case.

- Stephan

> Matt
>
>> The problem is that we don't make good decisions when we decide to
>> flush the cache. However, any change to flush items which are less
>> active will be a behaviour change - in particular, on a multi-user
>> system it would mean that one user could break write-on-close for
>> other users simply by filling the cache.
>>
>> Cheers,
>>
>> Simon.

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Hi, "Matt W. Benjamin" writes: > Hi, > > Is "write-on-close" was an expectation which could be broken? It is > the case that AFS has strongly expressed that its semantics are (in > general) _sync-on-close_, and it's meaning is that an application > which has closed an AFS file may consider any writes it made to be > stable, and visible to other clients. Write stability is not assured > only on close, an application may explicitly sync when convenient. > I'm not sure, first, how a client can ever have been assured that its > writes were not stabilised before it closed its corresponding file(s), > nor, how it would benefit from this? For example, the client may not > revoke its own operations on the file prior to closing it. Which is > to say, I think the CM is free to stabilise ordinary writes when any > convenient opportunity arises. Feel free to flame now... My personal feeling is that the write-on-close semantic is only a guarantee to the writing client in terms of "you can only be guaranteed that your writes will be flushed when the file is closed". However I do not believe that there are any semantic rules prohibiting writes back to the server PRIOR to closing the file. One potential reason to delay writes to the server is that the client may over-write dirty blocks multiple times, so you don't want to necessarily push writes to the server at every write() operation. However I don't see any reason that we can't flush dirty chunks back in this situation. > Simon says: > >> The problem is that we don't make good decisions when we decide to >> flush the cache. However, any change to flush items which are less >> active will be a behaviour change - in particular, on a multi-user >> system it would mean that one user could break write-on-close for >> other users simply by filling the cache. I'd rather that write-on-close be "broken" (not that I see this as a problem) than the cache thrashing. 
Again, my feeling is that write-on-close isn't a guarantee that your data will NOT be written before you close(), but rather it's a guarantee that you cannot be assured that your data is written before you close(). Semantically these are two very distinct rules. Pre-flushing writes ahead of the close do NOT violate this second assurance. Just my $0.02. -derek -- Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory Member, MIT Student Information Processing Board (SIPB) URL: http://web.mit.edu/warlord/PP-ASEL-IA N1NWH warl...@mit.eduPGP key available ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
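Derek's distinction can be made concrete: an application that needs its data stable before close() can ask for it explicitly with fsync(), instead of relying on close-time semantics. A minimal sketch, using generic POSIX calls rather than anything OpenAFS-specific (the helper name and path are made up for illustration):

```python
import os
import tempfile

def write_stable(path, data):
    """Write data and explicitly request stability before close().

    fsync() is the application's own stability point: on an AFS client it
    asks the cache manager to store dirty data back to the fileserver now,
    rather than leaving everything to the close-time flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # data stable here, before close()
    finally:
        os.close(fd)  # close() then has little left to flush

path = os.path.join(tempfile.mkdtemp(), "example.dat")
write_stable(path, b"payload")
```

Nothing here depends on write-on-close being "unbroken": the data is stable at the fsync(), whether or not the cache manager also stored chunks earlier on its own initiative.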
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On 23 Nov 2010, at 14:15, Andrew Deason wrote:
> On Tue, 23 Nov 2010 11:23:03 +
> Simon Wilkinson wrote:
>
>> We need a better solution to cache eviction. The problem is that,
>> until very recently, we didn't have the means for one process to
>> successfully flush files written by a different process.
>
> I'm not following you; why can the cache truncate daemon not be
> triggered and waited for, like in normal cache shortage conditions?

Because it doesn't do stores. More importantly, because it can't do stores, as it doesn't have access to the credentials that the file was written with. This is the key issue that I was alluding to earlier.

The work that Marc has done means that, in master, we do now have access to this information on recent Linux kernels, where we use it to handle writeback. We need to look at generalising this to other operating systems, and making use of it for cache eviction.

S.
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Hi, Is "write-on-close" was an expectation which could be broken? It is the case that AFS has strongly expressed that its semantics are (in general) _sync-on-close_, and it's meaning is that an application which has closed an AFS file may consider any writes it made to be stable, and visible to other clients. Write stability is not assured only on close, an application may explicitly sync when convenient. I'm not sure, first, how a client can ever have been assured that its writes were not stabilised before it closed its corresponding file(s), nor, how it would benefit from this? For example, the client may not revoke its own operations on the file prior to closing it. Which is to say, I think the CM is free to stabilise ordinary writes when any convenient opportunity arises. Feel free to flame now... Matt > > The problem is that we don't make good decisions when we decide to > flush the cache. However, any change to flush items which are less > active will be a behaviour change - in particular, on a multi-user > system it would mean that one user could break write-on-close for > other users simply by filling the cache. > > Cheers, > > Simon. > > ___ > OpenAFS-info mailing list > OpenAFS-info@openafs.org > https://lists.openafs.org/mailman/listinfo/openafs-info -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Tue, Nov 23, 2010 at 9:49 AM, Marc Dionne wrote:
> On Tue, Nov 23, 2010 at 9:15 AM, Andrew Deason wrote:
>> On Tue, 23 Nov 2010 11:23:03 +
>> Simon Wilkinson wrote:
>>
>>> We need a better solution to cache eviction. The problem is that,
>>> until very recently, we didn't have the means for one process to
>>> successfully flush files written by a different process.
>>
>> I'm not following you; why can the cache truncate daemon not be
>> triggered and waited for, like in normal cache shortage conditions?
>
> a) I'm pretty sure the cache truncate daemon simply skips dirty chunks
> and doesn't do any writeback to the server.
> b) The truncate daemon only looks at cache usage, not at dirtiness.
> So we can be above the threshold where doPartialWrite will insist on
> writing back data (2/3 of cache chunks dirty), but the cache is still
> well below the threshold where the truncate daemon will start to
> shrink (95% chunks or 90% space, I think)

Nothing precludes us from changing that, but yes, b) is correct, and I'm pretty sure you're right about a) also.

--
Derrick
[OpenAFS] Re: bonnie++ on OpenAFS
On Tue, 23 Nov 2010 09:49:58 -0500 Marc Dionne wrote:
>> I'm not following you; why can the cache truncate daemon not be
>> triggered and waited for, like in normal cache shortage conditions?
>
> a) I'm pretty sure the cache truncate daemon simply skips dirty chunks
> and doesn't do any writeback to the server.
> b) The truncate daemon only looks at cache usage, not at dirtiness.
> So we can be above the threshold where doPartialWrite will insist on
> writing back data (2/3 of cache chunks dirty), but the cache is still
> well below the threshold where the truncate daemon will start to
> shrink (95% chunks or 90% space, I think)

Okay, so we're "full" of locally-written data. I was reading Simon's comments to mean that the cache is just >90% filled, but what he's (apparently) actually saying makes a lot more sense.

--
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Tue, Nov 23, 2010 at 9:15 AM, Andrew Deason wrote:
> On Tue, 23 Nov 2010 11:23:03 +
> Simon Wilkinson wrote:
>
>> We need a better solution to cache eviction. The problem is that,
>> until very recently, we didn't have the means for one process to
>> successfully flush files written by a different process.
>
> I'm not following you; why can the cache truncate daemon not be
> triggered and waited for, like in normal cache shortage conditions?

a) I'm pretty sure the cache truncate daemon simply skips dirty chunks and doesn't do any writeback to the server.

b) The truncate daemon only looks at cache usage, not at dirtiness. So we can be above the threshold where doPartialWrite will insist on writing back data (2/3 of cache chunks dirty), but the cache is still well below the threshold where the truncate daemon will start to shrink (95% chunks or 90% space, I think).

Marc
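The gap Marc describes can be made concrete. The thresholds below come straight from his message (2/3 of chunks dirty for doPartialWrite; 95% of chunks or 90% of space for the truncate daemon); the helper functions themselves are a hypothetical sketch, not OpenAFS code:

```python
def needs_partial_write(dirty_chunks, total_chunks):
    # doPartialWrite insists on writing back data once 2/3 of chunks are dirty
    return dirty_chunks * 3 >= total_chunks * 2

def truncate_daemon_runs(used_chunks, total_chunks, used_space, total_space):
    # the truncate daemon reacts only to overall usage, never to dirtiness
    return (used_chunks * 100 >= total_chunks * 95
            or used_space * 100 >= total_space * 90)

# A cache can sit in the gap: with 70% of chunks dirty, every write() pays
# for a synchronous store, yet the truncate daemon sees only 70% usage and
# stays idle.
total = 1000
print(needs_partial_write(700, total))                      # True
print(truncate_daemon_runs(700, total, 700, total))         # False
```

This is why triggering the truncate daemon in this situation would not help even if it could do stores: by its own criteria the cache is not full enough to act on.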
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Tue, Nov 23, 2010 at 6:23 AM, Simon Wilkinson wrote:
> On 23 Nov 2010, at 11:02, Hartmut Reuter wrote:
>> The problem here is that afs_DoPartialWrite is called with each write.
>> Normally it gets out without doing anything, but if the percentage of
>> dirty chunks is too high it triggers a background store.
>
> On master, at least DoPartialWrite does an immediate store - the only
> place we can do a background write is in response to a normal close
> request.
>
> In any case, this problem arises regardless of how we're storing the
> file. The issue is that our cache eviction strategy picks the most
> recently accessed chunk to evict, and then we dirty that chunk again
> immediately after we've flushed it.
>
> We need a better solution to cache eviction. The problem is that, until
> very recently, we didn't have the means for one process to successfully
> flush files written by a different process.

Even at that, we did have the opportunity to try to evict earlier chunks, but things can now get better in head (and 1.6).

--
Derrick
[OpenAFS] Re: bonnie++ on OpenAFS
On Tue, 23 Nov 2010 11:23:03 + Simon Wilkinson wrote:
> We need a better solution to cache eviction. The problem is that,
> until very recently, we didn't have the means for one process to
> successfully flush files written by a different process.

I'm not following you; why can the cache truncate daemon not be triggered and waited for, like in normal cache shortage conditions?

--
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On 23 Nov 2010, at 11:02, Hartmut Reuter wrote:
> The problem here is that afs_DoPartialWrite is called with each write.
> Normally it gets out without doing anything, but if the percentage of
> dirty chunks is too high it triggers a background store.

On master, at least DoPartialWrite does an immediate store - the only place we can do a background write is in response to a normal close request.

In any case, this problem arises regardless of how we're storing the file. The issue is that our cache eviction strategy picks the most recently accessed chunk to evict, and then we dirty that chunk again immediately after we've flushed it.

We need a better solution to cache eviction. The problem is that, until very recently, we didn't have the means for one process to successfully flush files written by a different process.

S.
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Simon Wilkinson wrote:
>> Yep, this is what's happening in the trace Achim provided, too. Every
>> 4k we write the chunk. I'm not sure how that's possible unless
>> something is closing the file a lot, or the cache is full of stuff we
>> can't kick out.
>
> Actually, it's entirely possible. Here's how it all goes wrong... When
> the cache is full, every call to write results in us attempting to
> empty the cache. On Linux the page cache means that we only call write
> once for each 4k chunk. However, our attempts to empty the cache are a
> little pathetic. We just attempt to store all of the chunks of the
> file currently being written back to the fileserver. If it's a new
> file there is only one such chunk - the one that we are currently
> writing. As chunks are much larger than pages, and when a chunk is
> dirty we flush the whole thing to the server, this is why we see
> repeated writes of the same data. The process goes something like this:
>
> *) Write page at 0k, dirties first chunk of file.
> *) Discover cache is full, flush first chunk (0->1024k) to the file server
> *) Write page at 4k, dirties first chunk of file
> *) Cache is still full, flush first chunk to file server
> *) Write page at 8k, dirties first chunk of file
> ... and so on.
>
> The problem is that we don't make good decisions when we decide to
> flush the cache. However, any change to flush items which are less
> active will be a behaviour change - in particular, on a multi-user
> system it would mean that one user could break write-on-close for
> other users simply by filling the cache.
>
> Cheers,
>
> Simon.

The problem here is that afs_DoPartialWrite is called with each write. Normally it gets out without doing anything, but if the percentage of dirty chunks is too high it triggers a background store. However, this can happen multiple times before the background job starts executing. Therefore I introduced in AFS/OSD a new flag bit, CStoring, which is switched on when the background task is submitted and switched off when it's done. During that time no new background stores are scheduled for this file.

Hartmut

--
Hartmut Reuter         e-mail  reu...@rzg.mpg.de
                       phone   +49-89-3299-1328
                       fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)   web  http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
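Hartmut's CStoring flag amounts to a per-file "store in flight" latch. A sketch of the idea in miniature (the class and method names here are hypothetical; the real flag lives in the AFS/OSD vcache and is manipulated under the kernel's locking, not Python's):

```python
import threading

class FileCacheEntry:
    """While a background store for a file is pending, further attempts
    to schedule one are no-ops, so stores don't pile up."""

    def __init__(self):
        self.lock = threading.Lock()
        self.storing = False        # the "CStoring" flag bit
        self.stores_submitted = 0

    def schedule_background_store(self):
        """Called from the write path (cf. afs_DoPartialWrite)."""
        with self.lock:
            if self.storing:
                return False        # one is already queued; skip
            self.storing = True
            self.stores_submitted += 1
            return True

    def store_done(self):
        """Called by the background daemon when the store completes."""
        with self.lock:
            self.storing = False

entry = FileCacheEntry()
entry.schedule_background_store()   # first write over threshold: queued
entry.schedule_background_store()   # later writes while pending: suppressed
entry.store_done()
print(entry.stores_submitted)       # 1
```

Once store_done() clears the flag, the next over-threshold write can schedule a fresh store, so progress is still made; only the redundant intermediate submissions are elided.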
Re: [OpenAFS] Re: bonnie++ on OpenAFS
> Yep, this is what's happening in the trace Achim provided, too. Every
> 4k we write the chunk. I'm not sure how that's possible unless
> something is closing the file a lot, or the cache is full of stuff we
> can't kick out.

Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting to empty the cache. On Linux the page cache means that we only call write once for each 4k chunk. However, our attempts to empty the cache are a little pathetic. We just attempt to store all of the chunks of the file currently being written back to the fileserver. If it's a new file there is only one such chunk - the one that we are currently writing. As chunks are much larger than pages, and when a chunk is dirty we flush the whole thing to the server, this is why we see repeated writes of the same data. The process goes something like this:

*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0->1024k) to the file server
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to file server
*) Write page at 8k, dirties first chunk of file
... and so on.

The problem is that we don't make good decisions when we decide to flush the cache. However, any change to flush items which are less active will be a behaviour change - in particular, on a multi-user system it would mean that one user could break write-on-close for other users simply by filling the cache.

Cheers,

Simon.
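The cost of the pattern Simon describes is easy to quantify with a toy model (an illustration only, not OpenAFS code; chunk and page sizes are taken from his 0->1024k example). Filling one 1 MB chunk through 4 KB page writes, with a whole-dirty-chunk flush after every page, stores about 128 times the data:

```python
PAGE = 4 * 1024
CHUNK = 1024 * 1024   # 1 MB chunk, as in Simon's 0->1024k example

bytes_stored = 0
for offset in range(0, CHUNK, PAGE):
    dirty_extent = offset + PAGE   # the page write dirties the chunk
    bytes_stored += dirty_extent   # cache full: flush the whole dirty extent

print(bytes_stored // CHUNK)       # 128 -- ~128x write amplification
```

The flushed ranges grow as 4k, 8k, ..., 1024k, exactly the overlapping StoreData ranges Marc saw in his packet traces, and the total matches his estimate that each 4k block is written to the server about 128 times on average.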
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Achim Gsell wrote:
> On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:
>> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>>> 3.) But if I first open 8 files and - after this is done - start
>>> writing to these files sequentially, the problem occurs. The
>>> difference to 1.) and 2.) is that I have these 8 open files while
>>> the test is running. This simulates the "putc-test" of bonnie++
>>> more or less:
>>
>> AFS is a write-on-close filesystem, so holding all of these files
>> open means that it is trying really hard not to flush any data back
>> to the fileserver. However, at some point the cache fills, and it has
>> to start writing data back. In 1.4, we make some really bad choices
>> about which data to write back, and so we end up thrashing the cache.
>> With Marc Dionne's work in 1.5, we at least have the ability to make
>> better choices, but nobody has really looked in detail at what
>> happens when the cache fills, as the best solution is to avoid it
>> happening in the first place!
>
> Sounds reasonable. But I have the same problem with a 9GB disk-cache,
> a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6
> GB pretty fast, then performance drops to < 3 MB/s ...

We are always using memcache with only 64 or 256 MB, but I have seen this problem, too. I think it's on the server side: today's servers have a lot of memory, and the data are written into the buffers first. Only when the buffers reach the limit does the operating system start to really sync them out to the disks. And with this huge amount of buffers you regularly see the performance going down for some time. I suppose that during the sync the fileserver's writes are hanging.

Hartmut

> So long
>
> Achim
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 19:36:28 -0500 Marc Dionne wrote:
> 2 - It can repeatedly flush out the same data to the server. I don't
> understand exactly what's occurring in this particular case, but
> looking at packet traces (I reproduced it here), I see the client
> sending a series of overlapping ranges (0-4k, 0-8k... 0-1MB) to the
> server. So on average at that point it is writing each 4k block 128
> times to the server...

Yep, this is what's happening in the trace Achim provided, too. Every 4k we write the chunk. I'm not sure how that's possible unless something is closing the file a lot, or the cache is full of stuff we can't kick out.

--
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Mon, Nov 22, 2010 at 6:56 PM, Achim Gsell wrote:
> On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:
>> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>>> 3.) But if I first open 8 files and - after this is done - start
>>> writing to these files sequentially, the problem occurs. The
>>> difference to 1.) and 2.) is that I have these 8 open files while
>>> the test is running. This simulates the "putc-test" of bonnie++
>>> more or less:
>>
>> AFS is a write-on-close filesystem, so holding all of these files
>> open means that it is trying really hard not to flush any data back
>> to the fileserver. However, at some point the cache fills, and it has
>> to start writing data back. In 1.4, we make some really bad choices
>> about which data to write back, and so we end up thrashing the cache.
>> With Marc Dionne's work in 1.5, we at least have the ability to make
>> better choices, but nobody has really looked in detail at what
>> happens when the cache fills, as the best solution is to avoid it
>> happening in the first place!
>
> Sounds reasonable. But I have the same problem with a 9GB disk-cache,
> a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6
> GB pretty fast, then performance drops to < 3 MB/s ...
>
> So long
>
> Achim

Same question as Simon: what's your memory size, and also what's your dirty background ratio (cat /proc/sys/vm/dirty_background_ratio)? Quite a bit of writing can occur before the VM decides to initiate writeback, so issues with the cache can show up later than one would think if there's a lot of memory and/or the dirty ratio is set high.

The cache manager has 2 basic problems in this situation:

1 - It only tries to write back data for the file it's currently writing. Data from the earlier files is occupying most of the cache but won't be evicted, so it ends up spinning within a small section of the cache.

2 - It can repeatedly flush out the same data to the server. I don't understand exactly what's occurring in this particular case, but looking at packet traces (I reproduced it here), I see the client sending a series of overlapping ranges (0-4k, 0-8k... 0-1MB) to the server. So on average at that point it is writing each 4k block 128 times to the server...

Marc
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On 22 Nov 2010, at 23:56, Achim Gsell wrote:
> Sounds reasonable. But I have the same problem with a 9GB disk-cache,
> a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6
> GB pretty fast, then performance drops to < 3 MB/s ...

How much memory does your test machine have?

S.
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:
> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>> 3.) But if I first open 8 files and - after this is done - start
>> writing to these files sequentially, the problem occurs. The
>> difference to 1.) and 2.) is that I have these 8 open files while the
>> test is running. This simulates the "putc-test" of bonnie++ more or
>> less:
>
> AFS is a write-on-close filesystem, so holding all of these files open
> means that it is trying really hard not to flush any data back to the
> fileserver. However, at some point the cache fills, and it has to
> start writing data back. In 1.4, we make some really bad choices about
> which data to write back, and so we end up thrashing the cache. With
> Marc Dionne's work in 1.5, we at least have the ability to make better
> choices, but nobody has really looked in detail at what happens when
> the cache fills, as the best solution is to avoid it happening in the
> first place!

Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast, then performance drops to < 3 MB/s ...

So long

Achim
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On 22 Nov 2010, at 23:06, Achim Gsell wrote:
> 3.) But if I first open 8 files and - after this is done - start
> writing to these files sequentially, the problem occurs. The
> difference to 1.) and 2.) is that I have these 8 open files while the
> test is running. This simulates the "putc-test" of bonnie++ more or
> less:

AFS is a write-on-close filesystem, so holding all of these files open means that it is trying really hard not to flush any data back to the fileserver. However, at some point the cache fills, and it has to start writing data back. In 1.4, we make some really bad choices about which data to write back, and so we end up thrashing the cache. With Marc Dionne's work in 1.5, we at least have the ability to make better choices, but nobody has really looked in detail at what happens when the cache fills, as the best solution is to avoid it happening in the first place!

Cheers,

Simon.
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 22, 2010, at 11:32 PM, Andrew Deason wrote:
> On Mon, 22 Nov 2010 23:12:57 +0100
> Achim Gsell wrote:
>>>> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd
>>>> without problems ...
>>>
>>> Are you sure the access pattern for that is the same as the bonnie++
>>> test that's running?
>>
>> OK, here is a simple shell script reproducing the problem:
>
> Er, I'm sorry, I'm not understanding something here. I thought you just
> said that this is _not_ a problem when doing this in parallel with dd.

Yes, and the script works serially ... the parallel dd's are just an example showing why I don't think it's a thrashing problem.

1.) This works:

$ for i in 1 2 3 4 5 6 7 8; do dd if=/dev/zero of=file$i bs=1024k count=1024; done

2.) And this too:

$ for i in 1 2 3 4 5 6 7 8; do dd if=/dev/zero of=file$i bs=1024k count=1024 & done

3.) But if I first open 8 files and - after this is done - start writing to these files sequentially, the problem occurs. The difference to 1.) and 2.) is that I have these 8 open files while the test is running. This simulates the "putc-test" of bonnie++ more or less:

fd1 = open("file1");
fd2 = open("file2");
fd3 = open("file3");
fd4 = open("file4");
fd5 = open("file5");
fd6 = open("file6");
fd7 = open("file7");
fd8 = open("file8");
write 1 GB to fd1
write 1 GB to fd2
write 1 GB to fd3
write 1 GB to fd4
write 1 GB to fd5
write 1 GB to fd6
write 1 GB to fd7   # usually here I run into the problem :-(
write 1 GB to fd8
close (fd1);
close (fd2);
close (fd3);
close (fd4);
close (fd5);
close (fd6);
close (fd7);
close (fd8);

Achim
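Achim's pseudo-code translates directly into a small script. This version is scaled down to 8 x 1 MB so it runs quickly anywhere; to actually reproduce the problem on an AFS mount you would write 1 GB per file so the total exceeds the client cache, and point the paths at an AFS directory:

```python
import os
import tempfile

# Open all the files first, then write to them sequentially while every
# one of them stays open, and only close them at the end - the pattern
# that defeats write-on-close.
NFILES, SIZE, BLOCK = 8, 1024 * 1024, 64 * 1024

workdir = tempfile.mkdtemp()  # substitute an AFS path for a real test
paths = [os.path.join(workdir, "file%d" % i) for i in range(1, NFILES + 1)]

fds = [os.open(p, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
       for p in paths]

for fd in fds:                       # sequential 1-file-at-a-time writes,
    for _ in range(SIZE // BLOCK):   # but with all 8 descriptors open, so
        os.write(fd, b"\0" * BLOCK)  # no close-time flush can happen yet

for fd in fds:
    os.close(fd)
```

The dd loops in 1.) and 2.) differ only in that each dd closes its output file before (or independently of) the others, so the cache manager gets its write-on-close flushes as it goes.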
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 23:12:57 +0100 Achim Gsell wrote:
>>> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without
>>> problems ...
>>
>> Are you sure the access pattern for that is the same as the bonnie++
>> test that's running?
>
> OK, here is a simple shell script reproducing the problem:

Er, I'm sorry, I'm not understanding something here. I thought you just said that this is _not_ a problem when doing this in parallel with dd. If both bonnie++ and writing with dd give you the same problems, I would suspect that the cache is too small (and thrashing). I'm not saying it's not a bug if so, but could you just try this with a 10G disk cache and see if it still happens?

--
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 22, 2010, at 8:15 PM, Andrew Deason wrote:
> On Mon, 22 Nov 2010 20:01:31 +0100
> Achim Gsell wrote:
>>> If the latter (or regardless), I'd try increasing the size of the
>>> cache; since you're seeing traffic across the network, I'd suspect
>>> you're thrashing.
>>
>> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without
>> problems ...
>
> Are you sure the access pattern for that is the same as the bonnie++
> test that's running?

OK, here is a simple shell script reproducing the problem:

#!/bin/bash
exec 4> 1
exec 5> 2
exec 6> 3
exec 7> 4
exec 8> 5
exec 9> 6
exec 10> 7
dd if=/dev/zero bs=1024k count=1024 1>&4
dd if=/dev/zero bs=1024k count=1024 1>&5
dd if=/dev/zero bs=1024k count=1024 1>&6
dd if=/dev/zero bs=1024k count=1024 1>&7
dd if=/dev/zero bs=1024k count=1024 1>&8
dd if=/dev/zero bs=1024k count=1024 1>&9
dd if=/dev/zero bs=1024k count=1024 1>&10
#EOF

So, I guess, the trouble makers are the open file descriptors ...

> I mean, if it's writing to random places in these files, for example,
> your working set is somewhere closer to 8G, and you have a cache that
> is 1G, well...

bonnie++ writes sequentially to each file, there is no random pattern.

> Before you do this, take a look at 'xstat_cm_test -collID 2 -onceonly'
> (again before/after to be on the safe side), specifically at the hits
> vs misses, and the numbers for FetchData, FetchStatus, and StoreData,
> which will give some information about how you're using the cache and
> how much you're hitting the server.
>
> (If you want to show the data here, putting it in public AFS or a
> pastebin may be nice to the other list denizens)

/afs/psi.ch/user/g/gsell/public/dd2afs.tcpdump

Achim
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 20:01:31 +0100 Achim Gsell wrote:
>> If the latter (or regardless), I'd try increasing the size of the
>> cache; since you're seeing traffic across the network, I'd suspect
>> you're thrashing.
>
> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without
> problems ...

Are you sure the access pattern for that is the same as the bonnie++ test that's running? I mean, if it's writing to random places in these files, for example, your working set is somewhere closer to 8G, and you have a cache that is 1G, well...

> I will store a network dump in my public AFS directory and tell you as
> soon as I have it - may take some time ...

Before you do this, take a look at 'xstat_cm_test -collID 2 -onceonly' (again before/after to be on the safe side), specifically at the hits vs misses, and the numbers for FetchData, FetchStatus, and StoreData, which will give some information about how you're using the cache and how much you're hitting the server.

(If you want to show the data here, putting it in public AFS or a pastebin may be nice to the other list denizens)

--
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 22, 2010, at 7:51 PM, Andrew Deason wrote:
> On Mon, 22 Nov 2010 18:53:26 +0100
> Achim Gsell wrote:
>>> Some stats might help say what's wrong. 'rxdebug 7001 -rxstats' and
>>> 'rxdebug -rxstats' before and after. Also 'xstat_fs_test -collID 3
>>> -onceonly' before and after, but I don't think that's it.
>>
>> Output of commands is attached.
>
> Client spurious counter seems high, but maybe it's just higher than
> I'm used to because of the load here. I'm not aware of anything recent
> related to handling RX call numbers, but if someone would like to
> correct me...
>
> About how much data and time had gone by between the 'before' and
> 'after' invocations, would you say?

~6GB; less than 3 minutes.

Achim
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 22, 2010, at 7:11 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 18:00:41 +0100 Achim Gsell wrote:
>
>> Hi,
>>
>> bonnie++ on OpenAFS with sizes above 8192 behaves strangely on my
>> Linux systems: after writing up to 5-7 files with 1 GB each, write
>> performance drops from 50-60 MB/s to around 3 MB/s.
>
> Is this writing to several files sequentially? Or do you mean bonnie++
> has about 5-7 files open at the same time, and has written about 1G
> into each?

I mean the latter. If you start bonnie++ with -s 8192, it opens 8 files
and writes to them sequentially: the first file is filled with 1 GB,
then the second, and so on.

> If the latter (or regardless), I'd try increasing the size
> of the cache; since you're seeing traffic across the network, I'd
> suspect you're thrashing.

Thrashing? Mmh. I can write 8 x 1 GB in parallel with dd without
problems ...

> If you feel like sharing a network dump of the
> traffic when the problem occurs (probably not to the list), we could
> tell you what it all is.

I will store a network dump in my public AFS directory and tell you as
soon as I have it - may take some time ...

Achim
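The write pattern Achim describes can be sketched in a few lines (a
miniature stand-in, not bonnie++ itself: the file names, sizes, and chunk
size below are illustrative, scaled down from the real 8 x 1 GB so it runs
instantly). The key point is that the files are filled one after another,
each one completely, in fixed-size chunks:

```python
import os
import tempfile

def bonnie_like_write(directory, nfiles=8, file_size=1 << 20, chunk=1 << 16):
    """Sketch of bonnie++'s large-file phase with -s 8192: create
    nfiles files and fill them sequentially, completing the first file
    before touching the second.  Sizes here are tiny stand-ins."""
    buf = b"\0" * chunk
    paths = [os.path.join(directory, "Bonnie.%d" % i) for i in range(nfiles)]
    for path in paths:                  # one file at a time ...
        with open(path, "wb") as f:
            written = 0
            while written < file_size:  # ... filled in sequential chunks
                f.write(buf)
                written += chunk
    return paths

with tempfile.TemporaryDirectory() as d:
    paths = bonnie_like_write(d)
    print([os.path.getsize(p) for p in paths])  # eight files of equal size
```

With this pattern the client cache sees purely sequential writes, which is
why the reported slowdown after 5-7 GB of writing is surprising at first
glance.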
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 18:53:26 +0100 Achim Gsell wrote:

> > Some stats might help say what's wrong. 'rxdebug 7001
> > -rxstats' and 'rxdebug -rxstats' before and after. Also
> > 'xstat_fs_test -collID 3 -onceonly' before and after, but I
> > don't think that's it.
>
> Output of commands is attached.

The client spurious counter seems high, but maybe it's just higher than
I'm used to because of the load here. I'm not aware of anything recent
related to handling RX call numbers, but if someone would like to
correct me...

About how much data and time had gone by between the 'before' and
'after' invocations, would you say?

-- 
Andrew Deason
adea...@sinenomine.net
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 18:00:41 +0100 Achim Gsell wrote:

> Hi,
>
> bonnie++ on OpenAFS with sizes above 8192 behaves strangely on my
> Linux systems: after writing up to 5-7 files with 1 GB each, write
> performance drops from 50-60 MB/s to around 3 MB/s.

Is this writing to several files sequentially? Or do you mean bonnie++
has about 5-7 files open at the same time, and has written about 1G
into each?

If the latter (or regardless), I'd try increasing the size of the
cache; since you're seeing traffic across the network, I'd suspect
you're thrashing. If you feel like sharing a network dump of the
traffic when the problem occurs (probably not to the list), we could
tell you what it all is.

-- 
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: bonnie++ on OpenAFS
On Nov 22, 2010, at 6:14 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 18:00:41 +0100 Achim Gsell wrote:
>
>> Cache type: memory
>
> I'm not saying this is a problem, but does the same thing happen with
> a disk cache?

Same problem ...

> Some stats might help say what's wrong. 'rxdebug 7001 -rxstats'
> and 'rxdebug -rxstats' before and after. Also 'xstat_fs_test
> -collID 3 -onceonly' before and after, but I don't think that's
> it.

Output of the commands is attached.

>> Server:
>> Scientific Linux 5, kernel 2.6.18-194.17.4.el5
>> OpenAFS 1.4.12.1
>>
>> Server and clients are test-systems without any other load.
>
> Server and client parameters?

/usr/vice/etc/afsd -cachedir /var/cache/openafs -blocks 1048576
-rootvol root.afs -mountdir /afs -afsdb -dynroot -fakestat-all
-nosettime -chunksize 18 [-memcache]

/fileserver -p 128 -b 512 -l 2048 -s 2048 -vc 2048 -cb 512000
-rxpck 2048 -sendsize 2097152 -udpsize 2097152 -syslog -d 0
-auditlog /var/opt/openafs-server/fileserver.audit
-allow-dotted-principals

So long

Achim

Attachments: rxdebug-client.before/.after, rxdebug-server.before/.after,
xstat_fs_test.before/.after
[OpenAFS] Re: bonnie++ on OpenAFS
On Mon, 22 Nov 2010 18:00:41 +0100 Achim Gsell wrote:

> Cache type: memory

I'm not saying this is a problem, but does the same thing happen with a
disk cache?

Some stats might help say what's wrong. 'rxdebug 7001 -rxstats'
and 'rxdebug -rxstats' before and after. Also 'xstat_fs_test
-collID 3 -onceonly' before and after, but I don't think that's
it.

> Server:
> Scientific Linux 5, kernel 2.6.18-194.17.4.el5
> OpenAFS 1.4.12.1
>
> Server and clients are test-systems without any other load.

Server and client parameters?

-- 
Andrew Deason
adea...@sinenomine.net
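The before/after comparison Andrew asks for boils down to a per-counter
subtraction once the two dumps are parsed. A minimal sketch, assuming the
stats have been reduced to name-to-value dicts; the counter names and
numbers below are invented for illustration and are not real rxdebug
output:

```python
def counter_deltas(before, after):
    """Given two dicts of monotonically increasing counters (e.g. parsed
    from 'rxdebug ... -rxstats' taken before and after a test run),
    return how much each counter grew during the run."""
    return {name: after[name] - before.get(name, 0) for name in after}

# Hypothetical numbers, just to show the shape of the comparison; a real
# dump has many more counters.
before = {"packets read": 120000, "spurious reads": 15, "data sent": 98000}
after  = {"packets read": 540000, "spurious reads": 2100, "data sent": 430000}

print(counter_deltas(before, after))
```

Looking at deltas rather than raw values is what makes a "this counter
seems high" observation meaningful: it ties the growth to the ~6 GB
transferred during the test window instead of the client's whole uptime.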