Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Stephan Wiesand
Hi Matt,

On Nov 23, 2010, at 16:14, Matt W. Benjamin wrote:

> Is "write-on-close" an expectation which could be broken?  It is the case
> that AFS has strongly expressed that its semantics are (in general)
> _sync-on-close_, and its meaning is that an application which has closed an
> AFS file may consider any writes it made to be stable, and visible to other
> clients.  Write stability is not assured only on close; an application may
> explicitly sync when convenient.  I'm not sure, first, how a client can ever
> have been assured that its writes were not stabilised before it closed its
> corresponding file(s), nor how it would benefit from this.  For example, the
> client may not revoke its own operations on the file prior to closing it.
> Which is to say, I think the CM is free to stabilise ordinary writes when any
> convenient opportunity arises.  Feel free to flame now...

as an AFS user, I never expected a guarantee that no data is actually written
out to the fileserver before the file is closed on the client. After all,
I'd like to be able to write files larger than the cache without having to
close and reopen them in append mode.

But it's a huge advantage of AFS over other filesystems that this is the usual
case for files of modest size compared to the cache, because it helps avoid
fragmentation on the fileserver. Imagine 1500 batch jobs on 200 clients
dribbling into 3000 files in the same volume: I don't think that today's
typical backend filesystems under the namei fileserver would be able to limit
fragmentation as well as in the write-on-close case.

But as long as data is flushed in consecutive chunks that are as large as
possible (or at least reasonably large), it's perfectly OK to do it before the
file is closed, at least for our use case.

- Stephan
> 
> Matt
> 
>> 
>> The problem is that we don't make good decisions when we decide to  
>> flush the cache. However, any change to flush items which are less  
>> active will be a behaviour change - in particular, on a multi-user  
>> system it would mean that one user could break write-on-close for  
>> other users simply by filling the cache.
>> 
>> Cheers,
>> 
>> Simon.

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany





Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Derek Atkins
Hi,

"Matt W. Benjamin"  writes:

> Hi,
>
> Is "write-on-close" an expectation which could be broken?  It is
> the case that AFS has strongly expressed that its semantics are (in
> general) _sync-on-close_, and its meaning is that an application
> which has closed an AFS file may consider any writes it made to be
> stable, and visible to other clients.  Write stability is not assured
> only on close; an application may explicitly sync when convenient.
> I'm not sure, first, how a client can ever have been assured that its
> writes were not stabilised before it closed its corresponding file(s),
> nor how it would benefit from this.  For example, the client may not
> revoke its own operations on the file prior to closing it.  Which is
> to say, I think the CM is free to stabilise ordinary writes when any
> convenient opportunity arises.  Feel free to flame now...

My personal feeling is that the write-on-close semantic is only a
guarantee to the writing client in the sense of "you can only be assured
that your writes have been flushed once the file is closed".  However, I
do not believe that there are any semantic rules prohibiting writes back
to the server PRIOR to closing the file.

One potential reason to delay writes to the server is that the client
may over-write dirty blocks multiple times, so you don't necessarily want
to push writes to the server at every write() operation.
However, I don't see any reason that we can't flush dirty chunks back in
this situation.
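
As a purely illustrative sketch of that trade-off (hypothetical names, not
actual OpenAFS code): repeated write()s only widen a dirty range that the
cache tracks in memory, and one later store sends each byte once.

#include <limits.h>

struct dirty_range { long start, end; };   /* start > end means "clean" */

static void range_init(struct dirty_range *d)
{
    d->start = LONG_MAX;
    d->end   = 0;
}

static void note_write(struct dirty_range *d, long off, long len)
{
    if (off < d->start)      d->start = off;        /* just widen the     */
    if (off + len > d->end)  d->end   = off + len;  /* in-cache range     */
    /* no store here - this byte range may well be overwritten again */
}

/* Later, a single store of [d->start, d->end) sends each byte only once. */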

> Simon says:
>
>> The problem is that we don't make good decisions when we decide to  
>> flush the cache. However, any change to flush items which are less  
>> active will be a behaviour change - in particular, on a multi-user  
>> system it would mean that one user could break write-on-close for  
>> other users simply by filling the cache.

I'd rather that write-on-close be "broken" (not that I see this as a
problem) than have the cache thrash.  Again, my feeling is that
write-on-close isn't a guarantee that your data will NOT be written
before you close(); rather, it's a statement that you cannot be
assured your data has been written until you close().  Semantically,
these are two very distinct rules.

Pre-flushing writes ahead of the close does NOT violate this second
assurance.

Just my $0.02.

-derek

-- 
   Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
   Member, MIT Student Information Processing Board  (SIPB)
   URL: http://web.mit.edu/warlord/    PP-ASEL-IA    N1NWH
   warl...@mit.edu                     PGP key available


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Simon Wilkinson
On 23 Nov 2010, at 14:15, Andrew Deason  wrote:

> On Tue, 23 Nov 2010 11:23:03 +
> Simon Wilkinson  wrote:
> 
>> We need a better solution to cache eviction. The problem is that,
>> until very recently, we didn't have the means for one process to
>> successfully flush files written by a different process.
> 
> I'm not following you; why can the cache truncate daemon not be
> triggered and waited for, like in normal cache shortage conditions?

Because it doesn't do stores. More importantly, because it can't do stores, as 
it doesn't have access to the credentials that the file was written with. This 
is the key issue that I was alluding to earlier. The work that Marc has done 
means that, in master, we do now have access to this information on recent 
Linux kernels, where we use it to handle write back. We need to look at 
generalising this to other operating systems, and making use of it for cache 
eviction.

S.



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Matt W. Benjamin
Hi,

Is "write-on-close" an expectation which could be broken?  It is the case
that AFS has strongly expressed that its semantics are (in general)
_sync-on-close_, and its meaning is that an application which has closed an
AFS file may consider any writes it made to be stable, and visible to other
clients.  Write stability is not assured only on close; an application may
explicitly sync when convenient.  I'm not sure, first, how a client can ever
have been assured that its writes were not stabilised before it closed its
corresponding file(s), nor how it would benefit from this.  For example, the
client may not revoke its own operations on the file prior to closing it.
Which is to say, I think the CM is free to stabilise ordinary writes when any
convenient opportunity arises.  Feel free to flame now...

Matt

> 
> The problem is that we don't make good decisions when we decide to  
> flush the cache. However, any change to flush items which are less  
> active will be a behaviour change - in particular, on a multi-user  
> system it would mean that one user could break write-on-close for  
> other users simply by filling the cache.
> 
> Cheers,
> 
> Simon.
> 

-- 

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Derrick Brashear
On Tue, Nov 23, 2010 at 9:49 AM, Marc Dionne  wrote:
> On Tue, Nov 23, 2010 at 9:15 AM, Andrew Deason  wrote:
>> On Tue, 23 Nov 2010 11:23:03 +
>> Simon Wilkinson  wrote:
>>
>>> We need a better solution to cache eviction. The problem is that,
>>> until very recently, we didn't have the means for one process to
>>> successfully flush files written by a different process.
>>
>> I'm not following you; why can the cache truncate daemon not be
>> triggered and waited for, like in normal cache shortage conditions?
>
> a) I'm pretty sure the cache truncate daemon simply skips dirty chunks
> and doesn't do any writeback to the server.
> b) The truncate daemon only looks at cache usage, not at dirtiness.
> So we can be above the threshold where doPartialWrite will insist on
> writing back data (2/3 of cache chunks dirty), but the cache is still
> well below the threshold where the truncate daemon will start to
> shrink (95% chunks or 90% space I think)

Nothing precludes us from changing that, but yes, b) is correct and I'm
pretty sure you're right about a) also.


-- 
Derrick


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Andrew Deason
On Tue, 23 Nov 2010 09:49:58 -0500
Marc Dionne  wrote:

> > I'm not following you; why can the cache truncate daemon not be
> > triggered and waited for, like in normal cache shortage conditions?
> 
> a) I'm pretty sure the cache truncate daemon simply skips dirty chunks
> and doesn't do any writeback to the server.
> b) The truncate daemon only looks at cache usage, not at dirtiness.
> So we can be above the threshold where doPartialWrite will insist on
> writing back data (2/3 of cache chunks dirty), but the cache is still
> well below the threshold where the truncate daemon will start to
> shrink (95% chunks or 90% space I think)

Okay, so we're "full" of locally-written data. I was reading Simon's
comments to mean that the cache is just >90% filled, but what he's
(apparently) actually saying makes a lot more sense.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Marc Dionne
On Tue, Nov 23, 2010 at 9:15 AM, Andrew Deason  wrote:
> On Tue, 23 Nov 2010 11:23:03 +
> Simon Wilkinson  wrote:
>
>> We need a better solution to cache eviction. The problem is that,
>> until very recently, we didn't have the means for one process to
>> successfully flush files written by a different process.
>
> I'm not following you; why can the cache truncate daemon not be
> triggered and waited for, like in normal cache shortage conditions?

a) I'm pretty sure the cache truncate daemon simply skips dirty chunks
and doesn't do any writeback to the server.
b) The truncate daemon only looks at cache usage, not at dirtiness.
So we can be above the threshold where doPartialWrite will insist on
writing back data (2/3 of cache chunks dirty), but the cache is still
well below the threshold where the truncate daemon will start to
shrink (95% chunks or 90% space I think)
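
As a rough sketch of the two independent thresholds described above (the
constants, structure and function names are illustrative assumptions, not the
actual OpenAFS cache manager code):

#define DIRTY_CHUNK_FRACTION  (2.0 / 3.0)  /* doPartialWrite trigger        */
#define TRUNC_CHUNK_LIMIT     0.95         /* truncate daemon: chunks used  */
#define TRUNC_SPACE_LIMIT     0.90         /* truncate daemon: space used   */

struct cache_stats {
    long chunks_total, chunks_used, chunks_dirty;
    long blocks_total, blocks_used;
};

/* Checked on every write; the writing process must store the data itself. */
int needs_partial_write(const struct cache_stats *s)
{
    return s->chunks_dirty > s->chunks_total * DIRTY_CHUNK_FRACTION;
}

/* The truncate daemon only looks at usage, and skips dirty chunks. */
int truncate_daemon_should_run(const struct cache_stats *s)
{
    return s->chunks_used > s->chunks_total * TRUNC_CHUNK_LIMIT ||
           s->blocks_used > s->blocks_total * TRUNC_SPACE_LIMIT;
}

With a cache full of dirty data from earlier files, the first test fires on
every write while the second never does, which is the gap being described.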

Marc


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Derrick Brashear
On Tue, Nov 23, 2010 at 6:23 AM, Simon Wilkinson  wrote:
>
>
> On 23 Nov 2010, at 11:02, Hartmut Reuter  wrote:
>>
>> The problem here is that afs_DoPartialWrite is called with each write.
>> Normally it gets out without doing anything, but if the percentage of dirty
>> chunks is too high it triggers a background store.
>
> On master, at least, DoPartialWrite does an immediate store - the only place we
> can do a background write is in response to a normal close request.
>
> In any case, this problem arises regardless of how we're storing the file. 
> The issue is that our cache eviction strategy picks the most recently 
> accessed chunk to evict, and then we dirty that chunk again immediately after 
> we've flushed it.
>
> We need a better solution to cache eviction. The problem is that, until very 
> recently, we didn't have the means for one process to successfully flush 
> files written by a different process.

Even at that, we did have the opportunity to try to evict earlier
chunks, but things can now get better in head (and in 1.6).



-- 
Derrick


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Andrew Deason
On Tue, 23 Nov 2010 11:23:03 +
Simon Wilkinson  wrote:

> We need a better solution to cache eviction. The problem is that,
> until very recently, we didn't have the means for one process to
> successfully flush files written by a different process.

I'm not following you; why can the cache truncate daemon not be
triggered and waited for, like in normal cache shortage conditions?

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Simon Wilkinson


On 23 Nov 2010, at 11:02, Hartmut Reuter  wrote:
> 
> The problem here is that afs_DoPartialWrite is called with each write.
> Normally it gets out without doing anything, but if the percentage of dirty
> chunks is too high it triggers a background store.

On master, at least, DoPartialWrite does an immediate store - the only place we
can do a background write is in response to a normal close request.

In any case, this problem arises regardless of how we're storing the file. The 
issue is that our cache eviction strategy picks the most recently accessed 
chunk to evict, and then we dirty that chunk again immediately after we've 
flushed it.
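
As a hedged illustration of that choice (the structures and names below are
invented for the sketch, not the real cache manager code), the effect is
roughly:

struct chunk {
    long last_access;   /* bumped on every touch */
    int  dirty;
};

/* Victim selection as described above: the most recently accessed chunk -
 * exactly the one the writer is about to dirty again - gets flushed. */
static struct chunk *pick_flush_victim(struct chunk *c, int n)
{
    struct chunk *victim = &c[0];
    for (int i = 1; i < n; i++)
        if (c[i].last_access > victim->last_access)   /* newest, not oldest */
            victim = &c[i];
    return victim;   /* flushing this frees nothing useful for long */
}

Picking the least recently used chunk instead would at least let the writer
make forward progress through the cache.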

We need a better solution to cache eviction. The problem is that, until very 
recently, we didn't have the means for one process to successfully flush files 
written by a different process.

S.


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Hartmut Reuter

Simon Wilkinson wrote:



Yep, this is what's happening in the trace Achim provided, too. Every 4k
we write the chunk. I'm not sure how that's possible unless something is
closing the file a lot, or the cache is full of stuff we can't kick out.



Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting to
empty the cache. On Linux the page cache means that we only call write
once for each 4k chunk. However, our attempts to empty the cache are a
little pathetic. We just attempt to store all of the chunks of the file
currently being written back to the fileserver. If it's a new file there
is only one such chunk - the one that we are currently writing. Chunks
are much larger than pages, and when a chunk is dirty we flush the whole
thing to the server, which is why we see repeated writes of the same
data. The process goes something like this:

*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0->1024k) to the file server
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to file server
*) Write page at 8k, dirties first chunk of file

... and so on.

The problem is that we don't make good decisions when we decide to flush
the cache. However, any change to flush items which are less active will
be a behaviour change - in particular, on a multi-user system it would
mean that one user could break write-on-close for other users simply by
filling the cache.


The problem here is that afs_DoPartialWrite is called with each write. Normally
it gets out without doing anything, but if the percentage of dirty chunks is too
high it triggers a background store. However, this can happen multiple times
before the background job starts executing. Therefore I introduced in AFS/OSD a
new flag bit, CStoring, which is set when the background task is submitted and
cleared when it's done. During that time no new background stores are scheduled
for this file.
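
A minimal sketch of that idea (CStoring is the flag name from AFS/OSD as
described above; the bit value, structure and helper names here are purely
illustrative):

#define CStoring 0x01000000              /* hypothetical flag bit value */

struct vcache_like {
    unsigned int states;                 /* per-file flag bits */
};

/* Called whenever the dirty-chunk percentage is too high. */
static void maybe_queue_background_store(struct vcache_like *avc)
{
    if (avc->states & CStoring)
        return;                          /* a store is already queued */

    avc->states |= CStoring;             /* set before submitting the task */
    /* ... submit the background store; its completion handler clears the
     * bit again with  avc->states &= ~CStoring;  so the next trigger can
     * schedule a fresh store. */
}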


Hartmut


Cheers,

Simon.




--
-
Hartmut Reuter                   e-mail  reu...@rzg.mpg.de
                                 phone   +49-89-3299-1328
                                 fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)     web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Simon Wilkinson


Yep, this is what's happening in the trace Achim provided, too. Every 4k
we write the chunk. I'm not sure how that's possible unless something is
closing the file a lot, or the cache is full of stuff we can't kick out.



Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting to
empty the cache. On Linux the page cache means that we only call write
once for each 4k chunk. However, our attempts to empty the cache are a
little pathetic. We just attempt to store all of the chunks of the file
currently being written back to the fileserver. If it's a new file there
is only one such chunk - the one that we are currently writing. Chunks
are much larger than pages, and when a chunk is dirty we flush the whole
thing to the server, which is why we see repeated writes of the same
data. The process goes something like this:


*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0->1024k) to the file server
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to file server
*) Write page at 8k, dirties first chunk of file

... and so on.

The problem is that we don't make good decisions when we decide to  
flush the cache. However, any change to flush items which are less  
active will be a behaviour change - in particular, on a multi-user  
system it would mean that one user could break write-on-close for  
other users simply by filling the cache.
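
The resulting amplification can be sketched with a small stand-alone C
program (purely illustrative; the 1MB chunk size and 4k page size are
assumptions matching the description above):

#include <stdio.h>

#define PAGE_SIZE  (4 * 1024)
#define CHUNK_SIZE (1024 * 1024)

int main(void)
{
    long bytes_sent = 0;

    /* One chunk of a new file, written page by page with a full cache:
     * every page write triggers a store of the whole dirty range so far. */
    for (long off = 0; off < CHUNK_SIZE; off += PAGE_SIZE)
        bytes_sent += off + PAGE_SIZE;

    printf("payload %d bytes, sent %ld bytes (~%.0fx)\n",
           CHUNK_SIZE, bytes_sent, (double)bytes_sent / CHUNK_SIZE);
    /* Roughly 128x - each 4k block goes over the wire about 128 times on
     * average, matching the overlapping 0-4k, 0-8k ... 0-1MB stores seen
     * in the packet traces. */
    return 0;
}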


Cheers,

Simon.



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Hartmut Reuter

Achim Gsell wrote:


On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:



On 22 Nov 2010, at 23:06, Achim Gsell wrote:


3.) But if I first open 8 files and - after this is done - start writing
to these files sequentially, the problem occurs. The difference to 1.)
and 2.) is that I have these 8 open files while the test is running.
This simulates the "putc-test" of bonnie++ more or less:


AFS is a write-on-close filesystem, so holding all of these files open
means that it is trying really hard not to flush any data back to the
fileserver. However, at some point the cache fills, and it has to start
writing data back. In 1.4, we make some really bad choices about which data
to write back, and so we end up thrashing the cache. With Marc Dionne's
work in 1.5, we at least have the ability to make better choices, but
nobody has really looked in detail at what happens when the cache fills, as
the best solution is to avoid it happening in the first place!


Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB
disk-cache, a 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast,
then performance drops to < 3 MB/s ...


We are always using memcache with only 64 or 256 MB, but I have seen this
problem, too. I think it's on the server side: today's servers have a lot of
memory and the data are written into the buffers first. Only when the buffers
reach the limit does the operating system start to really sync them out to the
disks. And with this huge amount of buffered data you regularly see the
performance go down for some time. I suppose that during the sync the
fileserver's writes are hanging.


Hartmut



So long

Achim





--
-
Hartmut Reuter                   e-mail  reu...@rzg.mpg.de
                                 phone   +49-89-3299-1328
                                 fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)     web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 19:36:28 -0500
Marc Dionne  wrote:

> 2 - It can repeatedly flush out the same data to the server.  I don't
> understand exactly what's occurring in this particular case, but
> looking at packet traces (I reproduced it here), I see the client
> sending a series of overlapping ranges (0-4k, 0-8k... 0-1MB) to the
> server.  So on average at that point it is writing each 4k block 128
> times to the server...

Yep, this is what's happening in the trace Achim provided, too. Every 4k
we write the chunk. I'm not sure how that's possible unless something is
closing the file a lot, or the cache is full of stuff we can't kick out.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Marc Dionne
On Mon, Nov 22, 2010 at 6:56 PM, Achim Gsell  wrote:
>
> On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:
>
>>
>> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>>>
>>> 3.) But if I first open 8 files and - after this is done - start writing to
>>> these files sequentially, the problem occurs. The difference to 1.) and 2.)
>>> is that I have these 8 open files while the test is running. This
>>> simulates the "putc-test" of bonnie++ more or less:
>>
>> AFS is a write-on-close filesystem, so holding all of these files open means 
>> that it is trying really hard not to flush any data back to the fileserver. 
>> However, at some point the cache fills, and it has to start writing data 
>> back. In 1.4, we make some really bad choices about which data to write 
>> back, and so we end up thrashing the cache. With Marc Dionne's work in 1.5, 
>> we at least have the ability to make better choices, but nobody has really 
>> looked in detail at what happens when the cache fills, as the best solution 
>> is to avoid it happening in the first place!
>
> Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB
> disk-cache, a 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast,
> then performance drops to < 3 MB/s ...
>
> So long
>
> Achim

Same question as Simon... what's your memory size, and also what's your
dirty background ratio (cat /proc/sys/vm/dirty_background_ratio)?
Quite a bit of writing can occur before the VM decides to initiate
writeback, so issues with the cache can show up later than one would
think if there's a lot of memory and/or the dirty ratio is set high.

The cache manager has 2 basic problems in this situation:
1 - It only tries to write back data for the file it's currently
writing.  Data from the earlier files is occupying most of the cache
but won't be evicted.  So it ends up spinning within a small section
of cache.
2 - It can repeatedly flush out the same data to the server.  I don't
understand exactly what's occurring in this particular case, but
looking at packet traces (I reproduced it here), I see the client
sending a series of overlapping ranges (0-4k, 0-8k... 0-1MB) to the
server.  So on average at that point it is writing each 4k block 128
times to the server...

Marc


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Simon Wilkinson


On 22 Nov 2010, at 23:56, Achim Gsell wrote:


Sounds reasonable. But I have the same problem with a 9GB disk-cache,
a 1GB disk-cache, a 1GB mem-cache and a 256kB mem-cache: I can write
6 GB pretty fast, then performance drops to < 3 MB/s ...


How much memory does your test machine have?

S.



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:

> 
> On 22 Nov 2010, at 23:06, Achim Gsell wrote:
>> 
>> 3.) But if I first open 8 files and - after this is done - start writing to
>> these files sequentially, the problem occurs. The difference to 1.) and 2.)
>> is that I have these 8 open files while the test is running. This simulates
>> the "putc-test" of bonnie++ more or less:
> 
> AFS is a write-on-close filesystem, so holding all of these files open means 
> that it is trying really hard not to flush any data back to the fileserver. 
> However, at some point the cache fills, and it has to start writing data 
> back. In 1.4, we make some really bad choices about which data to write back, 
> and so we end up thrashing the cache. With Marc Dionne's work in 1.5, we at 
> least have the ability to make better choices, but nobody has really looked 
> in detail at what happens when the cache fills, as the best solution is to 
> avoid it happening in the first place!

Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB
disk-cache, a 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast,
then performance drops to < 3 MB/s ...

So long

Achim




Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Simon Wilkinson

On 22 Nov 2010, at 23:06, Achim Gsell wrote:
> 
> 3.) But if I first open 8 files and - after this is done - start writing to
> these files sequentially, the problem occurs. The difference to 1.) and 2.)
> is that I have these 8 open files while the test is running. This simulates
> the "putc-test" of bonnie++ more or less:

AFS is a write-on-close filesystem, so holding all of these files open means 
that it is trying really hard not to flush any data back to the fileserver. 
However, at some point the cache fills, and it has to start writing data back. 
In 1.4, we make some really bad choices about which data to write back, and so 
we end up thrashing the cache. With Marc Dionne's work in 1.5, we at least have 
the ability to make better choices, but nobody has really looked in detail at 
what happens when the cache fills, as the best solution is to avoid it 
happening in the first place!

Cheers,

Simon.



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 22, 2010, at 11:32 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 23:12:57 +0100
> Achim Gsell  wrote:
> 
>>>> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without
>>>> problems ...
>>> 
>>> Are you sure the access pattern for that is the same as the bonnie++
>>> test that's running?
>> 
>> OK, here is a simple shell script reproducing the problem:
> 
> Er, I'm sorry, I'm not understanding something here. I thought you just
> said that this is _not_ a problem when doing this in parallel with dd.

Yes, and the script works serially ... the parallel dd's are just an
example; I don't think it's a thrashing problem.

1.) This works:
$ for i in 1 2 3 4 5 6 7 8; do dd if=/dev/zero of=file$i bs=1024k count=1024; 
done 

2.) And this too:
$ for i in 1 2 3 4 5 6 7 8; do dd if=/dev/zero of=file$i bs=1024k count=1024 & 
done 

3.) But if I first open 8 files and - after this is done - start writing to
these files sequentially, the problem occurs. The difference to 1.) and 2.) is
that I have these 8 open files while the test is running. This simulates the
"putc-test" of bonnie++ more or less:

fd1 = open("file1");
fd2 = open("file2");
fd3 = open("file3");
fd4 = open("file4");
fd5 = open("file5");
fd6 = open("file6");
fd7 = open("file7");
fd8 = open("file8");

write 1 GB to fd1
write 1 GB to fd2
write 1 GB to fd3
write 1 GB to fd4
write 1 GB to fd5
write 1 GB to fd6
write 1 GB to fd7  # usually here I run into the problem :-(
write 1 GB to fd8 

close (fd1);
close (fd2);
close (fd3);
close (fd4);
close (fd5);
close (fd6);
close (fd7);
close (fd8);
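
For reference, a minimal self-contained C version of the pattern sketched
above (hypothetical file names, 1 GiB per file, to be run inside an AFS
directory) could look like this:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define NFILES    8
#define FILE_SIZE (1024L * 1024 * 1024)   /* 1 GiB per file */
#define BUF_SIZE  (1024 * 1024)

int main(void)
{
    static char buf[BUF_SIZE];            /* zero-filled write buffer */
    int fd[NFILES];
    char name[32];

    for (int i = 0; i < NFILES; i++) {    /* open all files first ... */
        snprintf(name, sizeof name, "file%d", i + 1);
        fd[i] = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd[i] < 0) { perror(name); return 1; }
    }

    for (int i = 0; i < NFILES; i++)      /* ... then fill them one by one */
        for (long done = 0; done < FILE_SIZE; done += BUF_SIZE)
            if (write(fd[i], buf, BUF_SIZE) != BUF_SIZE) {
                perror("write"); return 1;
            }

    for (int i = 0; i < NFILES; i++)      /* only close at the very end */
        close(fd[i]);
    return 0;
}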


Achim


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 23:12:57 +0100
Achim Gsell  wrote:

> >> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without
> >> problems ...
> > 
> > Are you sure the access pattern for that is the same as the bonnie++
> > test that's running?
> 
> OK, here is a simple shell script reproducing the problem:

Er, I'm sorry, I'm not understanding something here. I thought you just
said that this is _not_ a problem when doing this in parallel with dd.

If both bonnie++ and writing with dd give you the same problems, I would
suspect that the cache is too small (and thrashing). I'm not saying it's
not a bug if so, but could you just try this with a 10G disk cache and
see if it still happens?

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 22, 2010, at 8:15 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 20:01:31 +0100
> Achim Gsell  wrote:
> 
>>> If the latter (or regardless), I'd try increasing the size
>>> of the cache; since you're seeing traffic across the network, I'd
>>> suspect you're thrashing.
>> 
>> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without problems
>> ...
> 
> Are you sure the access pattern for that is the same as the bonnie++
> test that's running?

OK, here is a simple shell script reproducing the problem:

#!/bin/bash

# Open seven output files (named 1 to 7) and keep their descriptors
# open for the whole run, similar to bonnie++'s putc test.
exec 4> 1
exec 5> 2
exec 6> 3
exec 7> 4
exec 8> 5
exec 9> 6
exec 10> 7

# Then fill each file sequentially through its already-open descriptor.
dd if=/dev/zero bs=1024k count=1024 1>&4
dd if=/dev/zero bs=1024k count=1024 1>&5
dd if=/dev/zero bs=1024k count=1024 1>&6
dd if=/dev/zero bs=1024k count=1024 1>&7
dd if=/dev/zero bs=1024k count=1024 1>&8
dd if=/dev/zero bs=1024k count=1024 1>&9
dd if=/dev/zero bs=1024k count=1024 1>&10
#EOF


So, I guess, the troublemakers are the open file descriptors ...

> I mean, if it's writing to random places in these files, for example,
> your working set is somewhere closer to 8G, and you have a cache that is
> 1G, well...

bonnie++ writes sequentially to each file; there is no random pattern.

> 
>> I will store a network dump in my public AFS directory and tell you as
>> soon as I have it - may take some time ...
> 
> Before you do this, take a look at 'xstat_cm_test  -collID 2
> -onceonly' (again before/after to be on the safe side), specifically at
> the hits vs misses, and the numbers for FetchData, FetchStatus, and
> StoreData, which will give some information about how you're using the
> cache and how much you're hitting the server.
> 
> (If you want to show the data here, putting it in public AFS or a
> pastebin may be nice to the other list denizens)

/afs/psi.ch/user/g/gsell/public/dd2afs.tcpdump

Achim


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 20:01:31 +0100
Achim Gsell  wrote:

> > If the latter (or regardless), I'd try increasing the size
> > of the cache; since you're seeing traffic across the network, I'd
> > suspect you're thrashing.
>
> Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without problems
> ...

Are you sure the access pattern for that is the same as the bonnie++
test that's running?

I mean, if it's writing to random places in these files, for example,
your working set is somewhere closer to 8G, and you have a cache that is
1G, well...

> I will store a network dump in my public AFS directory and tell you as
> soon as I have it - may take some time ...

Before you do this, take a look at 'xstat_cm_test  -collID 2
-onceonly' (again before/after to be on the safe side), specifically at
the hits vs misses, and the numbers for FetchData, FetchStatus, and
StoreData, which will give some information about how you're using the
cache and how much you're hitting the server.

(If you want to show the data here, putting it in public AFS or a
pastebin may be nice to the other list denizens)

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 22, 2010, at 7:51 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 18:53:26 +0100
> Achim Gsell  wrote:
> 
>>> Some stats might help say what's wrong. 'rxdebug  7001
>>> -rxstats' and 'rxdebug  -rxstats' before and after. Also
>>> 'xstat_fs_test  -collID 3 -onceonly' before and after, but I
>>> don't think that's it.
>> 
>> Output of commands is attached.
> 
> client spurious counter seems high, but maybe it's just higher than I'm
> used to because of the load here. I'm not aware of anything recent
> related to handling RX call numbers, but if someone would like to
> correct me...
> 
> About how much data and time had gone by between the 'before' and
> 'after' invocations, would you say?

~6 GB; less than 3 minutes.

Achim
> 
> -- 
> Andrew Deason
> adea...@sinenomine.net
> 



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 22, 2010, at 7:11 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 18:00:41 +0100
> Achim Gsell  wrote:
> 
>> Hi,
>> 
>> bonnie++ on OpenAFS with sizes above 8192 behaves strangely on my Linux
>> systems: after writing up to 5-7 files with 1GB each, write
>> performance drops from 50-60 MB/s to around 3MB/s.
> 
> Is this writing to several files sequentially? Or do you mean bonnie++
> has about 5-7 files open at the same time, and has written about
> 1G into each?

I mean the latter. If you start bonnie++ with -s 8192 it opens 8 files and
starts writing sequentially to these files: the first file will be filled with
1 GB, then the second, and so on.

> If the latter (or regardless), I'd try increasing the size
> of the cache; since you're seeing traffic across the network, I'd
> suspect you're thrashing.
Thrashing? Mmh. I can write 8 1 GB files in parallel with dd without problems ...

> If you feel like sharing a network dump of the
> traffic when the problem occurs (probably not to the list), we could
> tell you what it all is.
I will store a network dump in my public AFS directory and tell you as soon as 
I have it - may take some time ...

Achim


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 18:53:26 +0100
Achim Gsell  wrote:

> > Some stats might help say what's wrong. 'rxdebug  7001
> > -rxstats' and 'rxdebug  -rxstats' before and after. Also
> > 'xstat_fs_test  -collID 3 -onceonly' before and after, but I
> > don't think that's it.
> 
> Output of commands is attached.

client spurious counter seems high, but maybe it's just higher than I'm
used to because of the load here. I'm not aware of anything recent
related to handling RX call numbers, but if someone would like to
correct me...

About how much data and time had gone by between the 'before' and
'after' invocations, would you say?

-- 
Andrew Deason
adea...@sinenomine.net



[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 18:00:41 +0100
Achim Gsell  wrote:

> Hi,
> 
> bonnie++ on OpenAFS with sizes above 8192 behaves strangely on my Linux
> systems: after writing up to 5-7 files with 1GB each, write
> performance drops from 50-60 MB/s to around 3MB/s.

Is this writing to several files sequentially? Or do you mean bonnie++
has about 5-7 files open at the same time, and has written about
1G into each? If the latter (or regardless), I'd try increasing the size
of the cache; since you're seeing traffic across the network, I'd
suspect you're thrashing. If you feel like sharing a network dump of the
traffic when the problem occurs (probably not to the list), we could
tell you what it all is.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Achim Gsell

On Nov 22, 2010, at 6:14 PM, Andrew Deason wrote:

> On Mon, 22 Nov 2010 18:00:41 +0100
> Achim Gsell  wrote:
> 
>> Cache type:memory
> 
> I'm not saying this is a problem, but does the same thing happen with a
> disk cache?

Same problem ...

> 
> Some stats might help say what's wrong. 'rxdebug  7001 -rxstats'
> and 'rxdebug  -rxstats' before and after. Also 'xstat_fs_test
>  -collID 3 -onceonly' before and after, but I don't think that's
> it.

Output of commands is attached.

> 
>> Server:
>> Scientific Linux 5, kernel 2.6.18-194.17.4.el5
>> OpenAFS 1.4.12.1
>> 
>> Server and clients are test-systems without any other load.
> 
> Server and client parameters?

/usr/vice/etc/afsd -cachedir /var/cache/openafs -blocks 1048576 -rootvol 
root.afs -mountdir /afs -afsdb -dynroot -fakestat-all -nosettime -chunksize 18 
[-memcache]

/fileserver -p 128 -b 512 -l 2048 -s 2048 -vc 2048 -cb 512000 -rxpck 2048 
-sendsize 2097152 -udpsize 2097152 -syslog -d 0 -auditlog 
/var/opt/openafs-server/fileserver.audit -allow-dotted-principals

So long

Achim


[Attachments: rxdebug-client.before/.after, rxdebug-server.before/.after,
xstat_fs_test.before/.after - binary data]


[OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Andrew Deason
On Mon, 22 Nov 2010 18:00:41 +0100
Achim Gsell  wrote:

> Cache type:memory

I'm not saying this is a problem, but does the same thing happen with a
disk cache?

Some stats might help say what's wrong. 'rxdebug  7001 -rxstats'
and 'rxdebug  -rxstats' before and after. Also 'xstat_fs_test
 -collID 3 -onceonly' before and after, but I don't think that's
it.

> Server:
> Scientific Linux 5, kernel 2.6.18-194.17.4.el5
> OpenAFS 1.4.12.1
> 
> Server and clients are test-systems without any other load.

Server and client parameters?

-- 
Andrew Deason
adea...@sinenomine.net
