Chas Williams (CONTRACTOR) wrote:
In message <4b20b344.5010...@pclella.cern.ch>, Rainer Toebbicke writes:
Chas Williams (CONTRACTOR) wrote:

I still wonder whether the cache manager shouldn't open a single file (in
sparse mode) and just seek/read/write. That would solve a couple of
potential problems with other filesystems as well.
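
Concretely, that single-file scheme would look something like the sketch
below; write_chunk/read_chunk and CHUNKSIZE are illustrative names, not the
actual cache manager interface:

#include <sys/types.h>
#include <unistd.h>

#define CHUNKSIZE (256 * 1024)  /* one plausible chunk size */

/* Store chunk 'chunkno' at its fixed offset in one big sparse cache
 * file; regions that are never written occupy no disk blocks. */
ssize_t
write_chunk(int cachefd, int chunkno, const void *buf, size_t len)
{
    return pwrite(cachefd, buf, len, (off_t)chunkno * CHUNKSIZE);
}

/* Fetch a chunk from the same fixed offset; holes read back as zeros. */
ssize_t
read_chunk(int cachefd, int chunkno, void *buf, size_t len)
{
    return pread(cachefd, buf, len, (off_t)chunkno * CHUNKSIZE);
}

If the chunk size is kept a multiple of the page size, writes at these
offsets are naturally page-aligned as well.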
There are some issues with the canonical approach of just using one file and seeking to chunkno*chunksize:

1. directories are read in their entirety, regardless of chunk boundaries;

Ah, I did indeed forget this point. It is particularly annoying with
regard to memcache (it causes a realloc of the chunk if the chunk is
undersized). For now, we could ensure that chunk sizes are 'sufficiently'
large.

With the current "dir" package this means a chunk size of 2 MB. Assuming the unit of transfer is still "chunksize" and you do not intentionally fill chunks partially, you'd give up a valuable tuning parameter.


2. it is, to my knowledge and on a POSIX level, not possible to "free" parts of a file. Hence, if the combined size of the chunks in the cache exceeds the size of /usr/vice/cache, you run out of space;

I don't ever wish to free parts of a file. I just wanted to create the
file quickly to avoid making the user wait while 1 GB is written.
Oversubscribing /usr/vice/cache is somewhat like asking the doctor why
it hurts when you hit yourself with a hammer.
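
For what it's worth, the quick creation itself is cheap: extending the file
with ftruncate() yields a sparse 1 GB file instantly, without writing a
gigabyte of zeros. A sketch; the function name, flags and size are
illustrative:

#include <fcntl.h>
#include <unistd.h>

/* Create a large cache file without writing its contents: ftruncate()
 * past end-of-file leaves a sparse file whose unwritten regions read
 * back as zeros and occupy no disk blocks until written. */
int
create_sparse_cache(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

Note the flip side: since no blocks are reserved, this does nothing for
point 2 above; there is still no portable POSIX way to free a range of a
file (Linux's fallocate(2) with FALLOC_FL_PUNCH_HOLE can, but that is
OS-specific).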

We typically create a 10 GiB AFS cache with ~100000 cache files but a chunk size of only 256 kB, i.e. nominally ~25 GB of chunk space on 10 GiB of disk. What's wrong with that? The cache occupancy is measured in kiB anyway and the cache manager figures out which chunks to recycle. As bigger chunks have an increased probability of being only partially filled (because, after all, we also have "small" files), this all works out without the user seeing any adverse effect. With your 2 MB chunk size suggested above such a cache would have to be 100000 × 2 MB = 200 GB.

BTW: on decent machines an individual 1 GiB write does not make the user wait: on write, the data is first copied into the AFS file's mapping and later into the cache file's mapping (the former step can be avoided by writing into the chunk files directly). On reads, the reader is woken up on every RX packet, ensuring streaming to the user. Here again, the double copy can be avoided.


3. unless done carefully, re-writing parts of a file may make the system read them in first (partial blocks).

With individual cache files and well-placed truncate() calls, these issues go away.

I am not convinced that the well-placed truncate calls have any meaning;
the filesystems in question tend to just do what they want.

They do! They free the blocks used up by the cache file, just in case the chunk you're writing is smaller. They also make sure that, while re-writing non-block/page-aligned parts, data does not have to be read in just to be thrown away by the next write.
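
A sketch of what "well-placed" buys here; the function name and arguments
are illustrative. Truncating to zero before a full rewrite frees the old
blocks, and because every subsequent write then lands in freshly extended
space, the filesystem never has to read an old partial block just to
overwrite it:

#include <unistd.h>

/* Rewrite one chunk file from scratch.  The ftruncate() frees the
 * blocks of any larger previous chunk; all writes then extend the
 * file, so no partial block is read in merely to be overwritten. */
int
rewrite_chunk_file(int fd, const void *buf, size_t len)
{
    if (ftruncate(fd, 0) < 0)
        return -1;
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        return -1;
    return 0;
}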

So if you want to put the cache into one big file, you'll at least have to think about space allocation and fragmentation. You'd also better ensure page-aligned writes.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155