Chas Williams (CONTRACTOR) wrote:
In message <4b20b344.5010...@pclella.cern.ch>, Rainer Toebbicke writes:
Chas Williams (CONTRACTOR) wrote:

I still wonder whether the cache manager shouldn't open a single file (in
sparse mode) and just seek/read/write. That would solve a couple of
potential problems with other filesystems as well.
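
Concretely, that single-file scheme would look something like the sketch
below; write_chunk/read_chunk and CHUNKSIZE are illustrative names, not the
actual cache manager interface:

#include <sys/types.h>
#include <unistd.h>

#define CHUNKSIZE (256 * 1024)  /* one plausible chunk size */

/* Store chunk 'chunkno' at its fixed offset in one big sparse cache
 * file; regions that are never written occupy no disk blocks. */
ssize_t
write_chunk(int cachefd, int chunkno, const void *buf, size_t len)
{
    return pwrite(cachefd, buf, len, (off_t)chunkno * CHUNKSIZE);
}

/* Fetch a chunk from the same fixed offset; holes read back as zeros. */
ssize_t
read_chunk(int cachefd, int chunkno, void *buf, size_t len)
{
    return pread(cachefd, buf, len, (off_t)chunkno * CHUNKSIZE);
}

If the chunk size is kept a multiple of the page size, writes at these
offsets are naturally page-aligned as well.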
There are some issues with the canonical approach of just using one file and seeking to chunkno*chunksize:

1. directories are read in their entirety, regardless of chunk boundaries;

Ah, I did indeed forget this point. It is particularly annoying with
regard to memcache (it causes a realloc of the chunk if the chunk is
undersized). For now, we could ensure that chunk sizes are 'sufficiently'
large.

With the current "dir" package this means a chunk size of 2 MB. Assuming the unit of transfer is still "chunksize" and you do not intentionally fill chunks partially, you'd give up a valuable tuning parameter.


2. it is, to my knowledge and on a POSIX level, not possible to "free" parts of a file. Hence, if the combined size of the chunks in the cache exceeds the size of /usr/vice/cache, you run out of space;

I don't ever wish to free parts of a file. I just wanted to create the
file quickly to avoid making the user wait while 1 GB is written.
Oversubscribing /usr/vice/cache is somewhat like asking the doctor why
it hurts when you hit yourself with a hammer.
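
For what it's worth, the quick creation itself is cheap: extending the file
with ftruncate() yields a sparse 1 GB file instantly, without writing a
gigabyte of zeros. A sketch; the function name, flags and size are
illustrative:

#include <fcntl.h>
#include <unistd.h>

/* Create a large cache file without writing its contents: ftruncate()
 * past end-of-file leaves a sparse file whose unwritten regions read
 * back as zeros and occupy no disk blocks until written. */
int
create_sparse_cache(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

Note the flip side: since no blocks are reserved, this does nothing for
point 2 above; there is still no portable POSIX way to free a range of a
file (Linux's fallocate(2) with FALLOC_FL_PUNCH_HOLE can, but that is
OS-specific).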

We typically create a 10 GiB AFS cache with ~100000 cache files but a chunk size of only 256 kB, i.e. nominally ~25 GB of chunk space on 10 GiB of disk. What's wrong with that? The cache occupancy is measured in kiB anyway and the cache manager figures out which chunks to recycle. As bigger chunks have an increased probability of being only partially filled (because, after all, we also have "small" files), this all works out without the user seeing any adverse effect. With your 2 MB chunk size suggested above such a cache would have to be 100000 × 2 MB = 200 GB.

BTW: on decent machines an individual 1 GiB write does not make the user wait: on write, the data is first copied into the AFS file's mapping and later into the cache file's mapping (the former step can be avoided by writing into the chunk files directly). On reads, the reader is woken up on every RX packet, ensuring streaming to the user. Here again, the double copy can be avoided.


3. unless done carefully, re-writing parts of a file may make the system read them in first (partial blocks).

With individual cache files and well-placed truncate() calls, these issues go away.

I am not convinced that the well-placed truncate calls have any meaning;
the filesystems in question tend to just do what they want.

They do! They free the blocks used up by the cache file, just in case the chunk you're writing is smaller. They also make sure that, while re-writing non-block/page-aligned parts, data does not have to be read in just to be thrown away by the next write.
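
A sketch of what "well-placed" buys here; the function name and arguments
are illustrative. Truncating to zero before a full rewrite frees the old
blocks, and because every subsequent write then lands in freshly extended
space, the filesystem never has to read an old partial block just to
overwrite it:

#include <unistd.h>

/* Rewrite one chunk file from scratch.  The ftruncate() frees the
 * blocks of any larger previous chunk; all writes then extend the
 * file, so no partial block is read in merely to be overwritten. */
int
rewrite_chunk_file(int fd, const void *buf, size_t len)
{
    if (ftruncate(fd, 0) < 0)
        return -1;
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        return -1;
    return 0;
}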

So if you want to put the cache into one big file, you'll at least have to think about space allocation and fragmentation. You'd also better ensure page-aligned writes.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155