>>> What might be the best way to "defrag" files?

>> Dump/reload to/from a freshly formatted filesystem
>> *on a different device*.

>>> I've noticed a /huge/ speedup when defrag'ing files using the
>>> following method:

>>> create a new file in same directory
>>> use ftruncate to set the new file's size the same as existing file
>>> copy data from existing to new file
>>> close both files and rename new file to the old file (atomic replace)
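
For concreteness, that recipe comes out roughly as follows in C
(a sketch only, with error handling abbreviated; the ".defrag"
temporary name is just illustrative):

  /* Sketch of the quoted recipe (illustrative only; error handling
   * abbreviated): copy FILE to FILE.defrag in the same directory,
   * then rename() over the original for an atomic replace. */
  #include <stdio.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <sys/stat.h>

  int main(int argc, char **argv)
  {
      static char buf[1 << 20];                /* 1MiB copy buffer */
      char tmp[4096];
      struct stat st;
      ssize_t n;
      int in, out;

      if (argc != 2) return 1;
      snprintf(tmp, sizeof tmp, "%s.defrag", argv[1]);

      in = open(argv[1], O_RDONLY);
      if (in < 0 || fstat(in, &st) < 0) return 1;

      out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
      if (out < 0) return 1;

      /* set the final size up front, as in the recipe; note that on
       * JFS this alone does not allocate any blocks */
      if (ftruncate(out, st.st_size) < 0) return 1;

      while ((n = read(in, buf, sizeof buf)) > 0)
          if (write(out, buf, n) != n) return 1;
      if (n < 0 || fsync(out) < 0 || close(out) < 0) return 1;
      close(in);

      /* atomic replace: readers see either the old or the new file */
      return rename(tmp, argv[1]) != 0;
  }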

>> That is a silly, random way.

> I've had very good success at doing that on a much larger basis.
> [ ... ] in my couple of a dozen experiments it never failed to
> substantially improve overall fragmentation - frequently by
> quite a bit.

Asking for advice here is a bit pointless if you already know
better :-).

> Several filesystems I tested with had over 40% fragmentation
> (as reported by e2fsck)

Weren't we talking about JFS here?

> and each one of them ended up in the single digits.

A silly random way can still appear to work in a dozen
small random experiments. You were asking not for some way
to defrag something, but for the "best way to 'defrag' files".

>>> I recently did this to a .iso image for openSUSE 11.2:
>>> openSUSE-11.2-DVD-x86_64.iso: 20517 extents found
>>> turned into:
>>> openSUSE-11.2-DVD-x86_64.iso: 23 extents found

>> That ISO file is heavily discontiguous most likely because it
>> was written incrementally in many tiny pieces (typically a
>> download) and the algorithms in the JFS allocator that try to
>> pick contiguous allocation areas don't handle that well

> Yes, I know why it was discontiguous.

Then you could have mentioned that. The explanation was meant to
help understand what is going on and how to do things better.

But then if you already suspected that "heavily discontiguous
most likely because it was written incrementally in many tiny
pieces", why suggest below that "write a \0 every 4K" is a good
idea?

>> (they handle continuous writes of largish bits fairly well).

> What I'm asking is what filesystem operations best suit large
> JFS allocations.

I think that the answers are "continuous writes of largish bits",
ideally to a "freshly formatted filesystem". The reasons why
may be inferred from the disk layout, described here:

  http://en.wikipedia.org/wiki/JFS_(file_system)
  http://www.sabi.co.uk/Notes/linuxFS.html#jfsStruct
  http://jfs.sourceforge.net/project/pub/jfslayout.pdf

> Should I allocate the entire file by way of truncate?

That should not allocate anything: truncate just sets the
nominal size, leaving a sparse file (note the "4" in the first
column of "ls -ls" below: only a few KiB actually allocated for
a nominal 1GB):

  base# grep /dev/root /proc/mounts
  /dev/root / jfs rw 0 0
  base# perl -e 'truncate STDOUT,1000*1000*1000' > /1G
  base# ls -lsd /1G
  4 -rw------- 1 root root 1000000000 Dec 16 16:27 /1G
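
The same can be shown from C: ftruncate() only sets the nominal
size, and st_blocks stays tiny (a sketch; /tmp/1G is just an
example path):

  /* Sketch: ftruncate() alone leaves a sparse file -- nominal size
   * 1GB, but almost no blocks actually allocated. */
  #include <stdio.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <sys/stat.h>

  int main(void)
  {
      struct stat st;
      int fd = open("/tmp/1G", O_WRONLY | O_CREAT, 0600);

      if (fd < 0 || ftruncate(fd, 1000*1000*1000) < 0) return 1;
      if (fstat(fd, &st) < 0) return 1;
      /* st_blocks is in 512-byte units; expect a handful, not ~2M */
      printf("size %lld, blocks %lld\n",
             (long long)st.st_size, (long long)st.st_blocks);
      return close(fd) != 0;
  }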

> Should I write a \0 every 4K (this is similar to what
> posix_fallocate does in glibc) for the size of the file?
> Normally I use truncate but I've had good success with both
> methods.

Both are sort of random methods. Also, writing a single byte
every 4KiB is less than optimal (larger writes and less seeking
would probably be better).
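
For concreteness, the "\0 every 4K" approach comes down to
roughly this (a sketch; glibc's real posix_fallocate fallback is
more careful, e.g. about not clobbering existing data):

  /* Sketch of "write a \0 every 4K" preallocation of a new file,
   * roughly what the glibc posix_fallocate fallback does when the
   * filesystem has no real fallocate.  Usage: ./prealloc FILE BYTES */
  #include <stdlib.h>
  #include <unistd.h>
  #include <fcntl.h>

  int main(int argc, char **argv)
  {
      char zero = 0;
      off_t off, len;
      int fd;

      if (argc != 3) return 1;
      len = atoll(argv[2]);
      fd = open(argv[1], O_WRONLY | O_CREAT | O_EXCL, 0600);
      if (fd < 0) return 1;

      /* one tiny write per 4KiB block: each one forces the allocator
       * to commit a block, but a 1GB file needs ~244K such writes */
      for (off = 0; off < len; off += 4096)
          if (pwrite(fd, &zero, 1, off) != 1) return 1;

      return ftruncate(fd, len) != 0 || close(fd) != 0;
  }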

I'd use something like:

  dd bs=10M count=100 if=/dev/zero oflag=direct of=tmp/1G

again ideally on a non-busy, freshly formatted filesystem.

> I'm asking specifically for the best way to do this *for JFS*.

It is not very JFS specific, because JFS does not have specific
ways to preallocate space for a file.

However, a large difference between JFS and some other file
systems is that it allocates stuff in much larger "cylinder
groups" (AGs in JFS), and that it will allocate space for one
file per "cylinder group" (as per the references above).

As a curiosity I have just checked AG size vs. aggregate size
for 3 JFS filesystems I got (12GB, 120GB, 460GB):

  # for A in sdc1 sdc6 sdc9; do jfs_tune -l /dev/$A; done | egrep '(gate|group) size'
  Aggregate size:         24373496 blocks
  Allocation group size:  32768 aggregate blocks
  Aggregate size:         249124744 blocks
  Allocation group size:  262144 aggregate blocks
  Aggregate size:         976179568 blocks
  Allocation group size:  1048576 aggregate blocks

A bit surprising as I was expecting bigger allocation groups,
but it looks like the goal is to have around 1,000 AGs per
aggregate (dividing the numbers above gives about 744, 950 and
931 AGs; AG size is 120MB, 1GB, 3.8GB). Extents can grow across
AGs, but only for one file at a time.

>>> Is there a way to get "1 extents"?

>> That is pointless. What matters is what percentage of IO is

> I should have asked "what is the most optimal way to allocate
> space for large files on JFS".

Again, that would be "continuous writes of largish bits" and "on
a less busy filesystem", but pre-creating files is regrettably
often pointless, as many applications don't overwrite files in
place, but just open them with "O_CREAT|O_TRUNC", which
truncates them to 0 before writing. Ideally applications would
(optionally) open without "O_TRUNC", overwrite in place, and
truncate to the final length at the end, but that is quite rare.
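
Such an open-overwrite-truncate sequence would look roughly like
this (a sketch only; the sample payload is just illustrative):

  /* Sketch of the rare "overwrite in place" pattern: reuse the
   * existing (hopefully contiguous) allocation instead of truncating
   * it away at open() time, and only trim the tail at the end. */
  #include <string.h>
  #include <unistd.h>
  #include <fcntl.h>

  int main(int argc, char **argv)
  {
      const char *data = "new contents\n";     /* stand-in payload */
      off_t newlen = (off_t)strlen(data);
      int fd;

      if (argc != 2) return 1;
      /* no O_TRUNC: existing extents are kept and overwritten */
      fd = open(argv[1], O_WRONLY | O_CREAT, 0644);
      if (fd < 0) return 1;
      if (pwrite(fd, data, (size_t)newlen, 0) != (ssize_t)newlen)
          return 1;
      /* drop any leftover tail only after the new data is in place */
      if (ftruncate(fd, newlen) < 0) return 1;
      return close(fd) != 0;
  }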

>>> I have other .iso images of similar size with 1 extent.

>> You copied them on a less busy filesystem.

> The filesystem was no more or less busy, but that's beside
> the point.

That can have a lot of influence on the result -- if a
filesystem is busy (either in the sense of having quite a bit of
IO ongoing or of having had quite a bit of allocation in the
past) the free space is more likely to be widely scattered.

Sometimes even just writing two files at the same time in small
pieces causes trouble (but much less so on a fresh, mostly
unused filesystem, especially if it is largish, as the AGs will
then be bigger).
