Re: nocow flags

2012-03-02 Thread Duncan
Kyle Gates posted on Fri, 02 Mar 2012 11:29:40 -0600 as excerpted:

> I set the C (NOCOW) and z (Not_Compressed) flags on a folder but the
> extent counts of files contained there keep increasing.
> Said files are large and frequently modified but not changing in size.
> This does not happen when the filesystem is mounted with nodatacow.
> 
> I'm using this as a workaround since subvolumes can't be mounted with
> different options simultaneously, i.e. one with COW, one with nodatacow.
> 
> Any ideas why the flags are being ignored?
> 
> I'm running 32bit 3.3rc4 with
> noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3 disk RAID0
> data RAID1 metadata filesystem.

I'm not sure whether it applies here, but there's a note on the wiki, under 
the defrag discussion, mentioning that defrag is per-file: defragging a dir 
doesn't defrag the files in that dir.  I'm /guessing/ the same thing applies 
here, since these are per-file flags.

There's a workaround suggested.  Let me see if I can find that note 
again...

Found it in the problem FAQ:

http://btrfs.ipv5.de/index.php?title=Problem_FAQ#Defragmenting_a_directory_doesn.27t_work

>

Defragmenting a directory doesn't work

Running this:

# btrfs filesystem defragment ~/stuff

doesn't defragment the contents of the directory.

This is by design. btrfs fi defrag operates on the single filesystem 
object passed to it. This means that the command defragments just the 
metadata held by the directory object, and not the contents of the 
directory. If you want to defragment the contents of the directory, 
something like this would be more useful:

# find -type f -xdev -print0 | xargs -0 btrfs fi defrag

<

Perhaps you need to do something similar to set the flags on all the 
files under a specific dir?
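For instance, here's a minimal, untested sketch by analogy with the FAQ's 
defrag workaround (chattr's C attribute maps to NOCOW; as far as I know it 
is only reliably honored on empty, newly created files, so it may not help 
files that already contain data):

# apply the NOCOW attribute per-file rather than per-directory
find /path/to/dir -xdev -type f -print0 | xargs -0 chattr +C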

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: subvolume nomenclature

2012-03-02 Thread cwillu
On Fri, Mar 2, 2012 at 1:44 PM, Brian J. Murrell  wrote:
> On 12-03-02 08:36 AM, cwillu wrote:
>>
>> Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
>> where you actually see it.
>
> Well, there is a root filesystem at /etc/apt/oneiric:
>
> # ls /etc/apt/oneiric/
> bin   etc         initrd.img.old  mnt   root  selinux  tmp  vmlinuz
> boot  home        lib             opt   run   srv      usr  vmlinuz.old
> dev   initrd.img  media           proc  sbin  sys      var
>
> but it doesn't delete:
>
> # btrfs subvolume delete /etc/apt/oneiric
> Delete subvolume '/etc/apt/oneiric'
> ERROR: cannot delete '/etc/apt/oneiric' - Device or resource busy

root@repository:~/foo$ btrfs sub create bar
Create subvolume './bar'
root@repository:~/foo$ cd bar
root@repository:~/foo/bar$ btrfs sub del /home/cwillu/foo/bar
Delete subvolume '/home/cwillu/foo/bar'
ERROR: cannot delete '/home/cwillu/foo/bar' - Device or resource busy


It's likely there's at least one process with an open handle to that
subvolume (even just a shell that has it as its current working directory).
lsof | grep /etc/apt/oneiric should tell you what you need.
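An alternative sketch using fuser from psmisc (note that -m reports 
processes using the whole filesystem the path lives on, so it can be 
noisier than the lsof pipe):

fuser -vm /etc/apt/oneiric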


Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Chris Mason
On Fri, Mar 02, 2012 at 02:32:15PM -0500, Ted Ts'o wrote:
> On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote:
> > 
> > filefrag will tell you how many extents each file has, any file with
> > more than one extent is interesting.  (The ext4 crowd may have better
> > suggestions on measuring fragmentation).
> 
> You can get a *huge* amount of information (probably more than you'll
> want to analyze) by doing this:
> 
>  e2fsck -nf -E fragcheck /dev/ >& /tmp/fragcheck.out
> 
> I haven't had time to do this in a while, but a while back I used this
> to debug the writeback code with an eye towards reducing
> fragmentation.  At the time I was trying to optimize the case of
> reducing fragmentation in the easiest case possible, where you start
> with an empty file system, and then copy all of the data from another
> file system onto it using rsync -avH.
> 
> It would be worthwhile to see what happens with files written by the
> compiler and linker.  Given that libelf tends to write .o files
> non-sequentially, and without telling us how big the space is in
> advance, I could well imagine that we're not doing the best job
> avoiding free space fragmentation, which eventually leads to extra
> file system aging.

I just realized that I confused things.  He's doing a read on the
results of a cp -a to a fresh FS, so there's no way the compiler/linker
are causing trouble.

> 
> It would be interesting to have a project where someone added
> fallocate() support into libelf, and then added some heuristics into
> ext4 so that if a file is fallocated to a precise size, or if the file
> is fully written and closed before writeback begins, that we use this
> to more efficiently pack the space used by the files by the block
> allocator.  This is a place where I would not be surprised that XFS
> has some better code to avoid accelerated file system aging, and where
> we could do better with ext4 with some development effort.

The part I don't think any of us have solved is writing back the files
in a good order after we've fallocated the blocks.

So this will probably be great for reads and not so good for writes.

> 
> Of course, it might also be possible to hack around this by simply
> using VPATH and dropping your build files in a separate place from
> your source files, and periodically reformatting the file system where
> your build tree lives.  (As a side note, something that works well for
> me is to use an SSD for my source files, and a separate 5400rpm HDD
> for my build tree.  That allows me to use a smaller and more
> affordable SSD, and since the object files can be written
> asynchronously by the writeback threads, but the compiler can't move
> forward until it gets file data from the .c or .h file, it gets me the
> best price/performance for a laptop build environment.)

mkfs for defrag ;)  It's the only way to be sure.

> 
> BTW, I suspect we could make acp even more efficient by teaching it to
> use FIEMAP ioctl to map out the data blocks for all of the files in
> the source file system, and then copied the files (or perhaps even
> parts of files) in a read order which reduced seeking on the source
> drive.

acp does have a -b mode where it fibmaps (I was either lazy or it is
older than fiemap, I forget) the first block in the file, and uses that
to sort.  It does help if the file blocks aren't ordered well wrt their
inode numbers, but not if the files are fragmented.

It's also worth mentioning that acp doesn't actually cp.  I never got
that far.  It was supposed to be the perfect example of why everything
should be done via aio, but it just ended up demonstrating that ordering
by inode number and leveraging kernel/hardware readahead were more
important.
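For reference, a rough shell approximation of the inode-ordering idea 
(hedged sketch; assumes GNU find and filenames without embedded newlines):

# read files in inode-number order to approximate on-disk order
find /home -xdev -type f -printf '%i %p\n' | sort -n | cut -d' ' -f2- |
  while IFS= read -r f; do cat -- "$f" > /dev/null; done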

-chris


Re: subvolume nomenclature

2012-03-02 Thread Brian J. Murrell
On 12-03-02 08:36 AM, cwillu wrote:
> 
> Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
> where you actually see it.

Well, there is a root filesystem at /etc/apt/oneiric:

# ls /etc/apt/oneiric/
bin   etc         initrd.img.old  mnt   root  selinux  tmp  vmlinuz
boot  home        lib             opt   run   srv      usr  vmlinuz.old
dev   initrd.img  media           proc  sbin  sys      var

but it doesn't delete:

# btrfs subvolume delete /etc/apt/oneiric
Delete subvolume '/etc/apt/oneiric'
ERROR: cannot delete '/etc/apt/oneiric' - Device or resource busy

and doesn't unmount:

# umount /etc/apt/oneiric
umount: /etc/apt/oneiric: not mounted

Cheers,
b.





Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Ted Ts'o
On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote:
> 
> filefrag will tell you how many extents each file has, any file with
> more than one extent is interesting.  (The ext4 crowd may have better
> suggestions on measuring fragmentation).

You can get a *huge* amount of information (probably more than you'll
want to analyze) by doing this:

 e2fsck -nf -E fragcheck /dev/ >& /tmp/fragcheck.out

I haven't had time to do this in a while, but a while back I used this
to debug the writeback code with an eye towards reducing
fragmentation.  At the time I was trying to optimize the case of
reducing fragmentation in the easiest case possible, where you start
with an empty file system, and then copy all of the data from another
file system onto it using rsync -avH.

It would be worthwhile to see what happens with files written by the
compiler and linker.  Given that libelf tends to write .o files
non-sequentially, and without telling us how big the space is in
advance, I could well imagine that we're not doing the best job
avoiding free space fragmentation, which eventually leads to extra
file system aging.

It would be interesting to have a project where someone added
fallocate() support into libelf, and then added some heuristics into
ext4 so that if a file is fallocated to a precise size, or if the file
is fully written and closed before writeback begins, that we use this
to more efficiently pack the space used by the files by the block
allocator.  This is a place where I would not be surprised that XFS
has some better code to avoid accelerated file system aging, and where
we could do better with ext4 with some development effort.
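As a hedged illustration of the userspace side of that idea, a tool that 
knows the final size up front would effectively do the equivalent of 
util-linux's fallocate(1) before writing (file name and size hypothetical):

fallocate -l 524288 hello.o   # preallocate the known final size, then write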

Of course, it might also be possible to hack around this by simply
using VPATH and dropping your build files in a separate place from
your source files, and periodically reformatting the file system where
your build tree lives.  (As a side note, something that works well for
me is to use an SSD for my source files, and a separate 5400rpm HDD
for my build tree.  That allows me to use a smaller and more
affordable SSD, and since the object files can be written
asynchronously by the writeback threads, but the compiler can't move
forward until it gets file data from the .c or .h file, it gets me the
best price/performance for a laptop build environment.)

BTW, I suspect we could make acp even more efficient by teaching it to
use FIEMAP ioctl to map out the data blocks for all of the files in
the source file system, and then copied the files (or perhaps even
parts of files) in a read order which reduced seeking on the source
drive.

- Ted


nocow flags

2012-03-02 Thread Kyle Gates

I set the C (NOCOW) and z (Not_Compressed) flags on a folder but the extent 
counts of files contained there keep increasing.
Said files are large and frequently modified but not changing in size. This 
does not happen when the filesystem is mounted with nodatacow.

I'm using this as a workaround since subvolumes can't be mounted with different 
options simultaneously, i.e. one with COW, one with nodatacow.

Any ideas why the flags are being ignored?

I'm running 32bit 3.3rc4 with 
noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3 disk RAID0 data 
RAID1 metadata filesystem.

Thanks,
Kyle


Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Chris Mason
On Fri, Mar 02, 2012 at 03:16:12PM +0100, Jacek Luczak wrote:
> 2012/3/2 Chris Mason :
> > On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
> >>
> >> I've tried both in tests. The subjects are acp and spd_readdir used with
> >> tar, all on ext4:
> >> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
> >> 2) spd_readdir: 
> >> http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
> >> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
> >>
> >> The acp looks much better than spd_readdir, but the directory copy time with
> >> spd_readdir decreased to 52m 39sec (30 min less).
> >
> > Do you have stats on how big these files are, and how fragmented they
> > are?  For acp and spd to give us this, I think something has gone wrong
> > at writeback time (creating individual fragmented files).
> 
> How big? Which files?

All the files you're reading ;)

filefrag will tell you how many extents each file has, any file with
more than one extent is interesting.  (The ext4 crowd may have better
suggestions on measuring fragmentation).
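A quick hedged sketch for scanning a whole tree, relying on filefrag's 
"N extents found" output format:

find . -xdev -type f -print0 | xargs -0 filefrag 2>/dev/null |
  grep -v ': 1 extent found'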

Since you mention this is a compile farm, I'm guessing there are a bunch
of .o files created by parallel builds.  There are a lot of chances for
delalloc and the kernel writeback code to do the wrong thing here.

-chris



Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Jacek Luczak
2012/3/2 Chris Mason :
> On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
>>
>> I've tried both in tests. The subjects are acp and spd_readdir used with
>> tar, all on ext4:
>> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
>> 2) spd_readdir: 
>> http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
>> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
>>
>> The acp looks much better than spd_readdir, but the directory copy time with
>> spd_readdir decreased to 52m 39sec (30 min less).
>
> Do you have stats on how big these files are, and how fragmented they
> are?  For acp and spd to give us this, I think something has gone wrong
> at writeback time (creating individual fragmented files).

How big? Which files?

-Jacek


Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Chris Mason
On Fri, Mar 02, 2012 at 11:05:56AM +0100, Jacek Luczak wrote:
> 
> I've tried both in tests. The subjects are acp and spd_readdir used with
> tar, all on ext4:
> 1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
> 2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
> 3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png
> 
> The acp looks much better than spd_readdir, but the directory copy time with
> spd_readdir decreased to 52m 39sec (30 min less).

Do you have stats on how big these files are, and how fragmented they
are?  For acp and spd to give us this, I think something has gone wrong
at writeback time (creating individual fragmented files).

-chris



Re: subvolume nomenclature

2012-03-02 Thread cwillu
On Fri, Mar 2, 2012 at 7:31 AM, Brian J. Murrell  wrote:
> I seem to have the following subvolumes of my filesystem:
>
> # btrfs sub li /
> ID 256 top level 5 path @
> ID 257 top level 5 path @home
> ID 258 top level 5 path @/etc/apt/oneiric
>
> I *think* the last one is there due to a:
>
> # btrfsctl -s oneiric /
>
> that I did prior to doing an upgrade.  I can't seem to figure out the
> nomenclature to delete it though:
>
> # btrfs sub de /@/etc/apt/oneiric

Try btrfs sub delete /etc/apt/oneiric, assuming that that's the path
where you actually see it.


subvolume nomenclature

2012-03-02 Thread Brian J. Murrell
I seem to have the following subvolumes of my filesystem:

# btrfs sub li /
ID 256 top level 5 path @
ID 257 top level 5 path @home
ID 258 top level 5 path @/etc/apt/oneiric

I *think* the last one is there due to a:

# btrfsctl -s oneiric /

that I did prior to doing an upgrade.  I can't seem to figure out the
nomenclature to delete it though:

# btrfs sub de /@/etc/apt/oneiric
ERROR: error accessing '/@/etc/apt/oneiric'

I've tried lots of other combinations with no luck.

Can anyone give me a hint (or the answer :-) )?

Cheers,
b.





Re: [PATCH] [RFC] Add btrfs autosnap feature

2012-03-02 Thread Sander
cwillu wrote (ao):
> > While developing snapper I faced similar problems and looked at
> > find-new but unfortunately it is not sufficient. E.g. when a file
> > is deleted find-new does not report anything, see the reply to my
> > mail here one year ago [1]. Also for newly created empty files
> > find-new reports nothing, the same with metadata changes.

> For a system-wide undo'ish sort of thing that I think autosnapper is
> going for, it should work quite nicely, but you're right that it
> doesn't help a whole lot with a backup system.  It can't tell you
> which files were touched or deleted, but it will still tell you that
> _something_ in the subvolume was touched, modified or deleted (at
> least, as of the last commit), which is all you need if you're only
> ever comparing it to its source.

Tar can remove deleted files for you during a restore. This is (imho) a
really cool feature of tar, and I use it in combination with btrfs
snapshots.

https://www.gnu.org/software/tar/manual/tar.html#SEC94

"The option `--listed-incremental' instructs tar to operate on an
incremental archive with additional metadata stored in a standalone
file, called a snapshot file. The purpose of this file is to help
determine which files have been changed, added or deleted since the last
backup"

"When extracting from the incremental backup GNU tar attempts to restore
the exact state the file system had when the archive was created. In
particular, it will delete those files in the file system that did not
exist in their directories when the archive was created"
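A minimal sketch of that cycle with GNU tar (backup paths assumed):

# level-0 dump, creating the snapshot file:
tar --create --file=/backup/home.0.tar --listed-incremental=/backup/home.snar /home
# later, dump only what changed since (tar updates the snapshot file in place):
tar --create --file=/backup/home.1.tar --listed-incremental=/backup/home.snar /home
# restore in order; passing /dev/null as the snapshot file is the documented idiom:
tar --extract --file=/backup/home.0.tar --listed-incremental=/dev/null
tar --extract --file=/backup/home.1.tar --listed-incremental=/dev/null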

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


Re: filesystem full when it's not? out of inodes? huh?

2012-03-02 Thread Fajar A. Nugraha
On Fri, Mar 2, 2012 at 6:50 PM, Brian J. Murrell  wrote:
> Is 2010-06-01 really the last time the tools were considered
> stable or are Ubuntu just being conservative and/or lazy about updating?

The last one :)

Or probably no one has bugged them enough and pointed out that they're
already using a git snapshot anyway, and that there are many new features
in the "current" git version of btrfs-tools.

I have been compiling my own kernel (just recently switched to
Precise's kernel though) and btrfs-progs for quite some time, so even
if Ubuntu doesn't provide updated package it wouldn't matter much to
me. If it's important for you, you could file a bug report in
launchpad asking for an update. Even debian testing has an updated
version (which you might be able to use:
http://packages.debian.org/btrfs-tools)

Or create your own PPA with an updated version (or at least a rebuild
of Debian's version).
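A hedged sketch of the local-rebuild route (assumes a deb-src line for a
release carrying the newer package):

apt-get source btrfs-tools
sudo apt-get build-dep btrfs-tools
cd btrfs-tools-*/ && dpkg-buildpackage -us -uc -b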

-- 
Fajar


Re: [PATCH] [RFC] Add btrfs autosnap feature

2012-03-02 Thread cwillu
>> Perhaps all that is unnecessary:  rather than doing the walk, why not
>> make use of btrfs subvolume find-new (or rather, the syscalls it
>> uses)?
>
> While developing snapper I faced similar problems and looked at
> find-new but unfortunately it is not sufficient. E.g. when a file
> is deleted find-new does not report anything, see the reply to my
> mail here one year ago [1]. Also for newly created empty files
> find-new reports nothing, the same with metadata changes.
>
> If I'm wrong or find-new gets extended I'd be happy to implement it in
> snapper.

For a system-wide undo'ish sort of thing that I think autosnapper is
going for, it should work quite nicely, but you're right that it
doesn't help a whole lot with a backup system.  It can't tell you
which files were touched or deleted, but it will still tell you that
_something_ in the subvolume was touched, modified or deleted (at
least, as of the last commit), which is all you need if you're only
ever comparing it to its source.
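For completeness, a hedged usage sketch of find-new itself; the trailing
"transid marker" line it prints is, as far as I recall, how you learn the
generation to pass on the next run:

# pass an absurdly high generation just to learn the current one:
btrfs subvolume find-new /mnt/subvol 99999999
# then report files changed since a known generation:
btrfs subvolume find-new /mnt/subvol 1234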

-- Carey


Re: filesystem full when it's not? out of inodes? huh?

2012-03-02 Thread Brian J. Murrell
On 12-02-26 06:00 AM, Hugo Mills wrote:
> 
>The option that nobody's mentioned yet is to use mixed mode. This
> is the -M or --mixed option when you create the filesystem. It's
> designed specifically for small filesystems, and removes the
> data/metadata split for more efficient packing.

Cool.

>As mentioned before, you probably need to upgrade to 3.2 or 3.3-rc5
> anyway. There were quite a few fixes in the ENOSPC/allocation area
> since then.

I've upgraded to the Ubuntu Precise kernel, which looks to be 3.2.6, with
btrfs-tools 0.19+20100601-3ubuntu3, so that would look like a btrfs-progs
snapshot from 2010-06-01, and (unsurprisingly) I don't see the -M option
in mkfs.btrfs.
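For reference, once the tools are updated the invocation should be along
these lines (hedged sketch, device name assumed):

mkfs.btrfs -M /dev/sdX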

So I went digging and I just wanted to verify what I think I am seeing.

Looking at

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=67377734fd24c32cbdfeb697c2e2bd7fed519e75

it would appear that the mixed data+metadata code landed in the kernel
back in September of 2010, is that correct?

And looking at

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commit;h=b8802ae3fa0c70d4cfc3287ed07479925973b0ac

the userspace support for this landed in Dec. of 2010, is that right?

If my archeology is correct, then I only need to update my btrfs-tools,
yes?  Is 2010-06-01 really the last time the tools were considered
stable, or are Ubuntu just being conservative and/or lazy about updating?

Cheers,
b.





Re: [PATCH] [RFC] Add btrfs autosnap feature

2012-03-02 Thread Arvin Schnell
On Thu, Mar 01, 2012 at 05:54:40AM -0600, cwillu wrote:

> There doesn't appear to be any reason for the scratch file to exist at
> all (one can build up the hash while reading the directories), and
> keeping a scratch file in /etc/ is poor practice in the first place
> (that's what /tmp and/or /var/run is for).  It's also a lot of io to
> stat every file in the subvolume every time you make a snapshot, and
> I'm not convinced that the walk is actually correctly implemented:
> what stops an autosnap of / from including all of /proc and /sys in
> the hash?
> 
> Perhaps all that is unnecessary:  rather than doing the walk, why not
> make use of btrfs subvolume find-new (or rather, the syscalls it
> uses)?

While developing snapper I faced similar problems and looked at
find-new but unfortunately it is not sufficient. E.g. when a file
is deleted find-new does not report anything, see the reply to my
mail here one year ago [1]. Also for newly created empty files
find-new reports nothing, the same with metadata changes.

If I'm wrong or find-new gets extended I'd be happy to implement it in
snapper.

Regards,
  Arvin

[1] http://www.spinics.net/lists/linux-btrfs/msg08683.html

-- 
Arvin Schnell, 
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany


Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Jacek Luczak
2012/3/1 Chris Mason :
> On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
>> You might try sorting the entries returned by readdir by inode number before 
>> you stat them.    This is a long-standing weakness in ext3/ext4, and it has 
>> to do with how we added hashed tree indexes to directories in (a) a 
>> backwards compatible way, that (b) was POSIX compliant with respect to 
>> adding and removing directory entries concurrently with reading all of the 
>> directory entries using readdir.
>>
>> You might try compiling spd_readdir from the e2fsprogs source tree (in the 
>> contrib directory):
>>
>> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
>>
>> … and then using that as a LD_PRELOAD, and see how that changes things.
>>
>> The short version is that we can't easily do this in the kernel since it's a 
>> problem that primarily shows up with very big directories, and using 
>> non-swappable kernel memory to store all of the directory entries and then 
>> sort them so they can be returned in inode number order just isn't practical.  It
>> is something which can be easily done in userspace, though, and a number of 
>> programs (including mutt for its Maildir support) does do, and it helps 
>> greatly for workloads where you are calling readdir() followed by something 
>> that needs to access the inode (i.e., stat, unlink, etc.)
>>
>
> For reading the files, the acp program I sent him tries to do something
> similar.  I had forgotten about spd_readdir though, we should consider
> hacking that into cp and tar.
>
> One interesting note is the page cache used to help here.  Picture two
> tests:
>
> A) time tar cf /dev/zero /home
>
> and
>
> cp -a /home /new_dir_in_new_fs
> unmount / flush caches
> B) time tar cf /dev/zero /new_dir_in_new_fs
>
> On ext, The time for B used to be much faster than the time for A
> because the files would get written back to disk in roughly htree order.
> Based on Jacek's data, that isn't true anymore.

I've tried both in tests. The subjects are acp and spd_readdir used with
tar, all on ext4:
1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png

The acp looks much better than spd_readdir, but the directory copy time with
spd_readdir decreased to 52m 39sec (30 min less).
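For reference, a hedged sketch of how such an LD_PRELOAD shim is built and
used (compile flags assumed; spd_readdir.c resolves the real readdir via
dlsym, hence -ldl):

gcc -shared -fPIC -o spd_readdir.so spd_readdir.c -ldl
LD_PRELOAD=$PWD/spd_readdir.so tar cf /dev/zero /mnt/test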

-Jacek


Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Jacek Luczak
2012/3/1 Ted Ts'o :
> On Thu, Mar 01, 2012 at 03:43:41PM +0100, Jacek Luczak wrote:
>>
>> Yep, ext4 is close to my wife's closet.
>>
>
> Were all of the file systems freshly laid down, or was this an aged
> ext4 file system?

Always fresh, recreated for each test - that's why it takes quite a
lot of time, as I had to copy the test dir back into place. The env is
kept as consistent as possible across all tests, so the values should
be credible.

> Also you should beware that if you have a workload which is heavy
> parallel I/O, with lots of random, read/write accesses to small files,
> a benchmark using tar might not be representative of what you will see
> in production --- different file systems have different strengths and
> weaknesses --- and the fact that ext3/ext4's readdir() returns inodes
> in a non-optimal order for stat(2) or unlink(2) or file copy in the
> cold cache case may not matter as much as you think in a build server.
> (i.e., the directories that do need to be searched will probably be
> serviced out of the dentry cache, etc.)

The set of tests was not chosen to find the best fs for build purposes.

For pure builds ext4 is as of now the best, and most of the points
you've made above are valid here. We've performed real tests in a
clone of the production environment. The results are not that
surprising; one can find the same in e.g. Phoronix tests. We've tested
XFS vs ext[34] and ext4 vs btrfs. If we take into account only software
compilation, ext4 rocks. Btrfs is only a few seconds slower (max 5 on
average). The choice then was to use ext4, due to its more mature
foundations and support in RHEL. Why are we looking for a new one? Well,
the build environment is not only based on software building. There
are e.g. some strange tests running in parallel, code analysis, etc.
Users are doing damn odd things there ... they are using Java; you can
imagine how bad a zombie Java process can be. We've failed to isolate
each use case and create a profiled environment, so we need to find
something that behaves sensibly across all of them.

The previous tests showed that ext[34] rocks on compilation timing, but
not really all around. One also needs to remember that the fs content
changes often. ext3 was showing its age after 4-6 months of use. XFS, on
the other hand, was great all around, though not in compilation timings.
Roughly 10% is not that much, but if a host is doing builds 24/7 then
after a few days we were well behind the ext[34] clone env. Btrfs was
only tested on compilation timings, not in general use.

We've created a simple test case for compilation. It's not quite the same
as what we have in the real env, but it's a good baseline (the kernel
build system is too perfect). Simply parallel kernel builds with a
randomly chosen allyesconfig or allmodconfig. Below are the seekwatcher
graphs of around 1h of the tests running. There were 10 builds (kernels
2.6.20-2.6.29) running in three parallel threads:
1) ext4: http://91.234.146.107/~difrost/seekwatcher/kt_ext4.png
2) btrfs: http://91.234.146.107/~difrost/seekwatcher/kt_btrfs.png
3) both: http://91.234.146.107/~difrost/seekwatcher/kt_btrfs_ext4.png

The above graphs show that ext4 is ahead in this "competition". I will
try to set up a real build env there to see how those two compare.
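A rough sketch of the build loop described above (hedged; assumes kernel
trees unpacked as linux-2.6.NN, and uses crude batching instead of a real
job pool):

for v in $(seq 20 29); do
  cfg=allmodconfig
  [ $((RANDOM % 2)) -eq 0 ] && cfg=allyesconfig
  ( make -C linux-2.6.$v $cfg >/dev/null &&
    make -C linux-2.6.$v -j4 >/dev/null ) &
  [ $(( (v - 19) % 3 )) -eq 0 ] && wait   # at most three builds in flight
done
wait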

-Jacek