Re: btrfs and 1 billion small files

2012-05-07 Thread Chris Samuel
On 07/05/12 20:06, Boyd Waters wrote:

> Use a directory hierarchy. Even if the filesystem handles a
> flat structure effectively, userspace programs will choke on
> tens of thousands of files in a single directory. For example
> 'ls' will try to lexically sort its output (very slowly) unless
> given the command-line option not to do so.

In my experience it's not so much the lexical sorting that kills you
but the default -F option which gets set for users these days; that
results in ls doing an lstat() on every file to work out whether it's an
executable, directory, symlink, etc. to modify how it displays it to you.

For instance, on one of our HPC systems here we have a user with over
200,000 files in one directory.  It takes about 4 seconds for \ls,
whereas \ls -F takes, well, I can't tell you, because it was still running
after 53 minutes (strace confirmed it was still lstat()ing) when I
killed it.
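
For anyone who wants to reproduce the difference, a rough comparison
(directory name made up) is simply:

  time \ls -U /path/to/bigdir > /dev/null   # no sort, no per-entry lstat()
  time \ls -F /path/to/bigdir > /dev/null   # forces an lstat() on every entry

The -U flag also skips the sorting, so the first command is about the
cheapest listing you can get.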

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Subdirectory creation on snapshot

2012-05-07 Thread Brendan Smithyman
Hi All,

I'm experiencing some odd-seeming behaviour with btrfs on Ubuntu 12.04, using 
the Ubuntu x86-64 generic 3.2.0-24 kernel and btrfs-tools 
0.19+20120328-1~precise1 (backported from the current Debian version using 
Ubuntu's backportpackage).  When I snapshot a subvolume on some of my drives, 
it creates an empty directory inside the snapshot with the same name as the 
original subvolume.  Example case and details below:



root@zeus:/mnt/reaper-all# ls
2012-05-03  @working

root@zeus:/mnt/reaper-all# ls @working
documents  games   MeganMusic   misc-docs             photos           vm
Downloads  isos    MeganPhotos  Network Trash Folder  Temporary Items  workdir
Dropbox    iTunes  MeganUBC     otherphotos           videos

root@zeus:/mnt/reaper-all# btrfs subvolume snapshot @working test
Create a snapshot of '@working' in './test'

root@zeus:/mnt/reaper-all# ls test/
documents  games   MeganMusic   misc-docs             photos           vm
Downloads  isos    MeganPhotos  Network Trash Folder  Temporary Items  workdir
Dropbox    iTunes  MeganUBC     otherphotos           videos           @working

root@zeus:/mnt/reaper-all# ls test/@working/

root@zeus:/mnt/reaper-all# btrfs subvolume list .
ID 257 top level 5 path @working
ID 258 top level 5 path 2012-05-03
ID 263 top level 5 path test



The preceding volume is mounted with "rw,compress" on top of LVM on md raid1, 
and I get the same behaviour from another volume running directly on a gpt 
partition.  In both cases, the volumes are 1 TB btrfs with default creation 
parameters, and were converted from ext4 using btrfs-convert (with the Ubuntu 
0.19+20100601 version, before switching btrfs-tools to version 0.19+20120328).  
As far as I am able to tell, both filesystems are healthy.  The drives are 
exported to some Mac OS systems via netatalk, which leads to some odd 
directories, but AFAIK shouldn't affect much else.  The subdirectory under the 
snapshot is just that (i.e., a directory and not a subvolume).
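
For reference, one quick way to confirm that distinction (assuming stock
coreutils) is the inode number; a btrfs subvolume root shows up as inode
256, while a plain directory gets an ordinary inode number:

  stat -c '%i  %n' test/@working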

I don't see this behaviour on either the "@" or "@home" subvolumes of the 
system SSD (conforming to the Ubuntu btrfs layout), which are mounted 
"rw,noatime,discard,subvol=@" and "rw,noatime,discard,subvol=@home".  However, 
the btrfs volume on the SSD was built without btrfs-convert.  It's an Intel 520 
with Sandforce controller, which is why I'm not using -o compress in this case. 
 I haven't confirmed one way or the other whether it is an issue of -o compress, 
but can do a test reboot with different options if that will help.

Could anyone shed some light on what's going on?  If it's simply an issue with 
out of date btrfs-progs, I can either live with it or upgrade.  However, I'd 
rather not track the bleeding edge on this system if I can avoid it.

Thanks!

All the best,
Brendan Smithyman



Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Martin Steigerwald
On Monday, 7 May 2012, Helmut Hullen wrote:
> > If you want to survive losing a single disk without the (absolute)
> > fear of the whole filesystem breaking you have to have some sort of
> > redundancy either by separating filesystems or using some version of
> > raid other than raid0.
> 
> No - since some years I use a kind of outsourced backup. A copy of
> all data is on a bundle of disks somewhere in the neighbourhood. As
> mentioned: the data isn't business critical, it's just "nice to
> have". It's not worth something like raid1 or so (with twice the costs
> of a non raid solution).

That's not true when you use BTRFS RAID1 with three disks. BTRFS will only 
store each chunk on two different drives then, not on all three. So it is 
not twice the cost but, given that all three drives have the same capacity, 
about one and a half times the cost.
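
A minimal sketch of such a setup, with made-up device names, and how to
see the resulting allocation:

  mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc /dev/sdd
  mount /dev/sdb /mnt/data
  btrfs filesystem df /mnt/data    # shows Data, RAID1 and Metadata, RAID1 totals

Each chunk lands on two of the three drives, so roughly one and a half
drives' worth of raw capacity is usable.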

Consider the time it takes to recover the files from the outsourced backup. 
Maybe that makes up for the money you would have to spend on one additional 
hard disk.

Anyway, I agree with the others responding to your post that this one 
hard disk died, and I do not see a kernel-version-related issue. Any striped 
RAID 0 would have failed in that case.

And you can still use three BTRFS filesystems the same way as three Ext4 
filesystems, if you prefer such a setup and the time spent restoring the 
backup does not make up for the cost of one additional disk for you.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread cwillu
On Mon, May 7, 2012 at 3:17 PM, Helmut Hullen  wrote:
> Hello, Daniel,
>
> You wrote on 07.05.12:
>
>>>    mkfs.btrfs  -m raid1 -d raid0
>>>
>>> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
>>> directory.
>>> If one disk fails nothing is usable.
>
>> How is that different from putting ext on top of a raid0?
>
> Classic raid0 doesn't allow deleting/removing disks from a cluster.
>
>>> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
>>> disk fails the contents of the 2 other disks is still readable,
>
>> There is nothing that prevents you from using this strategy with
>> btrfs.
>
> How?
> I've tried many installations of btrfs, sometimes 1 disk failed, and
> then the data on all other disks was inaccessible.

"With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
disk fails the contents of the 2 other disks is still readable,"

There's nothing stopping you from using 3 btrfs filesystems mounted in
the same way as you would 3 ext4 filesystems.
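
For illustration, with hypothetical devices and mount points, that is
just three independent filesystems side by side:

  mkfs.btrfs /dev/sdb1 ; mkfs.btrfs /dev/sdc1 ; mkfs.btrfs /dev/sdd1
  mount /dev/sdb1 /srv/MM/disk1
  mount /dev/sdc1 /srv/MM/disk2
  mount /dev/sdd1 /srv/MM/disk3

If one disk dies, only the files under its own mount point are lost.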


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Daniel,

You wrote on 07.05.12:

>>mkfs.btrfs  -m raid1 -d raid0
>>
>> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
>> directory.
>> If one disk fails nothing is usable.

> How is that different from putting ext on top of a raid0?

Classic raid0 doesn't allow deleting/removing disks from a cluster.

>> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
>> disk fails the contents of the 2 other disks is still readable,

> There is nothing that prevents you from using this strategy with
> btrfs.

How?
I've tried many installations of btrfs, sometimes 1 disk failed, and  
then the data on all other disks was inaccessible.

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Daniel Lee

On 05/07/2012 01:21 PM, Helmut Hullen wrote:

> Hello, Daniel,
>
> You wrote on 07.05.12:
>
>>> Yes - I know. But btrfs promises that I can add bigger disks and
>>> delete smaller disks "on the fly". For something like a video
>>> collection which will grow on and on an interesting feature. And
>>> such a (big) collection does need a "grandfather-father-son" backup,
>>> that's no critical data.
>>>
>>> With a file system like ext2/3/4 I can work with several directories
>>> which are mounted together, but (as said before) one broken disk
>>> doesn't disturb the others.
>>
>> How can you do that with ext2/3/4? If you mean create several
>> different filesystems and mount them separately then that's very
>> different from your current situation. What you did in this case is
>> comparable to creating a raid0 array out of your disks. I don't see
>> how an ext filesystem is going to work any better if one of the disks
>> drops out than with a btrfs filesystem.
>
>    mkfs.btrfs  -m raid1 -d raid0
>
> with 3 disks gives me a "cluster" which looks like 1 disk/partition/
> directory.
> If one disk fails nothing is usable.

How is that different from putting ext on top of a raid0?

> (Yes - I've read Hugo's explanation of "-d single", I'll try this way)
>
> With ext2/3/4 I mount 2 disks/partitions into the first disk. If one
> disk fails the contents of the 2 other disks is still readable,

There is nothing that prevents you from using this strategy with btrfs.

>> It sounds like what you're thinking of is creating several separate
>> ext filesystems and then just mounting them separately.
>
> Yes - that's the old way. It's reliable but "ugly".
>
>> There's nothing inherently special about doing this with ext, you can
>> do the same thing with btrfs and it would amount to about the same
>> level of protection (potentially more if you consider [meta]data
>> checksums important but potentially less if you feel that ext is more
>> robust for whatever reason).
>
> No - as just mentioned: there's a big difference when one disk fails.

No there isn't.

>> If you want to survive losing a single disk without the (absolute)
>> fear of the whole filesystem breaking you have to have some sort of
>> redundancy either by separating filesystems or using some version of
>> raid other than raid0.
>
> No - since some years I use a kind of outsourced backup. A copy of all
> data is on a bundle of disks somewhere in the neighbourhood. As
> mentioned: the data isn't business critical, it's just "nice to have".
> It's not worth something like raid1 or so (with twice the costs of a non
> raid solution).
>
>> I suppose the volume management of btrfs is
>> sort of confusing at the moment but when btrfs promises you can
>> remove disks "on the fly" it doesn't mean you can just unplug disks
>> from a raid0 without telling btrfs to put that data elsewhere first.
>
> No - it's not confusing. It only needs a kind of recipe and much time:
>
>  btrfs device add ...
>  btrfs filesystem balance ... (perhaps not necessary)
>  btrfs device delete ...
>  btrfs filesystem balance ... (perhaps not necessary)
>
> No intellectual challenge.
> And completely different to "hot pluggable".

This is no different to any raid0 or spanning disk setup that allows 
growing/shrinking of the array.




Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Daniel,

You wrote on 07.05.12:

>> Yes - I know. But btrfs promises that I can add bigger disks and
>> delete smaller disks "on the fly". For something like a video
>> collection which will grow on and on an interesting feature. And
>> such a (big) collection does need a "grandfather-father-son" backup,
>> that's no critical data.
>>
>> With a file system like ext2/3/4 I can work with several directories
>> which are mounted together, but (as said before) one broken disk
>> doesn't disturb the others.

> How can you do that with ext2/3/4? If you mean create several
> different filesystems and mount them separately then that's very
> different from your current situation. What you did in this case is
> comparable to creating a raid0 array out of your disks. I don't see
> how an ext filesystem is going to work any better if one of the disks
> drops out than with a btrfs filesystem.

  mkfs.btrfs  -m raid1 -d raid0

with 3 disks gives me a "cluster" which looks like 1 disk/partition/ 
directory.
If one disk fails nothing is usable.

(Yes - I've read Hugo's explanation of "-d single", I'll try this way)

With ext2/3/4 I mount 2 disks/partitions into the first disk. If one  
disk fails the contents of the 2 other disks is still readable,

> It sounds like what you're thinking of is creating several separate
> ext filesystems and then just mounting them separately.

Yes - that's the old way. It's reliable but "ugly".

> There's nothing inherently special about doing this with ext, you can
> do the same thing with btrfs and it would amount to about the same
> level of protection (potentially more if you consider [meta]data
> checksums important but potentially less if you feel that ext is more
> robust for whatever reason).

No - as just mentioned: there's a big difference when one disk fails.

> If you want to survive losing a single disk without the (absolute)
> fear of the whole filesystem breaking you have to have some sort of
> redundancy either by separating filesystems or using some version of
> raid other than raid0.

No - since some years I use a kind of outsourced backup. A copy of all  
data is on a bundle of disks somewhere in the neighbourhood. As  
mentioned: the data isn't business critical, it's just "nice to have".  
It's not worth something like raid1 or so (with twice the costs of a non  
raid solution).

> I suppose the volume management of btrfs is
> sort of confusing at the moment but when btrfs promises you can
> remove disks "on the fly" it doesn't mean you can just unplug disks
> from a raid0 without telling btrfs to put that data elsewhere first.

No - it's not confusing. It only needs a kind of recipe and much time:

btrfs device add ...
btrfs filesystem balance ... (perhaps not necessary)
btrfs device delete ...
btrfs filesystem balance ... (perhaps not necessary)

No intellectual challenge.
And completely different to "hot pluggable".
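
For reference, spelled out with hypothetical device names and mount
point, that recipe amounts to:

 btrfs device add /dev/sde1 /srv/MM
 btrfs filesystem balance /srv/MM        (perhaps not necessary)
 btrfs device delete /dev/sdc1 /srv/MM   (relocates its data first)
 btrfs filesystem balance /srv/MM        (perhaps not necessary)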

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Daniel Lee

On 05/07/2012 10:52 AM, Helmut Hullen wrote:

> Hello, Felix,
>
> You wrote on 07.05.12:
>
>>> I'm just going back to ext4 - then one broken disk doesn't disturb
>>> the contents of the other disks.
>>
>> ?! If you use raid0 one broken disk will always disturb the contents
>> of the other disks, that is what raid0 does, no matter what
>> filesystem you use.
>
> Yes - I know. But btrfs promises that I can add bigger disks and delete
> smaller disks "on the fly". For something like a video collection which
> will grow on and on an interesting feature. And such a (big) collection
> does need a "grandfather-father-son" backup, that's no critical data.
>
> With a file system like ext2/3/4 I can work with several directories
> which are mounted together, but (as said before) one broken disk doesn't
> disturb the others.

How can you do that with ext2/3/4? If you mean create several different 
filesystems and mount them separately then that's very different from 
your current situation. What you did in this case is comparable to 
creating a raid0 array out of your disks. I don't see how an ext 
filesystem is going to work any better if one of the disks drops out 
than with a btrfs filesystem. Using -d single isn't going to be of much 
use in this case either because that's like spanning a lvm volume over 
several disks and then putting ext over that, it's pretty 
nondeterministic how much you'll actually save should a large chunk of 
the filesystem suddenly disappear.

It sounds like what you're thinking of is creating several separate ext 
filesystems and then just mounting them separately. There's nothing 
inherently special about doing this with ext, you can do the same 
thing with btrfs and it would amount to about the same level of 
protection (potentially more if you consider [meta]data checksums 
important but potentially less if you feel that ext is more robust for 
whatever reason).

If you want to survive losing a single disk without the (absolute) fear 
of the whole filesystem breaking you have to have some sort of 
redundancy either by separating filesystems or using some version of 
raid other than raid0. I suppose the volume management of btrfs is sort 
of confusing at the moment but when btrfs promises you can remove disks 
"on the fly" it doesn't mean you can just unplug disks from a raid0 
without telling btrfs to put that data elsewhere first.



Re: BTRFS Benchmarking

2012-05-07 Thread Josef Bacik
On Mon, May 07, 2012 at 08:42:45PM +0200, Olivier Doucet wrote:
> > Not what I've been seeing at all, but we've been working a lot in this area
> > recently.  Please retest with btrfs-next.  Thanks,
> 
> Hi,
> 
> I tested again with kernel 3.3.4 ; I wondered if latest btrfs code is
> present in this release or not.
> 
> Results are very similar with 3.3.4 compared to 3.3.0
> 

It's not, you need to clone the btrfs-next tree and test with that.

Josef


Re: balancing metadata fails with no space left on device

2012-05-07 Thread Martin Steigerwald
On Sunday, 6 May 2012, Ilya Dryomov wrote:
> On Sun, May 06, 2012 at 01:19:38PM +0200, Martin Steigerwald wrote:
> > On Friday, 4 May 2012, Martin Steigerwald wrote:
> > > On Friday, 4 May 2012, Martin Steigerwald wrote:
> > > > Hi!
> > > > 
> > > > merkaba:~> btrfs balance start -m /
> > > > ERROR: error during balancing '/' - No space left on device
> > > > There may be more info in syslog - try dmesg | tail
> > > > merkaba:~#19> dmesg | tail -22
> > > > [   62.918734] CPU0: Package power limit normal
> > > > [  525.229976] btrfs: relocating block group 20422066176 flags 1
> > > > [  526.940452] btrfs: found 3048 extents
> > > > [  528.803778] btrfs: found 3048 extents
> > 
> > […]
> > 
> > > > [  635.906517] btrfs: found 1 extents
> > > > [  636.038096] btrfs: 1 enospc errors during balance
> > > > 
> > > > 
> > > > merkaba:~> btrfs filesystem show
> > > > failed to read /dev/sr0
> > > > Label: 'debian'  uuid: […]
> > > > 
> > > > Total devices 1 FS bytes used 7.89GB
> > > > devid1 size 18.62GB used 17.58GB path /dev/dm-0
> > > > 
> > > > Btrfs Btrfs v0.19
> > > > merkaba:~> btrfs filesystem df /
> > > > Data: total=15.52GB, used=7.31GB
> > > > System, DUP: total=32.00MB, used=4.00KB
> > > > System: total=4.00MB, used=0.00
> > > > Metadata, DUP: total=1.00GB, used=587.83MB
> > > 
> > > I thought the data tree might have been too big, so out of curiosity I
> > > tried a full balance. It shrunk the data tree but it failed as
> > > well:
> > > 
> > > merkaba:~> btrfs balance start /
> > > ERROR: error during balancing '/' - No space left on device
> > > There may be more info in syslog - try dmesg | tail
> > > merkaba:~#19> dmesg | tail -63
> > > [   89.306718] postgres (2876): /proc/2876/oom_adj is deprecated,
> > > please use /proc/2876/oom_score_adj instead.
> > > [  159.939728] btrfs: relocating block group 21994930176 flags 34
> > > [  160.010427] btrfs: relocating block group 21860712448 flags 1
> > > [  161.188104] btrfs: found 6 extents
> > > [  161.507388] btrfs: found 6 extents
> > 
> > […]
> > 
> > > [  335.897953] btrfs: relocating block group 1103101952 flags 1
> > > [  347.888295] btrfs: found 28458 extents
> > > [  352.736987] btrfs: found 28458 extents
> > > [  353.099659] btrfs: 1 enospc errors during balance
> > > 
> > > merkaba:~> btrfs filesystem df /
> > > Data: total=10.00GB, used=7.31GB
> > > System, DUP: total=64.00MB, used=4.00KB
> > > System: total=4.00MB, used=0.00
> > > Metadata, DUP: total=1.12GB, used=587.20MB
> > > 
> > > merkaba:~> btrfs filesystem show
> > > failed to read /dev/sr0
> > > Label: 'debian'  uuid: […]
> > > 
> > > Total devices 1 FS bytes used 7.88GB
> > > devid1 size 18.62GB used 12.38GB path /dev/dm-0
> > > 
> > > For the sake of it I tried another time. It failed again:
> > > 
> > > martin@merkaba:~> dmesg | tail -32
> > > [  353.099659] btrfs: 1 enospc errors during balance
> > > [  537.057375] btrfs: relocating block group 32833011712 flags 36
> > 
> > […]
> > 
> > > [  641.479140] btrfs: relocating block group 22062039040 flags 34
> > > [  641.695614] btrfs: relocating block group 22028484608 flags 34
> > > [  641.840179] btrfs: found 1 extents
> > > [  641.965843] btrfs: 1 enospc errors during balance
> > > 
> > > 
> > > merkaba:~#19> btrfs filesystem df /
> > > Data: total=10.00GB, used=7.31GB
> > > System, DUP: total=32.00MB, used=4.00KB
> > > System: total=4.00MB, used=0.00
> > > Metadata, DUP: total=1.12GB, used=586.74MB
> > > merkaba:~> btrfs filesystem show
> > > failed to read /dev/sr0
> > > Label: 'debian'  uuid: […]
> > > 
> > > Total devices 1 FS bytes used 7.88GB
> > > devid1 size 18.62GB used 12.32GB path /dev/dm-0
> > > 
> > > Btrfs Btrfs v0.19
> > > 
> > > 
> > > Well, in order to be gentle to the SSD again I stop my experiments
> > > now ;).
> > 
> > I had the subjective impression that the speed of the BTRFS filesystem
> > decreased after all these balance runs.
> > 
> > Anyway, after reading the -musage hint by Ilya in the thread
> > 
> > Is it possible to reclaim block groups once they are allocated to data
> > or metadata?
> 
> Currently there is no way to reclaim block groups other than performing
> a balance.  We will add a kernel thread for this in future, but a
> couple of things have to be fixed before that can happen.

Thanks. Yes, I got that. I just referenced the other thread for other 
readers.

> > I tried:
> > 
> > merkaba:~> btrfs filesystem df /
> > Data: total=10.00GB, used=7.34GB
> > System, DUP: total=32.00MB, used=4.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.12GB, used=586.39MB
> > 
> > merkaba:~> btrfs balance start -musage=1 /
> > Done, had to relocate 2 out of 13 chunks
> > 
> > merkaba:~> btrfs filesystem df /
> > Data: total=10.00GB, used=7.34GB
> > System, DUP: total=32.00MB, used=4.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.00GB, used=586.39MB
> > 
> > So this worked.
> 
> > But I wasn't able to specify less than a Gig:

Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 08:25:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 07.05.12:
> 
> >> With a file system like ext2/3/4 I can work with several directories
> >> which are mounted together, but (as said before) one broken disk
> >> doesn't disturb the others.
> 
> >mkfs.btrfs -m raid1 -d single should give you that.
> 
> What's the difference to
> 
>  mkfs.btrfs -m raid1 -d raid0

 - RAID-0 stripes each piece of data across all the disks.
 - single puts data on one disk at a time.

   So, on three disks (each disk running horizontally), the FS will
allocate block groups this way for RAID-0:

Disk 1:   | A1 | B1 | C1 |...
Disk 2:   | A2 | B2 | C2 |...
Disk 3:   | A3 | B3 | C3 |...

where each chunk, e.g. A2, is 1G in size. Then data is striped across
all of the An chunks (a single block group of size 3G) in 64k
sub-stripes, until block group A is filled up, and then it'll move on
to another block group.

   For "single" allocation on the same disks, you will instead get:

Disk 1:  | A  | D  | G  |...
Disk 2:  | B  | E  | H  |...
Disk 3:  | C  | F  | I  |...

where, again, each chunk is 1G in size. Data written to the FS will
live in one of the chunks, overflowing to some other chunk when
there's no more space.

   With large files, you've still got a chance that (some of) the data
from the file will be on more than one disk, but it's a much much
better situation than you'd have with RAID-0.

   Of course, you still need RAID-1 metadata, so that when a disk does
go bang, you still have all the filesystem structures you need to read
the remaining data. :)
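
   For concreteness, the single-allocation layout above is what you get
from something like (device names made up):

   mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc /dev/sdd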

   In fact, this is probably a good argument for having the option to
put back the old allocator algorithm, which would have ensured that
the first disk would fill up completely first before it touched the
next one...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  




Re: BTRFS Benchmarking

2012-05-07 Thread Olivier Doucet
> Not what I've been seeing at all, but we've been working a lot in this area
> recently.  Please retest with btrfs-next.  Thanks,

Hi,

I tested again with kernel 3.3.4; I wondered whether the latest btrfs
code is present in this release or not.

Results with 3.3.4 are very similar to those with 3.3.0.

Olivier


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

>> With a file system like ext2/3/4 I can work with several directories
>> which are mounted together, but (as said before) one broken disk
>> doesn't disturb the others.

>mkfs.btrfs -m raid1 -d single should give you that.

What's the difference to

 mkfs.btrfs -m raid1 -d raid0

(what I have used the last time)?

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 07:52:00PM +0200, Helmut Hullen wrote:
> Hello, Felix,
> 
> You wrote on 07.05.12:
> 
> >> I'm just going back to ext4 - then one broken disk doesn't disturb
> >> the contents of the other disks.
> 
> > ?! If you use raid0 one broken disk will always disturb the contents
> > of the other disks, that is what raid0 does, no matter what
> > filesystem you use.
> 
> Yes - I know. But btrfs promises that I can add bigger disks and delete  
> smaller disks "on the fly". For something like a video collection which  
> will grow on and on an interesting feature. And such a (big) collection  
> does need a "grandfather-father-son" backup, that's no critical data.
> 
> With a file system like ext2/3/4 I can work with several directories  
> which are mounted together, but (as said before) one broken disk doesn't  
> disturb the others.

   mkfs.btrfs -m raid1 -d single should give you that.

   There may be a kernel patch you need to stop it doing the silly
single → raid0 "upgrade" automatically, as well.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   ---   __(_'>  Squeak!   ---   




Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Felix,

You wrote on 07.05.12:

>> I'm just going back to ext4 - then one broken disk doesn't disturb
>> the contents of the other disks.

> ?! If you use raid0 one broken disk will always disturb the contents
> of the other disks, that is what raid0 does, no matter what
> filesystem you use.

Yes - I know. But btrfs promises that I can add bigger disks and delete  
smaller disks "on the fly". For something like a video collection which  
will grow on and on an interesting feature. And such a (big) collection  
does need a "grandfather-father-son" backup, that's no critical data.

With a file system like ext2/3/4 I can work with several directories  
which are mounted together, but (as said before) one broken disk doesn't  
disturb the others.

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Felix Blanke

On 5/7/12 6:36 PM, Helmut Hullen wrote:

> Hello, Hugo,
>
> You wrote on 07.05.12:
>
>>> It's dead - R.I.P.
>>
>> Sorry to be the bearer of bad news. I don't think we can point the
>> finger at btrfs here.
>
> a) you know what to do with the bearer?
> b) I like such errors - completely independent, but simultaneously.
>
>> It looks like you've lost most of your data -- losing a RAID-0
>> stripe across the whole FS isn't likely to have left much of it
>> intact.
>
> I'm just going back to ext4 - then one broken disk doesn't disturb the
> contents of the other disks.

?! If you use raid0 one broken disk will always disturb the contents of 
the other disks, that is what raid0 does, no matter what filesystem you 
use. You could easily use btrfs with the "normal" or raid1 mode. Btrfs is 
still in development and often times you can blame it for a corrupt 
filesystem, but in this case it's simply "raid0 -> 1 disc dies -> data 
are gone".

> The data is not very valuable - DVB video mpegs. Most of the files are
> repeated on and on.
>
>> If you've got the space (or the money to get it), mkfs.btrfs
>> -m raid1 -d raid1 would have saved you here.
>
> About 400 ... 500 Euro for backing up videos? Not necessary.
>
> (No: I don't count the minutes and hours working with the system ...)
>
>> [ Incidentally, thinking about it, the failure coming at a kernel
>> upgrade could well be down to the additional stress of the
>> power-down/reboot finally pushing a bad drive over the edge. ]
>
> Just now it's again an "open system"; I had to wobble the cables too ...
>
> Maybe the SATA-PCI-controller needs to be replaced too ...
>
> Best regards!
> Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

>> It's dead - R.I.P.

>Sorry to be the bearer of bad news. I don't think we can point the
> finger at btrfs here.

a) you know what to do with the bearer?
b) I like such errors - completely independent, but simultaneously.

>It looks like you've lost most of your data -- losing a RAID-0
> stripe across the whole FS isn't likely to have left much of it
> intact.

I'm just going back to ext4 - then one broken disk doesn't disturb the  
contents of the other disks.

The data is not very valuable - DVB video mpegs. Most of the files are  
repeated on and on.

> If you've got the space (or the money to get it), mkfs.btrfs
> -m raid1 -d raid1 would have saved you here.

About 400 ... 500 Euro for backing up videos? Not necessary.

(No: I don't count the minutes and hours working with the system ...)

> [ Incidentally, thinking about it, the failure coming at a kernel
>upgrade could well be down to the additional stress of the
>power-down/reboot finally pushing a bad drive over the edge. ]

Just now it's again an "open system"; I had to wobble the cables too ...

Maybe the SATA-PCI-controller needs to be replaced too ...

Best regards!
Helmut


[ANN] btrfs.wiki.kernel.org with up-to-date content again

2012-05-07 Thread David Sterba
Hi,

the time of the temporary wiki hosted at btrfs.ipv5.de is over; the content has
been migrated back to the official site at

 http://btrfs.wiki.kernel.org

(ipv5.de wiki is set to redirect there).

cheers,
david


Re: btrfs and 1 billion small files

2012-05-07 Thread David Sterba
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> I thought about compression, but it is not clear to me whether the
> compression is handled at the file level or at the block level.

I don't recommend using compression for your expected file size range.
Unless the files are highly compressible (50-75%, which I don't
expect), the extra cpu processing of compression will make things only
worse.
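
A rough way to check how compressible the payload actually is, before
deciding (paths are placeholders):

  cat /path/to/sample-files/* | wc -c             # raw size
  cat /path/to/sample-files/* | gzip -c | wc -c   # compressed size

If the second number is not dramatically smaller, the compression is not
worth the CPU time.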


david


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 03:34:00PM +0200, Helmut Hullen wrote:
> Hello, Hugo,
> 
> You wrote on 07.05.12:
> 
> >> === boot messages, kernel related ==
> >>
> >> [boot with kernel 3.3.4]
> >> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
> >> SErr 0x1 action 0xe frozen
> >> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> >> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> 
> [...]
> 
> >This is a hardware error. You have a device that's either dead or
> > dying. (Given the number of errors, probably already dead).
> 
> It's dead - R.I.P.
> 
> I've tried it with a SATA-USB-adapter - that adapter produces dmesg  
> lines when connecting or disconnecting.
> 
> And this special drive doesn't tell anything now. Shit.

   Sorry to be the bearer of bad news. I don't think we can point the
finger at btrfs here.

   It looks like you've lost most of your data -- losing a RAID-0
stripe across the whole FS isn't likely to have left much of it
intact. If you've got the space (or the money to get it), mkfs.btrfs
-m raid1 -d raid1 would have saved you here.

[ Incidentally, thinking about it, the failure coming at a kernel
   upgrade could well be down to the additional stress of the
   power-down/reboot finally pushing a bad drive over the edge. ]

   In sympathy,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- But somewhere along the line, it seems / That pimp became ---
   cool,  and punk mainstream.   




Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

>> === boot messages, kernel related ==
>>
>> [boot with kernel 3.3.4]
>> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
>> SErr 0x1 action 0xe frozen
>> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
>> May  7 06:55:26 Arktur kernel: ata5: hard resetting link

[...]

>This is a hardware error. You have a device that's either dead or
> dying. (Given the number of errors, probably already dead).

It's dead - R.I.P.

I've tried it with a SATA-USB-adapter - that adapter produces dmesg  
lines when connecting or disconnecting.

And this special drive doesn't tell anything now. Shit.

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Liu Bo
On 05/07/2012 06:46 PM, Helmut Hullen wrote:

> btrfs: error reading free space cache
> BUG: unable to handle kernel NULL pointer dereference at 0001
> IP: [] io_ctl_drop_pages+0x26/0x50
> *pdpt = 29712001 *pde = 
> Oops: 0002 [#1]



Could you please try this and show us the results?

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 202008e..ae514ad 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -296,7 +296,9 @@ static void io_ctl_free(struct io_ctl *io_ctl)
 static void io_ctl_unmap_page(struct io_ctl *io_ctl)
 {
        if (io_ctl->cur) {
-               kunmap(io_ctl->page);
+               WARN_ON(!io_ctl->page);
+               if (io_ctl->page)
+                       kunmap(io_ctl->page);
                io_ctl->cur = NULL;
                io_ctl->orig = NULL;
        }


[PATCH V2] btrfs: fix message printing

2012-05-07 Thread Daniel J Blueman
Fix various messages to include newline and module prefix.

Signed-off-by: Daniel J Blueman 
---
 fs/btrfs/super.c   |8 
 fs/btrfs/volumes.c |6 +++---
 fs/btrfs/zlib.c|8 
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c5f8fca..c0b8727 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -216,7 +216,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
                          struct btrfs_root *root, const char *function,
                          unsigned int line, int errno)
 {
-       WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted");
+       WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted\n");
        trans->aborted = errno;
        /* Nothing used. The other threads that have joined this
         * transaction may be able to continue. */
@@ -511,11 +511,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
                        btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
                        break;
                case Opt_defrag:
-                       printk(KERN_INFO "btrfs: enabling auto defrag");
+                       printk(KERN_INFO "btrfs: enabling auto defrag\n");
                        btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
                        break;
                case Opt_recovery:
-                       printk(KERN_INFO "btrfs: enabling auto recovery");
+                       printk(KERN_INFO "btrfs: enabling auto recovery\n");
                        btrfs_set_opt(info->mount_opt, RECOVERY);
                        break;
                case Opt_skip_balance:
@@ -1501,7 +1501,7 @@ static int btrfs_interface_init(void)
 static void btrfs_interface_exit(void)
 {
        if (misc_deregister(&btrfs_misc) < 0)
-               printk(KERN_INFO "misc_deregister failed for control device");
+               printk(KERN_INFO "btrfs: misc_deregister failed for control device\n");
 }
 
 static int __init init_btrfs_fs(void)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1411b99..79b603d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -619,7 +619,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 
                bdev = blkdev_get_by_path(device->name, flags, holder);
                if (IS_ERR(bdev)) {
-                       printk(KERN_INFO "open %s failed\n", device->name);
+                       printk(KERN_INFO "btrfs: open %s failed\n", device->name);
                        goto error;
                }
                filemap_write_and_wait(bdev->bd_inode->i_mapping);
@@ -3719,7 +3719,7 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
        read_unlock(&em_tree->lock);
 
        if (!em) {
-               printk(KERN_CRIT "unable to find logical %llu len %llu\n",
+               printk(KERN_CRIT "btrfs: unable to find logical %llu len %llu\n",
                       (unsigned long long)logical,
                       (unsigned long long)*length);
                BUG();
@@ -4129,7 +4129,7 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 
        total_devs = bbio->num_stripes;
        if (map_length < length) {
-               printk(KERN_CRIT "mapping failed logical %llu bio len %llu "
+               printk(KERN_CRIT "btrfs: mapping failed logical %llu bio len %llu "
                       "len %llu\n", (unsigned long long)logical,
                       (unsigned long long)length,
                       (unsigned long long)map_length);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 92c2065..9acb846 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -97,7 +97,7 @@ static int zlib_compress_pages(struct list_head *ws,
        *total_in = 0;
 
        if (Z_OK != zlib_deflateInit(&workspace->def_strm, 3)) {
-               printk(KERN_WARNING "deflateInit failed\n");
+               printk(KERN_WARNING "btrfs: deflateInit failed\n");
                ret = -1;
                goto out;
        }
@@ -125,7 +125,7 @@ static int zlib_compress_pages(struct list_head *ws,
        while (workspace->def_strm.total_in < len) {
                ret = zlib_deflate(&workspace->def_strm, Z_SYNC_FLUSH);
                if (ret != Z_OK) {
-                       printk(KERN_DEBUG "btrfs deflate in loop returned %d\n",
+                       printk(KERN_DEBUG "btrfs: deflate in loop returned %d\n",
                              ret);
                        zlib_deflateEnd(&workspace->def_strm);
                        ret = -1;
@@ -252,7 +252,7 @@ static int zlib_decompress_biovec(struct list_head *ws, struct page **pages_in,
        }
 
        if (Z_OK != zlib_inflateInit2(&workspace->inf_strm, wbits)) {
-               printk(KERN_WARNING "inflateInit failed\n");
+               printk(KERN_WARNING "btrfs: inflateInit failed\n");
                return -1;
        }
        while (workspace->inf_strm.total_in < srcl

Re: btrfs and 1 billion small files

2012-05-07 Thread Johannes Hirte
On Mon, 7 May 2012 12:39:28 +0100, Hugo Mills wrote:

> On Mon, May 07, 2012 at 01:15:26PM +0200, Alessio Focardi wrote:
...
> > That's a very clever suggestion, I'm preparing a test server right
> > now: going to use the -m single option. Any other suggestion
> > regarding format options?
> > 
> > pagesize? leafsize?
> 
>I'm not sure about these -- some values of them definitely break
> things. I think they are required to be the same, and that you could
> take them up to 64k with no major problems, but do check that first
> with someone who actually knows.

First, if you have this filesystem as rootfs, a separate /boot
partition is needed. Grub is unable to boot from btrfs with different
node-/leafsize. Second a very recent kernel is needed (linux-3.4-rc1 at
least).
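
For reference, the node/leaf size has to be chosen at mkfs time; a sketch
with a made-up device, using the old btrfs-progs option names:

  mkfs.btrfs -l 16384 -n 16384 /dev/sdb1    # 16k leafsize/nodesize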

regards,
  Johannes


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Fajar,

You wrote on 07.05.12:

>> For some months I run btrfs under kernel 3.2.5 and 3.2.9, without
>> problems.
>>
>> Yesterday I compiled kernel 3.3.4, and this morning I started the
>> machine with this kernel. There may be some ugly problems.


>> Data, RAID0: total=5.29TB, used=4.29TB

> Raid0? Yikes!

Why not?
You know the price of 1 3-TByte disk?
The data isn't irreproducible, in this case.

[...]

>> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline
>> device
>> May  7 07:15:19 Arktur kernel: end_request: I/O error, dev
>> sdf, sector 0
>> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting
>> I/O to offline device
>> May  7 07:15:19 Arktur kernel: lost page write
>> due to I/O error on sdf1


> That looks like a bad disk to me, and it shouldn't be related to the
> kernel version you use.

But why does it happen just when I change the kernel?
(Yes - I know: Murphy works reliable ...)

> Your best chance might be:
> - unmount the fs
> - get another disk to replace /dev/sdf, copy the content over with
> dd_rescue. Ata resets can be a PITA, so you might be better off by
> moving the failed disk to a usb external adapter, and do some
> creative combination of plug-unplug and selectively skip bad sectors
> manually (by passing "-s" to dd_rescue).

Hmmm - I'll take a try ...
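
For reference, a dd_rescue run of that kind might look like this (device
names made up; check your version's option syntax):

  dd_rescue /dev/sdf /dev/sdX           # copy onto the replacement disk
  dd_rescue -s 250G /dev/sdf /dev/sdX   # restart further in, skipping a bad region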


> - reboot, with the bad disk unplugged
> - (optional) run "btrfs filesystem scrub" (you might need to build
> btrfs-progs manually from git source).

Last time I'd tried this command (some months ago) it had produced a  
completely unusable system of disks/partitions ...


> or simply read the entire fs
> (e.g. using tar to /dev/null, or whatever). It should check the
> checksum of all files and print out which files are damaged (either
> in stdout or syslog).

And that's the other try - I had to use it for another disk (also WD,  
but only 2 TByte - I could watch how it died ...).

Best regards!
Helmut


Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

>> Yesterday I compiled kernel 3.3.4, and this morning I started the
>> machine with this kernel. There may be some ugly problems.
>>
>> Copying something into the btrfs "directory" worked well for some
>> files, and then I got error messages (I've not copied them,
>> something with "IO error" under Samba).

[...]

>> Data, RAID0: total=5.29TB, used=4.29TB
>> System, RAID1: total=8.00MB, used=352.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, RAID1: total=149.00GB, used=5.00GB

>>
>> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
>>  Total devices 3 FS bytes used 4.29TB
>>  devid3 size 2.73TB used 1.98TB path /dev/sdi1
>>  devid2 size 2.73TB used 1.94TB path /dev/sdf1
>>  devid1 size 1.82TB used 1.63TB path /dev/sdc1
>>
>> Btrfs Btrfs v0.19
>>
>> === boot messages, kernel related ==
>>
>> [boot with kernel 3.3.4]
>> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0
>> SErr 0x1 action 0xe frozen
>> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
>> May  7 06:55:26 Arktur kernel: ata5: hard resetting link

>This is a hardware error. You have a device that's either dead or
> dying. (Given the number of errors, probably already dead).

It seems to be undecided which status it has ...

>> Can I repair the system? Or have I to copy it to a set of other
>> disks?

>If you have RAID-1 or RAID-10 on both data and metadata, then you
> _should_ in theory just be able to remove the dead disk (physically),
> then btrfs dev add a new one, btrfs dev del missing, and balance.


I haven't - I have a kind of copy/backup in the neighbourhood.

Best regards!
Helmut


Re: btrfs and 1 billion small files

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 01:15:26PM +0200, Alessio Focardi wrote:
> > This is a lot more compact (as you can have several files' data in a
> > single block), but by default will write two copies of each file,
> > even
> > on a single disk.
> 
> Great, no (or less) space wasted, then!

   Less space wasted -- you will still have empty bytes left at the
end(*) of most metadata blocks, but you will definitely be packing in
storage far more densely than otherwise.

(*) Actually, the middle, but let's ignore that here.

> I will have a filesystem that's composed mostly of metadata blocks,
> if I understand correctly. Will this create any problem?

   Not that I'm aware of -- but you probably need to run proper tests
of your likely behaviour just to see what it'll be like.
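
   A crude micro-test of that kind, with made-up paths, could be as
simple as creating a pile of sub-512-byte files and watching the
metadata usage grow:

   mkdir /mnt/test/d0
   for i in $(seq 100000); do head -c 400 /dev/urandom > /mnt/test/d0/f$i; done
   btrfs filesystem df /mnt/test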

> >So, if you want to use some form of redundancy (e.g. RAID-1), then
> > that's great, and you need to do nothing unusual. However, if you
> > want
> > to maximise space usage at the expense of robustness in a device
> > failure, then you need to ensure that you only keep one copy of your
> > data. This will mean that you should format the filesystem with the
> > -m
> > single option.
> 
> 
> That's a very clever suggestion, I'm preparing a test server right now: going 
> to use the -m single option. Any other suggestion regarding format options?
> 
> pagesize? leafsize?

   I'm not sure about these -- some values of them definitely break
things. I think they are required to be the same, and that you could
take them up to 64k with no major problems, but do check that first
with someone who actually knows.

   Having a larger pagesize/leafsize will reduce the depth of the
trees, and will allow you to store more items in each tree block,
which gives you less wastage again. I don't know what the drawbacks
are, though.

> > > XFS has a minimum block size of 512, but BTRFS is more modern and,
> > given the fact that it is able to handle indexes on its own, it could
> > > help us speed up file operations (could it?)
> > 
> >Not sure what you mean by "handle indexes on its own". XFS will
> > have its own set of indexes and file metadata -- it wouldn't be much
> > of a filesystem if it didn't.

> Yes, you are perfectly right; I thought that recreating a tree like
> /d/u/m/m/y/ to store "dummy" would have been redundant since the
> whole filesystem is based on trees - I don't have to "ls"
> directories, we are using php to write and read files, I will have
> to find a "compromise" between levels of directories and number of
> files in each one of them.

   The FS tree (which is the bit that stores the directory hierarchy
and file metadata) is (broadly) a tree-structured index of inodes,
ordered by inode number. Don't confuse the inode index structure with
the directory structure -- they're totally different arrangements of
the data. You may want to try looking at [1], which attempts to
describe how the FS tree holds file data.

> May I ask you about compression? Would you use it in the scenario I
> described?

   I'm not sure if compression will apply to inline file data. Again,
someone else may be able to answer; and you should probably test it
with your own use-cases anyway.

   Hugo.

[1] http://btrfs.ipv5.de/index.php?title=Trees

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Welcome to Rivendell,  Mr Anderson... ---  




Re: btrfs and 1 billion small files

2012-05-07 Thread Alessio Focardi
> This is a lot more compact (as you can have several files' data in a
> single block), but by default will write two copies of each file,
> even
> on a single disk.

Great, no (or less) space wasted, then! I will have a filesystem that's 
composed mostly of metadata blocks, if I understand correctly. Will this create 
any problem? 

>So, if you want to use some form of redundancy (e.g. RAID-1), then
> that's great, and you need to do nothing unusual. However, if you
> want
> to maximise space usage at the expense of robustness in a device
> failure, then you need to ensure that you only keep one copy of your
> data. This will mean that you should format the filesystem with the
> -m
> single option.


That's a very clever suggestion, I'm preparing a test server right now: going 
to use the -m single option. Any other suggestion regarding format options?

pagesize? leafsize?

 
> > XFS has a minimum block size of 512, but BTRFS is more modern and,
> > given the fact that it is able to handle indexes on its own, it could
> > help us speed up file operations (could it?)
> 
>Not sure what you mean by "handle indexes on its own". XFS will
> have its own set of indexes and file metadata -- it wouldn't be much
> of a filesystem if it didn't.

Yes, you are perfectly right; I thought that recreating a tree like /d/u/m/m/y/ 
to store "dummy" would have been redundant since the whole filesystem is based 
on trees - I don't have to "ls" directories, we are using php to write and read 
files, I will have to find a "compromise" between levels of directories and 
number of files in each one of them.
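
As an illustration of such a compromise (names and depth are made up), a
common trick is to hash the file name and use the first hex digits as two
directory levels, which gives 256*256 buckets:

  h=$(printf '%s' "$name" | md5sum | cut -c1-32)
  dir=/data/${h:0:2}/${h:2:2}
  mkdir -p "$dir" && mv "$file" "$dir/"

With a billion files that still leaves roughly 15,000 files per directory,
so a third level may be worth considering.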

May I ask you about compression? Would you use it in the scenario I described?

Thank you for your help!


Re: btrfs and 1 billion small files

2012-05-07 Thread viv...@gmail.com

On 07/05/2012 11:28, Alessio Focardi wrote:

> Hi,
>
> I need some help in designing a storage structure for 1 billion of small files
> (<512 Bytes), and I was wondering how btrfs will fit in this scenario. Keep in
> mind that I never worked with btrfs - I just read some documentation and browsed
> this mailing list - so forgive me if my questions are silly! :X

Are you *really* sure a database is *not* what you are looking for?

> On with the main questions, then:
>
> - What's the advice to maximize disk capacity using such small files, even
> sacrificing some speed?
>
> - Would you store all the files "flat", or would you build a hierarchical tree
> of directories to speed up file lookups? (basically duplicating the filesystem
> Btree indexes)
>
> I tried to answer those questions, and here is what I found:
>
> it seems that the smallest block size is 4K. So, in this scenario, if every
> file uses a full block I will end up with lots of space wasted. Wouldn't change
> much if block was 2K, anyhow.
>
> I thought about compression, but it is not clear to me whether the compression
> is handled at the file level or at the block level.
>
> Also I read that there is a mode that uses blocks for shared storage of
> metadata and data, designed for small filesystems. Haven't found any other info
> about it.
>
> Still it is not yet clear to me if btrfs can fit my situation, would you
> recommend it over XFS?
>
> XFS has a minimum block size of 512, but BTRFS is more modern and, given the
> fact that it is able to handle indexes on its own, it could help us speed up
> file operations (could it?)
>
> Thank you for any advice!
>
> Alessio Focardi
> --




Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 12:46:00PM +0200, Helmut Hullen wrote:
> Hallo,
> 
> "never change a running system" ...
> 
> For some months I run btrfs under kernel 3.2.5 and 3.2.9, without  
> problems.
> 
> Yesterday I compiled kernel 3.3.4, and this morning I started the  
> machine with this kernel. There may be some ugly problems.
> 
> Copying something into the btrfs "directory" worked well for some files,  
> and then I got error messages (I've not copied them, something with "IO  
> error" under Samba).
> 
> Rebooting the machine with kernel 3.2.9 worked, copying 1 file worked,  
> but copying more than this file didn't work. And I can't delete this  
> file.
> 
> That doesn't please me - copying more than 4 TBytes wastes time and  
> money.
> 
> === configuration =
> 
> /dev/sdc1 on /srv/MM type btrfs (rw,noatime)
> 
> /dev/sdc: SAMSUNG HD204UI: 25 °C
> /dev/sdf: WDC WD30EZRX-00MMMB0: 30 °C
> /dev/sdi: WDC WD30EZRX-00MMMB0: 29 °C
> 
> Data, RAID0: total=5.29TB, used=4.29TB
> System, RAID1: total=8.00MB, used=352.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=149.00GB, used=5.00GB
> 
> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
>   Total devices 3 FS bytes used 4.29TB
>   devid3 size 2.73TB used 1.98TB path /dev/sdi1
>   devid2 size 2.73TB used 1.94TB path /dev/sdf1
>   devid1 size 1.82TB used 1.63TB path /dev/sdc1
> 
> Btrfs Btrfs v0.19
> 
> === boot messages, kernel related ==
> 
> [boot with kernel 3.3.4]
> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 
> 0x1 action 0xe frozen
> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 
> secs
> May  7 06:55:36 Arktur kernel: ata5: hard resetting link
> May  7 06:55:38 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:38 Arktur kernel: ata5: reset failed (errno=-19), retrying in 9 
> secs
> May  7 06:55:46 Arktur kernel: ata5: hard resetting link
> May  7 06:55:47 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:47 Arktur kernel: ata5: reset failed (errno=-19), retrying in 34 
> secs
> May  7 06:56:21 Arktur kernel: ata5: hard resetting link
> May  7 06:56:22 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 
> SControl 310)
> May  7 06:56:22 Arktur kernel: ata5.00: configured for UDMA/100
> May  7 06:56:22 Arktur kernel: ata5: EH complete
> May  7 07:12:07 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 
> 0x1 action 0xe frozen
> May  7 07:12:07 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 07:12:07 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
> May  7 07:12:07 Arktur kernel: ata5.00: cmd 
> 35/00:00:00:62:50/00:04:5e:00:00/e0 tag 0 dma 524288 out
> May  7 07:12:07 Arktur kernel:  res 
> d8/d8:d8:d8:d8:d8/d8:d8:d8:d8:d8/d8 Emask 0x12 (ATA bus error)
> May  7 07:12:07 Arktur kernel: ata5.00: status: { Busy }
> May  7 07:12:07 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }

   This is a hardware error. You have a device that's either dead or
dying. (Given the number of errors, probably already dead).

> May  7 07:12:07 Arktur kernel: ata5: hard resetting link
> ==
> 
> The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
> Only 1 of the 3 disks seems to be damaged.
> 
> ==
> 
> Can I repair the system? Or do I have to copy it to a set of other disks?

   If you have RAID-1 or RAID-10 on both data and metadata, then you
_should_ in theory just be able to remove the dead disk (physically),
then btrfs dev add a new one, btrfs dev del missing, and balance.
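
   Roughly, and only as a sketch (the device names below are
placeholders, and exact subcommand spellings depend on your btrfs-progs
version):

   mount -o degraded /dev/sdc1 /srv/MM     # mount with the dead disk absent
   btrfs device add /dev/sdX1 /srv/MM      # add the replacement disk
   btrfs device delete missing /srv/MM     # drop the record of the missing disk
   btrfs filesystem balance /srv/MM        # restripe across the remaining disks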

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- argc, argv, argh! ---




Re: kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Fajar A. Nugraha
On Mon, May 7, 2012 at 5:46 PM, Helmut Hullen  wrote:

> For some months I have run btrfs under kernels 3.2.5 and 3.2.9 without
> problems.
>
> Yesterday I compiled kernel 3.3.4, and this morning I started the
> machine with this kernel. There may be some ugly problems.


> Data, RAID0: total=5.29TB, used=4.29TB

RAID0? Yikes!

> System, RAID1: total=8.00MB, used=352.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=149.00GB, used=5.00GB
>
> Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
>        Total devices 3 FS bytes used 4.29TB
>        devid    3 size 2.73TB used 1.98TB path /dev/sdi1
>        devid    2 size 2.73TB used 1.94TB path /dev/sdf1
>        devid    1 size 1.82TB used 1.63TB path /dev/sdc1
>


> May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 
> 0x1 action 0xe frozen
> May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
> May  7 06:55:26 Arktur kernel: ata5: hard resetting link
> May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
> May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 
> secs


> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
> May  7 07:15:19 Arktur kernel: end_request: I/O error, dev sdf, sector 0
> May  7 07:15:19 Arktur kernel: sd 5:0:0:0: rejecting I/O to offline device
> May  7 07:15:19 Arktur kernel: lost page write due to I/O error on sdf1


That looks like a bad disk to me, and it shouldn't be related to the
kernel version you use.

Your best chance might be:
- unmount the fs
- get another disk to replace /dev/sdf, and copy the content over with
dd_rescue. ATA resets can be a PITA, so you might be better off moving
the failed disk to a USB external adapter and doing some creative
combination of plugging/unplugging and selectively skipping bad sectors
manually (by passing "-s" to dd_rescue).
- reboot, with the bad disk unplugged
- (optional) run a scrub with "btrfs scrub start" (you might need to
build btrfs-progs manually from the git source), or simply read the
entire fs (e.g. using tar to /dev/null, or whatever). Either way the
checksums of all files get checked, and the damaged files are reported
(either on stdout or in syslog).
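
A rough sketch of that sequence (the device names and mount point are
placeholders; adjust for your setup and double-check the dd_rescue
options for your version):

  umount /srv/MM
  dd_rescue /dev/sdf1 /dev/sdX1     # clone the failing disk onto the new one
  # power down, unplug the bad disk, reboot, mount as usual, then:
  btrfs scrub start /srv/MM         # re-read everything and verify checksums
  btrfs scrub status /srv/MM        # see how many errors were found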

I don't think there's anything you can do to recover the damaged files
(other than restore from backup), but at least you know which files
are NOT damaged.

-- 
Fajar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs and 1 billion small files

2012-05-07 Thread Hugo Mills
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> Hi,
> 
> I need some help in designing a storage structure for 1 billion small
> files (<512 bytes), and I was wondering how btrfs will fit in this scenario.
> Keep in mind that I have never worked with btrfs - I just read some
> documentation and browsed this mailing list - so forgive me if my questions
> are silly! :X
> 
> 
> On with the main questions, then:

> - What's the advice to maximize disk capacity using such small
>   files, even sacrificing some speed?

   See my comments below about inlining files.

> - Would you store all the files "flat", or would you build a
>   hierarchical tree of directories to speed up file lookups?
>   (basically duplicating the filesystem Btree indexes)

   Hierarchically, for the reasons Hubert and Boyd gave. (And it's not
duplicating the btree indexes -- the structure of the btree does not
mirror the directory hierarchy.)

> I tried to answer those questions, and here is what I found:
>
> it seems that the smallest block size is 4K. So, in this scenario,
> if every file uses a full block I will end up with lots of space
> wasted. It wouldn't change much if the block size was 2K, anyhow.

   With small files, they will typically be inlined into the metadata.
This is a lot more compact (as you can have several files' data in a
single block), but by default will write two copies of each file, even
on a single disk.

   So, if you want to use some form of redundancy (e.g. RAID-1), then
that's great, and you need to do nothing unusual. However, if you want
to maximise space usage at the expense of robustness in a device
failure, then you need to ensure that you only keep one copy of your
data. This will mean that you should format the filesystem with the -m
single option.
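
   For example, as a sketch only (the device name is a placeholder;
check mkfs.btrfs(8) for the exact options in your btrfs-progs version):

   mkfs.btrfs -m single -d single /dev/sdX   # one copy of metadata and data

   If I recall correctly, the max_inline mount option caps the size of
files that will be inlined into metadata; the default is already large
enough for <512-byte files, but it can be set explicitly, e.g.:

   mount -o compress,max_inline=512 /dev/sdX /mnt/store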

> I thought about compression, but it is not clear to me whether compression
> is handled at the file level or at the block level.

> Also I read that there is a mode that uses blocks for shared storage
> of metadata and data, designed for small filesystems. Haven't found
> any other info about it.

   Don't use that unless your filesystem is <16GB or so in size. It
won't help here (i.e. file data stored in data chunks will still be
allocated on a block-by-block basis).

> It is still not clear to me whether btrfs can fit my situation; would
> you recommend it over XFS?

   The relatively small metadata overhead (e.g. compared to ext4) and
inline capability of btrfs would seem to be a good match for your
use-case.

> XFS has a minimum block size of 512 bytes, but btrfs is more modern and,
> given that it is able to handle indexes on its own, it could
> help us speed up file operations (could it?)

   Not sure what you mean by "handle indexes on its own". XFS will
have its own set of indexes and file metadata -- it wouldn't be much
of a filesystem if it didn't.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- argc, argv, argh! ---




kernel 3.3.4 damages filesystem (?)

2012-05-07 Thread Helmut Hullen
Hello,

"never change a running system" ...

For some months I have run btrfs under kernels 3.2.5 and 3.2.9 without
problems.

Yesterday I compiled kernel 3.3.4, and this morning I started the  
machine with this kernel. There may be some ugly problems.

Copying something into the btrfs "directory" worked well for some files,
and then I got error messages (I haven't copied them here; something with
"IO error" under Samba).

Rebooting the machine with kernel 3.2.9 worked, and copying one file worked,
but copying more than this one file didn't work. And I can't delete this
file.

That doesn't please me - copying more than 4 TBytes wastes time and  
money.

=== configuration =

/dev/sdc1 on /srv/MM type btrfs (rw,noatime)

/dev/sdc: SAMSUNG HD204UI: 25 °C
/dev/sdf: WDC WD30EZRX-00MMMB0: 30 °C
/dev/sdi: WDC WD30EZRX-00MMMB0: 29 °C

Data, RAID0: total=5.29TB, used=4.29TB
System, RAID1: total=8.00MB, used=352.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=149.00GB, used=5.00GB

Label: 'MMedia'  uuid: 9adfdc84-0fbe-431b-bcb1-cabb6a915e91
Total devices 3 FS bytes used 4.29TB
devid3 size 2.73TB used 1.98TB path /dev/sdi1
devid2 size 2.73TB used 1.94TB path /dev/sdf1
devid1 size 1.82TB used 1.63TB path /dev/sdc1

Btrfs Btrfs v0.19

=== boot messages, kernel related ==

[boot with kernel 3.3.4]
May  7 06:55:26 Arktur kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x1 
action 0xe frozen
May  7 06:55:26 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 06:55:26 Arktur kernel: ata5: hard resetting link
May  7 06:55:31 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:31 Arktur kernel: ata5: reset failed (errno=-19), retrying in 6 
secs
May  7 06:55:36 Arktur kernel: ata5: hard resetting link
May  7 06:55:38 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:38 Arktur kernel: ata5: reset failed (errno=-19), retrying in 9 
secs
May  7 06:55:46 Arktur kernel: ata5: hard resetting link
May  7 06:55:47 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 06:55:47 Arktur kernel: ata5: reset failed (errno=-19), retrying in 34 
secs
May  7 06:56:21 Arktur kernel: ata5: hard resetting link
May  7 06:56:22 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
May  7 06:56:22 Arktur kernel: ata5.00: configured for UDMA/100
May  7 06:56:22 Arktur kernel: ata5: EH complete
May  7 07:12:07 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 
0x1 action 0xe frozen
May  7 07:12:07 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:12:07 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
May  7 07:12:07 Arktur kernel: ata5.00: cmd 35/00:00:00:62:50/00:04:5e:00:00/e0 
tag 0 dma 524288 out
May  7 07:12:07 Arktur kernel:  res d8/d8:d8:d8:d8:d8/d8:d8:d8:d8:d8/d8 
Emask 0x12 (ATA bus error)
May  7 07:12:07 Arktur kernel: ata5.00: status: { Busy }
May  7 07:12:07 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }
May  7 07:12:07 Arktur kernel: ata5: hard resetting link
May  7 07:12:13 Arktur kernel: ata5: link is slow to respond, please be patient 
(ready=-19)
May  7 07:12:15 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
May  7 07:12:15 Arktur kernel: ata5.00: failed to IDENTIFY (I/O error, 
err_mask=0x100)
May  7 07:12:15 Arktur kernel: ata5.00: revalidation failed (errno=-5)
May  7 07:12:20 Arktur kernel: ata5: hard resetting link
May  7 07:12:20 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:12:20 Arktur kernel: ata5: reset failed (errno=-19), retrying in 10 
secs
May  7 07:12:30 Arktur kernel: ata5: hard resetting link
May  7 07:12:30 Arktur kernel: ata5: COMRESET failed (errno=-19)
May  7 07:12:30 Arktur kernel: ata5: reset failed (errno=-19), retrying in 10 
secs
May  7 07:12:40 Arktur kernel: ata5: hard resetting link
May  7 07:12:42 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
May  7 07:12:43 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:12:43 Arktur kernel: ata5: EH complete
May  7 07:12:43 Arktur kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 
0x1 action 0xe frozen
May  7 07:12:43 Arktur kernel: ata5: SError: { PHYRdyChg }
May  7 07:12:43 Arktur kernel: ata5.00: failed command: WRITE DMA EXT
May  7 07:12:43 Arktur kernel: ata5.00: cmd 35/00:00:00:72:50/00:04:5e:00:00/e0 
tag 0 dma 524288 out
May  7 07:12:43 Arktur kernel:  res d0/d0:d0:d0:d0:d0/d0:d0:d0:d0:d0/d0 
Emask 0x12 (ATA bus error)
May  7 07:12:43 Arktur kernel: ata5.00: status: { Busy }
May  7 07:12:43 Arktur kernel: ata5.00: error: { ICRC UNC IDNF }
May  7 07:12:43 Arktur kernel: ata5: hard resetting link
May  7 07:12:49 Arktur kernel: ata5: link is slow to respond, please be patient 
(ready=-19)
May  7 07:12:50 Arktur kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
May  7 07:12:51 Arktur kernel: ata5.00: configured for UDMA/100
May  7 07:12:51 Arktur kernel: ata5: EH complete
May  7 07:12:51 Arktur k

Re: btrfs and 1 billion small files

2012-05-07 Thread Boyd Waters
Use a directory hierarchy. Even if the filesystem handles a flat structure 
effectively, userspace programs will choke on tens of thousands of files in a 
single directory. For example 'ls' will try to lexically sort its output (very 
slowly) unless given the command-line option not to do so.
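
For what it's worth, with GNU coreutils something like the following
avoids both the sort and the per-file stat cost (the path is a
placeholder, and flags may differ on other platforms):

  \ls -1 -U /path/to/bigdir    # bypass aliases, don't sort, one name per line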

Sent from my iPad

On May 7, 2012, at 3:58 AM, Hubert Kario  wrote:

> I'm not sure about the limits on directory size, but I'd guess that going
> over a few tens of thousands of files in a single flat directory will have
> speed penalties
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs and 1 billion small files

2012-05-07 Thread Hubert Kario
On Monday 07 of May 2012 11:28:13 Alessio Focardi wrote:
> Hi,
> 
> I need some help in designing a storage structure for 1 billion small
> files (<512 bytes), and I was wondering how btrfs will fit in this
> scenario. Keep in mind that I have never worked with btrfs - I just read
> some documentation and browsed this mailing list - so forgive me if my
> questions are silly! :X
> 
> 
> On with the main questions, then:
> 
> - What's the advice to maximize disk capacity using such small files, even
> sacrificing some speed?
> 
> - Would you store all the files "flat", or would you build a hierarchical
> tree of directories to speed up file lookups? (basically duplicating the
> filesystem Btree indexes)
> 
> 
> I tried to answer those questions, and here is what I found:
> 
> it seems that the smallest block size is 4K. So, in this scenario, if every
> file uses a full block I will end up with lots of space wasted. It wouldn't
> change much if the block size was 2K, anyhow.
> 
> I thought about compression, but it is not clear to me whether compression
> is handled at the file level or at the block level.
> 
> Also I read that there is a mode that uses blocks for shared storage of
> metadata and data, designed for small filesystems. Haven't found any other
> info about it.
> 
> 
> It is still not clear to me whether btrfs can fit my situation; would you
> recommend it over XFS?
> 
> XFS has a minimum block size of 512 bytes, but btrfs is more modern and,
> given that it is able to handle indexes on its own, it could help us speed
> up file operations (could it?)
> 
> Thank you for any advice!
> 

btrfs will inline such small files in metadata blocks.
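
(As a quick sanity check -- the file name below is just a placeholder --
filefrag reports inlined files with an "inline" extent flag, so you can
verify on a test filesystem that small files really are inlined rather
than each taking a full data block:

  filefrag -v /path/to/small-file
)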

I'm not sure about the limits on directory size, but I'd guess that going over
a few tens of thousands of files in a single flat directory will have speed
penalties.
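
If you end up building a hierarchy, a common trick is to fan files out
under a short hash prefix. A minimal sketch (the names and fan-out are
arbitrary choices; adjust to taste):

  name="some-file-id"                             # placeholder file name
  h=$(printf '%s' "$name" | md5sum | cut -c1-4)   # first 4 hex chars of hash
  dir="store/${h:0:2}/${h:2:2}"                   # two-level, 256x256 fan-out
  mkdir -p "$dir" && mv "$name" "$dir/"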

Regards,
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs and 1 billion small files

2012-05-07 Thread Alessio Focardi
Hi,

I need some help in designing a storage structure for 1 billion small files
(<512 bytes), and I was wondering how btrfs will fit in this scenario. Keep in
mind that I have never worked with btrfs - I just read some documentation and
browsed this mailing list - so forgive me if my questions are silly! :X


On with the main questions, then:

- What's the advice to maximize disk capacity using such small files, even 
sacrificing some speed?

- Would you store all the files "flat", or would you build a hierarchical tree 
of directories to speed up file lookups? (basically duplicating the filesystem 
Btree indexes)


I tried to answer those questions, and here is what I found:

it seems that the smallest block size is 4K. So, in this scenario, if every
file uses a full block I will end up with lots of space wasted. It wouldn't
change much if the block size was 2K, anyhow.

I thought about compression, but it is not clear to me whether compression is
handled at the file level or at the block level.

Also I read that there is a mode that uses blocks for shared storage of 
metadata and data, designed for small filesystems. Haven't found any other info 
about it.


It is still not clear to me whether btrfs can fit my situation; would you
recommend it over XFS?

XFS has a minimum block size of 512 bytes, but btrfs is more modern and, given
that it is able to handle indexes on its own, it could help us speed up file
operations (could it?)

Thank you for any advice!

Alessio Focardi
--


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add missing printing newlines

2012-05-07 Thread David Sterba
On Mon, May 07, 2012 at 01:38:51PM +0800, Daniel J Blueman wrote:
> Fix BTRFS messages to print a newline where there should be one.

Please prefix the patch subject line with 'btrfs: '

> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1501,7 +1501,7 @@ static int btrfs_interface_init(void)
>  static void btrfs_interface_exit(void)
>  {
>   if (misc_deregister(&btrfs_misc) < 0)
> - printk(KERN_INFO "misc_deregister failed for control device");
> + printk(KERN_INFO "misc_deregister failed for control device\n");

	printk(KERN_INFO "btrfs: misc_deregister failed for control device\n");

and here as well, otherwise ok, thanks.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html