Re: weird bash autocomplete issue

2008-12-17 Thread Roland

On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:

On Tue, Dec 16, 2008 at 21:46,   wrote:
>> On Tue, Dec 16, 2008 at 20:37, Roland  wrote:
>> > i have come across a weird autocomplete issue i assume it is related to
>> > btrfs.
>> >
>> > let`s have some dirs:
>> >
>> > /non-btrfs-mount
>> >   ./linux
>> >   ./testdir
>> >
>> > /btrfs-mount
>> >   ./linux
>> >   ./testdir
>> >
>> > now, if i do "cd t" in /non-btrfs-mount, "t" autocompletes to "testdir"
>> > same for linux - bash autocompletes as expected.
>> >
>> > now, the weird thing is, that on /btrfs-mount this behaves differently.
>> >
>> > autocompletion for testdir works, but not for linux dir. weird.
>> >
>> > can someone reproduce this ?
>>
>> Open another shell, find the bash process pid of the first shell with:
>>   ps afx
>> and do:
>>   strace -p <pid>
>> Go back to the first shell, hit <tab>, and the trace should show
>> what's going on. You see a significant difference there?
>
>
> ok, here we go (i hope i did not cut important parts).
> i don`t see the real issue, but i did another interesting finding - see below
>
>
> bad (cd l):
>
> open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
> fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555, 
> st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, 
> st_size=18, st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37, 
> st_ctime=2008/12/16-21:32:37}) = 0
> getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, 
> d_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, 
> d_name=".."} {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24, 
> d_name="test"} {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, 
> d_reclen=32, d_name="linux"}}, 4096) = 104

> _llseek(3, 3, [3], SEEK_SET)= 0
> getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, 
> d_reclen=32, d_name="linux"}}, 4096) = 32
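
To compare traces like the one above without reading them by eye, the d_name and d_off fields can be pulled out of a getdents64 line with a short script. This is only a sketch and not something posted in the thread: the regular expression assumes the strace formatting shown here, and `parse_dirents` is a hypothetical helper name.

```python
import re

# One getdents64 line copied from the trace above, rejoined into one string.
TRACE = (
    'getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, '
    'd_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, '
    'd_name=".."} {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24, '
    'd_name="test"} {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, '
    'd_reclen=32, d_name="linux"}}, 4096) = 104'
)

# Assumption: every entry is printed as {..., d_off=N, ..., d_name="..."}.
ENTRY_RE = re.compile(r'd_off=(\d+),.*?d_name="([^"]*)"')


def parse_dirents(line):
    """Return a list of (name, d_off) tuples found in one strace line."""
    return [(name, int(off)) for off, name in ENTRY_RE.findall(line)]


entries = parse_dirents(TRACE)
for name, d_off in entries:
    print(f"{name!r:10} d_off={d_off}")
```

Running the same helper over the trace from a working mount makes it easy to diff the two entry lists.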


On Tue, Dec 16, 2008 at 22:26,   wrote:
> i assume it has something to do with the large value for d_off of the 
> last dirent ?


Looks like, 9223372036854775807 is just LLONG_MAX.
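
The arithmetic behind that remark can be checked quickly. The sketch below only confirms that the value equals 2^63 - 1 and that it cannot be held in a signed 32-bit integer; whether that is what breaks completion on the 32-bit setup is not established anywhere in this thread. `fits_in_signed` is a hypothetical helper.

```python
LLONG_MAX = 2**63 - 1
d_off = 9223372036854775807  # value reported for the last dirent


def fits_in_signed(value, bits):
    """True if value is representable as a two's-complement integer of this width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


print(d_off == LLONG_MAX)
print(fits_in_signed(d_off, 32))
print(fits_in_signed(d_off, 64))
```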


I can not reproduce that (on openSUSE 11.1). I also don't see
the _llseek() calls.


weird. no btrfs issue then !?



open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_dev=makedev(0, 18), ...
getdents64(3, {
 {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
 {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
 {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
 {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
 {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
 {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
 {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32, d_name="linux"}
}, 4096) = 176
getdents64(3, {}, 4096) = 0
close(3)

This is with today's git kernel and today's standalone btrfs unstable.

You are using the distro kernel and compile the standalone btrfs module?


yes.
to be honest, i`m slightly newer than 11.1 (did zypper dup to latest factory 
some days ago)


linux:~ # bash -version
GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.


roland


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Christoph Hellwig
FYI: here's a little writeup I did this summer on support for
filesystems spanning multiple block devices:


-- 

=== Notes on support for multiple devices for a single filesystem ===

== Intro ==

Btrfs (and an experimental XFS version) can support multiple underlying block
devices for a single filesystem instance in a generalized and flexible way.

Unlike the support for external log devices in ext3, jfs, reiserfs, XFS, and
the special real-time device in XFS, all data and metadata may be spread over a
potentially large number of block devices, and not just one (or two).


== Requirements ==

We want a scheme to support these complex filesystem topologies in a way
that is

 a) easy to set up and non-fragile for the users
 b) scalable to a large number of disks in the system
 c) recoverable without requiring user space running first
 d) generic enough to work for multiple filesystems or other consumers

Requirement a) means that a multiple-device filesystem should be mountable
by a simple fstab entry (UUID/LABEL or some other cookie) which continues
to work when the filesystem topology changes.

Requirement b) implies we must not do a scan over all available block devices
in large systems, but use an event-based callout on detection of new block
devices.

Requirement c) means there must be some way to add devices to a filesystem
by kernel command lines, even if this is not the default way, and might require
additional knowledge from the user / system administrator.

Requirement d) means that we should not implement this mechanism inside a
single filesystem.
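
As a rough illustration of requirement b), an event-based index can be kept up to date one device at a time, so that a lookup by filesystem UUID never needs a scan over all block devices. This is a toy sketch, not any existing interface: `DeviceIndex`, its method names and the UUID strings are all hypothetical.

```python
from collections import defaultdict


class DeviceIndex:
    """Toy index from filesystem UUID to member device names."""

    def __init__(self):
        self._members = defaultdict(set)

    def on_device_added(self, devname, fs_uuid):
        # Called once per newly detected block device (e.g. from a hotplug event).
        self._members[fs_uuid].add(devname)

    def on_device_removed(self, devname, fs_uuid):
        self._members[fs_uuid].discard(devname)
        if not self._members[fs_uuid]:
            del self._members[fs_uuid]

    def devices_for(self, fs_uuid):
        # Lookup used at mount time; no scan over all block devices is needed.
        return sorted(self._members.get(fs_uuid, ()))


index = DeviceIndex()
index.on_device_added("/dev/sdb", "fs-uuid-1")
index.on_device_added("/dev/sdc", "fs-uuid-1")
index.on_device_added("/dev/sdd", "fs-uuid-2")
print(index.devices_for("fs-uuid-1"))
index.on_device_removed("/dev/sdc", "fs-uuid-1")
print(index.devices_for("fs-uuid-1"))
```

The work per event is constant, which is the property the requirement asks for; where such an index should live (user space or kernel) is the question the rest of the notes discuss.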


== Prior art ==

* External log and realtime volume

The most common way to specify the external log device and the XFS real time
device is to have a mount option that contains the path to the block special
device for it.  This variant means a mount option is always required, and
requires that the device name doesn't change, which is easy enough with
udev-generated unique device names (/dev/disk/by-{label,uuid}).

An alternative way, optionally supported by ext3 and reiserfs and
exclusively supported by jfs, is to open the journal device by the device
number (dev_t) of the block special device.  While this doesn't require
an additional mount option when the device number is stored in the filesystem
superblock, it relies on the device number being stable, which is getting
increasingly unlikely in complex storage topologies.
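
For readers unfamiliar with dev_t: it is just a (major, minor) pair packed into one number, so a value stored in a superblock goes stale as soon as the device shows up under a different number. A small sketch using Python's os module; the numbers 8 and 16 are arbitrary example values, not taken from any system in this thread.

```python
import os

# Hypothetical example: major 8, minor 16.
dev = os.makedev(8, 16)
print(os.major(dev), os.minor(dev))

# st_dev is the device holding the path; for a block special file the
# device it represents would be in st_rdev instead.
st = os.stat("/")
print(os.major(st.st_dev), os.minor(st.st_dev))
```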


* RAID (MD) and LVM

Software RAID and volume managers, although not strictly filesystems,
have a very similar problem finding their devices.  The traditional
solution used for early versions of the Linux MD driver and LVM version 1
was to hook into the partition scanning code and add devices with the
right partition type to a kernel-internal list of potential RAID / LVM
devices.  This approach has the advantage of being simple to implement,
fast, reliable and not requiring additional user space programs in the boot
process.  The downside is that it only works with specific partition table
formats that allow specifying a partition type, and doesn't work with
unpartitioned disks at all.  Recent MD setups and LVM2 thus move the scanning
to user space, typically using a command iterating over all block device
nodes and performing the format-specific scanning.  While this is more flexible
than the in-kernel scanning, it scales very badly to a large number of
block devices, and requires additional user space commands to run early
in the boot process.  A variant of this scheme runs a scanning callout
from udev once disk devices are detected, which avoids the scanning overhead.


== High-level design considerations ==

Due to requirement b) we need a layer that finds devices for a single
fstab entry.  We can either do this in user space, or in kernel space. As we've
traditionally always done UUID/LABEL to device mapping in userspace, and we
already have libvolume_id and libblkid dealing with the specialized case
of UUID/LABEL to single device mapping I would recommend to keep doing
this in user space and try to reuse the libvolume_id / libblkid.

There are two options to perform the assembly of the device list for
a filesystem:

 1) whenever libvolume_id / libblkid find a device detected as a multi-device
    capable filesystem it gets added to a list of all devices of this
    particular filesystem type.
    At mount time mount(8) or a mount.fstype helper calls out to the
    libraries to get a list of devices belonging to this filesystem
    type and translates them to device names, which can be passed to
    the kernel on the mount command line.

    Advantage:  Requires a mount.fstype helper or fs-specific knowledge
                in mount(8).
    Disadvantages:  Requires libvolume_id / libblkid to keep state.

 2) whenever libvolume_id / libblkid find a device detected as a multi-device
    capable filesystem they call into the kernel through an ioctl / sysfs /
    etc to add it to a list in kernel space.  The kernel code t

Re: [PATCH] fix wrong value returned from btrfs_listxattr when buffer is too small

2008-12-17 Thread Chris Mason
On Fri, 2008-12-12 at 14:36 -0800, Yehuda Sadeh Weinraub wrote:
> Fix bug, btrfs_listxattr doesn't return an error when the buffer size
> is too small (ret was overridden).
> 

Thank you, I've applied this one locally and will push it out.

-chris




Re: compilation problem on last unstable

2008-12-17 Thread Lee Trager
On Wed, Dec 17, 2008 at 05:43:50PM +, Michele Petrazzo wrote:
> Hi,
> I just tried to compile the last unstable version, but:
> 
>   CC [M]  /home/michele/btrfs-unstable-standalone/inode.o
> /home/michele/btrfs-unstable-standalone/inode.c: In function ‘btrfs_new_inode’:
> /home/michele/btrfs-unstable-standalone/inode.c:3470: error: implicit
> declaration of function ‘current_fsuid’
> /home/michele/btrfs-unstable-standalone/inode.c:3471: error: implicit
> declaration of function ‘current_fsgid’
> /home/michele/btrfs-unstable-standalone/inode.c: In function
> ‘btrfs_cache_create’:
> /home/michele/btrfs-unstable-standalone/inode.c:4527: warning: passing argument
> 5 of ‘kmem_cache_create’ from incompatible pointer type
> /home/michele/btrfs-unstable-standalone/inode.c: At top level:
> /home/michele/btrfs-unstable-standalone/inode.c:4966: warning: initialization
> from incompatible pointer type
> /home/michele/btrfs-unstable-standalone/inode.c:4970: warning: initialization
> from incompatible pointer type
> /home/michele/btrfs-unstable-standalone/inode.c:5024: warning: initialization
> from incompatible pointer type
> /home/michele/btrfs-unstable-standalone/inode.c:5030: warning: initialization
> from incompatible pointer type
> /home/michele/btrfs-unstable-standalone/inode.c:5040: warning: initialization
> from incompatible pointer type
> make[2]: *** [/home/michele/btrfs-unstable-standalone/inode.o] Error 1
> make[1]: *** [_module_/home/michele/btrfs-unstable-standalone] Error 2
> make[1]: Leaving directory `/usr/src/linux-headers-2.6.26-1-686'
> make: *** [all] Error 2
> michele:~/btrfs-unstable-standalone$ 
> 
> 
> michele:~/btrfs-unstable-standalone$ uname -r
> 2.6.26-1-686
> 
> from debian 
Currently btrfs only compiles on 2.6.27 and above although support all
the way back to 2.6.18 is planned. I'm currently using Ubuntu 8.10 for
all btrfs testing.
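
A quick way to check a kernel release string against that minimum before attempting a build is sketched below. It is only an illustration: `kernel_tuple` and `is_supported` are hypothetical helpers, and the second release string is a made-up example of a newer kernel.

```python
import re

MIN_KERNEL = (2, 6, 27)  # minimum version stated in the reply above


def kernel_tuple(release):
    """Parse the leading numeric part of a `uname -r` string."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(x) for x in m.groups())


def is_supported(release):
    return kernel_tuple(release) >= MIN_KERNEL


print(is_supported("2.6.26-1-686"))      # the kernel from the report
print(is_supported("2.6.27-7-generic"))  # hypothetical newer kernel
```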
> 
> Thanks,
> Michele
> 


Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 15:17, Chris Mason  wrote:
> On Wed, 2008-12-17 at 14:59 +0100, Kay Sievers wrote:
>> On Wed, Dec 17, 2008 at 09:45, Roland  wrote:
>> >> On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
>> >>>
>> >>> > open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
>> >>> > fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555,
>> >>> > st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
>> >>> > st_size=18,
>> >>> > st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37,
>> >>> > st_ctime=2008/12/16-21:32:37}) = 0
>> >>> > getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>> >>> > d_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>> >>> > d_name=".."}
>> >>> > {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24, d_name="test"}
>> >>> > {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
>> >>> > d_name="linux"}}, 4096) = 104
>> >>> > _llseek(3, 3, [3], SEEK_SET)= 0
>> >>> > getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
>> >>> > d_reclen=32, d_name="linux"}}, 4096) = 32
>> >>>
>> >>> On Tue, Dec 16, 2008 at 22:26,   wrote:
>> >>> > i assume it has something to do with the large value for d_off of the
>> >>> > last dirent ?
>> >>>
>> >>> Looks like, 9223372036854775807 is just LLONG_MAX.
>> >>
>> >> I can not reproduce that (on openSUSE 11.1). I also don't see
>> >> the _llseek() calls.
>> >
>> > weird. no btrfs issue then !?
>> >
>> >>
>> >> open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
>> >> fstat(3, {st_dev=makedev(0, 18), ...
>> >> getdents64(3, {
>> >>  {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
>> >>  {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
>> >>  {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
>> >>  {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
>> >>  {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
>> >>  {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
>> >>  {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
>> >> d_name="linux"}
>> >> }, 4096) = 176
>> >> getdents64(3, {}, 4096) = 0
>> >> close(3)
>> >>
>> >> This is with today's git kernel and today's standalone btrfs unstable.
>> >>
>> >> You are using the distro kernel and compile the standalone btrfs module?
>> >
>> > yes.
>> > to be honest, i`m slightly newer than 11.1 (did zypper dup to latest factory
>> > some days ago)
>> >
>> > linux:~ # bash -version
>> > GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
>> > Copyright (C) 2007 Free Software Foundation, Inc.
>>
>> That is still the same bash, the one you use is a 32bit version. Do
>> you run a 32 bit kernel too? I could try that on a 32 bit box then.
>
> At least on my 32 bit box, tab completion works fine.

It works fine here too on 64 bit. I'll try with openSUSE 11.1 on a
32bit box later tonight.

> But, the d_off of
> LLONG_MAX comes from btrfs_readdir().  Git had a feature where it would
> loop infinitely over a directory in some cases and this was my
> workaround.

There are other filesystems doing the same, usually with 32bit int max
instead of 64 bit int max, I guess that should work fine.

> This should be fixed in git by now, so I can drop it if that really is
> causing problems in bash.

I'll come back if I can reproduce it with the same environment Roland is using.

Kay


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 16:08, Christoph Hellwig  wrote:
> On Wed, Dec 17, 2008 at 03:50:45PM +0100, Kay Sievers wrote:
>> Sounds all sensible. Btrfs already stores the (possibly incomplete)
>> device tree state in the kernel, which should make things pretty easy
>> for userspace, compared to other already existing subsystems.
>>
>> We could have udev maintain a btrfs volume tree:
>>   /dev/btrfs/
>>   |-- 0cdedd75-2d03-41e6-a1eb-156c0920a021
>>   |   |-- 897fac06-569c-4f45-a0b9-a1f91a9564d4 -> ../../sda10
>>   |   `-- aac20975-b642-4650-b65b-b92ce22616f2 -> ../../sda9
>>   `-- a1ec970a-2463-414e-864c-2eb8ac4e1cf2
>>       |-- 4d1f1fff-4c6b-4b87-8486-36f58abc0610 -> ../../sdb2
>>       `-- e7fe3065-c39f-4295-a099-a89e839ae350 -> ../../sdb1
>>
>> At the same time, by-uuid/ is created:
>>   /dev/disk/by-uuid/
>>   |-- 0cdedd75-2d03-41e6-a1eb-156c0920a021 -> ../../sda10
>>   |-- a1ec970a-2463-414e-864c-2eb8ac4e1cf2 -> ../../sdb2
>>   ...
>
> Well, it's not just btrfs, it's also md, lvm and xfs.  I think the right
> way is to make the single node for the /dev/disk/by-uuid/ just a legacy
> case for potential multiple devices.  E.g. by having
>
> /dev/disk/by-uuid/
>    0cdedd75-2d03-41e6-a1eb-156c0920a021 -> ../../sda10
>    0cdedd75-2d03-41e6-a1eb-156c0920a021.d
>        foo -> ../../sda10
>        bar -> ../../sda9
>
> where foo and bar could be uuids if the filesystem / volume manager
> supports it, otherwise just the short name for it.

Sure, we can do something like that. /dev/btrfs/ was just something
for me to start with, and see how the stuff works.

>> For rescue and recovery cases, it will still be nice to be able to
>> trigger "scan all devices" code in btrfsctl (own code or libblkid),
>> but it should be avoided in any normal operation mode.
>
> Again, that's something we should do generically for the whole
> /dev/disk/ tree.   For that we need to merge libvolume_id and libblkid
> so that it has a few related but separate use cases:
>
>  - a lowlevel probe what fs / volume manager / etc is this for
>   the udev callout, mkfs, strip size detection etc

A low-level api will be offered by a future libblkid version in util-linux-ng.

>  - a way to rescan everything, either for non-udev static /dev case
>   or your above recovery scenario

The scan code is part of libblkid, we just need some explicit controls
to enable/disable the scanning. It should never be the default, like
it is today.

>  - plus potentially some sort of caching for the non-recovery static
>   /dev case

It's also in libblkid. Today it's pretty useless to cache stuff
indexed by major/minor, but it's there.

> I've long planned to put you and Ted into a room and not let you out
> until we see white smoke :)

A new libblkid already happened at:
  http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=shortlog;h=topic/blkid

Almost all of libvolume_id is already merged into this new version
(only btrfs is missing :)). Udev will switch over to calling blkid
when it's available in a released version of util-linux-ng. I will
just delete the current libvolume_id library after that.

No white smoke, if all works out as planned. :)

Thanks,
Kay


Re: weird bash autocomplete issue

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 14:59 +0100, Kay Sievers wrote:
> On Wed, Dec 17, 2008 at 09:45, Roland  wrote:
> >> On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
> >>>
> >>> > open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
> >>> > fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555,
> >>> > st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
> >>> > st_size=18,
> >>> > st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37,
> >>> > st_ctime=2008/12/16-21:32:37}) = 0
> >>> > getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
> >>> > d_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
> >>> > d_name=".."}
> >>> > {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24, d_name="test"}
> >>> > {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
> >>> > d_name="linux"}}, 4096) = 104
> >>> > _llseek(3, 3, [3], SEEK_SET)= 0
> >>> > getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
> >>> > d_reclen=32, d_name="linux"}}, 4096) = 32
> >>>
> >>> On Tue, Dec 16, 2008 at 22:26,   wrote:
> >>> > i assume it has something to do with the large value for d_off of the
> >>> > last dirent ?
> >>>
> >>> Looks like, 9223372036854775807 is just LLONG_MAX.
> >>
> >> I can not reproduce that (on openSUSE 11.1). I also don't see
> >> the _llseek() calls.
> >
> > weird. no btrfs issue then !?
> >
> >>
> >> open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> >> fstat(3, {st_dev=makedev(0, 18), ...
> >> getdents64(3, {
> >>  {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
> >>  {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
> >>  {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
> >>  {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
> >>  {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
> >>  {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
> >>  {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
> >> d_name="linux"}
> >> }, 4096) = 176
> >> getdents64(3, {}, 4096) = 0
> >> close(3)
> >>
> >> This is with today's git kernel and today's standalone btrfs unstable.
> >>
> >> You are using the distro kernel and compile the standalone btrfs module?
> >
> > yes.
> > to be honest, i`m slightly newer than 11.1 (did zypper dup to latest factory
> > some days ago)
> >
> > linux:~ # bash -version
> > GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
> > Copyright (C) 2007 Free Software Foundation, Inc.
> 
> That is still the same bash, the one you use is a 32bit version. Do
> you run a 32 bit kernel too? I could try that on a 32 bit box then.

At least on my 32 bit box, tab completion works fine.  But, the d_off of
LLONG_MAX comes from btrfs_readdir().  Git had a feature where it would
loop infinitely over a directory in some cases and this was my
workaround.

This should be fixed in git by now, so I can drop it if that really is
causing problems in bash.
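
To make the idea of an "end of directory" d_off concrete, here is a toy model of directory reads, written as a sketch under stated assumptions. It is not btrfs_readdir(): the positions are invented, and `getdents` here is a plain Python function that merely mimics the shape of the trace (seeking to 3 returns "linux" again, and the last entry reports LLONG_MAX).

```python
LLONG_MAX = 2**63 - 1

# Toy directory: (position, name). Positions are made up for illustration.
DIRECTORY = [(0, "."), (1, ".."), (2, "test"), (3, "linux")]


def getdents(pos):
    """Return (name, d_off) for every entry at or after `pos`.

    d_off is the position to resume from after that entry; the last
    entry reports LLONG_MAX, meaning "nothing follows".
    """
    result = []
    for i, (position, name) in enumerate(DIRECTORY):
        if position < pos:
            continue
        is_last = i == len(DIRECTORY) - 1
        d_off = LLONG_MAX if is_last else DIRECTORY[i + 1][0]
        result.append((name, d_off))
    return result


first = getdents(0)
print(first)
# Resuming from the last reported d_off yields nothing in this model.
print(getdents(first[-1][1]))
# Seeking back to 3, as the _llseek() call in the trace does, returns "linux" again.
print(getdents(3))
```

In this model a reader that resumes from the final d_off always gets an empty result, which is presumably the property the workaround relied on; a reader that cannot represent that offset in its own off_t is outside what the sketch covers.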

-chris




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Andrew Morton
On Wed, 17 Dec 2008 08:23:44 -0500
Christoph Hellwig  wrote:

> FYI: here's a little writeup I did this summer on support for
> filesystems spanning multiple block devices:
> 
> 
> -- 
> 
> === Notes on support for multiple devices for a single filesystem ===
> 
> == Intro ==
> 
> Btrfs (and an experimental XFS version) can support multiple underlying block
> devices for a single filesystem instance in a generalized and flexible way.
> 
> Unlike the support for external log devices in ext3, jfs, reiserfs, XFS, and
> the special real-time device in XFS, all data and metadata may be spread over a
> potentially large number of block devices, and not just one (or two).
> 
> 
> == Requirements ==
> 
> We want a scheme to support these complex filesystem topologies in a way
> that is
> 
>  a) easy to set up and non-fragile for the users
>  b) scalable to a large number of disks in the system
>  c) recoverable without requiring user space running first
>  d) generic enough to work for multiple filesystems or other consumers
> 
> Requirement a) means that a multiple-device filesystem should be mountable
> by a simple fstab entry (UUID/LABEL or some other cookie) which continues
> to work when the filesystem topology changes.

"device topology"?

> Requirement b) implies we must not do a scan over all available block devices
> in large systems, but use an event-based callout on detection of new block
> devices.
> 
> Requirement c) means there must be some way to add devices to a filesystem
> by kernel command lines, even if this is not the default way, and might require
> additional knowledge from the user / system administrator.
> 
> Requirement d) means that we should not implement this mechanism inside a
> single filesystem.
> 

One thing I've never seen comprehensively addressed is: why do this in
the filesystem at all?  Why not let MD take care of all this and
present a single block device to the fs layer?

Lots of filesystems are violating this, and I'm sure the reasons for
this are good, but this document seems like a suitable place in which to
briefly describe those reasons.



Re: sync-related lockdep warnings

2008-12-17 Thread Chris Mason
On Mon, 2008-12-15 at 16:01 -0500, Chris Mason wrote:
> On Mon, 2008-12-15 at 11:01 -0800, Sage Weil wrote:
> > Hi-
> > 
> > I've been regularly getting a lockdep warning on inode_lock vs tree->lock.  
> > It is quickly triggered by my code, which calls ioctl(fd, BTRFS_IOC_SYNC) 
> > (which just does a btrfs_sync_fs) at regular intervals.
> > 
> > http://ceph.newdream.net/dump/btrfs-lockdep-sync-ioctl.txt
> > 
> > The second warning is similar, but looks to be a bit more revealing.  It 
> > is easily triggered by 'while [ 1 ] ; do sync ; done' and then something 
> > like 'echo a > a' a few times.
> > 
> > http://ceph.newdream.net/dump/btrfs-lockdep-sync.txt
> > 
> > Let me know if there's any other info on my end that would help sort this 
> > out...
> 
> If you pull from btrfs-unstable, this should be fixed.

Well, not so much fixed as traded for a different lockdep warning of the
same type.  I've got a new patch in testing here ;)

-chris




Btrfs conference call

2008-12-17 Thread Chris Mason
Hello everyone,

There will be a btrfs conference call today Dec 17th.  Topics will
include mainline merging, and making a new stable release.

Time: 1:30pm US Eastern (10:30am Pacific)

* Dial-in Number(s):
* Toll Free: +1-888-967-2253
* Toll: +1-650-607-2253
* Meeting id: 665734
* Passcode: 428737 (which hopefully spells 4Btrfs)

-chris





Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 08:23 -0500, Christoph Hellwig wrote:
> FYI: here's a little writeup I did this summer on support for
> filesystems spanning multiple block devices:
> 
> 

Thanks Christoph, I'll start with a description of what btrfs does
today.

Every Btrfs filesystem has a uuid, and a tree that stores all the device
uuids that belong to the FS uuid.

Every btrfs device has a device uuid and a super block that indicates
which FS uuid it belongs to.

The btrfs kernel module holds a list of the FS uuids found and the
devices that belong to each one.  This list is populated by a block
device scanning ioctl that opens a bdev and checks for btrfs supers.

I tried to keep this code as simple as possible because I knew we'd end
up replacing it.  At mount time, btrfs makes sure the devices found by
scanning match the devices the FS expected to find.
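
The mount-time check described above can be pictured as a comparison between the device UUIDs the filesystem records and the ones the scan has seen. This is a simplified sketch, not the btrfs code: the function names and UUID strings are hypothetical, and how strict the real check is isn't spelled out here.

```python
def missing_devices(expected_dev_uuids, scanned):
    """Return device UUIDs the filesystem expects but the scan did not find.

    expected_dev_uuids: device UUIDs recorded in the filesystem's own tree.
    scanned: dict mapping device UUID -> device node, filled in by scanning.
    """
    return sorted(set(expected_dev_uuids) - set(scanned))


def can_mount(expected_dev_uuids, scanned):
    return not missing_devices(expected_dev_uuids, scanned)


expected = ["dev-uuid-a", "dev-uuid-b"]
scanned = {"dev-uuid-a": "/dev/sdb"}
before = (can_mount(expected, scanned), missing_devices(expected, scanned))
print(before)

scanned["dev-uuid-b"] = "/dev/sdc"  # second device shows up in a later scan
after = (can_mount(expected, scanned), missing_devices(expected, scanned))
print(after)
```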

btrfsctl -a scans all of /dev calling the scan ioctl on each device and
btrfsctl -A /dev/ just calls the ioctl on a single device.

No scanning is required for a single device filesystem.  After the scan
is done, sending any device in a multi-device filesystem is enough for
the kernel to mount the FS.  IOW:

mkfs.btrfs /dev/sdb ; mount /dev/sdb /mnt just works.
mkfs.btrfs /dev/sdb /dev/sdc ; mount /dev/sdb /mnt also works

(mkfs.btrfs calls the ioctl on multi-device filesystems).

UUIDs and labels are important in large systems, but if the admin knows
a given device is part of an FS, they are going to expect to be able to
send that one device to mount and have things work.

Even though btrfs currently maintains the device list in the kernel, I'm
happy to move it into a userland api once we settle on one.  Kay has
some code so that udev can discover the btrfs device<->FS uuid mappings,
resulting in a tree like this:

  $ tree /dev/btrfs/
  /dev/btrfs/
  |-- 0cdedd75-2d03-41e6-a1eb-156c0920a021
  |   |-- 897fac06-569c-4f45-a0b9-a1f91a9564d4 -> ../../sda10
  |   `-- aac20975-b642-4650-b65b-b92ce22616f2 -> ../../sda9
  `-- a1ec970a-2463-414e-864c-2eb8ac4e1cf2
      |-- 4d1f1fff-4c6b-4b87-8486-36f58abc0610 -> ../../sdb2
      `-- e7fe3065-c39f-4295-a099-a89e839ae350 -> ../../sdb1

It makes sense to me to use /dev/multi-device/ instead of /dev/btrfs/,
I'm fine with anything really.
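
For anyone who wants to play with such a layout without touching /dev, the tree can be mocked up in a temporary directory. This is an illustrative sketch only, not Kay's udev code: `build_tree` and `members` are hypothetical helpers, and the symlinks dangle because no real devices exist under the temporary root.

```python
import os
import tempfile

# UUIDs and device names copied from the tree above.
LAYOUT = {
    "0cdedd75-2d03-41e6-a1eb-156c0920a021": {
        "897fac06-569c-4f45-a0b9-a1f91a9564d4": "sda10",
        "aac20975-b642-4650-b65b-b92ce22616f2": "sda9",
    },
    "a1ec970a-2463-414e-864c-2eb8ac4e1cf2": {
        "4d1f1fff-4c6b-4b87-8486-36f58abc0610": "sdb2",
        "e7fe3065-c39f-4295-a099-a89e839ae350": "sdb1",
    },
}


def build_tree(root, layout):
    """Create <root>/<fs uuid>/<device uuid> -> ../../<device> symlinks."""
    for fs_uuid, devices in layout.items():
        fs_dir = os.path.join(root, fs_uuid)
        os.makedirs(fs_dir, exist_ok=True)
        for dev_uuid, devname in devices.items():
            os.symlink(os.path.join("..", "..", devname),
                       os.path.join(fs_dir, dev_uuid))


def members(root, fs_uuid):
    """Resolve the member device names of one filesystem from the tree."""
    fs_dir = os.path.join(root, fs_uuid)
    return sorted(os.path.basename(os.readlink(os.path.join(fs_dir, name)))
                  for name in os.listdir(fs_dir))


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "btrfs")  # stands in for /dev/btrfs
    build_tree(root, LAYOUT)
    found = members(root, "0cdedd75-2d03-41e6-a1eb-156c0920a021")
    print(found)
```

A mount helper could resolve all members of a filesystem with one directory listing, which is the appeal of keeping the mapping in a tree like this.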

-chris




compilation problem on last unstable

2008-12-17 Thread Michele Petrazzo
Hi,
I just tried to compile the last unstable version, but:

  CC [M]  /home/michele/btrfs-unstable-standalone/inode.o
/home/michele/btrfs-unstable-standalone/inode.c: In function ‘btrfs_new_inode’:
/home/michele/btrfs-unstable-standalone/inode.c:3470: error: implicit
declaration of function ‘current_fsuid’
/home/michele/btrfs-unstable-standalone/inode.c:3471: error: implicit
declaration of function ‘current_fsgid’
/home/michele/btrfs-unstable-standalone/inode.c: In function 
‘btrfs_cache_create’:
/home/michele/btrfs-unstable-standalone/inode.c:4527: warning: passing argument
5 of ‘kmem_cache_create’ from incompatible pointer type
/home/michele/btrfs-unstable-standalone/inode.c: At top level:
/home/michele/btrfs-unstable-standalone/inode.c:4966: warning: initialization
from incompatible pointer type
/home/michele/btrfs-unstable-standalone/inode.c:4970: warning: initialization
from incompatible pointer type
/home/michele/btrfs-unstable-standalone/inode.c:5024: warning: initialization
from incompatible pointer type
/home/michele/btrfs-unstable-standalone/inode.c:5030: warning: initialization
from incompatible pointer type
/home/michele/btrfs-unstable-standalone/inode.c:5040: warning: initialization
from incompatible pointer type
make[2]: *** [/home/michele/btrfs-unstable-standalone/inode.o] Error 1
make[1]: *** [_module_/home/michele/btrfs-unstable-standalone] Error 2
make[1]: Leaving directory `/usr/src/linux-headers-2.6.26-1-686'
make: *** [all] Error 2
michele:~/btrfs-unstable-standalone$ 


michele:~/btrfs-unstable-standalone$ uname -r
2.6.26-1-686

from debian 

Thanks,
Michele



Re: weird bash autocomplete issue

2008-12-17 Thread devzero
> > linux:~ # bash -version
> > GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
> > Copyright (C) 2007 Free Software Foundation, Inc.
> 
> That is still the same bash, the one you use is a 32bit version. Do
> you run a 32 bit kernel too? I could try that on a 32 bit box then.
> 
> Thanks,
> Kay
> 

yes, all 32bit. 



Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:
> On Wed, 17 Dec 2008 08:23:44 -0500
> Christoph Hellwig  wrote:
> 
> > FYI: here's a little writeup I did this summer on support for
> > filesystems spanning multiple block devices:
> > 
> > 
> > -- 
> > 
> > === Notes on support for multiple devices for a single filesystem ===
> > 
> > == Intro ==
> > 
> > Btrfs (and an experimental XFS version) can support multiple underlying block
> > devices for a single filesystem instance in a generalized and flexible way.
> > 
> > Unlike the support for external log devices in ext3, jfs, reiserfs, XFS, and
> > the special real-time device in XFS, all data and metadata may be spread over a
> > potentially large number of block devices, and not just one (or two).
> > 
> > 
> > == Requirements ==
> > 
> > > We want a scheme to support these complex filesystem topologies in a way
> > > that is
> > 
> >  a) easy to setup and non-fragile for the users
> >  b) scalable to a large number of disks in the system
> >  c) recoverable without requiring user space running first
> >  d) generic enough to work for multiple filesystems or other consumers
> > 
> > Requirement a) means that a multiple-device filesystem should be mountable
> > by a simple fstab entry (UUID/LABEL or some other cookie) which continues
> > to work when the filesystem topology changes.
> 
> "device topology"?
> 
> > > Requirement b) implies we must not do a scan over all available block
> > > devices in large systems, but use an event-based callout on detection of
> > > new block devices.
> > > 
> > > Requirement c) means there must be some version to add devices to a
> > > filesystem by kernel command lines, even if this is not the default way,
> > > and might require additional knowledge from the user / system
> > > administrator.
> > 
> > Requirement d) means that we should not implement this mechanism inside a
> > single filesystem.
> > 
> 
> One thing I've never seen comprehensively addressed is: why do this in
> the filesystem at all?  Why not let MD take care of all this and
> present a single block device to the fs layer?
> 
> Lots of filesystems are violating this, and I'm sure the reasons for
> this are good, but this document seems like a suitable place in which to
> briefly describe those reasons.

I'd almost rather see this doc stick to the device topology interface in
hopes of describing something that RAID and MD can use too.  But just to
toss some information into the pool:

* When moving data around (raid rebuild, restripe, pvmove etc), we want
to make sure the data read off the disk is correct before writing it to
the new location (checksum verification).

* When moving data around, we don't want to move data that isn't
actually used by the filesystem.  This could be solved via new APIs, but
keeping it crash safe would be very tricky.

* When checksum verification fails on read, the FS should be able to ask
the raid implementation for another copy.  This could be solved via new
APIs.

* Different parts of the filesystem might want different underlying raid
parameters.  The easiest example is metadata vs data, where a 4k
stripesize for data might be a bad idea and a 64k stripesize for
metadata would result in many more read-modify-write (rmw) cycles.

* Sharing the filesystem transaction layer.  LVM and MD have to pretend
they are a single consistent array of bytes all the time, for each and
every write they return as complete to the FS.

By pushing the multiple device support up into the filesystem, I can
share the filesystem's transaction layer.  Work can be done in larger
atomic units, and the filesystem will stay consistent because it is all
coordinated.

There are other bits and pieces like high speed front end caching
devices that would be difficult in MD/LVM, but since I don't have that
coded yet I suppose they don't really count...
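
To make the "ask for another copy when the checksum fails" point concrete, here is a minimal sketch. It is not btrfs code: the mirror list, the function name `read_verified`, and the use of zlib's CRC32 as the checksum are all assumptions chosen for illustration.

```python
import zlib

def read_verified(copies, expected_crc):
    """Return (index, data) for the first copy whose CRC32 matches.

    copies: list of bytes objects, one per mirror (hypothetical layout).
    Raises IOError if no copy passes verification.
    """
    for index, data in enumerate(copies):
        if zlib.crc32(data) == expected_crc:
            return index, data
    raise IOError("all copies failed checksum verification")

good = b"file extent contents"
crc = zlib.crc32(good)                        # checksum stored by the filesystem
mirrors = [b"file extent c0rrupted", good]    # the first mirror is damaged
index, data = read_verified(mirrors, crc)
print(index)                                  # the second mirror (index 1) is used
```

A block layer that only exposes one linear device has no natural place for the `copies` argument, which is the gap the new APIs mentioned above would have to fill.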

-chris




Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 09:45, Roland  wrote:
>> On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
>>>
>>> On Tue, Dec 16, 2008 at 21:46,   wrote:
>>> >> On Tue, Dec 16, 2008 at 20:37, Roland  wrote:
>>> >> > i have come across a weird autocomplete issue i assume it is related
>>> >> > to btrfs.
>>> >> >
>>> >> > let`s have some dirs:
>>> >> >
>>> >> > /non-btrfs-mount
>>> >> >   ./linux
>>> >> >   ./testdir
>>> >> >
>>> >> > /btrfs-mount
>>> >> >   ./linux
>>> >> >   ./testdir
>>> >> >
>>> >> > now, if i do "cd t" in /non-btrfs-mount, "t" autocompletes to
>>> >> > "testdir"
>>> >> > same for linux - bash autocompletes as expected.
>>> >> >
>>> >> > now, the weird thing is, that on /btrfs-mount this behaves
>>> >> > different.
>>> >> >
>>> >> > autocompletion for testdir works, but not for linux dir. weird.
>>> >> >
>>> >> > can someone reproduce this ?
>>> >>
>>> >> Open another shell, find the bash process pid of the first shell with:
>>> >>   ps afx
>>> >> and do:
>>> >>   strace -p <pid>
>>> >> Go back to the first shell, hit <Tab>, and the trace should show
>>> >> what's going on. You see a significant difference there?
>>> >
>>> >
>>> > ok, here we go (i hope i did not cut important parts).
>>> > i don`t see the real issue, but i did another interesting finding - see
>>> > below
>>> >
>>> >
>>> > bad (cd l):
>>> >
>>> > open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
>>> > fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555,
>>> > st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
>>> > st_size=18, st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37,
>>> > st_ctime=2008/12/16-21:32:37}) = 0
>>> > getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>>> > d_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>>> > d_name=".."} {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24,
>>> > d_name="test"} {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
>>> > d_reclen=32, d_name="linux"}}, 4096) = 104
>>> > _llseek(3, 3, [3], SEEK_SET)= 0
>>> > getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
>>> > d_reclen=32, d_name="linux"}}, 4096) = 32
>>>
>>> On Tue, Dec 16, 2008 at 22:26,   wrote:
>>> > i assume it has something to do with the large value for d_off of the
>>> > last dirent ?
>>>
>>> Looks like, 9223372036854775807 is just LLONG_MAX.
>>
>> I can not reproduce that (on openSUSE 11.1). I also don't see
>> the _llseek() calls.
>
> weird. no btrfs issue then !?
>
>>
>> open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
>> fstat(3, {st_dev=makedev(0, 18), ...
>> getdents64(3, {
>>  {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
>>  {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
>>  {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
>>  {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
>>  {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
>>  {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
>>  {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
>> d_name="linux"}
>> }, 4096) = 176
>> getdents64(3, {}, 4096) = 0
>> close(3)
>>
>> This is with today's git kernel and today's standalone btrfs unstable.
>>
>> You are using the distro kernel and compile the standalone btrfs module?
>
> yes.
> to be honest, i`m slightly newer than 11.1 (did zypper dup to latest factory
> some days ago)
>
> linux:~ # bash -version
> GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
> Copyright (C) 2007 Free Software Foundation, Inc.

That is still the same bash, the one you use is a 32bit version. Do
you run a 32 bit kernel too? I could try that on a 32 bit box then.
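
For reference, a small sketch of why a 32-bit userspace could matter here. It only checks which of the d_off values from the trace would fit into a signed 32-bit offset; the assumption that the 32-bit directory-reading code stores d_off in such a field is mine, and this does not by itself establish the cause.

```python
# Sketch: which d_off values from the trace fit a signed 32-bit offset?
# Assumption (not confirmed in this thread): the 32-bit userspace keeps
# d_off in a signed 32-bit field.
LLONG_MAX = 2**63 - 1      # 9223372036854775807, the d_off of "linux"
INT32_MAX = 2**31 - 1

def fits_in_off32(d_off):
    """Return True if d_off is representable as a signed 32-bit offset."""
    return -2**31 <= d_off <= INT32_MAX

entries = [(".", 2), ("..", 2), ("test", 3), ("linux", 9223372036854775807)]
for name, d_off in entries:
    print(name, d_off, fits_in_off32(d_off))
```

Only the last entry fails the check, and that is the entry that gets re-read after the _llseek() in the bad trace.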

Thanks,
Kay


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 14:23, Christoph Hellwig  wrote:
> === Notes on support for multiple devices for a single filesystem ===
>
> == Intro ==
>
> Btrfs (and an experimental XFS version) can support multiple underlying block
> devices for a single filesystem instance in a generalized and flexible way.
>
> Unlike the support for external log devices in ext3, jfs, reiserfs, XFS, and
> the special real-time device in XFS all data and metadata may be spread over a
> potentially large number of block devices, and not just one (or two)
>
>
> == Requirements ==
>
> We want a scheme to support these complex filesystem topologies in a way
> that is
>
>  a) easy to setup and non-fragile for the users
>  b) scalable to a large number of disks in the system
>  c) recoverable without requiring user space running first
>  d) generic enough to work for multiple filesystems or other consumers
>
> Requirement a) means that a multiple-device filesystem should be mountable
> by a simple fstab entry (UUID/LABEL or some other cookie) which continues
> to work when the filesystem topology changes.
>
> Requirement b) implies we must not do a scan over all available block devices
> in large systems, but use an event-based callout on detection of new block
> devices.
>
> Requirement c) means there must be some version to add devices to a filesystem
> by kernel command lines, even if this is not the default way, and might
> require additional knowledge from the user / system administrator.
>
> Requirement d) means that we should not implement this mechanism inside a
> single filesystem.
>
>
> == Prior art ==
>
> * External log and realtime volume
>
> The most common way to specify the external log device and the XFS real time
> device is to have a mount option that contains the path to the block special
> device for it.  This variant means a mount option is always required, and
> requires the device name doesn't change, which is enough with udev-generated
> unique device names (/dev/disk/by-{label,uuid}).
>
> An alternative way, optionally supported by ext3 and reiserfs and
> exclusively supported by jfs is to open the journal device by the device
> number (dev_t) of the block special device.  While this doesn't require
> an additional mount option when the device number is stored in the filesystem
> superblock it relies on the device number being stable which is getting
> increasingly unlikely in complex storage topologies.
>
>
> * RAID (MD) and LVM
>
> Software RAID and volume managers, although not strictly filesystems,
> have a very similar problem finding their devices.  The traditional
> solution used for early versions of the Linux MD driver and LVM version 1
> was to hook into the partition scanning code and add devices with the
> right partition type to a kernel-internal list of potential RAID / LVM
> devices.  This approach has the advantage of being simple to implement,
> fast, reliable and not requiring additional user space programs in the boot
> process.  The downside is that it only works with specific partition table
> formats that allow specifying a partition type, and doesn't work with
> unpartitioned disks at all.  Recent MD setups and LVM2 thus move the scanning
> to user space, typically using a command iterating over all block device
> nodes and performing the format-specific scanning.  While this is more
> flexible than the in-kernel scanning, it scales very badly to a large number
> of block devices, and requires additional user space commands to run early
> in the boot process.  A variant of this scheme runs a scanning callout
> from udev once disk devices are detected, which avoids the scanning overhead.
>
>
> == High-level design considerations ==
>
> Due to requirement b) we need a layer that finds devices for a single
> fstab entry.  We can either do this in user space, or in kernel space.  As
> we've traditionally always done UUID/LABEL to device mapping in userspace,
> and we already have libvolume_id and libblkid dealing with the specialized
> case of UUID/LABEL to single device mapping I would recommend to keep doing
> this in user space and try to reuse the libvolume_id / libblkid.
>
> There are two options to perform the assembly of the device list for
> a filesystem:
>
>  1) whenever libvolume_id / libblkid find a device detected as a multi-device
>     capable filesystem it gets added to a list of all devices of this
>     particular filesystem type.
>     At mount time mount(8) or a mount.fstype helper calls out to the
>     libraries to get a list of devices belonging to this filesystem
>     type and translates them to device names, which can be passed to
>     the kernel on the mount command line.
>
>     Advantage:  Requires a mount.fstype helper or fs-specific knowledge
>     in mount(8).
>     Disadvantages:  Requires libvolume_id / libblkid to keep state.
>
>  2) whenever libvolume_id / libblkid find a device detected as a multi-device
>capable filesyst

Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Christoph Hellwig
On Wed, Dec 17, 2008 at 03:50:45PM +0100, Kay Sievers wrote:
> Sounds all sensible. Btrfs already stores the (possibly incomplete)
> device tree state in the kernel, which should make things pretty easy
> for userspace, compared to other already existing subsystems.
> 
> We could have udev maintain a btrfs volume tree:
>   /dev/btrfs/
>   |-- 0cdedd75-2d03-41e6-a1eb-156c0920a021
>   |   |-- 897fac06-569c-4f45-a0b9-a1f91a9564d4 -> ../../sda10
>   |   `-- aac20975-b642-4650-b65b-b92ce22616f2 -> ../../sda9
>   `-- a1ec970a-2463-414e-864c-2eb8ac4e1cf2
>       |-- 4d1f1fff-4c6b-4b87-8486-36f58abc0610 -> ../../sdb2
>       `-- e7fe3065-c39f-4295-a099-a89e839ae350 -> ../../sdb1
> 
> At the same time, by-uuid/ is created:
>   /dev/disk/by-uuid/
>   |-- 0cdedd75-2d03-41e6-a1eb-156c0920a021 -> ../../sda10
>   |-- a1ec970a-2463-414e-864c-2eb8ac4e1cf2 -> ../../sdb2
>   ...

Well, it's not just btrfs, it's also md, lvm and xfs.  I think the right
way is to make the single node for the /dev/disk/by-uuid/ just a legacy
case for potential multiple devices.  E.g. by having

/dev/disk/by-uuid/
    0cdedd75-2d03-41e6-a1eb-156c0920a021 -> ../../sda10
    0cdedd75-2d03-41e6-a1eb-156c0920a021.d
        foo -> ../../sda10
        bar -> ../../sda9

where foo and bar could be uuids if the filesystem / volume manager
supports it, otherwise just the short name for it.
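
The layout above could be produced roughly as follows. This is only a sketch: the helper name `build_by_uuid` is hypothetical, it writes into a temporary directory instead of /dev, and the link targets are copied verbatim from the example without adjusting them for the extra directory level.

```python
import os
import tempfile

def build_by_uuid(root, fs_uuid, members):
    """Create <root>/<fs_uuid> plus <root>/<fs_uuid>.d/<name> symlinks.

    members: ordered mapping of member name -> link target.  The first
    entry also becomes the legacy single-device link.
    """
    names = list(members)
    os.symlink(members[names[0]], os.path.join(root, fs_uuid))
    subdir = os.path.join(root, fs_uuid + ".d")
    os.mkdir(subdir)
    for name in names:
        os.symlink(members[name], os.path.join(subdir, name))
    return subdir

root = tempfile.mkdtemp()                      # stand-in for /dev/disk/by-uuid
fs_uuid = "0cdedd75-2d03-41e6-a1eb-156c0920a021"
subdir = build_by_uuid(root, fs_uuid,
                       {"foo": "../../sda10", "bar": "../../sda9"})
print(sorted(os.listdir(subdir)))
```

Existing consumers that only know about the single link keep working, while multi-device aware tools can list the .d directory.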


> For rescue and recovery cases, it will still be nice to be able to
> trigger "scan all devices" code in btrfsctrl (own code or libblkid),
> but it should be avoided in any normal operation mode.

Again, that's something we should do generically for the whole
/dev/disk/ tree.   For that we need to merge libvolume_id and libblkid
so that it has a few related but separate use cases:

 - a lowlevel probe what fs / volume manager / etc is this for
   the udev callout, mkfs, stripe size detection etc
 - a way to rescan everything, either for non-udev static /dev case
   or your above recovery scenario
 - plus potentially some sort of caching for the non-recovery static
   /dev case

I've long planned to put you and Ted into a room and not let you out
until we see white smoke :)


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 21:58, Chris Mason  wrote:
> On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:

>> One thing I've never seen comprehensively addressed is: why do this in
>> the filesystem at all?  Why not let MD take care of all this and
>> present a single block device to the fs layer?
>>
>> Lots of filesystems are violating this, and I'm sure the reasons for
>> this are good, but this document seems like a suitable place in which to
>> briefly describe those reasons.
>
> I'd almost rather see this doc stick to the device topology interface in
> hopes of describing something that RAID and MD can use too.  But just to
> toss some information into the pool:
>
> * When moving data around (raid rebuild, restripe, pvmove etc), we want
> to make sure the data read off the disk is correct before writing it to
> the new location (checksum verification).
>
> * When moving data around, we don't want to move data that isn't
> actually used by the filesystem.  This could be solved via new APIs, but
> keeping it crash safe would be very tricky.
>
> * When checksum verification fails on read, the FS should be able to ask
> the raid implementation for another copy.  This could be solved via new
> APIs.
>
> * Different parts of the filesystem might want different underlying raid
> parameters.  The easiest example is metadata vs data, where a 4k
> stripesize for data might be a bad idea and a 64k stripesize for
> metadata would result in many more read-modify-write (rmw) cycles.
>
> * Sharing the filesystem transaction layer.  LVM and MD have to pretend
> they are a single consistent array of bytes all the time, for each and
> every write they return as complete to the FS.
>
> By pushing the multiple device support up into the filesystem, I can
> share the filesystem's transaction layer.  Work can be done in larger
> atomic units, and the filesystem will stay consistent because it is all
> coordinated.
>
> There are other bits and pieces like high speed front end caching
> devices that would be difficult in MD/LVM, but since I don't have that
> coded yet I suppose they don't really count...

Features like the very nice and useful directory-based snapshots would
also not be possible with simple block-based multi-devices, right?

Kay


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Andreas Dilger
On Dec 17, 2008  15:58 -0500, Chris Mason wrote:
> On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:
> > One thing I've never seen comprehensively addressed is: why do this in
> > the filesystem at all?  Why not let MD take care of all this and
> > present a single block device to the fs layer?
> > 
> > Lots of filesystems are violating this, and I'm sure the reasons for
> > this are good, but this document seems like a suitable place in which to
> > briefly describe those reasons.
> 
> I'd almost rather see this doc stick to the device topology interface in
> hopes of describing something that RAID and MD can use too.  But just to
> toss some information into the pool:

Add in here (most important reason, IMHO) that the filesystem wants to make
sure that different copies of redundant metadata are stored on different
physical devices.  It seems pointless to have 4 copies of important data if
a single disk failure makes them all inaccessible.

At the same time, not all data/metadata is of the same importance, so
it makes sense to store e.g. 4 full copies of important metadata like
the allocation bitmaps and the tree root block, but only RAID-5 for
file data.  Even if MD was used to implement the RAID-1 and RAID-5 layer
in this case there would need to be multiple MD devices involved.
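
A minimal sketch of the placement rule described here, assuming the filesystem sees a list of distinct physical devices. The function name `place_copies` and the rotating start index are illustrative choices, not taken from any existing implementation.

```python
def place_copies(devices, num_copies, start=0):
    """Pick num_copies distinct devices, rotating the starting point.

    devices must contain distinct physical devices.  Refuses to place
    more copies than there are devices, since two copies on one disk
    add no protection against that disk failing.
    """
    if num_copies > len(devices):
        raise ValueError("not enough devices for %d independent copies"
                         % num_copies)
    return [devices[(start + i) % len(devices)] for i in range(num_copies)]

disks = ["sda", "sdb", "sdc", "sdd", "sde"]
print(place_copies(disks, 4))            # four copies, four different disks
print(place_copies(disks, 4, start=3))   # a later block starts elsewhere
```

This decision needs the individual device identities, which a single MD device hides from the filesystem.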

> * When moving data around (raid rebuild, restripe, pvmove etc), we want
> to make sure the data read off the disk is correct before writing it to
> the new location (checksum verification).
> 
> * When moving data around, we don't want to move data that isn't
> actually used by the filesystem.  This could be solved via new APIs, but
> keeping it crash safe would be very tricky.
> 
> * When checksum verification fails on read, the FS should be able to ask
> the raid implementation for another copy.  This could be solved via new
> APIs.
> 
> * Different parts of the filesystem might want different underlying raid
> parameters.  The easiest example is metadata vs data, where a 4k
> stripesize for data might be a bad idea and a 64k stripesize for
> metadata would result in many more read-modify-write (rmw) cycles.

Not just different underlying RAID parameters, but completely separate
physical storage characteristics.  Having e.g. metadata stored on RAID-1
SSD flash (excellent for small random IO) while the data for large files
is stored on SATA RAID-5 would maximize performance while minimizing cost.

If there is a single virtual block device the filesystem can't make such
allocation decisions unless the virtual block device exposes grotty
details like "first 1MB of 128MB is really SSD" or "first 64GB is SSD,
rest is SATA" to the filesystem somehow, at which point you are just
shoehorning multiple devices into a bad interface (linear array of block
numbers) that has to be worked around.
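
As an illustration of the allocation decision a filesystem can make when it sees the devices directly, here is a sketch. The device table, the "kind" and "used" fields, and `pick_device` are all hypothetical.

```python
def pick_device(devices, block_kind):
    """Return the least-full device of the class suited to block_kind.

    Metadata goes to SSD (small random IO), everything else to SATA.
    """
    wanted = "ssd" if block_kind == "metadata" else "sata"
    candidates = [d for d in devices if d["kind"] == wanted]
    if not candidates:
        raise LookupError("no %s device available" % wanted)
    return min(candidates, key=lambda d: d["used"])["name"]

devices = [
    {"name": "ssd0", "kind": "ssd", "used": 10},
    {"name": "sata0", "kind": "sata", "used": 700},
    {"name": "sata1", "kind": "sata", "used": 300},
]
print(pick_device(devices, "metadata"), pick_device(devices, "data"))
```

With a single virtual block device the `kind` column is exactly the "grotty detail" that would have to be smuggled through the linear block number interface.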

> * Sharing the filesystem transaction layer.  LVM and MD have to pretend
> they are a single consistent array of bytes all the time, for each and
> every write they return as complete to the FS.
> 
> By pushing the multiple device support up into the filesystem, I can
> share the filesystem's transaction layer.  Work can be done in larger
> atomic units, and the filesystem will stay consistent because it is all
> coordinated.

This is even true with filesystems other than btrfs.  As it stands today
the MD RAID-1 code implements its own transaction mechanism for the
recovery bitmaps, and it would have been more efficient to hook this into
the JBD transaction code to avoid 2 layers of flush-then-wait_for_completion.


I can't speak for btrfs, but I don't think multiple device access from
the filesystem is a "layering violation" as some people comment.  It is
just a different type of layering.  With ZFS there is a distinct layer
that is handling the allocation, redundancy, and transactions (SPA, DMU)
that is exporting an object interface, and the filesystem (ZPL, or future
versions of Lustre) is built on top of that object interface.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: Btrfs conference call

2008-12-17 Thread Gabor MICSKO
Hi,

After the event, can someone provide a brief summary of the conference
call to the list, please? :)

Thank you!


On Wed, 2008-12-17 at 09:31 -0500, Chris Mason wrote:
> Hello everyone,
> 
> There will be a btrfs conference call today Dec 17th.  Topics will
> include mainline merging, and making a new stable release.

--
mg



Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 22:20 +0100, Kay Sievers wrote:
> On Wed, Dec 17, 2008 at 21:58, Chris Mason  wrote:
> > On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:
> >
> > There are other bits and pieces like high speed front end caching
> > devices that would be difficult in MD/LVM, but since I don't have that
> > coded yet I suppose they don't really count...
> 
> Features like the very nice and useful directory-based snapshots would
> also not be possible with simple block-based multi-devices, right?

At least for btrfs, the snapshotting is independent from the
multi-device code, and you still get snapshotting on single device
filesystems.

-chris




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Jeff Garzik

Kay Sievers wrote:

> Features like the very nice and useful directory-based snapshots would
> also not be possible with simple block-based multi-devices, right?


Snapshotting via block device has always been an incredibly dumb hack, 
existing primarily because filesystem-based snapshots did not exist for 
the filesystem in question.


Snapshots are better at the filesystem level because the filesystem is 
the only entity that knows when the filesystem is quiescent and 
snapshot-able.


ISTR we had to add ->write_super_lockfs() to hack in support for LVM in 
this manner, rather than doing it the right way.


Jeff




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Jeff Garzik

Andreas Dilger wrote:

> I can't speak for btrfs, but I don't think multiple device access from
> the filesystem is a "layering violation" as some people comment.  It is
> just a different type of layering.  With ZFS there is a distinct layer
> that is handling the allocation, redundancy, and transactions (SPA, DMU)
> that is exporting an object interface, and the filesystem (ZPL, or future
> versions of Lustre) is built on top of that object interface.



Furthermore...  think about object-based storage filesystems.  They will 
need to directly issue SCSI commands to storage devices.  Call it a 
layering violation if you will, but you simply cannot even pretend that 
an OSD is a linear block device for the purposes of our existing block 
layer.


Jeff





Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 14:24 -0700, Andreas Dilger wrote:

> I can't speak for btrfs, but I don't think multiple device access from
> the filesystem is a "layering violation" as some people comment.  It is
> just a different type of layering.  With ZFS there is a distinct layer
> that is handling the allocation, redundancy, and transactions (SPA, DMU)
> that is exporting an object interface, and the filesystem (ZPL, or future
> versions of Lustre) is built on top of that object interface.

Clean interfaces aren't really my best talent, but btrfs also layers
this out.  logical->physical mappings happen in a centralized function,
and all of the on disk structures use logical block numbers.

The only exception to that rule is the superblock offsets on the device.
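
To illustrate what a centralized logical->physical mapping can look like, here is a small sketch. It is not the btrfs chunk code: the `ChunkMap` class, its tuple layout and the device names are assumptions made for the example.

```python
import bisect

class ChunkMap:
    """Maps logical byte addresses to (device, physical address)."""

    def __init__(self):
        self._starts = []   # sorted logical start offsets
        self._chunks = []   # (logical_start, length, device, physical_start)

    def add(self, logical_start, length, device, physical_start):
        """Register a chunk; chunks are assumed not to overlap."""
        i = bisect.bisect_left(self._starts, logical_start)
        self._starts.insert(i, logical_start)
        self._chunks.insert(i, (logical_start, length, device, physical_start))

    def resolve(self, logical):
        """Translate one logical address, or raise KeyError if unmapped."""
        i = bisect.bisect_right(self._starts, logical) - 1
        if i >= 0:
            start, length, device, physical = self._chunks[i]
            if logical < start + length:
                return device, physical + (logical - start)
        raise KeyError("logical address %d is not mapped" % logical)

chunks = ChunkMap()
chunks.add(0, 1024, "sda9", 4096)
chunks.add(1024, 1024, "sda10", 0)
print(chunks.resolve(10), chunks.resolve(1030))
```

Everything above this one function can work purely in logical block numbers, which is the layering being described.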

-chris




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Andreas Dilger
On Dec 17, 2008  08:23 -0500, Christoph Hellwig wrote:
> == Prior art ==
> 
> * External log and realtime volume
> 
> The most common way to specify the external log device and the XFS real time
> device is to have a mount option that contains the path to the block special
> device for it.  This variant means a mount option is always required, and
> requires the device name doesn't change, which is enough with udev-generated
> unique device names (/dev/disk/by-{label,uuid}).
> 
> An alternative way, optionally supported by ext3 and reiserfs and
> exclusively supported by jfs is to open the journal device by the device
> number (dev_t) of the block special device.  While this doesn't require
> an additional mount option when the device number is stored in the filesystem
> superblock it relies on the device number being stable which is getting
> increasingly unlikely in complex storage topologies.

Just as an FYI here - the dev_t stored in the ext3/4 superblock for the
journal device is only a "cached" device.  The journal is properly
identified by its UUID, and should the device mapping change there is a
"journal_dev=" option that can be used to specify the new device.  The
one shortcoming is that there is no mount.ext3 helper which does this 
journal UUID->dev mapping and automatically passes "journal_dev=" if
needed.

> * RAID (MD) and LVM
> 
> Recent MD setups and LVM2 thus move the scanning to user space, typically
> using a command iterating over all block device nodes and performing the
> format-specific scanning.  While this is more flexible
> than the in-kernel scanning, it scales very badly to a large number of
> block devices, and requires additional user space commands to run early
> in the boot process.  A variant of this scheme runs a scanning callout
> from udev once disk devices are detected, which avoids the scanning overhead.

My (admittedly somewhat vague) impression is that with large numbers of
devices the udev callout can itself be a huge overhead because this involves
a userspace fork/exec for each new device being added.  For the same
number of devices, a single scan from userspace only requires a single
process, and an equal number of device probes.

Added to this, the blkid cache can be used to eliminate the need to do
any scanning if the devices have not changed from the previous boot, which
makes it unclear which mechanism is more efficient.  The drawback is that
the initrd device cache is never going to be up-to-date so it wouldn't
be useful until the root partition is mounted.

We've used blkid for our testing of Lustre-on-DMU with up to 48 (local)
disks w/o any kind of performance issues.  We'll eventually be able to
test on systems with around 400 disks in a JBOD configuration, but until
then we only run on systems with hundreds of disks behind a RAID controller.

> == High-level design considerations ==
> 
> Due to requirement b) we need a layer that finds devices for a single
> fstab entry.  We can either do this in user space, or in kernel space.
> As we've traditionally always done UUID/LABEL to device mapping in
> userspace, and we already have libvolume_id and libblkid dealing with
> the specialized case of UUID/LABEL to single device mapping I would
> recommend to keep doing this in user space and reuse libvolume_id/libblkid.
> 
> There are two options to perform the assembly of the device list for
> a filesystem:
> 
>  1) whenever libvolume_id / libblkid find a device detected as a multi-device
> capable filesystem it gets added to a list of all devices of this
> particular filesystem type.
> At mount time mount(8) or a mount.fstype helper calls out to the
> libraries to get a list of devices belonging to this filesystem
> type and translates them to device names, which can be passed to
> the kernel on the mount command line.

I would actually suggest that instead of keeping devices in groups by
the filesystem type, rather keep a list of devices with the same UUID
and/or LABEL, and if the mount is looking for this UUID/LABEL it gets
the whole list of matching devices back.
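
Grouping by UUID rather than by filesystem type could look roughly like this. The sketch assumes the probe results arrive as (device, UUID) pairs; the function name `group_by_uuid` and the sample data are illustrative only.

```python
from collections import defaultdict

def group_by_uuid(probed):
    """Group probed devices by the UUID found on them.

    probed: iterable of (device_name, uuid) pairs.
    Returns a dict mapping each uuid to its devices, in probe order.
    """
    groups = defaultdict(list)
    for device, fs_uuid in probed:
        groups[fs_uuid].append(device)
    return dict(groups)

probed = [
    ("/dev/sda10", "0cdedd75-2d03-41e6-a1eb-156c0920a021"),
    ("/dev/sdb2", "a1ec970a-2463-414e-864c-2eb8ac4e1cf2"),
    ("/dev/sda9", "0cdedd75-2d03-41e6-a1eb-156c0920a021"),
]
groups = group_by_uuid(probed)
print(groups["0cdedd75-2d03-41e6-a1eb-156c0920a021"])
```

A mount by UUID then gets the complete member list back from a single lookup, regardless of which filesystem type owns it.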

This could also be done in the kernel by having the filesystems register
a "probe" function that examines the device/partitions as they are added,
similar to the way that MD used to do it.  There would likely be very few
probe functions needed, only ext3/4 (for journal devices), btrfs, and
maybe MD, LVM2 and a handful more.

If we wanted to avoid code duplication, this could share code between
libblkid and the kernel (just the enhanced probe-only functions in the
util-linux-ng implementation) since these functions are little more than
"take a pointer, cast it to struct X, check some magic fields and return
match + {LABEL, UUID}, or no-match".
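
A probe function of that shape is small enough to sketch. The superblock layout below (8-byte magic, 16-byte UUID, 16-byte label) and the magic value are invented for this example and do not correspond to any real filesystem.

```python
import struct
import uuid

# Hypothetical on-disk layout, invented for this sketch:
# 8-byte magic, 16-byte UUID, 16-byte NUL-padded label.
SB_FORMAT = "<8s16s16s"
SB_MAGIC = b"EXAMPLFS"

def probe(buf):
    """Return (label, uuid) if buf starts with our superblock, else None."""
    if len(buf) < struct.calcsize(SB_FORMAT):
        return None
    magic, raw_uuid, raw_label = struct.unpack_from(SB_FORMAT, buf)
    if magic != SB_MAGIC:
        return None
    label = raw_label.rstrip(b"\0").decode("ascii", "replace")
    return label, str(uuid.UUID(bytes=raw_uuid))

fs_uuid = uuid.UUID("0cdedd75-2d03-41e6-a1eb-156c0920a021")
superblock = struct.pack(SB_FORMAT, SB_MAGIC, fs_uuid.bytes, b"root")
print(probe(superblock))
print(probe(b"\0" * 40))
```

Because the function only looks at a buffer, the same logic could in principle be compiled for userspace and kernel, which is the code-sharing idea suggested above.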

That MD used to check only the partition type doesn't mean that we can't
have simple functions that read the superblock (or equivalent) to make
an internal list of suitable devices attached to a filesystem-type global
structure (poss

Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 15:46, Kay Sievers  wrote:
> On Wed, Dec 17, 2008 at 15:17, Chris Mason  wrote:
>> On Wed, 2008-12-17 at 14:59 +0100, Kay Sievers wrote:
>>> On Wed, Dec 17, 2008 at 09:45, Roland  wrote:
>>> >> On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
>>> >>>
>>> >>> > open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
>>> >>> > fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555,
>>> >>> > st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
>>> >>> > st_size=18, st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37,
>>> >>> > st_ctime=2008/12/16-21:32:37}) = 0
>>> >>> > getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>>> >>> > d_name="."} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,
>>> >>> > d_name=".."} {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24,
>>> >>> > d_name="test"} {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
>>> >>> > d_reclen=32, d_name="linux"}}, 4096) = 104
>>> >>> > _llseek(3, 3, [3], SEEK_SET)= 0
>>> >>> > getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
>>> >>> > d_reclen=32, d_name="linux"}}, 4096) = 32
>>> >>>
>>> >>> On Tue, Dec 16, 2008 at 22:26,   wrote:
>>> >>> > i assume it has something to do with the large value for d_off of the
>>> >>> > last dirent ?
>>> >>>
>>> >>> Looks like, 9223372036854775807 is just LLONG_MAX.
>>> >>
>>> >> I can not reproduce that (on openSUSE 11.1). I also don't see
>>> >> the _llseek() calls.
>>> >
>>> > weird. no btrfs issue then !?
>>> >
>>> >>
>>> >> open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
>>> >> fstat(3, {st_dev=makedev(0, 18), ...
>>> >> getdents64(3, {
>>> >>  {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
>>> >>  {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
>>> >>  {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
>>> >>  {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
>>> >>  {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
>>> >>  {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
>>> >>  {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
>>> >> d_name="linux"}
>>> >> }, 4096) = 176
>>> >> getdents64(3, {}, 4096) = 0
>>> >> close(3)
>>> >>
>>> >> This is with today's git kernel and today's standalone btrfs unstable.
>>> >>
>>> >> You are using the distro kernel and compile the standalone btrfs module?
>>> >
>>> > yes.
>>> > to be honest, i`m slightly newer than 11.1 (did zypper dup to latest
>>> > factory some days ago)
>>> >
>>> > linux:~ # bash -version
>>> > GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
>>> > Copyright (C) 2007 Free Software Foundation, Inc.
>>>
>>> That is still the same bash, the one you use is a 32bit version. Do
>>> you run a 32 bit kernel too? I could try that on a 32 bit box then.
>>
>> At least on my 32 bit box, tab completion works fine.
>
> It works fine here too on 64 bit. I'll try with openSUSE 11.1 on a
> 32bit box later tonight.
>
>> But, the d_off of
>> LLONG_MAX comes from btrfs_readdir().  Git had a feature where it would
>> loop infinitely over a directory in some cases and this was my
>> workaround.
>
> There are other filesystems doing the same, usually with 32bit int max
> instead of 64 bit int max, I guess that should work fine.
>
>> This should be fixed in git by now, so I can drop it if that really is
>> causing problems in bash.
>
> I'll come back if I can reproduce it with the same environment Roland is 
> using.

I see the same issue on x86 32 bit, with the additional __llseek()
between the getdents64(), and the last entry returned by readdir
ignored.

If I change the returned LLONG_MAX to LONG_MAX in inode.c, it all
works fine, and the __llseek() disappears.
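
A small range check illustrates why the two values could behave differently on a 32-bit box. This is only a sketch: that a 32-bit offset type in userspace is what trips over the value is an assumption, the traces above only show that LONG_MAX works and LLONG_MAX does not.

```python
LLONG_MAX = 2**63 - 1    # the d_off of the last entry in the trace
LONG_MAX_32 = 2**31 - 1  # LONG_MAX as seen by 32-bit x86 userspace


def fits_signed(value, bits):
    """True if `value` is representable as a signed integer of `bits` bits."""
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    return lo <= value <= hi


for name, d_off in (("LLONG_MAX", LLONG_MAX), ("32-bit LONG_MAX", LONG_MAX_32)):
    print(name, d_off,
          "fits in 32 bit:", fits_signed(d_off, 32),
          "fits in 64 bit:", fits_signed(d_off, 64))
```

On a 64-bit box LONG_MAX and LLONG_MAX are the same value, which would match the observation that the problem does not show up there.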

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Dave Kleikamp
On Wed, 2008-12-17 at 15:04 -0700, Andreas Dilger wrote:
> On Dec 17, 2008  08:23 -0500, Christoph Hellwig wrote:

> > An alternative way, optionally supported by ext3 and reiserfs and
> > exclusively supported by jfs is to open the journal device by the device
> > number (dev_t) of the block special device.  While this doesn't require
> > an additional mount option when the device number is stored in the 
> > filesystem
> > superblock it relies on the device number being stable which is getting
> > increasingly unlikely in complex storage topologies.
> 
> Just as an FYI here - the dev_t stored in the ext3/4 superblock for the
> journal device is only a "cached" device.  The journal is properly
> identified by its UUID, and should the device mapping change there is a
> "journal_dev=" option that can be used to specify the new device.  The
> one shortcoming is that there is no mount.ext3 helper which does this 
> journal UUID->dev mapping and automatically passes "journal_dev=" if
> needed.

An additional FYI.  JFS also treats the dev_t in its superblock the same
way.  Since jfs relies on jfs_fsck running at boot time to ensure that
the journal is replayed, jfs_fsck makes sure that the dev_t is accurate.
If not, then it scans all of the block devices until it finds the uuid
of the journal device, updating the superblock so that the kernel will
find the journal.
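
The "cached dev_t, fall back to a UUID scan" logic could be sketched roughly as follows. The function name, the (major, minor) tuples and the dict standing in for a scan of all block devices are hypothetical; this is not jfs_fsck code.

```python
def find_journal(cached_dev, journal_uuid, devices):
    """Locate the journal device.

    `devices` maps a device number to the UUID found on that device
    (a stand-in for scanning every block device).
    Returns (device, superblock_needs_update).
    """
    # Fast path: the cached device number still points at the journal.
    if devices.get(cached_dev) == journal_uuid:
        return cached_dev, False
    # Slow path: scan everything for the journal's UUID.
    for dev, found_uuid in sorted(devices.items()):
        if found_uuid == journal_uuid:
            return dev, True
    return None, False


# Made-up example: the journal moved from (8, 17) to (8, 33).
devices = {(8, 17): "uuid-data", (8, 33): "uuid-journal"}
print(find_journal((8, 17), "uuid-journal", devices))
print(find_journal((8, 33), "uuid-journal", devices))
```

When the second element of the result is True, the caller would rewrite the cached device number in the superblock so the kernel finds the journal.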

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: weird bash autocomplete issue

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 23:15 +0100, Kay Sievers wrote:
> > There are other filesystems doing the same, usually with 32bit int max
> > instead of 64 bit int max, I guess that should work fine.
> >
> >> This should be fixed in git by now, so I can drop it if that really is
> >> causing problems in bash.
> >
> > I'll come back if I can reproduce it with the same environment Roland is 
> > using.
> 
> I see the same issue on x86 32 bit, with the additional __llseek()
> between the getdents64(), and the last entry returned by readdir
> ignored.
> 
> If I change the returned LLONG_MAX to LONG_MAX in inode.c, it all
> works fine, and the __llseek() disappears.

OK, thanks, I'll work up a patch.

-chris




BUG at fs/buffer.c:2925! when mounted USB-disk is disconnected

2008-12-17 Thread Kay Sievers
I see the following when I disconnect a USB stick containing a
mounted two-partition btrfs volume and then try to umount it later.

I reproduced it 3 times, always after a fresh reboot. The box
is unstable after that: modules cannot be unloaded and other
filesystems cannot be unmounted.

Thanks,
Kay


usb 1-2: USB disconnect, address 4
...
lost page write due to I/O error on sdb2
end_request: I/O error, dev sdb, sector 131072
lost page write due to I/O error on sdb2
lost page write due to I/O error on sdb1
end_request: I/O error, dev sdb, sector 131072
lost page write due to I/O error on sdb1
[ cut here ]
kernel BUG at fs/buffer.c:2925!
invalid opcode:  [#1] SMP 
last sysfs file: /sys/devices/pci:00/:00:1c.1/:03:00.0/rfkill/rfkill0/state
CPU 0 
Modules linked in: usb_storage btrfs zlib_inflate zlib_deflate crc32c libcrc32c 
ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd acpi_cpufreq fuse 
dm_crypt loop dm_mod rtc_cmos rtc_core rtc_lib uinput usbhid hid pcmcia arc4 
thinkpad_acpi ecb hwmon snd_hda_intel backlight yenta_socket iwl3945 snd_pcm 
uhci_hcd snd_timer rfkill snd soundcore thermal mac80211 ehci_hcd pcspkr 
snd_page_alloc rsrc_nonstatic led_class battery ac evdev usbcore pcmcia_core 
nvram button cfg80211 e1000e sg intel_agp processor
Pid: 2994, comm: umount Not tainted 2.6.28-rc8-00057-g1bda712 #33
RIP: 0010:[]  [] submit_bh+0x128/0x130
RSP: 0018:880061685c38  EFLAGS: 00010246
RAX: 0028 RBX: 880063eb0160 RCX: 
RDX: 0004 RSI: 880063eb0160 RDI: 0001
RBP: 880061685c58 R08: 0003 R09: 1000
R10: 0001 R11: 0001 R12: 
R13: 0001 R14: 0003 R15: 880061688f43
FS:  7f71833eb6f0() GS:8069e7c0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f91e27de010 CR3: 6fac CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process umount (pid: 2994, threadinfo 880061684000, task 880061519140)
Stack:
  880063eb0160  880078957918
 880061685cb8 a03366bd 880063ed0038 880061688f63
 000161685cb8 0001 000c0001 880078957918
Call Trace:
 [] write_dev_supers+0x20d/0x340 [btrfs]
 [] write_all_supers+0x218/0x260 [btrfs]
 [] write_ctree_super+0xe/0x10 [btrfs]
 [] btrfs_commit_transaction+0x5ff/0x7e0 [btrfs]
 [] ? autoremove_wake_function+0x0/0x40
 [] ? mutex_unlock+0x9/0x10
 [] btrfs_sync_fs+0x5d/0x90 [btrfs]
 [] __fsync_super+0x52/0x80
 [] fsync_super+0x11/0x30
 [] generic_shutdown_super+0x22/0x100
 [] kill_anon_super+0x11/0x50
 [] deactivate_super+0x56/0x80
 [] mntput_no_expire+0xd9/0x150
 [] sys_umount+0x5f/0x3c0
 [] ? lockdep_sys_exit_thunk+0x35/0x67
 [] system_call_fastpath+0x16/0x1b
Code: e8 7e 3f 00 00 f7 d3 48 83 c4 08 83 e3 a1 89 d8 5b 41 5c 41 5d c9 c3 0f 
0b eb fe 0f 1f 84 00 00 00 00 00 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 1f 40 
00 55 48 89 e5 53 48 89 fb 48 83 ec 08 83 
RIP  [] submit_bh+0x128/0x130
 RSP 
---[ end trace 941b43e9d76fb177 ]---

