from:"Mike Fedyk"

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Mike Fedyk

On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
  512MB.
 
  'free' reports 75MB, 419MB free.
 
  I originally noticed the problem on really real hardware (thinkpad
  T61p), however.

 If you can easily reproduce it could you try a git bisect?

 Do we have a known good kernel?  I looked back through the thread and
 didn't see any reports where the postgres test on ext4 passed in this
 config.


2.6.34.something.  -- Any chance a newer kernel can be tested and confirmed good?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk

On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
 Hi,

 I think that the disk allocation size of each file should increase
 monotonically as the file is created.
 But it sometimes returns to 0.  Is that correct?

 Well, there's a window during the processing of delayed allocation where
 we don't have the bytes recorded as delalloc and we don't have the bytes
 recorded in the inode yet.  That's why they are showing up as zero.

 We don't call inode_add_bytes() until after we insert the extent, but we
 drop the delalloc byte count on the file before the IO is done.

 Fixing it will be a little tricky because all the extent accounting
 assumes the inode_add_bytes happens at extent insertion time.


How does opening the inode with O_APPEND during this window know where
to write the bytes?  If it's a pointer/cursor to the EOF then that
size could be used during the window.  Is that right?



 The result of the test at 2.6.37-rc4 is shown below.
 (see inode no. 291)

     # df -T /test14
     Filesystem    Type   1K-blocks      Used Available Use% Mounted on
     /dev/sdd14   btrfs     4162560      8736   3709440   1% /test14
     # dd if=/dev/zero of=/test14/dir/as001.26603 bs=1M count=100
     # dd if=/dev/zero of=/test14/dir/as002.26603 bs=1M count=200
     # dd if=/dev/zero of=/test14/dir/sy001.26603 bs=1M count=300 oflag=direct
     # dd if=/dev/zero of=/test14/dir/as003.26603 bs=1M count=400
     # ls -lis /test14/dir
     total 406528
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  - 291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
     # sleep 3
     # ls -lis /test14/dir
     total 406528
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  - 291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
     # sleep 3
     # ls -lis /test14/dir
     total 307200
     288      0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  - 291      0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
     # sleep 3
     # ls -lis /test14/dir
     total 409600
     288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289      0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  - 291      0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
     # sync
     # ls -lis /test14/dir
     total 1024000
     288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
     289 204800 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
  - 291 409600 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
     290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603

 The trace result of btrfs_getattr() is shown below.

  Dec  7 15:08:03 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
  Dec  7 15:08:06 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
  Dec  7 15:08:09 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
  Dec  7 15:08:12 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
  Dec  7 15:08:18 luna kernel: ino:291 blocks:819200 i_blocks:819200 i_bytes:0 delalloc_bytes:0


 Regards,
 Itoh


Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk

On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
 On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
  Hi,
 
  I think that the disk allocation size of each file should increase
  monotonically as the file is created.
  But it sometimes returns to 0.  Is that correct?
 
  Well, there's a window during the processing of delayed allocation where
  we don't have the bytes recorded as delalloc and we don't have the bytes
  recorded in the inode yet.  That's why they are showing up as zero.
 
  We don't call inode_add_bytes() until after we insert the extent, but we
  drop the delalloc byte count on the file before the IO is done.
 
  Fixing it will be a little tricky because all the extent accounting
  assumes the inode_add_bytes happens at extent insertion time.
 

 How does opening the inode with O_APPEND during this window know where
 to write the bytes?  If it's a pointer/cursor to the EOF then that
 size could be used during the window.  Is that right?

 This counter records the number of blocks allocated to the file, and
 reading it with ls -l or stat is somewhat racy by nature.  Most of the
 time it's fine, btrfs just has a really big window where the results from
 ls -l seem wrong.


I see.  Is it using per-cpu vars or something similar?

 But, the counter really means nothing to the btrfs internals.  When we
 do file operations we go based on the extent pointers we find in the
 tree and i_size (i_size is strictly maintained).


Would it be too heavy of an operation to have stat walk the btrfs tree
to get its data?

 The incorrect results are confusing but they don't hurt the metadata
 itself.

Re: The value displayed by 'ls -s' command is strange.

2010-12-07 Thread Mike Fedyk

On Tue, Dec 7, 2010 at 12:15 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Mike Fedyk's message of 2010-12-07 15:07:08 -0500:
 On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500:
  On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com 
  wrote:
   Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500:
   Hi,
  
   I think that the disk allocation size of each file should increase
   monotonically as the file is created.
   But it sometimes returns to 0.  Is that correct?
  
   Well, there's a window during the processing of delayed allocation where
   we don't have the bytes recorded as delalloc and we don't have the bytes
   recorded in the inode yet.  That's why they are showing up as zero.
  
   We don't call inode_add_bytes() until after we insert the extent, but we
   drop the delalloc byte count on the file before the IO is done.
  
   Fixing it will be a little tricky because all the extent accounting
   assumes the inode_add_bytes happens at extent insertion time.
  
 
  How does opening the inode with O_APPEND during this window know where
  to write the bytes?  If it's a pointer/cursor to the EOF then that
  size could be used during the window.  Is that right?
 
  This counter records the number of blocks allocated to the file, and
  reading it with ls -l or stat is somewhat racy by nature.  Most of the
  time it's fine, btrfs just has a really big window where the results from
  ls -l seem wrong.
 

 I see.  Is it using per-cpu vars or something similar?


Ok, so to make sure I fully understand, I'm going to write some pseudocode
based on your description.

 Our stat function returns the block count in the inode plus the number
 of bytes we have accounted as delayed allocation.


stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

 As we do writes to the file, the delayed allocation count goes up and
 then eventually we decide we need to do some IO.

 Before we do the IO, we have to decide where on the disk to write the
 extents.

inode_a2 = inode_a1

inode_a1 and inode_a2 are the same inode, but inode_a2 has a different
list of extents and is not written yet (in the case of appending, most
of the extents will be the same in the two extent lists, but inode_a2
will have more extents for the newly appended data)

 Once that is decided, we decrement the count of delayed
 allocation bytes.

 This is when stat starts returning the wrong answer.


inode_a2.bytes += inode_a1_delayed_allocation_bytes
inode_a1_delayed_allocation_bytes -= inode_a1_delayed_allocation_bytes
stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

Is it possible to have stat read from inode_a2 during this window?

So it would be instead:

stat = inode_a2.bytes

 Then we do the IO, and when the IO is done we actually insert the file
 extents into the file metadata.  This is when stat starts returning the
 right answer again.


/* implicit when write completes */
inode_a1 = inode_a2
kfree(inode_a2)
stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

 The whole setup sounds strange, but this is how btrfs implements the
 semantics from data=ordered.  We don't update the file to point to
 the new blocks until after the IO is done, so we never have to wait on
 the data IO before we can do a transaction commit.  It avoids all kinds
 of latencies with fsync and other problems.

 One easy solution is to just add another counter in the in-memory inode
 for the number of bytes in flight that aren't accounted for in other
 places.  But I'd rather not make the inode any bigger, so I'll have to
 think if we can solve this another way.
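
To make the window concrete, here is a toy model in C of the accounting
described above.  It is a sketch under assumptions, not btrfs code:
struct toy_inode, its field names, and the helper functions are all
hypothetical, and inflight_bytes stands in for the extra counter suggested
in the last quoted paragraph.  The byte count used in the usage note comes
from the btrfs_getattr() trace posted earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: these fields are illustrative, not the real btrfs inode. */
struct toy_inode {
    uint64_t bytes;          /* bytes recorded in the inode (inode_add_bytes) */
    uint64_t delalloc_bytes; /* bytes accounted as delayed allocation */
    uint64_t inflight_bytes; /* hypothetical counter for IO in flight */
};

/* stat as described: inode bytes plus delalloc bytes, in 512-byte blocks. */
uint64_t stat_blocks(const struct toy_inode *in)
{
    return (in->bytes + in->delalloc_bytes) / 512;
}

/* stat with the suggested extra counter included. */
uint64_t stat_blocks_with_inflight(const struct toy_inode *in)
{
    return (in->bytes + in->delalloc_bytes + in->inflight_bytes) / 512;
}

/* A buffered write only raises the delayed allocation count. */
void buffered_write(struct toy_inode *in, uint64_t n)
{
    in->delalloc_bytes += n;
}

/* Disk location chosen, IO submitted: delalloc is dropped here, but the
 * inode byte count is not updated yet.  This opens the window. */
void start_io(struct toy_inode *in, uint64_t n)
{
    assert(n <= in->delalloc_bytes);
    in->delalloc_bytes -= n;
    in->inflight_bytes += n;
}

/* IO done, extent inserted: the inode byte count catches up. */
void finish_io(struct toy_inode *in, uint64_t n)
{
    assert(n <= in->inflight_bytes);
    in->inflight_bytes -= n;
    in->bytes += n;
}
```

With 101711872 bytes of delayed allocation, stat_blocks() gives 198656
before start_io(), 0 between start_io() and finish_io(), and 198656 again
afterwards, which is the shape of the dip in the trace.  The variant that
adds inflight_bytes stays at 198656 throughout, at the cost of a larger
in-memory inode.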


Re: 800 GByte free, but no space left

2010-12-02 Thread Mike Fedyk

On Thu, Dec 2, 2010 at 10:23 AM, Helmut Hullen hul...@t-online.de wrote:
 Btrfs Btrfs v0.19


btrfs in the kernel has been version 0.19 for a *long* time.  The
version number there may never change.  How do you encode a feature
mask in a version number?  Some features may be in one tree but not
yet upstreamed, among other such minutiae.

What you need to do is use a more recent kernel than 2.6.32 if you
want to use btrfs (modulo backports, but let's not talk about that
right now).

So if you're using a kernel older than 2.6.36, then you should probably upgrade.

Re: What to do about subvolumes?

2010-12-01 Thread Mike Fedyk

On Wed, Dec 1, 2010 at 3:32 PM, Freddie Cash fjwc...@gmail.com wrote:
 On Wed, Dec 1, 2010 at 1:28 PM, Hugo Mills hugo-l...@carfax.org.uk wrote:
 On Wed, Dec 01, 2010 at 12:24:28PM -0800, Freddie Cash wrote:
 On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
   The idea is you are only charged for what blocks
  you have on the disk.  Thanks,
 
    My point was that it's perfectly possible to have blocks on the
  disk that are effectively owned by two people, and that the person to
  charge for those blocks is, to me, far from clear. You either end up
  charging twice for a single set of blocks on the disk, or you end up
  in a situation where one person's actions can cause another person's
  quota to fill up. Neither of these is particularly obvious behaviour.

 As a sysadmin and as a user, quotas shouldn't be about physical
 blocks of storage used but should be about logical storage used.

 IOW, if the filesystem is compressed, using 1 GB of physical space to
 store 10 GB of data, my quota used should be 10 GB.

 Similar for deduplication.  The quota is based on the storage *before*
 the file is deduped.  Not after.

 Similar for snapshots.  If UserA has 10 GB of quota used, I snapshot
 their filesystem, then my quota used would be 10 GB as well.  As
 data in my snapshot changes, my quota used is updated to reflect
 that (change 1 GB of data compared to snapshot, use 1 GB of quota).

   So if I've got 10G of data, and I snapshot it, I've just used
 another 10G of quota?

 Sorry, forgot the per user bit above.

 If UserA has 10 GB of data, then UserB snapshots it, UserB's quota
 usage is 10 GB.

 If UserA has 10 GB of data and snapshots it, then only 10 GB of quota
 usage is used, as there is 0 difference between the snapshot and the
 filesystem.  As UserA modifies data, their quota usage increases by
 the amount that is modified (ie 10 GB data, snapshot, modify 1 GB data
 == 11 GB quota usage).

 If you combine the two scenarios, you end up with:
  - UserA has 10 GB of data == 10 GB quota usage
  - UserB snapshots UserA's filesystem (clone), so UserB has 10 GB
 quota usage (even though 0 blocks have changed on disk)

Please define where the owner of a subvolume/snapshot is stored.

To my knowledge, when you make a snapshot you have the same set of
files with the same set of owners and groups.  Whichever user takes the
snapshot, this does not change unless chown or chgrp is used.

Also, a non-root user (or a process without CAP_whatever) should not be
able to snapshot a subvolume where the root directory of that
subvolume is not owned by the user attempting the snapshot.  If this
is not enforced, you end up with the same security and quota issues
that hard links have when you don't have separate filesystems.

You could have separate subvolumes for / and /home/foo and user foo
could snapshot / to /home/foo/exploit_later_001 and then foo can just
wait for an exploit to come along for one of the binaries or libs in
/home/foo/exploit_later_001 and own.

Yes, snapshot creation should be more restricted than hard links, for
good reason.

I have other questions but the answer to this fundamental game changer
may solve many of the mentioned issues.

  - UserA snapshots UserA's filesystem == no change to quota usage (no
 blocks on disk have changed)
  - UserA modifies 1 GB of data in the filesystem == 1 GB new quota
 usage (11 GB total) (1 GB of blocks owned by UserA have changed, plus
 the 10 GB in the snapshot)
  - UserB still only has 10 GB quota usage, since their snapshot
 hasn't changed (0 blocks changed)

 If UserA deletes their filesystem and all their snapshots, freeing up
 11 GB of quota usage on their account, UserB's quota will still be 10
 GB, and the blocks on the disk aren't actually removed (still
 referenced by UserB's snapshot).

 Basically, within a user's account, only the data unique to a snapshot
 should count toward the quota.

 Across accounts, the original (root) snapshot would count completely
 to the new user's quota, and then only data unique to subsequent
 snapshots would count.

 I hope that makes it more clear.  :)  All the different layers and
 whatnot get confusing.  :)

Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly

2010-12-01 Thread Mike Fedyk

On Wed, Dec 1, 2010 at 8:28 PM, Yan, Zheng yanzh...@21cn.com wrote:
 On Thu, Dec 2, 2010 at 11:42 AM, liubo liubo2...@cn.fujitsu.com wrote:
 On 12/01/2010 06:20 PM, liubo wrote:
 When the filesystem is readonly, avoid transaction stuff by checking
 MS_RDONLY at start transaction time.


 This patch may lead btrfs panic.

 Since btrfs allows transaction under readonly fs state, which is a bit
 weird, btrfs does not even check the returned transaction from
 start_transaction, although it may return -ENOMEM.

 btrfs may do log replay even when mounted readonly.


What part is logged besides tree roots and/or superblocks?

Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)

2010-11-29 Thread Mike Fedyk

On Mon, Nov 29, 2010 at 12:02 AM, Li Zefan l...@cn.fujitsu.com wrote:
 (Cc: Sage Weil s...@newdream.net for changes in async snapshots)

 This patchset adds readonly-snapshots support. You can create a
 readonly snapshot, and you can also set a snapshot readonly/writable
 on the fly.

 A few readonly checks are added in setattr, permission, remove_xattr
 and set_xattr callbacks, as well as in some ioctls.


Great work!

I have a suggestion on defaults when snapshots are created.  I think
they should default to being read-only and if they are meant to be
read-write a flag can be set at creation time (and changeable at a
later time as well of course).

This way user/admin preconceptions of a snapshot being read-only can
be enforced by default, and the exception when you want a read-write
snapshot can be available with a switch at the cli level (and probably
a flag at the ioctl level).

It gives one more natural distinction between a snapshot and a
subvolume at the user conceptual level.

What do you think?

Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)

2010-11-29 Thread Mike Fedyk

On Mon, Nov 29, 2010 at 12:41 PM, David Arendt ad...@prnet.org wrote:
 On 11/29/10 21:02, Mike Fedyk wrote:

 On Mon, Nov 29, 2010 at 12:02 AM, Li Zefan l...@cn.fujitsu.com wrote:

 (Cc: Sage Weil s...@newdream.net for changes in async snapshots)

 This patchset adds readonly-snapshots support. You can create a
 readonly snapshot, and you can also set a snapshot readonly/writable
 on the fly.

 A few readonly checks are added in setattr, permission, remove_xattr
 and set_xattr callbacks, as well as in some ioctls.

 Great work!

 I have a suggestion on defaults when snapshots are created.  I think
 they should default to being read-only and if they are meant to be
 read-write a flag can be set at creation time (and changeable at a
 later time as well of course).

 This way user/admin preconceptions of a snapshot being read-only can
 be enforced by default, and the exception when you want a read-write
 snapshot can be available with a switch at the cli level (and probably
 a flag at the ioctl level).

 It gives one more natural distinction between a snapshot and a
 subvolume at the user conceptual level.

 What do you think?

 I completely agree with you. I think lots of people use snapshots for backup
 purposes and these ones shouldn't be writable.

 by default.

Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase

2010-11-29 Thread Mike Fedyk

On Mon, Nov 29, 2010 at 12:10 PM, Josef Bacik jo...@redhat.com wrote:
 On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote:
 Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic.
 Meanwhile, they are very ugly and should be handled more appropriately.

 There are mainly two ways to deal with these BUG_ON()s.

 1. For those errors which can be handled well by callers, we just return
 their error number to callers.

 2. For others, we can force the filesystem readonly when it hits errors,
 which is what this patchset has done. Replacing BUG_ON() with the interface
 provided in this patchset, we will get error information via dmesg. Since
 btrfs is now readonly, we can save our data safely and umount it, then a
 btrfsck is recommended.

 By these ways, we can protect our filesystem from panic caused by those
 BUG_ONs.

 ---
  fs/btrfs/ctree.h       |   21 ++
  fs/btrfs/disk-io.c     |   23 +++
  fs/btrfs/super.c       |  100 ++-
  fs/btrfs/transaction.c |    7 +++
  4 files changed, 148 insertions(+), 3 deletions(-)


 Overall seems sane, but what about kernels that don't make these checks?  I'm
 ok with "well, sucks for them" as an answer, just want to make sure we've at
 least thought about it.

 Also I'm not sure marking the fs as broken is the right move here.  Ext3/4
 don't do this, they just mount read-only, as long as you can still unmount the
 filesystem everything comes out ok.  Think of the case where we just get a
 spurious EIO, the fs should be fine the next time around, there's reason to
 force the user to run fsck in this case.


Did you mean there's no reason to?

Also I guess you mean this in the case when there is no redundancy
(single and raid0) as the other cases should recover from spurious EIO
at run time.

Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)

2010-11-29 Thread Mike Fedyk

On Mon, Nov 29, 2010 at 1:31 PM, Andrey Kuzmin
andrey.v.kuz...@gmail.com wrote:
 This may sound excessive as any new concept introduction that late in
 development, but readonly/writable snapshots could be further
 differentiated by naming the latter clones. This way end-user would
 naturally perceive snapshot as read-only PIT fs image, while clone
 would naturally refer to (writable) head fork.


I'm not sure we want to take all of the terminology that zfs uses as
it may also bring the perceived drawbacks as well.  Isn't there some
additional overhead for a zfs clone compared to a snapshot?  I'm not
very familiar with zfs so that's why I ask.

Re: Update to Project_ideas wiki page

2010-11-17 Thread Mike Fedyk

On Wed, Nov 17, 2010 at 7:12 AM, Bart Noordervliet
b...@noordervliet.net wrote:
 On Wed, Nov 17, 2010 at 15:31, Hugo Mills hugo-l...@carfax.org.uk wrote:
 On Tue, Nov 16, 2010 at 10:19:45PM -0500, Chris Ball wrote:
 == Changing RAID levels ==

 We need ioctls to change between different raid levels.  Some of these
 are quite easy -- e.g. for RAID0 to RAID1, we just halve the available
 bytes on the fs, then queue a rebalance.

   I would be interested in the rebalancing ioctls, and in RAID level
 management. I'm still very much trying to learn the basics, though, so
 I may go very slowly at first...

   Hugo.

 Can I suggest we combine this new RAID level management with a
 modernisation of the terminology for storage redundancy, as has been
 discussed previously in the "Raid1 with 3 drives" thread of March this
 year? I.e. abandon the burdened raid* terminology in favour of
 something that makes more sense for a filesystem.

 Mostly this would involve a discussion about what terms would make
 most sense, though some changes in the behaviour of btrfs redundancy
 modes may be warranted if they make things more intuitive.

 I could help you make these changes in your patches, or write my own
 patches against yours, though I'm also completely new to kernel
 development.


That would inherently solve the need to convert between dup and raid1
as well.  Why those are separate and why dup does not become raid1
when there are N > 1 drives is beyond me.

Re: Btrfs-progs: Update man page for mixed data+metadata option.

2010-11-12 Thread Mike Fedyk

On Fri, Nov 12, 2010 at 6:28 AM, Marek Otahal markota...@gmail.com wrote:
 On Friday 12 of November 2010 18:44:12 you wrote:
 On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik jo...@redhat.com wrote:
  On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
  On 11/11/10 23:52, Josef Bacik wrote:
   This feature incurs a performance penalty in larger filesystems, it is
   recommended for use with filesystems of 1 GiB or smaller.
 
  Maybe slightly stronger, for example:
 
  This feature incurs a performance penalty for larger filesystems and it
  is ONLY recommended for use with filesystems of 1 GiB or smaller.
 
  Is it worth having a check and a warning printed if a user does
  try and make a filesystem larger than 1GiB with this option ?
 
  Just in case they don't RTFM...
 
  No because depending on your usage it's actually kind of useful for
  anything less than 5 GiB, and you're only looking at about a 5-10% perf
  degradation when using it on larger filesystems.  Thanks,

 Then a warning of 10% slowdown if > 10GB would be good.  It's
 surprising how many will just read some forum post and not concern
 themselves with the docs at all.

 And making them type yes if > 100GB is probably a good idea too...
 My 2c: I'm against bloating the program just because of people who don't RTFM.
 Just mention it clearly in docs and that's enough, linux does what it's asked
 for, not the "Are you really really sure you want to do this?" known from some
 other OS. Anyway, btrfs-progs would be probably run by a user with root

I was thinking of what ssh does when it sees a changed key...
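
The two thresholds floated in the quoted text could be expressed as a
single size check.  This is only a sketch of that suggestion, not code
from btrfs-progs: enum mixed_action, mixed_mode_action(), and the
constants are hypothetical, and the "10GB" and "100GB" figures are read
here as GiB.

```c
#include <assert.h>
#include <stdint.h>

/* What a hypothetical mkfs front end would do for a mixed-mode request. */
enum mixed_action { MIXED_OK, MIXED_WARN, MIXED_CONFIRM };

#define TOY_GIB             (1024ULL * 1024 * 1024)
#define MIXED_WARN_BYTES    (10 * TOY_GIB)   /* above this: print a warning */
#define MIXED_CONFIRM_BYTES (100 * TOY_GIB)  /* above this: require "yes" */

enum mixed_action mixed_mode_action(uint64_t fs_bytes)
{
    assert(fs_bytes > 0);

    if (fs_bytes > MIXED_CONFIRM_BYTES)
        return MIXED_CONFIRM;
    if (fs_bytes > MIXED_WARN_BYTES)
        return MIXED_WARN;
    return MIXED_OK;
}
```

Keeping the decision in one pure function leaves the prompt itself to
the caller, so a script could still skip the interactive step the way a
known host key lets ssh proceed without asking.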

Re: [patch 0/2] Control filesystem balances (kernel side)

2010-11-08 Thread Mike Fedyk

[ sorry for breaking the thread, I'm replying from the archives, I was
unsubbed after a mail server issue and didn't notice till now... ]

On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote:
   balance- info on balancing

Hugo Mills wrote:
 For the one-value-per-file rule of sysfs, this should probably be
 balance_expected and balance_completed, each holding a count of block
 groups.

I'd name it balance_chunks_expected and balance_chunks_completed

Re: [PATCH] Add the btrfs filesystem label command

2010-09-15 Thread Mike Fedyk

On Mon, Sep 13, 2010 at 12:24 PM, Goffredo Baroncelli
kreij...@gmail.com wrote:
 +int get_label(char *btrfs_dev)
 +{
 +
 +       int ret;
 +       ret = check_mounted(btrfs_dev);
 +       if (ret < 0)
 +       {
 +              fprintf(stderr, "FATAL: error checking %s mount status\n",
 +                      btrfs_dev);
 +              return -1;
 +       }
 +
 +       if(ret != 0)
 +       {
 +              fprintf(stderr, "FATAL: the filesystem has to be unmounted\n");
 +              return -2;
 +       }
 +       get_label_unmounted(btrfs_dev);
 +       return 0;
 +}
 +
 +

Why can't the label be read while the fs is mounted?  It shouldn't
hurt anything.  I can read the superblock on my ext3 fs while it's
mounted...  This is what people have come to expect.

 --- a/utils.c
 +++ b/utils.c
 @@ -638,6 +638,39 @@ int check_mounted(char *file)
        return ret;
  }

 +/* Gets the mount point of btrfs filesystem that is using the specified
 + * device.
 + * Returns 0 is everything is good, <0 if we have an error.
 + * TODO: Fix this fucntion and check_mounted to work with multiple drive
 + * BTRFS setups.
 + */

Typo: s/fucntion/function/g


 +int get_mountpt(char *dev, char *mntpt, size_t size)
 +{
 +       struct mntent *mnt;
 +       FILE *f;
 +       int ret = 0;
 +
 +       f = setmntent("/proc/mounts", "r");
 +       if (f == NULL)
 +               return -errno;
 +
 +       while ((mnt = getmntent(f)) != NULL )
 +       {
 +               if (strcmp(dev, mnt->mnt_fsname) == 0)
 +               {
 +                       strncpy(mntpt, mnt->mnt_dir, size);
 +                       break;
 +               }
 +       }
 +
 +       if (mnt == NULL)
 +       {
 +               /* We didn't find an entry so lets report an error */
 +               ret = -1;
 +       }
 +
 +       return ret;
 +}
 +
  struct pending_dir {
        struct list_head list;
        char name[256];
 @@ -820,3 +853,27 @@ char *pretty_sizes(u64 size)
        return pretty;
  }

 +/*
 + * Checks to make sure that the label matches our requirements.
 + * Returns:
 +       0    if everything is safe and usable
 +      -1    if the label is too long
 +      -2    if the label contains an invalid character
 + */
 +int check_label(char *input)
 +{
 +       int i;
 +       int len = strlen(input);
 +
 +       if (len > BTRFS_LABEL_SIZE) {
 +               return -1;
 +       }
 +
 +       for (i = 0; i < len; i++) {
 +               if (input[i] == '/' || input[i] == '\\') {
 +                       return -2;
 +               }
 +       }
 +
 +       return 0;
 +}

How can one char equal two chars?

input[i] == '\\'

This should never be able to happen.  Right?
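
On the question above: in C source, '\\' is an escape sequence that
denotes a single backslash character, so the comparison is one char
against one char and it can match.  A minimal sketch, modelled on the
quoted check_label() but under hypothetical names: toy_check_label() is
not the patch's function, and TOY_LABEL_SIZE (256 here) is a stand-in,
since the value of BTRFS_LABEL_SIZE is not shown in the quoted patch.

```c
#include <assert.h>
#include <string.h>

/* Assumed limit for illustration only. */
#define TOY_LABEL_SIZE 256

/*
 * Returns 0 if the label is usable, -1 if it is too long,
 * -2 if it contains '/' or a backslash.
 */
int toy_check_label(const char *input)
{
    size_t i, len;

    assert(input != NULL);
    len = strlen(input);

    if (len > TOY_LABEL_SIZE)
        return -1;

    for (i = 0; i < len; i++) {
        /* '\\' is two characters in the source, one char at run time. */
        if (input[i] == '/' || input[i] == '\\')
            return -2;
    }

    return 0;
}
```

A label typed on the command line as a\b reaches the program as three
characters, and the middle one compares equal to '\\', so the function
returns -2 for it.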

Re: [PATCH] Btrfs: Remove useless condition

2010-09-12 Thread Mike Fedyk

On Sun, Sep 12, 2010 at 6:56 AM, Jaswinder Singh Rajput
jaswinderli...@gmail.com wrote:
 Hello,

 On Sun, Sep 12, 2010 at 5:59 PM, Johannes Weiner han...@cmpxchg.org wrote:
 On Sun, Sep 12, 2010 at 04:32:20PM +0530, Jaswinder Singh Rajput wrote:

 if (ret) is useless as it will be never NULL as in previous statement
 we are setting ret = prev for !ret

 If there is no match and no extent below the given file offset, `prev'
 will be NULL as well, no?

 So the check is not useless, it prevents throwing out a cached success
 in case of a lookup failure.


 Got it !!


Wouldn't it be clearer and easier to read if prev was checked directly
instead of checking ret after it becomes the same as prev?

Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

2010-06-23 Thread Mike Fedyk

On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor daniel.tay...@wdc.com wrote:
 Just an FYI reminder.  The original test (2K files) is utterly
 pathological for disk drives with 4K physical sectors, such as
 those now shipping from WD, Seagate, and others.  Some of the
 SSDs have larger (16K) or smaller blocks (2K).  There is also
 the issue of btrfs over RAID (which I know is not entirely
 sensible, but which will happen).

 The absolute minimum allocation size for data should be the same
 as, and aligned with, the underlying disk block size.  If that
 results in underutilization, I think that's a good thing for
 performance, compared to read-modify-write cycles to update
 partial disk blocks.

Block size = 4k

Btrfs packs smaller objects into the blocks in certain cases.
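To make the trade-off concrete, here is a small arithmetic sketch (hypothetical helper names, not btrfs code) of what strict block-aligned allocation would cost for the 2K test files: each file would occupy a full 4096-byte block and leave half of it unused, which is the slack that packing small objects into shared blocks avoids.

```c
#include <stddef.h>

/* Round size up to the next multiple of block (block must be a power of two). */
static size_t align_up(size_t size, size_t block)
{
	return (size + block - 1) & ~(block - 1);
}

/* Bytes left unused when a file of `size` bytes is given whole aligned blocks. */
static size_t slack_bytes(size_t size, size_t block)
{
	return align_up(size, block) - size;
}
```

With these helpers, `slack_bytes(2048, 4096)` is 2048, i.e. 50% utilization for a 2K file on 4K blocks.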

Re: A couple of questions

2010-05-31 Thread Mike Fedyk

On Mon, May 31, 2010 at 11:06 AM, Paul Millar paul.mil...@desy.de wrote:
 Hi Chris,

 On Thursday 27 May 2010 18:00:44 Chris Mason wrote:
 I'd suggest that you look at T10 DIF and DIX, which are targeted at
 exactly this kind of thing.  We're looking at integrating dif/dix into
 btrfs at some point.

 I've been keeping half-an-eye on T10's work in ensuring end-to-end integrity.
 That you guys are planning to integrate dif/dix support is certainly welcome
 news!

 In my use-case (a file-server that receives a new file from a remote client),
 I believe that, to ensure end-to-end integrity,  the server software would
 have to push the client-supplied checksum into the FS when writing a new file.
 (I believe there's some T10 slides somewhere that show this use-case) -- or
 (equivalently) the server software obtains the FS checksum for the file and
 matches it against the client-supplied value.

 I'm deliberately taking the simplest case when the client has chosen the same
 checksum algorithm as the FS uses.  In reality, this may not be the case, but
 we can probably cope with that.

 My concern is that, if the server-software doesn't push the client-provided
 checksum then the FS checksum (plus T-10 DIF/DIX) would not provide a rigorous
 assurance that the bytes are the same.  Without this assurance, corruption
 could still occur; for example, within the server's memory.


Have you taken into account the boundaries of the data checksums?
Your app may checksum per file or some logical partition in the file
format.  Btrfs does the checksum per-extent so unless you keep track
of where the extent boundaries are, that checksum will be useless to
the userspace app.  Also the app would be tied specifically to a
storage technology.  No matter how great foo might be, not everyone's
going to use it.

Also are you going to get this info over nfs, cifs, lustre, gluster,
ceph, foo, bar and baz?

Re: Disk space accounting and subvolume delete

2010-05-31 Thread Mike Fedyk

On Mon, May 31, 2010 at 12:01 PM, Bruce Guenter br...@untroubled.org wrote:
 On Wed, May 12, 2010 at 01:02:07PM +0800, Yan, Zheng  wrote:
 Dropping a tree can be lengthy. It's not good to let sync wait for hours.
 For most Linux filesystems, 'sync' just forces a transaction/journal commit. I don't
 think they wait for large operations that can span multiple transactions to
 complete.

 What happens to the consistency of the filesystem if a crash happens
 during this process?

There's a good test case for you to try.  Let us know what you find.

Re: [patch 5/11] btrfs: remove unneeded null check in btrfs_rename()

2010-05-29 Thread Mike Fedyk

On Sat, May 29, 2010 at 2:45 AM, Dan Carpenter erro...@gmail.com wrote:
 old_inode cannot be null here, because we dereference it
 unconditionally throughout the function.

 Signed-off-by: Dan Carpenter erro...@gmail.com

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index fa6ccc1..0bc29be 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -6487,10 +6487,8 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
         * make sure the inode gets flushed if it is replacing
         * something.
         */
 -       if (new_inode && new_inode->i_size &&
 -           old_inode && S_ISREG(old_inode->i_mode)) {
 +       if (new_inode && new_inode->i_size && S_ISREG(old_inode->i_mode))
                btrfs_add_ordered_operation(trans, root, old_inode);
 -       }

I think code like this is here because there are still a lot of
features that are being added to btrfs and it's easier to have the
additional checks than continually adding and removing them as the
code changes.

Re: Confused by performance

2010-05-24 Thread Mike Fedyk

On Mon, May 24, 2010 at 2:08 PM, K. Richard Pixley r...@noir.com wrote:
 I've just started to work with btrfs so I started with a benchmark.  On four
 identical servers, (2 dual core cpus, single local disk), I built
 filesystems - ext3, ext4, nilfs2, and btrfs.  I checked out a sizable code
 tree and timed a build.  The build is parallelized to use 4 threads when
 possible.

 I'm seeing similar build times on ext[34] and nilfs2 but I'm seeing almost
 double the times for btrfs using default options.  And I'm having trouble
 reconciling this performance cost with the benchmarks I'm seeing around the
 net.

 Is this a common result?  Is there a trick to getting ext4 competitive
 performance out of btrfs?  Is my application a poor choice for btrfs?  Am I
 missing something obvious here?


Please make sure you're testing with the latest btrfs from git or
Linus' latest kernel.

Re: RAID[56] status?

2010-05-23 Thread Mike Fedyk

On Sun, May 23, 2010 at 1:55 PM, Roy Sigurd Karlsbakk r...@karlsbakk.net 
wrote:
 Hi all

 It's about a year now since I saw the first posts about RAID[56] in Btrfs. 
 Has this gotten any further?


There are patches in development.  Nothing ready to test yet.

Re: [PATCH 3/6] direct-io: do not merge logically non-contiguous requests

2010-05-21 Thread Mike Fedyk

On Fri, May 21, 2010 at 10:03 AM, Josef Bacik jo...@redhat.com wrote:
 Btrfs cannot handle having logically non-contiguous requests submitted.  For
 example if you have

 Logical:  [0-4095][HOLE][8192-12287]
 Physical: [0-4095]      [4096-8191]

 Normally the DIO code would put these into the same BIO's.  The problem is we
 need to know exactly what offset is associated with what BIO so we can do our
 checksumming and unlocking properly, so putting them in the same BIO doesn't
 work.  So add another check where we submit the current BIO if the physical
 blocks are not contiguous OR the logical blocks are not contiguous.

 Signed-off-by: Josef Bacik jo...@redhat.com
 ---

 V1->V2
 -Be more verbose in the in-code comment

  fs/direct-io.c |   20 ++--
  1 files changed, 18 insertions(+), 2 deletions(-)


Btrfs has been pretty much self-contained (working well compiled
against 2.6.32 for example).  Is there a way that this wouldn't just
start silently breaking for people compiling the latest btrfs with
dkms against older kernels?
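The merge rule described in the quoted patch description can be sketched as follows. This is only an illustration under assumed names (`struct chunk` and `can_share_bio` are hypothetical and do not appear in fs/direct-io.c): a new chunk may join the current BIO only if it continues the previous chunk both on disk and in the file.

```c
#include <stdint.h>

/* Hypothetical description of one mapped chunk of a direct-IO request. */
struct chunk {
	uint64_t logical;   /* file offset of the first byte */
	uint64_t physical;  /* disk offset of the first byte */
	uint64_t len;       /* length in bytes */
};

/*
 * Return 1 if `next` may be added to the BIO that currently ends with
 * `prev`, i.e. only when it continues `prev` both on disk and in the file.
 */
static int can_share_bio(const struct chunk *prev, const struct chunk *next)
{
	int physically_contiguous = prev->physical + prev->len == next->physical;
	int logically_contiguous = prev->logical + prev->len == next->logical;

	return physically_contiguous && logically_contiguous;
}
```

For the example in the description, logical [0-4095] and [8192-12287] mapped to physical [0-4095] and [4096-8191], the chunks are physically adjacent but separated by a hole in the file, so the check fails and the current BIO would be submitted first.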

Re: [PATCH 04/10] btrfs: Add error check for add_to_page_cache_lru

2010-05-20 Thread Mike Fedyk

On Thu, May 20, 2010 at 12:18 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 From: Liu Bo liubo2...@cn.fujitsu.com

 If add_to_page_cache_lru() returns -EEXIST, it indicates the page
 that belongs to this page_index has been added and this readahead
 action can go on to next page.

 If add_to_page_cache_lru() returns -ENOMEM, it should break for
 no memory left.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/compression.c |   19 ---
  1 files changed, 16 insertions(+), 3 deletions(-)

 diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
 index 1d54c53..1bd4d92 100644
 --- a/fs/btrfs/compression.c
 +++ b/fs/btrfs/compression.c
 @@ -480,10 +480,23 @@ static noinline int add_ra_bio_pages(struct inode 
 *inode,
                if (!page)
                        break;

 -               if (add_to_page_cache_lru(page, mapping, page_index,
 -                                                               GFP_NOFS)) {
 +               ret = add_to_page_cache_lru(page, mapping, page_index,
 +                                                               GFP_NOFS);
 +               if (ret) {
                        page_cache_release(page);
 -                       goto next;
 +
 +                       /*
 +                        * -EEXIST indicates the page has been added, so
 +                        * it can move on to next page.
 +                        */
 +                       if (ret == -EEXIST) {
 +                               misses++;
 +                               if (misses > 4)
 +                                       break;

Shouldn't this use a preprocessor constant instead of hard-coding the
compression sensitivity or readahead tuning?  This way it'll be set in
one place.

 +                               goto next;
 +                       }
 +
 +                       break;
                }

                end = last_offset + PAGE_CACHE_SIZE - 1;
 --
 1.6.5.2
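The suggestion in the reply above, replacing the hard-coded 4 with a named constant, could look roughly like this. The macro name `RA_MAX_CACHE_MISSES` and the helper `ra_should_stop` are hypothetical; the quoted patch itself uses a literal 4 inline.

```c
/* Hypothetical name; the quoted patch hard-codes the value 4. */
#define RA_MAX_CACHE_MISSES 4

/*
 * Count one more page that was already in the page cache and report
 * whether readahead should stop. `misses` is the caller's running counter.
 */
static int ra_should_stop(int *misses)
{
	(*misses)++;
	return *misses > RA_MAX_CACHE_MISSES;
}
```

With a single definition, every place that applies the same limit picks up a tuning change automatically.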


Re: [PATCH 07/10] btrfs: fix wrong ctime when adding link

2010-05-20 Thread Mike Fedyk

On Thu, May 20, 2010 at 12:22 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 the ctime of file has not been updated when I create a link for it.

 Steps to reproduce:
  # touch file1
  # stat -c %Z file1
  1273592239
  # link flink1 file1
  # stat -c %Z file1
  1273592239             <-- have not been updated

 This patch fix this problem.

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/inode.c |    8 ++--
  1 files changed, 6 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index a85b90c..5271887 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -4218,8 +4218,12 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,

                btrfs_i_size_write(parent_inode, parent_inode->i_size +
                                   name_len * 2);
 -               parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
 -               ret = btrfs_update_inode(trans, root, parent_inode);
 +               parent_inode->i_mtime = parent_inode->i_ctime = inode->i_ctime
 +                                     = CURRENT_TIME;
 +
 +               ret = btrfs_update_inode(trans, root, inode);
 +               if (!ret)
 +                       ret = btrfs_update_inode(trans, root, parent_inode);

You only update parent inode if write to current inode fails?

Also, should you be updating the ctime of the parent inode even when
its link count is not modified (btrfs always reports a link count
of 1 on directories)?
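On the first question: assuming btrfs_update_inode() follows the usual kernel convention of returning 0 on success and a negative errno on failure (an assumption here, not something the patch states), `if (!ret)` reads as "if the first update succeeded". A minimal sketch with a hypothetical stand-in function, `fake_update`, in place of the real call:

```c
/* Stand-in for the real call: returns 0 on success, negative errno on failure. */
static int fake_update(int result, int *calls)
{
	(*calls)++;
	return result;
}

/*
 * Mirrors the control flow of the quoted hunk: the second update runs only
 * if the first one returned 0 (success). Returns the first error, or 0.
 */
static int update_both(int first_result, int second_result, int *calls)
{
	int ret = fake_update(first_result, calls);

	if (!ret)
		ret = fake_update(second_result, calls);
	return ret;
}
```

Under that convention the parent inode is written only after the child inode was written successfully, and a failure of the first write skips the second.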

Re: Adding mirroring to an existing filesystem

2010-05-16 Thread Mike Fedyk

On Sun, May 16, 2010 at 8:38 AM, J G yoosty_...@yahoo.com wrote:


 --- On Sun, 5/16/10, Donald Gordon d...@dis.org.nz wrote:

 From: Donald Gordon d...@dis.org.nz
 Subject: Adding mirroring to an existing filesystem
 To: linux-btrfs@vger.kernel.org
 Date: Sunday, May 16, 2010, 4:39 AM
 Hi

 Is there some way I can add an extra disk as a mirror to an
 existing
 btrfs filesystem?  Or must I create a new filesystem
 with RAID1, and
 then copy over all my data?

 https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Adding_New_Devices

 I would be sure to back my data up before performing this procedure ;)

You should have good backups and keep having good backups before and
during the use of btrfs.  Period.

Re: help message of btrfsctl does not tell anything about deletion of a subvolume

2010-05-15 Thread Mike Fedyk

On Sat, May 15, 2010 at 11:47 AM, Andreas Philipp
philipp.andr...@gmail.com wrote:
 Hi,

 The help message of the btrfsctl command does not tell anything about
 the deletion of a subvolume. See patch below.

 Kind regards,
 Andreas

 diff --git a/btrfsctl.c b/btrfsctl.c
 index be6bf25..3ed6f2d 100644
 --- a/btrfsctl.c
 +++ b/btrfsctl.c
 @@ -56,7 +56,7 @@ static void print_usage(void)
     printf("\t-A device: scans the device file for a Btrfs filesystem\n");
     printf("\t-a: scans all devices for Btrfs filesystems\n");
     printf("\t-c: forces a single FS sync\n");
 -    printf("\t-D: delete snapshot\n");
 +    printf("\t-D: delete snapshot or subvolume\n");
     printf("\t-m [tree id] directory: set the default mounted subvolume"
            " to the [tree id] or the directory\n");
     printf("%s\n", BTRFS_BUILD_VERSION);


We have a new command, btrfs subvolume delete <path>, which can be
shortened even as far as btrfs s d <path>.

Are we going to keep the btrfsctl program indefinitely when we have a
replacement in the btrfs program?

[PATCH 2/2] Fix version.sh to work with dash

2010-05-04 Thread Mike Fedyk


---

 fs/btrfs/version.h  |6 +++---
 fs/btrfs/version.sh |   16 
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/version.h b/fs/btrfs/version.h
index 9bf3946..12f7e5c 100644
--- a/fs/btrfs/version.h
+++ b/fs/btrfs/version.h
@@ -1,4 +1,4 @@
-#ifndef __BTRFS_VERSION_H
-#define __BTRFS_VERSION_H
-#define BTRFS_BUILD_VERSION "Btrfs"
+#ifndef __BUILD_VERSION
+#define __BUILD_VERSION
+#define BTRFS_BUILD_VERSION "Btrfs 2010-05-04_08:46:49_-0700_ea1dcb3-dirty"
 #endif
diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
index a4576f2..0733eef 100755
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/dash
 #
 # determine-version -- report a useful version for releases
 #
@@ -8,10 +8,10 @@
  
 v=v0.16
 
-which git &> /dev/null
-if [ $? == 0 ]; then
-git branch >& /dev/null
-if [ $? == 0 ]; then
+which git 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
+git branch 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`
 
# Are there uncommitted changes?
@@ -19,7 +19,7 @@ if [ $? == 0 ]; then
if git diff-index --name-only HEAD | \
grep -v ^scripts/package \
| read dummy; then
-   v=$v-dirty
+   v=${v}-dirty
fi
 fi
 fi
@@ -29,9 +29,9 @@ echo "#define __BUILD_VERSION" >> .build-version.h
 echo "#define BTRFS_BUILD_VERSION \"Btrfs $v\"" >> .build-version.h
 echo "#endif" >> .build-version.h
 
-diff -q version.h .build-version.h >& /dev/null
+diff -q version.h .build-version.h 2>&1 > /dev/null
 
-if [ $? == 0 ]; then
+if [ $? -eq 0 ]; then
 rm .build-version.h
 exit 0
 fi


[PATCH 1/2] Change version.sh from last tag and hash to output last commit date and hash

2010-05-04 Thread Mike Fedyk

The btrfs git repo doesn't have all of the tags from the base 2.6.32 kernel
it's currently based upon, and the btrfs module is regularly compiled
against other kernels, so this changes the version to be based upon the date
and hash of the latest commit instead, which is more relevant to most people
testing.

An example version string with this change:
Btrfs 2010-04-06_09:37:47_-0400_9f680ce
---

 fs/btrfs/version.sh |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)
 mode change 100644 => 100755 fs/btrfs/version.sh

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
old mode 100644
new mode 100755
index 1ca1952..a4576f2
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -12,10 +12,7 @@ which git &> /dev/null
 if [ $? == 0 ]; then
 git branch >& /dev/null
 if [ $? == 0 ]; then
-   if head=`git rev-parse --verify HEAD 2>/dev/null`; then
-   if tag=`git describe --tags 2>/dev/null`; then
-   v=$tag
-   fi
+   v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`
 
# Are there uncommitted changes?
git update-index --refresh --unmerged > /dev/null
@@ -24,7 +21,6 @@ if [ $? == 0 ]; then
| read dummy; then
v=$v-dirty
fi
-   fi
 fi
 fi
  


Re: [PATCH 2/2] Fix version.sh to work with dash

2010-05-04 Thread Mike Fedyk

Please ignore this patch, I will resend a fixed one.

On Tue, May 4, 2010 at 9:12 AM, Mike Fedyk mfe...@mikefedyk.com wrote:

 ---

  fs/btrfs/version.h  |    6 +++---
  fs/btrfs/version.sh |   16 
  2 files changed, 11 insertions(+), 11 deletions(-)

 diff --git a/fs/btrfs/version.h b/fs/btrfs/version.h
 index 9bf3946..12f7e5c 100644
 --- a/fs/btrfs/version.h
 +++ b/fs/btrfs/version.h
 @@ -1,4 +1,4 @@
 -#ifndef __BTRFS_VERSION_H
 -#define __BTRFS_VERSION_H
 -#define BTRFS_BUILD_VERSION "Btrfs"
 +#ifndef __BUILD_VERSION
 +#define __BUILD_VERSION
 +#define BTRFS_BUILD_VERSION "Btrfs 2010-05-04_08:46:49_-0700_ea1dcb3-dirty"
  #endif
 diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
 index a4576f2..0733eef 100755
 --- a/fs/btrfs/version.sh
 +++ b/fs/btrfs/version.sh
 @@ -1,4 +1,4 @@
 -#!/bin/bash
 +#!/bin/dash
  #
  # determine-version -- report a useful version for releases
  #
 @@ -8,10 +8,10 @@

  v=v0.16

 -which git &> /dev/null
 -if [ $? == 0 ]; then
 -    git branch >& /dev/null
 -    if [ $? == 0 ]; then
 +which git 2>&1 > /dev/null
 +if [ $? -eq 0 ]; then
 +    git branch 2>&1 > /dev/null
 +    if [ $? -eq 0 ]; then
                v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`

                # Are there uncommitted changes?
 @@ -19,7 +19,7 @@ if [ $? == 0 ]; then
                if git diff-index --name-only HEAD | \
                    grep -v ^scripts/package \
                    | read dummy; then
 -                   v=$v-dirty
 +                   v=${v}-dirty
                fi
     fi
  fi
 @@ -29,9 +29,9 @@ echo "#define __BUILD_VERSION" >> .build-version.h
  echo "#define BTRFS_BUILD_VERSION \"Btrfs $v\"" >> .build-version.h
  echo "#endif" >> .build-version.h

 -diff -q version.h .build-version.h >& /dev/null
 +diff -q version.h .build-version.h 2>&1 > /dev/null

 -if [ $? == 0 ]; then
 +if [ $? -eq 0 ]; then
     rm .build-version.h
     exit 0
  fi


[PATCH v2 1/2] Change version.sh from last tag and hash to output last commit date and hash

2010-05-04 Thread Mike Fedyk

The btrfs git repo doesn't have all of the tags from the base 2.6.32 kernel
it's currently based upon, and the btrfs module is regularly compiled
against other kernels, so this changes the version to be based upon the date
and hash of the latest commit instead, which is more relevant to most people
testing.

An example version string with this change:
Btrfs 2010-04-06_09:37:47_-0400_9f680ce
---

 fs/btrfs/version.sh |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)
 mode change 100644 => 100755 fs/btrfs/version.sh

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
old mode 100644
new mode 100755
index 1ca1952..a4576f2
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -12,10 +12,7 @@ which git &> /dev/null
 if [ $? == 0 ]; then
 git branch >& /dev/null
 if [ $? == 0 ]; then
-   if head=`git rev-parse --verify HEAD 2>/dev/null`; then
-   if tag=`git describe --tags 2>/dev/null`; then
-   v=$tag
-   fi
+   v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`
 
# Are there uncommitted changes?
git update-index --refresh --unmerged > /dev/null
@@ -24,7 +21,6 @@ if [ $? == 0 ]; then
| read dummy; then
v=$v-dirty
fi
-   fi
 fi
 fi
  


[PATCH v2 2/2] Fix version.sh to work with dash

2010-05-04 Thread Mike Fedyk


---

 fs/btrfs/version.sh |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
index a4576f2..d87daf4 100755
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 #
 # determine-version -- report a useful version for releases
 #
@@ -8,10 +8,10 @@
  
 v=v0.16
 
-which git &> /dev/null
-if [ $? == 0 ]; then
-git branch >& /dev/null
-if [ $? == 0 ]; then
+which git 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
+git branch 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`
 
# Are there uncommitted changes?
@@ -19,7 +19,7 @@ if [ $? == 0 ]; then
if git diff-index --name-only HEAD | \
grep -v ^scripts/package \
| read dummy; then
-   v=$v-dirty
+   v=${v}-dirty
fi
 fi
 fi
@@ -29,9 +29,9 @@ echo "#define __BUILD_VERSION" >> .build-version.h
 echo "#define BTRFS_BUILD_VERSION \"Btrfs $v\"" >> .build-version.h
 echo "#endif" >> .build-version.h
 
-diff -q version.h .build-version.h >& /dev/null
+diff -q version.h .build-version.h 2>&1 > /dev/null
 
-if [ $? == 0 ]; then
+if [ $? -eq 0 ]; then
 rm .build-version.h
 exit 0
 fi


Re: Oops while attempting to mount degraded multi-device raid1 data/metadata btrfs filesystem

2010-03-26 Thread Mike Fedyk

On Fri, Mar 26, 2010 at 11:06 AM, Josef Bacik jo...@redhat.com wrote:
 On Fri, Mar 26, 2010 at 10:49:57AM -0700, Mike Fedyk wrote:
 I still get this oops with the latest btrfs kernel code from git (as
 of  2010-03-21) compiled against 2.6.32.9-70.fc12.x86_64


 Will you try this patch

 [PATCH] Btrfs: fail to mount if we have problems reading the block groups

 and see if it works?  Thanks,

 Josef


As already discussed on irc

With that patch it doesn't oops anymore, but it still doesn't mount.

I get this kernel output:
btrfs: failed to read the system array on sda7
btrfs: open_ctree failed
device fsid 3546a1a7a4563c4b-2b1289f58c64988c devid 1 transid 974 /dev/sda7
btrfs: allowing degraded mounts
Failed to read block groups: -5
btrfs: open_ctree failed

btrfs-debug-tree crashes on the FS:

# time btrfs-debug-tree /dev/sda7 > /tmp/sda7-debug-tree.out
failed to read /dev/sde
failed to read /dev/sdd
failed to read /dev/sdc
failed to read /dev/sdb
btrfs-debug-tree: volumes.c:1381: btrfs_read_sys_array: Assertion
`!(ret)' failed.
Aborted (core dumped)

real    0m0.304s
user    0m0.001s
sys     0m0.016s


# wc /tmp/sda7-debug-tree.out
0 0 0 /tmp/sda7-debug-tree.out

# rpm -qa |grep -i btrfs
btrfs-progs-0.19-9.fc13.x86_64

I'll leave this patch running in my btrfs module for every-day testing.

I'll await any patches you'd like me to test.

Re: [BUG] scheduling while atomic: init/1/0x00000002

2010-03-13 Thread Mike Fedyk

On Sat, Mar 13, 2010 at 3:49 PM, Phillip Michael oopsicra...@gmail.com wrote:
 I have a btrfs filesystem with three subvolumes. One of them (named
 arch64) has 64 bit linux, one (arch32)  has 32 bit linux, and the
 third (files) has various files. After an unsuccessful tuxonice
 resume, the arch64 subvolume will no longer boot. It shows this bug:

 VFS: Mounted root (btrfs filesystem) readonly on device 0:13.
 Freeing unused kernel memory: 480k freed
 BFS CPU scheduler v0.315 by Con Kolivas.
 BUG: scheduling while atomic: init/1/0x00000002
 Modules linked in:
 Pid: 1, comm: init Not tainted 2.6.33-zen2-20100307-stable #6

Can you reproduce this error on stock 2.6.33 without the zen patches?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: SSD Optimizations

2010-03-10 Thread Mike Fedyk

On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
 I'm looking to try BTRFS on a SSD, and I would like to know what SSD
 optimizations it applies. Is there a comprehensive list of what ssd mount
 option does? How are the blocks and metadata arranged? Are there options
 available comparable to ext2/ext3 to help reduce wear and improve
 performance?

 Specifically, on ext2 (journal means more writes, so I don't use ext3 on
 SSDs, since fsck typically only takes a few seconds when access time is
 < 100us), I usually apply the
 -b 4096 -E stripe-width = (erase_block/4096)
 parameters to mkfs in order to reduce the multiple erase cycles on the same
 underlying block.

 Are there similar optimizations available in BTRFS?

I think you'll get more out of btrfs, but another thing you can look
into is ext4 without the journal.  Support was added for that recently
(thanks to google).
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Cross-subvolume link causes kernel BUG

2010-03-08 Thread Mike Fedyk

On Mon, Mar 8, 2010 at 10:26 AM, Bruce Guenter br...@untroubled.org wrote:
 On Mon, Mar 08, 2010 at 12:39:38PM -0500, Chris Ball wrote:
 I think this is fixed in 2.6.33, as a result of the patch below.
 Let us know if you see a segfault on 2.6.33, or after applying this
 patch to your current kernel.

 This patch does fix the problem for 2.6.32.9, thanks.  Has this patch
 been submitted for the 2.6.32.y series?

Btrfs patches aren't in any stable series yet.  Also I suspect -stable
for .32 will stop soon.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Cross-subvolume link causes kernel BUG

2010-03-08 Thread Mike Fedyk

On Mon, Mar 8, 2010 at 1:21 PM, Mike Fedyk mfe...@mikefedyk.com wrote:
 On Mon, Mar 8, 2010 at 10:26 AM, Bruce Guenter br...@untroubled.org wrote:
 On Mon, Mar 08, 2010 at 12:39:38PM -0500, Chris Ball wrote:
 I think this is fixed in 2.6.33, as a result of the patch below.
 Let us know if you see a segfault on 2.6.33, or after applying this
 patch to your current kernel.

 This patch does fix the problem for 2.6.32.9, thanks.  Has this patch
 been submitted for the 2.6.32.y series?

 Btrfs patches aren't in any stable series yet.  Also I suspect -stable
 for .32 will stop soon.

Oh, disregard that comment about -stable and .32.  I forgot it will be
maintained for a few years

Oops while attempting to mount degraded multi-device raid1 data/metadata btrfs filesystem

2010-03-05 Thread Mike Fedyk

Hi,

I get an oops with 2.6.33-0.46.rc8.git1.fc13.x86_64 while trying to
mount a degraded raid1 btrfs filesystem.

Here are the steps I performed to get to this stage.

- Install fedora12 btrfs / on sda2
- mkfs.btrfs -m raid1 -d raid1 /dev/sda7
- cp -a from sda2 to sda7
- reboot into sda7 as /
- btrfs-vol -a /dev/sda2 /
- btrfs-vol -b /  (system hangs here)
- reboot (boot fails with ctree error) [1]
- boot into fedora 12 recovery cd (based on 2.6.31).  after running
btrfsctl -a, the filesystem is mountable.
- umount
- dd bs=1M count=2000 < /dev/zero > /dev/sda2
- mount /dev/sda7, get oops (on 2.6.31)
- install fedora12 btrfs / on sda2
- update to 2.6.33-0.46.rc8.git1.fc13.x86_64
- btrfsctl -a
- mount /dev/sda7 (get oops below)


1. Turns out that fedora12 doesn't have a call to btrfsctl -a in the
boot process.  Working on a patch for that...







Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.33-0.46.rc8.git1.fc13.x86_64
(mockbu...@x86-04.phx2.fedoraproject.org) (gcc version 4.4.3 20100211
(Red Hat 4.4.3-6) (GCC) ) #1 SMP Tue Feb 16 19:47:00 UTC 2010
Command line: ro root=UUID=7d23c60c-c072-431d-971a-87bcf61ac6a2
LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
rhgb quiet
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - afff (usable)
 BIOS-e820: afff - afff3000 (ACPI NVS)
 BIOS-e820: afff3000 - b000 (ACPI data)
 BIOS-e820: b000 - c000 (reserved)
 BIOS-e820: f000 - f400 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
 BIOS-e820: 0001 - 00014000 (usable)
NX (Execute Disable) protection: active
DMI 2.3 present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range:  - 0001 (usable) == (reserved)
No AGP bridge found
last_pfn = 0x14 max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-B uncachable
  C-C7FFF write-protect
  C8000-F uncachable
MTRR variable ranges enabled:
  0 base 00 mask FF8000 write-back
  1 base 008000 mask FFC000 write-back
  2 base 01 mask FFC000 write-back
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
TOM2: 00014000 aka 5120M
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820 update range: c000 - 0001 (usable) == (reserved)
last_pfn = 0xafff0 max_arch_pfn = 0x4
initial memory mapped : 0 - 2000
found SMP MP-table at [880f3c90] f3c90
init_memory_mapping: -afff
 00 - 00afe0 page 2M
 00afe0 - 00afff page 4k
kernel direct mapping tables up to afff @ 16000-1b000
init_memory_mapping: 0001-00014000
 01 - 014000 page 2M
kernel direct mapping tables up to 14000 @ 19000-1f000
RAMDISK: 37435000 - 37fefbfc
ACPI: RSDP 000f8370 00014 (v00 Nvidia)
ACPI: RSDT afff3040 00038 (v01 Nvidia AWRDACPI 42302E31 AWRD )
ACPI: FACP afff30c0 00074 (v01 Nvidia AWRDACPI 42302E31 AWRD )
ACPI: DSDT afff3180 0631A (v01 NVIDIA AWRDACPI 1000 MSFT 010E)
ACPI: FACS afff 00040
ACPI: SSDT afff95c0 00248 (v01 PTLTD  POWERNOW 0001  LTP 0001)
ACPI: HPET afff9880 00038 (v01 Nvidia AWRDACPI 42302E31 AWRD 0098)
ACPI: MCFG afff9900 0003C (v01 Nvidia AWRDACPI 42302E31 AWRD )
ACPI: APIC afff9500 0007C (v01 Nvidia AWRDACPI 42302E31 AWRD )
ACPI: Local APIC address 0xfee0
Scanning NUMA topology in Northbridge 24
No NUMA configuration found
Faking a node at -00014000
Bootmem setup node 0 -00014000
  NODE_DATA [0001a000 - 00032fff]
  bootmap [00033000 -  0005afff] pages 28
(13 early reservations) == bootmem [00 - 014000]
  #0 [00 - 001000]   BIOS data page == [00 - 001000]
  #1 [000100 - 00029b9138]TEXT DATA BSS == [000100 - 00029b9138]
  #2 [0037435000 - 0037fefbfc]  RAMDISK == [0037435000 - 0037fefbfc]
  #3 [00029ba000 - 00029ba0b9]  BRK == [00029ba000 - 00029ba0b9]
  #4 [0f3ca0 - 10]BIOS reserved == [0f3ca0 - 10]
  #5 [0f3c90 - 0f3ca0] MP-table mpf == [0f3c90 - 0f3ca0]
  #6 [09f000 - 0f1fe4]BIOS reserved == [09f000 - 0f1fe4]
  #7 [0f2140 - 0f3c90]BIOS reserved == [0f2140 - 0f3c90]
  #8 [0f1fe4 - 0f2140] MP-table mpc == [0f1fe4 - 0f2140]
  #9 [01 - 012000]   TRAMPOLINE == [01 - 012000]
  #10 [012000 -

Re: Raid1 with 3 drives

2010-03-05 Thread Mike Fedyk

On Fri, Mar 5, 2010 at 1:49 PM, Bart Noordervliet b...@noordervliet.net wrote:
 On Fri, Mar 5, 2010 at 21:31, Josef Bacik jo...@redhat.com wrote:
 Since I have three devices in a RAID1 pool, can it survive 2 drive failures?

 Yes, tho you won't be able to remove more than 1 at a time (since it wants you
 to keep at least two disks around).  Thanks,

 Josef

 Hmm, I would expect the raid1 data mode to keep 2 copies of each file
 and thus yield 50% effective storage capacity, even with 3 disks. I
 see no real reason to stick with the full-disk mirroring mentality of
 previous raid systems since raid implemented in a filesystem works
 differently. Or would it be difficult to implement btrfs raid1 like
 this?

 Maybe it's worth considering leaving the burdened raid* terminology
 behind and naming the btrfs redundancy modes more clearly by what they
 do. For instance -d double|triple or -d 2n|3n. And for raid5/6 -d
 single-parity|double-parity or -d n+1|n+2.
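
The capacity arithmetic behind the "2 copies of each file" expectation can be
sketched as below. This is an illustration only, not btrfs code: it assumes
equal-sized devices and that every block is stored in a fixed number of copies
on distinct devices. The function name and the 500 GB device size are made up
for the example.

```python
def mirrored_capacity(device_sizes_gb, copies):
    """Return (usable_gb, failures_tolerated) for a simple n-copy layout.

    Assumption for this sketch: all devices are the same size and each
    block is written to `copies` different devices.
    """
    if copies < 1 or copies > len(device_sizes_gb):
        raise ValueError("copies must be between 1 and the number of devices")
    if len(set(device_sizes_gb)) != 1:
        raise ValueError("this sketch only handles equal-sized devices")
    total = sum(device_sizes_gb)
    return total / copies, copies - 1


# Three hypothetical 500 GB disks, stored as 2 copies ("2n") or 3 copies ("3n").
two_copy = mirrored_capacity([500, 500, 500], copies=2)
three_copy = mirrored_capacity([500, 500, 500], copies=3)
print(two_copy)    # (750.0, 1)
print(three_copy)  # (500.0, 2)
```

Under these assumptions a two-copy mode on three disks gives 50% of the raw
space and survives one failure, while a three-copy mode gives a third of the
raw space and survives two.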


+1

Re: assertion failures

2010-02-26 Thread Mike Fedyk

On Fri, Feb 26, 2010 at 10:11 AM, Bill Pemberton
wf...@viridian.itc.virginia.edu wrote:

 Does the array have any kind of writeback cache?


 Yes, the array has a writeback cache.


 Are all of the filesystems spread across all of the drives?  Or do some
 filesystems use some drives only?


 In all cases the array is presenting 1 physical volume to the host
 system (which is RAID 6 on the array itself).  That physical volume is
 made into a volume group and the filesystems are on logical volumes in
 that volume group.


I wonder if the barrier messages are making it to this write back
cache.  Do you see any messages about barriers in your kernel logs?

Re: btrfs no csum found for inode X start 0

2010-02-25 Thread Mike Fedyk

On Thu, Feb 25, 2010 at 12:52 PM, Leszek Ciesielski skol...@gmail.com wrote:
 I have changed the btrfs code to ignore checksum failures and now I
 can read files correctly from the filesystem. Also, moving them onto
 another volume and then back into btrfs fixes the checksums and no
 more errors are reported for the file in question.

 Quick and dirty code I used for getting my files out:

Yes, but did you verify your data?

Re: [PATCH 0/3 V3] btrfs: a new tool to manage a btrfs filesystem

2010-02-24 Thread Mike Fedyk

On Wed, Feb 24, 2010 at 3:35 PM, Chris Mason chris.ma...@oracle.com wrote:
 On Mon, Feb 22, 2010 at 07:47:40PM +0100, Goffredo Baroncelli wrote:
 On Monday 22 February 2010, Mike Fedyk wrote:
  On Sun, Feb 21, 2010 at 8:40 AM, Goffredo Baroncelli kreij...@gmail.com
 wrote:
         filesystem resize [+/-]size[gkm]|max filesystem
 
  -filesystem resize [+/-]size[gkm]|max filesystem
  +filesystem resize [+/-]size[gkm]|max dev
 
  This command works on devices, not paths.

 Are you sure? From testing and code inspection, it appears to work on a path.

 The ioctl takes a path so that it knows which btrfs filesystem to
 change.


Then how does it know which device to shrink in a multi-device filesystem?

Re: [PATCH 0/3 V3] btrfs: a new tool to manage a btrfs filesystem

2010-02-21 Thread Mike Fedyk

On Sun, Feb 21, 2010 at 8:40 AM, Goffredo Baroncelli kreij...@gmail.com wrote:
       filesystem resize [+/-]size[gkm]|max filesystem

-filesystem resize [+/-]size[gkm]|max filesystem
+filesystem resize [+/-]size[gkm]|max dev

This command works on devices, not paths.

              Resize a filesystem identified by path.  The size parame‐

-Resize a filesystem identified by path.  The size parame‐
+Resize a filesystem identified by dev.  The size parame‐

              ter  specifies the new size of the filesystem.  If the prefix
              + or - is present the size is increased or decreased  by  the
              quantity  size.  If no units are specified, the unit of the
              size parameter defaults  to  bytes.  Optionally,  the  size
              parameter  may  be suffixed by one of the following the units
              designators: 'K', 'M', or 'G', kilobytes, megabytes, or giga‐
              bytes, respectively.

              If  'max' is passed, the filesystem will occupy all available
              space on the volume(s).

              The resize command does not manipulate the size of underlying
              partitions.   If you wish to enlarge/reduce a filesystem, you

-partitions.   If you wish to enlarge/reduce a filesystem, you
+partition.   If you wish to enlarge/reduce a filesystem, you

              must make sure you can expand/reduce the size of  the  parti‐
              tion also.

-must make sure you can expand/reduce the size of  the  parti‐
-tion also.
+must make sure you can expand the partition before enlarging
+the filesystem and shrink the partition after reducing the size
+of the filesystem.



       filesystem show [uuid|label]
              Show  the  btrfs  filesystem with some additional info. If no
              UUID or label is passed, btrfs show info  of  all  the  btrfs
              filesystem.


       device balance|-b path

-device balance|-b path
+device balance path

Mike

df shows wrong device while waiting for umount

2010-02-19 Thread Mike Fedyk

Hi,

Kernel 2.6.33-0.46.rc8.git1.fc13.x86_64

I think I ran into the issue that triggers when you write to a btrfs
filesystem and then umount it, and the umount takes a long time while
writing out the data.  It ends up writing at about 1MiB/second according to
dstat.  My understanding is that this issue is already fixed in the latest
code.

But that is not the issue I am reporting.  While waiting for the
umount to complete, df shows the wrong device.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       9.8G  7.5G  2.3G  77% /
tmpfs           1.9G  536K  1.9G   1% /dev/shm
/dev/sda6       485M   62M  398M  14% /boot
/dev/sda7       111G  464M  111G   1% /mnt/t

# umount /mnt/t

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       9.8G  7.5G  2.3G  77% /
tmpfs           1.9G  536K  1.9G   1% /dev/shm
/dev/sda6       485M   62M  398M  14% /boot
/dev/sda7       9.8G  7.5G  2.3G  77% /mnt/t

The second df was run while waiting for the 111GB partition to unmount.

Re: btrfs testing suite?

2010-02-19 Thread Mike Fedyk

On Fri, Feb 19, 2010 at 10:46 AM, Mr. Tux tuxoho...@hotmail.de wrote:


 Hi list,
 Is there a btrfs testing suite the btrfs developers use to check the
 codebase? I did some research and found a project called xfstests-dev. It
 supports ext4 as well - are there any patches to get btrfs support with
 xfstests?

There are also different versions of fsx that attempt to fuzz the
filesystems.  Btrfs has additional edges that need to be fuzzed so
it will need to be extended.  Running those as well as normal every
day use and your typical workload will help btrfs most.

Just install it and use it like normal, but make sure you have backups
and another system you can switch to if something goes wrong.

Re: [RFC] btrfs: a new tool to manage a btrfs filesystem

2010-02-19 Thread Mike Fedyk

On Fri, Feb 19, 2010 at 12:12 PM, Goffredo Baroncelli
kreij...@gmail.com wrote:
 Hi all,

 on the basis of the suggestions received, I updated my btrfs tool.

 The main changes are:
 - removed the short form of the command (like '-C')
 - deployed the multi level command (i.e.: btrfs snapshot create)
 - split the source into three files, because the new parser is quite big
 (about 295 lines; for comparison, btrfsctl.c is only 239 lines).

 The multi-level command parser is quite flexible. It accepts the full-length
 command (btrfs subvolume create) and a contracted form (btrfs subvol cr).
 The commands may be arbitrarily short (even 1 char) but they have to be
 unambiguous. For example:
 - btrfs s s             - OK (matches 'btrfs subvolume snapshot' only)
 - btrfs dev s           - FAIL (matches both 'btrfs dev show' and
                                'btrfs dev scan')

 The parser highlights which parts of the command are ambiguous.
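
The unambiguous-prefix rule described above can be sketched in a few lines.
This is not the code from the patch; it is a minimal illustration that assumes
each level of the command is matched separately against the object and action
names listed in this message. The names COMMANDS, expand and parse are
invented for the example.

```python
# Object -> valid actions, taken from the structure proposed in this message.
COMMANDS = {
    "subvolume": ["create", "delete", "snapshot", "list"],
    "filesystem": ["defrag", "sync", "resize"],
    "device": ["add", "delete", "scan", "show", "balance"],
}


def expand(word, candidates):
    """Expand an abbreviation to the single candidate it is a prefix of."""
    matches = [c for c in candidates if c.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown command '{word}'")
    # More than one match: report the ambiguous part and what it could mean.
    raise ValueError(f"'{word}' is ambiguous: {', '.join(matches)}")


def parse(argv):
    """Resolve an abbreviated 'object action' pair to the full names."""
    obj = expand(argv[0], COMMANDS)
    action = expand(argv[1], COMMANDS[obj])
    return obj, action


print(parse(["s", "s"]))        # ('subvolume', 'snapshot')
print(parse(["subvol", "cr"]))  # ('subvolume', 'create')
try:
    parse(["dev", "s"])
except ValueError as err:
    print(err)                  # 's' is ambiguous: scan, show
```

Matching level by level keeps the error message specific: only the word that
cannot be resolved is reported, together with its candidates.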

 This is a RFC because there is no agreement about the name of the command.
 I am proposing the following structure:

   btrfs object action

 where object are:
 - subvolume (valid action: create, delete, snapshot, list [not implemented])
 - filesystem (valid action: defrag, sync, resize [not implemented])
 - device (valid action: add, delete, scan, show, balance)

 You can find the source at

        http://cassiopea.homelinux.net/git/btrfs-command.git

 (commit 3deec45d18879d60b4032dc1f8895d7b7e1211ec; remember to switch to the
 remotes/origin/multi-level-command branch (I hate git!!!))


 BR
 G.Baroncelli
 

 $ git diff remotes/origin/orig | diffstat
  Makefile             |    6
  btrfs.c              |   73 ++
  btrfs_cmds.c         |  587 +++[...]
  btrfs_cmds.h         |   30 ++
  btrfs_cmds_parse.c   |  296 +
  man/Makefile         |    5
  man/btrfs.8.in       |  148 
  13 files changed, 1291 insertions(+), 2 deletions(-)

 
 $ ./btrfs
 Usage:
        btrfs subvolume snapshot [dest/]name
                Create a writeble snapshot of the subvolume source with
                the name name in the dest directory.
        btrfs subvolume delete subvolume
                Delete the subvolume subvolume.
        btrfs subvolume create [dest/]name
                Create a subvolume in dest (or the current directory if
                not passed).
        btrfs filesystem defrag file|dir [file|dir...]
                Defragment a file or a directory.
        btrfs device scan [device [device..]
                Scan all device for or the passed device for a btrfs
                filesystem.
        btrfs filesystem sync path
                Force a fs sync on the filesystem path
        btrfs filesystem resize [+/-]newsize[gkm]|max filesystem
                Resize the file system. If 'max' is passed, the filesystem
                will occupe all available space on the device.
        btrfs device show [dev|label...]
                Show the btrfs devices
        btrfs device balance path
                Balance the chunk across the device
        btrfs device add dev [dev..] path
                Add a device to a filesystem
        btrfs device delete dev [dev..] path
                Remove a device from a filesystem

        btrfs help|--help|-h
                Show the help.

 Btrfs v0.19-22-g07a97f0-dirty

 

 $ man man/btrfs.8.in | cat
 BTRFS(8)                           btrfs                           BTRFS(8)



 NAME
       btrfs - control a btrfs filesystem

 SYNOPSIS
       btrfs subvolume snapshot source [dest/]name

       btrfs subvolume delete subvolume

       btrfs subvolume create [dest/]name

       btrfs filesystem defrag file|dir [file|dir...]

       btrfs filesystem fssync path

       btrfs filesystem resize [+/-]size[gkm]|max filesystem

       btrfs device scan [device [device..]]

       btrfs device show dev|label [dev|label...]

       btrfs device balance path

       btrfs device add dev [dev..] path

       btrfs device delete dev [dev..] path ]


       btrfs help|--help|-h

 DESCRIPTION
       btrfs  is  used to control the filesystem and the files and directo‐
       ries stored. It is the tool to create or destroy a new snapshot or a

-create or destroy a new snapshot
+create or destroy a snapshot

       new  subvolume  for the filesystem, to defrag a file or a directory,

-new  subvolume  for the filesystem, to defrag a file or a directory
+subvolume for the filesystem, defrag a file or a directory

       to flush the dato to the disk, to resize the filesystem, to scan the
       device.

-to flush the dato to the disk, to resize the filesystem, to scan the
+flush the data to the disk, resize the filesystem, scan the


       It  is  possible to abbreviate the commands unless the commands  are
       ambiguous.  For example: it is  possible  to  run  btrfs  sub  snaps
       instead  of  btrfs  subvolume  snapshot.   But  btrfs  dev  s is not

Re: [Regression] Filesystem I/O is CPU-bound in rc7 and rc8

2010-02-19 Thread Mike Fedyk

On Sat, Feb 13, 2010 at 7:11 PM, James Cloos cl...@jhcloos.com wrote:
 Sometime between rc6 and rc7 all filesystem I/O started using 100% CPU,
 usually on the order of 60% sys, 40% user.

 I've tried this with each of ext4, jfs and btrfs filesystems.  All show
 the same issue.


Are you sure you're not running with any of the debugging options
enabled?  I see the same, but I have debugging enabled (rawhide
kernel).

 Using dd(1) to read from the block specials directly works as well and
 as fast as it always has; only reading or writing to mounted filesystems
 is affected.

 Box is 32-bit x86, PentiumIII-M; drives are ide using libata.

 If the btrfs fs is mounted, the slowdown is enough to trigger the
 hung_task call trace (120s) on the btrfs-transac process.

 But the regression is just as apparent when only jfs and ext4 are mounted.

 The only filesystems I've found which avoid the regression are tmpfs and
 devtmpfs.

 I didn't have time to write up a report when I noticed this in rc7 but
 had to boot back into rc6 for work.

 Some of the commits since rc7 looked like they might have addressed this
 regression, but it persists in rc8.

 -JimC
 --
 James Cloos cl...@jhcloos.com         OpenPGP: 1024D/ED7DAEA6

Which volume? no space left, need 4096, 274432 delalloc bytes, 8360148992 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total

2010-02-19 Thread Mike Fedyk

If I had more than one btrfs volume, how would I know which volume
caused these errors?  Sure I can look at df and btrfs-show, but
shouldn't these messages say definitively?

Feb 19 04:31:26 dt01 kernel: no space left, need 4096, 274432 delalloc
bytes, 8360148992 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0
bytes_readonly, 0 may use 8360427520 total
Feb 19 04:34:02 dt01 kernel: no space left, need 4096, 270336 delalloc
bytes, 8360153088 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0
bytes_readonly, 0 may use 8360427520 total
Feb 19 04:43:18 dt01 kernel: no space left, need 270336, 36864
delalloc bytes, 8360140800 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:44:03 dt01 kernel: no space left, need 270336, 49152
delalloc bytes, 8360165376 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:44:09 dt01 kernel: device fsid
5e44906058064a32-b179f2f9b4e606a9 devid 1 transid 7 /dev/sda7
Feb 19 04:46:18 dt01 kernel: no space left, need 270336, 77824
delalloc bytes, 8360275968 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:47:03 dt01 kernel: no space left, need 270336, 102400
delalloc bytes, 8360300544 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:52:18 dt01 kernel: no space left, need 270336, 65536
delalloc bytes, 8360214528 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:53:03 dt01 kernel: no space left, need 270336, 36864
delalloc bytes, 8360280064 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:53:48 dt01 kernel: no space left, need 270336, 32768
delalloc bytes, 8360329216 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:27:48 dt01 kernel: no space left, need 270336, 36864
delalloc bytes, 8360140800 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:27:58 dt01 kernel: device fsid
5945de3116704a8a-11f3234550356c85 devid 1 transid 7 /dev/sda7
Feb 19 05:30:34 dt01 kernel: no space left, need 270336, 53248
delalloc bytes, 8360284160 bytes_used, 12288 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:40:18 dt01 kernel: no space left, need 270336, 32768
delalloc bytes, 8360378368 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:06:33 dt01 kernel: no space left, need 270336, 28672
delalloc bytes, 8360153088 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:07:18 dt01 kernel: no space left, need 270336, 53248
delalloc bytes, 8360169472 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:08:03 dt01 kernel: no space left, need 270336, 65536
delalloc bytes, 8360206336 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:03 dt01 kernel: no space left, need 270336, 69632
delalloc bytes, 8360251392 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:30 dt01 kernel: no space left, need 270336, 94208
delalloc bytes, 8360251392 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:42 dt01 kernel: no space left, need 94208, 24576 delalloc
bytes, 8360325120 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0
bytes_readonly, 0 may use 8360427520 total
Feb 19 06:15:35 dt01 kernel: no space left, need 270336, 57344
delalloc bytes, 8360349696 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:40:19 dt01 kernel: no space left, need 270336, 49152
delalloc bytes, 8360202240 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:41:49 dt01 kernel: no space left, need 270336, 65536
delalloc bytes, 8360271872 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:42:12 dt01 kernel: no space left, need 94208, 98304 delalloc
bytes, 8360271872 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0
bytes_readonly, 0 may use 8360427520 total
Feb 19 06:42:33 dt01 kernel: no space left, need 270336, 86016
delalloc bytes, 8360292352 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:44:03 dt01 kernel: no space left, need 270336, 73728
delalloc bytes, 8360341504 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:47:48 dt01 kernel: no space left, need 270336, 24576
delalloc bytes, 8360312832 bytes_used, 0 bytes_reserved, 0
bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:48:25 dt01 kernel: no space left, need 270336, 45056
delalloc bytes,
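
The fields in these messages can be pulled apart with a short script, which
also shows why a 4096-byte request fails. This is a sketch under assumptions:
the regular expression is written against the message format shown above, and
the headroom formula (total minus every other counter) is a guess at how the
numbers relate, not the kernel's actual accounting. Note that nothing in the
message identifies the volume, which is the point of this report.

```python
import re

# First message from the log above, with the mail line wrapping removed.
LINE = (
    "no space left, need 4096, 274432 delalloc bytes, 8360148992 bytes_used, "
    "4096 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, "
    "0 may use 8360427520 total"
)

PATTERN = re.compile(
    r"need (?P<need>\d+), (?P<delalloc>\d+) delalloc bytes, "
    r"(?P<used>\d+) bytes_used, (?P<reserved>\d+) bytes_reserved, "
    r"(?P<pinned>\d+) bytes_pinned, (?P<readonly>\d+) bytes_readonly, "
    r"(?P<may_use>\d+) may use (?P<total>\d+) total"
)


def parse_enospc(line):
    """Return the counters of one 'no space left' message as integers."""
    # Collapse whitespace so that wrapped log lines still match.
    match = PATTERN.search(" ".join(line.split()))
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def headroom(fields):
    """Bytes left after subtracting every counter from the total (assumed)."""
    committed = sum(
        fields[key]
        for key in ("delalloc", "used", "reserved", "pinned", "readonly", "may_use")
    )
    return fields["total"] - committed


fields = parse_enospc(LINE)
print(fields["need"], headroom(fields))  # 4096 0
```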

Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem

2010-02-18 Thread Mike Fedyk

On Thu, Feb 18, 2010 at 8:58 AM, Chris Mason chris.ma...@oracle.com wrote:
 I do like the subcommand method, more details below.

 On Wed, Feb 17, 2010 at 03:35:26PM -0800, Mike Fedyk wrote:
 I think he needs some command hierarchy here.

 On Wed, Feb 17, 2010 at 12:02 PM, Goffredo Baroncelli
 kreij...@gmail.com wrote:
  OPTIONS
        snapshot|-s source [dest/]name
               Create a writeble snapshot of the subvolume  source  with  the
               name  name  in the dest directory. If source is not a sub‐
               volume, btrfs returns an error.

 This should be btrfs subvolume snapshot source [dest/]name.
 It only works on subvolumes.

 If we can type subvol instead of subvolume I like it.  Basically the
 perl/python arg parsing system where any short form of the command that
 uniquely matches it is allowed.

 We keep the long forms but allow the user to pick a shorter form if it
 isn't ambiguous.


Yes, I agree.  This is why I compared it with the ip command which
does the same.


 
 
        delete|-D subvolume
               Delete the subvolume subvolume. If subvolume is not  a  sub‐
               volume, btrfs returns an error.
 

 This becomes:

 btrfs subvolume delete subvolume

 subvol del (same as above).


 This works with snapshots as well.

 
        subvolume|-c [dest/]name
               Create  a  subvolume  in  dest (or in the current directory if
               dest is not passed).

 btrfs subvolume create [dest/]name

 
 
        defrag|-f file|dir [file|dir...]
               Defragment files and/or directories.

 This will defrag individual files?  Does it defrag a directory tree?
 Does it defrag a subvolume?  Does it defrag a pool?

 For now lets change this to only do files.  That's the only thing the
 tool supports today.


 
 
        scan|-n [device [device..]]
               Scan devices for a btrfs filesystem. If no devices  are  passed,
               btrfs scans all the block devices.

 btrfs pool scan [device [device..]]

 Instead of btrfs pool, please use btrfs dev


 
 
        fssync|-y path
               Force a sync for the filesystem identified by path.
 

 Does it sync a pool or subvolume?  Assuming it works against
 subvolumes, it would be:

 btrfs subvolume sync path

 
 
        resize|-z [+/-]size[gkm]|max filesystem
               Resize a file system identified by path.  The size parameter
               specifies the new size of the filesystem.  If the prefix + or  -
               is  present  the  size is increased or decreased by the quantity
               size.  If no units are  specified,  the  unit  of  the  size
               parameter  is  the  byte.  Optionally, the size parameter may be
               suffixed by one of the following  the  units  designators:  'K',
               'M', or 'G', kilobytes, megabytes, or gigabytes, respectively.
 
               If  'max'  is  passed,  the filesystem will occupy all available
               space on the volume(s).
 
               The resize command does not manipulate the  size  of  underlying
               partitions.   If  you  wish  to enlarge/reduce a filesystem, you
               must make sure you can expand/reduce the size of  the  partition
               also.
 

 This works with physical devices, not a pool or subvolume.  I get the
 name physical volume from lvm.  Also I think it should resize to max
 without arguments, in order to do that, the size argument would need
 to be the last argument.

 We don't have physical volumes and logical volumes the way lvm does, so
 I'd like to avoid the pvolume theme.


 It becomes:

 btrfs pvolume resize [+/-]size[gkm]|max filesystem

 Or:

 btrfs pvolume resize filesystem [[+/-]size[gkm]]

 btrfs dev resize


Dev works for me, I could only think of the lvm terms at the time.

Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem

2010-02-18 Thread Mike Fedyk

On Thu, Feb 18, 2010 at 11:59 AM, Goffredo Baroncelli
kreij...@gmail.com wrote:
 On Thursday 18 February 2010, Chris Mason wrote:
 I do like the subcommand method, more details below.


 I'll try to summarise your suggestions, but some cases are not too clear to
 me.
 I grouped the commands in three categories: subvolume, devices, and
 filesystem.


 devices         scan
 devices         show
 devices         balance
 devices         add
 devices         remove

 subvolume       snapshot
 subvolume       delete
 subvolume       create
 [subvolume      list]

 filesystem      resize
 [filesystem     label]

 ???     defrag
 ???     sync



 For the first two categories both Chris and Mike agreed; but IMHO there are
 some commands that fit neither in devices nor in subvolume, like resize (we
 resize a filesystem) and label (not available now).


A btrfs filesystem can span multiple devices.  Resize changes how much
of one device btrfs uses.  This would be used by
partitioning programs, for instance.  zfs uses the term pool instead
of filesystem to avoid this ambiguity, since btrfs and zfs both break
people's existing definition of the word filesystem.

 I don't know how to classify defrag (per file / directory level ?) and sync
 (filesystem ?)

It turns out that defrag is per file, which seems most cumbersome.
Maybe since it will probably eventually work against several types of
objects we could have:

btrfs defrag file file
btrfs defrag directory directory
btrfs defrag subvol subvol
btrfs defrag pool pool
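
A command layout like the one above maps naturally onto nested subcommands.
The sketch below uses Python's argparse only to show the shape of the
interface; it is not part of the btrfs tool, and the path /mnt/t/home is a
made-up example target.

```python
import argparse

# Object types proposed above for defrag.
OBJECT_TYPES = ("file", "directory", "subvol", "pool")


def build_parser():
    """Build a parser for 'btrfs defrag <object-type> <target>'."""
    parser = argparse.ArgumentParser(prog="btrfs")
    commands = parser.add_subparsers(dest="command", required=True)
    defrag = commands.add_parser("defrag", help="defragment an object")
    # The second word says what kind of object the target is.
    defrag.add_argument("object_type", choices=OBJECT_TYPES)
    defrag.add_argument("target")
    return parser


args = build_parser().parse_args(["defrag", "subvol", "/mnt/t/home"])
print(args.command, args.object_type, args.target)  # defrag subvol /mnt/t/home
```

Naming the object type explicitly keeps the command self-documenting, at the
cost of one extra word per invocation.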


 An option is to consider commands without classification. For example:

 $ btrfs subvolume create [path/]subvolname
 $ btrfs sync path
 $ btrfs defrag file

Maybe if the btrfs developers are agreeable, we could do this as well:

btrfs sync file file
btrfs sync directory directory
btrfs sync subvol subvol
btrfs sync pool pool

I'm not sure how useful syncing the pool or a directory tree would be,
but I'll include it here for further discussion.

Mike

Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem

2010-02-17 Thread Mike Fedyk

I think he needs some command hierarchy here.

On Wed, Feb 17, 2010 at 12:02 PM, Goffredo Baroncelli
kreij...@gmail.com wrote:
 OPTIONS
       snapshot|-s source [dest/]name
              Create a writeble snapshot of the subvolume  source  with  the
              name  name  in the dest directory. If source is not a sub‐
              volume, btrfs returns an error.

This should be btrfs subvolume snapshot source [dest/]name.
It only works on subvolumes.



       delete|-D subvolume
              Delete the subvolume subvolume. If subvolume is not  a  sub‐
              volume, btrfs returns an error.


This becomes:

btrfs subvolume delete subvolume

This works with snapshots as well.


       subvolume|-c [dest/]name
              Create  a  subvolume  in  dest (or in the current directory if
              dest is not passed).

btrfs subvolume create [dest/]name



       defrag|-f file|dir [file|dir...]
              Defragment files and/or directories.

This will defrag individual files?  Does it defrag a directory tree?
Does it defrag a subvolume?  Does it defrag a pool?



       scan|-n [device [device..]]
              Scan devices for a btrfs filesystem. If no devices  are  passed,
              btrfs scans all the block devices.

btrfs pool scan [device [device..]]



       fssync|-y path
              Force a sync for the filesystem identified by path.


Does it sync a pool or subvolume?  Assuming it works against
subvolumes, it would be:

btrfs subvolume sync path



       resize|-z [+/-]size[gkm]|max filesystem
              Resize a file system identified by path.  The size parameter
              specifies the new size of the filesystem.  If the prefix + or  -
              is  present  the  size is increased or decreased by the quantity
              size.  If no units are  specified,  the  unit  of  the  size
              parameter  is  the  byte.  Optionally, the size parameter may be
              suffixed by one of the following  the  units  designators:  'K',
              'M', or 'G', kilobytes, megabytes, or gigabytes, respectively.

              If  'max'  is  passed,  the filesystem will occupy all available
              space on the volume(s).

              The resize command does not manipulate the  size  of  underlying
              partitions.   If  you  wish  to enlarge/reduce a filesystem, you
              must make sure you can expand/reduce the size of  the  partition
              also.


This works with physical devices, not a pool or subvolume.  I get the
name physical volume from lvm.  Also I think it should resize to max
without arguments, in order to do that, the size argument would need
to be the last argument.

It becomes:

btrfs pvolume resize [+/-]size[gkm]|max filesystem

Or:

btrfs pvolume resize filesystem [[+/-]size[gkm]]
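
With the size as an optional last argument, the [+/-]size[gkm]|max grammar
could be parsed roughly as follows. This is a sketch, not code from the tool:
it assumes the K/M/G suffixes are multiples of 1024 and that a missing
argument means 'max', as suggested above. The function and mode names are
invented for the example.

```python
import re

# Assumption: suffixes are binary multiples (1K = 1024 bytes).
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
SIZE_RE = re.compile(r"^([+-]?)(\d+)([kmg]?)$", re.IGNORECASE)


def parse_resize_arg(arg=None):
    """Return (mode, size_in_bytes); mode is 'max', 'set', 'grow' or 'shrink'."""
    if arg is None or arg == "max":
        # No size given: resize to the maximum, as proposed in this mail.
        return ("max", None)
    match = SIZE_RE.match(arg)
    if match is None:
        raise ValueError(f"invalid size: {arg!r}")
    sign, digits, unit = match.groups()
    size = int(digits) * UNITS[unit.lower()]
    mode = {"": "set", "+": "grow", "-": "shrink"}[sign]
    return (mode, size)


print(parse_resize_arg())         # ('max', None)
print(parse_resize_arg("+2g"))    # ('grow', 2147483648)
print(parse_resize_arg("-512M"))  # ('shrink', 536870912)
print(parse_resize_arg("4096"))   # ('set', 4096)
```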


       show|-l [dev|label...]
              Show  the btrfs devices with some additional info. If no devices
              or labels are passed, btrfs scans all the block devices.

This becomes:

btrfs pool show [dev|label...]



       balance|-b path
              Balance the chunk of the filesystem identified by path  across
              the devices.

Is path to one of the block devices in the pool?

This becomes:

btrfs pool balance path



       add-dev|-A  dev [dev..] path
              Add device(s) to the filesystem identified by path.

What is path?  Somewhere the pool is mounted?  The root of where the
pool is mounted?

this becomes:

btrfs pvolume add dev [dev..] path



       rm-dev|-R  dev [dev..] path
              Remove device(s) to the filesystem identified by path.

(same questions as with add)

This becomes:

btrfs pvolume remove dev [dev..] path

Mike

Re: [PATCH 0/2] btrfs: a new tool to manage a btrfs filesystem

2010-02-16 Thread Mike Fedyk

On Sun, Feb 14, 2010 at 7:39 AM, Goffredo Baroncelli kreij...@gmail.com wrote:
 Hi all,

 On Sunday 14 February 2010, Thomas Kupper wrote:
 Hi Goffredo,

 Great work! It is indeed much easier to work with one tool instead with the
 many of them!

  Usage:
          btrfs snapshot|-s source [dest/]name
                  Create a writeble snapshot of the subvolume
                  source with the name name in the dest
                  directory.
          btrfs delete|-D subvolume
                  Delete the subvolume subvolume.


 I back Mike up on the opinion that the short options aren't what I would
 expect. Personally I'd prefer a command line syntax like git: command
 action [sub-action options|arguments]

 Seriously, you (as well as Michel and Mike) raised some concerns about the
 command line syntax. The main issues are:
 1) possible confusion between the '-D' (delete) command and the '-d' (defrag)
 command. It was suggested to remove the short-form commands.
 2) some commands are not very self-explanatory

 Regarding point #1, I am against removing the short commands ('-s'...).
 If someone fears making a mistake, he has the option to use the long-form
 command. But I don't see any reason to force everyone else to use the
 long-form command.

The problem here is maintainability of scripts when people use the
short names.  I will refer to the ip command used in linux
networking.

It has these subcommands:
where  OBJECT := { link | addr | addrlabel | route | rule | neigh |
                   ntable | tunnel | maddr | mroute | monitor | xfrm }

Which are listed here:

ip link
ip addr
ip addrlabel
ip route
ip rule
ip neigh
ip ntable
ip tunnel
ip maddr
ip mroute
ip monitor
ip xfrm

You can shorten them as long as they are not ambiguous:

ip ro = ip route
ip ru = ip rule
ip a = ip addr
ip l = ip link

Those are the ones I personally use most.

There are no equivalent short options, and you don't have different
sets of people using different commands in scripts and howtos for
instance.  It builds a common base of knowledge and is easy to type
from memory.
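
A small sketch of this kind of prefix matching.  One assumption is made
loudly here: when a prefix fits more than one name (for instance "a" fits
both addr and addrlabel), this sketch lets the first name in the table win.
That rule is chosen for illustration and is not taken from the ip source;
resolve is a hypothetical helper.

```python
# Objects in lookup order. Assumption for this sketch: when a prefix fits
# several names, the earliest entry in this table is chosen.
OBJECTS = [
    "link", "addr", "addrlabel", "route", "rule", "neigh",
    "ntable", "tunnel", "maddr", "mroute", "monitor", "xfrm",
]


def resolve(prefix):
    """Map a possibly shortened object name to its full name."""
    if not prefix:
        raise ValueError("empty object name")
    for name in OBJECTS:
        if name.startswith(prefix):
            return name
    raise ValueError(f"unknown object: {prefix!r}")
```

Under these assumptions resolve("ro") gives "route", resolve("ru") gives
"rule", resolve("a") gives "addr" and resolve("l") gives "link", matching
the shortenings listed above.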

Commands that document themselves are good IMO.

ip route replace default via 1.2.3.4

Replace or set the current default route to go via IP address 1.2.3.4 (the
tool makes sure 1.2.3.4 is reachable by an already existing route and looks
up the layer 2 address for that IP).

It's not ip -r default -d 1.2.3.4

Now someone reading a howto or script with that hypothetical command
will have to find out if -r is route or -R is rule.  This is how the
btrfs commands currently look to me.

 If there is agreement, I am open to renaming the -D/delete command in order
 to reduce the conflict. For example, the -D/delete command may be renamed
 -R/remove. The conflict with the -r/resize command is not a problem because the
 former requires one argument, the latter two. Another renaming option may be
 -E/erase.


This just illustrates my point.  Btrfs has a rich feature set, and the
short option formats are only going to create more confusion because
some of them will only be usable with a subset of operations.  There
will be so many things you can do with btrfs that explicit long
options are needed to make it clear, even to yourself, what a command
does 6 months later.

Mike

Re: zero-length files in snapshots

2010-02-12 Thread Mike Fedyk

On Fri, Feb 12, 2010 at 7:19 AM, Josef Bacik jo...@redhat.com wrote:
 On Thu, Feb 11, 2010 at 08:50:48PM -0800, Mike Fedyk wrote:
 On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote:
     echo x1 > /mnt/x/d/foo.txt || exit 2
     btrfsctl -s /mnt/x/snap /mnt/x/d
 
  You're just missing a sync/fsync() between these two lines.
 
  We argued on IRC a while ago about whether this is a sensible default;
  cmason wants the no-sync version of snapshot creation to be available,
  but was amenable to the idea of changing the default to be sync before
  snapshot, since it was pointed out that no-one other than him had
  understood we were supposed to be running sync first.
 
 You're saying that it only snapshots the on-disk data structures and
 not the in-memory versions?  That can only lead to pain.  What do you
 do if something else writes during this race condition?  What would a
 sync do to solve this?  Have the semantics of sync been changed in btrfs
 from "sync everything that hasn't been written yet" to "sync this
 subvolume"?


 Welcome to delalloc.  You either get fast writes or you get all of your data
 on the disk every 5 seconds.  If you don't like delalloc, use ext3.  The data
 you've written to memory doesn't go down to disk unless explicitly told to,
 such as

 1) fsync - this is obvious
 2) vm - the vm has decided that this dirty page has been sitting around long
 enough and should be written back to the disk, could happen now, could happen
 10 years from now.
 3) sync - this is not as obvious.  sync doesn't mean anything other than start
 writing back dirty data to the fs, and returns before it's done.  For btrfs
 what that means is we run through _every_ inode that has delalloc pages
 associated with them and start writeback on them.  This will get most of your
 data into the current transaction, which is when the snapshot happens.

 If you don't want empty files, do something like this

 btrfsctl -c /dir/to/volume
 btrfsctl -s /dir/to/volume/snapshotname /dir/to/volume

 this is what we do with yum and its rollback plugin, and it works out quite
 well.  Thanks,


Then you broke your ordering guarantee.  If the data isn't there, the
meta-data shouldn't be there either.  So the snapshots made before the
data hits a transaction shouldn't have the file at all.

Re: zero-length files in snapshots

2010-02-12 Thread Mike Fedyk

On Fri, Feb 12, 2010 at 8:22 AM, Josef Bacik jo...@redhat.com wrote:
 On Fri, Feb 12, 2010 at 08:18:01AM -0800, Mike Fedyk wrote:
 On Fri, Feb 12, 2010 at 7:19 AM, Josef Bacik jo...@redhat.com wrote:
  On Thu, Feb 11, 2010 at 08:50:48PM -0800, Mike Fedyk wrote:
  On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote:
      echo x1 > /mnt/x/d/foo.txt || exit 2
      btrfsctl -s /mnt/x/snap /mnt/x/d
  
   You're just missing a sync/fsync() between these two lines.
  
   We argued on IRC a while ago about whether this is a sensible default;
   cmason wants the no-sync version of snapshot creation to be available,
   but was amenable to the idea of changing the default to be sync before
   snapshot, since it was pointed out that no-one other than him had
   understood we were supposed to be running sync first.
  
  You're saying that it only snapshots the on-disk data structures and
  not the in-memory versions?  That can only lead to pain.  What do you
  do if something else writes during this race condition?  What would a
  sync do to solve this?  Have the semantics of sync been changed in btrfs
  from "sync everything that hasn't been written yet" to "sync this
  subvolume"?
 
 
  Welcome to delalloc.  You either get fast writes or you get all of your
  data on the disk every 5 seconds.  If you don't like delalloc, use ext3.
  The data you've written to memory doesn't go down to disk unless
  explicitly told to, such as
 
  1) fsync - this is obvious
  2) vm - the vm has decided that this dirty page has been sitting around
  long enough and should be written back to the disk, could happen now,
  could happen 10 years from now.
  3) sync - this is not as obvious.  sync doesn't mean anything other than
  start writing back dirty data to the fs, and returns before it's done.
  For btrfs what that means is we run through _every_ inode that has
  delalloc pages associated with them and start writeback on them.  This
  will get most of your data into the current transaction, which is when
  the snapshot happens.
 
  If you don't want empty files, do something like this
 
  btrfsctl -c /dir/to/volume
  btrfsctl -s /dir/to/volume/snapshotname /dir/to/volume
 
  this is what we do with yum and its rollback plugin, and it works out quite
  well.  Thanks,
 

 Then you broke your ordering guarantee.  If the data isn't there, the
 meta-data shouldn't be there either.  So the snapshots made before the
 data hits a transaction shouldn't have the file at all.

 Nope, what is happening is

 fd = creat(file)  - this is metadata that needs to be written
 write(fd, buf)      - because of delalloc there is no metadata that is
 created for this operation, therefore it doesn't need to be written out.
 close(fd)

 so the file has metadata created for it, which needs to be written out.
 Because of delalloc there are no extents created or anything for the data,
 therefore there is nothing to write.  Thanks,
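
For reference, a minimal sketch of the creat/write/close sequence above
with an explicit fsync added before the close, which is the step the thread
says is missing.  write_durably is a hypothetical helper written for this
note; whether fsync of one file is enough for a particular snapshot tool is
an assumption, not something this sketch demonstrates.

```python
import os


def write_durably(path, data):
    """Create path, write all of data, and fsync before closing."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A caller would run write_durably(path, b"x1\n") and only then take the
snapshot.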


So file creation is effectively synchronous?  So I could create a
benchmark that creates millions of files and it would be limited to
the IO OP performance of the disks?

Why does file creation need to hit the disk before the contents (with
limits to size of data that can fit in one transaction)?

Re: [PATCH 0/2] btrfs: a new tool to manage a btrfs filesystem

2010-02-12 Thread Mike Fedyk

On Fri, Feb 12, 2010 at 11:01 AM, Goffredo Baroncelli
kreij...@gmail.com wrote:
 Usage:
        btrfs delete|-D subvolume
                Delete the subvolume subvolume.
        btrfs defrag|-d file|dir [file|dir...]
                Defragment a file or a directory.

I think the short options should be removed or else you'll still have
the easy misuses of btrfs -d and btrfs -D.

The best example would be the ip command, which has commands and
-[a-z] options that do different types of things.  For instance, all
of the short options are applicable to all commands and change the
verbosity or format of the output.

Re: Kernel BUG on mounting BtrFS / after reboot

2010-02-12 Thread Mike Fedyk

On Fri, Feb 12, 2010 at 1:04 PM, Alex Elsayed eternal...@gmail.com wrote:
 I'm getting a rather nasty BUG when I try to mount this filesystem,
 _including_ when I specify -o ro. I'm unsure what caused it, but the problem
 manifested after my computer hardlocked while reading my RSS feeds, complete
 with flashing lights. After I rebooted it, the screen filled with panic
 messages when the initramfs tried to mount it RO to pivot into. I am running
 2.6.33-rc6. The BUG message is as follows:

Is this the bug you mentioned on IRC that you fixed somehow?

If so please post the steps you performed.

Re: zero-length files in snapshots

2010-02-11 Thread Mike Fedyk

On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote:
    echo x1 > /mnt/x/d/foo.txt || exit 2
    btrfsctl -s /mnt/x/snap /mnt/x/d

 You're just missing a sync/fsync() between these two lines.

 We argued on IRC a while ago about whether this is a sensible default;
 cmason wants the no-sync version of snapshot creation to be available,
 but was amenable to the idea of changing the default to be sync before
 snapshot, since it was pointed out that no-one other than him had
 understood we were supposed to be running sync first.

You're saying that it only snapshots the on-disk data structures and
not the in-memory versions?  That can only lead to pain.  What do you
do if something else writes during this race condition?  What would a
sync do to solve this?  Have the semantics of sync been changed in btrfs
from "sync everything that hasn't been written yet" to "sync this
subvolume"?

From what I understand, what should be happening is much like what LVM
should do:

step 1: defer all other writes to subvolume (userspace processes get
stuck in D state until step 4)
step 2: sync all changes not already committed to subvolume
step 3: create snapshot
step 4: resume writes from userspace

Now if all 4 steps can be done with in-memory data structures without
forcing data (not necessarily meta-data) to disk, so much the better.
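
The four steps above can be sketched as a toy in-memory model.  This is
only an illustration of the ordering being asked for; ToySubvolume and its
fields are invented here and say nothing about how btrfs or LVM actually
implement snapshots.

```python
import threading


class ToySubvolume:
    """In-memory toy model of the four steps; not how btrfs is implemented."""

    def __init__(self):
        self._lock = threading.Lock()   # held while writers are deferred
        self._dirty = {}                # written but not yet committed
        self._committed = {}            # state a snapshot is taken from

    def write(self, name, data):
        # Writers block here while a snapshot holds the lock (step 1).
        with self._lock:
            self._dirty[name] = data

    def snapshot(self):
        with self._lock:                            # step 1: defer writes
            self._committed.update(self._dirty)     # step 2: commit pending
            self._dirty.clear()
            snap = dict(self._committed)            # step 3: create snapshot
        return snap                                 # step 4: lock released
```

In this model a file written just before snapshot() always appears in the
snapshot with its contents, and later writes do not change the snapshot.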