Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: 512MB. 'free' reports 75MB, 419MB free. I originally noticed the problem on really real hardware (thinkpad T61p), however. If you can easily reproduce it could you try a git bisect? Do we have a known good kernel? I looked back through the thread and didn't see any reports where the postgres test on ext4 passed in this config. 2.6.34.something. -- Any chance a newer kernel can be tested to be found good? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The value displayed by 'ls -s' command is strange.
On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500: Hi, I think that the disk allocation size of each file should increase monotonically while the file is being written. But it sometimes returns to 0. Is this correct? Well, there's a window during the processing of delayed allocation where we don't have the bytes recorded as delalloc and we don't have the bytes recorded in the inode yet. That's why they are showing up as zero. We don't call inode_add_bytes() until after we insert the extent, but we drop the delalloc byte count on the file before the IO is done. Fixing it will be a little tricky because all the extent accounting assumes the inode_add_bytes happens at extent insertion time. How does opening the inode with O_APPEND during this window know where to write the bytes? If it's a pointer/cursor to the EOF, then that size could be used during the window. Is that right? The result of the test at 2.6.37-rc4 is shown below. (see inode no.
291)

# df -T /test14
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sdd14   btrfs     4162560      8736   3709440   1% /test14
# dd if=/dev/zero of=/test14/dir/as001.26603 bs=1M count=100
# dd if=/dev/zero of=/test14/dir/as002.26603 bs=1M count=200
# dd if=/dev/zero of=/test14/dir/sy001.26603 bs=1M count=300 oflag=direct
# dd if=/dev/zero of=/test14/dir/as003.26603 bs=1M count=400
# ls -lis /test14/dir
total 406528
  288      0 -rw-r--r-- 1 root root 104857600 Dec 7 15:07 as001.26603
  289      0 -rw-r--r-- 1 root root 209715200 Dec 7 15:07 as002.26603
- 291  99328 -rw-r--r-- 1 root root 419430400 Dec 7 15:08 as003.26603
  290 307200 -rw-r--r-- 1 root root 314572800 Dec 7 15:08 sy001.26603
# sleep 3
# ls -lis /test14/dir
total 406528
  288      0 -rw-r--r-- 1 root root 104857600 Dec 7 15:07 as001.26603
  289      0 -rw-r--r-- 1 root root 209715200 Dec 7 15:07 as002.26603
- 291  99328 -rw-r--r-- 1 root root 419430400 Dec 7 15:08 as003.26603
  290 307200 -rw-r--r-- 1 root root 314572800 Dec 7 15:08 sy001.26603
# sleep 3
# ls -lis /test14/dir
total 307200
  288      0 -rw-r--r-- 1 root root 104857600 Dec 7 15:07 as001.26603
  289      0 -rw-r--r-- 1 root root 209715200 Dec 7 15:07 as002.26603
- 291      0 -rw-r--r-- 1 root root 419430400 Dec 7 15:08 as003.26603
  290 307200 -rw-r--r-- 1 root root 314572800 Dec 7 15:08 sy001.26603
# sleep 3
# ls -lis /test14/dir
total 409600
  288 102400 -rw-r--r-- 1 root root 104857600 Dec 7 15:07 as001.26603
  289      0 -rw-r--r-- 1 root root 209715200 Dec 7 15:07 as002.26603
- 291      0 -rw-r--r-- 1 root root 419430400 Dec 7 15:08 as003.26603
  290 307200 -rw-r--r-- 1 root root 314572800 Dec 7 15:08 sy001.26603
# sync
# ls -lis /test14/dir
total 1024000
  288 102400 -rw-r--r-- 1 root root 104857600 Dec 7 15:07 as001.26603
  289 204800 -rw-r--r-- 1 root root 209715200 Dec 7 15:07 as002.26603
- 291 409600 -rw-r--r-- 1 root root 419430400 Dec 7 15:08 as003.26603
  290 307200 -rw-r--r-- 1 root root 314572800 Dec 7 15:08 sy001.26603

The trace result of btrfs_getattr() is shown below.
Dec 7 15:08:03 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
Dec 7 15:08:06 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 delalloc_bytes:101711872
Dec 7 15:08:09 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
Dec 7 15:08:12 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 delalloc_bytes:0
Dec 7 15:08:18 luna kernel: ino:291 blocks:819200 i_blocks:819200 i_bytes:0 delalloc_bytes:0

Regards,
Itoh
Re: The value displayed by 'ls -s' command is strange.
On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500: On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500: Hi, I think that the disk allocation size of each file should increase monotonically while the file is being written. But it sometimes returns to 0. Is this correct? Well, there's a window during the processing of delayed allocation where we don't have the bytes recorded as delalloc and we don't have the bytes recorded in the inode yet. That's why they are showing up as zero. We don't call inode_add_bytes() until after we insert the extent, but we drop the delalloc byte count on the file before the IO is done. Fixing it will be a little tricky because all the extent accounting assumes the inode_add_bytes happens at extent insertion time. How does opening the inode with O_APPEND during this window know where to write the bytes? If it's a pointer/cursor to the EOF, then that size could be used during the window. Is that right? This counter records the number of blocks allocated to the file, and reading it with ls -l or stat is somewhat racy by nature. Most of the time it's fine; btrfs just has a really big window where the results from ls -l seem wrong. I see. Is it using per-CPU variables or something similar? But the counter really means nothing to the btrfs internals. When we do file operations we go based on the extent pointers we find in the tree and i_size (i_size is strictly maintained). Would it be too heavy an operation to have stat walk the btrfs tree to get its data? The incorrect results are confusing, but they don't hurt the metadata itself.
Re: The value displayed by 'ls -s' command is strange.
On Tue, Dec 7, 2010 at 12:15 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Mike Fedyk's message of 2010-12-07 15:07:08 -0500: On Tue, Dec 7, 2010 at 11:29 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Mike Fedyk's message of 2010-12-07 14:16:55 -0500: On Tue, Dec 7, 2010 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Tsutomu Itoh's message of 2010-12-07 02:59:52 -0500: Hi, I think that the disk allocation size of each file should increase monotonically while the file is being written. But it sometimes returns to 0. Is this correct? Well, there's a window during the processing of delayed allocation where we don't have the bytes recorded as delalloc and we don't have the bytes recorded in the inode yet. That's why they are showing up as zero. We don't call inode_add_bytes() until after we insert the extent, but we drop the delalloc byte count on the file before the IO is done. Fixing it will be a little tricky because all the extent accounting assumes the inode_add_bytes happens at extent insertion time. How does opening the inode with O_APPEND during this window know where to write the bytes? If it's a pointer/cursor to the EOF, then that size could be used during the window. Is that right? This counter records the number of blocks allocated to the file, and reading it with ls -l or stat is somewhat racy by nature. Most of the time it's fine; btrfs just has a really big window where the results from ls -l seem wrong. I see. Is it using per-CPU variables or something similar? OK, so to make sure I fully understand, I'm going to write some pseudocode based on your description. Our stat function returns the block count in the inode plus the number of bytes we have accounted as delayed allocation.

    stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

As we do writes to the file, the delayed allocation count goes up, and then eventually we decide we need to do some IO.
Before we do the IO, we have to decide where on the disk to write the extents.

    inode_a2 = inode_a1

inode_a1 and inode_a2 are the same inode, but inode_a2 has a different list of extents and is not written yet (in the case of appending, most of the extents will be the same in the two extent lists, but inode_a2 will have more extents for the newly appended data). Once that is decided, we decrement the count of delayed allocation bytes. This is when stat starts returning the wrong answer.

    inode_a2.bytes += inode_a1_delayed_allocation_bytes
    inode_a1_delayed_allocation_bytes -= inode_a1_delayed_allocation_bytes
    stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

Is it possible to have stat read from inode_a2 during this window? So it would instead be:

    stat = inode_a2.bytes

Then we do the IO, and when the IO is done we actually insert the file extents into the file metadata. This is when stat starts returning the right answer again.

    /* implicit when write completes */
    inode_a1 = inode_a2
    kfree(inode_a2)
    stat = inode_a1.bytes + inode_a1_delayed_allocation_bytes

The whole setup sounds strange, but this is how btrfs implements the semantics of data=ordered. We don't update the file to point to the new blocks until after the IO is done, so we never have to wait on the data IO before we can do a transaction commit. It avoids all kinds of latencies with fsync and other problems. One easy solution is to just add another counter in the in-memory inode for the number of bytes in flight that aren't accounted for in other places. But I'd rather not make the inode any bigger, so I'll have to think about whether we can solve this another way.
Re: 800 GByte free, but no space left
On Thu, Dec 2, 2010 at 10:23 AM, Helmut Hullen hul...@t-online.de wrote: Btrfs Btrfs v0.19 btrfs in the kernel has been version 0.19 for a *long* time. The version number there may never change. How do you encode a feature mask in a version number? Some features may be in one tree but not upstreamed all together and other such minutiae. What you need to do is use a more recent kernel than 2.6.32 if you want to use btrfs (modulo backports, but let's not talk about that right now). So if you're using a kernel older than 2.6.36, then you should probably upgrade.
Re: What to do about subvolumes?
On Wed, Dec 1, 2010 at 3:32 PM, Freddie Cash fjwc...@gmail.com wrote: On Wed, Dec 1, 2010 at 1:28 PM, Hugo Mills hugo-l...@carfax.org.uk wrote: On Wed, Dec 01, 2010 at 12:24:28PM -0800, Freddie Cash wrote: On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote: The idea is you are only charged for what blocks you have on the disk. Thanks, My point was that it's perfectly possible to have blocks on the disk that are effectively owned by two people, and that the person to charge for those blocks is, to me, far from clear. You either end up charging twice for a single set of blocks on the disk, or you end up in a situation where one person's actions can cause another person's quota to fill up. Neither of these is particularly obvious behaviour. As a sysadmin and as a user, quotas shouldn't be about physical blocks of storage used but should be about logical storage used. IOW, if the filesystem is compressed, using 1 GB of physical space to store 10 GB of data, my quota used should be 10 GB. Similar for deduplication. The quota is based on the storage *before* the file is deduped. Not after. Similar for snapshots. If UserA has 10 GB of quota used, I snapshot their filesystem, then my quota used would be 10 GB as well. As data in my snapshot changes, my quota used is updated to reflect that (change 1 GB of data compared to snapshot, use 1 GB of quota). So if I've got 10G of data, and I snapshot it, I've just used another 10G of quota? Sorry, forgot the per user bit above. If UserA has 10 GB of data, then UserB snapshots it, UserB's quota usage is 10 GB. If UserA has 10 GB of data and snapshots it, then only 10 GB of quota usage is used, as there is 0 difference between the snapshot and the filesystem. As UserA modifies data, their quota usage increases by the amount that is modified (ie 10 GB data, snapshot, modify 1 GB data == 11 GB quota usage). 
If you combine the two scenarios, you end up with: - UserA has 10 GB of data == 10 GB quota usage - UserB snapshots UserA's filesystem (clone), so UserB has 10 GB quota usage (even though 0 blocks have changed on disk) Please define where the owner of a subvolume/snapshot is stored. To my knowledge when you make a snapshot, you have the same set of files with the same set of owners and groups. Whatever user does the snapshot this does not change this unless chown or chgrp are used. Also a non-root user (or a process without CAP_whatever) should not be able to snapshot a subvolume where the root directory of that subvolume is not owned by the user attempting the snapshot. If you do not do so then you end up with the same security and quota issues that hard links have when you don't have separate filesystems. You could have separate subvolumes for / and /home/foo and user foo could snapshot / to /home/foo/exploit_later_001 and then foo can just wait for an exploit to come along for one of the binaries or libs in /home/foo/exploit_later_001 and own. Yes, snapshot creation should be more restricted than hard links, for good reason. I have other questions but the answer to this fundamental game changer may solve many of the mentioned issues. - UserA snapshots UserA's filesystem == no change to quota usage (no blocks on disk have changed) - UserA modifies 1 GB of data in the filesystem == 1 GB new quota usage (11 GB total) (1 GB of blocks owned by UserA have changed, plus the 10 GB in the snapshot) - UserB still only has 10 GB quota usage, since their snapshot hasn't changed (0 blocks changed) If UserA deletes their filesystem and all their snapshots, freeing up 11 GB of quota usage on their account, UserB's quota will still be 10 GB, and the blocks on the disk aren't actually removed (still referenced by UserB's snapshot). Basically, within a user's account, only the data unique to a snapshot should count toward the quota. 
Across accounts, the original (root) snapshot would count completely to the new user's quota, and then only data unique to subsequent snapshots would count. I hope that makes it more clear. :) All the different layers and whatnot get confusing. :)
Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly
On Wed, Dec 1, 2010 at 8:28 PM, Yan, Zheng yanzh...@21cn.com wrote: On Thu, Dec 2, 2010 at 11:42 AM, liubo liubo2...@cn.fujitsu.com wrote: On 12/01/2010 06:20 PM, liubo wrote: When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at transaction start time. This patch may lead to a btrfs panic, since btrfs allows transactions in the readonly fs state, which is a bit weird, and btrfs does not even check the transaction returned from start_transaction, although it may return -ENOMEM. btrfs may do log replay even when mounted readonly. What part is logged besides tree roots and/or superblocks?
Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 12:02 AM, Li Zefan l...@cn.fujitsu.com wrote: (Cc: Sage Weil s...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only, and if they are meant to be read-write, a flag can be set at creation time (and changeable at a later time as well, of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the CLI level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think?
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 12:41 PM, David Arendt ad...@prnet.org wrote: On 11/29/10 21:02, Mike Fedyk wrote: On Mon, Nov 29, 2010 at 12:02 AM, Li Zefan l...@cn.fujitsu.com wrote: (Cc: Sage Weil s...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only, and if they are meant to be read-write, a flag can be set at creation time (and changeable at a later time as well, of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the CLI level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think? I completely agree with you. I think lots of people use snapshots for backup purposes, and these shouldn't be writable by default.
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On Mon, Nov 29, 2010 at 12:10 PM, Josef Bacik jo...@redhat.com wrote: On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic. Meanwhile, they are very ugly and should be handled more properly. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, we can force the filesystem readonly when it hits errors, which is what this patchset does. With BUG_ON() replaced by the interface provided in this patchset, we will get error information via dmesg. Since btrfs is now readonly, we can save our data safely and umount it; then running btrfsck is recommended. In these ways, we can protect our filesystem from panics caused by those BUG_ON()s.
---
 fs/btrfs/ctree.h       |   21 ++
 fs/btrfs/disk-io.c     |   23 +++
 fs/btrfs/super.c       |  100 ++-
 fs/btrfs/transaction.c |    7 +++
 4 files changed, 148 insertions(+), 3 deletions(-)

Overall seems sane, but what about kernels that don't make these checks? I'm ok with "well, sucks for them" as an answer, just want to make sure we've at least thought about it. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only; as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Did you mean "there's no reason to"? Also I guess you mean this in the case when there is no redundancy (single and raid0), as the other cases should recover from a spurious EIO at run time.
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 1:31 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: This may sound excessive, as does any new concept introduced that late in development, but readonly/writable snapshots could be further differentiated by naming the latter clones. This way the end user would naturally perceive a snapshot as a read-only PIT (point-in-time) fs image, while a clone would naturally refer to a (writable) head fork. I'm not sure we want to take all of the terminology that zfs uses, as it may also bring its perceived drawbacks. Isn't there some additional overhead for a zfs clone compared to a snapshot? I'm not very familiar with zfs, so that's why I ask.
Re: Update to Project_ideas wiki page
On Wed, Nov 17, 2010 at 7:12 AM, Bart Noordervliet b...@noordervliet.net wrote: On Wed, Nov 17, 2010 at 15:31, Hugo Mills hugo-l...@carfax.org.uk wrote: On Tue, Nov 16, 2010 at 10:19:45PM -0500, Chris Ball wrote: == Changing RAID levels == We need ioctls to change between different raid levels. Some of these are quite easy -- e.g. for RAID0 to RAID1, we just halve the available bytes on the fs, then queue a rebalance. I would be interested in the rebalancing ioctls, and in RAID level management. I'm still very much trying to learn the basics, though, so I may go very slowly at first... Hugo. Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy, as has been discussed previously in the "Raid1 with 3 drives" thread of March this year? I.e. abandon the burdened raid* terminology in favour of something that makes more sense for a filesystem. Mostly this would involve a discussion about what terms would make most sense, though some changes in the behaviour of btrfs redundancy modes may be warranted if they make things more intuitive. I could help you make these changes in your patches, or write my own patches against yours, though I'm also completely new to kernel development. That would inherently solve the need to convert between dup and raid1 as well. Why those are separate, and why dup does not become raid1 when there are N > 1 drives, is beyond me.
Re: Btrfs-progs: Update man page for mixed data+metadata option.
On Fri, Nov 12, 2010 at 6:28 AM, Marek Otahal markota...@gmail.com wrote: On Friday 12 of November 2010 18:44:12 you wrote: On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik jo...@redhat.com wrote: On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote: On 11/11/10 23:52, Josef Bacik wrote: This feature incurs a performance penalty in larger filesystems, it is recommended for use with filesystems of 1 GiB or smaller. Maybe slightly stronger, for example: This feature incurs a performance penalty for larger filesystems and it is ONLY recommended for use with filesystems of 1 GiB or smaller. Is it worth having a check and a warning printed if a user does try to make a filesystem larger than 1GiB with this option? Just in case they don't RTFM... No, because depending on your usage it's actually kind of useful for anything less than 5 GiB, and you're only looking at about a 5-10% perf degradation when using it on larger filesystems. Thanks, Then a warning of 10% slowdown if > 10GB would be good. It's surprising how many will just read some forum post and not concern themselves with the docs at all. And making them type "yes" if > 100GB is probably a good idea too... My 2c: I'm against bloating the program just because of people who don't RTFM. Just mention it clearly in the docs and that's enough; Linux does what it's asked for, not the "Are you really really sure you want to do this?" known from some other OS. Anyway, btrfs-progs would probably be run by a user with root I was thinking of what ssh does when it sees a changed key...
Re: [patch 0/2] Control filesystem balances (kernel side)
[ sorry for breaking the thread, I'm replying from the archives; I was unsubbed after a mail server issue and didn't notice till now... ] On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote: balance- info on balancing Hugo Mills wrote: For the one-value-per-file rule of sysfs, this should probably be balance_expected and balance_completed, each holding a count of block groups. I'd name them balance_chunks_expected and balance_chunks_completed.
Re: [PATCH] Add the btrfs filesystem label command
On Mon, Sep 13, 2010 at 12:24 PM, Goffredo Baroncelli kreij...@gmail.com wrote:
+int get_label(char *btrfs_dev)
+{
+
+	int ret;
+	ret = check_mounted(btrfs_dev);
+	if (ret < 0)
+	{
+		fprintf(stderr, "FATAL: error checking %s mount status\n", btrfs_dev);
+		return -1;
+	}
+
+	if(ret != 0)
+	{
+		fprintf(stderr, "FATAL: the filesystem has to be unmounted\n");
+		return -2;
+	}
+	get_label_unmounted(btrfs_dev);
+	return 0;
+}
+
+

Why can't the label be read while the fs is mounted? It shouldn't hurt anything. I can read the superblock on my ext3 fs while it's mounted... This is what people have come to expect.

--- a/utils.c
+++ b/utils.c
@@ -638,6 +638,39 @@ int check_mounted(char *file)
 	return ret;
 }
 
+/* Gets the mount point of btrfs filesystem that is using the specified device.
+ * Returns 0 is everything is good, <0 if we have an error.
+ * TODO: Fix this fucntion and check_mounted to work with multiple drive BTRFS
+ * setups.
+ */

Typo: s/fucntion/function/g

+int get_mountpt(char *dev, char *mntpt, size_t size)
+{
+	struct mntent *mnt;
+	FILE *f;
+	int ret = 0;
+
+	f = setmntent("/proc/mounts", "r");
+	if (f == NULL)
+		return -errno;
+
+	while ((mnt = getmntent(f)) != NULL )
+	{
+		if (strcmp(dev, mnt->mnt_fsname) == 0)
+		{
+			strncpy(mntpt, mnt->mnt_dir, size);
+			break;
+		}
+	}
+
+	if (mnt == NULL)
+	{
+		/* We didn't find an entry so lets report an error */
+		ret = -1;
+	}
+
+	return ret;
+}
+
 struct pending_dir {
 	struct list_head list;
 	char name[256];
@@ -820,3 +853,27 @@ char *pretty_sizes(u64 size)
 	return pretty;
 }
 
+/*
+ * Checks to make sure that the label matches our requirements.
+ * Returns:
+	0	if everything is safe and usable
+	-1	if the label is too long
+	-2	if the label contains an invalid character
+ */
+int check_label(char *input)
+{
+	int i;
+	int len = strlen(input);
+
+	if (len > BTRFS_LABEL_SIZE) {
+		return -1;
+	}
+
+	for (i = 0; i < len; i++) {
+		if (input[i] == '/' || input[i] == '\\') {
+			return -2;
+		}
+	}
+
+	return 0;
+}

How can one char equal two chars?

	input[i] == '\\'

This should never be able to happen. Right?
Re: [PATCH] Btrfs: Remove useless condition
On Sun, Sep 12, 2010 at 6:56 AM, Jaswinder Singh Rajput jaswinderli...@gmail.com wrote: Hello, On Sun, Sep 12, 2010 at 5:59 PM, Johannes Weiner han...@cmpxchg.org wrote: On Sun, Sep 12, 2010 at 04:32:20PM +0530, Jaswinder Singh Rajput wrote: if (ret) is useless, as it will never be NULL: in the previous statement we are setting ret = prev for !ret. If there is no match and no extent below the given file offset, `prev' will be NULL as well, no? So the check is not useless; it prevents throwing out a cached success in case of a lookup failure. Got it !! Wouldn't it be clearer and easier to read if prev was checked directly instead of checking ret after it becomes the same as prev?
Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)
On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor daniel.tay...@wdc.com wrote: Just an FYI reminder. The original test (2K files) is utterly pathological for disk drives with 4K physical sectors, such as those now shipping from WD, Seagate, and others. Some of the SSDs have larger (16K) or smaller blocks (2K). There is also the issue of btrfs over RAID (which I know is not entirely sensible, but which will happen). The absolute minimum allocation size for data should be the same as, and aligned with, the underlying disk block size. If that results in underutilization, I think that's a good thing for performance, compared to read-modify-write cycles to update partial disk blocks. Block size = 4k. Btrfs packs smaller objects into the blocks in certain cases.
Re: A couple of questions
On Mon, May 31, 2010 at 11:06 AM, Paul Millar paul.mil...@desy.de wrote: Hi Chris, On Thursday 27 May 2010 18:00:44 Chris Mason wrote: I'd suggest that you look at T10 DIF and DIX, which are targeted at exactly this kind of thing. We're looking at integrating dif/dix into btrfs at some point. I've been keeping half-an-eye on T10's work in ensuring end-to-end integrity. That you guys are planning to integrate dif/dix support is certainly welcome news! In my use-case (a file-server that receives a new file from a remote client), I believe that, to ensure end-to-end integrity, the server software would have to push the client-supplied checksum into the FS when writing a new file. (I believe there are some T10 slides somewhere that show this use-case) -- or (equivalently) the server software obtains the FS checksum for the file and matches it against the client-supplied value. I'm deliberately taking the simplest case when the client has chosen the same checksum algorithm as the FS uses. In reality, this may not be the case, but we can probably cope with that. My concern is that, if the server software doesn't push the client-provided checksum, then the FS checksum (plus T10 DIF/DIX) would not provide a rigorous assurance that the bytes are the same. Without this assurance, corruption could still occur; for example, within the server's memory. Have you taken into account the boundaries of the data checksums? Your app may checksum per file or some logical partition in the file format. Btrfs does the checksum per-extent, so unless you keep track of where the extent boundaries are, that checksum will be useless to the userspace app. Also the app would be tied specifically to a storage technology. No matter how great foo might be, not everyone's going to use it. Also, are you going to get this info over nfs, cifs, lustre, gluster, ceph, foo, bar and baz?
Re: Disk space accounting and subvolume delete
On Mon, May 31, 2010 at 12:01 PM, Bruce Guenter br...@untroubled.org wrote:
> On Wed, May 12, 2010 at 01:02:07PM +0800, Yan, Zheng wrote:
>> Dropping a tree can be lengthy. It's not good to let sync wait for hours. For most Linux filesystems, 'sync' just forces a transaction/journal commit. I don't think they wait for large operations that can span multiple transactions to complete.
>
> What happens to the consistency of the filesystem if a crash happens during this process?

There's a good test case for you to try. Let us know what you find.
Re: [patch 5/11] btrfs: remove unneeded null check in btrfs_rename()
On Sat, May 29, 2010 at 2:45 AM, Dan Carpenter erro...@gmail.com wrote:
> old_inode cannot be null here, because we dereference it unconditionally throughout the function.
>
> Signed-off-by: Dan Carpenter erro...@gmail.com
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fa6ccc1..0bc29be 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6487,10 +6487,8 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>          * make sure the inode gets flushed if it is replacing
>          * something.
>          */
> -       if (new_inode && new_inode->i_size &&
> -           old_inode && S_ISREG(old_inode->i_mode)) {
> +       if (new_inode && new_inode->i_size && S_ISREG(old_inode->i_mode))
>                 btrfs_add_ordered_operation(trans, root, old_inode);
> -       }

I think code like this is here because there are still a lot of features being added to btrfs, and it's easier to keep the additional checks than to keep adding and removing them as the code changes.
Re: Confused by performance
On Mon, May 24, 2010 at 2:08 PM, K. Richard Pixley r...@noir.com wrote:
> I've just started to work with btrfs, so I started with a benchmark. On four identical servers (2 dual-core CPUs, single local disk), I built filesystems: ext3, ext4, nilfs2, and btrfs. I checked out a sizable code tree and timed a build. The build is parallelized to use 4 threads when possible.
>
> I'm seeing similar build times on ext[34] and nilfs2, but I'm seeing almost double the times for btrfs using default options. And I'm having trouble reconciling this performance cost with the benchmarks I'm seeing around the net.
>
> Is this a common result? Is there a trick to getting ext4-competitive performance out of btrfs? Is my application a poor choice for btrfs? Am I missing something obvious here?

Please make sure you're testing with the latest btrfs from git or Linus' latest kernel.
Re: RAID[56] status?
On Sun, May 23, 2010 at 1:55 PM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:
> Hi all
>
> It's about a year now since I saw the first posts about RAID[56] in Btrfs. Has this gotten any further?

There are patches in development. Nothing ready to test yet.
Re: [PATCH 3/6] direct-io: do not merge logically non-contiguous requests
On Fri, May 21, 2010 at 10:03 AM, Josef Bacik jo...@redhat.com wrote:
> Btrfs cannot handle having logically non-contiguous requests submitted. For example if you have
>
>   Logical:  [0-4095][HOLE][8192-12287]
>   Physical: [0-4095]      [4096-8191]
>
> Normally the DIO code would put these into the same BIO's. The problem is we need to know exactly what offset is associated with what BIO so we can do our checksumming and unlocking properly, so putting them in the same BIO doesn't work. So add another check where we submit the current BIO if the physical blocks are not contiguous OR the logical blocks are not contiguous.
>
> Signed-off-by: Josef Bacik jo...@redhat.com
> ---
> V1->V2
> -Be more verbose in the in-code comment
>
>  fs/direct-io.c |   20 ++++++++++++++++++--
>  1 files changed, 18 insertions(+), 2 deletions(-)

Btrfs has been pretty much self-contained (working well compiled against 2.6.32, for example). Is there a way that this wouldn't just start silently breaking for people compiling the latest btrfs with dkms against older kernels?
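The merge rule the patch describes can be sketched as a predicate: a block may join the current bio only if it is contiguous with it both physically and logically. `struct dio_block` and `can_merge()` are hypothetical names for illustration, not the actual fs/direct-io.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one mapped block in a direct I/O request. */
struct dio_block {
    uint64_t logical;   /* byte offset in the file */
    uint64_t physical;  /* byte offset on the device */
    uint64_t len;       /* length in bytes */
};

/*
 * The next block may join the current bio only if it continues the bio
 * both on disk and in the file; otherwise the current bio is submitted
 * first and a new one is started.
 */
static bool can_merge(const struct dio_block *cur, const struct dio_block *next)
{
    assert(cur->len > 0);
    bool phys_contig = cur->physical + cur->len == next->physical;
    bool logical_contig = cur->logical + cur->len == next->logical;
    return phys_contig && logical_contig;
}
```

With the example from the patch description, the first block (logical 0, physical 0) and the block after the hole (logical 8192, physical 4096) are physically contiguous but not logically contiguous, so the current bio has to be submitted before the second block is added.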
Re: [PATCH 04/10] btrfs: Add error check for add_to_page_cache_lru
On Thu, May 20, 2010 at 12:18 AM, Miao Xie mi...@cn.fujitsu.com wrote:
> From: Liu Bo liubo2...@cn.fujitsu.com
>
> If add_to_page_cache_lru() returns -EEXIST, it indicates the page that belongs to this page_index has been added and this readahead action can go on to next page. If add_to_page_cache_lru() returns -ENOMEM, it should break for no memory left.
>
> Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/compression.c |   19 ++++++++++++++++---
>  1 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 1d54c53..1bd4d92 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -480,10 +480,23 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>                 if (!page)
>                         break;
>
> -               if (add_to_page_cache_lru(page, mapping, page_index,
> -                                                       GFP_NOFS)) {
> +               ret = add_to_page_cache_lru(page, mapping, page_index,
> +                                           GFP_NOFS);
> +               if (ret) {
>                         page_cache_release(page);
> -                       goto next;
> +
> +                       /*
> +                        * -EEXIST indicates the page has been added, so
> +                        * it can move on to next page.
> +                        */
> +                       if (ret == -EEXIST) {
> +                               misses++;
> +                               if (misses > 4)
> +                                       break;

Shouldn't this use a preprocessor constant (a #define) instead of hard-coding the compression sensitivity or readahead tuning? That way it'll be set in one place.

> +                               goto next;
> +                       }
> +
> +                       break;
>                 }
>
>                 end = last_offset + PAGE_CACHE_SIZE - 1;
> --
> 1.6.5.2
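The suggestion to replace the hard-coded 4 with a single named constant could look roughly like this. `RA_MAX_CACHED_MISSES` and `ra_after_insert()` are hypothetical names; the sketch assumes the comparison in the patch is `misses > 4`, and it separates the -EEXIST case (page already cached, keep going) from other errors (stop), as the patch description does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunable: stop readahead after this many already-cached pages. */
#define RA_MAX_CACHED_MISSES 4

/*
 * Decide what a readahead loop should do after trying to insert a page into
 * the page cache. ret is the insert result (0 or a negative errno).
 * Returns 1 to continue with the next page, 0 to stop.
 */
static int ra_after_insert(int ret, int *misses)
{
    assert(misses != NULL);
    if (ret == 0)
        return 1;                               /* inserted: keep going */
    if (ret == -EEXIST) {
        (*misses)++;                            /* page was already cached */
        return *misses <= RA_MAX_CACHED_MISSES;
    }
    return 0;                                   /* -ENOMEM or other error: stop */
}
```

Changing the readahead sensitivity then means editing one `#define` rather than hunting for a literal inside the loop.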
Re: [PATCH 07/10] btrfs: fix wrong ctime when adding link
On Thu, May 20, 2010 at 12:22 AM, Miao Xie mi...@cn.fujitsu.com wrote:
> the ctime of file has not been updated when I create a link for it.
>
> Steps to reproduce:
>   # touch file1
>   # stat -c %Z file1
>   1273592239
>   # link flink1 file1
>   # stat -c %Z file1
>   1273592239      <-- have not been updated
>
> This patch fix this problem.
>
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
>  fs/btrfs/inode.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a85b90c..5271887 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4218,8 +4218,12 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
>         btrfs_i_size_write(parent_inode, parent_inode->i_size +
>                                          name_len * 2);
>
> -       parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
> -       ret = btrfs_update_inode(trans, root, parent_inode);
> +       parent_inode->i_mtime = parent_inode->i_ctime = inode->i_ctime
> +               = CURRENT_TIME;
> +
> +       ret = btrfs_update_inode(trans, root, inode);
> +       if (!ret)
> +               ret = btrfs_update_inode(trans, root, parent_inode);

You only update parent inode if write to current inode fails?

Also, should you be updating the ctime of the parent inode even when the link count of the parent inode is not modified (btrfs always reports a link count of 1 on directories)?
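On the first question: kernel functions conventionally return 0 on success and a negative errno on failure, so `if (!ret)` runs the second update only when the first one succeeded. A minimal sketch of that error-chaining idiom, using a hypothetical `update_inode()` stand-in rather than the real `btrfs_update_inode()`:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode update: 0 on success, -errno on failure. */
static int update_inode(int fail_with, int *updates)
{
    assert(updates != NULL);
    if (fail_with)
        return fail_with;
    (*updates)++;
    return 0;
}

/*
 * Same shape as the patch: the parent is updated only if the first update
 * returned 0 (success), and the first error encountered is returned.
 */
static int update_both(int inode_err, int parent_err, int *updates)
{
    int ret = update_inode(inode_err, updates);
    if (!ret)
        ret = update_inode(parent_err, updates);
    return ret;
}
```

When both calls succeed, both inodes are updated; when the first fails, its error is returned and the parent is left untouched.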
Re: Adding mirroring to an existing filesystem
On Sun, May 16, 2010 at 8:38 AM, J G yoosty_...@yahoo.com wrote:
> --- On Sun, 5/16/10, Donald Gordon d...@dis.org.nz wrote:
>> From: Donald Gordon d...@dis.org.nz
>> Subject: Adding mirroring to an existing filesystem
>> To: linux-btrfs@vger.kernel.org
>> Date: Sunday, May 16, 2010, 4:39 AM
>>
>> Hi
>>
>> Is there some way I can add an extra disk as a mirror to an existing btrfs filesystem? Or must I create a new filesystem with RAID1, and then copy over all my data?
>
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Adding_New_Devices
>
> I would be sure to back my data up before performing this procedure ;)

You should have good backups, and keep having good backups, before and during the use of btrfs. Period.
Re: help message of btrfsctl does not tell anything about deletion of a subvolume
On Sat, May 15, 2010 at 11:47 AM, Andreas Philipp philipp.andr...@gmail.com wrote:
> Hi,
>
> The help message of the btrfsctl command does not tell anything about the deletion of a subvolume. See patch below.
>
> Kind regards,
> Andreas
>
> diff --git a/btrfsctl.c b/btrfsctl.c
> index be6bf25..3ed6f2d 100644
> --- a/btrfsctl.c
> +++ b/btrfsctl.c
> @@ -56,7 +56,7 @@ static void print_usage(void)
>         printf("\t-A device: scans the device file for a Btrfs filesystem\n");
>         printf("\t-a: scans all devices for Btrfs filesystems\n");
>         printf("\t-c: forces a single FS sync\n");
> -       printf("\t-D: delete snapshot\n");
> +       printf("\t-D: delete snapshot or subvolume\n");
>         printf("\t-m [tree id] directory: set the default mounted subvolume to the [tree id] or the directory\n");
>         printf("%s\n", BTRFS_BUILD_VERSION);

We have a new command, btrfs subvolume delete <path>, which can be shortened even as far as btrfs s d <path>. Are we going to keep the btrfsctl program indefinitely when we have a replacement in the btrfs program?
[PATCH 2/2] Fix version.sh to work with dash
---
 fs/btrfs/version.h  |    6 +++---
 fs/btrfs/version.sh |   16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/version.h b/fs/btrfs/version.h
index 9bf3946..12f7e5c 100644
--- a/fs/btrfs/version.h
+++ b/fs/btrfs/version.h
@@ -1,4 +1,4 @@
-#ifndef __BTRFS_VERSION_H
-#define __BTRFS_VERSION_H
-#define BTRFS_BUILD_VERSION "Btrfs"
+#ifndef __BUILD_VERSION
+#define __BUILD_VERSION
+#define BTRFS_BUILD_VERSION "Btrfs 2010-05-04_08:46:49_-0700_ea1dcb3-dirty"
 #endif
diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
index a4576f2..0733eef 100755
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/dash
 #
 # determine-version -- report a useful version for releases
 #
@@ -8,10 +8,10 @@

 v=v0.16

-which git &> /dev/null
-if [ $? == 0 ]; then
-    git branch &> /dev/null
-    if [ $? == 0 ]; then
+which git 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
+    git branch 2>&1 > /dev/null
+    if [ $? -eq 0 ]; then
         v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`

         # Are there uncommitted changes?
@@ -19,7 +19,7 @@ if [ $? == 0 ]; then
             if git diff-index --name-only HEAD | \
                 grep -v ^scripts/package \
                 | read dummy; then
-                v=$v-dirty
+                v=${v}-dirty
             fi
         fi
     fi
@@ -29,9 +29,9 @@ echo "#define __BUILD_VERSION" >> .build-version.h
 echo "#define BTRFS_BUILD_VERSION \"Btrfs $v\"" >> .build-version.h
 echo "#endif" >> .build-version.h

-diff -q version.h .build-version.h &> /dev/null
+diff -q version.h .build-version.h 2>&1 > /dev/null

-if [ $? == 0 ]; then
+if [ $? -eq 0 ]; then
     rm .build-version.h
     exit 0
 fi
[PATCH 1/2] Change version.sh from last tag and hash to output last commit date and hash
The btrfs git repo doesn't have all of the tags from the base 2.6.32 kernel it's currently based upon, and the btrfs module is regularly compiled against other kernels, so this changes the version to be based upon the date and hash of the latest commit instead, which is more relevant to most people testing.

An example version string with this change:

Btrfs 2010-04-06_09:37:47_-0400_9f680ce
---
 fs/btrfs/version.sh |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)
 mode change 100644 => 100755 fs/btrfs/version.sh

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
old mode 100644
new mode 100755
index 1ca1952..a4576f2
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -12,10 +12,7 @@ which git &> /dev/null
 if [ $? == 0 ]; then
     git branch &> /dev/null
     if [ $? == 0 ]; then
-        if head=`git rev-parse --verify HEAD 2>/dev/null`; then
-            if tag=`git describe --tags 2>/dev/null`; then
-                v=$tag
-            fi
+        v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`

             # Are there uncommitted changes?
             git update-index --refresh --unmerged > /dev/null
@@ -24,7 +21,6 @@ if [ $? == 0 ]; then
                 | read dummy; then
                     v=$v-dirty
                 fi
-            fi
         fi
     fi
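The `sed` step turns the output of `git show --format='%ci_%h'` (for example `2010-04-06 09:37:47 -0400_9f680ce`) into the version string shown above by replacing unwanted characters with `_`. Here is a C sketch of the same idea, assuming the intended allowed set is letters, digits, `_`, `-` and `:`. Note that in a POSIX bracket expression `_-:` denotes a range from `_` to `:` rather than three literal characters, so some sed implementations may reject or misread it. `sanitize_version()` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Replace, in place, every character that is not a letter, digit, '_', '-'
 * or ':' with '_'. This mirrors the intent of the sed expression, not its
 * exact bracket-expression semantics.
 */
static void sanitize_version(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!isalnum(c) && c != '_' && c != '-' && c != ':')
            *s = '_';
    }
}
```

Applied to `2010-04-06 09:37:47 -0400_9f680ce`, only the two spaces change, giving `2010-04-06_09:37:47_-0400_9f680ce`, the example string from the patch description.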
Re: [PATCH 2/2] Fix version.sh to work with dash
Please ignore this patch; I will resend a fixed one.

On Tue, May 4, 2010 at 9:12 AM, Mike Fedyk mfe...@mikefedyk.com wrote:
> ---
>  fs/btrfs/version.h  |    6 +++---
>  fs/btrfs/version.sh |   16 ++++++++--------
>  2 files changed, 11 insertions(+), 11 deletions(-)
> [...]
[PATCH v2 1/2] Change version.sh from last tag and hash to output last commit date and hash
The btrfs git repo doesn't have all of the tags from the base 2.6.32 kernel it's currently based upon, and the btrfs module is regularly compiled against other kernels, so this changes the version to be based upon the date and hash of the latest commit instead, which is more relevant to most people testing.

An example version string with this change:

Btrfs 2010-04-06_09:37:47_-0400_9f680ce
---
 fs/btrfs/version.sh |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)
 mode change 100644 => 100755 fs/btrfs/version.sh

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
old mode 100644
new mode 100755
index 1ca1952..a4576f2
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -12,10 +12,7 @@ which git &> /dev/null
 if [ $? == 0 ]; then
     git branch &> /dev/null
     if [ $? == 0 ]; then
-        if head=`git rev-parse --verify HEAD 2>/dev/null`; then
-            if tag=`git describe --tags 2>/dev/null`; then
-                v=$tag
-            fi
+        v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`

             # Are there uncommitted changes?
             git update-index --refresh --unmerged > /dev/null
@@ -24,7 +21,6 @@ if [ $? == 0 ]; then
                 | read dummy; then
                     v=$v-dirty
                 fi
-            fi
         fi
     fi
[PATCH v2 2/2] Fix version.sh to work with dash
---
 fs/btrfs/version.sh |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/version.sh b/fs/btrfs/version.sh
index a4576f2..d87daf4 100755
--- a/fs/btrfs/version.sh
+++ b/fs/btrfs/version.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 #
 # determine-version -- report a useful version for releases
 #
@@ -8,10 +8,10 @@

 v=v0.16

-which git &> /dev/null
-if [ $? == 0 ]; then
-    git branch &> /dev/null
-    if [ $? == 0 ]; then
+which git 2>&1 > /dev/null
+if [ $? -eq 0 ]; then
+    git branch 2>&1 > /dev/null
+    if [ $? -eq 0 ]; then
         v=`git show --format='%ci_%h'|head -n 1|sed 's/[^a-z0-9_-:]/_/ig'`

         # Are there uncommitted changes?
@@ -19,7 +19,7 @@ if [ $? == 0 ]; then
             if git diff-index --name-only HEAD | \
                 grep -v ^scripts/package \
                 | read dummy; then
-                v=$v-dirty
+                v=${v}-dirty
             fi
         fi
     fi
@@ -29,9 +29,9 @@ echo "#define __BUILD_VERSION" >> .build-version.h
 echo "#define BTRFS_BUILD_VERSION \"Btrfs $v\"" >> .build-version.h
 echo "#endif" >> .build-version.h

-diff -q version.h .build-version.h &> /dev/null
+diff -q version.h .build-version.h 2>&1 > /dev/null

-if [ $? == 0 ]; then
+if [ $? -eq 0 ]; then
     rm .build-version.h
     exit 0
 fi
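Redirections in a POSIX shell such as dash are processed left to right, so their order matters: `cmd 2>&1 >/dev/null` first makes stderr a copy of the current stdout and only then sends stdout to /dev/null, which leaves error messages visible, whereas `cmd >/dev/null 2>&1` silences both. The sketch below emulates the first ordering with `dup2()` on arbitrary descriptors so it can be checked without touching the real stdout and stderr; `redirect_in_order()` and `same_file()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Emulate the shell redirections "E>&O O>/dev/null" on descriptors
 * out_fd (O) and err_fd (E), applied left to right as a POSIX shell does.
 * Returns 0 on success, -1 on error.
 */
static int redirect_in_order(int out_fd, int err_fd)
{
    assert(out_fd != err_fd);
    if (dup2(out_fd, err_fd) < 0)                /* E>&O: E now shares O's file */
        return -1;
    int devnull = open("/dev/null", O_WRONLY);
    if (devnull < 0)
        return -1;
    int rc = dup2(devnull, out_fd) < 0 ? -1 : 0; /* O>/dev/null: only O moves */
    close(devnull);
    return rc;
}

/* Nonzero if two descriptors refer to the same underlying file. */
static int same_file(int a, int b)
{
    struct stat sa, sb;
    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

After the call, the "error" descriptor still points at the original output target while only the "output" descriptor points at /dev/null, which is exactly why the other ordering is needed to silence both streams.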
Re: Oops while attempting to mount degraded multi-device raid1 data/metadata btrfs filesystem
On Fri, Mar 26, 2010 at 11:06 AM, Josef Bacik jo...@redhat.com wrote:
> On Fri, Mar 26, 2010 at 10:49:57AM -0700, Mike Fedyk wrote:
>> I still get this oops with the latest btrfs kernel code from git (as of 2010-03-21) compiled against 2.6.32.9-70.fc12.x86_64
>
> Will you try this patch
>
> [PATCH] Btrfs: fail to mount if we have problems reading the block groups
>
> and see if it works? Thanks,
>
> Josef

As already discussed on IRC: with that patch it doesn't oops anymore, but it still doesn't mount. I get this kernel output:

btrfs: failed to read the system array on sda7
btrfs: open_ctree failed
device fsid 3546a1a7a4563c4b-2b1289f58c64988c devid 1 transid 974 /dev/sda7
btrfs: allowing degraded mounts
Failed to read block groups: -5
btrfs: open_ctree failed

btrfs-debug-tree crashes on the FS:

# time btrfs-debug-tree /dev/sda7 > /tmp/sda7-debug-tree.out
failed to read /dev/sde
failed to read /dev/sdd
failed to read /dev/sdc
failed to read /dev/sdb
btrfs-debug-tree: volumes.c:1381: btrfs_read_sys_array: Assertion `!(ret)' failed.
Aborted (core dumped)

real    0m0.304s
user    0m0.001s
sys     0m0.016s
# wc /tmp/sda7-debug-tree.out
0 0 0 /tmp/sda7-debug-tree.out
# rpm -qa | grep -i btrfs
btrfs-progs-0.19-9.fc13.x86_64

I'll leave this patch running in my btrfs module for everyday testing. I'll await any patches you'd like me to test.
Re: [BUG] scheduling while atomic: init/1/0x00000002
On Sat, Mar 13, 2010 at 3:49 PM, Phillip Michael oopsicra...@gmail.com wrote:
> I have a btrfs filesystem with three subvolumes. One of them (named arch64) has 64-bit Linux, one (arch32) has 32-bit Linux, and the third (files) has various files. After an unsuccessful tuxonice resume, the arch64 subvolume will no longer boot. It shows this bug:
>
> VFS: Mounted root (btrfs filesystem) readonly on device 0:13.
> Freeing unused kernel memory: 480k freed
> BFS CPU scheduler v0.315 by Con Kolivas.
> BUG: scheduling while atomic: init/1/0x00000002
> Modules linked in:
> Pid: 1, comm: init Not tainted 2.6.33-zen2-20100307-stable #6

Can you reproduce this error on stock 2.6.33 without the zen patches?
Re: SSD Optimizations
On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
> I'm looking to try BTRFS on an SSD, and I would like to know what SSD optimizations it applies. Is there a comprehensive list of what the ssd mount option does? How are the blocks and metadata arranged? Are there options available, comparable to ext2/ext3, to help reduce wear and improve performance?
>
> Specifically, on ext2 (a journal means more writes, so I don't use ext3 on SSDs, since fsck typically only takes a few seconds when access time is 100us), I usually apply the -b 4096 -E stripe-width=(erase_block/4096) parameters to mkfs in order to reduce the multiple erase cycles on the same underlying block.
>
> Are there similar optimizations available in BTRFS?

I think you'll get more out of btrfs, but another thing you can look into is ext4 without the journal. Support for that was added recently (thanks to Google).
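The `stripe-width` value in the question is plain arithmetic, because mke2fs takes `-E stripe-width` in filesystem blocks. A sketch, assuming a 512 KiB erase block purely for illustration (real erase block sizes vary by SSD and are often not published); `stripe_width_blocks()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * mke2fs "-E stripe-width=N" is expressed in filesystem blocks, so aligning
 * to an SSD erase block gives N = erase_block_bytes / fs_block_bytes.
 */
static uint32_t stripe_width_blocks(uint64_t erase_block, uint32_t fs_block)
{
    assert(fs_block != 0 && erase_block % fs_block == 0);
    return (uint32_t)(erase_block / fs_block);
}
```

With a 512 KiB erase block and `-b 4096`, this gives `-E stripe-width=128`.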
Re: Cross-subvolume link causes kernel BUG
On Mon, Mar 8, 2010 at 10:26 AM, Bruce Guenter br...@untroubled.org wrote:
> On Mon, Mar 08, 2010 at 12:39:38PM -0500, Chris Ball wrote:
>> I think this is fixed in 2.6.33, as a result of the patch below. Let us know if you see a segfault on 2.6.33, or after applying this patch to your current kernel.
>
> This patch does fix the problem for 2.6.32.9, thanks. Has this patch been submitted for the 2.6.32.y series?

Btrfs patches aren't in any stable series yet. Also, I suspect -stable for .32 will stop soon.
Re: Cross-subvolume link causes kernel BUG
On Mon, Mar 8, 2010 at 1:21 PM, Mike Fedyk mfe...@mikefedyk.com wrote:
> On Mon, Mar 8, 2010 at 10:26 AM, Bruce Guenter br...@untroubled.org wrote:
>> On Mon, Mar 08, 2010 at 12:39:38PM -0500, Chris Ball wrote:
>>> I think this is fixed in 2.6.33, as a result of the patch below. Let us know if you see a segfault on 2.6.33, or after applying this patch to your current kernel.
>>
>> This patch does fix the problem for 2.6.32.9, thanks. Has this patch been submitted for the 2.6.32.y series?
>
> Btrfs patches aren't in any stable series yet. Also I suspect -stable for .32 will stop soon.

Oh, disregard that comment about -stable and .32. I forgot it will be maintained for a few years.
Oops while attempting to mount degraded multi-device raid1 data/metadata btrfs filesystem
Hi,

I get an oops with 2.6.33-0.46.rc8.git1.fc13.x86_64 while trying to mount a degraded raid1 btrfs filesystem.

Here are the steps I performed to get to this stage:

- Install fedora12 btrfs / on sda2
- mkfs.btrfs -m raid1 -d raid1 /dev/sda7
- cp -a from sda2 to sda7
- reboot into sda7 as /
- btrfs-vol -a /dev/sda2 /
- btrfs-vol -b / (system hangs here)
- reboot (boot fails with ctree error) [1]
- boot into fedora 12 recovery cd (based on 2.6.31); after running btrfsctl -a, the filesystem is mountable
- umount
- dd bs=1M count=2000 < /dev/zero > /dev/sda2
- mount /dev/sda7, get oops (on 2.6.31)
- install fedora12 btrfs / on sda2
- update to 2.6.33-0.46.rc8.git1.fc13.x86_64
- btrfsctl -a
- mount /dev/sda7 (get oops below)

[1] Turns out that fedora12 doesn't have a call to btrfsctl -a in the boot process. Working on a patch for that...

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.33-0.46.rc8.git1.fc13.x86_64 (mockbu...@x86-04.phx2.fedoraproject.org) (gcc version 4.4.3 20100211 (Red Hat 4.4.3-6) (GCC) ) #1 SMP Tue Feb 16 19:47:00 UTC 2010
Command line: ro root=UUID=7d23c60c-c072-431d-971a-87bcf61ac6a2 LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - afff (usable)
 BIOS-e820: afff - afff3000 (ACPI NVS)
 BIOS-e820: afff3000 - b000 (ACPI data)
 BIOS-e820: b000 - c000 (reserved)
 BIOS-e820: f000 - f400 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
 BIOS-e820: 0001 - 00014000 (usable)
NX (Execute Disable) protection: active
DMI 2.3 present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range: - 0001 (usable) == (reserved) No AGP bridge found last_pfn = 0x14 max_arch_pfn = 0x4 MTRR default type: uncachable MTRR fixed ranges enabled: 0-9 write-back A-B uncachable C-C7FFF write-protect C8000-F uncachable MTRR variable ranges enabled: 0 base 00 mask FF8000 write-back 1 base 008000 mask FFC000 write-back 2 base 01 mask FFC000 write-back 3 disabled 4 disabled 5 disabled 6 disabled 7 disabled TOM2: 00014000 aka 5120M x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 e820 update range: c000 - 0001 (usable) == (reserved) last_pfn = 0xafff0 max_arch_pfn = 0x4 initial memory mapped : 0 - 2000 found SMP MP-table at [880f3c90] f3c90 init_memory_mapping: -afff 00 - 00afe0 page 2M 00afe0 - 00afff page 4k kernel direct mapping tables up to afff @ 16000-1b000 init_memory_mapping: 0001-00014000 01 - 014000 page 2M kernel direct mapping tables up to 14000 @ 19000-1f000 RAMDISK: 37435000 - 37fefbfc ACPI: RSDP 000f8370 00014 (v00 Nvidia) ACPI: RSDT afff3040 00038 (v01 Nvidia AWRDACPI 42302E31 AWRD ) ACPI: FACP afff30c0 00074 (v01 Nvidia AWRDACPI 42302E31 AWRD ) ACPI: DSDT afff3180 0631A (v01 NVIDIA AWRDACPI 1000 MSFT 010E) ACPI: FACS afff 00040 ACPI: SSDT afff95c0 00248 (v01 PTLTD POWERNOW 0001 LTP 0001) ACPI: HPET afff9880 00038 (v01 Nvidia AWRDACPI 42302E31 AWRD 0098) ACPI: MCFG afff9900 0003C (v01 Nvidia AWRDACPI 42302E31 AWRD ) ACPI: APIC afff9500 0007C (v01 Nvidia AWRDACPI 42302E31 AWRD ) ACPI: Local APIC address 0xfee0 Scanning NUMA topology in Northbridge 24 No NUMA configuration found Faking a node at -00014000 Bootmem setup node 0 -00014000 NODE_DATA [0001a000 - 00032fff] bootmap [00033000 - 0005afff] pages 28 (13 early reservations) == bootmem [00 - 014000] #0 [00 - 001000] BIOS data page == [00 - 001000] #1 [000100 - 00029b9138]TEXT DATA BSS == [000100 - 00029b9138] #2 [0037435000 - 0037fefbfc] RAMDISK == [0037435000 - 0037fefbfc] #3 [00029ba000 - 00029ba0b9] BRK == [00029ba000 - 00029ba0b9] #4 [0f3ca0 - 10]BIOS reserved == 
[0f3ca0 - 10] #5 [0f3c90 - 0f3ca0] MP-table mpf == [0f3c90 - 0f3ca0] #6 [09f000 - 0f1fe4]BIOS reserved == [09f000 - 0f1fe4] #7 [0f2140 - 0f3c90]BIOS reserved == [0f2140 - 0f3c90] #8 [0f1fe4 - 0f2140] MP-table mpc == [0f1fe4 - 0f2140] #9 [01 - 012000] TRAMPOLINE == [01 - 012000] #10 [012000 -
Re: Raid1 with 3 drives
On Fri, Mar 5, 2010 at 1:49 PM, Bart Noordervliet b...@noordervliet.net wrote:
> On Fri, Mar 5, 2010 at 21:31, Josef Bacik jo...@redhat.com wrote:
>>> Since I have three devices in a RAID1 pool, can it survive 2 drive failures?
>>
>> Yes, tho you won't be able to remove more than 1 at a time (since it wants you to keep at least two disks around). Thanks,
>>
>> Josef
>
> Hmm, I would expect the raid1 data mode to keep 2 copies of each file and thus yield 50% effective storage capacity, even with 3 disks. I see no real reason to stick with the full-disk mirroring mentality of previous raid systems, since raid implemented in a filesystem works differently. Or would it be difficult to implement btrfs raid1 like this?
>
> Maybe it's worth considering leaving the burdened raid* terminology behind and naming the btrfs redundancy modes more clearly by what they do. For instance -d double|triple or -d 2n|3n. And for raid5/6 -d single-parity|double-parity or -d n+1|n+2.

+1
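The 50% figure Bart expects follows from storing every chunk twice, on two different devices. A rough C sketch of usable capacity under that assumption, ignoring metadata overhead and chunk-allocation granularity; `raid1_usable()` is a hypothetical helper, not a btrfs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rough usable capacity when every chunk is stored as two copies on two
 * different devices. Each byte of data consumes two bytes of raw space,
 * and the largest device can only be filled as far as the others can
 * hold the matching copies.
 */
static uint64_t raid1_usable(const uint64_t *dev, size_t n)
{
    assert(dev != NULL && n >= 2);
    uint64_t total = 0, largest = 0;
    for (size_t i = 0; i < n; i++) {
        total += dev[i];
        if (dev[i] > largest)
            largest = dev[i];
    }
    uint64_t half = total / 2;
    uint64_t rest = total - largest;
    return half < rest ? half : rest;
}
```

Three equal 1000 GB disks give 1500 GB usable (50%). With one 3000 GB disk and two 500 GB disks, only 1000 GB is usable, because every copy on the large disk needs a partner on another device.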
Re: assertion failures
On Fri, Feb 26, 2010 at 10:11 AM, Bill Pemberton wf...@viridian.itc.virginia.edu wrote:
>> Does the array have any kind of writeback cache?
>
> Yes, the array has a writeback cache.
>
>> Are all of the filesystems spread across all of the drives? Or do some filesystems use some drives only?
>
> In all cases the array is presenting 1 physical volume to the host system (which is RAID 6 on the array itself). That physical volume is made into a volume group and the filesystems are on logical volumes in that volume group.

I wonder if the barrier messages are making it to this writeback cache. Do you see any messages about barriers in your kernel logs?
Re: btrfs no csum found for inode X start 0
On Thu, Feb 25, 2010 at 12:52 PM, Leszek Ciesielski skol...@gmail.com wrote:
> I have changed the btrfs code to ignore checksum failures and now I can read files correctly from the filesystem. Also, moving them onto another volume and then back into btrfs fixes the checksums, and no more errors are reported for the file in question. Quick and dirty code I used for getting my files out:

Yes, but did you verify your data?
Re: [PATCH 0/3 V3] btrfs: a new tool to manage a btrfs filesystem
On Wed, Feb 24, 2010 at 3:35 PM, Chris Mason chris.ma...@oracle.com wrote:
> On Mon, Feb 22, 2010 at 07:47:40PM +0100, Goffredo Baroncelli wrote:
>> On Monday 22 February 2010, Mike Fedyk wrote:
>>> On Sun, Feb 21, 2010 at 8:40 AM, Goffredo Baroncelli kreij...@gmail.com wrote:
>>>> filesystem resize [+/-]size[gkm]|max filesystem
>>> -filesystem resize [+/-]size[gkm]|max filesystem
>>> +filesystem resize [+/-]size[gkm]|max dev
>>>
>>> This command works on devices, not paths.
>>
>> Are you sure? To me it results (test and code inspection) to work on path.
>
> The ioctl takes a path so that it knows which btrfs filesystem to change.

Then how does it know which device to shrink in a multi-device filesystem?
Re: [PATCH 0/3 V3] btrfs: a new tool to manage a btrfs filesystem
On Sun, Feb 21, 2010 at 8:40 AM, Goffredo Baroncelli kreij...@gmail.com wrote:
> filesystem resize [+/-]size[gkm]|max filesystem
-filesystem resize [+/-]size[gkm]|max filesystem
+filesystem resize [+/-]size[gkm]|max dev

This command works on devices, not paths.

> Resize a filesystem identified by path. The size parame‐
-Resize a filesystem identified by path. The size parame‐
+Resize a filesystem identified by dev. The size parame‐
> ter specifies the new size of the filesystem. If the prefix + or - is present the size is increased or decreased by the quantity size. If no units are specified, the unit of the size parameter defaults to bytes. Optionally, the size parameter may be suffixed by one of the following the units designators: 'K', 'M', or 'G', kilobytes, megabytes, or gigabytes, respectively.
>
> If 'max' is passed, the filesystem will occupy all available space on the volume(s).
>
> The resize command does not manipulate the size of underlying
> partitions. If you wish to enlarge/reduce a filesystem, you
-partitions. If you wish to enlarge/reduce a filesystem, you
+partition. If you wish to enlarge/reduce a filesystem, you
> must make sure you can expand/reduce the size of the parti‐
> tion also.
-must make sure you can expand/reduce the size of the parti‐
-tion also.
+must make sure you can expand the partition before enlarging
+the filesystem and shrink the partition after reducing the size
+of the filesystem.
>
> filesystem show [uuid|label]
> Show the btrfs filesystem with some additional info. If no UUID or label is passed, btrfs show info of all the btrfs filesystem.
>
> device balance|-b path
-device balance|-b path
+device balance path

Mike
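The `[+/-]size[gkm]|max` syntax documented above can be parsed along these lines. This is a minimal sketch, not the parser btrfs-progs actually uses: it assumes binary multiples (1 K = 1024 bytes), accepts upper- or lower-case suffixes, and does not check for overflow. `struct resize_arg` and `parse_resize_arg()` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "[+/-]size[gkm]|max". */
struct resize_arg {
    int is_max;      /* "max": grow to fill the device */
    int sign;        /* +1 grow by, -1 shrink by, 0 set absolute size */
    uint64_t bytes;  /* size or delta in bytes */
};

/* Returns 0 on success, -1 on a malformed argument. */
static int parse_resize_arg(const char *s, struct resize_arg *out)
{
    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (strcmp(s, "max") == 0) {
        out->is_max = 1;
        return 0;
    }
    if (*s == '+' || *s == '-')
        out->sign = (*s++ == '+') ? 1 : -1;
    if (!isdigit((unsigned char)*s))
        return -1;

    char *end;
    unsigned long long v = strtoull(s, &end, 10);
    uint64_t mult = 1;                         /* no suffix: bytes */
    switch (tolower((unsigned char)*end)) {
    case 'k': mult = 1ULL << 10; end++; break;
    case 'm': mult = 1ULL << 20; end++; break;
    case 'g': mult = 1ULL << 30; end++; break;
    case '\0': break;
    default: return -1;
    }
    if (*end != '\0')
        return -1;                             /* trailing garbage */
    out->bytes = (uint64_t)v * mult;
    return 0;
}
```

For example, `+2g` parses as "grow by 2 GiB", `-500m` as "shrink by 500 MiB", and a bare `4096` as an absolute size in bytes.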
df shows wrong device while waiting for umount
Hi, Kernel: 2.6.33-0.46.rc8.git1.fc13.x86_64. I think I ran into the issue that triggers when you write to a btrfs filesystem and then umount it: the umount takes a long time while writing out the data. It ends up writing at about 1MiB/second according to dstat. My understanding is that this issue is already fixed in the latest code, but that is not the issue I am reporting. While waiting for the umount to complete, df shows the wrong device:
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.8G 7.5G 2.3G 77% /
tmpfs 1.9G 536K 1.9G 1% /dev/shm
/dev/sda6 485M 62M 398M 14% /boot
/dev/sda7 111G 464M 111G 1% /mnt/t
# umount /mnt/t
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.8G 7.5G 2.3G 77% /
tmpfs 1.9G 536K 1.9G 1% /dev/shm
/dev/sda6 485M 62M 398M 14% /boot
/dev/sda7 9.8G 7.5G 2.3G 77% /mnt/t
The second df was run while waiting for the 111GB partition to unmount.
Re: btrfs testing suite?
On Fri, Feb 19, 2010 at 10:46 AM, Mr. Tux tuxoho...@hotmail.de wrote: Hi list, is there a btrfs testing suite the btrfs developers use to check the codebase? I did some research and found a project called xfstests-dev. It supports ext4 as well; are there any patches to get btrfs support with xfstests? There are also different versions of fsx that attempt to fuzz the filesystems. Btrfs has additional edges that need to be fuzzed, so it will need to be extended. Running those, as well as normal everyday use and your typical workload, will help btrfs most. Just install it and use it like normal, but make sure you have backups and another system you can switch to if something goes wrong.
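The core idea behind fsx, as I understand it, is to apply random operations to a real file while mirroring them into an in-memory copy, and to flag any read-back that disagrees. A minimal sketch of that loop in C follows; `fsx_lite()` and its parameters are hypothetical, it only exercises `pwrite()` and `ftruncate()`, and it knows nothing about btrfs-specific edges such as snapshots or multiple devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FSX_MAXLEN 4096

/* Apply nops random writes/truncates to path and to a shadow buffer,
 * re-reading the whole file after each op. Returns the number of
 * mismatching reads, or -1 on an I/O error. Hypothetical sketch. */
int fsx_lite(const char *path, unsigned seed, int nops)
{
    static char shadow[FSX_MAXLEN], buf[FSX_MAXLEN];
    size_t size = 0;                   /* file size according to the shadow */
    int mismatches = 0, rc = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(shadow, 0, sizeof shadow);  /* bytes past EOF are kept zero */
    srand(seed);                       /* same seed => same op sequence */

    for (int i = 0; i < nops; i++) {
        size_t off = (size_t)rand() % FSX_MAXLEN;
        size_t len = (size_t)rand() % (FSX_MAXLEN - off) + 1;

        if (rand() % 4 == 0) {
            /* truncate (or extend) to off; a shrink zeroes the shadow tail */
            if (ftruncate(fd, (off_t)off) != 0)
                goto out;
            if (off < size)
                memset(shadow + off, 0, size - off);
            size = off;
        } else {
            for (size_t j = 0; j < len; j++)   /* random lowercase payload */
                shadow[off + j] = (char)('a' + rand() % 26);
            if (pwrite(fd, shadow + off, len, (off_t)off) != (ssize_t)len)
                goto out;
            if (off + len > size)
                size = off + len;
        }
        /* read the whole file back and compare with the shadow copy */
        if (pread(fd, buf, size, 0) != (ssize_t)size)
            goto out;
        if (memcmp(buf, shadow, size) != 0)
            mismatches++;
    }
    rc = mismatches;
out:
    close(fd);
    return rc;
}
```

Running it with a fixed seed makes any failure reproducible, which is what makes a fuzzer of this kind useful in a bug report.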
Re: [RFC] btrfs: a new tool to manage a btrfs filesystem
On Fri, Feb 19, 2010 at 12:12 PM, Goffredo Baroncelli kreij...@gmail.com wrote: Hi all, on the basis of the suggestions received, I updated my btrfs tool. The main changes are: - removed the short form of the command (like '-C') - deployed the multi-level command (i.e.: btrfs snapshot create) - split the source in three files. This is because the new parsers are quite big (about 295 lines; for example btrfsctl.c is only 239 lines). The multi-level command parser is quite flexible. It accepts the full-length command (btrfs subvolume create) and a contracted form (btrfs subvol cr). The commands may be arbitrarily short (even 1 char) but they have to be unambiguous. For example - btrfs s s - OK (matches 'btrfs subvolume snapshot' only) - btrfs dev s - FAIL (matches both 'btrfs dev show' and 'btrfs dev scan') The parser highlights which part of the command is ambiguous. This is an RFC because there is no agreement about the name of the command. I am proposing the following structure: btrfs object action where object is one of: - subvolume (valid actions: create, delete, snapshot, list [not implemented]) - filesystem (valid actions: defrag, sync, resize [not implemented]) - device (valid actions: add, delete, scan, show, balance) You can find the source at http://cassiopea.homelinux.net/git/btrfs-command.git (commit 3deec45d18879d60b4032dc1f8895d7b7e1211ec, remember to switch to the remotes/origin/multi-level-command branch (I hate git!!!) BR G.Baroncelli $ git diff remotes/origin/orig | diffstat Makefile | 6 btrfs.c | 73 ++ btrfs_cmds.c | 587 +++[...] btrfs_cmds.h | 30 ++ btrfs_cmds_parse.c | 296 + man/Makefile | 5 man/btrfs.8.in | 148 13 files changed, 1291 insertions(+), 2 deletions(-) $ ./btrfs Usage: btrfs subvolume snapshot [dest/]name Create a writable snapshot of the subvolume source with the name name in the dest directory. btrfs subvolume delete subvolume Delete the subvolume subvolume.
btrfs subvolume create [dest/]name Create a subvolume in dest (or the current directory if not passed). btrfs filesystem defrag file|dir [file|dir...] Defragment a file or a directory. btrfs device scan [device [device..] Scan all device for or the passed device for a btrfs filesystem. btrfs filesystem sync path Force a fs sync on the filesystem path btrfs filesystem resize [+/-]newsize[gkm]|max filesystem Resize the file system. If 'max' is passed, the filesystem will occupy all available space on the device. btrfs device show [dev|label...] Show the btrfs devices btrfs device balance path Balance the chunk across the device btrfs device add dev [dev..] path Add a device to a filesystem btrfs device delete dev [dev..] path Remove a device from a filesystem btrfs help|--help|-h Show the help. Btrfs v0.19-22-g07a97f0-dirty $ man man/btrfs.8.in | cat BTRFS(8) btrfs BTRFS(8) NAME btrfs - control a btrfs filesystem SYNOPSIS btrfs subvolume snapshot source [dest/]name btrfs subvolume delete subvolume btrfs subvolume create [dest/]name btrfs filesystem defrag file|dir [file|dir...] btrfs filesystem fssync path btrfs filesystem resize [+/-]size[gkm]|max filesystem btrfs device scan [device [device..]] btrfs device show dev|label [dev|label...] btrfs device balance path btrfs device add dev [dev..] path btrfs device delete dev [dev..] path ] btrfs help|--help|-h DESCRIPTION btrfs is used to control the filesystem and the files and directories stored. It is the tool to create or destroy a new snapshot or a -create or destroy a new snapshot +create or destroy a snapshot new subvolume for the filesystem, to defrag a file or a directory, -new subvolume for the filesystem, to defrag a file or a directory +subvolume for the filesystem, defrag a file or a directory to flush the dato to the disk, to resize the filesystem, to scan the device.
-to flush the dato to the disk, to resize the filesystem, to scan the +flush the data to the disk, resize the filesystem, scan the It is possible to abbreviate the commands unless the commands are ambiguous. For example: it is possible to run btrfs sub snaps instead of btrfs subvolume snapshot. But btrfs dev s is not
Re: [Regression] Filesystem I/O is CPU-bound in rc7 and rc8
On Sat, Feb 13, 2010 at 7:11 PM, James Cloos cl...@jhcloos.com wrote: Sometime between rc6 and rc7 all filesystem I/O started using 100% CPU, usually on the order of 60% sys, 40% user. I've tried this with each of ext4, jfs and btrfs filesystems. All show the same issue. Are you sure you're not running with any of the debugging options enabled? I see the same, but I have debugging enabled (rawhide kernel). Using dd(1) to read from the block specials directly works as well and as fast as it always has; only reading or writing to mounted filesystems is affected. Box is 32-bit x86, PentiumIII-M; drives are IDE using libata. If the btrfs fs is mounted, the slowdown is enough to trigger the hung_task call trace (120s) on the btrfs-transac process. But the regression is just as apparent when only jfs and ext4 are mounted. The only filesystems I've found which avoid the regression are tmpfs and devtmpfs. I didn't have time to write up a report when I noticed this in rc7, but I had to boot back into rc6 for work. Some of the commits since rc7 looked like they might have addressed this regression, but it persists in rc8. -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6
Which volume? no space left, need 4096, 274432 delalloc bytes, 8360148992 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
If I had more than one btrfs volume, how would I know which volume caused these errors? Sure, I can look at df and btrfs-show, but shouldn't these messages say so definitively?
Feb 19 04:31:26 dt01 kernel: no space left, need 4096, 274432 delalloc bytes, 8360148992 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:34:02 dt01 kernel: no space left, need 4096, 270336 delalloc bytes, 8360153088 bytes_used, 4096 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:43:18 dt01 kernel: no space left, need 270336, 36864 delalloc bytes, 8360140800 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:44:03 dt01 kernel: no space left, need 270336, 49152 delalloc bytes, 8360165376 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:44:09 dt01 kernel: device fsid 5e44906058064a32-b179f2f9b4e606a9 devid 1 transid 7 /dev/sda7
Feb 19 04:46:18 dt01 kernel: no space left, need 270336, 77824 delalloc bytes, 8360275968 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:47:03 dt01 kernel: no space left, need 270336, 102400 delalloc bytes, 8360300544 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:52:18 dt01 kernel: no space left, need 270336, 65536 delalloc bytes, 8360214528 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:53:03 dt01 kernel: no space left, need 270336, 36864 delalloc bytes, 8360280064 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 04:53:48 dt01 kernel: no space left, need 270336, 32768 delalloc bytes, 8360329216 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:27:48 dt01 kernel: no space left, need 270336, 36864 delalloc bytes, 8360140800 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:27:58 dt01 kernel: device fsid 5945de3116704a8a-11f3234550356c85 devid 1 transid 7 /dev/sda7
Feb 19 05:30:34 dt01 kernel: no space left, need 270336, 53248 delalloc bytes, 8360284160 bytes_used, 12288 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 05:40:18 dt01 kernel: no space left, need 270336, 32768 delalloc bytes, 8360378368 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:06:33 dt01 kernel: no space left, need 270336, 28672 delalloc bytes, 8360153088 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:07:18 dt01 kernel: no space left, need 270336, 53248 delalloc bytes, 8360169472 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:08:03 dt01 kernel: no space left, need 270336, 65536 delalloc bytes, 8360206336 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:03 dt01 kernel: no space left, need 270336, 69632 delalloc bytes, 8360251392 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:30 dt01 kernel: no space left, need 270336, 94208 delalloc bytes, 8360251392 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:14:42 dt01 kernel: no space left, need 94208, 24576 delalloc bytes, 8360325120 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:15:35 dt01 kernel: no space left, need 270336, 57344 delalloc bytes, 8360349696 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:40:19 dt01 kernel: no space left, need 270336, 49152 delalloc bytes, 8360202240 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:41:49 dt01 kernel: no space left, need 270336, 65536 delalloc bytes, 8360271872 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:42:12 dt01 kernel: no space left, need 94208, 98304 delalloc bytes, 8360271872 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:42:33 dt01 kernel: no space left, need 270336, 86016 delalloc bytes, 8360292352 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:44:03 dt01 kernel: no space left, need 270336, 73728 delalloc bytes, 8360341504 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:47:48 dt01 kernel: no space left, need 270336, 24576 delalloc bytes, 8360312832 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 8360427520 total
Feb 19 06:48:25 dt01 kernel: no space left, need 270336, 45056 delalloc bytes,
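The numbers in these messages appear to add up exactly. In the first entry, bytes_used + delalloc + bytes_reserved is 8360148992 + 274432 + 4096 = 8360427520, the whole total, so a further 4096 bytes cannot fit; in the 04:43:18 entry only 8360427520 - (8360140800 + 36864) = 249856 bytes remain against a 270336-byte request. The sketch below parses such a line and repeats that arithmetic. It assumes the allocator's check is roughly "the sum of the accounted fields plus the request must not exceed the total"; `parse_nospc()` and `nospc_free_bytes()` are hypothetical helpers, and, as the question points out, nothing in the message itself identifies the volume.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by each "no space left" line in the log above. */
struct nospc_msg {
    unsigned long long need, delalloc, used, reserved;
    unsigned long long pinned, readonly, may_use, total;
};

/* Parse one syslog line; returns 0 on success, -1 for any other message. */
int parse_nospc(const char *line, struct nospc_msg *m)
{
    const char *p;

    assert(line != NULL && m != NULL);
    p = strstr(line, "no space left, need ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "no space left, need %llu, %llu delalloc bytes, %llu bytes_used, "
                  "%llu bytes_reserved, %llu bytes_pinned, %llu bytes_readonly, "
                  "%llu may use %llu total",
               &m->need, &m->delalloc, &m->used, &m->reserved,
               &m->pinned, &m->readonly, &m->may_use, &m->total) != 8)
        return -1;
    return 0;
}

/* Bytes left after every accounted category (assumed to be what the request
 * is compared against). A negative result would mean over-commitment. */
long long nospc_free_bytes(const struct nospc_msg *m)
{
    unsigned long long accounted = m->used + m->delalloc + m->reserved +
                                   m->pinned + m->readonly + m->may_use;
    return (long long)m->total - (long long)accounted;
}
```

Applied to every entry in the log, this would show whether each failure is a genuine shortfall or an accounting oddity; it cannot, however, tell two volumes apart.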
Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem
On Thu, Feb 18, 2010 at 8:58 AM, Chris Mason chris.ma...@oracle.com wrote: I do like the subcommand method, more details below. On Wed, Feb 17, 2010 at 03:35:26PM -0800, Mike Fedyk wrote: I think this needs some command hierarchy. On Wed, Feb 17, 2010 at 12:02 PM, Goffredo Baroncelli kreij...@gmail.com wrote: OPTIONS snapshot|-s source [dest/]name Create a writable snapshot of the subvolume source with the name name in the dest directory. If source is not a subvolume, btrfs returns an error. This should be btrfs subvolume snapshot source [dest/]name. It only works on subvolumes. If we can type subvol instead of subvolume, I like it. Basically the perl/python arg parsing system, where any short form of the command that uniquely matches it is allowed. We keep the long forms but allow the user to pick a shorter form if it isn't ambiguous. Yes, I agree. This is why I compared it with the ip command, which does the same. delete|-D subvolume Delete the subvolume subvolume. If subvolume is not a subvolume, btrfs returns an error. This becomes: btrfs subvolume delete subvolume subvol del (same as above). This works with snapshots as well. subvolume|-c [dest/]name Create a subvolume in dest (or in the current directory if dest is not passed). btrfs subvolume create [dest/]name defrag|-f file|dir [file|dir...] Defragment files and/or directories. This will defrag individual files? Does it defrag a directory tree? Does it defrag a subvolume? Does it defrag a pool? For now let's change this to only do files. That's the only thing the tool supports today. scan|-n [device [device..]] Scan devices for a btrfs filesystem. If no devices are passed, btrfs scans all the block devices. btrfs pool scan [device [device..]] Instead of btrfs pool, please use btrfs dev fssync|-y path Force a sync for the filesystem identified by path. Does it sync a pool or subvolume?
Assuming it works against subvolumes, it would be: btrfs subvolume sync path resize|-z [+/-]size[gkm]|max filesystem Resize a file system identified by path. The size parameter specifies the new size of the filesystem. If the prefix + or - is present the size is increased or decreased by the quantity size. If no units are specified, the unit of the size parameter is the byte. Optionally, the size parameter may be suffixed by one of the following unit designators: 'K', 'M', or 'G', kilobytes, megabytes, or gigabytes, respectively. If 'max' is passed, the filesystem will occupy all available space on the volume(s). The resize command does not manipulate the size of underlying partitions. If you wish to enlarge/reduce a filesystem, you must make sure you can expand/reduce the size of the partition also. This works with physical devices, not a pool or subvolume. I get the name physical volume from lvm. Also, I think it should resize to max without arguments; in order to do that, the size argument would need to be the last argument. We don't have physical volumes and logical volumes the way lvm does, so I'd like to avoid the pvolume theme. It becomes: btrfs pvolume resize [+/-]size[gkm]|max filesystem Or: btrfs pvolume resize filesystem [[+/-]size[gkm]] btrfs dev resize Dev works for me; I could only think of the lvm terms at the time.
Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem
On Thu, Feb 18, 2010 at 11:59 AM, Goffredo Baroncelli kreij...@gmail.com wrote: On Thursday 18 February 2010, Chris Mason wrote: I do like the subcommand method, more details below. I have tried to summarise your suggestions, but some cases are not clear to me. I grouped the commands in three categories: subvolume, devices, and filesystem. devices scan devices show devices balance devices add devices remove subvolume snapshot subvolume delete subvolume create [subvolume list] filesystem resize [filesystem label] ??? defrag ??? sync For the first two categories both Chris and Mike agreed; but IMHO there are some commands that fit neither devices nor subvolume, like resize (we resize a filesystem) and label (not available now). A btrfs filesystem can span multiple devices. Resize resizes how big of a chunk of one device btrfs uses. This would be used by partitioning programs, for instance. zfs uses the term pool instead of filesystem to solve this ambiguous use of the term filesystem, since btrfs and zfs break people's existing definition of the word filesystem. I don't know how to classify defrag (per file / directory level?) and sync (filesystem?). It turns out that defrag is per file, which seems most cumbersome. Since it will probably eventually work against several types of objects, maybe we could have: btrfs defrag file file btrfs defrag directory directory btrfs defrag subvol subvol btrfs defrag pool pool An option is to consider commands without classification. For example: $ btrfs subvolume create [path/]subvolname $ btrfs sync path $ btrfs defrag file Maybe if the btrfs developers are agreeable, we could do this as well: btrfs sync file file btrfs sync directory directory btrfs sync subvol subvol btrfs sync pool pool I'm not sure how useful syncing the pool or a directory tree would be, but I'll include it here for further discussion.
Mike
Re: [PATCH 0/2 V2] btrfs: a new tool to manage a btrfs filesystem
I think this needs some command hierarchy. On Wed, Feb 17, 2010 at 12:02 PM, Goffredo Baroncelli kreij...@gmail.com wrote: OPTIONS snapshot|-s source [dest/]name Create a writable snapshot of the subvolume source with the name name in the dest directory. If source is not a subvolume, btrfs returns an error. This should be btrfs subvolume snapshot source [dest/]name. It only works on subvolumes. delete|-D subvolume Delete the subvolume subvolume. If subvolume is not a subvolume, btrfs returns an error. This becomes: btrfs subvolume delete subvolume This works with snapshots as well. subvolume|-c [dest/]name Create a subvolume in dest (or in the current directory if dest is not passed). btrfs subvolume create [dest/]name defrag|-f file|dir [file|dir...] Defragment files and/or directories. This will defrag individual files? Does it defrag a directory tree? Does it defrag a subvolume? Does it defrag a pool? scan|-n [device [device..]] Scan devices for a btrfs filesystem. If no devices are passed, btrfs scans all the block devices. btrfs pool scan [device [device..]] fssync|-y path Force a sync for the filesystem identified by path. Does it sync a pool or subvolume? Assuming it works against subvolumes, it would be: btrfs subvolume sync path resize|-z [+/-]size[gkm]|max filesystem Resize a file system identified by path. The size parameter specifies the new size of the filesystem. If the prefix + or - is present the size is increased or decreased by the quantity size. If no units are specified, the unit of the size parameter is the byte. Optionally, the size parameter may be suffixed by one of the following unit designators: 'K', 'M', or 'G', kilobytes, megabytes, or gigabytes, respectively. If 'max' is passed, the filesystem will occupy all available space on the volume(s). The resize command does not manipulate the size of underlying partitions.
If you wish to enlarge/reduce a filesystem, you must make sure you can expand/reduce the size of the partition also. This works with physical devices, not a pool or subvolume. I get the name physical volume from lvm. Also, I think it should resize to max without arguments; in order to do that, the size argument would need to be the last argument. It becomes: btrfs pvolume resize [+/-]size[gkm]|max filesystem Or: btrfs pvolume resize filesystem [[+/-]size[gkm]] show|-l [dev|label...] Show the btrfs devices with some additional info. If no devices or labels are passed, btrfs scans all the block devices. This becomes: btrfs pool show [dev|label...] balance|-b path Balance the chunk of the filesystem identified by path across the devices. Is path one of the block devices in the pool? This becomes: btrfs pool balance path add-dev|-A dev [dev..] path Add device(s) to the filesystem identified by path. What is path? Somewhere the pool is mounted? The root of where the pool is mounted? This becomes: btrfs pvolume add dev [dev..] path rm-dev|-R dev [dev..] path Remove device(s) from the filesystem identified by path. (same questions as with add) This becomes: btrfs pvolume remove dev [dev..] path Mike
Re: [PATCH 0/2] btrfs: a new tool to manage a btrfs filesystem
On Sun, Feb 14, 2010 at 7:39 AM, Goffredo Baroncelli kreij...@gmail.com wrote: Hi all, On Sunday 14 February 2010, Thomas Kupper wrote: Hi Goffredo, Great work! It is indeed much easier to work with one tool instead of many of them! Usage: btrfs snapshot|-s source [dest/]name Create a writable snapshot of the subvolume source with the name name in the dest directory. btrfs delete|-D subvolume Delete the subvolume subvolume. I back up Mike's opinion that the short options aren't what I would expect. Personally I'd prefer a command line syntax like git, command action [sub-action options|arguments] Seriously, you (as well as Michel and Mike) raised some concerns about the command line syntax. The main issues are: 1) possible confusion between the '-d' (delete) command and '-D' (defrag) command. It was suggested to remove the short form command. 2) some commands are not very self-explanatory. Regarding point #1, I am against removing the short commands ('-s'...). If someone fears making a mistake, he has the option to use the long form command. But I don't see any reason to force everyone else to use the long form command. The problem here is maintainability of scripts when people use the short names. I will refer to the ip command used in linux networking. It has these subcommands: where OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable | tunnel | maddr | mroute | monitor | xfrm } Which are listed here: ip link ip addr ip addrlabel ip route ip rule ip neigh ip ntable ip tunnel ip maddr ip mroute ip monitor ip xfrm You can shorten them as long as they are not ambiguous: ip ro = ip route ip ru = ip rule ip a = ip addr ip l = ip link Those are the ones I use most personally. There are no equivalent short options, and you don't have different sets of people using different commands in scripts and howtos, for instance. It builds a common base of knowledge and is easy to type from memory. Commands that document themselves are good IMO.
ip route replace default via 1.2.3.4 Replace or set the current default to ip address 1.2.3.4 (the tool makes sure 1.2.3.4 is reachable by an already existing route and looks up the layer 2 address for that ip). It's not ip -r default -d 1.2.3.4. Now someone reading a howto or script with that hypothetical command will have to find out if -r is route or -R is rule. This is how the btrfs commands currently look to me. If there is agreement, I am open to renaming the -D/delete command in order to reduce the conflict. For example, the -D/delete command may be renamed as -R/remove. The conflict with the -r/resize command is not a problem because the former requires 1 argument, the latter two. Another renaming option may be -E/erase. This just illustrates my point. Btrfs has a rich feature set, and the short option formats are only going to create more confusion, because some of them will only be usable with a subset of operations and there will be so many things you can do with btrfs that explicit long options are needed to make it clear, even to yourself, what a command does 6 months later. Mike
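The "shorten it as long as it is unambiguous" rule discussed in this thread can be expressed as a unique-prefix match applied to each level of the command. The C sketch below is a hypothetical `match_command()`, not the parser in Goffredo's tree: an exact match wins, a single prefix match is accepted, and two or more prefix matches are reported as ambiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_NOMATCH   (-1)
#define CMD_AMBIGUOUS (-2)

/* Resolve one (possibly abbreviated) word against the names valid at this
 * level of the command. Returns the matching index, CMD_NOMATCH, or
 * CMD_AMBIGUOUS. An exact match always wins. Hypothetical sketch. */
int match_command(const char *word, const char *const names[], size_t n)
{
    size_t wlen;
    int found = CMD_NOMATCH;
    int ambiguous = 0;

    assert(word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return CMD_NOMATCH;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], word) == 0)
            return (int)i;                  /* exact match */
        if (strncmp(names[i], word, wlen) == 0) {
            if (found == CMD_NOMATCH)
                found = (int)i;             /* first prefix match */
            else
                ambiguous = 1;              /* a second prefix match */
        }
    }
    return ambiguous ? CMD_AMBIGUOUS : found;
}
```

Resolving `btrfs dev s` this way means matching `dev` against the object names (unique: device) and then `s` against the device actions, where both scan and show match, so the command is rejected, as in the RFC's example; `btrfs s s` resolves to subvolume and then snapshot.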
Re: zero-length files in snapshots
On Fri, Feb 12, 2010 at 7:19 AM, Josef Bacik jo...@redhat.com wrote: On Thu, Feb 11, 2010 at 08:50:48PM -0800, Mike Fedyk wrote: On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote: echo x1 > /mnt/x/d/foo.txt || exit 2 btrfsctl -s /mnt/x/snap /mnt/x/d You're just missing a sync/fsync() between these two lines. We argued on IRC a while ago about whether this is a sensible default; cmason wants the no-sync version of snapshot creation to be available, but was amenable to the idea of changing the default to be sync before snapshot, since it was pointed out that no-one other than him had understood we were supposed to be running sync first. You're saying that it only snapshots the on-disk data structures and not the in-memory versions? That can only lead to pain. What do you do if something else during this race condition? What would a sync do to solve this? Have the semantics of sync been changed in btrfs from "sync everything that hasn't been written yet" to "sync this subvolume"? Welcome to delalloc. You either get fast writes or you get all of your data on the disk every 5 seconds. If you don't like delalloc, use ext3. The data you've written to memory doesn't go down to disk unless explicitly told to, such as 1) fsync - this is obvious 2) vm - the vm has decided that this dirty page has been sitting around long enough and should be written back to the disk; could happen now, could happen 10 years from now. 3) sync - this is not as obvious. sync doesn't mean anything more than "start writing back dirty data to the fs", and it returns before it's done. For btrfs what that means is we run through _every_ inode that has delalloc pages associated with it and start writeback on them. This will get most of your data into the current transaction, which is when the snapshot happens.
If you don't want empty files, do something like this: btrfsctl -c /dir/to/volume btrfsctl -s /dir/to/volume/snapshotname /dir/to/volume This is what we do with yum and its rollback plugin, and it works out quite well. Thanks, Then you broke your ordering guarantee. If the data isn't there, the meta-data shouldn't be there either. So the snapshots made before the data hits a transaction shouldn't have the file at all.
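The fix Chris Ball suggests (a sync or fsync() between writing the file and taking the snapshot) looks like this from a C program. `write_durably()` is a hypothetical helper: it writes the whole buffer, calls fsync() so this file's dirty pages are written back before it returns, and only then would the caller run the snapshot command.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write data to path and fsync() it before returning,
 * so that, per the advice in this thread, a snapshot taken afterwards
 * should see the contents rather than an empty inode.
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *data)
{
    size_t len, done = 0;
    int fd;

    assert(path != NULL && data != NULL);
    len = strlen(data);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (done < len) {                    /* handle short writes */
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    if (fsync(fd) != 0) {                   /* force this file's data to disk */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would run `write_durably("/mnt/x/d/foo.txt", "x1\n")` and only then invoke `btrfsctl -s /mnt/x/snap /mnt/x/d`. fsync() works per file; when many files are involved, the sync Josef describes is the coarser alternative.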
Re: zero-length files in snapshots
On Fri, Feb 12, 2010 at 8:22 AM, Josef Bacik jo...@redhat.com wrote: On Fri, Feb 12, 2010 at 08:18:01AM -0800, Mike Fedyk wrote: On Fri, Feb 12, 2010 at 7:19 AM, Josef Bacik jo...@redhat.com wrote: On Thu, Feb 11, 2010 at 08:50:48PM -0800, Mike Fedyk wrote: On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote: echo x1 > /mnt/x/d/foo.txt || exit 2 btrfsctl -s /mnt/x/snap /mnt/x/d You're just missing a sync/fsync() between these two lines. We argued on IRC a while ago about whether this is a sensible default; cmason wants the no-sync version of snapshot creation to be available, but was amenable to the idea of changing the default to be sync before snapshot, since it was pointed out that no-one other than him had understood we were supposed to be running sync first. You're saying that it only snapshots the on-disk data structures and not the in-memory versions? That can only lead to pain. What do you do if something else during this race condition? What would a sync do to solve this? Have the semantics of sync been changed in btrfs from "sync everything that hasn't been written yet" to "sync this subvolume"? Welcome to delalloc. You either get fast writes or you get all of your data on the disk every 5 seconds. If you don't like delalloc, use ext3. The data you've written to memory doesn't go down to disk unless explicitly told to, such as 1) fsync - this is obvious 2) vm - the vm has decided that this dirty page has been sitting around long enough and should be written back to the disk; could happen now, could happen 10 years from now. 3) sync - this is not as obvious. sync doesn't mean anything more than "start writing back dirty data to the fs", and it returns before it's done. For btrfs what that means is we run through _every_ inode that has delalloc pages associated with it and start writeback on them. This will get most of your data into the current transaction, which is when the snapshot happens.
If you don't want empty files, do something like this: btrfsctl -c /dir/to/volume btrfsctl -s /dir/to/volume/snapshotname /dir/to/volume This is what we do with yum and its rollback plugin, and it works out quite well. Thanks, Then you broke your ordering guarantee. If the data isn't there, the meta-data shouldn't be there either. So the snapshots made before the data hits a transaction shouldn't have the file at all. Nope, what is happening is: fd = creat(file) - this is metadata that needs to be written; write(fd, buf) - because of delalloc there is no metadata created for this operation, therefore it doesn't need to be written out; close(fd). So the file has metadata created for it, which needs to be written out. Because of delalloc there are no extents created or anything for the data, therefore there is nothing to write. Thanks, So file creation is effectively synchronous? So I could create a benchmark that creates millions of files and it would be limited to the I/O operations per second of the disks? Why does file creation need to hit the disk before the contents (within the limit of how much data can fit in one transaction)?
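Josef's creat()/write()/close() explanation can be modelled with a toy file that keeps committed state (what the last transaction recorded) separate from dirty page-cache data. This is a simplified illustration under stated assumptions, not btrfs code: `struct toy_file` and its functions are hypothetical, creat() is assumed to commit an empty inode, write() only touches the page cache, and a snapshot copies committed state only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 64

/* Toy model of one file under delayed allocation (not btrfs code). */
struct toy_file {
    char committed[TOY_CAP];  /* data recorded in the last transaction */
    size_t committed_size;
    char cached[TOY_CAP];     /* page cache: full current contents */
    size_t cached_size;
};

/* creat(): the inode is metadata, assumed committed right away, empty. */
void toy_creat(struct toy_file *f)
{
    memset(f, 0, sizeof *f);
}

/* write(): with delalloc only the page cache changes; no extent yet. */
int toy_write(struct toy_file *f, const char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    if (f->cached_size + len > TOY_CAP)
        return -1;
    memcpy(f->cached + f->cached_size, buf, len);   /* append */
    f->cached_size += len;
    return 0;
}

/* Writeback (fsync, sync or the VM): allocate extents and commit the data. */
void toy_writeback(struct toy_file *f)
{
    memcpy(f->committed, f->cached, f->cached_size);
    f->committed_size = f->cached_size;
}

/* A snapshot copies committed state only; returns the size it sees. */
size_t toy_snapshot(const struct toy_file *f, char out[TOY_CAP])
{
    memcpy(out, f->committed, f->committed_size);
    return f->committed_size;
}
```

In this model the first snapshot sees the inode but none of its data, which matches the zero-length files reported in this thread; only after writeback does a snapshot carry the contents.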
Re: [PATCH 0/2] btrfs: a new tool to manage a btrfs filesystem
On Fri, Feb 12, 2010 at 11:01 AM, Goffredo Baroncelli kreij...@gmail.com wrote: Usage: btrfs delete|-D subvolume Delete the subvolume subvolume. btrfs defrag|-d file|dir [file|dir...] Defragment a file or a directory. I think the short options should be removed, or else you'll still have the easy misuse of btrfs -d and btrfs -D. The best example would be the ip command, which has commands and -[a-z] options that do different types of things. For instance, all of the short options are applicable to all commands and change the verbosity or format of the output.
Re: Kernel BUG on mounting BtrFS / after reboot
On Fri, Feb 12, 2010 at 1:04 PM, Alex Elsayed eternal...@gmail.com wrote: I'm getting a rather nasty BUG when I try to mount this filesystem, _including_ when I specify -o ro. I'm unsure what caused it, but the problem manifested after my computer hardlocked while reading my RSS feeds, complete with flashing lights. After I rebooted it, the screen filled with panic messages when the initramfs tried to mount it RO to pivot into. I am running 2.6.33-rc6. The BUG message is as follows: Is this the bug you mentioned on IRC that you fixed somehow? If so, please post the steps you performed.
Re: zero-length files in snapshots
On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball c...@laptop.org wrote: echo x1 > /mnt/x/d/foo.txt || exit 2 btrfsctl -s /mnt/x/snap /mnt/x/d You're just missing a sync/fsync() between these two lines. We argued on IRC a while ago about whether this is a sensible default; cmason wants the no-sync version of snapshot creation to be available, but was amenable to the idea of changing the default to be sync before snapshot, since it was pointed out that no-one other than him had understood we were supposed to be running sync first. You're saying that it only snapshots the on-disk data structures and not the in-memory versions? That can only lead to pain. What do you do if something else during this race condition? What would a sync do to solve this? Have the semantics of sync been changed in btrfs from "sync everything that hasn't been written yet" to "sync this subvolume"? From what I understand, what should be happening is much like what LVM should do: step 1: defer all other writes to the subvolume (userspace processes get stuck in D state until step 4); step 2: sync all changes not already committed to the subvolume; step 3: create the snapshot; step 4: resume writes from userspace. Now if all 4 steps can be done with in-memory data structures without forcing data (not necessarily meta-data) to disk, so much the better.
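The four steps proposed above map naturally onto a readers-writer lock: writers hold it shared, and the snapshot takes it exclusively, which waits for in-flight writes and blocks new ones until it is released. The sketch below is a hypothetical in-memory model of that proposal; `struct subvol` and its functions are invented for illustration, and the rwlock is my choice of mechanism, not a description of how btrfs implements snapshots. Build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define SV_CAP 256

/* Hypothetical in-memory subvolume: writes land in a dirty buffer first. */
struct subvol {
    pthread_rwlock_t freeze;     /* shared by writers, exclusive for snapshots */
    pthread_mutex_t dirty_lock;  /* serialises writers appending to dirty[] */
    char dirty[SV_CAP];
    size_t dirty_len;            /* written but not yet flushed */
    char disk[SV_CAP];
    size_t disk_len;             /* flushed ("on disk") */
};

void subvol_init(struct subvol *s)
{
    memset(s, 0, sizeof *s);
    pthread_rwlock_init(&s->freeze, NULL);
    pthread_mutex_init(&s->dirty_lock, NULL);
}

/* A userspace write: blocks while a snapshot holds the freeze lock. */
int subvol_write(struct subvol *s, const char *data, size_t len)
{
    int rc = 0;

    assert(s != NULL && data != NULL);
    pthread_rwlock_rdlock(&s->freeze);
    pthread_mutex_lock(&s->dirty_lock);
    if (s->disk_len + s->dirty_len + len > SV_CAP) {
        rc = -1;                                /* model is full */
    } else {
        memcpy(s->dirty + s->dirty_len, data, len);
        s->dirty_len += len;
    }
    pthread_mutex_unlock(&s->dirty_lock);
    pthread_rwlock_unlock(&s->freeze);
    return rc;
}

/* Steps 1-4 from the message; copies the snapshot into out[], returns its length. */
size_t subvol_snapshot(struct subvol *s, char out[SV_CAP])
{
    size_t n;

    pthread_rwlock_wrlock(&s->freeze);          /* 1: wait for writers, defer new ones */
    memcpy(s->disk + s->disk_len, s->dirty, s->dirty_len);
    s->disk_len += s->dirty_len;                /* 2: flush everything written so far */
    s->dirty_len = 0;
    memcpy(out, s->disk, s->disk_len);          /* 3: take the snapshot */
    n = s->disk_len;
    pthread_rwlock_unlock(&s->freeze);          /* 4: resume writers */
    return n;
}
```

Because the step 2 flush happens while writers are blocked, the copy taken in step 3 contains every write that completed before the freeze, which is the ordering guarantee argued for earlier in the thread.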