[PATCH] btrfs: cleanup for open-coded alignment
Though most of the btrfs codes are using ALIGN macro for page alignment, there are still some codes using open-coded alignment like the following: -- u64 mask = ((u64)root-stripesize - 1); u64 ret = (val + mask) ~mask; -- Or even hidden one: -- num_bytes = (end - start + blocksize) ~(blocksize - 1); -- Sometimes these open-coded alignment is not so easy to understand for newbie like me. This commit changes the open-coded alignment to the ALIGN macro for a better readability. Also there is a previous patch from David Sterba with similar changes, but the patch is for 3.2 kernel and seems not merged. http://www.spinics.net/lists/linux-btrfs/msg12747.html Cc: David Sterba d...@jikos.cz Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/extent-tree.c | 7 +++ fs/btrfs/extent_io.c | 8 fs/btrfs/file.c| 3 +-- fs/btrfs/inode.c | 37 +++-- fs/btrfs/tree-log.c| 3 +-- fs/btrfs/volumes.c | 3 +-- 6 files changed, 25 insertions(+), 36 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 61da9d0..6af2ef6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3357,7 +3357,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes) int ret = 0, committed = 0, alloc_chunk = 1; /* make sure bytes are sectorsize aligned */ - bytes = (bytes + root-sectorsize - 1) ~((u64)root-sectorsize - 1); + bytes = ALIGN(bytes, root-sectorsize); if (root == root-fs_info-tree_root || BTRFS_I(inode)-location.objectid == BTRFS_FREE_INO_OBJECTID) { @@ -3452,7 +3452,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes) struct btrfs_space_info *data_sinfo; /* make sure bytes are sectorsize aligned */ - bytes = (bytes + root-sectorsize - 1) ~((u64)root-sectorsize - 1); + bytes = ALIGN(bytes, root-sectorsize); data_sinfo = root-fs_info-data_sinfo; spin_lock(data_sinfo-lock); @@ -5455,8 +5455,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, static u64 stripe_align(struct btrfs_root *root, u64 val) { - u64 mask = ((u64)root-stripesize - 1); - u64 ret = (val + mask) ~mask; + u64 ret = ALIGN(val, root-stripesize); return ret; } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1b319df..17fa84b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2682,7 +2682,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree, iosize = min(extent_map_end(em) - cur, end - cur + 1); cur_end = min(extent_map_end(em) - 1, end); - iosize = (iosize + blocksize - 1) ~((u64)blocksize - 1); + iosize = ALIGN(iosize, blocksize); if (this_bio_flag EXTENT_BIO_COMPRESSED) { disk_io_size = em-block_len; sector = em-block_start 9; @@ -2982,7 +2982,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, BUG_ON(extent_map_end(em) = cur); BUG_ON(end cur); iosize = min(extent_map_end(em) - cur, end - cur + 1); - iosize = (iosize + blocksize - 1) ~((u64)blocksize - 1); + iosize = ALIGN(iosize, blocksize); sector = (em-block_start + extent_offset) 9; bdev = em-bdev; block_start = em-block_start; @@ -3678,7 +3678,7 @@ int extent_invalidatepage(struct extent_io_tree *tree, u64 end = start + PAGE_CACHE_SIZE - 1; size_t blocksize = page-mapping-host-i_sb-s_blocksize; - start += (offset + blocksize - 1) ~(blocksize - 1); + start += ALIGN(offset, blocksize); if (start end) return 0; @@ -3797,7 +3797,7 @@ static struct extent_map *get_extent_skip_holes(struct inode *inode, len = last - offset; if (len == 0) break; - len = (len + sectorsize - 1) ~(sectorsize - 1); + len = ALIGN(len, sectorsize); em = get_extent(inode, NULL, 0, offset, len, 0); if (IS_ERR_OR_NULL(em)) return em; diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index b06d289..81d6271 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -505,8 +505,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode, loff_t isize = i_size_read(inode); start_pos = pos ~((u64)root-sectorsize - 1); - num_bytes = (write_bytes + pos - start_pos + - root-sectorsize - 1) ~((u64)root-sectorsize - 1); + num_bytes = ALIGN(write_bytes + pos - start_pos, root-sectorsize); end_of_last_block = start_pos + num_bytes - 1; err = btrfs_set_extent_delalloc(inode, start_pos,
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
On 2013/02/26 16:05, Dave Chinner wrote: On Tue, Feb 26, 2013 at 01:25:11PM +0900, Tsutomu Itoh wrote: On 2013/02/26 13:06, Eric Sandeen wrote: On 2/25/13 9:55 PM, Tsutomu Itoh wrote: EXPERIMENTAL... It's certainly so. However, I think that we should not add the option that it troubles a lot of people. Well, I sent it as an RFC. Chris merged it; I'll defer to his judgement. Agreed. So, I sent revert request to Chris :) Where? I want to NACK the revert - this is not a matter to joke about. I apologize for my childish expression. I'll also defer to Chris's judgement. Thanks, Tsutomu You're all smart enough to know how to use shell aliases and script variables, so this need to type -f all the time argument holds absolutely no weight at all. Further, most of the time you're working on systems that donMy childish expression is mistaken. 't hold any data you care about and so the consequences of a mistake are very minor. However, users often make mistakes and we have to take that into account when deciding on the default behaviour of our tools. Tools that destroy data *must* err on the side of conservative default behaviour simply because of the fact that destroying the wrong data can have catastrophic consequences. It's not your data that is being destroyed, but it is data that you have a *responsibility to safeguard* as a filesystem developer. Think about it this way: how happy would an important customer be if they decided that *you* were directly responsible for a major data loss incident because they found it would have been prevented by the -f patch? I don't think that the explanation of it was inconvenient to me would be an acceptable answer. Cheers, Dave. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lvm volume like support
Am Dienstag, 26. Februar 2013 schrieb Fajar A. Nugraha: On Tue, Feb 26, 2013 at 11:59 AM, Mike Fleetwood mike.fleetw...@googlemail.com wrote: On 25 February 2013 23:35, Suman C schakr...@gmail.com wrote: Hi, I think it would be great if there is a lvm volume or zfs zvol type support in btrfs. Btrfs already has capabilities to add and remove block devices on the fly. Data can be stripped or mirrored or both. Raid 5/6 is in testing at the moment. https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devic es https://btrfs.wiki.kernel.org/index.php/UseCases#RAID Which specific features do you think btrfs is lacking? I think he's talking about zvol-like feature. In zfs, instead of creating a filesystem-that-is-accessible-as-a-directory, you can create a zvol which behaves just like any other standard block device (e.g. you can use it as swap, or create ext4 filesystem on top of it). But it would also have most of the benefits that a normal zfs filesystem has, like: - thin provisioning (sparse allocation, snapshot clone) - compression - integrity check (via checksum) Typical use cases would be: - swap in a pure-zfs system - virtualization (xen, kvm, etc) - NAS which exports the block device using iscsi/AoE AFAIK no such feature exist in btrfs yet. Sounds like the RADOS block device stuff for Ceph. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
Am Dienstag, 26. Februar 2013 schrieb Tsutomu Itoh: Therefore I want you to revert commit:2a2d8e1962e8b6cda7b0a7584f6d2fb95d442cb6. btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table How do you think about it? What if you submit a patch to look at an environment variable, BTRFS_CLOBBERS_ALL=1 which causes it to not require -f to overwrite? Then you can just set it once at the top of your test environment, and not change every instance? Yes. But, (Most of my test scripts fails without -f. So I'll always type mkfs.btrfs -f) is one example. Almost everyone types mkfs.btrfs -f (or BTRFS_CLOBBERS_ALL=1 :) unconditionally, I think. So, I think -f option is almost meaningless. No. I don´t. And I teach not to in my trainings as well. Everyone who uses rm -rf by default even just for deleting a single file does it as long as he or she deleted his / her home directory or something. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Hybrid Storage proposal
HI, It's a bit long so that i haven't read its whole, but i want to know if it has any collision with my ongoing feature btrfs hot relocation/migration? On Thu, Feb 21, 2013 at 12:46 AM, Matias Bjorling m...@itu.dk wrote: Here is a short proposal for the hybrid storage cache idea with introduction/motivation and a bird's eye view of an approach to implement a hybrid storage cache for btrfs. Please note that there is currently no available patches. We would like to get as much input before as possible before we start designing and implementing a solution. 1. Introduction The emerge of Solid State Drives (SSD) change how data is stored for fast accesses. Their high throughput and low latency characteristics make them a good choice for applications that traditionally require many hard-drives. SSDs are still more expensive per GB, making traditional hard-drives a good and affordable solution to store bulk amount of data. Often, the working set of a filesystem is smaller than the full capacity of a drive. We can exploit this by extending the often used bulk data on SSDs. We prioritize data that is often accessed randomly, while larger bulk operations are kept on bulk storage. Recent development in Linux SSD caches, uses a block IO approach to solve caching. The approach assumes that data is stable on disk and evicts data based on LRU, temparature, etc. This is great for read only IO patterns and in-place writes. However, btrfs uses a copy on write approach, that reduces the benefits of block IO caching. The block caches are unable to track updates (require extensive hints forth and back between the cache layer). Additionally, data and metadata is the same to the block layer. The internal file-system information available within btrfs allows separation of these types of updates and enables fine-grained control of a to-be implemented cache. 2 Overview The design space for a cache is divided into read and writes. For both read and write caches, we divide them into caching metadata (trees) accesses or user data. Writes are further divided into either being write-back or write-through. 2.1 Overview Any device attached to the storage pool should allow to be used as a cache. It is natural to store the cache in the already implemented chunk architecture (as cache chunks). Each allocated cache chunk may? be available to one or more subvolumes. 2.2 Caching hierarchy By adding an extra layer, we have the following access pattern: host memory - SSD or Disk - Disk. - Host memory caches lookup paths, transactions, free space infomation, etc. - SSD/disk cache frequently used data or writes for data that cannot be in host memory. - Traditional hard-drives store the largest amount of data and store a complete copy of all data. 2.3 Hotness tracking The data to cache is defined by some hotness algorithm, which can be applied to different layers of btrfs: - Inode level The recently implemented VFS hot track patches enable us to detect hotness for files within any given VFS file-system implementation. For reads that are within a tunable cache size, we either cache the tree leaf or its extent. For writes, we track the tree updates and write the tree updates to the SSD. If the file is larger and it receives a considerable amount of reads or writes, their hot subparts should be cached. - Tree level Within the fs, we can track the hotness of each tree. If a tree is read or updated frequently, it should be served by the SSD cache. - Extent level Hole or parts of extents should be tracked and cached if needed. 2.4 Cache opportunities: - Hotness tracking for random reads Define threshold for when to cache reads. Back of envelope calculation tells us to cache when IO size is below 1.5MB. This assumes 100 IO/s and a read speed of 150MB/s from the traditional drives. This should be tunable. If data is updated, we should follow the newly written data and evict the old data from the cache. As such, if the old data was cached, we make sure to also cache the new data. Implementation details: - Use the hot track patches for VFS to track when an inode is hot and then cache the reads. - Track CoW actions and pre-warm cache. - Write-back cache * Tree updates Updates to trees are batched and flushed every 30 seconds. Flush the updates to cache layer first, and then flush them later to bulk storage. When updates are flushed to bulk storage, we reorder IOs to be as sequential as possible. This optimization allows us to have higher throughput at the cost of sorting writes at flush time. The technique requires that we track tree updates between disk cache and disk. As our trees are append only, we can track the current generation and apply the difference at timed intervals or at mount/unmount
Re: [PATCH V2][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support
Am Montag, 25. Februar 2013 schrieb Zach Brown: I updates my previous patches [1] to add support for raid5/6. These patches update the btrfs fi df command and add two new commands: - btrfs filesystem disk-usage path - btrfs device disk-usage path This seems like a ton of code. Here's a thought experiment: What's the smallest possible change that could communicate the information that people don't have today? The kind and amount of information output of these additions have been discussed several times before. I found the output provided quite useful. As others. Free space seems to be a complex matter in BTRFS and one conclusion was that its not easily possible to provide a single number to show how much space is free. I´d still like that for df, whose output is quite bogus in certain BTRFS setups at the moment and does not give applications a realistic estimate at all. One example is raid 1 with 10 GB each disk. Shows 20 GB free. An application which wants to write 15 GB will fail. Which can break installer scripts, package management, cache software or anything else which checks for free space. Thus I´d like df to default to *minimum* free. But what is it concretely, what you feel uncomfortable with? The Linux kernel is also a ton of code. I´d really like to see Goffredo improvements go in instead of them being discussed endlessly. So I´d like to feedback to be as concrete as possible, so Goffredo has a chance to work on it. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not change inode flags in rename
On Tue, Feb 26, 2013 at 08:11:17AM +0800, Liu Bo wrote: On Mon, Feb 25, 2013 at 01:56:47PM -0500, Josef Bacik wrote: On Sun, Feb 24, 2013 at 09:04:42PM -0700, Liu Bo wrote: Before we forced to change a file's NOCOW and COMPRESS flag due to the parent directory's, but this ends up a bad idea, because it confuses end users a lot about file's NOCOW status, eg. if someone change a file to NOCOW via 'chattr' and then rename it in the current directory which is without NOCOW attribute, the file will lose the NOCOW flag silently. This diables 'change flags in rename', so from now on we'll only inherit flags from the parent directory on creation stage while in other places we can use 'chattr' to set NOCOW or COMPRESS flags. I'm of the mind we definitely shouldn't drop flags we've set previously, but I think we should also inherit any flags we have set on the directory, so if we move a file into a NOCOW directory we should inherit the flag. I'm not married to the idea, but it seems to make the most sense to me. Thanks, (Said in another thread) I'm ok with either one, but... from some reports on the list, end users are more likely to control, use chattr files by themselves, inheriting flags via moving a file to a new directory is indeed not very welcomed. So for practical use, I assume that it's fairly enough to inherit flags only on creation? I still haven't figured out in what cases the silent flag inheritance (for a non-empty) file would help and the user would be happy that it works like this. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
On Tue, Feb 26, 2013 at 08:39:52AM +0900, Tsutomu Itoh wrote: This means that it is now required to change all occurrences of mkfs.btrfs to mkfs.btrfs -f everywhere. Can't we first establish a I also think so. It means -f is not significant to me, I think. (Most of my test scripts fails without -f. So I'll always type mkfs.btrfs -f) Therefore I want you to revert commit:2a2d8e1962e8b6cda7b0a7584f6d2fb95d442cb6. btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table I personally don't want to see it reverted, but also very much understand the pain to tweak all existing test scripts that rely on direct success of plain mkfs. To resolve this I'm going to use the attached script, that intsalls itself in place of mkfs.btrfs and act's as we were used to. This is intended to help developers and is not for the end user. david first use: mkfs.btrfs.wrapper install --- $ cat mkfs.btrfs.wrapper #!/bin/sh if [ $# = 1 -a $1 = 'install' ]; then echo Install mode mkfs=`type -p mkfs.btrfs` if [ $? != 0 ]; then echo Cannot find mkfs.btrfs exit 1 fi if file $mkfs | grep -q 'ELF .* executable'; then echo Moving original mkfs to ${mkfs}.real mv $mkfs ${mkfs}.real echo Copying myself to $mkfs cp $0 $mkfs chmod 755 $mkfs echo Have a nice day exit 0 else echo $mkfs is not a binary, will not overwrite exit 1 fi fi mkfs=`type -p mkfs.btrfs.real` if [ $? != 0 ]; then echo Cannot find mkfs.btrfs.real, install the wrapper properly exit 1 fi force= if grep -q 'Use the -f option to force overwrite' $mkfs; then force='-f' fi $mkfs $force $@ --- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support
Am Samstag, 23. Februar 2013 schrieb Goffredo Baroncelli: Hi all, Hi Goffredo, I updates my previous patches [1] to add support for raid5/6. These patches update the btrfs fi df command and add two new commands: - btrfs filesystem disk-usage path - btrfs device disk-usage path Tested-By: Martin Steigerwald mar...@lichtvoll.de Only for -d single and -m single, I have RAID-1 at workstation at work, but will take some time till I get to office again. No RAID-5/6 setup at the moment. So test obviously not complete yet. merkaba:~ /tmp/btrfs fi sh failed to read /dev/sr0 Label: 'home' uuid: […] Total devices 1 FS bytes used 196.11GB devid1 size 223.52GB used 204.02GB path /dev/dm-2 Label: 'debian' uuid: […] Total devices 1 FS bytes used 10.69GB devid1 size 18.62GB used 17.02GB path /dev/dm-0 Btrfs v0.19-367-g711b24a /: merkaba:~ /tmp/btrfs fi df / Disk size:18.62GB Disk allocated: 17.02GB Disk unallocated: 1.61GB Used: 10.69GB Free (Estimated): 7.93GB (Max: 7.93GB, min: 7.13GB) Data to disk ratio: 100 % merkaba:~ /tmp/btrfs fi disk-usage / Data,Single: Size:16.01GB, Used:10.14GB /dev/dm-0 16.01GB Metadata,Single: Size:1.01GB, Used:565.34MB /dev/dm-01.01GB System,Single: Size:4.00MB, Used:4.00KB /dev/dm-04.00MB Unallocated: /dev/dm-01.61GB merkaba:~ /tmp/btrfs fi disk-usage -t / DataMetadata System Single Single Single Unallocated /dev/dm-0 16.01GB 1.01GB 4.00MB 1.61GB === == === Total 16.01GB 1.01GB 4.00MB 1.61GB Used 10.14GB 565.34MB 4.00KB merkaba:~ /tmp/btrfs dev disk-usage / /dev/dm-0 18.62GB Data,Single: 16.01GB Metadata,Single: 1.01GB System,Single:4.00MB Unallocated: 1.61GB /home (quite fresh fs): merkaba:~#129 /tmp/btrfs fi df /home Disk size: 223.52GB Disk allocated: 204.02GB Disk unallocated: 19.50GB Used:196.11GB Free (Estimated): 27.40GB (Max: 27.40GB, min: 17.66GB) Data to disk ratio: 100 % merkaba:~ /tmp/btrfs fi disk-usage /home Data,Single: Size:202.01GB, Used:194.67GB /dev/dm-2 202.01GB Metadata,Single: Size:2.01GB, Used:1.44GB /dev/dm-22.01GB System,Single: Size:4.00MB, Used:48.00KB /dev/dm-24.00MB Unallocated: /dev/dm-2 19.50GB merkaba:~ /tmp/btrfs fi disk-usage -t /home Data Metadata System Single Single Single Unallocated /dev/dm-2 202.01GB 2.01GB 4.00MB 19.50GB === === Total 202.01GB 2.01GB 4.00MB 19.50GB Used 194.67GB 1.44GB 48.00KB merkaba:~ /tmp/btrfs dev disk-usage /home /dev/dm-2 223.52GB Data,Single:202.01GB Metadata,Single: 2.01GB System,Single:4.00MB Unallocated: 19.50GB Thanks, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support
Am Dienstag, 26. Februar 2013 schrieb Gareth Pye: On Tue, Feb 26, 2013 at 10:09 PM, Martin Steigerwald mar...@lichtvoll.de wrote: I´d still like that for df, whose output is quite bogus in certain BTRFS setups at the moment and does not give applications a realistic estimate at all. One example is raid 1 with 10 GB each disk. Shows 20 GB free. An application which wants to write 15 GB will fail. Which can break installer scripts, package management, cache software or anything else which checks for free space. Thus I´d like df to default to *minimum* free. Is that any better than the script failing to attempt to install because it needs 15G but because some of the storage is used in RAID1 then df shows 10G free but the 15G install would work fine. If you could force the tool to install where it know it doesn't have sufficient space? I do not quite understand your question. In RAID-1 with 10 GB and two disks, df will show 20 GB free. If the script needs 15 GB and checks for it it would run, but then fail. I would prefer that the script space check bails out in that case it is know that there is not enough space available anymore. So or so one can always argue that current free space is just a snapshot of the current moment and there is *never* a guarentee that there is enough space, cause another application may write to the filesystem at the same time. Thanks, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [POSSIBLE SPAM] Re: Hybrid Storage proposal
On 02/26/2013 12:04 PM, Zhi Yong Wu wrote: HI, It's a bit long so that i haven't read its whole, but i want to know if it has any collision with my ongoing feature btrfs hot relocation/migration? It will utilize the hot track patch set you're been creating for VFS. I think the methods are complementary. Do you have a description of the relocation/migration approach? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support
Am Dienstag, 26. Februar 2013 schrieb Martin Steigerwald: Am Dienstag, 26. Februar 2013 schrieb Gareth Pye: On Tue, Feb 26, 2013 at 10:09 PM, Martin Steigerwald mar...@lichtvoll.de wrote: I´d still like that for df, whose output is quite bogus in certain BTRFS setups at the moment and does not give applications a realistic estimate at all. One example is raid 1 with 10 GB each disk. Shows 20 GB free. An application which wants to write 15 GB will fail. Which can break installer scripts, package management, cache software or anything else which checks for free space. Thus I´d like df to default to *minimum* free. Is that any better than the script failing to attempt to install because it needs 15G but because some of the storage is used in RAID1 then df shows 10G free but the 15G install would work fine. If you could force the tool to install where it know it doesn't have sufficient space? I do not quite understand your question. In RAID-1 with 10 GB and two disks, df will show 20 GB free. If the script needs 15 GB and checks for it it would run, but then fail. I would prefer that the script space check bails out in that case it is know that there is not enough space available anymore. Well, okay, and exactly that is not known, cause if the files the installer script are installed are stored a single instead of RAID-1 there would be enough space. Anyway for the additions Goffredo made I strongly suggest reading the old discussions before starting to discuss stuff that has already been discussed back then. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at fs/btrfs/inode.c:2165 btrfs_orphan_commit_root+0xcb/0xdf()
On Sun, Feb 24, 2013 at 07:55:46PM -0700, Marc MERLIN wrote: Is this useful to anyone? Got this after a crash/reboot: if (block_rsv) { WARN_ON(block_rsv-size 0); btrfs_free_block_rsv(root, block_rsv); } Fixed in btrfs-next, thanks for reporting! Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
On Mon, Feb 25, 2013 at 11:51:02PM -0700, Marc MERLIN wrote: TL;DR; WARNING: at fs/btrfs/tree-log.c:1984 walk_down_log_tree+0x51/0x307() WARNING: at fs/btrfs/tree-log.c:1988 walk_down_log_tree+0x6c/0x307() kernel BUG at fs/btrfs/volumes.c:3753! It's way time for btrfs to stop crashing your system with no recovery flag that works to clear the log if the log can't be replayed. Hell, on non development systems, it should just auto discard the log if it can't be replayed without user input. Details: It's been almost a year that I'm doing my best to test btrfs and report bugs, but how quickly it crashes on mount if anything is off, is a huge usability problem. I just again, lost use of my machine today after an unrelated problem caused a crash/reboot, and incomplete btrfs writes to my device. That happens, it's life. But after that, I get to roll a dice of whether btrfs will recover, or just crash on mount. It's slightly more liveable if it's a scratch filesystem on a developer box, you just don't mount it. It's really really sucky if it's your root filesystem and you need to boot from a rescue partition/media to recover each time. Then, I spent 3 hours reproducing the crash again, with netconsole working so that I can get a useful bugreport, which I send here. So how did you reproduce it? I'll take a fs_image, but being able to reproduce the problem is more valuable. Thanks, -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not change inode flags in rename
On Tue, Feb 26, 2013 at 05:33:00AM -0700, David Sterba wrote: On Tue, Feb 26, 2013 at 08:11:17AM +0800, Liu Bo wrote: On Mon, Feb 25, 2013 at 01:56:47PM -0500, Josef Bacik wrote: On Sun, Feb 24, 2013 at 09:04:42PM -0700, Liu Bo wrote: Before we forced to change a file's NOCOW and COMPRESS flag due to the parent directory's, but this ends up a bad idea, because it confuses end users a lot about file's NOCOW status, eg. if someone change a file to NOCOW via 'chattr' and then rename it in the current directory which is without NOCOW attribute, the file will lose the NOCOW flag silently. This diables 'change flags in rename', so from now on we'll only inherit flags from the parent directory on creation stage while in other places we can use 'chattr' to set NOCOW or COMPRESS flags. I'm of the mind we definitely shouldn't drop flags we've set previously, but I think we should also inherit any flags we have set on the directory, so if we move a file into a NOCOW directory we should inherit the flag. I'm not married to the idea, but it seems to make the most sense to me. Thanks, (Said in another thread) I'm ok with either one, but... from some reports on the list, end users are more likely to control, use chattr files by themselves, inheriting flags via moving a file to a new directory is indeed not very welcomed. So for practical use, I assume that it's fairly enough to inherit flags only on creation? I still haven't figured out in what cases the silent flag inheritance (for a non-empty) file would help and the user would be happy that it works like this. K, I'll take this as it is then and drop my previous patch. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at fs/btrfs/inode.c:2165 btrfs_orphan_commit_root+0xcb/0xdf()
On Tue, Feb 26, 2013 at 03:02:01PM +0800, Liu Bo wrote: On Sun, Feb 24, 2013 at 06:55:46PM -0800, Marc MERLIN wrote: Is this useful to anyone? Hi Marc, Thanks for the report, of course they're useful. Thanks. I wasn't sure since I haven't seen the real problem of crashes during mount due to unexpected state in replay logs being improved over the last 4 kernel versions, and I wasn't sure if they are being worked on. Could you please also show us your workloads and it'd be better to know how to reproduce this? Sure thing. Workload is a simple laptop, where something tyipcally dies during writes and I get a full system hang because linux is unable to flush its disk queues, so in the end I power cycle. Details here: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg391505.html As mentioned in my other mail, I do run on top of dmcrypt since I can't have an unencrypted laptop, and ecryptfs is way too slow while not supporting encryption of long filenames. The next question is: are your write requests getting there in the order they should. My crashes and your BUG() that get triggered indicate that maybe not. Currently, I have: gandalfthegreat:~# cat /sys/block/sda/queue/scheduler noop [deadline] cfq That could be a problem I guess, so I'll change it to noop, just in case. My boot is like this: /vmlinuz-3.7.8-amd64-preempt-20130222 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/sda4,keyscript=/sbin/cryptgetpw,discard btrfs is mounted like so: LABEL=btrfs_pool1 / btrfs subvol=root,defaults,compress=lzo,discard,nossd,space_cache,noatime Is there anything else I can give? (I'll answer on the other thread with the fsimage) Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
Am I wrong when saying that ending up with replay journals that have unexpected data and that can't be replayed is just inevitable and something any journalling filesystem must deal with? If by journal you mean the btrfs log then yes, strictly speaking, you're wrong. btrfs does deal with the kind of incomplete and reordered writes that you're talking about and it should not result in corruption of what it calls the log. But it's a reasonable thing to be confused by. I'm guessing that you're being tripped up by what ext3 means by a journal and by what btrfs means by a log. The journal in ext3 can be partially written during a crash. The journal replay on mount notices this because the commit block isn't present and just throws it away. No worries. The equivalent consistent update mechanism in btrfs is cow tree updates. The superblock that references new tree blocks written to free space is itself only written once all those blocks are stable on disk. If the tree block writes are interrupted then the superblock isn't updated and btrfs won't see the partially written blocks. No worries. The btrfs log is itself just a logical btree *inside these consistent tree updates* that records logical operations that will need to be replayed. For the log to be corrupted, if the btrfs code is perfect, the storage had to have lied to btrfs and told it that tree update blocks were stable which caused the superblock write that referenced them prematurely. The equivalent problem in the ext3 journal would be a transaction that has blocks missing but which has a valid commit block. ext3 couldn't just throw this transaction away because after the commit block write it could have been in the process of replaying the transaction blocks at their final location on disk. And it's now missing some of those blocks to replay. This kind of corruption Shouldn't Happen and the fs can't just silently ignore it. I absolutely agree that the error messages should be greatly improved in this case, yes, and that it shouldn't BUG_ON (it should *never* BUG_ON). But btrfs is right to refuse to silently revert previously stable changes by just ignoring the corrupt log. - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/17] btrfs-progs: fix fd leak in cmd_subvol_set_default
Hi Eric, On 02/25/2013 11:54 PM, Eric Sandeen wrote: Rearrange cmd_subvol_set_default() slightly so we don't have to close the fd on an error return. While we're at it, fix whitespace remove magic return values. Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-subvolume.c | 17 + 1 files changed, 9 insertions(+), 8 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 0dfaefe..461eed9 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -712,24 +712,25 @@ static int cmd_subvol_set_default(int argc, char **argv) subvolid = argv[1]; path = argv[2]; + objectid = (unsigned long long)strtoll(subvolid, NULL, 0); Could you replace strtoll() with strtoull() ? Note that: strtoull(0x,0,0) == 0x strtoull(-1,0,0) == 0x strtoll(-1,0,0) == 0x strtoll(0x,0,0) - ERANGE + if (errno == ERANGE) { Pay attention that if strtoull() doesn't encounter a problem errno *is not* touched: this check could catch a previous error. I don't know if it is an hole in the standard or a bug in the gnu-libc; however I think that before strtoXll() we should put 'errno = 0;'. + fprintf(stderr, ERROR: invalid tree id (%s)\n, subvolid); + return 1; + } + fd = open_file_or_dir(path); if (fd 0) { fprintf(stderr, ERROR: can't access to '%s'\n, path); - return 12; + return 1; } - objectid = (unsigned long long)strtoll(subvolid, NULL, 0); - if (errno == ERANGE) { - fprintf(stderr, ERROR: invalid tree id (%s)\n,subvolid); - return 30; - } ret = ioctl(fd, BTRFS_IOC_DEFAULT_SUBVOL, objectid); e = errno; close(fd); - if( ret 0 ){ + if (ret 0) { fprintf(stderr, ERROR: unable to set a new default subvolume - %s\n, strerror(e)); - return 30; + return 1; } return 0; } -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17] btrfs-progs: replace strtok_r with strsep
On 02/25/2013 11:54 PM, Eric Sandeen wrote: The coverity had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-balance.c | 12 1 files changed, 4 insertions(+), 8 deletions(-) diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..e8b9d90 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,9 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { + printf(got profile %s\n, this_char); In the original code the printf() doesn't exist. May be this is a residual of a debugging code ? if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +134,12 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { + printf(got %s\n, this_char); Same here if ((value = strchr(this_char, '=')) != NULL) *value++ = 0; if (!strcmp(this_char, profiles)) { -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/17] btrfs-progs: Unify size-parsing
On 02/25/2013 11:54 PM, Eric Sandeen wrote: cmds-qgroup.c contained a parse_limit() function which duplicates much of the functionality of parse_size. The only unique behavior is to handle none; then we can just pass it off to parse_size(). Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-qgroup.c | 44 ++-- utils.c |8 +++- utils.h |2 +- 3 files changed, 14 insertions(+), 40 deletions(-) diff --git a/cmds-qgroup.c b/cmds-qgroup.c index 26f0ab0..ce013c8 100644 --- a/cmds-qgroup.c +++ b/cmds-qgroup.c @@ -198,43 +198,13 @@ done: return ret; } -static int parse_limit(const char *p, unsigned long long *s) +static u64 parse_limit(const char *p) { - char *endptr; - unsigned long long size; - - if (strcasecmp(p, none) == 0) { - *s = 0; - return 1; - } - size = strtoull(p, endptr, 10); - switch (*endptr) { - case 'T': - case 't': - size *= 1024; - case 'G': - case 'g': - size *= 1024; - case 'M': - case 'm': - size *= 1024; - case 'K': - case 'k': - size *= 1024; - ++endptr; - break; - case 0: - break; - default: - return 0; - } - - if (*endptr) + if (strcasecmp(p, none) == 0) return 0; - *s = size; - - return 1; + /* parse_size() will exit() on any error */ + return parse_size(p); I don't think that this is a good thing to do: the parse_limit behaviour is the good one: return an error to the caller instead of exit()-ing. We should convert the parse_size() to return an error, no to convert parse_limit to exit(). Of course this is out of the goals of this set of patches. } static const char * const cmd_qgroup_assign_usage[] = { @@ -364,10 +334,8 @@ static int cmd_qgroup_limit(int argc, char **argv) if (check_argc_min(argc - optind, 2)) usage(cmd_qgroup_limit_usage); - if (!parse_limit(argv[optind], size)) { - fprintf(stderr, Invalid size argument given\n); - return 1; - } + /* parse_limit will exit on any error */ + size = parse_limit(argv[optind]); memset(args, 0, sizeof(args)); if (size) { diff --git a/utils.c b/utils.c index d660507..bc6d5fe 100644 --- a/utils.c +++ b/utils.c @@ -1251,7 +1251,7 @@ scan_again: return 0; } -u64 parse_size(char *s) +u64 parse_size(const char *s) { int i; char c; @@ -1268,16 +1268,22 @@ u64 parse_size(char *s) switch (c) { case 'e': mult *= 1024; + /* Fallthrough */ case 'p': mult *= 1024; + /* Fallthrough */ case 't': mult *= 1024; + /* Fallthrough */ case 'g': mult *= 1024; + /* Fallthrough */ case 'm': mult *= 1024; + /* Fallthrough */ case 'k': mult *= 1024; + /* Fallthrough */ case 'b': break; default: diff --git a/utils.h b/utils.h index 60a0fea..dcdf475 100644 --- a/utils.h +++ b/utils.h @@ -47,7 +47,7 @@ char *pretty_sizes(u64 size); int check_label(char *input); int get_mountpt(char *dev, char *mntpt, size_t size); int btrfs_scan_block_devices(int run_ioctl); -u64 parse_size(char *s); +u64 parse_size(const char *s); int open_file_or_dir(const char *fname); int get_device_info(int fd, u64 devid, struct btrfs_ioctl_dev_info_args *di_args); -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: cleanup for open-coded alignment
Sometimes these open-coded alignment is not so easy to understand for newbie like me. Thanks, this is a nice change. You might have mentioned in the commit message that some of the changes aren't *strictly* equivalent but that they're still safe. - iosize = (iosize + blocksize - 1) ~((u64)blocksize - 1); + iosize = ALIGN(iosize, blocksize); This is equivalent to + iosize = (iosize + blocksize - 1) ~((size_t)blocksize - 1); Which is fine. That u64 was overkill. The manual casts only need to make sure that the mask is as wide as the input value to avoid losing bits and the macro now guarantees that. - num_bytes = (end - start + blocksize) ~(blocksize - 1); + num_bytes = ALIGN(end - start + 1, blocksize); Nice catch. Reviewed-by: Zach Brown z...@redhat.com - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
On Tue, Feb 26, 2013 at 10:24:42AM -0800, Zach Brown wrote: Am I wrong when saying that ending up with replay journals that have unexpected data and that can't be replayed is just inevitable and something any journalling filesystem must deal with? If by journal you mean the btrfs log then yes, strictly speaking, you're I did and used the wrong term, sorry. wrong. btrfs does deal with the kind of incomplete and reordered writes that you're talking about and it should not result in corruption of what it calls the log. Got it. So in theory what I'm seeing really shouldn't happen, unless there is a corruption bug, or hardware fault. But it's a reasonable thing to be confused by. I'm guessing that you're being tripped up by what ext3 means by a journal and by what btrfs means by a log. yes :) The journal in ext3 can be partially written during a crash. The journal replay on mount notices this because the commit block isn't present and just throws it away. No worries. That's indeed what I meant. The equivalent consistent update mechanism in btrfs is cow tree updates. The superblock that references new tree blocks written to free space is itself only written once all those blocks are stable on disk. If the tree block writes are interrupted then the superblock isn't updated and btrfs won't see the partially written blocks. No worries. Ok. So basically what I seem to be hitting, are seemingly complete tree blocks in the log, that aren't complete or consistent afterall, triggering BUG() in the end. replayed. For the log to be corrupted, if the btrfs code is perfect, the storage had to have lied to btrfs and told it that tree update blocks were stable which caused the superblock write that referenced them prematurely. That makes sense. Is dmcrypt with discard passthrough potentially able to tell btrfs that writes did make it to disk when in fact someonly some of them have? The equivalent problem in the ext3 journal would be a transaction that has blocks missing but which has a valid commit block. ext3 couldn't just throw this transaction away because after the commit block write it could have been in the process of replaying the transaction blocks at their final location on disk. And it's now missing some of those blocks to replay. This kind of corruption Shouldn't Happen and the fs can't just silently ignore it. I understand that, and I'm not advocating silent ignores either. I absolutely agree that the error messages should be greatly improved in this case, yes, and that it shouldn't BUG_ON (it should *never* BUG_ON). But btrfs is right to refuse to silently revert previously stable changes by just ignoring the corrupt log. That's the part I'm a bit confused about. It could complain, it could require a 'recovery' option on the kernel command line (which does exist but doesn't work for this purpose), but effectively it forces me to indeed ignore seemingly stable changes which are indeed unusable, and the code knows they are, except that I have to go back to boot media and go through hoops to get a netconsole dump/crash before that, and then use btrfs-zero-log. Once btrfs knows that the last log entry(ies) are corrupted in some way, it could dump a bunch of diag data in the kernel, then delete the unusable log and continue with the mount, which is effectively what I do myself right now. Unless I'm mistaken, this really only causes me to lose some amount of data that was written just before the crash/reboot. This is usually ok for most users and typically expected anyway (although not as desirable in the case of a database server of course). I understand that this might not want to be a default, but it should be something that could be done from the kernel command line, and allow the mount to continue and the boot to succeed (including the kernel debug log to make it to local disk, where it can then be Emailed since serial console or netconsole is not something you can do easily) Does that sound reasonable? Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
On 02/26/2013 11:37 AM, Martin Steigerwald wrote: Am Dienstag, 26. Februar 2013 schrieb Tsutomu Itoh: Therefore I want you to revert commit:2a2d8e1962e8b6cda7b0a7584f6d2fb95d442cb6. btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table How do you think about it? What if you submit a patch to look at an environment variable, BTRFS_CLOBBERS_ALL=1 which causes it to not require -f to overwrite? Then you can just set it once at the top of your test environment, and not change every instance? Yes. But, (Most of my test scripts fails without -f. So I'll always type mkfs.btrfs -f) is one example. Almost everyone types mkfs.btrfs -f (or BTRFS_CLOBBERS_ALL=1 :) unconditionally, I think. So, I think -f option is almost meaningless. No. I don´t. me too And I teach not to in my trainings as well. Everyone who uses rm -rf by default even just for deleting a single file does it as long as he or she deleted his / her home directory or something. Unfortunately the rm -rf is a different case. Removing a directory is a common case. We should not be forced to use the -f for common case. A '-f' flag should be used only in uncommon case (like *re*format a disk or a test-suite)... However I think that '-f' is good for mkfs.btrfs. Ciao, -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Needed change in Wiki
Hi Anand, On 02/25/2013 05:39 AM, Anand Jain wrote: You may need to update the minimum version of libblkid-dev that is required. Since. latest btrfs-progs needs blkid_probe_get_wholedisk_devno() from the libblkid-dev. found libblkid-dev version 2.17.2 does not work, and 2.22 works. Not sure which version introduced the function required here. Which error is returned when you try to compile with the 2.17.2 ? HTH Anand -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:3753! These btrfs crashes at mount time on log replay are really a problem
The equivalent consistent update mechanism in btrfs is cow tree updates. The superblock that references new tree blocks written to free space is itself only written once all those blocks are stable on disk. If the tree block writes are interrupted then the superblock isn't updated and btrfs won't see the partially written blocks. No worries. Ok. So basically what I seem to be hitting, are seemingly complete tree blocks in the log, that aren't complete or consistent afterall, triggering BUG() in the end. That's my understanding, yes. I think Josef is digging in. I understand that this might not want to be a default, but it should be something that could be done from the kernel command line, and allow the mount to continue and the boot to succeed (including the kernel debug log to make it to local disk, where it can then be Emailed since serial console or netconsole is not something you can do easily) Does that sound reasonable? To me? Absolutely. Having to recover with btrfs-zero-log is annoying. I'd be surprised if anyone didn't agree that the current process should be improved. - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17] btrfs-progs: replace strtok_r with strsep
On 2/26/13 12:47 PM, Goffredo Baroncelli wrote: On 02/25/2013 11:54 PM, Eric Sandeen wrote: The coverity had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-balance.c | 12 1 files changed, 4 insertions(+), 8 deletions(-) diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..e8b9d90 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,9 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; -char *save_ptr; -for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { +while ((this_char = strsep(profiles, |))) { +printf(got profile %s\n, this_char); In the original code the printf() doesn't exist. May be this is a residual of a debugging code ? Argh, yes. Let me resend, thank you. -Eric if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +134,12 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; -char *save_ptr; if (!filters) return 0; -for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { +while ((this_char = strsep(filters , ,))) { +printf(got %s\n, this_char); Same here if ((value = strchr(this_char, '=')) != NULL) *value++ = 0; if (!strcmp(this_char, profiles)) { -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/17] btrfs-progs: Unify size-parsing
On 2/26/13 12:50 PM, Goffredo Baroncelli wrote: On 02/25/2013 11:54 PM, Eric Sandeen wrote: cmds-qgroup.c contained a parse_limit() function which duplicates much of the functionality of parse_size. The only unique behavior is to handle none; then we can just pass it off to parse_size(). Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-qgroup.c | 44 ++-- utils.c |8 +++- utils.h |2 +- 3 files changed, 14 insertions(+), 40 deletions(-) diff --git a/cmds-qgroup.c b/cmds-qgroup.c index 26f0ab0..ce013c8 100644 --- a/cmds-qgroup.c +++ b/cmds-qgroup.c @@ -198,43 +198,13 @@ done: return ret; } -static int parse_limit(const char *p, unsigned long long *s) +static u64 parse_limit(const char *p) ... +/* parse_size() will exit() on any error */ +return parse_size(p); I don't think that this is a good thing to do: the parse_limit behaviour is the good one: return an error to the caller instead of exit()-ing. We should convert the parse_size() to return an error, no to convert parse_limit to exit(). Of course this is out of the goals of this set of patches. Hm, fair point. I could either fix it before this patch, and add a 0.5/17, or add it to my next set of patches. What do you think? Thanks, -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); -this_char != NULL; -this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); -this_char != NULL; -this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { if ((value = strchr(this_char, '=')) != NULL) *value++ = 0; if (!strcmp(this_char, profiles)) { -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
On 2/20/13 9:37 AM, Stefan Behrens wrote: ... This means that it is now required to change all occurrences of mkfs.btrfs to mkfs.btrfs -f everywhere. Can't we first establish a time period of 100 years where the -f option is tolerated and ignored, and then in 2113 we require that the users add the -f option? (Just had to do this string replacement everywhere, and had to add -f to xfstest's _scratch_mkfs in common.rc as well). Sigh. I'll send a patch today to make xfstests handle both variants cleanly, I should have done that earlier, sorry. -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/17] btrfs-progs: fix fd leak in cmd_subvol_set_default
On 2/26/13 12:46 PM, Goffredo Baroncelli wrote: Hi Eric, On 02/25/2013 11:54 PM, Eric Sandeen wrote: Rearrange cmd_subvol_set_default() slightly so we don't have to close the fd on an error return. While we're at it, fix whitespace remove magic return values. Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-subvolume.c | 17 + 1 files changed, 9 insertions(+), 8 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 0dfaefe..461eed9 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -712,24 +712,25 @@ static int cmd_subvol_set_default(int argc, char **argv) subvolid = argv[1]; path = argv[2]; +objectid = (unsigned long long)strtoll(subvolid, NULL, 0); Could you replace strtoll() with strtoull() ? Note that: strtoull(0x,0,0) == 0x strtoull(-1,0,0) == 0x strtoll(-1,0,0) == 0x strtoll(0x,0,0) - ERANGE Probably a good idea, I think I had noticed that earlier and then spaced it. :( But I figure one functional change per patch is the way to go; making this other change would probably be best under its own commit; one to fix the fd leak, and one to fix this issue? +if (errno == ERANGE) { Pay attention that if strtoull() doesn't encounter a problem errno *is not* touched: this check could catch a previous error. I don't know if it is an hole in the standard or a bug in the gnu-libc; however I think that before strtoXll() we should put 'errno = 0;'. yeah, ugh. But this problem existed before, correct? So I think a separate fix makes sense, do you agree? Or have I made something worse here with this change? Thanks, -Eric +fprintf(stderr, ERROR: invalid tree id (%s)\n, subvolid); +return 1; +} + fd = open_file_or_dir(path); if (fd 0) { fprintf(stderr, ERROR: can't access to '%s'\n, path); -return 12; +return 1; } -objectid = (unsigned long long)strtoll(subvolid, NULL, 0); -if (errno == ERANGE) { -fprintf(stderr, ERROR: invalid tree id (%s)\n,subvolid); -return 30; -} ret = ioctl(fd, BTRFS_IOC_DEFAULT_SUBVOL, objectid); e = errno; close(fd); -if( ret 0 ){ +if (ret 0) { fprintf(stderr, ERROR: unable to set a new default subvolume - %s\n, strerror(e)); -return 30; +return 1; } return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote: The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { ^^^ whitespace One of the differences between strtok() and strsep() is that the former allows multiple delimiters between two tokens. With strsep(), this btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt fails with error, whereas with strtok() it passes. I don't have a strong opinion here (this has been loosely modeled on the way mount(8) handles -o options), but might it be better to just initialize save_ptr? (And yes, I know that strsep() is better ;)) Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xfstests: handle new mkfs.btrfs -f option cleanly
I added an -f option to mkfs.btrfs to force overwrite of an existing filesystem. Now on an xfstests run, new mkfs.btrfs requires it, and old mkfs.btrfs cannot accept it. So, add a helper which works out whether -f is needed, and add it to the MKFS_BTRFS_PROG env. var as necessary, so that it is an always-included option during the tests. Signed-off-by: Eric Sandeen sand...@redhat.com --- diff --git a/common.config b/common.config index 57f505d..9f1d309 100644 --- a/common.config +++ b/common.config @@ -101,6 +101,17 @@ set_prog_path() return 1 } +# Handle mkfs.btrfs which does (or does not) require -f to overwrite +set_btrfs_mkfs_prog_path_with_opts() +{ + p=`set_prog_path mkfs.btrfs` + if grep -q 'force overwrite' $p; then + echo $p -f + else + echo $p + fi +} + _fatal() { echo $* @@ -185,7 +196,7 @@ case $HOSTOS in Linux) export MKFS_XFS_PROG=`set_prog_path mkfs.xfs` export MKFS_UDF_PROG=`set_prog_path mkudffs` -export MKFS_BTRFS_PROG=`set_prog_path mkfs.btrfs` +export MKFS_BTRFS_PROG=`set_btrfs_mkfs_prog_path_with_opts` export BTRFS_UTIL_PROG=`set_prog_path btrfs` export XFS_FSR_PROG=`set_prog_path xfs_fsr` export MKFS_NFS_PROG=false -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
On 2/26/13 2:40 PM, Ilya Dryomov wrote: On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote: The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; -char *save_ptr; -for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { +while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; -char *save_ptr; if (!filters) return 0; -for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { +while ((this_char = strsep(filters , ,))) { ^^^ whitespace One of the differences between strtok() and strsep() is that the former allows multiple delimiters between two tokens. With strsep(), this btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt fails with error, whereas with strtok() it passes. I don't have a strong opinion here (this has been loosely modeled on the way mount(8) handles -o options), but might it be better to just initialize save_ptr? (And yes, I know that strsep() is better ;)) I don't really care much either way, TBH. Initializing it seems a little bit magic, but with a comment as to why, it'd be fine. If you did it this way intentionally to allow the above format, and changing it would break expectations, then I'll happily just initialize save_ptr. (And, I realize that lots of these changes are pedantic and seemingly pointless, but we've gotten the static checker errors down from over 100 to under 30 and dropping; the more noise we remove the more likely we are to pay attention to the output and catch actual errors. At least that's my feeling; if people think this is getting to be pointless churn, I'm ok with that, too). Thanks for the review, -Eric p.s. Zach made me do it. ;) Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/17] btrfs-progs: fix fd leak in cmd_subvol_set_default
On 02/26/2013 09:10 PM, Eric Sandeen wrote: On 2/26/13 12:46 PM, Goffredo Baroncelli wrote: Hi Eric, On 02/25/2013 11:54 PM, Eric Sandeen wrote: Rearrange cmd_subvol_set_default() slightly so we don't have to close the fd on an error return. While we're at it, fix whitespace remove magic return values. Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-subvolume.c | 17 + 1 files changed, 9 insertions(+), 8 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 0dfaefe..461eed9 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -712,24 +712,25 @@ static int cmd_subvol_set_default(int argc, char **argv) subvolid = argv[1]; path = argv[2]; + objectid = (unsigned long long)strtoll(subvolid, NULL, 0); Could you replace strtoll() with strtoull() ? Note that: strtoull(0x,0,0) == 0x strtoull(-1,0,0) == 0x strtoll(-1,0,0) == 0x strtoll(0x,0,0) - ERANGE Probably a good idea, I think I had noticed that earlier and then spaced it. :( But I figure one functional change per patch is the way to go; making this other change would probably be best under its own commit; one to fix the fd leak, and one to fix this issue? IMHO this would be simple enough to be done in one shot. However this problem exists also in other points. May be that for now your patch is ok. But then we should start another set of patches which correct/sanitise all these use of parse_size/strto[u]ll/parse_limit Unfortunately this means that these next series of patches will start only when these one will be accepted in order to avoid patches conflict. + if (errno == ERANGE) { Pay attention that if strtoull() doesn't encounter a problem errno *is not* touched: this check could catch a previous error. I don't know if it is an hole in the standard or a bug in the gnu-libc; however I think that before strtoXll() we should put 'errno = 0;'. yeah, ugh. But this problem existed before, correct? So I think a separate fix makes sense, do you agree? Or have I made something worse here with this change? No the things aren't worse. You are doing a great work Thanks, -Eric + fprintf(stderr, ERROR: invalid tree id (%s)\n, subvolid); + return 1; + } + fd = open_file_or_dir(path); if (fd 0) { fprintf(stderr, ERROR: can't access to '%s'\n, path); - return 12; + return 1; } - objectid = (unsigned long long)strtoll(subvolid, NULL, 0); - if (errno == ERANGE) { - fprintf(stderr, ERROR: invalid tree id (%s)\n,subvolid); - return 30; - } ret = ioctl(fd, BTRFS_IOC_DEFAULT_SUBVOL, objectid); e = errno; close(fd); - if( ret 0 ){ + if (ret 0) { fprintf(stderr, ERROR: unable to set a new default subvolume - %s\n, strerror(e)); - return 30; + return 1; } return 0; } -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
On Tue, Feb 26, 2013 at 02:46:30PM -0600, Eric Sandeen wrote: On 2/26/13 2:40 PM, Ilya Dryomov wrote: On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote: The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { ^^^ whitespace One of the differences between strtok() and strsep() is that the former allows multiple delimiters between two tokens. With strsep(), this btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt fails with error, whereas with strtok() it passes. I don't have a strong opinion here (this has been loosely modeled on the way mount(8) handles -o options), but might it be better to just initialize save_ptr? (And yes, I know that strsep() is better ;)) I don't really care much either way, TBH. Initializing it seems a little bit magic, but with a comment as to why, it'd be fine. If you did it this way intentionally to allow the above format, and changing it would break expectations, then I'll happily just initialize save_ptr. Yeah, although I doubt anybody would notice, it's probably better to keep the old behaviour. (And, I realize that lots of these changes are pedantic and seemingly pointless, but we've gotten the static checker errors down from over 100 to under 30 and dropping; the more noise we remove the more likely we are to pay attention to the output and catch actual errors. At least that's my feeling; if people think this is getting to be pointless churn, I'm ok with that, too). Not at all. Btrfs-progs had rarely seen any cleanups, with you and Zach patching it up and David's integration work it has gotten a whole new life ;) BTW, huge thanks for the kernel-style build output, I am forever grateful.. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/17] btrfs-progs: Unify size-parsing
On 02/26/2013 09:17 PM, Eric Sandeen wrote: On 2/26/13 12:50 PM, Goffredo Baroncelli wrote: On 02/25/2013 11:54 PM, Eric Sandeen wrote: cmds-qgroup.c contained a parse_limit() function which duplicates much of the functionality of parse_size. The only unique behavior is to handle none; then we can just pass it off to parse_size(). Signed-off-by: Eric Sandeen sand...@redhat.com --- cmds-qgroup.c | 44 ++-- utils.c |8 +++- utils.h |2 +- 3 files changed, 14 insertions(+), 40 deletions(-) diff --git a/cmds-qgroup.c b/cmds-qgroup.c index 26f0ab0..ce013c8 100644 --- a/cmds-qgroup.c +++ b/cmds-qgroup.c @@ -198,43 +198,13 @@ done: return ret; } -static int parse_limit(const char *p, unsigned long long *s) +static u64 parse_limit(const char *p) ... + /* parse_size() will exit() on any error */ + return parse_size(p); I don't think that this is a good thing to do: the parse_limit behaviour is the good one: return an error to the caller instead of exit()-ing. We should convert the parse_size() to return an error, no to convert parse_limit to exit(). Of course this is out of the goals of this set of patches. Hm, fair point. I could either fix it before this patch, and add a 0.5/17, or add it to my next set of patches. What do you think? I think that there is a general problem about parse_size/parse_limit/str[x]ll functions... We should address all these issues with another set of patches. Your set of patches is big enough, and now we should stabilise them in order to be accepted. If we agree that it is not good thing to allow parse_limit to call exit(), I will leave parse_limit() as is. If you are brave enough ( :) ) add the /* Fallthrough */ comment and add the peta and exa cases. In the future we could move parse_limit in utils.c then implement parse_size in terms of parse_limits and finally replace all the occurrences of strto[x]ll... Thanks, -Eric -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] btrfs-progs: require mkfs -f force option to overwrite filesystem or partition table
Am Dienstag, 26. Februar 2013 schrieb Goffredo Baroncelli: And I teach not to in my trainings as well. Everyone who uses rm -rf by default even just for deleting a single file does it as long as he or she deleted his / her home directory or something. Unfortunately the rm -rf is a different case. Removing a directory is a common case. We should not be forced to use the -f for common case. A '-f' flag should be used only in uncommon case (like *re*format a disk or a test-suite)... However I think that '-f' is good for mkfs.btrfs. I don´t get you on this one: artin@merkaba:~/Zeit export LANG=C martin@merkaba:~/Zeit mkdir test martin@merkaba:~/Zeit rm test rm: cannot remove 'test': Is a directory martin@merkaba:~/Zeit#1 rm -f test rm: cannot remove 'test': Is a directory martin@merkaba:~/Zeit#1 rm -r test martin@merkaba:~/Zeit mkdir test martin@merkaba:~/Zeit rmdir test martin@merkaba:~/Zeit So you do not need -f to remove a directory, but either rm -r or rmdir (if its empty). I think that special casing deleting empty directories with rmdir command doesn´t make much sense. The most important distinction there IMHO is to recurse or not to recurse. Thus I would let rm also delete empty directories and remove rmdir altogether or alias it to rm (without -r!). Thats how it was done on AmigaOS delete command. Delete would delete files and empty dirs, and recurse only if ALL option was given. rm -f doesn´t work anyway if parent directory has write permission removed: martin@merkaba:~/Zeit mkdir -p test/test martin@merkaba:~/Zeit chmod a-w test martin@merkaba:~/Zeit cd test martin@merkaba:~/Zeit rm -r test rm: descend into write-protected directory 'test'? y rm: cannot remove 'test/test': Permission denied martin@merkaba:~/Zeit rm -rf test rm: cannot remove 'test/test': Permission denied -f, --force ignore nonexistent files and arguments, never prompt Only in the case the directory itself is write-protected rm -f makes the prompt go away: martin@merkaba:~/Zeit mkdir test martin@merkaba:~/Zeit chmod a-w test martin@merkaba:~/Zeit rm -r test rm: remove write-protected directory 'test'? y martin@merkaba:~/Zeit mkdir -m a-w test martin@merkaba:~/Zeit ls -ld test dr-xr-xr-x 1 martin martin 0 Feb 26 22:12 test martin@merkaba:~/Zeit rm -rf test But on skipping write protection -f is justified I think. But more so since this is a empty directory and the parent directory has write protection, rmdir will remove it without -f (which it doesn´t support anyway): martin@merkaba:~/Zeit mkdir -m a-w test martin@merkaba:~/Zeit rmdir test martin@merkaba:~/Zeit Thanks, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Swapfile on btrfs
Hey, I was looking at the wiki 'unclaimed projects' list and I thought I'd take a look at what might be needed to use the swap-over-n{fs,bd} functionality in order to implement swapfile support. It *looks* like it's surprisingly uncomplicated - to the point where I suspect I'm missing something, and would like some sanity checking. My primary references for this are the patch[1] implementing swap-over-NFS and HEAD of cmason's for-linus branch. I'm going to go over the sections of the commitdiff from top to bottom (and hunk-by-hunk where I feel it necessary), and I'd like it if people could correct any mistakes I make. Kconfig: Simple enough that I'm not worried. direct.c: The first hunk seems to serve to factor out the difference between rw={READ,WRITE} and the KERNEL_ equivalents into a 'uio' parameter, so I'll view it as being mostly a code organization thing. The second hunk just adds that uio param to read_schedule_segment(), which it is given secondhand via direct_read(). The third hunk seems to be where the actual reading occurs in read_schedule_segment(), and the differences between uio and !uio seem to boil down to that KERNEL_READ is only allowed to operate one page at a time under penalty of BUG, and that it uses get_kernel_page instead of get_user_pages. Since btrfs calls into __blockdev_direct_IO(), I suspect that fs/direct-io.c would (in this case) be the right place to implement both the {READ,WRITE} vs KERNEL_* logic and those checks in this case. Also, check_direct_IO() in fs/btrfs/inode.c may want to treat KERNEL_WRITE similarly to WRITE - I'm not completely sure of that, however. Hunks 4-8 are just dealing with passing the additional uio parameter around. Hunk 9 is the write-side equivalent of hunk 3, and seems to correspond pretty exactly. It's looking like fs/direct_io.c is definitely the place to implement this. Hunks 10-17 are more uio parameter passing. file.c: Pretty simple - as far as I can tell, the activate() sets the span to what the swap header expects, the xs_swapper call basically sets the appropriate kmalloc flags for the network stuff (irrelevant here), and the deactivate() just resets those flags. All the smarts seem to be in the DIO code. The rest: The remainder of the patch seems to just be a.) handing that uio parameter around and b.) stuff specific to swapping over the network. Did I miss anything important? If not, I may well decide to tackle this as my first non-trivial kernel contribution. Does anyone have any suggestions on how I should stresstest this? I figure a good starting point would be running xfstests under memory pressure with such a swapfile. [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=a564b8f -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/17 V3] btrfs-progs: initialize save_ptr prior to strtok_r
The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. Turns out that under the covers glibc was doing enough to confuse the checker about what was being called. Just to keep the noise down, do a harmless initialization, with a comment as to why. Signed-off-by: Eric Sandeen sand...@redhat.com --- V3: Keep strtok_r for old compat, and just init the var. diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..f5dc317 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,7 +67,7 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; + char *save_ptr = NULL; /* Satisfy static checkers */ for (this_char = strtok_r(profiles, |, save_ptr); this_char != NULL; @@ -136,7 +136,7 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; + char *save_ptr = NULL; /* Satisfy static checkers */ if (!filters) return 0; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at fs/btrfs/inode.c:2165 btrfs_orphan_commit_root+0xcb/0xdf()
On Tue, Feb 26, 2013 at 09:20:43AM -0500, Josef Bacik wrote: On Sun, Feb 24, 2013 at 07:55:46PM -0700, Marc MERLIN wrote: Is this useful to anyone? Got this after a crash/reboot: if (block_rsv) { WARN_ON(block_rsv-size 0); btrfs_free_block_rsv(root, block_rsv); } Fixed in btrfs-next, thanks for reporting! FYI, the commit Josef mentioned is Btrfs: cleanup orphan reservation if truncate fails http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=4a7d0f6854c4a4ad1dba00a3b128a32d39b9a742 (Correct me if it is wrong.) thanks, liubo Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lvm volume like support
On Tue, Feb 26, 2013 at 9:30 PM, Martin Steigerwald mar...@lichtvoll.de wrote: Am Dienstag, 26. Februar 2013 schrieb Fajar A. Nugraha: On Tue, Feb 26, 2013 at 11:59 AM, Mike Fleetwood mike.fleetw...@googlemail.com wrote: On 25 February 2013 23:35, Suman C schakr...@gmail.com wrote: Hi, I think it would be great if there is a lvm volume or zfs zvol type support in btrfs. Btrfs already has capabilities to add and remove block devices on the fly. Data can be stripped or mirrored or both. Raid 5/6 is in testing at the moment. https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devic es https://btrfs.wiki.kernel.org/index.php/UseCases#RAID Which specific features do you think btrfs is lacking? I think he's talking about zvol-like feature. In zfs, instead of creating a filesystem-that-is-accessible-as-a-directory, you can create a zvol which behaves just like any other standard block device (e.g. you can use it as swap, or create ext4 filesystem on top of it). But it would also have most of the benefits that a normal zfs filesystem has, like: - thin provisioning (sparse allocation, snapshot clone) - compression - integrity check (via checksum) Typical use cases would be: - swap in a pure-zfs system - virtualization (xen, kvm, etc) - NAS which exports the block device using iscsi/AoE AFAIK no such feature exist in btrfs yet. Sounds like the RADOS block device stuff for Ceph. Exactly. While using files + loopback device mostly works, there were problems regarding performance and data integrity. Not to mention the hassle in accessing the data if it resides on a partition inside the file (e.g. you need losetup + kpartx to access it, and you must remember to do the reverse when you're finished with it). In zfsonlinux it's very easy to do so since a zvol is treated pretty much like a disk, and whenever there's a partition inside a zvol, a coressponding device noed is also created automatically. -- Fajar -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: fix segmentation fault of btrfs check
Segmentation fault occurred in the following command. # btrfs check /dev/sdc7 No valid Btrfs found on /dev/sdc7 Segmentation fault (core dumped) Fix it. Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com --- cmds-check.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index d63e945..5d2e9ed 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -3610,12 +3610,12 @@ int cmd_check(int argc, char **argv) } info = open_ctree_fs_info(argv[optind], bytenr, rw, 1); - uuid_unparse(info-super_copy.fsid, uuidbuf); - printf(Checking filesystem on %s\nUUID: %s\n, argv[optind], uuidbuf); - if (info == NULL) return 1; + uuid_unparse(info-super_copy.fsid, uuidbuf); + printf(Checking filesystem on %s\nUUID: %s\n, argv[optind], uuidbuf); + if (!extent_buffer_uptodate(info-tree_root-node) || !extent_buffer_uptodate(info-dev_root-node) || !extent_buffer_uptodate(info-extent_root-node) || -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lvm volume like support
On Wed, 27 Feb 2013 13:23:23 +1100 Fajar A. Nugraha l...@fajar.net wrote: Not to mention the hassle in accessing the data if it resides on a partition inside the file (e.g. you need losetup + kpartx to access it, and you must remember to do the reverse when you're finished with it). In zfsonlinux it's very easy to do so since a zvol is treated pretty much like a disk, and whenever there's a partition inside a zvol, a coressponding device noed is also created automatically. So I'd say what you (we) need is a generic Linux kernel framework that would allow treating any regular file pretty much like a disk. Not some filesystem-specific block device emulation kludge. Btw some years ago there was a patchset adding proper automatic partition support to 'loop'; but it seems like that went nowhere, and I have no idea why something this useful did not end up being added into the mainline kernel. -- With respect, Roman signature.asc Description: PGP signature
btrfs crash when low on memory.
Something I've yet to repeat managed to leak a whole bunch of memory while I was travelling, and locked up my workstation. When I got home, this was the last thing printed out before it locked up (it did make it into the logs thankfully) after a bunch of instances of the oom-killers handywork. SLUB: Unable to allocate memory on node -1 (gfp=0x50) cache: btrfs_extent_state, object size: 176, buffer size: 504, default order: 1, min order: 0 node 0: slabs: 49, objs: 640, free: 0 [ cut here ] kernel BUG at fs/btrfs/extent_io.c:748! invalid opcode: [#1] PREEMPT SMP Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit hwmon drm_kms_helper ttm drm i2c_core floppy CPU 1 Pid: 7017, comm: mutt Not tainted 3.8.0+ #67 /D975XBX RIP: 0010:[a02502fe] [a02502fe] __set_extent_bit+0x3ae/0x4d0 [btrfs] RSP: :8800a4c31838 EFLAGS: 00010246 RAX: RBX: 001bbfff RCX: RDX: 0001 RSI: 00b0 RDI: RBP: 8800a4c318b8 R08: 81cf0b80 R09: 0400 R10: 0001 R11: 0508 R12: 8800ba4ab2c8 R13: 8800ba4ab2c8 R14: R15: 001bb000 FS: 7eff96e14800() GS:8800bfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00449cee CR3: 80ebb000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process mutt (pid: 7017, threadinfo 8800a4c3, task 880025cba4c0) Stack: 2ab2 8800a4c318e0 8800a4c31fd8 0292 8800a4c31fd8 1000 001bbfff 10080008 034a39a8 8800ba4ab2c8 8800a4c31898 001bbfff Call Trace: [a0251024] lock_extent_bits+0x74/0xa0 [btrfs] [a0251063] lock_extent+0x13/0x20 [btrfs] [a02531c4] __extent_read_full_page+0xc4/0x720 [btrfs] [a02520e0] ? repair_io_failure+0x440/0x440 [btrfs] [a023adf0] ? btrfs_submit_direct+0x640/0x640 [btrfs] [a023adf0] ? btrfs_submit_direct+0x640/0x640 [btrfs] [a023adf0] ? btrfs_submit_direct+0x640/0x640 [btrfs] [a0254886] extent_readpages+0x116/0x1f0 [btrfs] [a02383cf] btrfs_readpages+0x1f/0x30 [btrfs] [8115592a] __do_page_cache_readahead+0x2aa/0x350 [81155790] ? __do_page_cache_readahead+0x110/0x350 [81147d95] ? find_get_page+0x5/0x280 [81155b61] ra_submit+0x21/0x30 [81149f07] filemap_fault+0x267/0x4a0 [811724fe] __do_fault+0x6e/0x530 [8117539f] handle_pte_fault+0x8f/0x900 [81176430] handle_mm_fault+0x210/0x300 [81693d6c] __do_page_fault+0x15c/0x4e0 [811029d7] ? rcu_eqs_exit_common+0xc7/0x380 [81102cf5] ? rcu_eqs_exit+0x65/0xb0 [8169411b] do_page_fault+0x2b/0x50 [81690ccf] page_fault+0x1f/0x30 Code: c9 0f 85 c7 fc ff ff 66 0f 1f 44 00 00 f6 45 18 10 0f 84 b7 fc ff ff 8b 7d 18 e8 8e f2 ff ff 48 85 c0 48 89 c1 0f 85 a3 fc ff ff 0f 0b 4d 89 ef 31 c9 eb 89 66 0f 1f 84 00 00 00 00 00 48 83 7b WARNING: at kernel/exit.c:721 do_exit+0x55/0xc70() Hardware name: Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit hwmon drm_kms_helper ttm drm i2c_core floppy Pid: 7017, comm: mutt Tainted: G D 3.8.0+ #67 Call Trace: [8104b54f] warn_slowpath_common+0x7f/0xc0 [8104b5aa] warn_slowpath_null+0x1a/0x20 [810517c5] do_exit+0x55/0xc70 [8133b1b8] ? __const_udelay+0x28/0x30 [81075b3c] ? __rcu_read_unlock+0x5c/0xa0 [8104f08d] ? kmsg_dump+0x1bd/0x230 [8104eef5] ? kmsg_dump+0x25/0x230 [81691936] oops_end+0x96/0xe0 [81005ff8] die+0x58/0x90 [8169116b] do_trap+0x6b/0x170 [81002f1a] do_invalid_op+0x9a/0xc0 [a02502fe] ? __set_extent_bit+0x3ae/0x4d0 [btrfs] [a024f5ae] ?
[PATCH] xfstests 276: fix error 'FIBMAP: Invalid argument'
Btrfs doesn't support FIEMAP_FLAG_XATTR, which is enabled by -x option of filefrag, and will fail with 'FIBMAP: Invalid argument' for 'filefrag -vx'. 'filefrag -vx' fails on btrfs with 'FIEMAP failed with unsupported flags 2' Remove the '-x' option. Signed-off-by: Wang Sheng-Hui shh...@gmail.com --- 276 |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/276 b/276 index 082f943..f2b17c9 100755 --- a/276 +++ b/276 @@ -75,7 +75,7 @@ _filter_extents() _check_file_extents() { - cmd=filefrag -vx $1 + cmd=filefrag -v $1 echo # $cmd $seq.full out=`$cmd | _filter_extents` if [ -z $out ]; then -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html