Re: syslog message repeated 3x
On Thu, May 09, 2013 at 11:26:25AM +0200, Toralf Förster wrote:
> I'm just curious why the last of the following 3 commands:
>
> $ dd if=/dev/zero of=/mnt/ramdisk/disk1 bs=1M count=257
> $ yes | /sbin/mkfs.btrfs /mnt/ramdisk/disk1
> $ mount -o loop /mnt/ramdisk/disk1 /mnt/t
>
> gives 3x the same log message:
>
> 2013-05-09T11:23:00.230+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1
> 2013-05-09T11:23:00.581+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1
> 2013-05-09T11:23:00.583+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1

At a guess, two of those are probably from btrfs dev scan triggered by udev.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours. But remember, the ---
roof is ours!
Re: syslog message repeated 3x
On Thu, May 09, 2013 at 12:37:38PM +0200, Toralf Förster wrote:
> On 05/09/2013 12:04 PM, Hugo Mills wrote:
>> At a guess, two of those are probably from btrfs dev scan triggered by udev.
>
> Those messages do only appear for a btrfs, not if I choose ext4.

They're from the btrfs kernel module, so that's hardly surprising. :)

Hugo.

--
--- You know... I'm sure this code would seem a lot better if I ---
never tried running it.
Re: syslog message repeated 3x
On Thu, May 09, 2013 at 02:45:00PM +0200, Toralf Förster wrote:
> On 05/09/2013 01:47 PM, Wang Shilong wrote:
>> Anyway, i use the latest btrfs-progs.
>
> well, under Gentoo I used sys-fs/btrfs-progs- which points always to the latest git version:
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
>
> My host kernel is stable 3.9.1
>
> The mount command still gives:
>
> 2013-05-09T14:43:35.604+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1
> 2013-05-09T14:43:35.604+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1
> 2013-05-09T14:43:35.608+02:00 n22 kernel: btrfs: disk space caching is enabled
> 2013-05-09T14:43:35.660+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1

I guess the main question is... why is this a problem for you? The message is informational and doesn't indicate any kind of issue with the FS. I'd just ignore it/them.

(Also, are you running btrfs dev scan beforehand or not? It'd be interesting to see the difference in your logs -- particularly with timestamps -- when you do that.)

Hugo.

--
--- I am the author. You are the audience. I outrank you! ---
Re: Btrfs balance invalid argument error
On Fri, May 10, 2013 at 10:07:56PM +0200, Marcus Lövgren wrote:
> Hi list,
>
> I am using kernel 3.9.0, btrfs-progs 0.20-rc1-253-g7854c8b.
>
> I have a three disk array of level single:
>
> # btrfs fi sh
> Label: none uuid: 2e905f8f-e525-4114-afa6-cce48f77b629
> Total devices 3 FS bytes used 3.80TB
> devid 1 size 2.73TB used 2.25TB path /dev/sdd
> devid 2 size 2.73TB used 1.55TB path /dev/sdc
> devid 3 size 2.73TB used 0.00 path /dev/sdb
>
> Btrfs v0.20-rc1-253-g7854c8b
>
> # btrfs fi df /mnt/data
> Data: total=3.79TB, used=3.79TB
> System: total=4.00MB, used=420.00KB
> Metadata: total=6.01GB, used=4.87GB
>
> When running
>
> # btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/data
>
> I get
>
> ERROR: error during balancing '/mnt/data' - Invalid argument
> There may be more info in syslog - try dmesg | tail
>
> dmesg | tail says:
>
> btrfs: unable to start balance with target data profile 128
>
> Isn't it possible to convert raid level to raid5?

Yes, it should be possible. It looks like the kernel's got a problem with it, which is odd because 3.9 should know about RAID-5.

Hugo.

--
--- I think that everything darkling says is actually a joke. ---
It's just that we haven't worked out most of them yet.
Re: Btrfs balance invalid argument error
On Fri, May 10, 2013 at 11:43:34PM +0200, Marcus Lövgren wrote:
> Yes, you were right! Adding another drive to the array made it continue without errors. Is this already reported as a bug?

I believe it has been, yes. I think we've even had a patch out for it. I haven't looked to see if it's got into 3.10.

Hugo.

> Thanks for the help,
> Marcus
>
> 2013/5/10 Remco Hosman - Yerf IT re...@yerf-it.nl
>> On May 10, 2013, at 10:21 PM, Hugo Mills h...@carfax.org.uk wrote:
>>> On Fri, May 10, 2013 at 10:07:56PM +0200, Marcus Lövgren wrote:
>>>> [...]
>>>> dmesg | tail says:
>>>>
>>>> btrfs: unable to start balance with target data profile 128
>>>>
>>>> Isn't it possible to convert raid level to raid5?
>>>
>>> Yes, it should be possible. It looks like the kernel's got a problem with it, which is odd because 3.9 should know about RAID-5.
>>
>> Wasn't there some issues that the kernel or tools wanted 4 disks when converting to raid5?
>>
>> Remco

--
--- Strive for apathy! ---
Re: unlinked 10 orphans - something to worry about?
On Sat, May 11, 2013 at 02:27:27PM +0200, Clemens Eisserer wrote:
> Hi,
>
> I frequently get messages like "unlinked 10 orphans" in syslog (running linux 3.9.1), although I have never had a power outage nor a kernel crash.
>
> Is this something to worry about, or just a usual clean-up information?

It's just information about a clean-up. Totally harmless.

Hugo.

--
--- Happiness is mandatory. Are you happy? ---
Re: RAID6 questions
On Sat, Jun 01, 2013 at 02:07:53PM -0700, ronnie sahlberg wrote:
> Hi List,
>
> I have a filesystem that is spanning about 10 devices. It is currently using RAID1 for both data and metadata. In order to get higher availability and be able to handle multi device failures I would like to change from RAID1 to RAID6.
>
> Is it possible/stable/supported/recommended to change data from RAID1 to RAID6? (I assume btrfs fi balance ... is used for this?)

Yes.

> Metadata is currently RAID1, is it supported to put metadata as RAID6 too? It would be odd to have lesser protection for metadata than data. Optimally I would like a mode where metadata is mirrored onto all the spindles in the filesystem, not just 2 in RAID1 or n in RAID6.

Yes, that should be supported.

> I'm running a 3.8.0 kernel.

The btrfs RAID-5 and RAID-6 implementations aren't really ready for production use, so right now I wouldn't recommend using them for anything other than testing purposes with data that's replaceable.

Hugo.

--
--- w.w.w. : England's batting scorecard ---
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 05:17:11PM +0100, Tim Eggleston wrote:
> Hi list,
>
> I have a 4-device RAID10 array of 2TB drives on btrfs. It works great. I recently added an additional 4 drives to the array. There is only about 2TB in use across the whole array (which should have an effective capacity of about 8TB). However I have noticed that when I issue btrfs filesystem df against the mountpoint, in the total field, I get the same value as the used field:
>
> root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
> Data, RAID10: total=2.06TB, used=2.06TB
> System, RAID10: total=64.00MB, used=188.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID10: total=3.00GB, used=2.29GB
>
> Here's my btrfs filesystem show:
>
> root@mckinley:/# btrfs fi show
> Label: 'btrfsvol0' uuid: 1a735971-3ad7-4046-b25b-e834a74f2fbb
> Total devices 8 FS bytes used 2.06TB
> devid 7 size 1.82TB used 527.77GB path /dev/sdk1
> devid 8 size 1.82TB used 527.77GB path /dev/sdg1
> devid 6 size 1.82TB used 527.77GB path /dev/sdi1
> devid 5 size 1.82TB used 527.77GB path /dev/sde1
> devid 4 size 1.82TB used 527.77GB path /dev/sdj1
> devid 2 size 1.82TB used 527.77GB path /dev/sdf1
> devid 1 size 1.82TB used 527.77GB path /dev/sdh1
> devid 3 size 1.82TB used 527.77GB path /dev/sdc1

You have 8*527.77 GB = 4222.16 GB of raw space allocated for all purposes. Since RAID-10 takes twice the raw bytes to store data, that gives you 2111.08 GB of usable space so far.

From the df output, 2.06 TB ~= 2109.44 GB is allocated as data, and all of that space is used. 3.00 GB is allocated as metadata, and most of that is used. That adds up (within rounding errors) to the 2111.08 GB above.

Additional space will be allocated from the available unallocated space as the FS needs it.

> This is running the Ubuntu build of kernel 3.9.4 and btrfs-progs from git (v0.20-rc1-324-g650e656).
>
> Am I being an idiot and missing something here? I must admit that I still find the df output a bit cryptic (entirely my failure to understand, nothing else), but on another system with only a single device the total field returns the capacity of the device.

That's probably already fully-allocated, so used=size in btrfs fi show. If it's a single device, then you're probably not using any replication, so the raw storage is equal to the possible storage.

HTH,
Hugo.

--
--- I can resist everything except temptation ---
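For anyone who wants to redo the allocation arithmetic from this message, here is a minimal sketch in Python. The figures are copied from the quoted output; the function name and the factor of 1024 between "GB" and "TB" (the tools print binary units) are assumptions of the sketch, not anything btrfs-progs exposes.

```python
# Worked version of the allocation arithmetic, using the figures quoted above.
GIB_PER_TIB = 1024  # the tools print binary units under "GB"/"TB" labels


def mirrored_allocation(devices, raw_used_gb_per_device, copies=2):
    """Return (raw, usable) allocated space in GB for a two-copy profile."""
    raw = devices * raw_used_gb_per_device
    return raw, raw / copies


raw_gb, usable_gb = mirrored_allocation(devices=8, raw_used_gb_per_device=527.77)
data_gb = 2.06 * GIB_PER_TIB   # "Data, RAID10: total=2.06TB"
metadata_gb = 3.00             # "Metadata, RAID10: total=3.00GB"

print(f"raw allocated:    {raw_gb:.2f} GB")
print(f"usable allocated: {usable_gb:.2f} GB")
print(f"data + metadata:  {data_gb + metadata_gb:.2f} GB")
```

The last two figures differ by a little over 1 GB, which is the rounding error mentioned above: the tools only print two decimal places of a TB.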
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 05:52:38PM +0100, Tim Eggleston wrote:
> Hi Hugo,
>
> Thanks for your reply, good to know it's not an error as such (just me being an idiot!).
>
>> Additional space will be allocated from the available unallocated space as the FS needs it.
>
> So I guess my question becomes, how much of that available unallocated space do I have? Instinctively the btrfs df output feels like it's missing an equivalent to the size column from vanilla df.

Look at btrfs fi show -- you have size and used there, so the difference there will give you the unallocated space.

> Is there a method of getting this in a RAID situation? I understand that btrfs RAID is more complicated than md RAID, so it's ok if the answer at this point is no...

Not in any obvious (and non-surprising) way. Basically, any way you could work it out is going to give someone a surprise because they were thinking of it some other way around. The problem is that until the space is allocated, the FS can't know how that space needs to be allocated (to data/metadata, or with what replication type and hence overheads), so we can't necessarily give a reliable estimate.

Hugo.

--
--- If you're not part of the solution, you're part ---
of the precipitate.
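The "size minus used" suggestion can be scripted. The following is a rough sketch in Python that parses device lines in the format shown in this thread; the regular expression, the helper name and the unit table are assumptions of the sketch, and the output format of other btrfs-progs versions may well differ.

```python
import re

# Binary multipliers, expressed in GB, for the unit labels the tool prints.
UNIT_IN_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}

# Matches lines such as "devid 7 size 1.82TB used 527.77GB path /dev/sdk1".
# The unit after "used" is optional because an empty device prints "0.00".
DEVICE_LINE = re.compile(
    r"devid\s*(?P<devid>\d+)\s+size\s+(?P<size>[\d.]+)(?P<size_unit>[KMGT]B)"
    r"\s+used\s+(?P<used>[\d.]+)(?P<used_unit>[KMGT]B)?\s+path\s+(?P<path>\S+)"
)


def unallocated_gb(show_output):
    """Map each device path to its raw unallocated space (size - used) in GB."""
    result = {}
    for m in DEVICE_LINE.finditer(show_output):
        size = float(m["size"]) * UNIT_IN_GB[m["size_unit"]]
        used = float(m["used"]) * UNIT_IN_GB[m["used_unit"] or "GB"]
        result[m["path"]] = size - used
    return result


sample = """\
devid 7 size 1.82TB used 527.77GB path /dev/sdk1
devid 8 size 1.82TB used 527.77GB path /dev/sdg1
"""
for path, free in unallocated_gb(sample).items():
    print(f"{path}: {free:.2f} GB raw unallocated")
```

These are raw bytes. Turning them into "free space" means guessing the profile of future allocations (halving the number for RAID-10, say), which is exactly the estimate that cannot be made reliably.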
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 12:52:40PM -0400, Chris Murphy wrote:
> On Jun 2, 2013, at 12:17 PM, Tim Eggleston li...@timeggleston.co.uk wrote:
>> root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
>> Data, RAID10: total=2.06TB, used=2.06TB
>> System, RAID10: total=64.00MB, used=188.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, RAID10: total=3.00GB, used=2.29GB
>>
>> Am I being an idiot and missing something here?
>
> No, it's confusing. btrfs fi df doesn't show free space. The first value is what space the fs has allocated for the data usage type, and the 2nd value is how much of that allocation is actually being used. I personally think the allocated value is useless for mortal users. I'd rather have some idea of what free space I have left, and the regular df command presents this in an annoying way also because it shows the total volume size, not accounting for the double consumption of raid1. So no matter how you slice it, it's confusing.

It's the nature of the beast, unfortunately. So far, nobody's managed to come up with a simple method of showing free space and space usage that isn't going to be misleading somehow.

Hugo.

--
--- If you're not part of the solution, you're part ---
of the precipitate.
Re: oops at mount
On Mon, Jun 03, 2013 at 01:56:10PM +0200, Papp Tamas wrote:
> On 05/30/2013 02:55 PM, Stefan Behrens wrote:
>> On Thu, 30 May 2013 08:32:35 -0400, Josef Bacik wrote:
>>> On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
>>>> hi All,
>>>>
>>>> I'm new on the list.
>>>>
>>>> System:
>>>>
>>>> Distributor ID: Ubuntu
>>>> Description: Ubuntu 13.04
>>>> Release: 13.04
>>>> Codename: raring
>>>>
>>>> Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> The symptom is the same with Saucy 3.9 kernel.
>>>
>>> Can you try btrfs-next
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
>>>
>>> if it's still not fixed please file a bug at bugzilla.kernel.org and make sure the component is set to btrfs. Thanks,
>>
>> Papp is using an Intel X18-M/X25-M/X25-V G2 SSD.
>>
>> At least with an Intel X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with the ID INTEL SSDSA2M040, I've tested whether they honor the flush request. And these two SSDs don't do so, they ignore it. If you cut the power after a flush request completes, the data that was written before the flush request is gone, the write cache was _not_ flushed.
>>
>> You can only disable the write cache during/after every boot "hdparm -W 0 /dev/sd..." (which reduces the SSD's write speed to about 4 MB/s), or avoid such SSDs, or prepare to restore from backup occasionally.
>
> Basically it means it's not safe to use this SSD?

Correct.

> I used it for 2 years with ext4 without any issue, before I switched to btrfs (on the root partition). In the meantime btrfs also was quite stable on my /data partition. After I reinstalled the system with btrfs, this issue happened two times.
>
> But anyway, I thought cow should be able to handle these kind of issues by design. Am I wrong?

CoW writes out everything that's going to be changed first, and finally writes one piece of data which points to the new version of the data. *Provided* you can guarantee that the final piece of data (the superblock) gets written only after everything else has made it to permanent storage, then everything is good.

However, most hardware (and most operating systems) reorder the data which is being sent to the disk, for performance reasons. This is fine, as long as you can enforce the dependency in some way -- this is what barriers/flushes do: they say "ensure that all of this is fully written out to real permanent storage before you try to write the superblock".

If the hardware ignores flushes or barriers, there's no mechanism for ensuring that the data is fully consistent, because you may find that the superblock gets reordered to be written before some of the other writes to the device. If that happens and then the power gets cut before the rest of the data can be written, you have a corrupt filesystem.

Hugo.

--
--- In theory, theory and practice are the same. In ---
practice, they're different.
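The ordering requirement described above can be illustrated with a small userspace analogy in Python. This is only a sketch of the "write new data, flush, then switch the pointer" pattern using ordinary files; the file names and both helper functions are made up for the illustration, and it is not how btrfs itself is implemented.

```python
import os
import tempfile


def commit(directory, generation, payload):
    """Write a new version of the data, then publish a pointer to it."""
    data_path = os.path.join(directory, f"tree-{generation}")
    with open(data_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # the "barrier": data must be durable first

    # Only now write the small record that points at the new version.
    tmp_path = os.path.join(directory, "superblock.tmp")
    with open(tmp_path, "wb") as f:
        f.write(str(generation).encode())
        f.flush()
        os.fsync(f.fileno())
    # Atomic switch; a fuller version would also fsync the directory.
    os.replace(tmp_path, os.path.join(directory, "superblock"))


def read_current(directory):
    """Follow the pointer to whichever version was last published."""
    with open(os.path.join(directory, "superblock"), "rb") as f:
        generation = int(f.read())
    with open(os.path.join(directory, f"tree-{generation}"), "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    commit(d, 1, b"old contents")
    commit(d, 2, b"new contents")
    print(read_current(d))
```

The scheme is only as good as the flush: if the drive acknowledges the flush while the data is still sitting in a volatile write cache, the pointer can reach permanent storage before the data it refers to, which is the failure described for these SSDs.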
Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28
nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mar k xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_i pv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables bridge stp llc rtc snd_hda_codec_realtek fbcon bitblit softcursor font nouveau video mxm_wmi cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit evdev d rm_kms_helper snd_hda_intel ttm snd_hda_codec drm i2c_piix4 pcspkr snd_pcm serio_raw snd_page_alloc snd_timer k8temp snd i2c_core processor button thermal_sys sky2 wmi backlight fb fbdev pata_acpi firewire_ohci firewire_cor e pata_atiixp usbhid pata_jmicron sata_sil24 kernel: Pid: 10980, comm: btrfs-transacti Tainted: GW 3.8.13-gentoo #1 kernel: Call Trace: kernel: [811d3600] ? btrfs_printk+0x12/0xc2 kernel: [810289c8] ? warn_slowpath_common+0x78/0x8c kernel: [81028a74] ? warn_slowpath_fmt+0x45/0x4a kernel: [811d5e00] ? btrfs_release_path+0x5e/0x79 kernel: [811d36ed] ? __btrfs_abort_transaction+0x3d/0xad kernel: [811ed97b] ? btrfs_save_ino_cache+0x1d4/0x348 kernel: [8142ce4c] ? commit_fs_roots.isra.25+0xa1/0x14a kernel: [81237a0f] ? btrfs_scrub_pause+0xd5/0xe4 kernel: [811f4f1a] ? btrfs_commit_transaction+0x3f9/0x93c kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79 kernel: [811f5a8c] ? start_transaction+0x311/0x408 kernel: [811eed7e] ? transaction_kthread+0xd1/0x16d kernel: [811eecad] ? btrfs_alloc_root+0x34/0x34 kernel: [810420b3] ? kthread+0xad/0xb5 kernel: [81042006] ? __kthread_parkme+0x5e/0x5e kernel: [814315ac] ? ret_from_fork+0x7c/0xb0 kernel: [81042006] ? 
__kthread_parkme+0x5e/0x5e kernel: ---[ end trace b584e8ceb6422945 ]--- kernel: BTRFS error (device sdf) in btrfs_save_ino_cache:471: error 28 kernel: btrfs is forced readonly kernel: BTRFS warning (device sdf): Skipping commit of aborted transaction. kernel: BTRFS error (device sdf) in cleanup_transaction:1391: error 28 -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I'm all for giving people enough rope to shoot themselves in --- the foot -- Andreas Dilger signature.asc Description: Digital signature
Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28
On Wed, Jun 05, 2013 at 04:28:33PM +0100, Martin wrote:
> On 05/06/13 16:05, Hugo Mills wrote:
>> On Wed, Jun 05, 2013 at 03:57:42PM +0100, Martin wrote:
>>> Dear Devs,
>>>
>>> I have x4 4TB HDDs formatted with:
>>>
>>> mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]
>>>
>>> /etc/fstab mounts with the options:
>>>
>>> noatime,noauto,space_cache,inode_cache
>>>
>>> All on kernel 3.8.13.
>>>
>>> Upon using rsync to copy some heavily hardlinked backups from ReiserFS, I've seen:
>>>
>>> The following "block rsv returned -28" is repeated 7 times until there is a call trace for:
>>
>> This is ENOSPC. Can you post the output of btrfs fi df /mountpoint and btrfs fi show, please?
>
> btrfs fi df:
>
> Data, RAID1: total=2.85TB, used=2.84TB
> Data: total=8.00MB, used=0.00
> System, RAID1: total=8.00MB, used=412.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=27.00GB, used=25.82GB
> Metadata: total=8.00MB, used=0.00
>
> btrfs fi show:
>
> Label: 'bu-16TB_0' uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9
> Total devices 4 FS bytes used 2.87TB
> devid 4 size 3.64TB used 1.44TB path /dev/sdf
> devid 3 size 3.64TB used 1.44TB path /dev/sde
> devid 1 size 3.64TB used 1.44TB path /dev/sdc
> devid 2 size 3.64TB used 1.44TB path /dev/sdd

OK, so you've got plenty of space to allocate. There were some issues in this area (block reserves and ENOSPC, and I think specifically addressing the issue of ENOSPC when there's space available to allocate) that were fixed between 3.8 and 3.9 (and probably some between 3.9 and 3.10-rc as well), so upgrading your kernel _may_ help here.

Something else that may possibly help as a sticking-plaster is to write metadata more slowly, so that you don't have quite so much of it waiting to be written out for the next transaction. Practically, this may involve things like running sync on a loop. But it's definitely a horrible hack that may help if you're desperate for a quick fix until you can finish creating metadata so quickly and upgrade your kernel...

Hugo.
And df -h: Filesystem Size Used Avail Use% Mounted on /dev/sde 15T 5.8T 8.9T 40% /mnt/sata16 WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x3d/0xad(). Then, the mount is set read-only. How to fix or debug? Thanks, Martin kernel: [ cut here ] kernel: WARNING: at fs/btrfs/extent-tree.c:6372 btrfs_alloc_free_block+0xd3/0x29c() kernel: Hardware name: GA-MA790FX-DS5 kernel: btrfs: block rsv returned -28 kernel: Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_time xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_policy xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat _tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_ conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mar k xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_i pv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables bridge stp llc rtc snd_hda_codec_realtek fbcon bitblit softcursor font nouveau video mxm_wmi cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit evdev d rm_kms_helper snd_hda_intel ttm snd_hda_codec drm i2c_piix4 pcspkr snd_pcm serio_raw snd_page_alloc snd_timer k8temp snd i2c_core processor button thermal_sys sky2 wmi backlight fb fbdev pata_acpi firewire_ohci firewire_cor e pata_atiixp usbhid pata_jmicron 
sata_sil24 kernel: Pid: 10980, comm: btrfs-transacti Not tainted 3.8.13-gentoo #1 kernel: Call Trace: kernel: [811e6600] ? btrfs_init_new_buffer+0xef/0xf6 kernel: [810289c8] ? warn_slowpath_common+0x78/0x8c kernel: [81028a74] ? warn_slowpath_fmt+0x45/0x4a kernel: [81278f2c] ? ___ratelimit+0xc4/0xd0 kernel: [811e66da] ? btrfs_alloc_free_block+0xd3/0x29c kernel: [811d68e5] ? __btrfs_cow_block+0x136/0x454 kernel: [811f0d47] ? btrfs_buffer_uptodate+0x40/0x56 kernel: [811d6d8c] ? btrfs_cow_block+0x132/0x19d kernel: [811da606] ? btrfs_search_slot+0x2f5/0x624 kernel: [811dbc5a] ? btrfs_insert_empty_items+0x5c/0xaf kernel: [811e5089] ? run_clustered_refs+0x852/0x8e6 kernel
Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28
On Wed, Jun 05, 2013 at 04:59:57PM +0100, Martin wrote:
> On 05/06/13 16:43, Hugo Mills wrote:
>> On Wed, Jun 05, 2013 at 04:28:33PM +0100, Martin wrote:
>>> btrfs fi df:
>>>
>>> Data, RAID1: total=2.85TB, used=2.84TB
>>> Data: total=8.00MB, used=0.00
>>> System, RAID1: total=8.00MB, used=412.00KB
>>> System: total=4.00MB, used=0.00
>>> Metadata, RAID1: total=27.00GB, used=25.82GB
>>> Metadata: total=8.00MB, used=0.00
>>>
>>> btrfs fi show:
>>>
>>> Label: 'bu-16TB_0' uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9
>>> Total devices 4 FS bytes used 2.87TB
>>> devid 4 size 3.64TB used 1.44TB path /dev/sdf
>>> devid 3 size 3.64TB used 1.44TB path /dev/sde
>>> devid 1 size 3.64TB used 1.44TB path /dev/sdc
>>> devid 2 size 3.64TB used 1.44TB path /dev/sdd
>
> Thanks for that. I can give kernel 3.9.4 a try. For a giggle, I'll try first with nice 19 and syncs in a loop...
>
> One confusing bit is why the "Data, RAID1: total=2.85TB" from btrfs fi df?

Because you've got enough raw space allocated for 2.85 TiB of data; that's 5.7 TiB of actual bytes, because you're using RAID-1 for it. That should add up to somewhere near the total of the used values in btrfs fi show. The difference will be accounted for in metadata, system, and the inevitable rounding errors. All the values are shown in powers-of-two -- i.e. IEEE units, not SI units despite the use of SI prefixes.

Hugo.

--
--- All hope abandon, Ye who press Enter here. ---
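The accounting in this reply can be checked with a few lines of Python. This is a sketch using only the figures quoted in the thread; the helper names are made up, and the tiny 8 MB and 4 MB chunks are ignored as noise.

```python
TIB = 2**40           # bytes in what the tools label "TB"
GIB_PER_TIB = 1024


def raid1_raw_tib(data_tib, metadata_gib, copies=2):
    """Raw space consumed by RAID-1 data and metadata allocations, in TiB."""
    return copies * (data_tib + metadata_gib / GIB_PER_TIB)


def tib_to_si_tb(tib):
    """Convert a power-of-two 'TB' figure into SI terabytes (10**12 bytes)."""
    return tib * TIB / 10**12


raw_from_df = raid1_raw_tib(data_tib=2.85, metadata_gib=27.00)
raw_from_show = 4 * 1.44  # four devices, each reporting "used 1.44TB"

print(f"raw space implied by fi df:   {raw_from_df:.2f} TiB")
print(f"raw space summed from fi show: {raw_from_show:.2f} TiB")
print(f"a 3.64 'TB' device in SI units: {tib_to_si_tb(3.64):.2f} TB")
```

The last line shows the powers-of-two point: a drive sold as 4 TB is reported as 3.64TB by the tools.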
Re: Moved partition via dd
On Sun, Jun 09, 2013 at 12:44:23PM +0200, André Schlichting wrote:
> On 09.06.2013 00:57, Chris Murphy wrote:
>> The next issue:
>>
>> if=/dev/sdc2 skip=$((245547520-33024)) seek=0 of=/dev/sdc2
>>
>> You have a skip (skip n blocks from input) value well inside of sdc2. It seems you should have skipped from sdc not sdc2, and should have used the old start value for sdc2 which was just 245547520, and you needed to specify a count value in order to get the correct number of blocks, which would have been 732566527-245547520. Then write those blocks to sdc2 (which makes seek= unnecessary).
>>
>> Chris Murphy
>
> /dev/sdc2 at this moment was already the new partition with boundaries 33024 to 732566640 with the old partition inside. Therefore I used skip=old start - new start, which inside of sdc2 points to the start of the old partition. I didn't worry about the count, because the partition was at the end of the disk.
>
> I actually think that the move of the partition was no problem. I guess that btrfs has some absolute references which have to be adjusted and now has some problems with sectors not at the right place.

No, it doesn't. All the position values in the FS are either relative to the containing block device (i.e. the partition, in this case), or are based on an internal virtual address space -- which is itself mapped in terms of the containing block device(s).

> The following error from btrfsck
>
> Check tree block failed, want=959572647936, have=13587293097915834379
>
> suggests that 959572647936 is a way off...

That just says to me that you've got garbage metadata -- usually a good indication that there's some file data where there should be metadata, which would further suggest that you've somehow moved the wrong data (or the right data into the wrong place).

> Maybe first, the principal question: Can one just move a btrfs-partition to the left by
>
> * delete partition
> * create partition moved
> * dd data from old to new partition
>
> Or does one have to adjust some references inside the btrfs filesystem?

In theory, that process should be safe. In fact, I'm not aware of *any* filesystem which is dependent on the position of the partition within a larger device.

I think at this point, you should try testdisk to see if it can identify your FS's superblock. If that doesn't work, then restore from backup is likely to be your fastest route to recovery.

Hugo.

http://www.cgsecurity.org/wiki/TestDisk

--
--- I get nervous when I see words like 'mayhaps' in a novel, ---
because I fear that just round the corner is lurking 'forsooth'
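To make the "move left with an overlapping copy" idea concrete, here is a toy simulation in Python. It works on an in-memory bytearray rather than a disk, the function name is made up, and it only illustrates why a forward, block-by-block copy is safe when the destination lies before the source. It says nothing about what actually went wrong in this case.

```python
def move_left_in_place(device, old_start, new_start, length, block=4):
    """Copy `length` bytes from old_start down to new_start, front to back.

    Each block is read before anything at or beyond its position has been
    overwritten, because writes always land below the current read offset.
    """
    if new_start > old_start:
        raise ValueError("a forward copy is only safe when moving data left")
    for offset in range(0, length, block):
        n = min(block, length - offset)
        chunk = bytes(device[old_start + offset:old_start + offset + n])
        device[new_start + offset:new_start + offset + n] = chunk


# Sector numbers quoted in the thread: the skip value is the old start
# expressed relative to the start of the new, larger partition.
OLD_START = 245547520
NEW_START = 33024
SKIP = OLD_START - NEW_START
print("skip (in sectors, relative to the new partition):", SKIP)

disk = bytearray(b"......ABCDEFGHIJ")  # the "filesystem" starts at offset 6
move_left_in_place(disk, old_start=6, new_start=2, length=10)
print(disk[2:12])
```

The same copy in the other direction (moving data to the right) would overwrite blocks before they are read, which is why the sketch refuses it.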
Re: raid0, raid1, raid5, what to choose?
On Thu, Jun 13, 2013 at 11:09:00PM +0200, Hendrik Friedel wrote:
> Hello,
>
> I'd appreciate your recommendation on this:
>
> I have three hdd with 3TB each. I intend to use them as raid5 eventually. currently I use them like this:
>
> # mount|grep sd
> /dev/sda1 on /mnt/Datenplatte type ext4
> /dev/sdb1 on /mnt/BTRFS/Video type btrfs
> /dev/sdb1 on /mnt/BTRFS/rsnapshot type btrfs
>
> #df -h
> /dev/sda1 2,7T 1,3T 1,3T 51% /mnt/Datenplatte
> /dev/sdb1 5,5T 5,4T 93G 99% /mnt/BTRFS/Video
> /dev/sdb1 5,5T 5,4T 93G 99% /mnt/BTRFS/rsnapshot
>
> Now, what surprises me, and here I lack memory- is that sdb appears twice.. I think, I created a raid1, but how can I find out?

Appearing twice in that list is more an indication that you have multiple subvolumes -- check the subvol= options in /etc/fstab

> #/usr/local/smarthome# ~/btrfs/btrfs-progs/btrfs fi show /dev/sdb1
> Label: none uuid: 989306aa-d291-4752-8477-0baf94f8c42f
> Total devices 2 FS bytes used 2.68TB
> devid 2 size 2.73TB used 2.73TB path /dev/sdc1
> devid 1 size 2.73TB used 2.73TB path /dev/sdb1
>
> Now, I wanted to convert it to raid0, because I lack space and redundancy is not important for the Videos and the Backup, but this fails:
>
> ~/btrfs/btrfs-progs/btrfs fi balance start -dconvert=raid0 /mnt/BTRFS/
> ERROR: error during balancing '/mnt/BTRFS/' - Inappropriate ioctl for device

/mnt/BTRFS isn't a btrfs subvol, according to what you have listed above. It's a subdirectory in /mnt which contains two subdirs (Video and rsnapshot) which are used as mountpoints for subvolumes. Try running the above command with /mnt/BTRFS/Video instead (or rsnapshot -- it doesn't matter which).

> dmesg does not help here.
>
> Anyway: This gave me some time to think about this. In fact, as soon as raid5 is stable, I want to have all three as a raid5. Will this be possible with a balance command? If so: will this be possible as soon as raid5 is stable, or will I have to wait longer?

Yes, it's possible to convert to RAID-5 right now -- although the code's not settled down into its final form quite yet. Note that RAID-5 over two devices won't give you any space benefits over RAID-1 over two devices. (Or any reliability benefits either).

Hugo.

--
--- Are you the man who rules the Universe? Well, I ---
try not to.
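The remark about RAID-5 over two devices follows from simple capacity arithmetic, sketched here in Python. The formulas are the idealised textbook ones for equal-sized devices (one device's worth of parity for RAID-5, two copies for RAID-1); the function is made up for the illustration and ignores metadata and chunk granularity.

```python
def usable_capacity(profile, devices, size_tb):
    """Idealised usable space for `devices` equal-sized disks."""
    if profile == "raid0":
        return devices * size_tb
    if profile == "raid1":
        return devices * size_tb / 2    # every byte stored twice
    if profile == "raid5":
        return (devices - 1) * size_tb  # one device's worth of parity
    raise ValueError(f"unknown profile: {profile}")


for n in (2, 3):
    for profile in ("raid0", "raid1", "raid5"):
        tb = usable_capacity(profile, n, 2.73)
        print(f"{n} devices, {profile}: {tb:.2f} TB usable")
```

With two devices RAID-1 and RAID-5 come out identical; the space advantage of RAID-5 only appears from the third device onwards.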
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
> Hi,
>
> I've observed a rather strange behaviour while trying to mount two identical copies of the same image to different mount points. Each modification to one image is also performed in the second one.
>
> Example:
>
> dd if=/dev/sda? of=image1 bs=1M
> cp image1 image2
> mount -o loop image1 m1
> mount -o loop image2 m2
>
> touch m2/hello
> ls -la m1 //will now also include a file called hello
>
> Is this behaviour intentional and known or should I create a bug-report?

It's known, and not desired behaviour. The problem is that you've ended up with two filesystems with the same UUID, and the FS code gets rather confused about that. The same problem exists with LVM snapshots (or other block-device-layer copies).

The solution is a combination of a tool to scan an image and change the UUID (offline), and of some code in the kernel that detects when it's being told about a duplicate image (rather than an additional device in the same FS). Neither of these has been written yet, I'm afraid.

> I've deleted quite a bunch of files on my production system because of this...

Oops. I'm sorry to hear that. :(

Hugo.

--
--- Welcome to Rivendell, Mr Anderson... ---
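Until such a tool exists, a rough check for images sharing a filesystem UUID can be done from userspace before mounting. The Python sketch below reads the fsid straight out of the primary superblock; the offsets (superblock at 64 KiB, fsid at byte 0x20, magic "_BHRfS_M" at byte 0x40) are taken from the on-disk format as I understand it and should be treated as assumptions, as should the helper names.

```python
import uuid
from collections import defaultdict

SUPERBLOCK_OFFSET = 0x10000  # primary superblock, 64 KiB into the device
FSID_OFFSET = 0x20           # follows the 32-byte checksum field
MAGIC_OFFSET = 0x40
BTRFS_MAGIC = b"_BHRfS_M"


def read_fsid(path):
    """Return the filesystem UUID of an image, or None if it isn't btrfs."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(MAGIC_OFFSET + len(BTRFS_MAGIC))
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + len(BTRFS_MAGIC)] != BTRFS_MAGIC:
        return None
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])


def duplicate_fsids(paths):
    """Group paths by fsid and keep only the fsids seen more than once."""
    seen = defaultdict(list)
    for path in paths:
        fsid = read_fsid(path)
        if fsid is not None:
            seen[fsid].append(path)
    return {fsid: ps for fsid, ps in seen.items() if len(ps) > 1}
```

Called as duplicate_fsids(["image1", "image2"]), this would flag the two copies from the example. A shared fsid is not proof of a problem, though: the member devices of one multi-device filesystem legitimately carry the same fsid, so a hit only means "look more closely before mounting both".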
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
On Thu, Jun 20, 2013 at 10:22:07AM +, Gabriel de Perthuis wrote:
> On Thu, 20 Jun 2013 10:16:22 +0100, Hugo Mills wrote:
>> On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
>>> Hi,
>>>
>>> I've observed a rather strange behaviour while trying to mount two identical copies of the same image to different mount points. Each modification to one image is also performed in the second one.
>>>
>>> touch m2/hello
>>> ls -la m1 //will now also include a file called hello
>>>
>>> Is this behaviour intentional and known or should I create a bug-report?
>>
>> It's known, and not desired behaviour. The problem is that you've ended up with two filesystems with the same UUID, and the FS code gets rather confused about that. The same problem exists with LVM snapshots (or other block-device-layer copies).
>>
>> The solution is a combination of a tool to scan an image and change the UUID (offline), and of some code in the kernel that detects when it's being told about a duplicate image (rather than an additional device in the same FS). Neither of these has been written yet, I'm afraid.
>
> To clarify, the loop devices are properly distinct, but the first device ends up mounted twice. I've had a look at the vfs code, and it doesn't seem to be uuid-aware, which makes sense because the uuid is a property of the superblock and the fs structure doesn't expose it. It's a Btrfs problem.

Yes, it is. (I didn't intend, however obliquely, to imply that it wasn't).

> Instead of redirecting to a different block device, Btrfs could and should refuse to mount an already-mounted superblock when the block device doesn't match, somewhere in or below btrfs_mount. Registering extra, distinct superblocks for an already mounted raid is a different matter, but that isn't done through the mount syscall anyway.

The problem here is that you could quite legitimately mount /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are both part of the same filesystem. So you can't simply prevent mounting based on the device that the mount's being done with.

Hugo.
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
>>> Instead of redirecting to a different block device, Btrfs could and should refuse to mount an already-mounted superblock when the block device doesn't match, somewhere in or below btrfs_mount. Registering extra, distinct superblocks for an already mounted raid is a different matter, but that isn't done through the mount syscall anyway.
>>
>> The problem here is that you could quite legitimately mount /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are both part of the same filesystem. So you can't simply prevent mounting based on the device that the mount's being done with.
>
> Okay. The check should rely on a list of known block devices for a given filesystem uuid.

And this is where we fail currently -- that list is held by the btrfs module in the kernel, and is constructed on the basis of what btrfs dev scan finds by looking at superblocks on block devices. Currently, there's no method implemented for determining whether a block device with a legitimate btrfs superblock on it is a duplicate of another device, or whether it's a newly-discovered device which is part of an as-yet incompletely specified multi-device FS.

I think it should be possible to look up the device ID as well, and complain (loudly, to the user, and in the kernel) at btrfs dev scan time if we see duplicates. That would deal with the problem at the earliest point of confusion.

Hugo.
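The idea of complaining at scan time when a device ID is seen twice for one filesystem UUID can be sketched in a few lines. This is only an illustration of the proposal, not how the btrfs module works: `DeviceRegistry`, `scan` and `DuplicateDeviceError` are hypothetical names, and the sketch assumes each block device shows up under exactly one path.

```python
class DuplicateDeviceError(Exception):
    """Two different block devices claim the same (fsid, devid) pair."""


class DeviceRegistry:
    """Hypothetical per-filesystem list of known devices, keyed by fsid."""

    def __init__(self):
        self._devices = {}  # fsid -> {devid: path}

    def scan(self, fsid, devid, path):
        """Record a device found by a scan, refusing obvious duplicates."""
        known = self._devices.setdefault(fsid, {})
        known_path = known.get(devid)
        if known_path is not None and known_path != path:
            # Same filesystem UUID and same device ID on another device:
            # this looks like a copy of the image, not an extra member.
            raise DuplicateDeviceError(
                f"fsid {fsid} devid {devid}: {path} duplicates {known_path}"
            )
        known[devid] = path

    def members(self, fsid):
        """Return a copy of the devid -> path map for one filesystem."""
        return dict(self._devices.get(fsid, {}))


registry = DeviceRegistry()
registry.scan("AA1234", 1, "/dev/sda")   # first member
registry.scan("AA1234", 2, "/dev/sdb")   # second member of the same FS: fine
try:
    registry.scan("AA1234", 1, "/dev/loop1")  # copied image: same devid again
except DuplicateDeviceError as err:
    print("refused:", err)
```

A legitimate second device of a multi-device filesystem carries a different device ID and is accepted, while a byte-for-byte copy repeats an ID that is already known and is rejected.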
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
On Thu, Jun 20, 2013 at 08:22:12AM -0500, Kevin O'Kelley wrote:
> Thank you for your reply. I appreciate it. Unfortunately this issue is a deal killer for us. The ability to take very fast snapshots and replicate them to another site is key for us. We just can't use Btrfs with this setup. That's too bad. Good luck and thank you.

If you want to make fast atomic incremental copies of btrfs to a remote system, then btrfs send/receive may be what you're looking for.

Hugo.

> On Jun 20, 2013, at 5:56 AM, Hugo Mills h...@carfax.org.uk wrote:
> [snip]
Re: raid1 inefficient unbalanced filesystem reads
On Fri, Jun 28, 2013 at 11:34:18AM -0400, Josef Bacik wrote:
> On Fri, Jun 28, 2013 at 02:59:45PM +0100, Martin wrote:
>> On kernel 3.8.13:
>>
>> Using two equal performance SATAII HDDs, formatted for btrfs raid1 for both data and metadata and:
>>
>> The second disk appears to suffer about x8 the read activity of the first disk. This causes the second disk to quickly get maxed out whilst the first disk remains almost idle.
>>
>> Total writes to the two disks is equal.
>>
>> This is noticeable for example when running emerge --sync or running compiles on Gentoo.
>>
>> Is this a known feature/problem or worth looking/checking further?
>
> So we balance based on pids, so if you have one process that's doing a lot of work it will tend to be stuck on one disk, which is why you are seeing that kind of imbalance. Thanks,

The other scenario is that if the number of processes executed for each compilation step happens to be even, then the heavy-duty file-reading parts will always hit the same parity of PID number. If each tool has, say, a small wrapper around it, then the wrappers will all run as (say) odd PIDs, and the tools themselves will run as even PIDs...

Hugo.
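The wrapper-plus-tool scenario is easy to reproduce in a toy model. The sketch below assumes the mirror is picked as PID modulo the number of copies, which is one plausible reading of "we balance based on pids"; the PID sequence, the step count and the read counts are invented purely for illustration.

```python
def mirror_for_pid(pid, num_mirrors=2):
    """Assumed policy: the reading process's PID selects the mirror."""
    return pid % num_mirrors


def simulate_build(first_pid, steps, reads_per_tool):
    """Each build step forks a light wrapper and then a read-heavy tool."""
    reads = [0, 0]
    pid = first_pid
    for _ in range(steps):
        wrapper_pid, tool_pid = pid, pid + 1
        pid += 2  # two processes per step, so parity never changes
        reads[mirror_for_pid(wrapper_pid)] += 1
        reads[mirror_for_pid(tool_pid)] += reads_per_tool
    return reads


# Wrappers get odd PIDs and tools get even PIDs, so one mirror does
# almost all of the reading.
print(simulate_build(first_pid=1001, steps=100, reads_per_tool=8))
```

With an odd number of processes per step the parity would flip on every step and the two mirrors would even out over time.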
Re: Hardware failure or btrfs issue?
On Mon, Jul 01, 2013 at 11:56:30PM +0100, Peter Chant wrote:
> Sirs,
>
> my recently slowing file system is now going read only after trying a defrag or other operation. I'm wondering whether this is the result of a hardware failure or a btrfs or some other issue.
>
> Output of dmesg:
> [snip]
> [ 127.862825] btrfs: corrupt leaf, bad key order: block=2837196627968,root=1, slot=121
> [snip]

This is usually an indication that you have bad hardware -- I'd suggest testing RAM, PSU, CPU in that order. I'm not sure what, if anything, can be done to fix the error on the disk right now.

> Not that I've done anything other than a cursory check but it looks like the read only data is fine.

Might be a good idea to use that to refresh your backups, just in case my prediction about the fixability is correct.

Hugo.
Re: Hardware failure or btrfs issue?
On Tue, Jul 02, 2013 at 06:36:48PM +0100, Peter Chant wrote:
> On 07/02/2013 08:29 AM, Hugo Mills wrote:
>> This is usually an indication that you have bad hardware -- I'd suggest testing RAM, PSU, CPU in that order. I'm not sure what, if anything, can be done to fix the error on the disk right now.
>
> Thanks, appreciated. Hmm. I've got one stick of ram out of the machine due to testing as I had some freezes last week. So the damage probably happened then, if that stick is bad.

Filesystems have this irritating habit of remembering things done to them across reboots. :)

Hugo.

> If it were one of the RAM, PSU and CPU then I'm unsure why this IO issue only surfaces on the HDD and not the SSD. I ordered a new HDD last night, before reading your post. If it's not the disk I'll go raid1. If it is the disk then I'll probably find out.
>
>>> Not that I've done anything other than a cursory check but it looks like the read only data is fine.
>>
>> Might be a good idea to use that to refresh your backups, just in case my prediction about the fixability is correct.
>
> Well, first option is to drop in the new disk, freshly format it and copy the data across (not add it as a second disk). If that fails last backup was Wednesday. I've not done much of note since then apart from try to fix the disk issues.
Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer
Sorry to be a pain in the arse at this late stage of the patch, but I've only just noticed.

On Wed, Jul 10, 2013 at 04:30:15PM +0200, David Sterba wrote:
>  static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
> -			    "PB", "EB", "ZB", "YB"};
> -char *pretty_sizes(u64 size)
> +			    "PB", "EB"};

These are SI (power of 10) prefixes...

> +void pretty_size_snprintf(u64 size, char *str, size_t str_bytes)
>  {
>  	int num_divs = 0;
> -	int pretty_len = 16;
>  	float fraction;
> -	char *pretty;
> +
> +	if (str_bytes == 0)
> +		return;
>
>  	if( size < 1024 ){
>  		fraction = size;
> @@ -1172,13 +1173,13 @@ char *pretty_sizes(u64 size)
>  			num_divs ++;
>  		}
>
> -		if (num_divs >= ARRAY_SIZE(size_strs))
> -			return NULL;
> +		if (num_divs >= ARRAY_SIZE(size_strs)) {
> +			str[0] = '\0';
> +			return;
> +		}
>  		fraction = (float)last_size / 1024;

... and this is working in IEC (power of 2) units. Can we fix this discrepancy, please? Also note that SI uses k for 10^3, but IEC uses K for 2^10. Just inserting an i in the middle of each element of size_strs should deal with the problem.

Hugo.
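To make the units point concrete, here is a small sketch of the same divide-by-1024 loop with IEC suffixes. It is written in Python rather than C for brevity and is not the btrfs-progs code; the `IEC_UNITS` list (including the leading "B" for plain bytes) and the two-decimal format are choices made for this example.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def pretty_size(size):
    """Format a byte count using power-of-two (IEC) units."""
    value = float(size)
    unit = 0
    # Divide by 1024 until the value fits or we run out of suffixes.
    while value >= 1024 and unit < len(IEC_UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.2f}{IEC_UNITS[unit]}"


print(pretty_size(1536))         # 1.50KiB
print(pretty_size(1024 ** 3))    # 1.00GiB
```

Because the divisor is 1024, labelling the results "KB" or "GB" would claim powers of ten that the arithmetic does not use; the "i" in each suffix is what makes the label match the computation.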
Re: Need help mounting broken btrfs Fedora 19
On Sun, Jul 14, 2013 at 01:43:41PM +, Dave Barnum wrote:
> I need some help as I may have lost some number of files on a btrfs raid 1 volume. I'm not quite sure what happened which, I know, only adds to the problem.
>
> On my computer #1 I had only a month or so ago installed Fedora 19 Beta and at the time of install chose BTRFS, raid 1. Recently one of the drives started complaining that it was going to die. Without taking it out of the array (perhaps I should have done that) I turned off the system and swapped the drive with another. From then on I lost my ability to boot the system. I could not get anything to work with the new hard drive. Then I put the old hard drive back in so that I could try to boot again. Still nothing. At one point I think I was able to see grub - but at this point, I'm not. I just get boot disk failure.

Try adding the option degraded to your mount options. With grub, this should be possible to do manually at boot time. That should get you the ability to mount the FS with just a single mirror. If that works, you can then use btrfs dev add to add the new device to the filesystem, and then a full balance to recreate the mirror.

Hugo.

> Furthermore, on the failing drive, drive A, I can still see the partitions but I cannot mount it on another system. On drive B (the other half of the mirror) I do not see any partitions. I tried copying the partition structure using sfdisk from A to B but that probably was not smart.
>
> I plugged drive A into computer #2 using a live Fedora and Ubuntu CD to try to mount the volume. However in both distributions I am unable to mount the volume. I've tried mounting using -o degraded but I still get the same error.
>
> The error I'm seeing when I try to mount the filesystem goes like this:
>
> Quote:
> [10792.307425] device label fedora_ison devid 2 transid 48720 /dev/sdc2
> [10792.308202] btrfs: allowing degraded mounts
> [10792.308206] btrfs: disk space caching is enabled
> [10792.308599] btrfs: failed to read chunk root on sdc2
> [10792.308799] btrfs warning page private not zero on page 20979712
> [10792.320146] btrfs: open_ctree failed
>
> I believe the superblock may be intact since when I run the command ./btrfs-show-super /dev/sdc2 I get:
>
> Quote:
> root@ubuntu:/downloads/btrfs-progs# ./btrfs-show-super /dev/sdc2
> superblock: bytenr=65536, device=/dev/sdc2
> ---------------------------------------------------------
> csum                    0xfc19c468 [match]
> bytenr                  65536
> flags                   0x1
> magic                   _BHRfS_M [match]
> fsid                    cbbf7d4c-f7a0-43ff-aed5-77b347d6ff25
> label                   fedora_ison
> generation              48720
> root                    1105526784
> sys_array_size          226
> chunk_root_generation   46504
> root_level              1
> chunk_root              20979712
> chunk_root_level        1
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             2988521291776
> bytes_used              598216536064
> sectorsize              4096
> nodesize                4096
> leafsize                4096
> stripesize              4096
> root_dir                6
> num_devices             2
> compat_flags            0x0
> compat_ro_flags         0x0
> incompat_flags          0x1
> csum_type               0
> csum_size               4
> cache_generation        48720
> dev_item.uuid           5f61edaa-7f12-4ec5-a024-02f7797e1400
> dev_item.fsid           cbbf7d4c-f7a0-43ff-aed5-77b347d6ff25 [match]
> dev_item.type           0
> dev_item.total_bytes    1494260645888
> dev_item.bytes_used     482110078976
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          2
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0
>
> Could someone help me troubleshoot why I can't mount my volume? I would REALLY appreciate it! Perhaps there is a way to repair my broken tree structure?
>
> Thank You!
Re: Can btrfs handle different RAID levels for different subvolumes?
On Sun, Jul 14, 2013 at 04:50:35PM +0200, Adam Ryczkowski wrote:
> Can one btrfs filesystem handle different RAID levels e.g. for different subvolumes? If so, how does deduplication with bedup (https://github.com/g2p/bedup) across them work?

No, not yet. It's planned at some point (probably in the fairly distant future), but hasn't arrived yet.

Hugo.

> (It has been asked already on the Net (http://unix.stackexchange.com/questions/82869/can-btrfs-handle-different-raid-levels-for-different-subvolumes) but the question didn't get an answer. I guess answering it should be straightforward for you, guys :-) )
>
> Adam Ryczkowski
Re: super block crcs don't match, older mkfs detected
On Sun, Jul 14, 2013 at 12:11:04PM -0600, Chris Murphy wrote:
> On Fedora 19 with all updates, when I mkfs.btrfs and then mount the volume, I'm getting this in dmesg:
>
> [ 280.534868] Btrfs loaded
> [ 280.581799] device fsid 94ed05cb-89a9-4d6b-a1e2-5312687b59f5 devid 1 transid 4 /dev/mapper/vg1-brick1
> [ 280.590140] btrfs: super block crcs don't match, older mkfs detected
> [ 280.597746] btrfs: disk space caching is enabled
> [ 280.661204] SELinux: initialized (dev dm-4, type btrfs), uses xattr
>
> btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64
> kernel-3.10.0-1.fc20.x86_64
>
> Is this expected? Benign?

Yes, I believe it's harmless and will go away after the first mount.

Hugo.
Re: bug: corrupt filesystem, cannot delete tmp files created just before crash.
that haven't actually been committed yet), that may well help in your case. I'm not technically qualified to match backtraces against commits/patches and identify a solid match, but it's definitely worth a try. Finally, as background once you're out of the tight spot, since you're running a multi-device filesystem, you're likely to find the discussion of that on the multiple devices, sysadmin guide, and use cases pages useful. FWIW, here I'm running most of my btrfs filesystems in dual-device raid1 (both data/metadata) mode, to take advantage of the checksumming and extra copy to lookup in case of checksum error, that btrfs offers, in addition to the device-loss scenario that raid1 helps protect against.
Re: Questions about multi-device behavior
On Thu, Jul 18, 2013 at 02:59:58PM -0700, Roger Binns wrote:
> On 18/07/13 13:05, Chris Murphy wrote:
>> Sounds like if I have a degraded 'single' volume, I can simply cp or rsync everything from that volume to another, and I'll end up with a successful copy of the surviving data. True?
>
> Not quite. I did it with cp -a. Because all the metadata survived, cp would create the target file, but then get an i/o error on opening/reading the source file. It would print an error message, but not delete the empty target file. Consequently I ended up with loads of zero length files I had to go in and delete afterwards.

The odds of having an undamaged file from that process are much better for single than for RAID-0 (and aren't affected by having tools which will cope better with IO errors -- although you'll get more of each damaged file if you do). As the file size goes up, the odds of it being damaged increase.

Hugo.

> I briefly looked for an rsync option to keep going on source i/o errors but didn't find one.
>
> Roger
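The clean-up Roger had to do by hand (deleting targets that were created but never filled) can be folded into the copy itself. The following is a rough sketch, not a replacement for cp or rsync: `copy_tree_skipping_errors` is a made-up helper, and it assumes that any `OSError` while copying a file means the source was unreadable.

```python
import os
import shutil


def copy_tree_skipping_errors(src_root, dst_root, copy_file=shutil.copy2):
    """Copy a tree, dropping targets whose source could not be read.

    Returns the list of source paths that failed to copy.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                copy_file(src, dst)
            except OSError:
                # The target may already exist, empty or partial: remove it
                # so that no misleading zero-length file is left behind.
                if os.path.lexists(dst):
                    os.remove(dst)
                failed.append(src)
    return failed
```

Removing the partial target trades away the "more of each damaged file" that a more tolerant tool could salvage; keeping partial files under a different name would be the alternative design.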
Re: Different size devices in RAID
On Tue, Jul 23, 2013 at 07:32:36AM -0700, Curtis Shimamoto wrote:
> I am using btrfs to span across two SSDs at the moment. One is a 256GB and the other is a 128GB. So as of now, I have the data in single form and the metadata in a RAID1. I have heard that btrfs can adjust to some degree for devices in a RAID array that vary in sizes due to the way it handles things. But I feel as though the size difference between those two would be too vast to compensate for whatsoever.
>
> But I have an additional SSD in my machine, which is also a 128GB drive. I know that with RAID1, it will only duplicate the data no matter how many devices are present in the array. So if that is the case, if I were to add all three of my SSDs into the filesystem, and then put the data into a RAID1, would it be able to make use of all the space?

Yes. If the largest device is A, and the two smaller ones are B and C, the system will allocate chunks in pairs, alternating A+B and A+C.

> The two smaller ones together are about equal to the size of the larger, so in my mind it would seem that it would be entirely possible for it to keep two copies of each extent while still utilizing all the space. But I don't know if btrfs is set up to recognize this situation or how it would handle it. I know that I could potentially put the two smaller drives in some kind of an LVM or mdadm, but I would like to avoid this if possible. It just seems like an unnecessary layer of complexity.
>
> Though my question is about RAID1 specifically, as I would like to use the potential of the self healing features, I guess it would also extend to RAID0 as well. Would that be able to make efficient use of the space?

No, because you only have one large device and two small ones, so the top part of the largest device would be unusable with RAID-0. (Or at least, not until we get stripe-width limitations, which should be coming up Real Soon Now, as I believe it's part of Chris's work to finish off the parity RAID implementation).

> Additionally, though not quite as much of a concern to me, the machine in which these drives live is an Ivy Bridge Laptop, so there are actually only two available SATA3 ports. The odd drive out at this point in time is actually an mSATA which is the only SATA2 port. If I were to add this to an array (assuming the above questions have favorable answers), how dramatically would the speed of the array be affected?
>
> To be honest, the speed of even just the mSATA drive alone is enough to keep me happy. But I have just been very curious about this. The write speeds of all three are relatively close. But the read speeds on the SATA3 are significantly faster than the mSATA.

This one I don't have an answer for, sorry.

Hugo.

> Anyway, thanks for the fantastic filesystem. Sorry for the long email, but these questions have been in the back of my mind for some time now. For the first question(s) at least I have not been able to find anything regarding that scenario.
>
> Regards,
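The A+B / A+C alternation for RAID-1 falls out of a simple greedy rule. The sketch below assumes each chunk is placed on the two devices that currently have the most free space and works in whole 1 GB chunks; `raid1_allocate` is an illustrative name, and real chunk sizes and metadata overhead are ignored.

```python
def raid1_allocate(sizes, chunk=1):
    """Greedy two-copy allocation; returns (usable space, free per device)."""
    free = list(sizes)
    usable = 0
    while True:
        # Pick the two devices with the most free space for the next chunk.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        first, second = order[0], order[1]
        if free[second] < chunk:
            break  # no second device left to hold the mirror copy
        free[first] -= chunk
        free[second] -= chunk
        usable += chunk
    return usable, free


# 256 GB + 128 GB + 128 GB: every chunk pairs the large device with one
# of the small ones, and nothing is left over.
print(raid1_allocate([256, 128, 128]))
# With only the 256 GB and one 128 GB device, half of the large one is stranded.
print(raid1_allocate([256, 128]))
```

Under this rule the three-device layout stores 256 GB of mirrored data, which is the full raw capacity divided by two.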
Re: Q: Why subvolumes?
On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
>> Now... since the snapshot's FS tree is a direct duplicate of the original FS tree (actually, it's the same tree, but they look like different things to the outside world), they share everything -- including things like inode numbers. This is OK within a subvolume, because we have the semantics that subvolumes have their own distinct inode-number spaces. If we could snapshot arbitrary subsections of the FS, we'd end up having to fix up inode numbers to ensure that they were unique -- which can't really be an atomic operation (unless you want to have the FS locked while the kernel updates the inodes of the billion files you just snapshotted).
>
> I don't think so; I just checked some snapshots and the inos are the same. Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

That's what I said. Our current implementation allows different subvolumes to have the same inode numbers, which is what makes it work. If you threw out the concept of subvolumes, or allowed snapshots within subvolumes, then you'd be duplicating inodes within a subvolume, which is one reason it doesn't work.

Hugo.
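Userspace tools usually tell files apart by the pair of device number and inode number, which is why a snapshot can reuse inode numbers as long as it reports a different device. A minimal sketch of that check, using only `os.stat`; `file_identity` is a name invented for this example.

```python
import os


def file_identity(path):
    """Return the (device, inode) pair that identifies a file."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def same_file(path_a, path_b):
    """True if both paths refer to the same underlying file."""
    return file_identity(path_a) == file_identity(path_b)
```

On a btrfs system, comparing a file with its copy inside a snapshot of the same subvolume should show equal `st_ino` values but different `st_dev` values, as described above. If snapshots lived inside one subvolume, the pair would collide and tools relying on it would treat two distinct files as one.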
Re: Adding 500G disk to btrfs volume... but I don't get 500G more of available space (raid0)
On Fri, Jul 26, 2013 at 09:05:03AM +0200, Axelle wrote:
> Hi btrfs folks,
>
> I'm afraid I have a newbie question... but I can't sort it out? It's just about adding a disk to a btrfs volume and not getting the correct amount of GB in the end...
>
> I have a btrfs volume which already consists of two different devices and which is mounted on /samples. Its total size is 194G.
>
> $ df -h
> Filesystem Size Used Avail Use% Mounted on
> ...
> /dev/sdc1 194G 165G 20G 90% /samples
>
> Now, I would like to add another 500G to that volume, from another device. I did
>
> $ sudo mkfs.btrfs -m raid0 -d raid0 /dev/sdb
> $ sudo btrfs device add /dev/sdb /samples
>
> My filesystem now correctly reports:
>
> $ sudo btrfs filesystem show
> Label: none uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
> Total devices 3 FS bytes used 161.98GB
> devid 3 size 465.76GB used 0.00 path /dev/sdb
> devid 2 size 93.13GB used 84.51GB path /dev/sdc1
> devid 1 size 100.61GB used 84.53GB path /dev/sdc6
>
> But I miss some space when I do:

RAID-0 requires at least two devices. If you balance this configuration, you'll use up the first 93.13 GiB of each device striping across all three devices, for a total of 3*93.13 = 279.39 GiB. Then /dev/sdc1 becomes full, leaving you with two devices which have 7.48 GiB and 372.63 GiB respectively. After another 7.48 GiB on each device (for a total of 2*7.48 = 14.96 GiB), you have filled /dev/sdc6, leaving only /dev/sdb to work with. Since there's only one device, it can't be used by RAID-0.

If you want to use the full space available, you should rebalance to single usage, which stops the RAID-0 striping, and allocates linearly:

# btrfs balance start -dconvert=single,soft /samples

Hugo.

> $ df -h
> Filesystem Size Used Avail Use% Mounted on
> ...
> /dev/sdc1 660G 165G 43G 80% /samples
>
> I added 500G! Why haven't I got more available??
>
> To debug, I ran this command:
>
> $ sudo btrfs filesystem df /samples
> Data, RAID0: total=162.00GB, used=159.79GB
> Data: total=8.00MB, used=7.48MB
> System, RAID1: total=8.00MB, used=24.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=3.50GB, used=2.19GB
> Metadata: total=8.00MB, used=0.00
>
> My data is in RAID0, that's ok. So where have my 500G gone, and how can I fix this?
>
> Thanks
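The RAID-0 arithmetic in the reply can be replayed step by step. This sketch assumes striping always spans every device that still has free space and stops once fewer than two devices remain; `raid0_usable` is a hypothetical helper and the sizes are the three devices from the `btrfs filesystem show` output, in GiB.

```python
def raid0_usable(sizes, min_devices=2):
    """Return (usable space, space left on devices that cannot be striped)."""
    remaining = sorted(sizes)
    usable = 0.0
    while len(remaining) >= min_devices:
        smallest = remaining[0]
        # Stripe across all remaining devices until the smallest is full.
        usable += smallest * len(remaining)
        remaining = [size - smallest for size in remaining[1:]]
    return usable, sum(remaining)


usable, stranded = raid0_usable([465.76, 93.13, 100.61])
print(f"usable with RAID-0: {usable:.2f} GiB, stranded: {stranded:.2f} GiB")
```

The two phases contribute 279.39 GiB and 14.96 GiB, about 294.35 GiB in total, and roughly 365 GiB of /dev/sdb is never reachable by RAID-0, which is why converting the data to single recovers the missing space.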
Re: Error on rebooting
On Fri, Jul 26, 2013 at 01:19:40AM +0100, Pete wrote: Dear All, Have I anything to be concerned about? I have got some error messages on booting. The scenario was that I had installed some ram and I suspect that I had disturbed a cable as one disk was not visible. I could not mount the other disk (did not try degraded, but the messages seemed to indicate something serious was up). After installing ram booted. But some issue with some files, anything accessing those files froze. Had to reboot. Failed to shutdown correctly (shutdown stalled on unmount) Reboot. /home etc not mounted (btrfs in question) Btrfsck /dev/sdb showed various errors. When complete turned off machine. Fiddled with cables. Affected drive now seen on reboot. Rebooted. Mounted disks (perhaps) error messages may have been present on boot. Much disk IO. Disk IO stopped. Machine appeared frozen except that Caps lock and Num lock worked. Ctrl-alt-backspace did not sort out stalled x(?)dm session. Hard power down. Last reboot. Error messages. However, works. Example messages from dmesg: [8.063138] btrfs: enabling inode map caching [8.067617] btrfs: use lzo compression [8.072092] btrfs: disk space caching is enabled [8.147324] btrfs: bdev /dev/sdb errs: wr 4015, rd 464, flush 0, corrupt 0, gen 0 [8.802275] NET: Registered protocol family 10 [ 15.462313] device fsid 2628a800-e095-4460-9b93-8847e9fb626b devid 2 transid 27794 /dev/sdc [ 15.511463] device fsid 2628a800-e095-4460-9b93-8847e9fb626b devid 2 transid 27794 /dev/sdc [ 15.566689] device fsid 2628a800-e095-4460-9b93-8847e9fb626b devid 2 transid 27794 /dev/sdc [ 15.587851] device fsid 2628a800-e095-4460-9b93-8847e9fb626b devid 2 transid 27794 /dev/sdc [ 15.620678] device fsid 2628a800-e095-4460-9b93-8847e9fb626b devid 2 transid 27794 /dev/sdc [ 16.024295] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready ... ... ... ... 
19.491507] tun: (C) 1999-2004 Max Krasnyansky m...@qualcomm.com [ 56.064899] parent transid verify failed on 1142639534080 wanted 27788 found 26856 [ 56.154721] btrfs read error corrected: ino 1 off 1142639534080 (dev /dev/sdb sector 2179305424) [ 56.166301] parent transid verify failed on 1142597795840 wanted 2 found 27772 [ 56.186790] btrfs read error corrected: ino 1 off 1142597795840 (dev /dev/sdb sector 2179223904) [ 56.460857] parent transid verify failed on 1142599532544 wanted 27779 found 27772 [ 56.461396] btrfs read error corrected: ino 1 off 1142599532544 (dev /dev/sdb sector 2179227296) [ 59.927078] ata1.00: configured for UDMA/133 [ 59.927082] ata1: EH complete [ 59.933467] ata2.00: configured for UDMA/133 [ 59.933473] ata2: EH complete [ 60.129445] ata3.00: configured for UDMA/133 [ 60.129458] ata3: EH complete [ 61.449810] parent transid verify failed on 1142629605376 wanted 27784 found 26856 [ 61.473817] btrfs read error corrected: ino 1 off 1142629605376 (dev /dev/sdb sector 2179286032) [snip] [ 104.204035] btrfs read error corrected: ino 1544486 off 0 (dev /dev/sdb sector 2182960392) [ 104.204551] btrfs read error corrected: ino 1544486 off 4096 (dev /dev/sdb sector 2182960400) [ 117.249253] parent transid verify failed on 1142609051648 wanted 27774 found 26856 [ 117.255886] btrfs read error corrected: ino 1 off 1142609051648 (dev /dev/sdb sector 2179245888) [ 117.419294] parent transid verify failed on 1142599507968 wanted 27779 found 27772 [ 117.437317] btrfs read error corrected: ino 1 off 1142599507968 (dev /dev/sdb sector 2179227248) [ 137.502176] NFSD: Unable to end grace period: -110 Given that I have booted now - does this mean that the above was btrfs sorting itself out? Looks like it. I'd recommend a scrub to check for any other out of date data on the affected drive. I've done pretty much the same thing as this myself, and a scrub, though scary in the amount of noise it made, fixed everything satisfactorily. Hugo. 
Re: Adding 500G disk to btrfs volume... but I don't get 500G more of available space (raid0)
On Fri, Jul 26, 2013 at 04:35:59PM +0200, Axelle wrote: Hi Hugo, Thanks for your answer, but I'm afraid I still don't get it. RAID-0 requires at least two devices. Well, I have three devices, so that's more than enough isn't it? Or do you mean I should be adding two devices at a time? If you balance this configuration, you'll use up the first 93.13 GiB of each device striping across all three devices, for a total of 3*93.13 = 279.39 why 93.13? I guess you meant 84.53 which is the size I am using on sdc1 and sdc6. Sorry, a little unclear -- if you balance, and then continue writing data to the FS. Once you hit 93.13 GiB (the size of the smallest device), you switch to 2-device operation, and then when that's full, you can't go any further. # btrfs balance start -dconvert=single,soft /samples Nice command but I wasn't thinking of stopping RAID0 striping. I was expecting my data to be stripped evenly on all 3 devices. It's worth noting that /dev/sdc1 and /dev/sdc6 are on the same physical device. If that's a rotational device (i.e. traditional hard disk), then you're going to have a serious performance decrease as a result of that, because /dev/sdc will have to spend lots of its time seeking between the two partitions. single operation really is the better option here -- you'll get to use all your space, and you won't suffer the performance problems of striping between two partitions on the same disk. Well - evenly - until the smallest one /dev/sdc1 is filled, then, it'll use only the last two, when /dev/sdc6 is filled, it will used /dev/sdb only. Is that possible/correct? That's exactly what happens, except for the last bit. RAID-0 requires at least two devices, so it can't stripe across the one device remaining once you have completely filled /dev/sdc1 and /dev/sdc6. But basically, what does not make sense to me is what df reports as available size. Look. Before, I had ~165G used on a total of 194G. I added a new disk of 465G. 
Now, df reports I have a total of 660G (that's right) with 165G used (that's correct too) but only 43G available! I was expecting to have ~495G available! Where have my 465G gone?

It's not usable with the RAID configuration you've specified, so it's not shown. Hugo.

$ df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/sdc1 660G 165G 43G 80% /samples

Thanks Axelle.

On Fri, Jul 26, 2013 at 9:45 AM, Hugo Mills h...@carfax.org.uk wrote:
On Fri, Jul 26, 2013 at 09:05:03AM +0200, Axelle wrote:

Hi btrfs folks, I'm afraid I have a newbie question... but I can't sort it out? It's just about adding a disk to a btrfs volume and not getting the correct amount of GB in the end... I have a btrfs volume which already consists of two different devices and which is mounted on /samples. Its total size is 194G.

$ df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/sdc1 194G 165G 20G 90% /samples

Now, I would like to add another 500G to that volume, from another device. I did

$ sudo mkfs.btrfs -m raid0 -d raid0 /dev/sdb
$ sudo btrfs device add /dev/sdb /samples

My filesystem now correctly reports:

$ sudo btrfs filesystem show
Label: none uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
Total devices 3 FS bytes used 161.98GB
devid 3 size 465.76GB used 0.00 path /dev/sdb
devid 2 size 93.13GB used 84.51GB path /dev/sdc1
devid 1 size 100.61GB used 84.53GB path /dev/sdc6

But I miss some space when I do:

RAID-0 requires at least two devices. If you balance this configuration, you'll use up the first 93.13 GiB of each device striping across all three devices, for a total of 3*93.13 = 279.39 GiB. Then /dev/sdc1 becomes full, leaving you with two devices which have 7.48 GiB and 372.63 GiB respectively. After another 7.48 GiB on each device (for a total of 2*7.48 = 14.96 GiB), you have filled /dev/sdc6, leaving only /dev/sdb to work with. Since there's only one device, it can't be used by RAID-0.
If you want to use the full space available, you should rebalance to single usage, which stops the RAID-0 striping, and allocates linearly: # btrfs balance start -dconvert=single,soft /samples Hugo. $ df -h Filesystem Size Used Avail Use% Mounted on ... /dev/sdc1 660G 165G 43G 80% /samples I added 500G! Why haven't I got more available?? To debug, I ran this command: $ sudo btrfs filesystem df /samples Data, RAID0: total=162.00GB, used=159.79GB Data: total=8.00MB, used=7.48MB System, RAID1: total=8.00MB, used=24.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=3.50GB, used=2.19GB Metadata: total=8.00MB, used=0.00 My data is in RAID0, that's ok. So where have my 500G gone, and how can I fix this? Thanks -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
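The arithmetic in Hugo's reply can be checked with a few lines of shell. This is only a sketch under a simplified model that is assumed here rather than taken from btrfs itself: data is striped across every device that still has free space, and allocation stops once fewer than two devices have room. The function name `raid0_usable` is made up for the illustration, and sizes are plain numbers in whatever unit you pass in.

```shell
# raid0_usable SIZE...: estimate how much data fits on a RAID-0 filesystem
# built from devices of the given sizes (same unit for all, e.g. GiB).
# Simplified model (an assumption): stripe across all devices that still
# have free space; stop when fewer than two devices have space left.
raid0_usable() {
    printf '%s\n' "$@" | LC_ALL=C sort -n | LC_ALL=C awk '
        { size[NR] = $1 }
        END {
            total = 0; filled = 0
            for (i = 1; i <= NR; i++) {
                remaining = NR - i + 1       # devices that still have free space
                if (remaining < 2) break     # RAID-0 needs at least two devices
                total += (size[i] - filled) * remaining
                filled = size[i]             # every remaining device is now this full
            }
            printf "%.2f\n", total
        }'
}

# The three devices from the thread: 3*93.13 + 2*7.48
raid0_usable 465.76 93.13 100.61   # prints 294.35
```

Under this model about 294 GiB of the 660 GiB raw capacity can hold RAID-0 data, which is why most of the new 465 GiB disk does not show up as available.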
Re: Mount multiple-device-filesystem by UUID
On Sat, Jul 27, 2013 at 08:52:50PM +0200, Hendrik Friedel wrote: As stated in the wiki, multiple-device filesystems (e.g. raid 1) will only mount after a btfs device scan, or if all devices are passed with the mount options. I remember, that for Ubuntu 12.04 I changed the initrd. But after a re-install, I have to do this again, and I don't remember how I did it. With Ubuntu, just install the btrfs-tools package. It should modify the initrd correctly. So, the other option would be passing the devices in the fstab. But here, I'd prefer UUIDs rather than device names, as they can change. This is why we don't recommend using device= mount flags. Is this possible? What is the syntax? I don't believe it is possible. Finding filesystems by UUID is (I think) a userspace-based thing, so you'd have to have an initrd anyway. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- No! My collection of rare, incurable diseases! Violated! --- signature.asc Description: Digital signature
Re: How to merge two partitions?
On Thu, Aug 01, 2013 at 11:53:34AM +0100, Andrew Stubbs wrote: If I have two partitions, /dev/sda1 and /dev/sda2, one btrfs, and one ext4 (but I could convert it first), how can I merge them into one filesystem without moving all the data onto an external device and then moving it all back again? (I do have a backup, of course, but transferring the data takes hours, maybe days.) That's going to be the easiest option by far. I'm left with this layout for historical reasons, and now the smaller partition is close to running out of space. I thought of using btrfs device add and just living with the untidy underlying devices, but an experiment with loopback filesystems shows that any data on the new device is silently obliterated (it might be nice if the docs mentioned this!) You would expect data in a different filesystem format to be integrated into an existing set of data structures? That would be... magic. :) I've thought of shrinking the larger partition, creating a third partition, and adding that to the smaller filesystem. This would solve the free-space issue, but doesn't feel great. I've thought of using a temporary third partition as an intermediary, but I don't have space to move all the data in one go. I've thought of using a clever partition manager to move the start of the second partition, transfer some data, move it some more, transfer some more data, but this seems like an equally lengthy process. That's the other option I'd go for. I could move the data from the smaller partition into the larger one, then delete the first partition, and move the whole larger partition forward, extend it, and fix up the fstab. That might be less painful. Is there a cunning btrfs trick to do this? Can a btrfs filesystem be extended backwards, if you see what I mean? No, using gparted to move it backwards into the free space is your best option here. Hugo. -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I don't know. I can't tell the future, I just work there. --- signature.asc Description: Digital signature
Re: [PATCH v2] btrfs: add mount option to set commit interval
On Sat, Aug 03, 2013 at 07:39:01AM -0400, Mike Audia wrote: Another newbie question is which version of the kernel do I need to have in order to cleanly apply this patch? I am finding that it fails to apply to the current stable kernel code (as of now it is v3.10.4) which makes me think your patch has to be applied to a newer one? Are you patching against the linux git tree meaning I have to use the 3.11 series to try your code? Try Josef's btrfs-next repo: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories#Integration_repository_.28btrfs-next.29 OK! I can patch successfully into that git repo: % cd /tmp/work % git clone git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git % cd btrfs % patch -Np1 -i btrfs_add_mount_option_to_set_commit_interval.patch patching file fs/btrfs/ctree.h patching file fs/btrfs/disk-io.c patching file fs/btrfs/super.c Hunk #3 succeeded at 647 (offset 19 lines). Hunk #4 succeeded at 1006 with fuzz 1 (offset 39 lines). If I am not mistaken, btrfs-next is the entire kernel's code? The wiki suggests running anything compiled therein from the build dir. That'll be for the userspace tools, not the kernel. Obviously, one doesn't tend to run kernels from the command line. :) If I want to compile this into the official 3.10.4 tree, how can I do it? Add the official kernel repo as a remote to the same git repo (with git remote add), fetch that repo, create a new branch to work in, based on the btrfs-next branch, then merge in the other branch (or vice-versa). Note that btrfs-next is usually based on the latest released kernel anyway, so that's likely to be largely superfluous. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- A gentleman doesn't do damage unless he's paid for it. --- signature.asc Description: Digital signature
Re: building btrfs corrupt block
On Sun, Aug 04, 2013 at 12:39:28PM -0600, Chris Murphy wrote:
> I must be doing something wrong, but I can't figure out what. I have
> btrfs-progs source installed from here:
> http://koji.fedoraproject.org/koji/buildinfo?buildID=441375
>
> make produces no errors. Yet btrfs-corrupt-block.c isn't built.
> Suggestions?

$ make btrfs-corrupt-block

Some of the more outré commands aren't built by default and have to
be built individually.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- For months now, we have been making triumphant retreats ---
  before a demoralised enemy who is advancing in utter disorder.
Re: check for reflink capability and for shared data
On Sat, Aug 24, 2013 at 06:09:58PM +0200, Thomas Koch wrote:
> Hi,
>
> how can I do the following in a shell script:
>
> - check whether my file system supports cp --reflink?

touch foo; if cp --reflink=always foo bar; then ...; fi; rm -f foo bar

> - check whether two files share the same data on disk, i.e. one has
>   been created by cp --reflink of the other?

You can't, using simple userspace tools. I think the only way would
be to use the tree search ioctl to inspect the extents for each file,
and see whether any of them overlap.

Why do you need to know this?

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Someone's been throwing dead sheep down my Fun Well ---
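The one-liner in the reply can be wrapped into a reusable check. The following is a minimal sketch, not a tested tool: the function name `supports_reflink` and the temporary file names are invented for the example, and it assumes a `cp` that understands `--reflink=always` (as GNU coreutils does). Like the original suggestion, it probes with an empty file and cleans up afterwards.

```shell
# supports_reflink [DIR]: succeed (exit 0) if "cp --reflink=always" works
# for files inside DIR, fail with 1 if it does not, 2 if DIR is unusable.
# Runs in a subshell so its variables do not leak into the caller.
supports_reflink() (
    dir=${1:-.}
    src=$(mktemp "$dir/reflink-src.XXXXXX") || exit 2
    dst="$src.copy"
    if cp --reflink=always "$src" "$dst" 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$src" "$dst"      # remove the probe files in either case
    exit "$status"
)

if supports_reflink "${TMPDIR:-/tmp}"; then
    echo "reflink copies are supported here"
else
    echo "reflink copies are not supported here"
fi
```

The check has to be run against a directory on the filesystem of interest, since the answer depends on where the files live rather than on the machine as a whole.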
Re: failed to read log tree, open_ctree failed
On Tue, Aug 27, 2013 at 02:25:09AM +0800, Tomasz Chmielewski wrote:
> I had a RAID-1 btrfs filesystem with Linux 3.10. After hard reset, I'm
> no longer able to mount it:
>
> [ 35.254122] Btrfs loaded
> [ 35.254577] device label test-btrfs devid 1 transid 97966 /dev/sda4
> [ 35.254819] device label test-btrfs devid 3 transid 97966 /dev/sdb4
> [ 35.255032] device label test-btrfs devid 3 transid 97966 /dev/sdb4
> [ 35.22] btrfs: force zlib compression
> [ 35.255645] btrfs: disk space caching is enabled
> [ 35.379806] btrfs: bdev /dev/sda4 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
> [ 56.209412] parent transid verify failed on 3321036099584 wanted 97967 found 97966
> [ 56.225990] parent transid verify failed on 3321036099584 wanted 97967 found 97966
> [ 56.226128] btrfs: failed to read log tree
> [ 56.344483] btrfs: open_ctree failed
>
> I've tried with 3.11-rc7, but it gives the same result.
>
> Any hints how to recover from that? I have backups, but it would be
> nice if the filesystem just mounted.

Try mounting with both -orecovery and -oro,recovery.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Python is executable pseudocode; perl ---
  is executable line-noise.
Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)
On Mon, Aug 26, 2013 at 01:10:54PM -0600, Chris Murphy wrote:
> On Aug 26, 2013, at 11:41 AM, Nick Lee em...@nickle.es wrote:
> > There was a discussion on IRC a few days ago that the problem with
> > the tree root's block was likely the result of either an issue with
> > the disk itself, or the chunk tree/logical mappings. I ran the chunk
> > recover, looked over the errors it found, and hit write. (If it
> > failed, I was going to run something like photorec, with loss of
> > organization as a side effect.)
> >
> > I can write something more clear after my flight lands tomorrow if
> > you want.
>
> I'm just curious about when to use various techniques: -o recovery,
> btrfsck, chunk-recover, zero log.

Let's assume that you don't have a physical device failure (which is
a different set of tools -- mount -odegraded, btrfs dev del missing).

First thing to do is to take a btrfs-image -c9 -t4 of the filesystem,
and keep a copy of the output to show josef. :)

Then start with -orecovery and -oro,recovery for pretty much anything.

If those fail, then look in dmesg for errors relating to the log tree
-- if that's corrupt and can't be read (or causes a crash), use
btrfs-zero-log.

If there are problems with the chunk tree -- the only one I've seen
recently was reporting something like "can't map address" -- then
chunk-recover may be of use.

After that, btrfsck is probably the next thing to try. If options
-s1, -s2, -s3 have any success, then btrfs-select-super will help by
replacing the superblock with one that works. If that's not going to
be useful, fall back to btrfsck --repair.

Beyond that, btrfsck --repair --init-extent-tree may be necessary if
there's a damaged extent tree. Finally, if you've got corruption in
the checksums, there's --init-csum-tree.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once, except incest and folk-dancing. ---
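The escalation order in this reply is easier to follow as a checklist. The sketch below only prints the steps and never runs them. The tool names and options are the ones named in the message, while the function name `recovery_plan`, the image file name and the way the device and mount point are passed are assumptions made for illustration. The exact invocation of each tool should be checked against its own documentation, and the list covers only the case without a physical device failure.

```shell
# recovery_plan DEVICE MOUNTPOINT: print (do not execute) the recovery
# steps from the message, least drastic first. Stop at the first step
# that works.
recovery_plan() {
    cat <<EOF
0. btrfs-image -c9 -t4 $1 btrfs-metadata.img   # keep a copy for the developers
1. mount -o recovery $1 $2
2. mount -o ro,recovery $1 $2
3. btrfs-zero-log $1                   # only if dmesg shows log tree errors
4. chunk-recover                       # only for chunk tree errors ("can't map address")
5. btrfsck -s1 $1                      # then -s2, -s3; if one works, use btrfs-select-super
6. btrfsck --repair $1
7. btrfsck --repair --init-extent-tree $1      # damaged extent tree
8. btrfsck --init-csum-tree $1                 # corrupted checksums
EOF
}

recovery_plan /dev/sdX /mnt
```

Printing rather than executing is deliberate: several of these steps discard or rewrite on-disk state, so each one deserves a conscious decision.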
Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)
On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote: On Aug 29, 2013, at 11:35 AM, Zach Brown z...@redhat.com wrote: If those fail, then look in dmesg for errors relating to the log tree -- if that's corrupt and can't be read (or causes a crash), use btrfs-zero-log. In a bit of a tangent: btrfs-zero-log throws away data that fsync/sync could have previously claimed was stable on disk. Given how often this is thrown around as a solution to a broken partition, should the tool jump up and down and make it clear that it's about to roll the file system back? This seems like relevant information. Right now, as far as I can tell, it's completely undocumented and silent. Yes, I think it helps remove some burden on the list answering questions about a tool that doesn't have any documentation, to have a warning. How much longer will btrfs-zero-log be needed? If whatever it's doing isn't obviated by future improvements to btrfsck, and this sort of big hammer approach is still needed in some worse case scenarios, then it probably hurts no one to flag the user with essentially how you described it. I think documentation is a greater burden to create, and less likely to be consulted. Proceeding will roll back the file system to a previous state, and may cause the loss of successfully written data. Proceed? (Y/N) ... the loss of up to the last 30 seconds of successfully written data. Give the user enough information to make a sensible decision. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- emacs: Eighty Megabytes And Constantly Swapping. --- signature.asc Description: Digital signature
Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)
On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote: On Aug 29, 2013, at 1:40 PM, Hugo Mills h...@carfax.org.uk wrote: On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote: Proceeding will roll back the file system to a previous state, and may cause the loss of successfully written data. Proceed? (Y/N) ... the loss of up to the last 30 seconds of successfully written data. Give the user enough information to make a sensible decision. Certainly, if known for sure it won't be more than 30 seconds? Mmm... it'll depend on the setting of the commit period, which up until a couple of weeks ago was always 30s, but someone posted a patch to give it a config knob... Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- emacs: Eighty Megabytes And Constantly Swapping. --- signature.asc Description: Digital signature
Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)
On Fri, Aug 30, 2013 at 09:44:28AM -0500, Eric Sandeen wrote: On 8/29/13 3:19 PM, Chris Murphy wrote: On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote: On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote: Certainly, if known for sure it won't be more than 30 seconds? Mmm... it'll depend on the setting of the commit period, which up until a couple of weeks ago was always 30s, but someone posted a patch to give it a config knob… Proceeding will roll back the file system to a previous state, and may cause the loss of successfully written data since the last commit period (30 seconds by default). Proceed? (Y/N) Is it just loss of data, or might this also result in a filesystem with inconsistent metadata, which then requires a fsck? No the metadata is always consistent (well, in theory, barring bugs and out-of-band corruption). Above sounds like it's just reverting to a previous (consistent) state. Is that correct? Yes, it's dropping the log of accepted-but-uncommitted work. This is a Bad Thing in the sense that something that's reached the log is reported to the application as being successfully written. If the application critically relies on that (e.g. databases), then we've discarded durability from ACID. (Can you guess I've been marking Databases resit exam papers this morning? :) ) Hugo. -Eric p.s. fwiw when the xfs_repair zero-log option -L is used, we say: ALERT: The filesystem has valuable metadata changes in a log which is being\n destroyed because the -L option was used.\n)); That's a reasonable wording too. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- We teach people management skills by examining characters in --- Shakespeare. You could look at Claudius's crisis management techniques, for example. signature.asc Description: Digital signature
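The prompt wording proposed in this thread could be prototyped as a small wrapper. This is a hypothetical sketch only: `confirm_zero_log` is an invented name, btrfs-zero-log itself is not claimed to behave this way, and the message text is the wording suggested above.

```shell
# confirm_zero_log: show the proposed warning and read one answer from
# standard input. Succeeds only if the answer is "y" or "Y".
confirm_zero_log() {
    printf '%s\n' \
        'Proceeding will roll back the file system to a previous state,' \
        'and may cause the loss of successfully written data since the' \
        'last commit period (30 seconds by default).' >&2
    printf 'Proceed? (Y/N) ' >&2
    read -r answer || return 1     # end of input counts as "no"
    case $answer in
        [Yy]) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A caller would then guard the destructive command with it, for example `confirm_zero_log && btrfs-zero-log /dev/sdX`, so that anything other than an explicit yes leaves the log untouched.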
Re: Device delete returns unable to go below four devices on raid10 on 5 drive setup
On Sat, Aug 31, 2013 at 11:42:28AM -0600, Chris Murphy wrote:
> On Aug 31, 2013, at 4:12 AM, Steven Post redalert.comman...@gmail.com wrote:
> > The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP
> > Debian 3.2.46-1 x86_64). Is this something known (and possibly
> > resolved in a later version), or should I open a bug report about it?
>
> Try 3.10 or 3.11 before filing a bug on it.

If you want a debian-packaged kernel, they're available from the
experimental distribution.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 5: Manifesto Promise ---
Re: Recovering from csum errors
On Mon, Sep 02, 2013 at 11:41:12PM +0200, Rain Maker wrote: Hello list, So, I ran a full scrub, and, luckily, it only found 6 csum errors (these 6). The damage therefore seems to be contained in just 1 file. Now, I removed the offending file. But is there something else I should have done to recover the data in this file? Can it be recovered? No, and no. The data's failing a checksum, so it's basically broken. If you had a btrfs RAID-1 configuration, the FS would be able to recover from one broken copy using the other (good) copy. I'm running 3.11-rc7. It is a single disk btrfs filesystem. I have several subvolumes defined, one of which for VMWare Workstation (on which the corruption took place). Aaah, the VM workload could explain this. There's some (known, won't-fix) issues with (I think) direct-IO in VM guests that can cause bad checksums to be written under some circumstances. I'm not 100% certain, but I _think_ that making your VM images nocow (create an empty file with touch; use chattr +C; extend the file to the right size) may help prevent these problems. I checked the SMART values, they all seem OK. The harddisks in this machine are less then a month old. I replaced them after seeing similar messages on the old disks. Is the only logical explanation for this some kind of hardware failure (SATA controller, power supply...), or could there be something more to this? As above, there's some direct-IO problems with data changing in-flight that can lead to bad checksums. Fixing the issue would cause some fairly serious slow-downs in performance for that case, which is rather against what direct-IO is trying to do, so I think it's unlikely the behaviour will be changed. Of course, I could be completely wrong about all this, and you've got bad RAM or PSU something... Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- What are we going to do tonight? 
The same thing we do --- every night, Pinky. Try to take over the world! signature.asc Description: Digital signature
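The nocow recipe from this reply (create an empty file with touch, apply chattr +C, then extend it) can be written down as a small helper. This is a sketch under assumptions: the function name `make_nocow_image` is invented, the size is passed straight to `truncate`, and a failing `chattr` (for instance on a filesystem other than btrfs) is treated as a warning rather than an error. Whether nocow actually avoids the checksum problem is, as the reply says, not certain.

```shell
# make_nocow_image PATH SIZE: create PATH empty, mark it nocow, then
# extend it to SIZE (e.g. 20G), in the order given in the message.
make_nocow_image() {
    touch "$1" || return 1
    # The flag is set while the file is still empty, before any data exists.
    if ! chattr +C "$1" 2>/dev/null; then
        echo "warning: could not set nocow on $1" >&2
    fi
    truncate -s "$2" "$1"
}
```

For example, `make_nocow_image /path/to/vm-disk.img 20G` would prepare an image file before handing it to the virtual machine software; the path here is only a placeholder.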
Re: Btrfs prog
On Wed, Sep 04, 2013 at 01:57:42PM +0200, Giuseppe Fierro wrote: I'm using btrfs on ubuntu 13.04 with btrfs prog v0.20-rc1 This is my configuration using 2 disks in raid1 mode: gspe@jura:/mnt$ sudo btrfs f show Label: 'UbuntuDSK' uuid: f4a3c832-f6ab-4b1d-9eb7-f9ba7d1cba01 Total devices 2 FS bytes used 205.41GB devid1 size 2.70TB used 214.03GB path /dev/sdb2 devid2 size 2.70TB used 214.01GB path /dev/sda2 Btrfs v0.20-rc1 Some btrfs command behave strange: If i want to check free space using df, i get: gspe@jura:/mnt$ sudo btrfs filesystem df / Data, RAID1: total=212.00GB, used=204.42GB Data: total=8.00MB, used=0.00 System, RAID1: total=8.00MB, used=36.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=2.00GB, used=1010.04MB Metadata: total=8.00MB, used=0.00 What do you think is wrong with this output? It looks OK to me: From the btrfs fi show at the top, you have 214 GB allocated on each device. The btrfs fi df shows you how that allocation is used: 212 GB (*2, because it's RAID-1) is allocated to data, with 204 GB holding useful data. The remaining 2 GB (*2) is allocated to metadata, and 1 GB of that is actually used. If I would like to show the subvolume, i get gspe@jura:/mnt$ sudo btrfs subvolume list / gspe@jura:/mnt$ nothing is shown!!! Try using the -a option. It got added a while ago, and has been a complete pain in the neck ever since... Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- The English language has the mot juste for every occasion. --- signature.asc Description: Digital signature
Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1
On Thu, Sep 05, 2013 at 09:06:19PM +0600, Roman Mamedov wrote: On Thu, 5 Sep 2013 15:54:07 +0100 Hugo Mills h...@carfax.org.uk wrote: On Thu, Sep 05, 2013 at 05:43:27PM +0300, Тимофей Титовец wrote: Hello guys, i try to convert ext4 volume, but btrfs-convert show me error: No valid Btrfs found on file unable to open ctree conversion aborted. Ubuntu 13.04 Kernel: 3.11 btrfs-progs git version 0.20-git20130822~194aa4a13 way to reproduce error: $ truncate -s 4G file $ mkfs.ext4 file #say yes to create fs on non block device. $ btrfs-convert file No valid Btrfs found on file unable to open ctree conversion aborted. I'm guessing here, but I suspect you will need to create a loopback device so that btrfs-convert can look at it as a block device rather than as a file: # losetup -f --show file /dev/loop0 # btrfs-convert /dev/loop0 Hugo. Nope, just today I saw someone report the same problem in a blog comment: http://popey.com/blog/2013/09/02/fun-with-btrfs-on-ubuntu/#comment-9704 It's the same person, in fact. I'd not seen that the one on popey's blog was doing it with block devices. This does indeed look like a fairly drastic bug... Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Is it true that last known good on Windows XP --- boots into CP/M? signature.asc Description: Digital signature
Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1
On Thu, Sep 05, 2013 at 05:43:27PM +0300, Тимофей Титовец wrote:
> Hello guys, I am trying to convert an ext4 volume, but btrfs-convert
> shows me this error:
>
> No valid Btrfs found on file
> unable to open ctree
> conversion aborted.
>
> Ubuntu 13.04
> Kernel: 3.11
> btrfs-progs git version 0.20-git20130822~194aa4a13
>
> Way to reproduce the error:
>
> $ truncate -s 4G file
> $ mkfs.ext4 file   # say yes to create the fs on a non-block device
> $ btrfs-convert file
> No valid Btrfs found on file
> unable to open ctree
> conversion aborted.

I'm guessing here, but I suspect you will need to create a loopback
device so that btrfs-convert can look at it as a block device rather
than as a file:

# losetup -f --show file
/dev/loop0
# btrfs-convert /dev/loop0

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---
  headline (possibly apocryphal)
Re: [GIT PULL] Btrfs
On Fri, Sep 13, 2013 at 09:07:36AM -0400, Ric Wheeler wrote: On 09/12/2013 11:36 AM, Chris Mason wrote: Mark Fasheh's offline dedup work is also here. In this case offline means the FS is mounted and active, but the dedup work is not done inline during file IO. This is a building block where utilities are able to ask the FS to dedup a series of extents. The kernel takes care of verifying the data involved really is the same. Today this involves reading both extents, but we'll continue to evolve the patches. Nice feature! Just a note, the offline label is really confusing. In other storage products, they typically call this out of band since you are online but not during the actual write in a synchronous way :) I knew there was a specific term for this, but couldn't remember what it was. I've now updated the btrfs website's description(s) of the feature to include out-of-band and in-band. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Once is happenstance; twice is coincidence; three times --- is enemy action. signature.asc Description: Digital signature
Re: [raidX vs single/dup]
On Thu, Sep 26, 2013 at 12:22:49PM +, miaou sami wrote:
> Hi btrfs guys,
>
> could someone explain to me the differences in mkfs.btrfs:
>
> - between -d raid0 and -d single

In RAID0, data is striped across all the devices, so the first 64k of
a file will go on device 1, the next 64k will go on device 2, and so
on. With single, files are allocated linearly on one device.

(This is assuming smallish files, a filesystem with lots of space.
Even with single, files can still end up being scattered around over
multiple devices -- but with RAID0, even non-fragmented files are
striped)

> - between -m raid1 and -m dup

In both cases, there are two copies of each metadata block. With
RAID1, it *requires* the two copies to live on different devices.
With DUP, it allows the two copies to live on the same device (e.g.
if there's only one device).

> - between -m raid0 and -m single

As for -draid0 and -dsingle, but for metadata instead of data.

> My understanding is that raidX should be used in case of multi
> devices and single/dup should be used in case of single device to
> allow duplication, but it is not 100% clear to me...
>
> As btrfs raid concepts are quite different from traditional raid,
> shouldn't we use the words striped and mirrored instead of
> raid0/raid1? or even single and duplicated? Then there would be no
> difference between single/raid0 and duplicated/raid1...

But there _are_ differences between them, as explained above. :)

I posted a patch a while ago to change the names to something more
logical and expressive, but it didn't get merged.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Stick them with the pointy end. ---
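The striping description in this reply ("the first 64k of a file will go on device 1, the next 64k will go on device 2, and so on") amounts to a round-robin mapping. The sketch below illustrates only that simplified picture: the function name `raid0_device` is invented, and real btrfs allocation works through chunks and extents, so the physical placement of a given file offset is not this direct.

```shell
# raid0_device OFFSET NDEVS: device number (counting from 1) that holds
# the byte at OFFSET, assuming 64 KiB stripes handed out in turn.
raid0_device() {
    echo $(( ($1 / 65536) % $2 + 1 ))
}

# Walk the first four stripes of a file on a three-device filesystem.
for offset in 0 65536 131072 196608; do
    echo "offset $offset -> device $(raid0_device "$offset" 3)"
done
```

With three devices the fourth stripe wraps around to device 1 again, which is the behaviour that lets RAID0 spread even an unfragmented file over every device.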
Re: [raidX vs single/dup]
On Thu, Sep 26, 2013 at 01:40:57PM +, miaou sami wrote:
> Thank you, it is quite clear now.
>
> I guess that on multi device, raid0 vs single would be a matter of performance vs ease of low level hardware data recovery.
>
> The wiki https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices says:
> When you have drives with differing sizes and want to use the full capacity of each drive, you have to use the single profile for the data blocks.
>
> Let's assume the following configuration: 1x10GB disk and 2x5GB disks
>
> -- Does it mean I cannot use the full capacity AND have a duplication of my data in the configuration above? (full capacity would be 10GB here)

No, that will give you the full usable space. A 20 GB drive and two 5 GB drives would not, though.

> -- If I try to setup either -d raid1 or -d dup on that configuration, what will I get?

Try it for yourself in the space simulator: http://carfax.org.uk/btrfs-usage/

> -- Is there any behavior difference between raid1 / dup in that case?

If you have multiple disks, I think DUP gets automatically upgraded to RAID-1 (i.e. the different copies on different devices requirement is enforced). So, no.

> -- Can raid1 ensure that data are always duplicated on different devices AND take advantage of all available space?

Depends on the relative sizes of the devices. If your largest device is bigger than the rest put together, then you'll lose some space.

Hugo.

> Regards,
> Sam

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Nothing right in my left brain. Nothing left in ---
                       my right brain.
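The rule of thumb in the reply above (space is lost when the largest device is bigger than the rest put together) can be worked through numerically. The sketch below is an assumption-laden illustration, not the code behind the space simulator: it models two-copy RAID-1 only, ignores metadata and chunk granularity, and the function names are invented for the example.

```python
def raid1_usable(sizes):
    """Approximate usable space for two-copy RAID-1 across devices of the
    given sizes. Every block needs a second copy on a different device, so
    the largest device can only be filled as far as the others can mirror it.
    Hypothetical helper for illustration."""
    total = sum(sizes)
    largest = max(sizes)
    return min(total / 2, total - largest)


def raid1_wasted_raw(sizes):
    """Raw space that cannot be used because nothing is left to mirror it."""
    return sum(sizes) - 2 * raid1_usable(sizes)


if __name__ == "__main__":
    # The two configurations discussed in the thread, sizes in GB.
    for config in ([10, 5, 5], [20, 5, 5]):
        print(config, raid1_usable(config), raid1_wasted_raw(config))
```

For 10+5+5 the largest device equals the sum of the others, so nothing is wasted; for 20+5+5 only half of the 20 GB device can be mirrored.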
Re: [raidX vs single/dup]
On Thu, Sep 26, 2013 at 02:55:38PM +, miaou sami wrote:
> OK, that's clear.
>
> Nice space simulator btw :-) you should add a link somewhere in btrfs wiki...

There is one, linked from the first line of the relevant section in the FAQ.

Hugo.

> Thanks

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The trouble with you, Ibid, is you think you know everything. ---
Re: csum questions
On Fri, Sep 27, 2013 at 04:22:16PM +0200, Tom Gundersen wrote:
> Hi guys,
>
> I have some questions about btrfs' handling of invalid csums. For the sake of argument I'm assuming no raid or anything like that (so only one copy exists of every file).
>
> When I try to access a file whose csum does not match, btrfs logs an error and refuses access to the file. I have two questions about this:
>
> 1) What happens to the file? Will btrfs just leave it alone, or will it be deleted from disk (I seem to remember reading this somewhere, just want to confirm)?

It's left there.

> 2) How may I tell btrfs to ignore all csums and just assume they are all correct? The reason for wanting this is in case the csum is garbled and the file is intact, or the csum is correct and the file is only partially garbled, but may still contain useful data.

You can't, right now. There's discussion on IRC about this very point right now. :)

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- A clear conscience. Where did you get this taste ---
                   for luxuries, Bernard?
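The two failure cases raised in the second question (checksum garbled but data intact, or data garbled but checksum intact) look identical to a verifier, which is part of why the question is awkward. A minimal sketch, assuming a plain CRC-32 from Python's zlib as a stand-in: btrfs's default checksum is crc32c, a different polynomial, and the `verify` helper here is invented for the example.

```python
import zlib


def verify(data, stored_csum):
    """Return True if the data matches the stored checksum.
    zlib.crc32 is used only as a stand-in for the filesystem's checksum."""
    return zlib.crc32(data) == stored_csum


if __name__ == "__main__":
    data = b"some file contents"
    good_csum = zlib.crc32(data)

    # Case 1: the data is intact but the stored checksum has a flipped bit.
    print(verify(data, good_csum ^ 1))

    # Case 2: the checksum is intact but one byte of the data has changed.
    print(verify(b"Some file contents", good_csum))

    # Both cases report a mismatch; the check alone cannot say which side
    # (data or checksum) is the damaged one.
```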
extlinux and btrfs RAID-1
I'm trying to get a system booting, and I'm having something of a hard time with it. I'd like to check whether anyone's managed to do what I'm attempting, and whether I'm doing something silly, or just need to upgrade something.

I've got two disks, /dev/sda and /dev/sdb, each partitioned the same way, with GPTs. The second partition on each is part of a RAID-1 (data and metadata) btrfs, with no compression.

    # btrfs fi show
    Label: 'amelia'  uuid: cba252b5-af1b-4f31-9f8f-191ef66f777d
        Total devices 2 FS bytes used 1.03GB
        devid    1 size 275.48GB used 3.04GB path /dev/sda2
        devid    2 size 275.48GB used 3.03GB path /dev/sdb2

I have the gptmbr.bin from extlinux installed on the boot sector of each device:

    # cat /usr/lib/syslinux/gptmbr.bin > /dev/sda
    # cat /usr/lib/syslinux/gptmbr.bin > /dev/sdb

I've attempted to install extlinux from a chroot:

    # extlinux --install /boot/extlinux

This is extlinux 4.05, which claims (on the syslinux website) to support btrfs.

When I boot the machine from its disks, I'm being told that extlinux only supports single-disk btrfs. Is this still the case? Or am I just using a version that's far too old? (Looks like there's a v6.01 available). I can't see a list of the limitations and capabilities of syslinux and btrfs on the syslinux website.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 9: Standard Deviation ---
Re: extlinux and btrfs RAID-1
On Fri, Sep 27, 2013 at 02:12:36PM -0600, Chris Murphy wrote:
> On Sep 27, 2013, at 1:36 PM, Hugo Mills h...@carfax.org.uk wrote:
> > When I boot the machine from its disks, I'm being told that extlinux only supports single-disk btrfs. Is this still the case?
>
> I'm pretty sure the answer is yes. The last time I looked, not that long ago, the multiple device scenario wasn't supported.

Dammit. Thanks for the info. At least this means I don't have to struggle with the syslinux error I've been getting... Back to grub, then.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- It used to take a lot of talent and a certain type of ---
      upbringing to be perfectly polite and have filthy manners
      at the same time. Now all it needs is a computer.
Re: extlinux and btrfs RAID-1
On Fri, Sep 27, 2013 at 03:04:22PM -0600, Chris Murphy wrote:
> On Sep 27, 2013, at 2:44 PM, Hugo Mills h...@carfax.org.uk wrote:
> > On Fri, Sep 27, 2013 at 02:12:36PM -0600, Chris Murphy wrote:
> > > On Sep 27, 2013, at 1:36 PM, Hugo Mills h...@carfax.org.uk wrote:
> > > > When I boot the machine from its disks, I'm being told that extlinux only supports single-disk btrfs. Is this still the case?
> > >
> > > I'm pretty sure the answer is yes. The last time I looked, not that long ago, the multiple device scenario wasn't supported.
> >
> > Dammit. Thanks for the info. At least this means I don't have to struggle with the syslinux error I've been getting... Back to grub, then.
>
> I'm seeing in changelogs that 4.0 brought btrfs support, 4.06 brought subvolume support. Nothing in changelogs for versions 5 and 6 so far inclusive. And interestingly enough, Fedora's koji only has 4.05 current for F20 and rawhide.

Yeah, Debian have 4.05 in everything except experimental (which is 6.02~pre16).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---
                       a _poor_ corrupt official.
Re: Questions regarding logging upon fsync in btrfs
On Sun, Sep 29, 2013 at 01:46:23AM +0200, Aastha Mehta wrote:
> I am using linux kernel 3.1.10-1.16, just to let you know.

Not that it invalidates the questions below, but that's a really old kernel. You should update to something recent (3.11, or 3.12-rc2) as soon as possible. There are major problems in 3.1 (and most of the subsequent kernels) that have been fixed in 3.11.

Of course, there are still major problems in 3.11 that haven't been fixed yet, but we don't know about very many of those. :) (And when we do, we'll be recommending that you upgrade to whatever has them fixed...)

Hugo.

> Thanks
>
> On 29 September 2013 01:35, Aastha Mehta aasth...@gmail.com wrote:
> > Hi,
> >
> > I have a few questions regarding logging triggered by calling fsync in BTRFS:
> >
> > 1. If I understand correctly, fsync will call to log the entire inode in the log tree. Does this mean that the data extents are also logged into the log tree? Are they copied into the log tree, or just referenced? Are they copied into the subvolume's extent tree again upon replay?
> >
> > 2. During replay, when the extents are added into the extent allocation tree, do they acquire the physical extent number during replay? Does the physical extent allocated to the data in the log tree differ from that in the subvolume?
> >
> > 3. I see there is a mount option of notreelog available. After disabling tree logging, does fsync still lead to flushing of buffers to the disk directly?
> >
> > 4. Is it possible to selectively identify certain files in the log tree and flush them to disk directly, without waiting for the replay to do it?
> >
> > Thanks

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Diablo-D3 My code is never released, it escapes from the ---
      git repo and kills a few beta testers on the way out.
Re: btrfs raid0
On Fri, Oct 04, 2013 at 04:15:22PM +, ray clancy wrote:
> How can I verify the read speed of a btrfs raid0 pair in archlinux? I assume raid0 means striped activity in a paralleled mode at least similar to raid0 in mdadm. How can I measure the btrfs read speed since it is copy-on-write which is not the norm in mdadm raid0?

Testing read speed... you're not writing, so there's no copy-on-write involved there. Just test reading the way you would for anything else.

> Perhaps I cannot use the same approach in btrfs to determine the performance.
>
> Secondly, I see a methodology for raid10 using the command
>
>     mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd...
>
> Can I apply the parameters above for -m and -d for raid0?

I'd certainly recommend it for testing RAID-0. :) Actually, a slightly more realistic test would be to use RAID-0 for data and RAID-1 for metadata, because that's what most [default] users of the FS will end up with.

> If using raid0 for two devices and add another device, is it striped as raid0 also or does the system change it to raid1.

No, it'll remain as RAID-0. If you rebalance, then the data will get striped across three devices instead of two.

> What happens to the speed of the system when a new device is added? Is it increased?

Assuming the FS is reading from all the devices, yes.

> Much I have at hand for mdadm software raid0 and it doubles the read speed. What parallel exists in raid0 btrfs? Or is it completely off base to expect a speed increase?

In theory, you should be able to get the sum of the bandwidths of all the devices (assuming sequential streaming reads). We don't have any good benchmarks of this kind of thing, so when you do your tests, please (a) make sure you do a decent experimental design, and (b) publish the results. :)

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I always felt that as a C programmer, I ---
               was becoming typecast.
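Since the advice above is to test reading the way you would for anything else, here is a minimal sequential-read timing sketch in Python. It is not a benchmark design: the helper name, the 1 MiB block size and the temporary demo file are all assumptions made for the example, and on a real test the file should be much larger than RAM (or the page cache dropped first) so that the numbers reflect the disks rather than cached data.

```python
import os
import tempfile
import time


def sequential_read_throughput(path, block_size=1024 * 1024):
    """Read `path` from start to end in fixed-size blocks and return
    (bytes_read, elapsed_seconds). Hypothetical helper for illustration."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


if __name__ == "__main__":
    # Demo on a small temporary file; point `path` at a large file on the
    # filesystem under test for a meaningful measurement.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (8 * 1024 * 1024))
        path = tmp.name
    try:
        nbytes, secs = sequential_read_throughput(path)
        mib_per_s = (nbytes / (1024 * 1024)) / max(secs, 1e-9)
        print(f"read {nbytes} bytes in {secs:.4f} s ({mib_per_s:.1f} MiB/s)")
    finally:
        os.unlink(path)
```

Running the same measurement on a two-device and a three-device RAID-0 filesystem (after a rebalance) would be one way to compare against the theoretical sum of the device bandwidths.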
Re: Some questions after devices addition to existing raid 1 btrfs filesystem
On Mon, Oct 07, 2013 at 01:45:29PM +0200, Laurent Humblet wrote:
> I have added 2x2Tb to my existing 2x2Tb raid 1 btrfs filesystem and then ran a balance:
>
> # btrfs filesystem show
>     Total devices 4 FS bytes used 1.74TB
>     devid    3 size 1.82TB used 0.00 path /dev/sdd
>     devid    4 size 1.82TB used 0.00 path /dev/sde
>     devid    2 size 1.82TB used 1.75TB path /dev/sdc
>     devid    1 size 1.82TB used 1.75TB path /dev/sdb
>
> # btrfs filesystem balance btrfs_root/
>
> # btrfs filesystem show
>     Total devices 4 FS bytes used 1.74TB
>     devid    3 size 1.82TB used 892.00GB path /dev/sdd
>     devid    4 size 1.82TB used 892.00GB path /dev/sde
>     devid    2 size 1.82TB used 891.03GB path /dev/sdc
>     devid    1 size 1.82TB used 891.04GB path /dev/sdb
>
> It took 59 hours to complete the balance. I checked on a couple of files and all seems fine but I have some questions:
>
> - is there some kind of 'overall filesystem health/integrity check' that I should do on the filesystem now that the balance is done?

See btrfs scrub start

> - also, I ran the command while some of the btrfs subvolumes were mounted (as well as the btrfs_root/ of course), does this impact on the balance job?

No.

> - the mounted btrfs devices were mounted using -o space_cache,inode_cache but the btrfs_root/ was not, also, does this impact on the balance job?

No.

> - about those options, a few months ago, I often had btrfs-cache-1/btrfs-endio-met processes taking some cpu/hd time. I was advised to mount -o space_cache,inode_cache, which seems to have quieted the processes down. Are those options still necessary now?

No, once you've mounted with them once (and had the caches rebuilt) they're not necessary to use any more.

> - as the job took 60+ hours but the CPU rarely went above 10%, the computer seemed still usable. I left it to do its job of course but could I have accessed or written anything on the subvolumes while the balance was running and if yes, would this have any impact on the filesystem?

Absolutely, yes, you could have done. It would probably be slower than normal to access the files while the balance is happening, because the balance is using up I/O bandwidth, but other than that there should be no impact.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- argc, argv, argh! ---
Re: [PATCH v3 10/12] Btrfs-progs: add '--block-size' option to control print result
On Wed, Oct 09, 2013 at 12:54:03AM +0800, Shilong Wang wrote:
> Hi David,
>
> 2013/10/8 David Sterba dste...@suse.cz:
> > On Mon, Oct 07, 2013 at 03:21:46PM +0800, Wang Shilong wrote:
> > > You can use it like:
> > > btrfs qgroup show --block-size=m mnt
> > >
> > > Here, block size supports k/K/m/M/g/G/t/T/p/P/e/E.

k = SI prefix, kilo
K = ? (IEEE prefix kibi?)
m = SI prefix, milli
M = SI prefix, mega
g = SI unit, grams
G = SI prefix, giga
t = ?
T = SI prefix, tera
p = SI prefix, pico
P = SI prefix, peta
e = ?
E = SI prefix, exa

Some confusion here, I think. :)

> > There is no distinction between the 1000 and 1024 based prefixes, also no way to get the raw values in bytes. I don't have a suggestion how to do that, merely letting you know that this could go separately (this and the -h patch, the rest shall be integrated).
>
> I implement this like the command 'du'. By default, we print the result in bytes. And block size doesn't give a byte unit implicitly. Also I don't know why we need to distinguish 1000 and 1024, I don't have any ideas about this.

Because when you have a terabyte of data, the difference between the two is 10%. If you're putting in this kind of infrastructure, it's not much of an addition to report in either SI decimal or IEEE binary scales.

> > Also, the numbers in the table should be aligned to the right:
>
> Yes, this should be fixed.
>
> Thanks,
> Wang
>
> > $ btrfs qgroup show -h -p /mnt/
> > qgroupid rfer      excl      parent
> > -------- ----      ----      ------
> > 0/5      900.00KiB 900.00KiB ---
> > 0/267    688.00KiB 12.00KiB  1/5
> > 0/268    684.00KiB 8.00KiB   1/5
> > 0/269    6.71GiB   4.00KiB   1/1
> > 0/277    6.71GiB   4.00KiB   1/1
> > 0/278    39.74GiB  39.74GiB  1/2
> > 1/1      6.71GiB   6.71GiB   ---
> > 1/2      39.74GiB  39.74GiB  ---
> > 1/5      696.00KiB 696.00KiB ---

Note that the SI mandate a space between the value and the unit. Note also, for future reference, that SI use k for 10^3, whereas IEEE use Ki for 2^10.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---
                       a _poor_ corrupt official.
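The point about decimal versus binary scales, and the roughly 10% gap at the terabyte level, is easy to check with a short Python sketch. This is not the btrfs-progs implementation; `format_size` and the unit tables are invented for the example, which follows the conventions mentioned above (lower-case k for 10^3, Ki for 2^10, and a space between value and unit).

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]         # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]  # powers of 1024


def format_size(num_bytes, binary=True):
    """Format a byte count on either the binary (1024) or decimal (1000)
    scale, with a space between the value and the unit.
    Hypothetical helper for illustration."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(num_bytes)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


if __name__ == "__main__":
    one_tib = 2 ** 40
    print(format_size(one_tib, binary=True))
    print(format_size(one_tib, binary=False))
    # Ratio between the binary and decimal "tera" scales.
    print(one_tib / 10 ** 12)
```

The ratio 2^40 / 10^12 is about 1.0995, which is the 10% difference referred to in the message.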
Re: [PATCH v3 10/12] Btrfs-progs: add '--block-size' option to control print result
On Tue, Oct 08, 2013 at 06:01:57PM +0100, Hugo Mills wrote:
> On Wed, Oct 09, 2013 at 12:54:03AM +0800, Shilong Wang wrote:
> > Hi David,
> >
> > 2013/10/8 David Sterba dste...@suse.cz:
> > > On Mon, Oct 07, 2013 at 03:21:46PM +0800, Wang Shilong wrote:
> > > > You can use it like:
> > > > btrfs qgroup show --block-size=m mnt
> > > >
> > > > Here, block size supports k/K/m/M/g/G/t/T/p/P/e/E.
>
> k = SI prefix, kilo
> K = ? (IEEE prefix kibi?)

... or SI unit, kelvin

> t = ?

SI-accepted unit, tonne

> e = ?

SI-accepted unit, charge on the electron

Hugo. :)

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I spent most of my money on drink, women and fast cars. The ---
                  rest I wasted. -- James Hunt
warn_slowpath in clean_tree_block
I've just started playing with Btrfs, and I'm getting a log full of kernel warnings that look something like this:

Feb 16 09:02:17 vlad kernel: [ cut here ]
Feb 16 09:02:17 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815 clean_tree_block+0x9d/0xbb [btrfs]()
Feb 16 09:02:17 vlad kernel: Hardware name: System Product Name
Feb 16 09:02:17 vlad kernel: Modules linked in: btrfs zlib_deflate tcp_diag inet_diag kqemu cpufreq_userspace ipv6 nfsd nfs lockd nfs_acl auth_rpcgss sunrpc af_packet bridge stp llc xfs exportfs it87 hwmon_vid powernow_k8 sbp2 ieee1394 ide_generic ide_gd_mod ide_cd_mod pcspkr evdev k8temp hwmon i2c_viapro i2c_core button dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod usbhid usb_storage libusual sg sr_mod cdrom via82cxxx floppy via_rhine mii ehci_hcd uhci_hcd usbcore pata_via ide_pci_generic ide_core sd_mod thermal processor fan unix
Feb 16 09:02:17 vlad kernel: Pid: 24129, comm: btrfs-endio-wri Tainted: G W 2.6.29-rc4 #1
Feb 16 09:02:17 vlad kernel: Call Trace:
Feb 16 09:02:17 vlad kernel: [80228d7d] warn_slowpath+0xd8/0x111
Feb 16 09:02:17 vlad kernel: [80251d9e] __alloc_pages_internal+0xd2/0x3ec
Feb 16 09:02:17 vlad kernel: [8024d55d] add_to_page_cache_locked+0x52/0x9e
Feb 16 09:02:17 vlad kernel: [8024d5e9] add_to_page_cache_lru+0x40/0x58
Feb 16 09:02:17 vlad kernel: [8024dbd0] find_or_create_page+0x62/0x88
Feb 16 09:02:17 vlad kernel: [80313244] rb_insert_color+0xba/0xe2
Feb 16 09:02:17 vlad kernel: [a03f992a] alloc_extent_buffer+0x268/0x2ec [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e1b18] clean_tree_block+0x9d/0xbb [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d5eaf] btrfs_init_new_buffer+0x99/0xf3 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d849e] btrfs_alloc_free_block+0x83/0x8c [btrfs]
Feb 16 09:02:17 vlad kernel: [a03cda8b] split_leaf+0x159/0xa0a [btrfs]
Feb 16 09:02:17 vlad kernel: [a03f0de5] btrfs_item_offset+0xb3/0xbe [btrfs]
Feb 16 09:02:17 vlad kernel: [a03c96bc] leaf_space_used+0xb5/0xe8 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d0ebd] btrfs_search_slot+0x917/0x99b [btrfs]
Feb 16 09:02:17 vlad kernel: [a03ef06a] btrfs_drop_extents+0xa75/0xab3 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d14ee] btrfs_insert_empty_items+0x7f/0x49d [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e6f0f] insert_reserved_file_extent+0xd9/0x230 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fa44f] set_extent_bit+0x220/0x277 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fadce] lock_extent+0x46/0x95 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e885c] btrfs_finish_ordered_io+0xfe/0x198 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fb4ff] end_bio_extent_writepage+0xa9/0x1b1 [btrfs]
Feb 16 09:02:17 vlad kernel: [a04023e4] worker_loop+0x5f/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [a0402385] worker_loop+0x0/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [a0402385] worker_loop+0x0/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [80238269] kthread+0x47/0x73
Feb 16 09:02:17 vlad kernel: [8020c03a] child_rip+0xa/0x20
Feb 16 09:02:17 vlad kernel: [80238222] kthread+0x0/0x73
Feb 16 09:02:17 vlad kernel: [8020c030] child_rip+0x0/0x20
Feb 16 09:02:17 vlad kernel: ---[ end trace a315082d5647b979 ]---

They're not all identical -- there are bits in the middle of the trace that change. They tend to arrive in groups of 4-8 warnings very close together, separated by 15-20 seconds without a warning. The workload was encoding a video from another filesystem, onto the Btrfs filesystem. It's quiet when there's nothing accessing the filesystem.

The filesystem is 19GiB in size, residing on LVM-on-RAID-1 on a 2.6.29-rc4 kernel. It was created on 2.6.29-rc3, 20GiB in size using btrfs tools 0.18-ge3b0f66, and shrunk online to its current size.

I haven't found any similar reports on the mailing list, which means either I've got something unusual, or something so blindingly expected that nobody's bothered to mention it. I suspect the latter, but I'm reporting it in case it's the former.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You got very nice eyes, Deedee. Never noticed them ---
                   before. They real?
btrfs: warn_slowpath in clean_tree_block and others
This is essentially a repost of a mail I made last week, to which I didn't get a reply.

I'm getting huge numbers of kernel warnings whilst using btrfs. They're all warn_slowpath, and all seem to be in fs/btrfs/disk-io.c. I've included one typical example at the end of this mail.

Kernel versions are 2.6.29-rc2, -rc4 and -rc6.

If I do lots of writes to my btrfs filesystem (e.g. video encoding), I end up with a syslog in the tens-of-megabytes range. This makes logcheck an unhappy bunny...

I don't know if this behaviour is expected, and everyone using btrfs simply puts up with it for now, or if it's something unusual that needs investigating. On the chance that it's the latter, I'm reporting it here.

Hugo.

Feb 23 21:45:42 vlad kernel: [ cut here ]
Feb 23 21:45:42 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815 clean_tree_block+0x9d/0xbb [btrfs]()
Feb 23 21:45:42 vlad kernel: Hardware name: System Product Name
Feb 23 21:45:42 vlad kernel: Modules linked in: tun ext3 jbd btrfs zlib_deflate tcp_diag inet_diag kqemu cpufreq_userspace ipv6 nfsd nfs lockd nfs_acl auth_rpcgss sunrpc af_packet bridge stp llc xfs exportfs it87 hwmon_vid powernow_k8 sbp2 ieee1394 ide_generic ide_gd_mod ide_cd_mod pcspkr evdev k8temp hwmon i2c_viapro i2c_core button dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod usbhid usb_storage libusual sg sr_mod cdrom via82cxxx floppy via_rhine mii ehci_hcd uhci_hcd usbcore pata_via ide_pci_generic ide_core sd_mod thermal processor fan unix
Feb 23 21:45:42 vlad kernel: Pid: 27034, comm: hdparm Tainted: G W 2.6.29-rc4 #1
Feb 23 21:45:42 vlad kernel: Call Trace:
Feb 23 21:45:42 vlad kernel: [80228d7d] warn_slowpath+0xd8/0x111
Feb 23 21:45:42 vlad kernel: [80312f11] radix_tree_insert+0xd7/0x19f
Feb 23 21:45:42 vlad kernel: [8024d55d] add_to_page_cache_locked+0x52/0x9e
Feb 23 21:45:42 vlad kernel: [8024d5e9] add_to_page_cache_lru+0x40/0x58
Feb 23 21:45:42 vlad kernel: [8024dbd0] find_or_create_page+0x62/0x88
Feb 23 21:45:42 vlad kernel: [a03f992a] alloc_extent_buffer+0x268/0x2ec [btrfs]
Feb 23 21:45:42 vlad kernel: [a03e1b18] clean_tree_block+0x9d/0xbb [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d5eaf] btrfs_init_new_buffer+0x99/0xf3 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d849e] btrfs_alloc_free_block+0x83/0x8c [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cb2f8] __btrfs_cow_block+0x1ff/0x87e [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cc125] btrfs_cow_block+0x1e7/0x1f6 [btrfs]
Feb 23 21:45:42 vlad kernel: [80251d9e] __alloc_pages_internal+0xd2/0x3ec
Feb 23 21:45:42 vlad kernel: [a03d0915] btrfs_search_slot+0x36f/0x99b [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d14ee] btrfs_insert_empty_items+0x7f/0x49d [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d825d] __btrfs_alloc_reserved_extent+0x19f/0x2bb [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d83f0] btrfs_alloc_extent+0x77/0xa2 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d847f] btrfs_alloc_free_block+0x64/0x8c [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cb2f8] __btrfs_cow_block+0x1ff/0x87e [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d7532] finish_current_insert+0x514/0x528 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d7bf9] del_pending_extents+0xa5/0x33d [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cc125] btrfs_cow_block+0x1e7/0x1f6 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03e436d] btrfs_commit_tree_roots+0x53/0x1ba [btrfs]
Feb 23 21:45:42 vlad kernel: [80403a3e] schedule_timeout+0xa1/0xbc
Feb 23 21:45:42 vlad kernel: [a03e55dd] btrfs_commit_transaction+0x322/0x6e5 [btrfs]
Feb 23 21:45:42 vlad kernel: [802385fb] autoremove_wake_function+0x0/0x2e
Feb 23 21:45:42 vlad kernel: [a03e4809] join_transaction+0x129/0x147 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03c8788] btrfs_sync_fs+0x70/0x78 [btrfs]
Feb 23 21:45:42 vlad kernel: [8026f332] sync_filesystems+0xa8/0xde
Feb 23 21:45:42 vlad kernel: [80287256] do_sync+0x25/0x50
Feb 23 21:45:42 vlad kernel: [8028728f] sys_sync+0xe/0x13
Feb 23 21:45:42 vlad kernel: [8020b25b] system_call_fastpath+0x16/0x1b
Feb 23 21:45:42 vlad kernel: ---[ end trace a315082d564863a6 ]---

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---
                 headline (possibly apocryphal)
Re: btrfs: warn_slowpath in clean_tree_block and others
On Wed, Feb 25, 2009 at 11:05:58AM -0500, Lee Trager wrote:
> But what are you doing to the filesystem when it crashes? How did you mount it?

In my case, it's mounted with this fstab entry:

    /dev/media/scratch /media/vlad/video/video btrfs noatime,nosuid,nodev 0 0

and I can trigger hundreds (literally) of these backtraces with a single

    touch /media/vlad/video/video/foo

If I encode a video to the FS, the backtraces come in bursts at intervals of, say, 20 seconds (it's not perfectly regular).

Hugo.

> On Wed, Feb 25, 2009 at 08:03:01AM -0600, Mitch Harder (aka DontPanic) wrote:
> > I've been creating a local git repository of full btrfs-unstable sources. I'll create a new branch off the master branch, and apply the patch supplied in the Feb. 11 message to the M/L. I then create a kernel module based on the results in /fs/btrfs/
> >
> > I have also tried replicating the experimental branch, and merging the patch into that branch, but I get the same results.
> >
> > On Wed, Feb 25, 2009 at 12:26 AM, Lee Trager l...@cs.drexel.edu wrote:
> > > Mitch,
> > > I haven't seen any problems using BTRFS and my patch on 2.6.28 or 2.6.27, what are you doing to cause this error? Are you using the latest sources from btrfs-unstable?
> > >
> > > Lee
> > >
> > > Mitch Harder (aka DontPanic) wrote:
> > > > I have also been getting similar warnings filling up my logs.
> > > >
> > > > However, in my case, I have been experimenting with back-porting btrfs to a 2.6.28 kernel. So I've been waiting for the back-porting efforts to get a little further along.
> > > >
> > > > But I thought I'd respond in case this information helps.
> > > >
> > > > Here's an example of the warnings I've been seeing:
> > > >
> > > > [80577.151167] [ cut here ]
> > > > [80577.151169] WARNING: at /var/tmp/portage/sys-fs/btrfs-9998/work/btrfs-9998/disk-io.c:860 clean_tree_block+0xa4/0xb0 [btrfs]()
> > > > [80577.151172] Modules linked in: btrfs snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ipv6 ppdev snd_intel8x0 snd_ac97_codec parport_pc nvidia(P) ac97_bus snd_pcm snd_timer ohci_hcd ssb shpchp pci_hotplug pcmcia i2c_nforce2 snd forcedeth sr_mod pcspkr parport i2c_core snd_page_alloc nvidia_agp sl811_hcd pcmcia_core uhci_hcd ehci_hcd
> > > > [80577.151190] Pid: 11503, comm: cp Tainted: P W 2.6.28-sabayon-r10 #1
> > > > [80577.151192] Call Trace:
> > > > [80577.151195] [c011e77f] warn_on_slowpath+0x5f/0x90
> > > > [80577.151203] [c043c427] rb_insert_color+0x77/0xe0
> > > > [80577.151221] [f8c28e9e] alloc_extent_buffer+0x1fe/0x300 [btrfs]
> > > > [80577.151238] [f8c08d54] clean_tree_block+0xa4/0xb0 [btrfs]
> > > > [80577.151253] [f8bf665d] btrfs_init_new_buffer+0x7d/0x130 [btrfs]
> > > > [80577.151269] [f8bfb6f4] btrfs_alloc_free_block+0x104/0x110 [btrfs]
> > > > [80577.151285] [f8bef3da] __btrfs_cow_block+0x22a/0x8b0 [btrfs]
> > > > [80577.151300] [f8bed212] generic_bin_search+0x162/0x1c0 [btrfs]
> > > > [80577.151315] [f8bf00e6] btrfs_cow_block+0x156/0x200 [btrfs]
> > > > [80577.151330] [f8bf3267] btrfs_search_slot+0x1a7/0x910 [btrfs]
> > > > [80577.151333] [c01230e7] irq_exit+0x27/0x60
> > > > [80577.151336] [c01052cb] do_IRQ+0x6b/0x80
> > > > [80577.151354] [f8c24a55] read_extent_buffer+0xd5/0x170 [btrfs]
> > > > [80577.151369] [f8bf3f7d] btrfs_insert_empty_items+0x6d/0x410 [btrfs]
> > > > [80577.151385] [f8bf8f4f] btrfs_find_block_group+0xff/0x1a0 [btrfs]
> > > > [80577.151402] [f8c0fa1d] btrfs_new_inode+0x18d/0x360 [btrfs]
> > > > [80577.151420] [f8c135a9] btrfs_create+0x189/0x2a0 [btrfs]
> > > > [80577.151423] [c04162d9] security_capable+0x9/0x10
> > > > [80577.151427] [c0197f3d] vfs_create+0xcd/0x160
> > > > [80577.151430] [c019ad6f] do_filp_open+0x5af/0x7d0
> > > > [80577.151433] [c01932e9] cp_new_stat64+0xf9/0x110
> > > > [80577.151436] [c018e40e] do_sys_open+0x4e/0xe0
> > > > [80577.151439] [c018e51c] sys_open+0x2c/0x40
> > > > [80577.151442] [c0103165] sysenter_do_call+0x12/0x21
> > > > [80577.151444] ---[ end trace 79cdc48bc88dedf7 ]---
Entirely unexpected ENOSPC?
[dm_mod]
Mar 4 01:55:52 vlad kernel: [80284afa] ? generic_sync_sb_inodes+0x287/0x3e4
Mar 4 01:55:52 vlad kernel: [80284dbe] ? writeback_inodes+0x68/0xa1
Mar 4 01:55:52 vlad kernel: [80252e10] ? wb_kupdate+0x8b/0xfd
Mar 4 01:55:52 vlad kernel: [8025374b] ? pdflush+0x0/0x1b5
Mar 4 01:55:52 vlad kernel: [8025374b] ? pdflush+0x0/0x1b5
Mar 4 01:55:52 vlad kernel: [80253869] ? pdflush+0x11e/0x1b5
Mar 4 01:55:52 vlad kernel: [80252d85] ? wb_kupdate+0x0/0xfd
Mar 4 01:55:52 vlad kernel: [802383f1] ? kthread+0x47/0x73
Mar 4 01:55:52 vlad kernel: [8020c07a] ? child_rip+0xa/0x20
Mar 4 01:55:52 vlad kernel: [802383aa] ? kthread+0x0/0x73
Mar 4 01:55:52 vlad kernel: [8020c070] ? child_rip+0x0/0x20
Mar 4 01:55:52 vlad kernel: Code: 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 8b 83 b8 00 00 00 0f 18 08 48 8d 83 b8 00 00 00 48 39 c5 75 b0 4c 89 e7 e8 63 42 fe df 0f 0b eb fe 48 83 c4 38 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3
Mar 4 01:55:52 vlad kernel: RIP [a0256b6b] __btrfs_reserve_extent+0x296/0x2ab [btrfs]
Mar 4 01:55:52 vlad kernel: RSP 88003ea618d0
Mar 4 01:55:52 vlad kernel: ---[ end trace eb8a7132a207a474 ]---

Now, to my untrained eye, this looks like it might be an ENOSPC problem, and thus wouldn't be entirely unexpected, except for one thing:

h...@vlad:~ $ df -h
Filesystem                 Size  Used Avail Use% Mounted on
[...]
/dev/mapper/media-scratch   41G   17G   25G  42% /media/vlad/video/video

The filesystem was nowhere near full, and I wasn't expecting it to become anywhere near full. The only thing that writes to the filesystem is deliberately coded to leave several gigabytes of space free.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Nothing wrong with being written in Perl... Some of my best ---
friends are written in Perl.
Re: Entirely unexpected ENOSPC?
On Wed, Mar 04, 2009 at 01:50:53PM -0500, Josef Bacik wrote:
On Wed, Mar 04, 2009 at 06:06:19PM +0000, Hugo Mills wrote:

Last night, this event jammed up a good chunk of my server:

Mar 4 01:51:36 vlad kernel: btrfs searching for 1716224 bytes, num_bytes 1716224, loop 2, allowed_alloc 1
Mar 4 01:51:36 vlad kernel: btrfs searching for 860160 bytes, num_bytes 860160, loop 2, allowed_alloc 1
[lots of this...]
Mar 4 01:55:52 vlad kernel: btrfs searching for 4096 bytes, num_bytes 4096, loop 2, allowed_alloc 1
Mar 4 01:55:52 vlad kernel: btrfs allocation failed flags 1, wanted 4096
Mar 4 01:55:52 vlad kernel: space_info has 0 free, is full
Mar 4 01:55:52 vlad kernel: block group 12582912 has 8388608 bytes, 8388608 used 0 pinned 0 reserved
Mar 4 01:55:52 vlad kernel: 0 blocks of free space at or bigger than bytes is
Mar 4 01:55:52 vlad kernel: block group 1103101952 has 1073741824 bytes, 1073741824 used 0 pinned 0 reserved
Mar 4 01:55:52 vlad kernel: 0 blocks of free space at or bigger than bytes is
[30 more lines of this]

So yeah, that's expected, you ran out of space. The key thing is this:

Mar 4 01:55:52 vlad kernel: space_info has 0 free, is full

If space_info has 0 free and is full, then there is no space to allocate for it and it's completely used. I'd recommend switching to the -rc7 kernel since that has things in place to keep this from happening as often.

Thanks, I'll do that. However, what's confusing me is that the filesystem was reported as less than half full (17/41GiB used) at the time that it decided it had no space. Is there any likely explanation for that behaviour? I've used btrfsctl to resize it online several times: shrink by 1GiB, then enlarge by 12, 10, 10GiB. Might that have been a factor?

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- How do you become King? You stand in the marketplace and ---
announce you're going to tax everyone. If you get out alive, you're King.
Online resize vs ENOSPC
After an online resize, the filesystem reports its new size, but still runs out of space at the old size:

Mar 9 08:12:59 vlad kernel: no space left, need 4096, 380928 delalloc bytes, 51509866496 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 51510247424 total
[...]
Mar 9 08:14:21 vlad kernel: no space left, need 4096, 0 delalloc bytes, 51510247424 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 51510247424 total

h...@vlad:~ $ df -h
Filesystem                 Size  Used Avail Use% Mounted on
[...]
/dev/mapper/media-scratch   70G   48G   23G  68% /media/vlad/video/video

This was online resized from 50G to 70G, using:

$ sudo lvresize media/scratch -L 70G
$ sudo btrfsctl -r 70G /media/vlad/video/video

Version numbers:

$ btrfsctl
[...]
Btrfs v0.18-ge3b0f66
$ uname -a
Linux vlad 2.6.29-rc7 #1 Fri Mar 6 23:32:13 GMT 2009 x86_64 GNU/Linux

Unmounting and remounting the filesystem seems to make the new space available for use again.

This is the second time I've had this happen to me now, so it seems to be more-or-less reproducible, although I haven't deliberately tried to trigger the behaviour yet.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Klytus! Are your men on the right pills? Maybe you should ---
execute their trainer!
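As a rough cross-check of the numbers in that report, the sketch below pulls bytes_used and the total out of the second "no space left" message and converts them to GiB. The helper name parse_no_space is made up for illustration, and the regular expressions assume the message layout shown above rather than any documented kernel format.

```python
import re

GIB = 1 << 30

def parse_no_space(line):
    """Extract (bytes_used, total) from a 'no space left' kernel message.

    Hypothetical helper: it only assumes the field layout seen in the log above.
    """
    used = int(re.search(r"(\d+) bytes_used", line).group(1))
    total = int(re.search(r"(\d+) total", line).group(1))
    return used, total

# Second log line from the report, with the syslog prefix removed.
line = ("no space left, need 4096, 0 delalloc bytes, 51510247424 bytes_used, "
        "0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, "
        "0 may use 51510247424 total")

used, total = parse_no_space(line)
print(f"bytes_used = {used / GIB:.2f} GiB")
print(f"total      = {total / GIB:.2f} GiB")
print("allocator believes it is full:", used == total)
```

Both values come out at just under 48 GiB, which lines up with the 48G "Used" column in df and with the old 50G size, not with the 70G that df reports as the new size.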
Re: Entirely unexpected ENOSPC?
On Mon, Mar 09, 2009 at 07:08:16AM -0600, Yien Zheng wrote:

At this point I'm wondering if this is an anomaly or if it has anything to do with using an SSD. It seems the pre-2.6.29-rc7 code had a hard stop at 85%. But the recent patch doesn't seem to have solved the issue for me. Is there another issue that makes btrfs want to reserve 2G free? I see another email with someone growing their filesystem from 48G to 70G because they ran out of space on their 50G disk, which should still have 2G free.

Not quite -- I had some 5G free on a 50G filesystem, without errors. I expanded the filesystem online to 70G because I knew I would run out within the next few hours. Despite the expansion, it still ran out at (just short of) 50G.

Unless you've resized your filesystem online, I think we're seeing different problems.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Do not meddle in the affairs of system administrators, for ---
they are subtle, and quick to anger.
Re: Online resize vs ENOSPC
On Mon, Mar 09, 2009 at 10:31:41AM +0000, Hugo Mills wrote:

After an online resize, the filesystem reports its new size, but still runs out of space at the old size:
[...]
Unmounting and remounting the filesystem seems to make the new space available for use again.

This is the second time I've had this happen to me now, so it seems to be more-or-less reproducible, although I haven't deliberately tried to trigger the behaviour yet.

Just to confirm, I can indeed reproduce it trivially:

$ sudo lvcreate scratch -n testresize -L 5G
$ sudo mkfs.btrfs /dev/scratch/testresize
$ sudo mount /dev/scratch/testresize /mnt
$ sudo chmod ug+w /mnt
$ sudo chown hrm. /mnt
$ cd /mnt
$ dd if=/dev/zero of=foo.txt bs=1M count=4096
$ sudo lvextend scratch/testresize -L 9G
$ sudo btrfsctl -r 9G /mnt
$ dd if=/dev/zero of=foo2.txt bs=1M count=4096

and I get an out-of-space error within a few hundred blocks.

$ cd ..
$ sudo umount /mnt
$ sudo mount /dev/scratch/testresize /mnt
$ cd /mnt
$ dd if=/dev/zero of=foo2.txt bs=1M count=4096

and then I can write the full 4G of data.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- There are three mistaiks in this sentance. ---
Problem with renaming devices
There seems to be some issue over changing the names of the device that a btrfs filesystem lives on:

# lvcreate scratch -n fstest -L 2G
  Logical volume fstest created
# mkfs -t btrfs /dev/scratch/fstest

WARNING! - Btrfs v0.18-ge3b0f66 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/scratch/fs1
        nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
Btrfs v0.18-ge3b0f66
# mount /dev/scratch/fstest /mnt
# umount /mnt
# lvrename scratch fstest derek
  Renamed fstest to derek in volume group scratch
# mount /dev/scratch/derek /mnt
mount: /dev/mapper/scratch-derek: can't read superblock
# lvrename scratch derek fstest
  Renamed derek to fstest in volume group scratch
# mount /dev/scratch/fstest /mnt
[success]

The rename works properly on a completely virgin filesystem, but not on one that's been mounted and unmounted (as above).

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Happiness is mandatory. Are you happy? ---
Re: ENOSPC at 94% full -- and causing BUGs elsewhere?
On Sun, Oct 04, 2009 at 08:06:30AM -0400, Chris Mason wrote: On Sat, Oct 03, 2009 at 05:55:32PM -0400, Josef Bacik wrote: On Sat, Oct 03, 2009 at 01:21:09PM +0100, Hugo Mills wrote: I've just had the following on my home server. I believe that it's btrfs that's responsible, as the machine wasn't doing much other than reading/writing on a btrfs filesystem. The process that was doing so is now stuck in D+ state, and can't be killed. The timing of the oops at the end is also suggestive of being involved in the same incident. This is the only btrfs filesystem on the machine. Patches have gone to Linus to fix the enospc problems. You can try running the enospc branch of Chris's git tree and it should behave better for you. Thanks, The right tree for this is the master branch of btrfs-unstable for 2.6.31. Thanks, Josef and Chris. I've now found the time to check out and build the btrfs-unstable tree, and it is indeed handling the ENOSPC condition much more cleanly. However, it seems to have got into a position where I have lots of free space reported by df (over 10% of the size of the volume -- 185 GiB free of 1474 GiB total), but still refuses to write anything to the filesystem. Do you have any suggestions for what I could try? The original ENOSPC error I reported above happened at approximately 85/1370 GiB free; I then added 100 GiB more space online, had another failure (same kernel: 2.6.31 mainline), and then rebooted into master from btrfs-unstable. Just for the record, I'm now using this kernel: Linux vlad 2.6.31-47417-gac6889c #1 Sun Oct 11 14:27:06 BST 2009 x86_64 GNU/Linux Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I'll take your bet, but make it ten thousand francs. I'm only --- a _poor_ corrupt official. signature.asc Description: Digital signature
Re: ENOSPC at 94% full -- and causing BUGs elsewhere?
On Tue, Oct 13, 2009 at 10:58:12PM +0800, Yan, Zheng wrote: On Tue, Oct 13, 2009 at 10:50 PM, Hugo Mills hugo-l...@carfax.org.uk wrote: On Tue, Oct 13, 2009 at 06:31:45AM -0400, Chris Mason wrote: On Mon, Oct 12, 2009 at 03:09:35PM +0100, Hugo Mills wrote: On Sun, Oct 04, 2009 at 08:06:30AM -0400, Chris Mason wrote: On Sat, Oct 03, 2009 at 05:55:32PM -0400, Josef Bacik wrote: On Sat, Oct 03, 2009 at 01:21:09PM +0100, Hugo Mills wrote: I've just had the following on my home server. I believe that it's btrfs that's responsible, as the machine wasn't doing much other than reading/writing on a btrfs filesystem. The process that was doing so is now stuck in D+ state, and can't be killed. The timing of the oops at the end is also suggestive of being involved in the same incident. This is the only btrfs filesystem on the machine. Patches have gone to Linus to fix the enospc problems. You can try running the enospc branch of Chris's git tree and it should behave better for you. Thanks, The right tree for this is the master branch of btrfs-unstable for 2.6.31. Thanks, Josef and Chris. I've now found the time to check out and build the btrfs-unstable tree, and it is indeed handling the ENOSPC condition much more cleanly. However, it seems to have got into a position where I have lots of free space reported by df (over 10% of the size of the volume -- 185 GiB free of 1474 GiB total), but still refuses to write anything to the filesystem. Do you have any suggestions for what I could try? You've probably got most of that 10GB free allocated as metadata. You could try btrfs-vol -b. I moved some 13 GiB of data off the filesystem, and ran btrfs-vol -b. 
As I reported on IRC, I then got this in my syslog:

Oct 13 13:16:19 vlad kernel: btrfs: relocating block group 1401224691712 flags 1
Oct 13 13:17:02 vlad kernel: btrfs: found 123 extents
Oct 13 13:17:10 vlad kernel: btrfs: found 123 extents
Oct 13 13:17:11 vlad kernel: btrfs: found 28 extents
Oct 13 13:17:21 vlad kernel: btrfs: found 28 extents
Oct 13 13:17:25 vlad kernel: btrfs: found 28 extents
Oct 13 13:17:26 vlad kernel: btrfs: found 27 extents
Oct 13 13:17:36 vlad kernel: btrfs: found 27 extents
Oct 13 13:17:39 vlad kernel: btrfs: found 27 extents
Oct 13 13:17:48 vlad kernel: btrfs: found 27 extents

... repeat forever (or at least for 50 minutes or so). The btrfs-vol -b process didn't respond to ^C, so on advice of yanzheng on IRC I rebooted the machine. I'm currently running a btrfsck on the filesystem, and will try btrfs-vol -b again when that's done.

Don't do that; it will run into an infinite loop again.

I got this from the btrfsck:

h...@vlad:~ $ sudo btrfsck /dev/media/scratch
root 5 inode 3949 errors 2000
found 1366552736241 bytes used err is 1
total csum bytes: 1336783032
total tree bytes: 1944158208
total fs tree bytes: 20267008
btree space waste bytes: 462357950
file data blocks allocated: 1368865824768
 referenced 1368851816448
Btrfs Btrfs v0.19

I guess that means that there were errors found -- is the btrfs-vol -b still going to cause an infinite loop, or is it worth trying that again?

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Quantum Mechanics: the dreams stuff is made of. ---
Re: To loop or not to loop with btrfs
On Wed, Nov 18, 2009 at 10:31:53PM +0100, Jan Engelhardt wrote:

This left me puzzled for a while:

22:29 borg:/ # losetup /dev/loop1 /.B.disk
22:29 borg:/ # mount /dev/loop1 /B
mount: /dev/loop1: can't read superblock
22:29 borg:/ # blkid /dev/loop1
/dev/loop1: UUID=e19fe89b-cde3-4ccc-bc70-b759a57bd1c9 UUID_SUB=f29c6218-d040-4546-a227-4dd2d2142817 TYPE=btrfs
22:29 borg:/ # losetup -d /dev/loop1
22:29 borg:/ # losetup /dev/loop2 /.B.disk
22:29 borg:/ # mount /dev/loop2 /B
(success)

So the btrfs volume is tied to loop2? That certainly is not good. Even real disks (/dev/sd*) can move around, the more so USB flash gadgets and loop devices.

This looks like it might be related to [1]? (I suspect it slipped Chris's mind back in April, and nobody's really noticed it since).

Hugo.

[1] http://article.gmane.org/gmane.comp.file-systems.btrfs/2817

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Most administrators wouldn't give their users the time of ---
day. That's what NTP is for.
Re: btrfs: 21 minutes to read 1.2M file directory
On Wed, Dec 22, 2010 at 12:39:15PM -0800, Andy Isaacson wrote: On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote: On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson a...@hexapodia.org wrote: I have a directory with 1.2M files in it, which makes readdir very slow on btrfs with cold caches (although it's reasonably fast with hot caches as in the first example below): Sounds like: Bug 21562 - btrfs is dead slow due to fragmentation https://bugzilla.kernel.org/show_bug.cgi?id=21562 Hmmm, how do I look at the btree layout for a given inode? There's documentation on the tree structures at [1] and [2]. If you know the inode number of the object you're interested in, you need to look in the FS tree for the subvolume it's in and find the (inode_number, EXTENT_DATA, ...) keys for the file. Each of those records will reference an individual disk extent -- and you can get the disk start position and length of the extent from the data stored under the key. Hugo. [1] https://btrfs.wiki.kernel.org/index.php/Btree_Items [2] https://btrfs.wiki.kernel.org/index.php/Data_Structures -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Hail and greetings. We are a flat-pack invasion force from --- Planet Ikea. We come in pieces. signature.asc Description: Digital signature
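The lookup described in that reply can be sketched with a toy model of an FS tree: items are ordered by their (objectid, type, offset) key, so all EXTENT_DATA items for one inode sit next to each other and can be found with a range search. The key type numbers are the btrfs on-disk values; the tree contents, the payload values and the function name extents_for_inode are invented for illustration, and a real tool would read these items from the filesystem rather than from a Python list.

```python
import bisect

INODE_ITEM = 1      # BTRFS_INODE_ITEM_KEY
EXTENT_DATA = 108   # BTRFS_EXTENT_DATA_KEY

# Hypothetical FS tree contents: key -> payload. Keys are
# (objectid, type, offset); for EXTENT_DATA the offset is the file offset.
tree = {
    (257, INODE_ITEM, 0): "inode item",
    (257, EXTENT_DATA, 0): {"disk_bytenr": 12582912, "disk_num_bytes": 4096},
    (257, EXTENT_DATA, 4096): {"disk_bytenr": 29360128, "disk_num_bytes": 8192},
    (258, INODE_ITEM, 0): "inode item",
    (258, EXTENT_DATA, 0): {"disk_bytenr": 12587008, "disk_num_bytes": 4096},
}
keys = sorted(tree)  # a B-tree keeps its keys in this order

def extents_for_inode(inode):
    """Return the (key, payload) pairs of every EXTENT_DATA item for one inode."""
    lo = bisect.bisect_left(keys, (inode, EXTENT_DATA, 0))
    hi = bisect.bisect_left(keys, (inode, EXTENT_DATA + 1, 0))
    return [(k, tree[k]) for k in keys[lo:hi]]

for key, extent in extents_for_inode(257):
    print(f"file offset {key[2]}: disk start {extent['disk_bytenr']}, "
          f"length {extent['disk_num_bytes']}")
```

A badly fragmented file would show up here as a long run of small extents whose disk start positions jump around instead of following one another.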
Re: open_ctree failed, unable to mount the fs
On Fri, Jan 07, 2011 at 08:01:47PM +0100, Tomasz Chmielewski wrote:

I got a power cycle, after which I'm no longer able to mount btrfs filesystem:

device fsid x-y devid 1 transid 169686 /dev/vda3
device fsid x-y devid 1 transid 169686 /dev/vda3
parent transid verify failed on 3260289024 wanted 169686 found 169685
parent transid verify failed on 3260289024 wanted 169686 found 169685
parent transid verify failed on 3260289024 wanted 169686 found 169685
btrfs: open_ctree failed

Tried to get that mounted with 2.6.35 and 2.6.37, without success. Is there a way to fix it?

The forthcoming[1] btrfsck tool should handle that particular error, I believe. To prevent it from happening again, ensure that you have working barriers on your disks, or that you turn off write caching on the drives at every boot.

Hugo.

[1] out real soon now

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours. But remember, the ---
roof is ours!
Re: Synching a Backup Server
On Sun, Jan 09, 2011 at 08:57:12PM +0000, Alan Chandler wrote:
On 09/01/11 18:30, Hugo Mills wrote:

No, subvolumes are a part of the whole filesystem. In btrfs, there is only one filesystem. There are 6 main B-trees that store metadata in btrfs (plus a couple of others). One of those is the filesystem tree (or FS tree), which contains all the metadata associated with the normal POSIX directory/file namespace (basically all the inode and xattr data). When you create a subvolume, a new FS tree is created, but it shares *all* of the other btrfs B-trees.

There is only one filesystem, but there may be distinct namespaces within that filesystem that can be mounted as if they were filesystems. Think of it more like NFSv4, where there's one overall namespace exported per server, but clients can mount subsections of it.

I think this explanation is still missing the key piece that has confused me despite trying very hard to understand it by reading the wiki. You talk about Distinct Namespaces, but what I learnt from further up the thread is that this namespace is also inside the namespace that makes up the whole filesystem. I mount the whole filesystem, and all my subvolumes are automatically there (at least that is what I find in practice). It's this duality of namespace that is the difficult concept. I am still not sure if there is a default subvolume, and the other subvolumes are defined within its namespace, or whether there is an overall filesystem namespace and subvolumes defined within it and if you mount the default subvolume you would then lose the overall filesystem namespace and hence no longer see the subvolumes.

There is a root subvolume namespace (subvolid=0), which may contain files, directories, and other subvolumes. This root subvolume is what you see when you mount a newly-created btrfs filesystem.

The default subvolume is simply what you get when you mount the filesystem without a subvol or subvolid parameter to mount. Initially, the default subvolume is set to be the root subvolume. If another subvolume is set to be the default, then the root subvolume can only be mounted with the subvolid=0 mount option.

I find the wiki also confusing because it talks about subvolumes having to be at the first level of the filesystem, but again further up this thread there is an example which is used for real of it not being at the first level, but at one level down inside a directory.

Try it, see what happens, and fix the wiki where it's wrong? :)

Or at least say what page this is on, and I can try the experiment and fix it later...

What it means is that I don't have a mental picture of how this all works, and all use cases could then be worked out by following this mental picture. I think it would be helpful if the Wiki contained some of the use cases that we have been talking about in this thread - but with more detailed information - like the actual commands used to mount the filesystems like this, and information as to in what circumstances you would perform each action.

I've written a chunk of text about how btrfs's storage, RAID and subvolumes work. At the moment, though, the wiki is somewhat broken and I can't actually create the page to put it on...

There's also a page of recipes[1], which is probably the place that the examples you mentioned should go.

The main awkward piece of btrfs terminology is the use of RAID to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.

I don't find this AS confusing, although there is still information missing which I asked in another post that wasn't answered. I still can't understand if it's possible to initialise a filesystem in degraded mode. If you create the filesystem so that -m RAID1 and -d RAID1 but only have one device - it implies that it writes two copies of both metadata and data to that one device. However if you successfully create the filesystem on two devices and then fail one and mount it -o degraded it appears to suggest it will only write the one copy.

From trying it a while ago, I don't think it is possible to create a filesystem in degraded mode. Again, I'll try it again when I have the time to do some experimentation and see what actually happens.

Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/UseCases

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A clear conscience. Where did you get this taste ---
for luxuries, Bernard?
Filesystem creation in degraded mode
I've had a go at determining exactly what happens when you create a filesystem without enough devices to meet the requested replication strategy:

# mkfs.btrfs -m raid1 -d raid1 /dev/vdb
# mount /dev/vdb /mnt
# btrfs fi df /mnt
Data: total=8.00MB, used=0.00
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=153.56MB, used=24.00KB
Metadata: total=8.00MB, used=0.00

The data section is single-copy-only; system and metadata are DUP. This is good. Let's add some data:

# cp develop/linux-image-2.6.3* /mnt
# btrfs fi df /mnt
Data: total=315.19MB, used=250.58MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=153.56MB, used=364.00KB
Metadata: total=8.00MB, used=0.00

Again, much as expected. Now, add in a second device, and balance:

# btrfs dev add /dev/vdc /mnt
# btrfs fi bal /mnt
# btrfs fi df /mnt
Data, RAID0: total=1.20GB, used=250.58MB
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=128.00MB, used=308.00KB

This is bad, though. Data has reverted to RAID-0.

Now, just to check, what happens when we create a filesystem with enough devices, fail one, and re-add it?

# mkfs.btrfs -d raid1 -m raid1 /dev/vdb /dev/vdc
# mount /dev/vdb /mnt
# # Copy some data into it
# btrfs fi df /mnt
Data, RAID1: total=1.50GB, used=1.24GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=307.19MB, used=1.80MB
Metadata: total=8.00MB, used=0.00
# umount /mnt

OK, so what happens if we fail one drive?

# dd if=/dev/zero of=/dev/vdb bs=1M count=16
# mount /dev/vdc /mnt -o degraded
# btrfs dev add /dev/vdd /mnt
# btrfs fi show
failed to read /dev/sr0
Label: none uuid: 2495fe15-174f-4aaa-8317-c2cfb4dade1f
        Total devices 3 FS bytes used 1.25GB
        devid    2 size 3.00GB used 1.81GB path /dev/vdc
        devid    3 size 3.00GB used 0.00 path /dev/vdd
        *** Some devices missing

Btrfs Btrfs v0.19
# btrfs fi bal /mnt
# btrfs fi df /mnt
Data, RAID1: total=1.50GB, used=1.24GB
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=128.00MB, used=1.41MB

This looks all well and good. So it looks like it's just the create-in-degraded-mode idea that doesn't work. Kernel is btrfs-unstable, up to 65e5341b (plus my balance-progress patches, but those shouldn't affect this).

Hugo.

PS. I haven't tried with RAID-10 yet, but I suspect that it'll be much the same.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You are demons, and I am in Hell! Well, technically, it's ---
London, but it's an easy mistake to make.
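The comparison made by eye in that experiment could be automated along the following lines. This is only a sketch: parse_fi_df is a made-up helper, it assumes the old "Type, PROFILE: total=..." output layout shown above, and it treats a line with no profile label as single-copy, as the message itself does.

```python
def parse_fi_df(text):
    """Map each chunk type (Data, System, Metadata) to the set of profiles seen."""
    profiles = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label = line.partition(":")[0]          # e.g. "Data, RAID0" or "System"
        kind, _, profile = label.partition(",")
        profiles.setdefault(kind.strip(), set()).add(profile.strip() or "single")
    return profiles

# Output quoted above, after adding the second device and balancing.
after_balance = """\
Data, RAID0: total=1.20GB, used=250.58MB
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=128.00MB, used=308.00KB
"""

wanted = {"Data": "RAID1", "Metadata": "RAID1"}
found = parse_fi_df(after_balance)
for kind, profile in wanted.items():
    status = "ok" if found.get(kind) == {profile} else "MISMATCH"
    print(f"{kind}: wanted {profile}, found {sorted(found.get(kind, []))} -> {status}")
```

Run against the post-balance output, this flags Data as a mismatch (RAID0 where RAID1 was requested) while Metadata passes, which is the problem the message describes.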
[PATCH RFC] Add ioctl for balancing a subset of the full filesystem.
This is a patch purely for comment. There are several things wrong with it that I need to fix (at minimum, it has too much debugging output, the __balance_chunk_filters function takes the wrong set of parameters to make it properly extensible, and the progress counter is broken).

I'm planning on adding at least two more filters, once this basic infrastructure is reasonably stable: one to filter on a range of (virtual) addresses, and one to work on device IDs (i.e. "was any part of this block group stored on device $n?").

With the additional filters written, you'll be able to specify any conjunctive set of filters, i.e. "this block group is RAID1, *and* was stored on devid 4". Disjunctions ("or") aren't supported, and probably won't be with this API. The filter data for additional filters will go at the end of struct btrfs_ioctl_balance_start, ensuring extensibility and backwards-compatibility (or at least, proper error reporting of unsupported features).

Questions for the panel:

 * Is the ioctl API reasonably sane, extensible, future-proof?
 * What other block group filters could be useful for this API?

Hugo.

There are situations, such as restarting an interrupted balance, where it is not necessary or desired to balance all of the block groups in the filesystem. This patch adds the basic infrastructure for filtering block groups during a balance. It also adds a single filter method, allowing the caller to select block groups with specific usage and replication strategies.
---
 fs/btrfs/ioctl.c   |   44 +-
 fs/btrfs/ioctl.h   |   15 ++
 fs/btrfs/volumes.c |   76 +++
 fs/btrfs/volumes.h |    3 +-
 4 files changed, 124 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6d50d24..a2dd60c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2243,6 +2243,46 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
 	return btrfs_wait_for_commit(root, transid);
 }
 
+/* Balance the filesystem unconditionally */
+long btrfs_ioctl_balance(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_balance(fs_info->dev_root, NULL);
+}
+
+/* Balance particular chunks in the filesystem */
+long btrfs_ioctl_balance_filtered(
+	struct btrfs_fs_info *fs_info,
+	struct btrfs_ioctl_balance_start __user *user_filters)
+{
+	int ret = 0;
+	struct btrfs_ioctl_balance_start *dest;
+
+	dest = kmalloc(sizeof(struct btrfs_ioctl_balance_start), GFP_KERNEL);
+	if (!dest)
+		return -ENOMEM;
+
+	if (copy_from_user(dest, user_filters, sizeof(struct btrfs_ioctl_balance_start))) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+	printk("Starting balance with filter: %llx %llx %llx\n",
+		dest->flags, dest->chunk_type, dest->chunk_type_mask);
+
+	/* Basic sanity checking */
+	if (dest->flags & ~BTRFS_BALANCE_FILTER_MASK) {
+		ret = -ENOTSUPP;
+		goto error;
+	}
+
+	/* Do the balance */
+	ret = btrfs_balance(fs_info->dev_root, dest);
+
+error:
+	kfree(dest);
+	return ret;
+}
+
 /*
  * Return the current status of any balance operation
  */
@@ -2335,11 +2375,13 @@ long btrfs_ioctl(struct file *file, unsigned int
 	case BTRFS_IOC_RM_DEV:
 		return btrfs_ioctl_rm_dev(root, argp);
 	case BTRFS_IOC_BALANCE:
-		return btrfs_balance(root->fs_info->dev_root);
+		return btrfs_ioctl_balance(root->fs_info);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
 	case BTRFS_IOC_BALANCE_CANCEL:
 		return btrfs_ioctl_balance_cancel(root->fs_info);
+	case BTRFS_IOC_BALANCE_FILTERED:
+		return btrfs_ioctl_balance_filtered(root->fs_info, argp);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 4f73d11..7c0c69c 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -154,6 +154,19 @@ struct btrfs_ioctl_balance_progress {
 	__u64 completed;
 };
 
+/* Types of balance filter */
+#define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x1
+#define BTRFS_BALANCE_FILTER_MASK 0x1
+
+/* All the possible options for a filter */
+struct btrfs_ioctl_balance_start {
+	__u64 flags; /* Bit field indicating which fields of this struct are filled */
+
+	/* For FILTER_CHUNK_TYPE */
+	__u64 chunk_type; /* Flag bits required */
+	__u64 chunk_type_mask; /* Mask of bits to examine */
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -201,4 +214,6 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_BALANCE_PROGRESS
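The volumes.c part of the patch is cut off in this archive, so the actual matching code is not visible. Going only by the struct comments ("Flag bits required" and "Mask of bits to examine"), a chunk-type filter of this shape would presumably be tested as sketched below. The function name chunk_matches is made up, and the comparison itself is an assumption, not the patch's code; the flag values are the btrfs block group bits.

```python
# btrfs block group flag bits
DATA, SYSTEM, METADATA = 1 << 0, 1 << 1, 1 << 2
RAID0, RAID1, DUP, RAID10 = 1 << 3, 1 << 4, 1 << 5, 1 << 6

def chunk_matches(flags, chunk_type, chunk_type_mask):
    """Assumed semantics: only bits in the mask are examined, and within
    the mask exactly the bits in chunk_type must be set."""
    return (flags & chunk_type_mask) == chunk_type

# "Metadata that is not RAID1": examine both bits, require only METADATA.
mask = METADATA | RAID1
required = METADATA

print(chunk_matches(METADATA | DUP, required, mask))    # metadata, DUP
print(chunk_matches(METADATA | RAID1, required, mask))  # metadata, RAID1
print(chunk_matches(DATA | RAID0, required, mask))      # data, RAID0
```

Under these assumed semantics a single mask/value pair can only express a conjunction of "bit set" and "bit clear" conditions, which would be consistent with the cover letter's remark that disjunctions are not supported.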
[PATCH RFC] Initial implementation of userspace interface for filtered balancing.
This is the userspace side of the filtered balance patch, again purely for comment at this stage. The command-line invocation will look something like this:

$ sudo btrfs fi bal --filter type=meta,~raid1 /mnt

This will balance all metadata block groups that are not replicated with RAID1. Once I've implemented additional filter types, they can be specified with extra --filter options, with the semantics of "and" between each --filter option.

(Yes, Goffredo, I know I need to update the man pages for this patch... :) )

This patch, and the preceding kernel one, both apply on top of my previous balance progress/cancel patches.

Hugo.

It is useful to be able to balance a subset of the full filesystem. This patch implements the infrastructure for filtering block groups on different criteria when balancing the filesystem.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c      |    4 +-
 btrfs_cmds.c |  132 --
 ioctl.h      |   15 +++
 3 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 7b42658..19b0e56 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -92,8 +92,8 @@ static struct Command commands[] = {
 	  "Show space usage information for a mount point\n."
 	},
 	{ do_balance, -1,
-	  "filesystem balance", "[-w|--wait] path\n"
-		"Balance the chunks across the device."
+	  "filesystem balance", "[-w|--wait] [-f|--filter=filter:...] path\n"
+		"Balance chunks across the devices. --filter=help for help on filters.\n"
 	},
 	{ do_balance, -1,
 	  "balance start", "[-w|--wait] path\n"
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index fadcb4f..f7bd835 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -756,26 +756,74 @@ int do_add_volume(int nargs, char **args)
 
 const struct option balance_options[] = {
 	{ "wait", 0, NULL, 'w' },
+	{ "filter", 1, NULL, 'f' },
 	{ NULL, 0, NULL, 0 }
 };
 
+struct filter_class_desc {
+	char *keyword;
+	char *description;
+	int flag;
+};
+
+const struct filter_class_desc filter_class[] = {
+	{ "type",
+	  "type=[~]flagname[,...]\n"
+	  "\tWhere flagname is one of:\n"
+	  "\t\tmeta, sys, data, raid0, raid1, raid10, dup\n"
+	  "\tPrefix a flagname with ~ to negate the match.\n",
+	  BTRFS_BALANCE_FILTER_CHUNK_TYPE },
+	{ NULL, NULL, 0 }
+};
+
+struct type_filter_desc {
+	char *keyword;
+	__u64 mask;
+	__u64 set;
+	__u64 unset;
+};
+
+#define BTRFS_BLOCK_GROUP_SINGLE \
+	BTRFS_BLOCK_GROUP_RAID0 | \
+	BTRFS_BLOCK_GROUP_RAID1 | \
+	BTRFS_BLOCK_GROUP_RAID10 | \
+	BTRFS_BLOCK_GROUP_DUP
+
+const struct type_filter_desc type_filters[] = {
+	{ "data", BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_DATA, 0 },
+	{ "sys", BTRFS_BLOCK_GROUP_SYSTEM, BTRFS_BLOCK_GROUP_SYSTEM, 0 },
+	{ "meta", BTRFS_BLOCK_GROUP_METADATA, BTRFS_BLOCK_GROUP_METADATA, 0 },
+	{ "raid0", BTRFS_BLOCK_GROUP_RAID0, BTRFS_BLOCK_GROUP_RAID0, 0 },
+	{ "raid1", BTRFS_BLOCK_GROUP_RAID1, BTRFS_BLOCK_GROUP_RAID1, 0 },
+	{ "raid10", BTRFS_BLOCK_GROUP_RAID10, BTRFS_BLOCK_GROUP_RAID10, 0 },
+	{ "dup", BTRFS_BLOCK_GROUP_DUP, BTRFS_BLOCK_GROUP_DUP, 0 },
+	{ "single", BTRFS_BLOCK_GROUP_SINGLE, 0, BTRFS_BLOCK_GROUP_SINGLE },
+	{ NULL, 0, 0, 0 }
+};
+
 int do_balance(int argc, char **argv)
 {
 	int fdmnt, ret=0;
 	int background = 1;
-	struct btrfs_ioctl_vol_args args;
+	struct btrfs_ioctl_balance_start *args;
 	char *path;
+	char *filters_string = NULL;
+	char *this_filter_string;
+	char *saveptr;
 	int ttyfd;
 
 	optind = 1;
 	while(1) {
-		int c = getopt_long(argc, argv, "w", balance_options, NULL);
+		int c = getopt_long(argc, argv, "wf:", balance_options, NULL);
 		if (c < 0)
 			break;
 		switch(c) {
 		case 'w':
 			background = 0;
 			break;
+		case 'f':
+			filters_string = optarg;
+			break;
 		default:
 			fprintf(stderr, "Invalid arguments for balance\n");
 			free(argv);
@@ -796,6 +844,82 @@ int do_balance(int argc, char **argv)
 		return 12;
 	}
 
+	args = malloc(4096);
+	if (!args) {
+		fprintf(stderr, "ERROR: Not enough memory\n");
+		return 13;
+	}
+
+	/* Parse the filters string, if there is one */
+	this_filter_string = strtok_r(filters_string, ":", &saveptr);
+	while(this_filter_string) {
+		char *subsave;
+		char *part;
+		char *type = strtok_r(this_filter_string, "=,", &subsave);
+		int class_id = -1;
+
+		/* Work out what filter type we're looking at */
+		if(strcmp(type
Re: Possible Kernel BUG regarding BTRFS
Jan 19 20:05:00 Desktop kernel: [ 2091.228432] 0 88023649dd68 0003Jan 19 20:06:12 Desktop kernel: imklog 4.2.0, log source = /proc/kmsg started.

/var/log/kern.log
Jan 19 20:05:00 Desktop kernel: [ 2091.228274] device fsid b849836048fddcda-fdb584bb7dae7bb1 devid 1 transid 97123 /dev/sdb2
Jan 19 20:05:00 Desktop kernel: [ 2091.228294] BUG: unable to handle kernel NULL pointer dereference at 0128
Jan 19 20:05:00 Desktop kernel: [ 2091.228298] IP: [] btrfs_test_super+0x10/0x30 [btrfs]
Jan 19 20:05:00 Desktop kernel: [ 2091.228309] PGD 2338f8067 PUD 235875067 PMD 0
Jan 19 20:05:00 Desktop kernel: [ 2091.228313] Oops: [#2] SMP
Jan 19 20:05:00 Desktop kernel: [ 2091.228316] last sysfs file: /sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda2/uevent
Jan 19 20:05:00 Desktop kernel: [ 2091.228319] CPU 7
Jan 19 20:05:00 Desktop kernel: [ 2091.228320] Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs cryptd aes_x86_64 aes_generic xt_multiport binfmt_misc parport_pc ppdev dm_crypt snd_hda_codec_atihdmi snd_hda_codec_realtek ipt_REJECT xt_comment xt_limit xt_tcpudp ipt_addrtype xt_state ip6table_filter ip6_tables nf_nat_irc snd_hda_intel nf_conntrack_irc nf_nat_ftp nf_nat snd_hda_codec nf_conntrack_ipv4 snd_hwdep nf_defrag_ipv4 snd_seq_midi snd_pcm snd_rawmidi nf_conntrack_ftp nf_conntrack snd_seq_midi_event iptable_filter snd_seq gspca_zc3xx gspca_main ip_tables snd_timer snd_seq_device x_tables psmouse videodev v4l1_compat v4l2_compat_ioctl32 serio_raw snd i7core_edac soundcore snd_page_alloc edac_core lp parport hid_apple usbhid hid radeon firewire_ohci ttm firewire_core drm_kms_helper crc_itu_t pata_jmicron ahci usb_storage r8169 libahci mii drm i2c_algo_bit
Jan 19 20:05:00 Desktop kernel: [ 2091.228381]
Jan 19 20:05:00 Desktop kernel: [ 2091.228384] Pid: 3248, comm: mount Tainted: G D 2.6.35-24-generic #42-Ubuntu MSI X58 Pro (MS-7522) /MS-7522
Jan 19 20:05:00 Desktop kernel: [ 2091.228387] RIP: 0010:[] [] btrfs_test_super+0x10/0x30 [btrfs]
Jan 19 20:05:00 Desktop kernel: [ 2091.228395] RSP: 0018:88023649dd18 EFLAGS: 00010283
Jan 19 20:05:00 Desktop kernel: [ 2091.228397] RAX: RBX: a05cd000 RCX: 880236918d80
Jan 19 20:05:00 Desktop kernel: [ 2091.228400] RDX: 81154a00 RSI: 880236918d80 RDI: 880204a4e800
Jan 19 20:05:00 Desktop kernel: [ 2091.228402] RBP: 88023649dd18 R08: R09: 0001
Jan 19 20:05:00 Desktop kernel: [ 2091.228404] R10: 880236918deJan 19 20:06:12 Desktop kernel: imklog 4.2.0, log source = /proc/kmsg started.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Someone's been throwing dead sheep down my Fun Well ---
Re: Btrfs balance
On Thu, Jan 20, 2011 at 03:53:41PM +0100, Andreas Philipp wrote:
> On 20.01.2011 14:39, Hugo Mills wrote:
>> On Thu, Jan 20, 2011 at 02:07:23PM +0100, Andreas Philipp wrote:
>>> Hi,
>>>
>>> Maybe it is a very stupid question but I want to ask it anyway. In general, 'btrfs filesystem balance' takes very long to finish and produces lots of IO. So what are the classical usage scenarios, when it is (really) worth doing a balance?
>>
>> The primary use-cases for balancing are to even out the filesystem after adding, removing or changing the size of one of the underlying volumes.
>
> Ok, so this is a little bit like for example resyncing a classical raid after it was in degraded mode etc.

Pretty much exactly that.

>> It will also be of use when we finally get around to allowing you to change RAID settings on the whole volume, to implement the requested changes to the RAID level.
>
> Definitely, a nice feature.
>
>> I'm in the process of implementing balance filters, so that some other cases where balancing is useful (reclaiming unused block groups) can be run more efficiently by only balancing the bits that need doing.
>
> I have seen your post on balance filters. So then it will be (much) faster just because less is done?

Yes, that's the idea. If you've lost and replaced a drive from a 2-drive RAID-1 array, there's not much that filters can do for you: all your data will have to be read and rebuilt. However, if you're changing just your metadata from DUP to RAID-1, say, or recovering from the loss of one drive in an 8-drive RAID-1 array, it should be an awful lot faster with filters.

> When you have a version for trying it out and you need someone for testing I will give it a try.

Thanks. I've got quite a bit reworked now to support multiple filter types, but I need to do a full review of what I'm doing, and test it myself first. I probably won't have much time to work on it before Monday, now.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- emacs: Eats Memory and Crashes. ---
Re: Encryption
On Thu, Jan 20, 2011 at 07:05:52AM -0800, Carl Cook wrote:
> Does BTRFS have subvolume encryption built in? If not, why?

Not at the moment.

My opinion on why: Getting crypto right is *hard*. There are far easier features that people are asking for that we can implement first. There may be technical issues that make it hard to implement within btrfs, although being able to do compression is harder from a FS structure point of view, so I suspect that the issues are more about ensuring correctness of the crypto implementation (not just the basic symmetric algorithm, because we've got those in the kernel, but all the key management and block chaining and probably a bunch of things I don't know about because I'm not a cryptographer -- all of which makes a big difference to the security of the final system).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Once is happenstance; twice is coincidence; three times is enemy action. ---
Re: Shrinking virtual disk with btrfs on it
On Fri, Jan 21, 2011 at 10:20:34AM -0700, Rodney Beede wrote:
> Any tools to go about zeroing out the free space on a btrfs file system so I can shrink the VMware vmdk virtual disk? I ran the VMware command, but the dynamic disk is still really big. I presume it is due to free space that isn't zeroed out.

One solution I've used before is to write a single very large file full of zeroes, filling the filesystem, then delete it.

$ dd if=/dev/zero of=/mountpoint/foo.dat
$ rm /mountpoint/foo.dat

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ... one ping(1) to rule them all, and in the darkness bind(2) them. ---
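
A minimal sketch of the zero-fill approach described above, wrapped in a shell function. The function name zero_free_space, the file name zero.fill and the optional size cap in MiB are illustrative assumptions, not part of the original reply, and bs=1M assumes GNU dd. Without the cap, dd is expected to stop with a "no space left" error once the filesystem is full.

```sh
#!/bin/sh
# Fill free space under a mount point with zeroes, then release it again.
# Assumptions (not from the original mail): the function name, the file
# name "zero.fill", and the optional second argument capping the size in MiB.
zero_free_space() {
    mountpoint=$1
    max_mib=${2:-}              # empty means "until the filesystem is full"
    fill="$mountpoint/zero.fill"

    if [ -n "$max_mib" ]; then
        dd if=/dev/zero of="$fill" bs=1M count="$max_mib" 2>/dev/null
    else
        # dd is expected to fail here once no space is left on the device.
        dd if=/dev/zero of="$fill" bs=1M 2>/dev/null || true
    fi

    sync                        # flush the zeroes to disk before deleting
    rm -f "$fill"
}

# Example (commented out because it fills the whole filesystem):
# zero_free_space /mountpoint
```

The sync before rm is a cautious choice so that the zeroes actually reach the virtual disk before the file is removed; whether it is strictly needed probably depends on the setup.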
Re: Synching a Backup Server
On Fri, Jan 21, 2011 at 11:28:19AM -0800, Freddie Cash wrote:
> On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
>> On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote:
>>> Let's see if I can match up the terminology and layers a bit:
>>>
>>> LVM Physical Volume == Btrfs disk == ZFS disk / vdevs
>>> LVM Volume Group == Btrfs filesystem == ZFS storage pool
>>> LVM Logical Volume == Btrfs subvolume == ZFS volume
>>> 'normal' filesystem == Btrfs subvolume (when mounted) == ZFS filesystem
>>>
>>> Does that look about right?
>>
>> Kind of. The thing is that the way that btrfs works is massively different to the way that LVM works (and probably massively different to the way that ZFS works, but I don't know much about ZFS, so I can't comment there). I think that trying to think of btrfs in LVM terms is going to lead you to a large number of incorrect conclusions. It's just not a good model to use.
>
> My biggest issue trying to understand Btrfs is figuring out the layers involved.
>
> With ZFS, it's extremely easy:
>   disks --> vdev --> pool --> filesystems
>
> With LVM, it's fairly easy:
>   disks -> volume group --> volumes --> filesystems
>
> But, Btrfs doesn't make sense to me:
>   disks --> filesystem --> sub-volumes???
>
> So, is Btrfs pooled storage or not? Do you throw 24 disks into a single Btrfs filesystem, and then split that up into separate sub-volumes as needed?

Yes, except that the subvolumes aren't quite as separate as you seem to think that they are. There's no preallocation of storage to a subvolume (in the way that LVM works), so you're only limited by the amount of free space in the whole pool. Also, data stored in the pool is actually free for use by any subvolume, and can be shared (see the deeper explanation below).

> From the looks of things, you don't have to partition disks or worry about sizes before formatting (if the space is available, Btrfs will use it). But it also looks like you still have to manage disks.
>
> Or, maybe it's just that the initial creation is done via mkfs (as in, formatting a partition with a filesystem) that's tripping me up after using ZFS for so long (zpool creates the storage pool, manages the disks, sets up redundancy levels, etc; zfs creates filesystems and volumes, and sets properties; no newfs/mkfs involved).

So potentially zpool -> mkfs.btrfs, and zfs -> btrfs. However, I don't know enough about ZFS internals to know whether this is a reasonable analogy to make or not.

> It looks like ZFS, Btrfs, and LVM should work in similar manners, but the overloaded terminology (pool, volume, sub-volume, filesystem are different in all three) and new terminology that's only in Btrfs is confusing.
>
>>> Just curious, why all the new terminology in btrfs for things that already existed? And why are old terms overloaded with new meanings? I don't think I've seen a write-up about that anywhere (or I don't remember it if I have).
>>
>> The main awkward piece of btrfs terminology is the use of "RAID" to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.
>
> No, the main awkward piece of btrfs terminology is overloading "filesystem" to mean "collection of disks" and creating "sub-volume" to mean "filesystem". At least, that's how it looks from way over here. :)

As I've tried to explain, that's the wrong way of looking at it. Let me have another go in more detail.

There's *one* filesystem. It contains:

- *One* set of metadata about the underlying disks (the dev tree).
- *One* set of metadata about the distribution of the storage pool on those disks (the chunk tree).
- *One* set of metadata about extents within that storage pool (the extent tree).
- *One* set of metadata about checksums for each 4k chunk of data within an extent (the checksum tree).
- *One* set of metadata about where to find all the other metadata (the root tree).

Note that an extent is a sequence of blocks which is both contiguous on disk, and contiguous within one *or more* files.

In addition to the above globally-shared metadata, there are multiple metadata sets, each representing a mountable namespace -- these are the subvolumes. Each of these subvolumes holds a directory structure, and all of the POSIX information for each file name within that structure. For each file within a subvolume, there's a sequence of pointers to the shared extent pool, indicating what blocks on disk are actually holding the data for that file.

Note that the actual file data, and the management of its location on the disk (and its replication), is completely shared across subvolumes. The same extent may be used multiple times by different files, and those files may be in any subvolumes on the filesystem. In theory, the same extent could even appear several times in the same file. This sharing is how snapshots and COW copies
Re: v0.19-35-g1b444cd btrfsck says snapshots have errors
On Sun, Jan 23, 2011 at 05:44:34AM -0500, Ian! D. Allen wrote:
> On Fri, Jan 21, 2011 at 09:15:49AM +0800, Yan, Zheng wrote:
>> On Fri, Jan 21, 2011 at 6:52 AM, Ian! D. Allen idal...@idallen.ca wrote:
>>> Still getting btrfsck errors with this:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
>>>
>>> unresolved ref root 256 dir 256 index 2 namelen 5 name snap1 error 600
>>> found 49152 bytes used err is 1
>>
>> These are caused by a design flaw, you can safely ignore them.
>
> If it isn't an error, shouldn't btrfsck be ignoring it, not me? At minimum it could say "warning" and not "err is 1".

Yes, it probably should, but there's not a great deal of point in fixing this particular issue, because Chris is working on the all-new (offline) repairing fsck, which should replace the current checking-only fsck very soon now.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- O tempura! O moresushi! ---
Re: Bug in mkfs.btrfs?!
Hi, Felix,

On Sat, Jan 22, 2011 at 04:56:12PM +0100, Felix Blanke wrote:
> It was a simple:
>
> mkfs.btrfs -L backup -d single /dev/loop2
>
> But it also happens without the options, like:
>
> mkfs.btrfs /dev/loop2
>
> /dev/loop2 is a loop device, which is aes encrypted. The output of losetup /dev/loop2:
>
> /dev/loop2: [0010]:5324 (/dev/disk/by-id/ata-WDC_WD6400AAKS-22A7B2_WD-WCASY7780706-part3) encryption=AES128
>
> Thank you for looking into this! While writing this I read your second mail. The strace output is attached.

OK, I've traced through the functions being called, and I really can't see where it could be truncating the name, unless your system has a stupidly small value of PATH_MAX.

Can you apply the following patch (to the "next" branch of the btrfs-progs git repo), rebuild, and try again? It's just adding some debugging output to track what it's looking at.

Hugo.

diff --git a/mkfs.c b/mkfs.c
index 2e99b95..51a5096 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -422,6 +422,7 @@ int main(int ac, char **av)
         printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
         file = av[optind++];
+        printf("Checking whether %s is part of a mounted filesystem\n", file);
         ret = check_mounted(file);
         if (ret < 0) {
                 fprintf(stderr, "error checking %s mount status\n", file);
diff --git a/utils.c b/utils.c
index fd894f3..7fa3149 100644
--- a/utils.c
+++ b/utils.c
@@ -610,12 +610,16 @@ int resolve_loop_device(const char* loop_dev, char* loop_file, int max_len)
         int ret_ioctl;
         struct loop_info loopinfo;
+        printf("Resolving loop device %s (length %d)\n", loop_dev, max_len);
+
         if ((loop_fd = open(loop_dev, O_RDONLY)) < 0)
                 return -errno;
         ret_ioctl = ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo);
         close(loop_fd);
+        printf("Loop name = %s\n", loopinfo.lo_name);
+
         if (ret_ioctl == 0)
                 strncpy(loop_file, loopinfo.lo_name, max_len);
         else
@@ -639,6 +643,9 @@ int is_same_blk_file(const char* a, const char* b)
                 return -errno;
         }
+        printf("Realpath of %s was %s\n", a, real_a);
+        printf("Realpath of %s was %s\n", b, real_b);
+
         /* Identical path? */
         if(strcmp(real_a, real_b) == 0)
                 return 1;
@@ -680,6 +687,9 @@ int is_same_loop_file(const char* a, const char* b)
         const char* final_b;
         int ret;
+        printf("is_same_loop_file: %s and %s\n", a, b);
+        printf("PATH_MAX = %d\n", PATH_MAX);
+
         /* Resolve a if it is a loop device */
         if((ret = is_loop_device(a)) < 0) {
                 return ret;
@@ -784,8 +794,10 @@ int check_mounted(const char* file)
                         if(strcmp(mnt->mnt_type, "btrfs") != 0)
                                 continue;
+                        printf("Testing if btrfs device is in the dev list: %s\n", mnt->mnt_fsname);
                         ret = blk_file_in_dev_list(fs_devices_mnt, mnt->mnt_fsname);
                 } else {
+                        printf("Testing if non-btrfs device is block or regular: %s\n", mnt->mnt_fsname);
                         /* ignore entries in the mount table that are not associated with a file*/
                         if((ret = is_existing_blk_or_reg_file(mnt->mnt_fsname)) < 0)
diff --git a/volumes.c b/volumes.c
index 7671855..2496fbd 100644
--- a/volumes.c
+++ b/volumes.c
@@ -130,6 +130,8 @@ static int device_list_add(const char *path,
                 device->fs_devices = fs_devices;
         }
+        printf("Device added with name %s\n", device->name);
+
         if (found_transid > fs_devices->latest_trans) {
                 fs_devices->latest_devid = devid;
                 fs_devices->latest_trans = found_transid;
@@ -223,6 +225,7 @@ int btrfs_scan_one_device(int fd, const char *path,
         *total_devs = btrfs_super_num_devices(disk_super);
         uuid_unparse(disk_super->fsid, uuidbuf);
+        printf("Adding device %s to list\n", path);
         ret = device_list_add(path, disk_super, devid, fs_devices_ret);
 error_brelse:

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Doughnut furs ache me, Omar Dorlin. ---
Re: Cannot Create Partition
On Sun, Jan 23, 2011 at 10:07:54AM -0800, cac...@quantum-sci.com wrote:
> On /dev/sda I have sda1 which is my / bootable filesystem for Debian formatted ext4. This is 256MB on a 2TB drive. I want to set up the rest of the drive as BTRFS for various functions, and I presume that I first have to create a partition using fdisk for this? Since my first part is ext4? So I:
>
> # fdisk /dev/sda
>
> WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.

I think the above may be the root cause of your problem. You're using the new GPT partition table format, not the traditional DOS one, and fdisk is claiming that it can't handle it.

> WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
>          switch off the mode (command 'c') and change display units to
>          sectors (command 'u').
>
> Command (m for help): p
>
> Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1      243202  1953514583+  ee  GPT
>
> Command (m for help): n
> Command action
>    e   extended
>    p   primary partition (1-4)
> p
> Partition number (1-4): 2
> No free sectors available
>
> Command (m for help):
>
> - Whaa? Maybe it's possible that I just mkfs.btrfs /dev/sda and it will set up -only- the remaining space, but I'm afraid that this may destroy my OS.

No, that will almost certainly destroy your existing partitioning, and hence, as you say, your OS install.

> Also, what if I want to set up the whole drive as BTRFS? Could this be bootable, and can the canned Debian kernel load the BTRFS driver for boot at install? Or would I boot to the CD, mkfs.btrfs the drive, then install Debian? Anyone tried this?

As far as I know, GRUB2 doesn't yet support btrfs (although there was some work done on it, I don't know what the status of that work is). This means that you need a filesystem of some other type to boot off -- even if it only holds the contents of /boot. There are certainly people around who've done this, although I'm not one of them.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Dullest spy film ever: The Eastbourne Ultimatum ---
Re: Bug in mkfs.btrfs?!
On Sun, Jan 23, 2011 at 11:02:16PM +0100, Goffredo Baroncelli wrote:
> On 01/23/2011 07:18 PM, Hugo Mills wrote:
>> Hi, Felix,
>>
>> On Sat, Jan 22, 2011 at 04:56:12PM +0100, Felix Blanke wrote:
>>> It was a simple:
>>>
>>> mkfs.btrfs -L backup -d single /dev/loop2
>>>
>>> But it also happens without the options, like:
>>>
>>> mkfs.btrfs /dev/loop2
>>>
>>> /dev/loop2 is a loop device, which is aes encrypted. The output of losetup /dev/loop2:
>>>
>>> /dev/loop2: [0010]:5324 (/dev/disk/by-id/ata-WDC_WD6400AAKS-22A7B2_WD-WCASY7780706-part3) encryption=AES128
>>>
>>> Thank you for looking into this! While writing this I read your second mail. The strace output is attached.
>>
>> OK, I've traced through the functions being called, and I really can't see where it could be truncating the name, unless your system has a stupidly small value of PATH_MAX.
>
> It seems that when mkfs.btrfs checks whether the passed block device is already mounted, it uses the ioctl LOOP_GET_STATUS [1]. This ioctl takes a struct loop_info as its argument and should return the info about the back-end of the loop device. The file name is returned via the lo_name field, which is an array of 64 char...[2]

Good catch, Goffredo. I completely missed that.

Interestingly, on my system, lo_name is indeed defined as 64 chars, but I don't see Felix's problem. When I do losetup on the /dev/disk/by-id/... link, my version of losetup seems to be following the link:

# losetup /dev/loop1 /dev/disk/by-id/dm-uuid-LVM-XRQLHQNa0xEeIZL4ofuBGIcfkr1Dhry8YHhkjaw4bvZA4meDFQfEMy5elIsVNeWl
# losetup -a
/dev/loop1: [0005]:1423915 (/dev/mapper/ruthven-btemp)

I'm running Debian, and the mount package version 2.17.2-5 (losetup is part of mount, it seems).

> Felix, what is the output of the following command?
>
> /sbin/losetup -a
>
> If my analysis is correct, this command should return the filename truncated at the 64th character too.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Sometimes, when I'm alone, I Google myself. ---
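
To make the 64-character lo_name limit discussed above more concrete, here is a small sketch that predicts which part of a path would survive in such a buffer. It assumes the field holds at most 63 characters of payload plus a terminating NUL byte; the function name lo_name_preview and the long example path are made up for illustration and are not taken from anyone's system.

```sh
#!/bin/sh
# Predict the name left after storing a path in a 64-byte lo_name buffer.
# Assumption: 63 characters of payload plus a terminating NUL byte.
LO_NAME_PAYLOAD=63

lo_name_preview() {
    # The precision on %s keeps at most that many characters of the argument.
    printf "%.${LO_NAME_PAYLOAD}s\n" "$1"
}

# A made-up, over-long by-id path loses its tail:
lo_name_preview /dev/disk/by-id/ata-EXAMPLE_DISK_MODEL_0123456789_SERIAL0123456789-part3
# A short device node survives unchanged:
lo_name_preview /dev/sdb3
```

If this model is right, a comparison against the full path would fail for long by-id names, while a plain device node or /dev/mapper name would usually be short enough to fit.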
Re: Bug in mkfs.btrfs?!
On Mon, Jan 24, 2011 at 02:29:36PM +0000, Hugo Mills wrote:
> If, instead, the initial losetup call tracked the symlinks back to the original device node (i.e. something like /dev/sdb3, or /dev/mapper/ruthven-btest in my example), then the name that's stored in the kernel would be shorter, and we'd be less likely to see the truncation. This is what my copy of losetup seems to be doing. I can't see any distribution-specific patches in the source for util-linux that would do this, though.

Hmm... Just had a thought: is /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GC_CVPO939201JX160AGN-part3 on your system a symlink or a device node? What does ls -l say?

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people, / what else is there to eat? / If the Juju had meant us not to eat people / he wouldn't have made us of meat. ---
Re: Kernel error during btrfs balance
On Wed, Jan 26, 2011 at 10:04:02AM +0100, Erik Logtenberg wrote:
> Hi,
>
> It took me a couple of days, because I needed to patch my kernel first and then issue a rebalance, which ran for more than two days. Nevertheless, the rebalance succeeded without any kernel BUG-messages, so apparently your patch works!
>
> I noticed that at first, the messages were like this:
>
> [79329.526490] btrfs: found 1939 extents
> [79375.950834] btrfs: found 1939 extents
> [79376.083599] btrfs: relocating block group 352220872704 flags 1
> [80052.940435] btrfs: found 3786 extents
> [80108.439657] btrfs: found 3786 extents
> [80112.325548] btrfs: relocating block group 351147130880 flags 1
>
> Just like I saw during previous balance-runs. Then all of a sudden the messages changed to:
>
> [104178.827594] btrfs allocation failed flags 1, wanted 2013265920
> [104178.827599] space_info has 4271198208 free, is not full
> [104178.827602] space_info total=214748364800, used=210440957952, pinned=0, reserved=36208640, may_use=3168993280, readonly=0
> [104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144 used 0 pinned 0 reserved
> [104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
> [104178.827612] entry offset 1855827968, bytes 20480, bitmap no
> [104178.827614] entry offset 1855852544, bytes 20480, bitmap no
> [104178.827617] block group has cluster?: no
> [104178.827618] 0 blocks of free space at or bigger than bytes is
> [104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024 used 0 pinned 0 reserved
> [104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
> [104178.827626] block group has cluster?: no
> [104178.827628] 0 blocks of free space at or bigger than bytes is
> [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120 used 0 pinned 0 reserved
> [104178.827634] block group has cluster?: no
>
> And so on. Does this indicate an error of any sort, or is this expected behaviour?

As far as I know, it means that you've run out of space, and not every block group has been rewritten by the balance process.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- In one respect at least, the Martians are a happy people: they have no lawyers. ---
Re: Corrupt filesystem after power failure
:40 linux-wuce kernel: [ 341.617173] Call Trace:
Feb 10 21:57:40 linux-wuce kernel: [ 341.617309] [a07d66e8] replay_one_buffer+0x2e8/0x3b0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617421] [a07d3d85] walk_down_log_tree+0x375/0x540 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617529] [a07d4053] walk_log_tree+0x103/0x280 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617637] [a07d8223] btrfs_recover_log_trees+0x223/0x310 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617748] [a079f049] open_ctree+0x1269/0x18e0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617793] [a077cc0e] btrfs_get_sb+0x31e/0x430 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617809] [811271e0] vfs_kern_mount+0x80/0x210
Feb 10 21:57:40 linux-wuce kernel: [ 341.617819] [811273e3] do_kern_mount+0x53/0x130
Feb 10 21:57:40 linux-wuce kernel: [ 341.617829] [81141f20] do_mount+0x200/0x250
Feb 10 21:57:40 linux-wuce kernel: [ 341.617839] [8114205a] sys_mount+0x9a/0xf0
Feb 10 21:57:40 linux-wuce kernel: [ 341.617851] [81002ffb] system_call_fastpath+0x16/0x1b
Feb 10 21:57:40 linux-wuce kernel: [ 341.617863] [7fb2ed4f0ffa] 0x7fb2ed4f0ffa
Feb 10 21:57:40 linux-wuce kernel: [ 341.617866] Code: f4 52 96 e0 31 c0 48 81 c4 98 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 b8 fe ff ff ff eb e7 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 ec 28 01 00 00 48
Feb 10 21:57:40 linux-wuce kernel: [ 341.617921] RIP [a07d5eb3] add_inode_ref+0x4a3/0x4b0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [ 341.617941] RSP 8800b09af8b8
Feb 10 21:57:40 linux-wuce kernel: [ 341.617972] ---[ end trace 22bed547f3298140 ]---
Feb 10 21:57:45 linux-wuce kernel: [ 346.284240] rtl8192se_update_ratr_table: ratr_index=0 ratr_table=0x0ff5
Feb 10 21:58:08 linux-wuce kernel: [ 369.474197] SetHwReg8192SE():HW_VAR_AC_PARAM eACI:0:a425

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Someone's been throwing dead sheep down my Fun Well ---
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
> Hi list!
>
> It seems I have found a serious regression in compressed btrfs in kernel 2.6.37. When creating a small file (less than the block size) and then cp/mv it to *another* file system, an appropriate number of zeroes gets written to the destination file. Case in point:

[snip]

> I'm currently running on 2.6.37, x86_64 using Arch Linux -testing with coreutils 8.10. Filesystem is mounted from LVM2 to /usr/src with -o noatime,compress
>
> This only seems to occur with compressed file systems (either zlib or LZO). A person on IRC also reproduced the same problem in 2.6.28-rc. I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.

This would seem to be the same effect that we've had reported on IRC by at least two Gentoo users, of files full of zeroes in their build system. We'll follow up with them over there and see if it's the same bug.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I must be musical: I've got *loads* of CDs ---
Re: Question on subvolumes and mount options
On Sun, Feb 13, 2011 at 05:46:46PM +0100, Yuri D'Elia wrote:
> Hi everyone, I'm experimenting with btrfs but I have some questions regarding subvolumes.
>
> First: In the / filesystem I create a subvolume named /home. As soon as the subvolume is created, I can already see the entry point in /home without having to mount it separately. Is that expected?

Yes.

> Mounting the subvolume with mount -o subvol=home /dev/x /home also works as expected. So, which is best? Looks like mounting subvolumes is not necessary.

I would recommend putting nothing in the root of the filesystem *except* subvolumes. i.e. create a root subvolume in / that contains your root filesystem, and make that the default. Then you can mount your btrfs root subvolume (i.e. the thing that contains all the other subvolumes) somewhere like /media/btrfs-root, for purposes of managing subvolumes.

> Is it possible to change mount options in a subvolume? Suppose I would like to use nodatasum except for /home, will the following work?
>
> mount -o nodatasum /dev/x /
> btrfs subvolume create /home
> mount -o datasum,subvol=home /dev/x

I'd expect that to work, although I haven't tried it myself.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I always felt that as a C programmer, I was becoming typecast. ---
Re: Question on subvolumes and mount options
On Sun, Feb 13, 2011 at 06:49:58PM +0100, Yuri D'Elia wrote:
> On Sun, 13 Feb 2011 17:30:59 +0000, Hugo Mills wrote:
>>> First: In the / filesystem I create a subvolume named /home. As soon as the subvolume is created, I can already see the entry point in /home without having to mount it separately. Is that expected?
>>
>> Yes.
>
> What happens if I mount the home subvolume into a different point, like:
>
> mount -o subvol=home /home2
>
> and then change a file in /home (which is accessible through the default subvolume)? Will the change be reflected on both mount points? Or the inverse (change /home2)?

Yes, it's the same piece of storage, just appearing at more than one point in your overall filesystem. Similar to the way that bind mounts work.

>>> So, which is best? Looks like mounting subvolumes is not necessary.
>>
>> I would recommend putting nothing in the root of the filesystem *except* subvolumes. i.e. create a root subvolume in / that contains your root filesystem, and make that the default. Then you can mount your btrfs root subvolume (i.e. the thing that contains all the other subvolumes) somewhere like /media/btrfs-root, for purposes of managing subvolumes.
>
> So you would recommend creating both /root and /home subvolumes, to be mounted separately, or create /root and /root/home subvolumes?

The former.

>>> like to use nodatasum except for /home, will the following work?
>>>
>>> mount -o nodatasum /dev/x /
>>> btrfs subvolume create /home
>>> mount -o datasum,subvol=home /dev/x
>>
>> I'd expect that to work, although I haven't tried it myself.
>
> What if I remount the /home subvol into /home2. What happens when I touch a file through /home (nodatasum) and what happens when I use /home2 - since both are available at the same time?

They'll stay in sync with respect to the files written to either one. I'm not sure what the behaviour of nodatasum is with different mounts of the same subvolume.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once, except incest and folk-dancing. ---
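
The layout recommended in this thread (nothing at the top level except subvolumes, with root and home as siblings) could be set up roughly as sketched here. This is a dry run that only prints commands and executes nothing. The device /dev/x is the placeholder used in the thread; the function name, the ROOT_ID placeholder (to be read from the subvolume list) and the use of subvolid=0 to reach the top level are assumptions, and how subvol= is resolved once a default has been set may depend on the kernel version, so treat this as something to verify rather than a recipe.

```sh
#!/bin/sh
# Print (not run) the commands for a "subvolumes only" top-level layout.
# Assumptions: device name, subvolume names, the ROOT_ID placeholder and
# the use of subvolid=0 to reach the top level are illustrative.
plan_subvol_layout() {
    dev=$1
    top=/media/btrfs-root
    cat <<EOF
mount -o subvolid=0 $dev $top
btrfs subvolume create $top/root
btrfs subvolume create $top/home
btrfs subvolume list $top
btrfs subvolume set-default ROOT_ID $top
mount $dev /
mount -o subvol=home $dev /home
EOF
}

plan_subvol_layout /dev/x
```

Keeping the top level mounted at /media/btrfs-root only while managing subvolumes matches the suggestion above; the last two printed lines stand in for whatever the boot process or fstab would normally do.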
Re: Copied Files from Btrfs partition larger than original
On Mon, Feb 14, 2011 at 12:41:46AM -0800, MOB wrote:
> So as I'm doing some maintenance on my personal video server, I'm noticing that when I'm copying files off of my btrfs partitions, they are getting larger...
>
> First partition is the original: http://pastebin.com/GM5xWetR
>
> I have 3 affected partitions. This appears to have started with 2.6.37 but could have started happening before. I have ~3300 video files where ~840 are on btrfs partitions that randomly get shuffled on/off for free space distribution

Pastebins aren't forever. For the archives:

-- (begin)
ls -lah /mnt/store-p00/1280x720/~NCIS\ Los\ Angeles~2010-05-11~720.mov
-rw-rw-r-- 1 root hdhr 1.7G Nov 6 18:39 /mnt/store-p00/1280x720/~NCIS Los Angeles~2010-05-11~720.mov
ls -lah /hdhr/demux/1280x720/~NCIS Los Angeles~2010-05-11~720.mov
-rw-rw-r-- 1 root hdhr 3.7G Nov 6 18:39 /hdhr/demux/1280x720/~NCIS Los Angeles~2010-05-11~720.mov
-- (end)

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Attempted murder, now honestly, what is that? Do they give a ---
      Nobel Prize for attempted chemistry?
Re: combining 2 RAID 10 pools to one filesystem
On Wed, Feb 16, 2011 at 10:50:57PM +0200, Gal Buki wrote:
> I have RAID 10 using 4 times 500GB drives (1TB of storage). Is it possible to create another RAID 10 with 4 times 250GB drives (500GB of storage) and then combine those two RAIDs to one file system so that I would be able to get 1.5TB?
>
> If I create one RAID 10 with all 8 drives I would only be able to use 8 times 250GB /2 = 1TB, right?

No, just add the new drives to the existing btrfs pool, and run a balance, and you should get a btrfs filesystem with 1.5TB of mirrored/striped storage.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Anyone who claims their cryptographic protocol is secure is ---
      either a genius or a fool. Given the genius/fool ratio for our
      species, the odds aren't good.
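A rough way to check the 1.5TB figure in the reply above is to simulate the allocation. The sketch below is only a simplified model, not btrfs code: it assumes (my assumption, not stated in the thread) that RAID-10 space is handed out in stripes across every device that still has free space, using an even number of devices and at least four, and that half of the raw space is usable because everything is stored twice. The function name `raid10_usable` and its parameters are made up for illustration, and metadata overhead is ignored.

```python
def raid10_usable(sizes_gb, min_devices=4):
    """Estimate usable GB under a simplified greedy RAID-10 allocator.

    Hypothetical model for illustration only; real btrfs allocates in
    fixed-size chunks and also reserves space for metadata.
    """
    free = list(sizes_gb)
    usable = 0
    while True:
        # Devices that still have unallocated space, largest first.
        candidates = sorted(
            (i for i, f in enumerate(free) if f > 0),
            key=lambda i: free[i],
            reverse=True,
        )
        # Use an even number of devices, and give up below the minimum.
        n = len(candidates) - (len(candidates) % 2)
        if n < min_devices:
            return usable
        chosen = candidates[:n]
        # Allocate as much as the smallest chosen device still allows.
        step = min(free[i] for i in chosen)
        for i in chosen:
            free[i] -= step
        # Two copies of all data: half of the raw allocation is usable.
        usable += step * n // 2


print(raid10_usable([500] * 4 + [250] * 4))  # 1500
print(raid10_usable([250] * 8))              # 1000
```

Under this model the eight drives are striped together until the 250GB drives are full (1000GB usable), and the remaining halves of the four 500GB drives then provide another 500GB. The second call shows where the asker's 1TB estimate comes from: it treats every drive as if it were only 250GB.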
Re: Space used by snapshot
On Thu, Feb 17, 2011 at 12:13:53PM +0100, Roman Kapusta wrote:
> Hello all,
>
> Is there any way to obtain information on how much space is physically allocated by a given subvolume? I cannot find any.
>
> I'm interested in two values:
> - physical space allocated by SUBVOLUME INCLUDING all space shared by other subvolumes
> - physical space allocated by SUBVOLUME EXCLUDING all space shared by other subvolumes
>
> Currently I can use only du, which is not reporting what I want to know.

Not at the moment. It shouldn't be too difficult to implement (certainly to implement the latter), but it's just not been done yet.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What do you give the man who has everything? -- Penicillin is ---
      a good start...
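To make the two requested values concrete, here is a toy model in Python. It is not a btrfs interface (the reply above says no such tool existed at the time); it simply assumes each subvolume can be described as a set of extent IDs with known sizes. The names `subvolume_usage`, `extent_sizes` and `subvol_extents`, and the sample data, are all hypothetical.

```python
def subvolume_usage(extent_sizes, subvol_extents, name):
    """Return (inclusive, exclusive) sizes for one subvolume.

    inclusive: every extent the subvolume references, shared or not.
    exclusive: only extents that no other subvolume references.
    """
    mine = subvol_extents[name]
    # Collect every extent referenced by any other subvolume.
    others = set()
    for other, extents in subvol_extents.items():
        if other != name:
            others |= extents
    inclusive = sum(extent_sizes[e] for e in mine)
    exclusive = sum(extent_sizes[e] for e in mine - others)
    return inclusive, exclusive


# Made-up example: extent "a" is shared between a subvolume and its snapshot.
extent_sizes = {"a": 100, "b": 50, "c": 25}
subvol_extents = {"home": {"a", "b"}, "home-snap": {"a", "c"}}

print(subvolume_usage(extent_sizes, subvol_extents, "home"))       # (150, 50)
print(subvolume_usage(extent_sizes, subvol_extents, "home-snap"))  # (125, 25)
```

The exclusive figure is the space that would be freed by deleting that subvolume alone, which is why du, which only walks files, cannot report it.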
Re: raid5 - again
On Sat, Feb 19, 2011 at 10:11:30PM +0100, Roy Sigurd Karlsbakk wrote:
> It's been some two years since I read about the becoming of raid5 etc in btrfs. Since the code is available in linux, why isn't this already in btrfs? Is Oracle holding back?

It's about resourcing and stability. Oracle only employ one person (AFAIK) on btrfs -- Chris Mason. He does a sterling job of maintaining and developing the filesystem, but there is only one of him.

Since well before December, he's been working on a functional fsck, trying to get it to a state where it won't demolish your filesystem even more than it already is. This has left the work to integrate the RAID-5/6 patches behind. He's also been working hard on fixing a great many other stability issues as they're reported, and integrating patches from other developers.

I believe that RAID-5/6 is the next major piece of work that Chris is intending to integrate, once fsck is ready. However, stability is better than features at this point. I don't see any commercial benefit in preventing the integration and deployment of the RAID-5/6 patches, but simply that there's other things that are more important. Please be patient.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Startle, startle, little twink. How I wonder what you think. ---
Re: Recovering parent transid verify failed
On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
> Hey,
>
> I'd start by saying that I know Btrfs is still experimental, and so there's no guarantee that one would be able to help me at all... But I thought I'll try anyway :-)
>
> A few months ago I bought a new laptop and installed ArchLinux on it, with Btrfs on the root filesystem... I know, it's not the smartest thing to do...
>
> After a few months I had issues with my hibernation scripts, and one day I tried to hibernate my computer but it didn't go that well, and, well, ever since then my Btrfs partition is not accessible.
>
> I opened up the Btrfs FAQ and saw that the fsck tool should be out by the end of 2010, and thought oh well, I could wait until then, and went on and installed Ubuntu with Ext4 on another small partition.
>
> But time goes on and the fsck tool is still in development... I've tried using the code from GIT and it didn't work, and I'm starting to wonder (a) if there's any hope at all and (b) what other step am I able to do to recover my old Btrfs partition.

Yes, there is hope. This error should be fixable with the new fsck.

> When trying to mount the Btrfs partition I get this in dmesg:
>
> [105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1 transid 135714 /dev/sda4
> [105252.818697] parent transid verify failed on 216925220864 wanted 135714 found 135713

[snip]

> Should I wait for btrfsck to be ready?

Yes.

> Am I not using it correctly now?

No, there's not a lot the current version can do right now.

> Is there any way to recover this partition or should I just wipe it and reinstall Btrfs only when I'm supposed to?..
>
> Your help is appreciated.

HTH,
Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am the author. You are the audience. I outrank you! ---
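For anyone collecting these messages from logs, the following Python sketch pulls the numbers out of a "parent transid verify failed" line like the one quoted above. The field names in the returned dictionary are my own labels, and reading the first number as the address of the failing block is an assumption rather than something stated in the thread.

```python
import re

# Matches the kernel message format shown in the quoted dmesg output.
PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_failure(line):
    """Return the numbers from a transid failure line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    block, wanted, found = (int(g) for g in match.groups())
    return {
        "block": block,            # assumed: address of the block that failed
        "wanted": wanted,          # transaction id the parent expected
        "found": found,            # transaction id actually stored in the block
        "behind": wanted - found,  # how many transactions the block lags
    }


line = (
    "[105252.818697] parent transid verify failed on 216925220864 "
    "wanted 135714 found 135713"
)
print(parse_transid_failure(line))
```

For the line in this report the block is one transaction behind what was expected (135713 instead of 135714), which would be consistent with the last write before the failed hibernation never reaching the disk.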