Re: syslog message repeated 3x

2013-05-09 Thread Hugo Mills
On Thu, May 09, 2013 at 11:26:25AM +0200, Toralf Förster wrote:
 I'm just curious why the last of the following 3 commands :
 
 $ dd if=/dev/zero of=/mnt/ramdisk/disk1 bs=1M count=257
 $ yes | /sbin/mkfs.btrfs /mnt/ramdisk/disk1
 $ mount -o loop /mnt/ramdisk/disk1 /mnt/t
 
 gives 3x the same log message :
 
 2013-05-09T11:23:00.230+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1
 2013-05-09T11:23:00.581+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1
 2013-05-09T11:23:00.583+02:00 n22 kernel: device fsid 5b4be7c4-e662-459a-a2a7-066e9384c901 devid 1 transid 4 /dev/loop1

   At a guess, two of those are probably from btrfs dev scan triggered
by udev.
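
The spacing between the three messages can be measured from the timestamps alone. The sketch below is only an illustration: it uses the three timestamps quoted above and reports the gaps in milliseconds. It cannot show what triggered each message, so it neither confirms nor rules out the udev guess.

```python
from datetime import datetime, timedelta

# Timestamps copied from the three identical log lines quoted above.
STAMPS = [
    "2013-05-09T11:23:00.230+02:00",
    "2013-05-09T11:23:00.581+02:00",
    "2013-05-09T11:23:00.583+02:00",
]


def gaps_ms(stamps):
    """Return the gap in whole milliseconds between consecutive timestamps."""
    times = [datetime.fromisoformat(s) for s in stamps]
    return [(b - a) // timedelta(milliseconds=1) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_ms(STAMPS))
```

Running the same function over a log captured with and without a manual device scan would make the two cases easy to compare.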

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours.  But remember, the ---
  roof is ours!  




Re: syslog message repeated 3x

2013-05-09 Thread Hugo Mills
On Thu, May 09, 2013 at 12:37:38PM +0200, Toralf Förster wrote:
 
 On 05/09/2013 12:04 PM, Hugo Mills wrote:
  At a guess, two of those are probably from btrfs dev scan
  triggered by udev.
 Those messages only appear for btrfs, not if I choose ext4.

   They're from the btrfs kernel module, so that's hardly surprising. :)

   Hugo.



Re: syslog message repeated 3x

2013-05-09 Thread Hugo Mills
On Thu, May 09, 2013 at 02:45:00PM +0200, Toralf Förster wrote:
 On 05/09/2013 01:47 PM, Wang Shilong wrote:
  Anyway, i use the latest btrfs-progs.
 
 well, under Gentoo I used sys-fs/btrfs-progs- which points always to the 
 latest git version :
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
 
 My host kernel is stable 3.9.1
 
 The mount command still gives :
 
 2013-05-09T14:43:35.604+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1
 2013-05-09T14:43:35.604+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1
 2013-05-09T14:43:35.608+02:00 n22 kernel: btrfs: disk space caching is enabled
 2013-05-09T14:43:35.660+02:00 n22 kernel: device fsid 20a30a6b-a82f-429f-b426-00f1739e4d3d devid 1 transid 8 /dev/loop1

   I guess the main question is... why is this a problem for you?

   The message is informational and doesn't indicate any kind of issue
with the FS. I'd just ignore it/them.

   (Also, are you running btrfs dev scan beforehand or not? It'd be
interesting to see the difference in your logs -- particularly with
timestamps -- when you do that.)

   Hugo.



Re: Btrfs balance invalid argument error

2013-05-10 Thread Hugo Mills
On Fri, May 10, 2013 at 10:07:56PM +0200, Marcus Lövgren wrote:
 Hi list,
 
 I am using kernel 3.9.0, btrfs-progs 0.20-rc1-253-g7854c8b.
 
 I have a three disk array of level single:
 
 # btrfs fi sh
 Label: none  uuid: 2e905f8f-e525-4114-afa6-cce48f77b629
 Total devices 3 FS bytes used 3.80TB
 devid 1 size 2.73TB used 2.25TB path /dev/sdd
 devid 2 size 2.73TB used 1.55TB path /dev/sdc
 devid 3 size 2.73TB used 0.00 path /dev/sdb
 
 Btrfs v0.20-rc1-253-g7854c8b
 
 # btrfs fi df /mnt/data
 Data: total=3.79TB, used=3.79TB
 System: total=4.00MB, used=420.00KB
 Metadata: total=6.01GB, used=4.87GB
 
 
 When running
 # btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/data
 
 I get
 
 ERROR: error during balancing '/mnt/data' - Invalid argument
 There may be more info in syslog - try dmesg | tail
 
 dmesg | tail says:
 
 btrfs: unable to start balance with target data profile 128
 
 Isn't it possible to convert raid level to raid5?

   Yes, it should be possible. It looks like the kernel's got a
problem with it, which is odd because 3.9 should know about RAID-5.
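
The number 128 in the dmesg line is a bit mask rather than a RAID level. The sketch below decodes it, assuming the block-group flag bits of the btrfs on-disk format (the kernel's BTRFS_BLOCK_GROUP_* constants); the table is written from memory and should be checked against the headers of the kernel in use.

```python
# Assumed bit assignments (BTRFS_BLOCK_GROUP_* in the kernel headers).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_profile(value):
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    print(decode_profile(128))
```

Under that assumption, 128 is 1 << 7, so the kernel is rejecting RAID-5 as a target data profile, which matches the -dconvert=raid5 request.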

   Hugo.



Re: Btrfs balance invalid argument error

2013-05-10 Thread Hugo Mills
On Fri, May 10, 2013 at 11:43:34PM +0200, Marcus Lövgren wrote:
 Yes, you were right! Adding another drive to the array made it continue
 without errors. Is this already reported as a bug?

   I believe it has been, yes. I think we've even had a patch out for
it. I haven't looked to see if it's got into 3.10.

   Hugo.

 Thanks for the help,
 Marcus
 
 
 2013/5/10 Remco Hosman - Yerf IT re...@yerf-it.nl
 
  On May 10, 2013, at 10:21 PM, Hugo Mills h...@carfax.org.uk wrote:
 
   On Fri, May 10, 2013 at 10:07:56PM +0200, Marcus Lövgren wrote:
   Hi list,
  
   I am using kernel 3.9.0, btrfs-progs 0.20-rc1-253-g7854c8b.
  
   I have a three disk array of level single:
  
   # btrfs fi sh
   Label: none  uuid: 2e905f8f-e525-4114-afa6-cce48f77b629
  Total devices 3 FS bytes used 3.80TB
  devid 1 size 2.73TB used 2.25TB path /dev/sdd
  devid 2 size 2.73TB used 1.55TB path /dev/sdc
  devid 3 size 2.73TB used 0.00 path /dev/sdb
  
   Btrfs v0.20-rc1-253-g7854c8b
  
   # btrfs fi df /mnt/data
   Data: total=3.79TB, used=3.79TB
   System: total=4.00MB, used=420.00KB
   Metadata: total=6.01GB, used=4.87GB
  
  
   When running
   # btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/data
  
   I get
  
   ERROR: error during balancing '/mnt/data' - Invalid argument
   There may be more info in syslog - try dmesg | tail
  
   dmesg | tail says:
  
   btrfs: unable to start balance with target data profile 128
  
   Isn't it possible to convert raid level to raid5?
  
 Yes, it should be possible. It looks like the kernel's got a
   problem with it, which is odd because 3.9 should know about RAID-5.
  
 
  Weren't there some issues where the kernel or tools wanted 4 disks when
  converting to raid5?
 
  Remco
 
 Hugo.
  



Re: unlinked 10 orphans - something to worry about?

2013-05-11 Thread Hugo Mills
On Sat, May 11, 2013 at 02:27:27PM +0200, Clemens Eisserer wrote:
 Hi,
 
 I frequently get messages like "unlinked 10 orphans" in syslog
 (running linux 3.9.1), although I have never had a power outage or a
 kernel crash.
 Is this something to worry about, or just routine clean-up information?

   It's just information about a clean-up. Totally harmless.

   Hugo.



Re: RADI6 questions

2013-06-01 Thread Hugo Mills
On Sat, Jun 01, 2013 at 02:07:53PM -0700, ronnie sahlberg wrote:
 Hi List,
 
 I have a filesystem that is spanning about 10 devices.
 It is currently using RAID1 for both data and metadata.
 
 In order to get higher availability and be able to handle multi device 
 failures
 I would like to change from RAID1 to RAID6.
 
 
 Is it possible/stable/supported/recommended to change data from RAID1 to 
 RAID6 ?
 (I assume btrfs fi balance ...  is used for this?)

   Yes.

 Metadata is currently RAID1, is it supported to put metadata as RAID6 too?
 It would be odd to have lesser protection for metadata than data.
 Optimally I would like a mode where metadata is mirrored onto all the
 spindles in the filesystem, not just 2 in RAID1 or n in RAID6.

   Yes, that should be supported.

 I'm running a 3.8.0 kernel.

   The btrfs RAID-5 and RAID-6 implementations aren't really ready for
production use, so right now I wouldn't recommend using them for
anything other than testing purposes with data that's replaceable.

   Hugo.



Re: RAID10 total capacity incorrect

2013-06-02 Thread Hugo Mills
On Sun, Jun 02, 2013 at 05:17:11PM +0100, Tim Eggleston wrote:
 Hi list,
 
 I have a 4-device RAID10 array of 2TB drives on btrfs. It works
 great. I recently added an additional 4 drives to the array. There
 is only about 2TB in use across the whole array (which should have
 an effective capacity of about 8TB). However I have noticed that
 when I issue btrfs filesystem df against the mountpoint, in the
 total field, I get the same value as the used field:
 
 root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
 Data, RAID10: total=2.06TB, used=2.06TB
 System, RAID10: total=64.00MB, used=188.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID10: total=3.00GB, used=2.29GB
 
 Here's my btrfs filesystem show:
 
 root@mckinley:/# btrfs fi show
 Label: 'btrfsvol0'  uuid: 1a735971-3ad7-4046-b25b-e834a74f2fbb
   Total devices 8 FS bytes used 2.06TB
   devid 7 size 1.82TB used 527.77GB path /dev/sdk1
   devid 8 size 1.82TB used 527.77GB path /dev/sdg1
   devid 6 size 1.82TB used 527.77GB path /dev/sdi1
   devid 5 size 1.82TB used 527.77GB path /dev/sde1
   devid 4 size 1.82TB used 527.77GB path /dev/sdj1
   devid 2 size 1.82TB used 527.77GB path /dev/sdf1
   devid 1 size 1.82TB used 527.77GB path /dev/sdh1
   devid 3 size 1.82TB used 527.77GB path /dev/sdc1

   You have 8*527.77 GB = 4222.16 GB of raw space allocated for all
purposes. Since RAID-10 takes twice the raw bytes to store data, that
gives you 2111.08 GB of usable space so far.

   From the df output, 2.06 TB ~= 2109.44 GB is allocated as data, and
all of that space is used. 3.00 GB is allocated as metadata, and most
of that is used. That adds up (within rounding errors) to the 2111.08
GB above.
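
The arithmetic above can be checked in a few lines. This is just a restatement of the figures from the btrfs fi show and btrfs fi df output quoted in this message; the variable names are made up for the illustration.

```python
DEVICES = 8
USED_PER_DEVICE_GB = 527.77   # "used" column of btrfs fi show
RAID10_COPIES = 2             # RAID-10 stores every byte twice

# Raw space allocated across all devices, and what that holds after mirroring.
raw_allocated_gb = DEVICES * USED_PER_DEVICE_GB
usable_allocated_gb = raw_allocated_gb / RAID10_COPIES

# Allocation reported by btrfs fi df (TB and GB are powers of two here).
data_gb = 2.06 * 1024         # Data, RAID10: total=2.06TB
metadata_gb = 3.00            # Metadata, RAID10: total=3.00GB

if __name__ == "__main__":
    print(f"raw allocated:    {raw_allocated_gb:.2f} GB")
    print(f"usable allocated: {usable_allocated_gb:.2f} GB")
    print(f"data + metadata:  {data_gb + metadata_gb:.2f} GB")
```

The last two figures differ by a little over 1 GB, which is the rounding error mentioned above: the df values are only printed to two decimal places of a TB.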

   Additional space will be allocated from the available unallocated
space as the FS needs it.

 This is running the Ubuntu build of kernel 3.9.4 and btrfs-progs
 from git (v0.20-rc1-324-g650e656).
 
 Am I being an idiot and missing something here? I must admit that I
 still find the df output a bit cryptic (entirely my failure to
 understand, nothing else), but on another system with only a single
 device the total field returns the capacity of the device.

   That's probably already fully-allocated, so used=size in btrfs fi
show. If it's a single device, then you're probably not using any
replication, so the raw storage is equal to the possible storage.

   HTH,
   Hugo.



Re: RAID10 total capacity incorrect

2013-06-02 Thread Hugo Mills
On Sun, Jun 02, 2013 at 05:52:38PM +0100, Tim Eggleston wrote:
 Hi Hugo,
 
 Thanks for your reply, good to know it's not an error as such (just
 me being an idiot!).
 
 Additional space will be allocated from the available unallocated
 space as the FS needs it.
 
 So I guess my question becomes, how much of that available
 unallocated space do I have? Instinctively the btrfs df output feels
 like it's missing an equivalent to the size column from vanilla
 df.

   Look at btrfs fi show -- you have size and used there, so the
difference there will give you the unallocated space.
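
As a rough sketch of that subtraction, using the size and used figures from the btrfs fi show output earlier in the thread (all eight devices report the same values). The result is raw space, before any replication overhead, and the helper name is invented for the example.

```python
# (size, used) per device in GB. 1.82TB is converted with the tools'
# power-of-two convention, i.e. 1 TB = 1024 GB.
DEVICES = [(1.82 * 1024, 527.77)] * 8


def unallocated_gb(devices):
    """Sum of (size - used) over all devices: raw, not-yet-allocated space."""
    return sum(size - used for size, used in devices)


if __name__ == "__main__":
    print(f"{unallocated_gb(DEVICES):.2f} GB raw unallocated")
```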

 Is there a method of getting this in a RAID situation? I understand
 that btrfs RAID is more complicated than md RAID, so it's ok if the
 answer at this point is no...

   Not in any obvious (and non-surprising) way. Basically, any way you
could work it out is going to give someone a surprise because they
were thinking of it some other way around. The problem is that until
the space is allocated, the FS can't know how that space needs to be
allocated (to data/metadata, or with what replication type and hence
overheads), so we can't necessarily give a reliable estimate.
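
To make the point concrete, here is a sketch of how far the answer moves depending on what the space ends up being used for. It assumes equal-sized devices, that all of the remaining raw space goes to one profile, and the usual capacity ratios for each RAID level; the 10687.28 GB input is the raw unallocated figure implied by the btrfs fi show output (8 devices, 1.82 TB size, 527.77 GB used each).

```python
def usable_estimates(raw_unallocated_gb, devices):
    """Usable space if ALL remaining raw space were allocated with one profile."""
    return {
        "single": raw_unallocated_gb,
        "raid1": raw_unallocated_gb / 2,
        "raid10": raw_unallocated_gb / 2,
        "raid5": raw_unallocated_gb * (devices - 1) / devices,
        "raid6": raw_unallocated_gb * (devices - 2) / devices,
    }


if __name__ == "__main__":
    for profile, gb in usable_estimates(10687.28, 8).items():
        print(f"{profile:>6}: {gb:9.2f} GB")
```

The same raw space gives anything from about 5.3 TB to about 10.7 TB of usable space, which is why a single "free" number is bound to surprise someone.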

   Hugo.



Re: RAID10 total capacity incorrect

2013-06-02 Thread Hugo Mills
On Sun, Jun 02, 2013 at 12:52:40PM -0400, Chris Murphy wrote:
 
 On Jun 2, 2013, at 12:17 PM, Tim Eggleston li...@timeggleston.co.uk wrote:
  
  root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
  Data, RAID10: total=2.06TB, used=2.06TB
  System, RAID10: total=64.00MB, used=188.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID10: total=3.00GB, used=2.29GB
  
  
  Am I being an idiot and missing something here? 

 No, it's confusing. btrfs fi df doesn't show free space. The first
 value is what space the fs has allocated for the data usage type,
 and the 2nd value is how much of that allocation is actually being
 used. I personally think the allocated value is useless for mortal
 users. I'd rather have some idea of what free space I have left, and
 the regular df command presents this in an annoying way also because
 it shows the total volume size, not accounting for the double
 consumption of raid1. So no matter how you slice it, it's confusing.

   It's the nature of the beast, unfortunately. So far, nobody's
managed to come up with a simple method of showing free space and
space usage that isn't going to be misleading somehow.

   Hugo.



Re: oops at mount

2013-06-03 Thread Hugo Mills
On Mon, Jun 03, 2013 at 01:56:10PM +0200, Papp Tamas wrote:
 On 05/30/2013 02:55 PM, Stefan Behrens wrote:
 
 On Thu, 30 May 2013 08:32:35 -0400, Josef Bacik wrote:
 On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
 hi All,
 
 I'm new on the list.
 
 System:
 Distributor ID:Ubuntu
 Description:   Ubuntu 13.04
 Release:   13.04
 Codename:  raring
 
 Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 
 x86_64 x86_64 x86_64 GNU/Linux
 
 The symptom is the same with Saucy 3.9 kernel.
 
 Can you try btrfs-next
 
 git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
 
 if it's still not fixed please file a bug at bugzilla.kernel.org and make 
 sure
 the component is set to btrfs.  Thanks,
 
 Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel
 X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with
 the ID INTEL SSDSA2M040, I've tested whether they honor the flush
 request. And these two SSDs don't do so, they ignore it. If you cut the
 power after a flush request completes, the data that was written before
 the flush request is gone, the write cache was _not_ flushed.
 
 You can only disable the write cache during/after every boot with
 "hdparm -W 0 /dev/sd..." (which reduces the SSD's write speed to about
 4 MB/s), or avoid such SSDs, or prepare to restore from backup occasionally.
 
 Basically it means it's not safe to use this SSD?

   Correct.

 I used it for 2 years with ext4 without any issue, before I switched
 to btrfs (on the root partition). In the meantime btrfs also was
 quite stable on my /data partition.
 
 After I reinstalled the system with btrfs, this issue happened twice.
 But anyway, I thought CoW should be able to handle these kinds of issues by 
 design. Am I wrong?

   CoW writes out everything that's going to be changed first, and
finally writes one piece of data which points to the new version of
the data. *Provided* you can guarantee that the final piece of data
(the superblock) gets written only after everything else has made it
to permanent storage, then everything is good.

   However, most hardware (and most operating systems) reorder the
data which is being sent to the disk, for performance reasons. This is
fine, as long as you can enforce the dependency in some way -- this is
what barriers/flushes do: they say "ensure that all of this is fully
written out to real permanent storage before you try to write the
superblock".

   If the hardware ignores flushes or barriers, there's no mechanism
for ensuring that the data is fully consistent, because you may find
that the superblock gets reordered to be written before some of the
other writes to the device. If that happens and then the power gets
cut before the rest of the data can be written, you have a corrupt
filesystem.
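
The same write-everything, flush, then switch-the-pointer ordering can be shown at the file level. The sketch below is an application-level analogue and not how btrfs itself is implemented: the file names are invented, and os.fsync() stands in for the flush. It carries the same caveat as above: if the drive acknowledges a flush it has not actually performed, the ordering guarantee is lost here too.

```python
import os


def publish(directory, payload):
    """Write payload to a new file, flush it, then atomically repoint 'current'."""
    new_path = os.path.join(directory, "data.new")
    with open(new_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())      # the "flush": new data must be durable first

    current = os.path.join(directory, "current")
    os.replace(new_path, current)  # the single "pointer" update, done last

    # Flush the directory too, so the rename itself reaches stable storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return current
```

A reader that only ever opens "current" sees either the complete old version or the complete new one, provided the storage honours the flush.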

   Hugo.



Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Hugo Mills
 nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323
 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner xt_NFQUEUE xt_NFLOG
 nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange
 xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark
 xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat
 nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack
 iptable_mangle nfnetlink iptable_filter ip_tables x_tables bridge stp
 llc rtc snd_hda_codec_realtek fbcon bitblit softcursor font nouveau
 video mxm_wmi cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit evdev
 drm_kms_helper snd_hda_intel ttm snd_hda_codec drm i2c_piix4 pcspkr
 snd_pcm serio_raw snd_page_alloc snd_timer k8temp snd i2c_core processor
 button thermal_sys sky2 wmi backlight fb fbdev pata_acpi firewire_ohci
 firewire_core pata_atiixp usbhid pata_jmicron sata_sil24
 kernel: Pid: 10980, comm: btrfs-transacti Tainted: GW 3.8.13-gentoo #1
 kernel: Call Trace:
 kernel: [811d3600] ? btrfs_printk+0x12/0xc2
 kernel: [810289c8] ? warn_slowpath_common+0x78/0x8c
 kernel: [81028a74] ? warn_slowpath_fmt+0x45/0x4a
 kernel: [811d5e00] ? btrfs_release_path+0x5e/0x79
 kernel: [811d36ed] ? __btrfs_abort_transaction+0x3d/0xad
 kernel: [811ed97b] ? btrfs_save_ino_cache+0x1d4/0x348
 kernel: [8142ce4c] ? commit_fs_roots.isra.25+0xa1/0x14a
 kernel: [81237a0f] ? btrfs_scrub_pause+0xd5/0xe4
 kernel: [811f4f1a] ? btrfs_commit_transaction+0x3f9/0x93c
 kernel: [810427f0] ? abort_exclusive_wait+0x79/0x79
 kernel: [811f5a8c] ? start_transaction+0x311/0x408
 kernel: [811eed7e] ? transaction_kthread+0xd1/0x16d
 kernel: [811eecad] ? btrfs_alloc_root+0x34/0x34
 kernel: [810420b3] ? kthread+0xad/0xb5
 kernel: [81042006] ? __kthread_parkme+0x5e/0x5e
 kernel: [814315ac] ? ret_from_fork+0x7c/0xb0
 kernel: [81042006] ? __kthread_parkme+0x5e/0x5e
 kernel: ---[ end trace b584e8ceb6422945 ]---
 kernel: BTRFS error (device sdf) in btrfs_save_ino_cache:471: error 28
 kernel: btrfs is forced readonly
 kernel: BTRFS warning (device sdf): Skipping commit of aborted transaction.
 kernel: BTRFS error (device sdf) in cleanup_transaction:1391: error 28
 
 
 



Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Hugo Mills
On Wed, Jun 05, 2013 at 04:28:33PM +0100, Martin wrote:
 On 05/06/13 16:05, Hugo Mills wrote:
  On Wed, Jun 05, 2013 at 03:57:42PM +0100, Martin wrote:
  Dear Devs,
  
  I have x4 4TB HDDs formatted with:
  
  mkfs.btrfs -L bu-16TB_0 -d raid1 -m raid1 /dev/sd[cdef]
  
  
  /etc/fstab mounts with the options:
  
  noatime,noauto,space_cache,inode_cache
  
  
  All on kernel 3.8.13.
  
  
  Upon using rsync to copy some heavily hardlinked backups from
  ReiserFS, I've seen:
  
  
  The following "block rsv returned -28" is repeated 7 times until
  there is a call trace for:
  
  This is ENOSPC. Can you post the output of "btrfs fi df
  /mountpoint" and "btrfs fi show", please?
 
 
 btrfs fi df:
 
 Data, RAID1: total=2.85TB, used=2.84TB
 Data: total=8.00MB, used=0.00
 System, RAID1: total=8.00MB, used=412.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=27.00GB, used=25.82GB
 Metadata: total=8.00MB, used=0.00
 
 
 btrfs fi show:
 
 Label: 'bu-16TB_0'  uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9
 Total devices 4 FS bytes used 2.87TB
 devid 4 size 3.64TB used 1.44TB path /dev/sdf
 devid 3 size 3.64TB used 1.44TB path /dev/sde
 devid 1 size 3.64TB used 1.44TB path /dev/sdc
 devid 2 size 3.64TB used 1.44TB path /dev/sdd

   OK, so you've got plenty of space to allocate. There were some
issues in this area (block reserves returning ENOSPC, specifically, I
think, when there's still space available to allocate) that were fixed
between 3.8 and 3.9 (and probably some between 3.9 and 3.10-rc as
well), so upgrading your kernel _may_ help here.
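
For reference, the -28 in "block rsv returned -28" is a negated errno value. The sketch below looks the number up with Python's errno module; errno numbers are platform-specific, so the mapping of 28 to ENOSPC is stated here for Linux, and the helper name is made up for the example.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return value such as -28 to (errno name, message)."""
    number = abs(code)
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


if __name__ == "__main__":
    print(describe_kernel_error(-28))
```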

   Something else that may possibly help as a sticking-plaster is to
write metadata more slowly, so that you don't have quite so much of it
waiting to be written out for the next transaction. Practically, this
may involve things like running sync in a loop. It's definitely a
horrible hack, but it may help if you're desperate for a quick fix
until the metadata-heavy copy has finished and you can upgrade your
kernel...
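
A minimal sketch of the "sync in a loop" idea, to be run alongside the copy. The interval and round count are arbitrary placeholders, os.sync() is only available on Unix-like systems, and nothing here is known to cure the ENOSPC problem; it just forces dirty data out at regular intervals.

```python
import os
import time


def sync_periodically(interval_s, rounds, flush=os.sync, sleep=time.sleep):
    """Call flush() `rounds` times, sleeping interval_s seconds between calls."""
    done = 0
    for i in range(rounds):
        flush()                 # ask the kernel to write out dirty buffers
        done += 1
        if i + 1 < rounds:      # no need to wait after the final flush
            sleep(interval_s)
    return done


if __name__ == "__main__":
    # Hypothetical settings: one sync every 5 seconds for about an hour.
    sync_periodically(interval_s=5, rounds=720)
```

The flush and sleep parameters exist only so the loop can be exercised without touching the disk.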

   Hugo.

 And df -h:
 
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sde 15T  5.8T  8.9T  40% /mnt/sata16
 
 
 
 
  WARNING: at fs/btrfs/super.c:256
  __btrfs_abort_transaction+0x3d/0xad().
  
  Then, the mount is set read-only.
  
  
  How to fix or debug?
  
  Thanks, Martin
  
  
  
  kernel: [ cut here ]
  kernel: WARNING: at fs/btrfs/extent-tree.c:6372 btrfs_alloc_free_block+0xd3/0x29c()
  kernel: Hardware name: GA-MA790FX-DS5
  kernel: btrfs: block rsv returned -28
  kernel: Modules linked in: raid456 async_raid6_recov async_memcpy
  async_pq async_xor xor async_tx raid6_pq act_police cls_basic cls_flow
  cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq
  xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_time xt_connlimit
  xt_realm xt_addrtype xt_comment xt_recent xt_policy xt_nat ipt_ULOG
  ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set
  nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp
  nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_conntrack_tftp
  nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp
  nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
  nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc
  nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_owner
  xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit
  xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp
  xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state
  iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
  nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter
  ip_tables x_tables bridge stp llc rtc snd_hda_codec_realtek fbcon
  bitblit softcursor font nouveau video mxm_wmi cfbfillrect cfbimgblt
  cfbcopyarea i2c_algo_bit evdev drm_kms_helper snd_hda_intel ttm
  snd_hda_codec drm i2c_piix4 pcspkr snd_pcm serio_raw snd_page_alloc
  snd_timer k8temp snd i2c_core processor button thermal_sys sky2 wmi
  backlight fb fbdev pata_acpi firewire_ohci firewire_core pata_atiixp
  usbhid pata_jmicron sata_sil24
  kernel: Pid: 10980, comm: btrfs-transacti Not tainted 3.8.13-gentoo #1
  kernel: Call Trace:
  kernel: [811e6600] ? btrfs_init_new_buffer+0xef/0xf6
  kernel: [810289c8] ? warn_slowpath_common+0x78/0x8c
  kernel: [81028a74] ? warn_slowpath_fmt+0x45/0x4a
  kernel: [81278f2c] ? ___ratelimit+0xc4/0xd0
  kernel: [811e66da] ? btrfs_alloc_free_block+0xd3/0x29c
  kernel: [811d68e5] ? __btrfs_cow_block+0x136/0x454
  kernel: [811f0d47] ? btrfs_buffer_uptodate+0x40/0x56
  kernel: [811d6d8c] ? btrfs_cow_block+0x132/0x19d
  kernel: [811da606] ? btrfs_search_slot+0x2f5/0x624
  kernel: [811dbc5a] ? btrfs_insert_empty_items+0x5c/0xaf
  kernel: [811e5089] ? run_clustered_refs+0x852/0x8e6
  kernel

Re: btrfs raid1 on 16TB goes read-only after btrfs: block rsv returned -28

2013-06-05 Thread Hugo Mills
On Wed, Jun 05, 2013 at 04:59:57PM +0100, Martin wrote:
 On 05/06/13 16:43, Hugo Mills wrote:
  On Wed, Jun 05, 2013 at 04:28:33PM +0100, Martin wrote:
  btrfs fi df:
  
  Data, RAID1: total=2.85TB, used=2.84TB
  Data: total=8.00MB, used=0.00
  System, RAID1: total=8.00MB, used=412.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=27.00GB, used=25.82GB
  Metadata: total=8.00MB, used=0.00
  
  
  btrfs fi show:
  
  Label: 'bu-16TB_0'  uuid: 8fd9a0a8-9109-46db-8da0-396d9c6bc8e9
  Total devices 4 FS bytes used 2.87TB
  devid 4 size 3.64TB used 1.44TB path /dev/sdf
  devid 3 size 3.64TB used 1.44TB path /dev/sde
  devid 1 size 3.64TB used 1.44TB path /dev/sdc
  devid 2 size 3.64TB used 1.44TB path /dev/sdd
  
 
 Thanks for that. I can give kernel 3.9.4 a try. For a giggle, I'll try
 first with nice 19 and syncs in a loop...
 
 
 One confusing bit is why the Data, RAID1: total=2.85TB from btrfs
 fi df?

   Because you've got enough raw space allocated for 2.85 TiB of data;
that's 5.7 TiB of actual bytes, because you're using RAID-1 for it.
That should add up to somewhere near the total of the used values in
btrfs fi show. The difference will be accounted for in metadata,
system, and the inevitable rounding errors. All the values are shown
in powers-of-two -- i.e. binary (IEC) units such as GiB and TiB, not SI
units, despite the use of SI prefixes.
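
Both points can be checked against the quoted output. The sketch below doubles every RAID-1 allocation from btrfs fi df, compares the sum with the per-device "used" values from btrfs fi show, and converts one device size from binary to SI units. The small single-profile allocations (8 MB and 4 MB) are left out as negligible, and the names are invented for the example.

```python
TIB = 2 ** 40   # what the tools print as "TB"
GIB = 2 ** 30   # what the tools print as "GB"
MIB = 2 ** 20   # what the tools print as "MB"

RAID1_COPIES = 2

# RAID-1 allocations from "btrfs fi df", in bytes.
allocated = {
    "data": 2.85 * TIB,
    "metadata": 27.00 * GIB,
    "system": 8.00 * MIB,
}

# Raw bytes consumed on disk, expressed in TiB.
raw_from_df_tib = sum(allocated.values()) * RAID1_COPIES / TIB

# Four devices, each showing "used 1.44TB" in "btrfs fi show".
raw_from_show_tib = 4 * 1.44


def tib_to_si_tb(tib):
    """Convert a power-of-two TiB figure to SI terabytes (10**12 bytes)."""
    return tib * TIB / 10 ** 12


if __name__ == "__main__":
    print(f"raw from df:   {raw_from_df_tib:.2f} TiB")
    print(f"raw from show: {raw_from_show_tib:.2f} TiB")
    print(f"3.64 TiB device = {tib_to_si_tb(3.64):.2f} TB (SI)")
```

The two raw totals agree to within rounding, and the conversion shows why a drive sold as 4 TB is listed with a size of 3.64TB.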

   Hugo.



Re: Moved partition via dd

2013-06-09 Thread Hugo Mills
On Sun, Jun 09, 2013 at 12:44:23PM +0200, André Schlichting wrote:
 Am 09.06.2013 00:57, schrieb Chris Murphy:
 The next issue:
 
 if=/dev/sdc2 skip=$((245547520-33024)) seek=0 of=/dev/sdc2
 
 You have a skip (skip n block from input) value well inside of sdc2. It 
 seems you should have skipped from sdc not sdc2, and should have used the 
 old start value for sdc2 which was just 245547520, and you needed to specify 
 a count value in order to get the correct number of blocks, which would have 
 been 732566527-245547520. Then write those blocks to sdc2 (which makes seek= 
 unnecessary).
 
 
 Chris Murphy
 
 
 /dev/sdc2 at this moment was already the new partition with
 boundaries 33024 to 732566640 with the old partition inside.
 Therefore I used skip=old start - new start, which inside of sdc2
 points to the start of the old partition. I didn't worry about the
 count, because the partition was at the end of the disk.
 
 I actually think that the move of the partition was no problem. I
 guess that btrfs has some absolute references which have to be
 adjusted and now has some problems with sectors not at the right
 place.

   No, it doesn't. All the position values in the FS are either
relative to the containing block device (i.e. the partition, in this
case), or are based on an internal virtual address space -- which is
itself mapped in terms of the containing block device(s).

 The following error from btrfsck
  Check tree block failed, want=959572647936, have=13587293097915834379
 suggests that 959572647936 is a way off...

   That just says to me that you've got garbage metadata -- usually a
good indication that there's some file data where there should be
metadata, which would further suggest that you've somehow moved the
wrong data (or the right data into the wrong place).

 Maybe first, the principal question: Can one just move a
 btrfs-partition to the left by
 * delete partition
 * create partition moved
 * dd data from old to new partition
 Or does one have to adjust some references inside the btrfs filesystem?

   In theory, that process should be safe. In fact, I'm not aware of
*any* filesystem which is dependent on the position of the partition
within a larger device.

   I think at this point, you should try testdisk to see if it can
identify your FS's superblock. If that doesn't work, then restore from
backup is likely to be your fastest route to recovery.

   Hugo.

http://www.cgsecurity.org/wiki/TestDisk



Re: raid0, raid1, raid5, what to choose?

2013-06-13 Thread Hugo Mills
On Thu, Jun 13, 2013 at 11:09:00PM +0200, Hendrik Friedel wrote:
 Hello,
 
 I'd appreciate your recommendation on this:
 
 I have three hdd with 3TB each. I intend to use them as raid5 eventually.
 currently I use them like this:
 
 # mount|grep sd
 /dev/sda1 on /mnt/Datenplatte type ext4
 /dev/sdb1 on /mnt/BTRFS/Video type btrfs
 /dev/sdb1 on /mnt/BTRFS/rsnapshot type btrfs
 
 #df -h
 /dev/sda1   2,7T  1,3T  1,3T  51% /mnt/Datenplatte
 /dev/sdb1   5,5T  5,4T   93G  99% /mnt/BTRFS/Video
 /dev/sdb1   5,5T  5,4T   93G  99% /mnt/BTRFS/rsnapshot
 
 Now, what surprises me, and here I lack memory- is that sdb appears
 twice.. I think, I created a raid1, but how can I find out?

   Appearing twice in that list is more an indication that you have
multiple subvolumes -- check the subvol= options in /etc/fstab.

 #/usr/local/smarthome# ~/btrfs/btrfs-progs/btrfs fi show /dev/sdb1
 Label: none  uuid: 989306aa-d291-4752-8477-0baf94f8c42f
 Total devices 2 FS bytes used 2.68TB
 devid 2 size 2.73TB used 2.73TB path /dev/sdc1
 devid 1 size 2.73TB used 2.73TB path /dev/sdb1
 
 Now, I wanted to convert it to raid0, because I lack space and
 redundancy is not important for the Videos and the Backup, but this
 fails:
 ~/btrfs/btrfs-progs/btrfs fi balance start -dconvert=raid0  /mnt/BTRFS/
 ERROR: error during balancing '/mnt/BTRFS/' - Inappropriate ioctl for device

   /mnt/BTRFS isn't a btrfs subvol, according to what you have listed
above. It's a subdirectory in /mnt which contains two subdirs
(Video and rsnapshot) which are used as mountpoints for subvolumes.

   Try running the above command with /mnt/BTRFS/Video instead (or
rsnapshot -- it doesn't matter which).

 dmesg does not help here.
 
 Anyway: This gave me some time to think about this. In fact, as soon
 as raid5 is stable, I want to have all three as a raid5. Will this
 be possible with a balance command? If so: will this be possible as
 soon as raid5 is stable, or will I have to wait longer?

   Yes, it's possible to convert to RAID-5 right now -- although the
code's not settled down into its final form quite yet. Note that
RAID-5 over two devices won't give you any space benefits over RAID-1
over two devices. (Or any reliability benefits either).
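   As a rough illustration of the capacity point above, here is a
minimal sketch. It assumes devices of equal size and ignores metadata
overhead; the function names are made up for this example and are not
part of btrfs-progs.

```python
def raid1_usable(n_devices: int, size_tb: float) -> float:
    """Usable space when every block is stored twice (btrfs RAID-1)."""
    if n_devices < 2:
        raise ValueError("RAID-1 needs at least two devices")
    return n_devices * size_tb / 2


def raid5_usable(n_devices: int, size_tb: float) -> float:
    """Usable space when one device's worth of each stripe holds parity."""
    if n_devices < 2:
        raise ValueError("RAID-5 needs at least two devices")
    return (n_devices - 1) * size_tb


# Compare two and three 3TB devices.
for n in (2, 3):
    print(n, raid1_usable(n, 3.0), raid5_usable(n, 3.0))
```

   With two devices both functions return the same figure, which is
why the conversion only pays off once the third disk is part of the
filesystem.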

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Are you the man who rules the Universe? Well,  I ---   
  try not to.   




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
 Hi,
 
 I've observed a rather strange behaviour while trying to mount two
 identical copies of the same image to different mount points.
 Each modification to one image is also performed in the second one.
 
 Example:
 dd if=/dev/sda? of=image1 bs=1M
 cp image1 image2
 mount -o loop image1 m1
 mount -o loop image2 m2
 
 touch m2/hello
 ls -la m1  //will now also include a file called hello
 
 Is this behaviour intentional and known or should I create a bug-report?

   It's known, and not desired behaviour. The problem is that you've
ended up with two filesystems with the same UUID, and the FS code gets
rather confused about that. The same problem exists with LVM snapshots
(or other block-device-layer copies).

   The solution is a combination of a tool to scan an image and change
the UUID (offline), and of some code in the kernel that detects when
it's being told about a duplicate image (rather than an additional
device in the same FS). Neither of these has been written yet, I'm
afraid.

 I've deleted quite a bunch of files on my production system because of this...

   Oops. I'm sorry to hear that. :(

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Welcome to Rivendell,  Mr Anderson... ---  




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:22:07AM +, Gabriel de Perthuis wrote:
 On Thu, 20 Jun 2013 10:16:22 +0100, Hugo Mills wrote:
  On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
  Hi,
  
  I've observed a rather strange behaviour while trying to mount two
  identical copies of the same image to different mount points.
  Each modification to one image is also performed in the second one.
 
  touch m2/hello
  ls -la m1  //will now also include a file called hello
  
  Is this behaviour intentional and known or should I create a bug-report?
  
 It's known, and not desired behaviour. The problem is that you've
  ended up with two filesystems with the same UUID, and the FS code gets
  rather confused about that. The same problem exists with LVM snapshots
  (or other block-device-layer copies).
  
 The solution is a combination of a tool to scan an image and change
  the UUID (offline), and of some code in the kernel that detects when
  it's being told about a duplicate image (rather than an additional
  device in the same FS). Neither of these has been written yet, I'm
  afraid.
 
 To clarify, the loop devices are properly distinct, but the first
 device ends up mounted twice.
 
 I've had a look at the vfs code, and it doesn't seem to be uuid-aware,
 which makes sense because the uuid is a property of the superblock and
 the fs structure doesn't expose it.  It's a Btrfs problem.

   Yes, it is. (I didn't intend, however obliquely, to imply that it
wasn't).

 Instead of redirecting to a different block device, Btrfs could and
 should refuse to mount an already-mounted superblock when the block
 device doesn't match, somewhere in or below btrfs_mount.  Registering
 extra, distinct superblocks for an already mounted raid is a different
 matter, but that isn't done through the mount syscall anyway.

   The problem here is that you could quite legitimately mount
/dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
both part of the same filesystem. So you can't simply prevent mounting
based on the device that the mount's being done with.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I know of three kinds: hot, ---   
cool,  and what-time-does-the-tune-start?




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
  Instead of redirecting to a different block device, Btrfs could and
  should refuse to mount an already-mounted superblock when the block
  device doesn't match, somewhere in or below btrfs_mount.  Registering
  extra, distinct superblocks for an already mounted raid is a different
  matter, but that isn't done through the mount syscall anyway.
  
 The problem here is that you could quite legitimately mount
  /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
  UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
  both part of the same filesystem. So you can't simply prevent mounting
  based on the device that the mount's being done with.
 
 Okay.  The check should rely on a list of known block devices
 for a given filesystem uuid.

   And this is where we fail currently -- that list is held by the
btrfs module in the kernel, and is constructed on the basis of what
btrfs dev scan finds by looking at superblocks on block devices.
Currently, there's no method implemented for determining whether a
block device with a legitimate btrfs superblock on it is a duplicate
of another device, or whether it's a newly-discovered device which is
part of an as-yet incompletely specified multi-device FS.

   I think it should be possible to look up the device ID as well, and
complain (loudly, to the user, and in the kernel) at btrfs dev scan
time if we see duplicates. That would deal with the problem at the
earliest point of confusion.
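   The idea of checking the device ID at scan time can be sketched as
a toy model. This is not kernel code: the class and method names are
hypothetical, and a real implementation would also have to cope with
the same device legitimately showing up under more than one path.

```python
class ScanRegistry:
    """Toy model of the per-filesystem device list built by a device scan."""

    def __init__(self):
        # Maps (fsid, devid) -> path of the block device that claimed it.
        self._seen = {}

    def register(self, fsid: str, devid: int, path: str) -> bool:
        """Record a scanned superblock; return False on a suspected duplicate."""
        key = (fsid, devid)
        owner = self._seen.get(key)
        if owner is not None and owner != path:
            print(f"WARNING: {path} duplicates fsid {fsid} devid {devid} "
                  f"already seen on {owner}")
            return False
        self._seen[key] = path
        return True


registry = ScanRegistry()
ok_a = registry.register("AA1234", 1, "/dev/sda")    # first device of the FS
ok_b = registry.register("AA1234", 2, "/dev/sdb")    # second device, same FS
ok_c = registry.register("AA1234", 1, "/dev/loop1")  # copied image: same devid
```

   The second device of a multi-device filesystem is accepted because
its devid differs, while the copied image is flagged because both its
fsid and its devid are already taken.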

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I know of three kinds: hot, ---   
cool,  and what-time-does-the-tune-start?




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 08:22:12AM -0500, Kevin O'Kelley wrote:
 Thank you for your reply. I appreciate it. Unfortunately this issue
 is a deal killer for us. The ability to take very fast snapshots and
 replicate them to another site is key for us. We just can't use Btrfs
 with this setup. That's too bad. Good luck and thank you.

   If you want to make fast atomic incremental copies of btrfs to a
remote system, then btrfs send/receive may be what you're looking for.

   Hugo.

 Sent from my iPhone
 
 On Jun 20, 2013, at 5:56 AM, Hugo Mills h...@carfax.org.uk wrote:
 
  On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
  Instead of redirecting to a different block device, Btrfs could and
  should refuse to mount an already-mounted superblock when the block
  device doesn't match, somewhere in or below btrfs_mount.  Registering
  extra, distinct superblocks for an already mounted raid is a different
  matter, but that isn't done through the mount syscall anyway.
  
The problem here is that you could quite legitimately mount
  /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
  UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
  both part of the same filesystem. So you can't simply prevent mounting
  based on the device that the mount's being done with.
  
  Okay.  The check should rely on a list of known block devices
  for a given filesystem uuid.
  
And this is where we fail currently -- that list is held by the
  btrfs module in the kernel, and is constructed on the basis of what
  btrfs dev scan finds by looking at superblocks on block devices.
  Currently, there's no method implemented for determining whether a
  block device with a legitimate btrfs superblock on it is a duplicate
  of another device, or whether it's a newly-discovered device which is
  part of an as-yet incompletely specified multi-device FS.
  
I think it should be possible to look up the device ID as well, and
  complain (loudly, to the user, and in the kernel) at btrfs dev scan
  time if we see duplicates. That would deal with the problem at the
  earliest point of confusion.
  
Hugo.
  

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Computer Science is not about computers,  any more than --- 
 astronomy is about telescopes.  




Re: raid1 inefficient unbalanced filesystem reads

2013-06-28 Thread Hugo Mills
On Fri, Jun 28, 2013 at 11:34:18AM -0400, Josef Bacik wrote:
 On Fri, Jun 28, 2013 at 02:59:45PM +0100, Martin wrote:
  On kernel 3.8.13:
  
  Using two equal performance SATAII HDDs, formatted for btrfs raid1 for
  both data and metadata and:
  
  The second disk appears to suffer about x8 the read activity of the
  first disk. This causes the second disk to quickly get maxed out whilst
  the first disk remains almost idle.
  
  Total writes to the two disks is equal.
  
  This is noticeable for example when running emerge --sync or running
  compiles on Gentoo.
  
  
  Is this a known feature/problem or worth looking/checking further?
 
 So we balance based on pids, so if you have one process that's doing a lot
 of work it will tend to be stuck on one disk, which is why you are seeing
 that kind of imbalance.  Thanks,

   The other scenario is where the number of processes executed for
each compilation step happens to be even: the heavy-duty file-reading
parts will then always hit the same parity of PID number. If each tool
has, say, a small wrapper around it, then the wrappers will all run as
(say) odd PIDs, and the tools themselves will run as even PIDs...
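   The parity effect can be shown with a small sketch. It assumes the
mirror is picked as the PID modulo the number of copies, which matches
the description in this thread but is not copied from the kernel
source; the PIDs and the wrapper/tool pattern are invented for
illustration.

```python
def pick_mirror(pid: int, num_mirrors: int = 2) -> int:
    """Choose which RAID-1 copy to read from, based only on the reader's PID."""
    return pid % num_mirrors


# Hypothetical build: each step forks a wrapper, which then forks the real
# tool, so PIDs are handed out in pairs (wrapper, tool, wrapper, tool, ...).
first_pid = 1001
steps = 8
wrapper_pids = [first_pid + 2 * i for i in range(steps)]
tool_pids = [pid + 1 for pid in wrapper_pids]

reads_per_mirror = [0, 0]
for pid in tool_pids:  # only the tools do the heavy reading
    reads_per_mirror[pick_mirror(pid)] += 1

print(reads_per_mirror)
```

   Every tool PID in this run is even, so all eight heavy readers land
on the same copy and the other disk stays idle.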

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Startle, startle, little twink.  How I wonder what you think. ---  




Re: Hardware failure or btrfs issue?

2013-07-02 Thread Hugo Mills
On Mon, Jul 01, 2013 at 11:56:30PM +0100, Peter Chant wrote:
 Sirs,
 
 my recently slowing file system is now going read only after trying
 a defrag or other operation.  I'm wondering whether this is the
 result of a hardware failure or a btrfs or some other issue.  Output
 of dmesg:

[snip]
 [  127.862825] btrfs: corrupt leaf, bad key order:
 block=2837196627968,root=1, slot=121
[snip]

   This is usually an indication that you have bad hardware -- I'd
suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
anything, can be done to fix the error on the disk right now.

 Not that I've done anything other than a cursory check but it looks
 like the read only data is fine.

   Might be a good idea to use that to refresh your backups, just in
case my prediction about the fixability is correct.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- How deep will this sub go? Oh,  she'll go all the way to ---   
the bottom if we don't stop her.




Re: Hardware failure or btrfs issue?

2013-07-02 Thread Hugo Mills
On Tue, Jul 02, 2013 at 06:36:48PM +0100, Peter Chant wrote:
 On 07/02/2013 08:29 AM, Hugo Mills wrote:
 This is usually an indication that you have bad hardware -- I'd
 suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
 anything, can be done to fix the error on the disk right now.
 
 Thanks, appreciated.
 
 Hmm.  I've got one stick of ram out of the machine due to testing as
 I had some freezes last week.

   So the damage probably happened then, if that stick is bad.
Filesystems have this irritating habit of remembering things done to
them across reboots. :)

   Hugo.

 If it were one of the RAM, PSU and CPU then I'm unsure why this IO
 issue only surfaces on the HDD and not the SSD.  I ordered a new HDD
 last night, before reading your post.  If it's not the disk I'll go
 raid1.  If it is the disk then I'll probably find out.
 
 Not that I've done anything other than a cursory check but it looks
 like the read only data is fine.
 Might be a good idea to use that to refresh your backups, just in
 case my prediction about the fixability is correct.
 
 Well, first option is to drop in the new disk, freshly format it and
 copy the data across (not add it as a second disk).  If that fails,
 the last backup was Wednesday.  I've not done much of note since then
 apart from trying to fix the disk issues.
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The glass is neither half-full nor half-empty; it is twice as ---  
large as it needs to be. 




Re: [PATCH] btrfs-progs: per-thread, per-call pretty buffer

2013-07-10 Thread Hugo Mills
   Sorry to be a pain in the arse at this late stage of the patch, but
I've only just noticed.

On Wed, Jul 10, 2013 at 04:30:15PM +0200, David Sterba wrote:
  static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
 - "PB", "EB", "ZB", "YB"};
 -char *pretty_sizes(u64 size)
 + "PB", "EB"};

   These are SI (power of 10) prefixes...

 +void pretty_size_snprintf(u64 size, char *str, size_t str_bytes)
  {
   int num_divs = 0;
 -int pretty_len = 16;
   float fraction;
 - char *pretty;
 +
 + if (str_bytes == 0)
 + return;
  
   if( size < 1024 ){
   fraction = size;
 @@ -1172,13 +1173,13 @@ char *pretty_sizes(u64 size)
   num_divs ++;
   }
  
 - if (num_divs >= ARRAY_SIZE(size_strs))
 - return NULL;
 + if (num_divs >= ARRAY_SIZE(size_strs)) {
 + str[0] = '\0';
 + return;
 + }
   fraction = (float)last_size / 1024;

   ... and this is working in IEC (power of 2) units.

   Can we fix this discrepancy, please? Also note that SI uses "k" for
10^3, but IEC uses "K" for 2^10. Just inserting an "i" in the middle of
each element of size_strs should deal with the problem.
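   To make the units point concrete, here is a minimal Python sketch
of the same formatting logic with IEC suffixes. It is an illustration
only, not the C code under review; the name pretty_size and the choice
to print plain bytes with a "B" suffix are assumptions of this sketch.

```python
IEC_SUFFIXES = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def pretty_size(size: int) -> str:
    """Format a byte count with power-of-two (IEC) units."""
    if size < 0:
        raise ValueError("size must be non-negative")
    value = float(size)
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024 and index < len(IEC_SUFFIXES) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{size}{IEC_SUFFIXES[0]}"
    return f"{value:.2f}{IEC_SUFFIXES[index]}"


print(pretty_size(1536))
```

   Because the divisor is 1024, the suffixes have to be the binary
ones; keeping "KB" and "MB" here would label power-of-two quantities
with power-of-ten names.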

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Charting the inexorable advance of Western syphilisation... ---   




Re: Need help mounting broken btrfs Fedora 19

2013-07-14 Thread Hugo Mills
On Sun, Jul 14, 2013 at 01:43:41PM +, Dave Barnum wrote:
 I need some help as I may have lost some number of files on a btrfs raid
 1 volume. I'm not quite sure what happened which, I know, only adds to
 the problem.
 
 On my computer #1 I had only a month or so ago installed Fedora 19
 Beta and at the time of install chose BTRFS, raid 1. Recently one of
 the drives started complaining that it was going to die. Without
 taking it out of the array (perhaps I should have done that) I turned
 off the system and swapped the drive with another. From then on I lost
 my ability to boot the system. I could not get anything to work with
 the new hard drive. Then I put the old hard drive back in so that I
 could try to boot again. Still nothing. At one point I think I was
 able to see grub - but at this point, I'm not. I just get boot disk
 failure.

   Try adding the option "degraded" to your mount options. With grub,
this should be possible to do manually at boot time. That should give
you the ability to mount the FS with just a single mirror. If that
works, you can then use btrfs dev add to add the new device to the
filesystem, and then a full balance to recreate the mirror.

   Hugo.

 Furthermore, on the failing drive, drive A, I can still see the
 partitions but I cannot mount it on another system. On drive B (the
 other half of the mirror) I do not see any partitions. I tried copying
 the partition structure using sfdisk from A to B but that probably was
 not smart.
 
 I plugged drive A into computer #2 using a live Fedora and Ubuntu CD
 to try to mount the volume. However in both distributions I am unable
 to mount the volume. I've tried mounting using -o degraded but I still
 get the same error. The error I'm seeing when I try to mount the
 filesystem goes like this:
 
 Quote:
 [10792.307425] device label fedora_ison devid 2 transid 48720 /dev/sdc2
 [10792.308202] btrfs: allowing degraded mounts
 [10792.308206] btrfs: disk space caching is enabled
 [10792.308599] btrfs: failed to read chunk root on sdc2
 [10792.308799] btrfs warning page private not zero on page 20979712
 [10792.320146] btrfs: open_ctree failed
 
 I believe the superblock may be intact since when I run the command
 ./btrfs-show-super /dev/sdc2 I get:
 
 Quote:
 root@ubuntu:/downloads/btrfs-progs# ./btrfs-show-super /dev/sdc2
 superblock: bytenr=65536, device=/dev/sdc2
 -
 csum 0xfc19c468 [match]
 bytenr 65536
 flags 0x1
 magic _BHRfS_M [match]
 fsid cbbf7d4c-f7a0-43ff-aed5-77b347d6ff25
 label fedora_ison
 generation 48720
 root 1105526784
 sys_array_size 226
 chunk_root_generation 46504
 root_level 1
 chunk_root 20979712
 chunk_root_level 1
 log_root 0
 log_root_transid 0
 log_root_level 0
 total_bytes 2988521291776
 bytes_used 598216536064
 sectorsize 4096
 nodesize 4096
 leafsize 4096
 stripesize 4096
 root_dir 6
 num_devices 2
 compat_flags 0x0
 compat_ro_flags 0x0
 incompat_flags 0x1
 csum_type 0
 csum_size 4
 cache_generation 48720
 dev_item.uuid 5f61edaa-7f12-4ec5-a024-02f7797e1400
 dev_item.fsid cbbf7d4c-f7a0-43ff-aed5-77b347d6ff25 [match]
 dev_item.type 0
 dev_item.total_bytes 1494260645888
 dev_item.bytes_used 482110078976
 dev_item.io_align 4096
 dev_item.io_width 4096
 dev_item.sector_size 4096
 dev_item.devid 2
 dev_item.dev_group 0
 dev_item.seek_speed 0
 dev_item.bandwidth 0
 dev_item.generation 0
 
 Could someone help me troubleshoot why I can't mount my volume? I
 would REALLY appreciate it! Perhaps there is a way to repair my broken
 tree structure?
 
 Thank You!

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  




Re: Can btrfs handle different RAID levels for different subvolumes?

2013-07-14 Thread Hugo Mills
On Sun, Jul 14, 2013 at 04:50:35PM +0200, Adam Ryczkowski wrote:
 Can one btrfs filesystem handle different RAID levels e.g. for
 different subvolumes? If so, how does deduplication with bedup
 (https://github.com/g2p/bedup) across them work?

   No, not yet.

   It's planned at some point (probably in the fairly distant future),
but hasn't arrived yet.

   Hugo.

 (It has been asked already on the Net 
 (http://unix.stackexchange.com/questions/82869/can-btrfs-handle-different-raid-levels-for-different-subvolumes)
 but the question didn't get the answer. I guess answering it should
 be straightforward for you, guys :-) )
 
 Adam Ryczkowski
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- But somewhere along the line, it seems / That pimp became ---
   cool,  and punk mainstream.   




Re: super block crcs don't match, older mkfs detected

2013-07-14 Thread Hugo Mills
On Sun, Jul 14, 2013 at 12:11:04PM -0600, Chris Murphy wrote:
 On Fedora 19 with all updates, when I mkfs.btrfs and then mount the volume, 
 I'm getting this in dmesg:
 
 [  280.534868] Btrfs loaded
 [  280.581799] device fsid 94ed05cb-89a9-4d6b-a1e2-5312687b59f5 devid 1 transid 4 /dev/mapper/vg1-brick1
 [  280.590140] btrfs: super block crcs don't match, older mkfs detected
 [  280.597746] btrfs: disk space caching is enabled
 [  280.661204] SELinux: initialized (dev dm-4, type btrfs), uses xattr
 
 btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64
 kernel-3.10.0-1.fc20.x86_64
 
 Is this expected? Benign?

   Yes, I believe it's harmless and will go away after the first
mount.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I gave up smoking, drinking and sex once. It was the scariest ---  
 20 minutes of my life.  




Re: bug: corrupt filesystem, cannot delete tmp files created just before crash.

2013-07-18 Thread Hugo Mills
 that haven't actually been committed yet),
  that may well help in your case.  I'm not technically qualified to match
  backtraces against commits/patches and identify a solid match, but it's
  definitely worth a try.
 
  Finally, as background once you're out of the tight spot, since you're
  running a multi-device filesystem, you're likely to find the discussion
  of that on the multiple devices, sysadmin guide, and use cases pages
  useful.  FWIW, here I'm running most of my btrfs filesystems in dual-
  device raid1 (both data/metadata) mode, to take advantage of the
  checksumming and extra copy to lookup in case of checksum error, that
  btrfs offers, in addition to the device-loss scenario that raid1 helps
  protect against.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Reintarnation:  Coming back from the dead as a hillbilly. ---




Re: Questions about multi-device behavior

2013-07-18 Thread Hugo Mills
On Thu, Jul 18, 2013 at 02:59:58PM -0700, Roger Binns wrote:
 On 18/07/13 13:05, Chris Murphy wrote:
  Sounds like if I have a degraded 'single' volume, I can simply cp or
  rsync everything from that volume to another, and I'll end up with a
  successful copy of the surviving data. True?
 
 Not quite.  I did it with cp -a.  Because all the metadata survived, cp
 would create the target file, but then get an i/o error on opening/reading
 the source file.  It would print an error message, but not delete the
 empty target file. Consequently I ended up with loads of zero length files
 I had to go in and delete afterwards.

   The odds of having an undamaged file from that process are much
better for single than for RAID-0 (and aren't affected by having tools
which will cope better with IO errors -- although you'll get more of
each damaged file if you do). As the file size goes up, the odds of it
being damaged increase.
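   A crude model shows why the odds differ. It assumes exactly one
device has been lost, files are laid out contiguously, RAID-0 uses
64 KiB stripe elements, and "single" data lives in 1 GiB chunks that
each sit on one independently chosen device. The constants and
function names are assumptions of this sketch, not measured values.

```python
import math

STRIPE_KIB = 64            # assumed RAID-0 stripe element size
CHUNK_KIB = 1024 * 1024    # assumed size of a "single" data chunk (1 GiB)


def survival_raid0(file_kib: int, n_devices: int) -> float:
    """Chance a file avoids the one lost device when striped as RAID-0."""
    touched = min(n_devices, math.ceil(file_kib / STRIPE_KIB))
    return 1 - touched / n_devices


def survival_single(file_kib: int, n_devices: int) -> float:
    """Chance a file avoids the lost device when each chunk is on one device."""
    chunks = math.ceil(file_kib / CHUNK_KIB)
    return ((n_devices - 1) / n_devices) ** chunks


# A 1 MiB file on a two-device filesystem.
print(survival_raid0(1024, 2), survival_single(1024, 2))
```

   Under these assumptions any RAID-0 file larger than one stripe
element per device touches every device and is certain to be damaged,
whereas a "single" file only loses out if one of its chunks happened
to live on the failed device.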

   Hugo.

 I briefly looked for an rsync option to keep going on source i/o errors
 but didn't find one.
 
 Roger
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am an opera lover from planet Zog.  Take me to your lieder. ---  




Re: Different size devices in RAID

2013-07-23 Thread Hugo Mills
On Tue, Jul 23, 2013 at 07:32:36AM -0700, Curtis Shimamoto wrote:
 I am using btrfs to span across two SSDs at the moment.  One is a 256GB
 and the other is a 128GB.  So as of now, I have the data in single form
 and the metadata in a RAID1. I have heard that btrfs can adjust to some
 degree for devices in a RAID array that vary in sizes due to the way it
 handles things.  But I feel as though the size difference between those
 two would be too vast to compensate for whatsoever.
 
 But I have an additional SSD in my machine, which is also a 128GB drive.
 I know that with RAID1, it will only duplicate the data no matter how
 many devices are present in the array.  So if that is the case, if I were
 to add all three of my SSDs into the filesystem, and then put the data
 into a RAID1, would it be able to make use of all the space?

   Yes. If the largest device is A, and the two smaller ones are B and
C, the system will allocate chunks in pairs, alternating A+B and A+C.
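   The pairing can be simulated with a short sketch. It assumes that
each allocation places one chunk on each of the two devices with the
most free space, and uses a 1GB chunk size for simplicity; the function
name is hypothetical and the model ignores metadata.

```python
def raid1_usable(sizes_gb, chunk_gb=1):
    """Simulate RAID-1 chunk allocation; return usable space in GB.

    Each step places one chunk on each of the two devices that currently
    have the most free space, which stores chunk_gb of data twice.
    """
    free = list(sizes_gb)
    usable = 0
    while len(free) >= 2:
        # Indices of devices ordered by free space, largest first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        first, second = order[0], order[1]
        if free[second] < chunk_gb:
            break  # fewer than two devices can still hold a chunk
        free[first] -= chunk_gb
        free[second] -= chunk_gb
        usable += chunk_gb
    return usable


print(raid1_usable([256, 128, 128]))  # A, B, C from the reply above
print(raid1_usable([256, 128]))       # only A and B
```

   With all three SSDs the large device is paired alternately with
each small one until everything is full, so no space is stranded; with
only two devices, half of the large one can never be mirrored.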

  The two
 smaller ones are about equal to the size of the larger, so in my mind it
 would seem that it would be entirely possible for it to keep two copies
 of each extent while still utilizing all the space.  But I don't know if
 btrfs is set up to recognize this situation or how it would handle it.
 
 I know that I could potentially put the two smaller drives in some kind
 of an LVM or mdadm, but I would like to avoid this if possible.  It just
 seems like an unnecessary layer of complexity.
 
 Though my question is about RAID1 specifically, as I would like to use
 the potential of the self healing features, I guess it would also extend
 to RAID0 as well.  Would that be able to make efficient use of the space?

   No, because you only have one large device and two small ones, so
the top part of the largest device would be unusable with RAID-0.
(Or at least, not until we get stripe-width limitations, which should
be coming up Real Soon Now, as I believe it's part of Chris's work to
finish off the parity RAID implementation).

 Additionally, though not quite as much of a concern to me, the machine in
 which these drives live is an Ivy Bridge Laptop, so there are actually
 only two available SATA3 ports.  The odd drive out at this point in time
 is actually an mSATA which is the only SATA2 port.  If I were to add this
 to an array (assuming the above questions have favorable answers), how
 dramatically would the speed of the array be affected?  To be honest, the
 speed of even just the mSATA drive alone is enough to keep me happy.  But
 I have just been very curious about this.  The write speeds of all three
 are relatively close. But the read speeds on the SATA3 are significantly
 faster than the mSATA.

   This one I don't have an answer for, sorry.

   Hugo.

 Anyway, thanks for the fantastic filesystem.  Sorry for the long email,
 but these questions have been in the back of my mind for some time now.
 For the first question(s) at least I have not been able to find anything
 regarding that scenario.
 
 Regards,

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am but mad north-north-west:  when the wind is southerly, I ---  
   know a hawk from a handsaw.   




Re: Q: Why subvolumes?

2013-07-23 Thread Hugo Mills
On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
 Now... since the snapshot's FS tree is a direct duplicate of the
  original FS tree (actually, it's the same tree, but they look like
  different things to the outside world), they share everything --
  including things like inode numbers. This is OK within a subvolume,
  because we have the semantics that subvolumes have their own distinct
  inode-number spaces. If we could snapshot arbitrary subsections of the
  FS, we'd end up having to fix up inode numbers to ensure that they
  were unique -- which can't really be an atomic operation (unless you
  want to have the FS locked while the kernel updates the inodes of the
  billion files you just snapshotted).
 
 I don't think so; I just checked some snapshots and the inos are the same.
 Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

   That's what I said. Our current implementation allows different
subvolumes to have the same inode numbers, which is what makes it
work. If you threw out the concept of subvolumes, or allowed snapshots
within subvolumes, then you'd be duplicating inodes within a
subvolume, which is one reason it doesn't work.
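   A toy model of the namespace argument: files are identified by a
(subvolume ID, inode number) pair, and a snapshot reuses the inode
numbers under a new subvolume ID. The class, the ID values and the
file names are invented for illustration, and the dictionary copy
stands in for what is really a shared tree.

```python
from dataclasses import dataclass, field


@dataclass
class Subvolume:
    subvol_id: int
    inodes: dict = field(default_factory=dict)  # inode number -> file name


def snapshot(source: Subvolume, new_id: int) -> Subvolume:
    """Model a snapshot: same inode numbers, different subvolume ID."""
    return Subvolume(new_id, dict(source.inodes))


def global_keys(*subvols: Subvolume) -> list:
    """Every file as a (subvolume ID, inode number) pair."""
    return [(s.subvol_id, ino) for s in subvols for ino in s.inodes]


home = Subvolume(256, {257: "notes.txt", 258: "todo.txt"})
snap = snapshot(home, 260)
keys = global_keys(home, snap)
print(keys)
```

   The inode numbers collide between the two subvolumes, but the pairs
stay unique. Without the subvolume ID in the key, the snapshot would
have to renumber every file to avoid duplicates.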

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Unix: For controlling fungal diseases in crops. --- 




Re: Adding 500G disk to btrfs volume... but I don't get 500G more of available space (raid0)

2013-07-26 Thread Hugo Mills
On Fri, Jul 26, 2013 at 09:05:03AM +0200, Axelle wrote:
 Hi btrfs folks,
 
 I'm afraid I have a newbie question... but I can't sort it out? It's
 just about adding a disk to a btrfs volume and not getting the correct
 amount of GB in the end...
 
 I have a btrfs volume which already consists of two different devices
 and which is mounted on /samples. Its total size is 194G.
 
 $ df -h
 Filesystem  Size  Used Avail Use% Mounted on
 ...
 /dev/sdc1   194G  165G   20G  90% /samples
 
 Now, I would like to add another 500G to that volume, from another
 device. I did:
 
 $ sudo mkfs.btrfs -m raid0 -d raid0 /dev/sdb
 $ sudo btrfs device add /dev/sdb /samples
 
 My filesystem now correctly reports:
 
 $ sudo btrfs filesystem show
 Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
 Total devices 3 FS bytes used 161.98GB
 devid 3 size 465.76GB used 0.00 path /dev/sdb
 devid 2 size 93.13GB used 84.51GB path /dev/sdc1
 devid 1 size 100.61GB used 84.53GB path /dev/sdc6
 
 But I miss some space when I do:

   RAID-0 requires at least two devices. If you balance this
configuration, you'll use up the first 93.13 GiB of each device
striping across all three devices, for a total of 3*93.13 = 279.39
GiB. Then /dev/sdc1 becomes full, leaving you with two devices which
have 7.48 GiB and 372.63 GiB free respectively. After another 7.48 GiB
on each device (for a total of 2*7.48 = 14.96 GiB), you have filled
/dev/sdc6 as well, leaving only /dev/sdb to work with. Since there's
only one device left, it can't be used by RAID-0.
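   The arithmetic above can be reproduced with a small sketch. It
assumes the filesystem is fully rebalanced from empty, that each stage
stripes across every device that still has free space, and that
metadata is ignored; the function name is hypothetical.

```python
def raid0_usable(sizes, min_devices=2):
    """Return (usable, stages, stranded) for RAID-0 over unequal devices.

    Each stage stripes across every device that still has free space and
    ends when the smallest of them fills up. Stages are recorded as
    (stripe width, space taken from each device).
    """
    remaining = sorted(s for s in sizes if s > 0)
    usable = 0.0
    stages = []
    while len(remaining) >= min_devices:
        smallest = remaining[0]
        width = len(remaining)
        stages.append((width, smallest))
        usable += width * smallest
        # Drop devices that just filled and shrink the others.
        remaining = [r - smallest for r in remaining[1:] if r - smallest > 0]
    return usable, stages, sum(remaining)


usable, stages, stranded = raid0_usable([465.76, 93.13, 100.61])
print(f"usable {usable:.2f} GiB, stranded {stranded:.2f} GiB")
```

   For the three devices in this thread the sketch gives two stages
(three-wide, then two-wide) totalling about 294.35 GiB, with roughly
365.15 GiB of /dev/sdb left unreachable by RAID-0.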

   If you want to use the full space available, you should rebalance
to single usage, which stops the RAID-0 striping, and allocates
linearly:

# btrfs balance start -dconvert=single,soft /samples

   Hugo.

 $ df -h
 Filesystem  Size  Used Avail Use% Mounted on
 ...
 /dev/sdc1   660G  165G   43G  80% /samples
 
 I added 500G! Why haven't I got more available??
 
 To debug, I ran this command:
 
 $ sudo btrfs filesystem df /samples
 Data, RAID0: total=162.00GB, used=159.79GB
 Data: total=8.00MB, used=7.48MB
 System, RAID1: total=8.00MB, used=24.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=3.50GB, used=2.19GB
 Metadata: total=8.00MB, used=0.00
 
 My data is in RAID0, that's ok. So where have my 500G gone, and how
 can I fix this?
 
 Thanks

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Sometimes, when I'm alone, I Google myself. ---   




Re: Error on rebooting

2013-07-26 Thread Hugo Mills
On Fri, Jul 26, 2013 at 01:19:40AM +0100, Pete wrote:
 Dear All,
 
 Have I anything to be concerned about?
 
 I have got some error messages on booting.  The scenario was that I
 had installed some ram and I suspect that I had disturbed a cable as
 one disk was not visible.  I could not mount the other disk (did not
 try degraded, but the messages seemed to indicate something serious
 was up).
 
 After installing ram booted.  But some issue with some files,
 anything accessing those files froze.  Had to reboot.  Failed to
 shutdown correctly (shutdown stalled on unmount)
 
 Reboot.
 
 /home etc not mounted (btrfs in question)
 
 Btrfsck /dev/sdb showed various errors.
 
 When complete turned off machine.  Fiddled with cables.  Affected
 drive now seen on reboot.
 
 Rebooted.  Mounted disks (perhaps) error messages may have been
 present on boot.  Much disk IO.  Disk IO stopped.  Machine appeared
 frozen except that Caps lock and Num lock worked.
 Ctrl-alt-backspace did not sort out stalled x(?)dm session.  Hard
 power down.
 
 Last reboot.  Error messages.  However, works.  Example messages from dmesg:
 
 [8.063138] btrfs: enabling inode map caching
 [8.067617] btrfs: use lzo compression
 [8.072092] btrfs: disk space caching is enabled
 [8.147324] btrfs: bdev /dev/sdb errs: wr 4015, rd 464, flush 0,
 corrupt 0, gen 0
 [8.802275] NET: Registered protocol family 10
 [   15.462313] device fsid 2628a800-e095-4460-9b93-8847e9fb626b
 devid 2 transid 27794 /dev/sdc
 [   15.511463] device fsid 2628a800-e095-4460-9b93-8847e9fb626b
 devid 2 transid 27794 /dev/sdc
 [   15.566689] device fsid 2628a800-e095-4460-9b93-8847e9fb626b
 devid 2 transid 27794 /dev/sdc
 [   15.587851] device fsid 2628a800-e095-4460-9b93-8847e9fb626b
 devid 2 transid 27794 /dev/sdc
 [   15.620678] device fsid 2628a800-e095-4460-9b93-8847e9fb626b
 devid 2 transid 27794 /dev/sdc
 [   16.024295] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
 ...
 
 ...
 
 ...
 
 ...
 [   19.491507] tun: (C) 1999-2004 Max Krasnyansky m...@qualcomm.com
 [   56.064899] parent transid verify failed on 1142639534080 wanted
 27788 found 26856
 [   56.154721] btrfs read error corrected: ino 1 off 1142639534080
 (dev /dev/sdb sector 2179305424)
 [   56.166301] parent transid verify failed on 1142597795840 wanted
 2 found 27772
 [   56.186790] btrfs read error corrected: ino 1 off 1142597795840
 (dev /dev/sdb sector 2179223904)
 [   56.460857] parent transid verify failed on 1142599532544 wanted
 27779 found 27772
 [   56.461396] btrfs read error corrected: ino 1 off 1142599532544
 (dev /dev/sdb sector 2179227296)
 [   59.927078] ata1.00: configured for UDMA/133
 [   59.927082] ata1: EH complete
 [   59.933467] ata2.00: configured for UDMA/133
 [   59.933473] ata2: EH complete
 [   60.129445] ata3.00: configured for UDMA/133
 [   60.129458] ata3: EH complete
 [   61.449810] parent transid verify failed on 1142629605376 wanted
 27784 found 26856
 [   61.473817] btrfs read error corrected: ino 1 off 1142629605376
 (dev /dev/sdb sector 2179286032)
[snip]
 [  104.204035] btrfs read error corrected: ino 1544486 off 0 (dev
 /dev/sdb sector 2182960392)
 [  104.204551] btrfs read error corrected: ino 1544486 off 4096 (dev
 /dev/sdb sector 2182960400)
 [  117.249253] parent transid verify failed on 1142609051648 wanted
 27774 found 26856
 [  117.255886] btrfs read error corrected: ino 1 off 1142609051648
 (dev /dev/sdb sector 2179245888)
 [  117.419294] parent transid verify failed on 1142599507968 wanted
 27779 found 27772
 [  117.437317] btrfs read error corrected: ino 1 off 1142599507968
 (dev /dev/sdb sector 2179227248)
 [  137.502176] NFSD: Unable to end grace period: -110
 
 Given that I have booted now - does this mean that the above was
 btrfs sorting itself out?

   Looks like it. I'd recommend a scrub to check for any other out of
date data on the affected drive. I've done pretty much the same thing
as this myself, and a scrub, though scary in the amount of noise it
made, fixed everything satisfactorily.
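
   To get a feel for how much was repaired, before and after a scrub,
the kernel messages quoted above can be tallied. A minimal sketch,
assuming the log has one message per line (as dmesg prints it, rather
than wrapped as in this mail). summarize_btrfs_errors is a hypothetical
helper, and the scrub commands in the comments assume the filesystem
is mounted on /home.

```shell
# Count the two kinds of btrfs message seen in the log above.
summarize_btrfs_errors() {
    awk '
        /parent transid verify failed/ { transid++ }
        /btrfs read error corrected/   { corrected++ }
        END {
            printf "transid failures: %d, corrected reads: %d\n", transid, corrected
        }'
}

# Typical use on a live system (the mount point is an assumption):
#   btrfs scrub start /home
#   btrfs scrub status /home
#   dmesg | summarize_btrfs_errors

# Self-contained demonstration using two message pairs from the log above.
summarize_btrfs_errors <<'EOF'
[   56.064899] parent transid verify failed on 1142639534080 wanted 27788 found 26856
[   56.154721] btrfs read error corrected: ino 1 off 1142639534080 (dev /dev/sdb sector 2179305424)
[   56.460857] parent transid verify failed on 1142599532544 wanted 27779 found 27772
[   56.461396] btrfs read error corrected: ino 1 off 1142599532544 (dev /dev/sdb sector 2179227296)
EOF
```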

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Sometimes, when I'm alone, I Google myself. ---   


Re: Adding 500G disk to btrfs volume... but I don't get 500G more of available space (raid0)

2013-07-26 Thread Hugo Mills
On Fri, Jul 26, 2013 at 04:35:59PM +0200, Axelle wrote:
 Hi Hugo,
 Thanks for your answer, but I'm afraid I still don't get it.
 
 RAID-0 requires at least two devices.
 
 Well, I have three devices, so that's more than enough isn't it?
 Or do you mean I should be adding two devices at a time?
 
  If you balance this
 configuration, you'll use up the first 93.13 GiB of each device
 striping across all three devices, for a total of 3*93.13 = 279.39
 
 why 93.13? I guess you meant 84.53 which is the size I am using on
 sdc1 and sdc6.

   Sorry, a little unclear -- if you balance, and then continue
writing data to the FS. Once you hit 93.13 GiB (the size of the
smallest device), you switch to 2-device operation, and then when
that's full, you can't go any further.

 # btrfs balance start -dconvert=single,soft /samples
 
 Nice command but I wasn't thinking of stopping RAID0 striping. I was
 expecting my data to be striped evenly on all 3 devices.

   It's worth noting that /dev/sdc1 and /dev/sdc6 are on the same
physical device. If that's a rotational device (i.e. traditional hard
disk), then you're going to have a serious performance decrease as a
result of that, because /dev/sdc will have to spend lots of its time
seeking between the two partitions. single operation really is the
better option here -- you'll get to use all your space, and you won't
suffer the performance problems of striping between two partitions on
the same disk.

 Well - evenly - until the smallest one /dev/sdc1 is filled, then,
 it'll use only the last two, when /dev/sdc6 is filled, it will use
 /dev/sdb only.
 Is that possible/correct?

   That's exactly what happens, except for the last bit. RAID-0
requires at least two devices, so it can't stripe across the one
device remaining once you have completely filled /dev/sdc1 and
/dev/sdc6.

 But basically, what does not make sense to me is what df reports as
 available size.
 Look.
 Before, I had ~165G used on a total of 194G.
 I added a new disk of 465G. Now, df reports I have a total of 660G
 (that's right) with 165G used (that's correct too) but only 43G
 available!
 
 I was expecting to have ~495G available! Where are my 465G gone?

   It's not usable with the RAID configuration you've specified, so
it's not shown.

   Hugo.

 $ df -h
 Filesystem  Size  Used Avail Use% Mounted on
 ...
 /dev/sdc1   660G  165G   43G  80% /samples
 
 
 
 Thanks
 Axelle.
 
 On Fri, Jul 26, 2013 at 9:45 AM, Hugo Mills h...@carfax.org.uk wrote:
  On Fri, Jul 26, 2013 at 09:05:03AM +0200, Axelle wrote:
  Hi btrfs folks,
 
  I'm afraid I have a newbie question... but I can't sort it out? It's
  just about adding a disk to a btrfs volume and not getting the correct
  amount of GB in the end...
 
  I have a btrfs volume which already consists of two different devices
  and which is mounted on /samples. Its total size is 194G.
 
  $ df -h
  Filesystem  Size  Used Avail Use% Mounted on
  ...
  /dev/sdc1   194G  165G   20G  90% /samples
 
  Now, I would like to add another 500G to that volume, from another device. 
  I did
 
  $ sudo mkfs.btrfs -m raid0 -d raid0 /dev/sdb
  $ sudo btrfs device add /dev/sdb /samples
  My filesystem now correctly reports:
 
  $ sudo btrfs filesystem show
  Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
  Total devices 3 FS bytes used 161.98GB
  devid 3 size 465.76GB used 0.00 path /dev/sdb
  devid 2 size 93.13GB used 84.51GB path /dev/sdc1
  devid 1 size 100.61GB used 84.53GB path /dev/sdc6
  But I miss some space when I do:
 
 RAID-0 requires at least two devices. If you balance this
  configuration, you'll use up the first 93.13 GiB of each device
  striping across all three devices, for a total of 3*93.13 = 279.39
  GiB. Then /dev/sdc1 becomes full, leaving you with two devices which
  have 7.48 GiB and 372.63 GiB respectively. After another 7.48 GiB on
  each device (for a total of 2*7.48 = 14.96 GiB), you have filled
  /dev/sdc6, leaving only /dev/sdb to work with. Since there's only one
  device, it can't be used by RAID-0.
 
 If you want to use the full space available, you should rebalance
  to single usage, which stops the RAID-0 striping, and allocates
  linearly:
 
  # btrfs balance start -dconvert=single,soft /samples
 
 Hugo.
 
  $ df -h
  Filesystem  Size  Used Avail Use% Mounted on
  ...
  /dev/sdc1   660G  165G   43G  80% /samples
  I added 500G! Why haven't I got more available??
 
  To debug, I ran this command:
 
  $ sudo btrfs filesystem df /samples
  Data, RAID0: total=162.00GB, used=159.79GB
  Data: total=8.00MB, used=7.48MB
  System, RAID1: total=8.00MB, used=24.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=3.50GB, used=2.19GB
  Metadata: total=8.00MB, used=0.00
  My data is in RAID0, that's ok. So where have my 500G gone, and how
  can I fix this?
 
  Thanks

Re: Mount multiple-device-filesystem by UUID

2013-07-27 Thread Hugo Mills
On Sat, Jul 27, 2013 at 08:52:50PM +0200, Hendrik Friedel wrote:
 As stated in the wiki, multiple-device filesystems (e.g. raid 1)
 will only mount after a btrfs device scan, or if all devices are
 passed with the mount options.
 
 I remember, that for Ubuntu 12.04 I changed the initrd. But after a
 re-install, I have to do this again, and I don't remember how I did
 it.

   With Ubuntu, just install the btrfs-tools package. It should modify
the initrd correctly.

 So, the other option would be passing the devices in the fstab. But
 here, I'd prefer UUIDs rather than device names, as they can change.

   This is why we don't recommend using device= mount flags.

 Is this possible? What is the syntax?

   I don't believe it is possible. Finding filesystems by UUID is (I
think) a userspace-based thing, so you'd have to have an initrd
anyway.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- No!  My collection of rare, incurable diseases! Violated! ---   


Re: How to merge two partitions?

2013-08-01 Thread Hugo Mills
On Thu, Aug 01, 2013 at 11:53:34AM +0100, Andrew Stubbs wrote:
 If I have two partitions, /dev/sda1 and /dev/sda2, one btrfs, and
 one ext4 (but I could convert it first), how can I merge them into
 one filesystem without moving all the data onto an external device
 and then moving it all back again? (I do have a backup, of course,
 but transferring the data takes hours, maybe days.)

   That's going to be the easiest option by far.

 I'm left with this layout for historical reasons, and now the
 smaller partition is close to running out of space.
 
 I thought of using btrfs device add and just living with the
 untidy underlying devices, but an experiment with loopback
 filesystems shows that any data on the new device is silently
 obliterated (it might be nice if the docs mentioned this!)

   You would expect data in a different filesystem format to be
integrated into an existing set of data structures? That would be...
magic. :)

 I've thought of shrinking the larger partition, creating a third
 partition, and adding that to the smaller filesystem. This would
 solve the free-space issue, but doesn't feel great.
 
 I've thought of using a temporary third partition as an
 intermediary, but I don't have space to move all the data in one go.
 
 I've thought of using a clever partition manager to move the start
 of the second partition, transfer some data, move it some more,
 transfer some more data, but this seems like an equally lengthy
 process.

   That's the other option I'd go for.

 I could move the data from the smaller partition into the larger
 one, then delete the first partition, and move the whole larger
 partition forward, extend it, and fix up the fstab. That might be
 less painful.
 
 Is there a cunning btrfs trick to do this? Can a btrfs filesystem be
 extended backwards, if you see what I mean?

   No, using gparted to move it backwards into the free space is your
best option here.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I don't know. I can't tell the future, I just work there. ---


Re: [PATCH v2] btrfs: add mount option to set commit interval

2013-08-03 Thread Hugo Mills
On Sat, Aug 03, 2013 at 07:39:01AM -0400, Mike Audia wrote:
   Another newbie question is which version of the kernel do I need to
   have in order to cleanly apply this patch?  I am finding that it fails
   to apply to the current stable kernel code (as of now it is v3.10.4)
   which makes me think your patch has to be applied to a newer one?  Are
   you patching against the linux git tree meaning I have to use the 3.11
   series to try your code?
  
  Try Josef's btrfs-next repo:
  
   
  https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories#Integration_repository_.28btrfs-next.29
 
 OK!  I can patch successfully into that git repo:
 
 % cd /tmp/work
 % git clone git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
 % cd btrfs-next
 % patch -Np1 -i btrfs_add_mount_option_to_set_commit_interval.patch
 patching file fs/btrfs/ctree.h
 patching file fs/btrfs/disk-io.c
 patching file fs/btrfs/super.c
 Hunk #3 succeeded at 647 (offset 19 lines).
 Hunk #4 succeeded at 1006 with fuzz 1 (offset 39 lines).

 If I am not mistaken, btrfs-next is the entire kernel's code?  The
 wiki suggests running anything compiled therein from the build dir.

   That'll be for the userspace tools, not the kernel. Obviously, one
doesn't tend to run kernels from the command line. :)

  If I want to compile this into the official 3.10.4 tree, how can I
 do it?

   Add the official kernel repo as a remote to the same git repo
(with git remote add), fetch that repo, create a new branch to work
in, based on the btrfs-next branch, then merge in the other branch (or
vice-versa).

   Note that btrfs-next is usually based on the latest released kernel
anyway, so that's likely to be largely superfluous.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- A gentleman doesn't do damage unless he's paid for it. ---  


Re: building btrfs corrupt block

2013-08-04 Thread Hugo Mills
On Sun, Aug 04, 2013 at 12:39:28PM -0600, Chris Murphy wrote:
 I must be doing something wrong, but I can't figure out what. I have 
 btrfs-progs source installed from here:
 http://koji.fedoraproject.org/koji/buildinfo?buildID=441375
 
 make produces no errors. Yet btrfs-corrupt-block.c isn't built. Suggestions?

$ make btrfs-corrupt-block

   Some of the more outré commands aren't built by default and have to
be built individually.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- For months now, we have been making triumphant retreats --- 
   before a demoralised enemy who is advancing   
   in utter disorder.


Re: check for reflink capability and for shared data

2013-08-24 Thread Hugo Mills
On Sat, Aug 24, 2013 at 06:09:58PM +0200, Thomas Koch wrote:
 Hi,
 
 how can I do the following in a shell script:
 
 - check whether my file system supports cp --reflink?

touch foo; if cp --reflink=always foo bar; then ...; fi; rm -f foo bar
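
   The same test wrapped up for use in a script, as a sketch: it works
in a scratch directory on the filesystem being tested rather than in
the current directory, and cleans up after itself. It assumes GNU
coreutils cp and mktemp; supports_reflink is a hypothetical name.

```shell
# Return 0 if files in directory $1 can be cloned with cp --reflink,
# 1 if they cannot, 2 if the scratch directory could not be created.
supports_reflink() {
    dir=${1:-.}
    tmp=$(mktemp -d "$dir/reflink-test.XXXXXX") || return 2
    : > "$tmp/src"
    if cp --reflink=always "$tmp/src" "$tmp/dst" 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}

if supports_reflink .; then
    echo "reflink copies are supported here"
else
    echo "no reflink support here"
fi
```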

 - check whether two files share the same data on disk, i.e. one has been 
 created by cp --reflink of the other?

   You can't, using simple userspace tools. I think the only way would
be to use the tree search ioctl to inspect the extents for each file,
and see whether any of them overlap. Why do you need to know this?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Someone's been throwing dead sheep down my Fun Well ---   


Re: failed to read log tree, open_ctree failed

2013-08-26 Thread Hugo Mills
On Tue, Aug 27, 2013 at 02:25:09AM +0800, Tomasz Chmielewski wrote:
 I had a RAID-1 btrfs filesystem with Linux 3.10.
 
 After hard reset, I'm no longer able to mount it:
 
 [   35.254122] Btrfs loaded
 [   35.254577] device label test-btrfs devid 1 transid 97966 /dev/sda4
 [   35.254819] device label test-btrfs devid 3 transid 97966 /dev/sdb4
 [   35.255032] device label test-btrfs devid 3 transid 97966 /dev/sdb4
 [   35.22] btrfs: force zlib compression
 [   35.255645] btrfs: disk space caching is enabled
 [   35.379806] btrfs: bdev /dev/sda4 errs: wr 0, rd 2, flush 0, corrupt 0, 
 gen 0
 [   56.209412] parent transid verify failed on 3321036099584 wanted 97967 
 found 97966
 [   56.225990] parent transid verify failed on 3321036099584 wanted 97967 
 found 97966
 [   56.226128] btrfs: failed to read log tree
 [   56.344483] btrfs: open_ctree failed
 
 
 I've tried with 3.11-rc7, but it gives the same result.
 
 Any hints how to recover from that?
 I have backups, but it would be nice if the filesystem just mounted.

   Try mounting with both -orecovery and -oro,recovery.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Python is executable pseudocode; perl ---  
is executable line-noise.


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-26 Thread Hugo Mills
On Mon, Aug 26, 2013 at 01:10:54PM -0600, Chris Murphy wrote:
 
 On Aug 26, 2013, at 11:41 AM, Nick Lee em...@nickle.es wrote:
 
  There was a discussion on IRC a few days ago that the problem with the tree 
  root's block was likely the result of either an issue with the disk itself, 
  or the chunk tree/logical mappings. I ran the chunk recover, looked over 
  the errors it found, and hit write. (If it failed, I was going to run 
  something like photorec, with loss of organization as a side effect.)
  
  I can write something more clear after my flight lands tomorrow if you want.

 I'm just curious about when to use various techniques: -o recovery,
 btrfsck, chunk-recover, zero log.

   Let's assume that you don't have a physical device failure (which
is a different set of tools -- mount -odegraded, btrfs dev del
missing).

   First thing to do is to take a btrfs-image -c9 -t4 of the
filesystem, and keep a copy of the output to show josef. :)

   Then start with -orecovery and -oro,recovery for pretty much
anything.

   If those fail, then look in dmesg for errors relating to the log
tree -- if that's corrupt and can't be read (or causes a crash), use
btrfs-zero-log.

   If there are problems with the chunk tree -- the only one I've seen
recently was reporting something like "can't map address" -- then
chunk-recover may be of use.

   After that, btrfsck is probably the next thing to try. If options
-s1, -s2, -s3 have any success, then btrfs-select-super will help by
replacing the superblock with one that works. If that's not going to
be useful, fall back to btrfsck --repair.

   Beyond that, btrfsck --repair --init-extent-tree may be necessary if
there's a damaged extent tree. Finally, if you've got corruption in
the checksums, there's --init-csum-tree.
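
   The order above, collected into one place. This sketch deliberately
only prints the commands for a given device rather than running them,
since several of them modify the filesystem. recovery_ladder and the
image file name are made up for illustration, and how chunk-recover is
invoked depends on the btrfs-progs version, so that step is left as a
description.

```shell
# Print the recovery steps for device $1 (mount point $2) in the order
# described above. Nothing is executed.
recovery_ladder() {
    dev=$1
    mnt=$2
    cat <<EOF
0. btrfs-image -c9 -t4 $dev btrfs-image.out    # keep a copy for the developers
1. mount -o recovery $dev $mnt
2. mount -o ro,recovery $dev $mnt
3. btrfs-zero-log $dev                         # only if the log tree is unreadable
4. chunk-recover on $dev                       # only for chunk tree errors
5. btrfsck -s1 $dev (then -s2, -s3); if one works: btrfs-select-super -s 1 $dev
6. btrfsck --repair $dev
7. btrfsck --repair --init-extent-tree $dev    # damaged extent tree
8. btrfsck --repair --init-csum-tree $dev      # corrupt checksums
EOF
}

recovery_ladder /dev/sdX /mnt
```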

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Hugo Mills
On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote:
 
 On Aug 29, 2013, at 11:35 AM, Zach Brown z...@redhat.com wrote:
 
If those fail, then look in dmesg for errors relating to the log
  tree -- if that's corrupt and can't be read (or causes a crash), use
  btrfs-zero-log.
  
  In a bit of a tangent:
  
  btrfs-zero-log throws away data that fsync/sync could have previously
  claimed was stable on disk.
  
  Given how often this is thrown around as a solution to a broken
  partition, should the tool jump up and down and make it clear that it's
  about to roll the file system back?  This seems like relevant
  information.
  
  Right now, as far as I can tell, it's completely undocumented and
  silent.
 
 Yes, I think it helps remove some burden on the list answering questions 
 about a tool that doesn't have any documentation, to have a warning.
 
 How much longer will btrfs-zero-log be needed? If whatever it's doing isn't 
 obviated by future improvements to btrfsck, and this sort of big hammer 
 approach is still needed in some worse case scenarios, then it probably hurts 
 no one to flag the user with essentially how you described it. I think 
 documentation is a greater burden to create, and less likely to be consulted.
 
 Proceeding will roll back the file system to a previous state, and may cause 
 the loss of successfully written data. Proceed? (Y/N)

   ... the loss of up to the last 30 seconds of successfully written data.

   Give the user enough information to make a sensible decision.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Hugo Mills
On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
 On Aug 29, 2013, at 1:40 PM, Hugo Mills h...@carfax.org.uk wrote:
 
  On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote:
  
  Proceeding will roll back the file system to a previous state, and may 
  cause the loss of successfully written data. Proceed? (Y/N)
  
... the loss of up to the last 30 seconds of successfully written data.
  
Give the user enough information to make a sensible decision.
 
 Certainly, if known for sure it won't be more than 30 seconds?

   Mmm... it'll depend on the setting of the commit period, which up
until a couple of weeks ago was always 30s, but someone posted a patch
to give it a config knob...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-30 Thread Hugo Mills
On Fri, Aug 30, 2013 at 09:44:28AM -0500, Eric Sandeen wrote:
 On 8/29/13 3:19 PM, Chris Murphy wrote:
  
  On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote:
  
  On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
  Certainly, if known for sure it won't be more than 30 seconds?
 
Mmm... it'll depend on the setting of the commit period, which up
  until a couple of weeks ago was always 30s, but someone posted a patch
  to give it a config knob…
  
  
  
  Proceeding will roll back the file system to a previous state, and
  may cause the loss of successfully written data since the last commit
  period (30 seconds by default). Proceed? (Y/N)
 
 Is it just loss of data, or might this also result in a filesystem with 
 inconsistent metadata, which then requires a fsck?

   No, the metadata is always consistent (well, in theory, barring bugs
and out-of-band corruption).

 Above sounds like it's just reverting to a previous (consistent) state.  Is 
 that correct?

   Yes, it's dropping the log of accepted-but-uncommitted work. This
is a Bad Thing in the sense that something that's reached the log is
reported to the application as being successfully written. If the
application critically relies on that (e.g. databases), then we've
discarded durability from ACID. (Can you guess I've been marking
Databases resit exam papers this morning? :) )

   Hugo.

 -Eric
 
 p.s. fwiw when the xfs_repair zero-log option -L is used, we say:
 
 ALERT: The filesystem has valuable metadata changes in a log which is being
 destroyed because the -L option was used.

   That's a reasonable wording too.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- We teach people management skills by examining characters in ---   
Shakespeare.  You could look at Claudius's crisis
   management techniques, for example.   


Re: Device delete returns unable to go below four devices on raid10 on 5 drive setup

2013-08-31 Thread Hugo Mills
On Sat, Aug 31, 2013 at 11:42:28AM -0600, Chris Murphy wrote:
 
 On Aug 31, 2013, at 4:12 AM, Steven Post redalert.comman...@gmail.com wrote:
  
  The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
  3.2.46-1 x86_64).
  
  Is this something known (and possibly resolved in a later version), or
  should I open a bug report about it?
 
 Try 3.10 or 3.11 before filing a bug on it.

   If you want a debian-packaged kernel, they're available from the
experimental distribution.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Great oxymorons of the world, no.  5: Manifesto Promise --- 


Re: Recovering from csum errors

2013-09-02 Thread Hugo Mills
On Mon, Sep 02, 2013 at 11:41:12PM +0200, Rain Maker wrote:
 Hello list,
 
 So, I ran a full scrub, and, luckily, it only found 6 csum errors
 (these 6). The damage therefore seems to be contained in just 1
 file.
 
 Now, I removed the offending file. But is there something else I
 should have done to recover the data in this file? Can it be
 recovered?

   No, and no. The data's failing a checksum, so it's basically
broken. If you had a btrfs RAID-1 configuration, the FS would be able
to recover from one broken copy using the other (good) copy.

 I'm running 3.11-rc7. It is a single disk btrfs filesystem. I have
 several subvolumes defined, one of which for VMWare Workstation (on
 which the corruption took place).

   Aaah, the VM workload could explain this. There's some (known,
won't-fix) issues with (I think) direct-IO in VM guests that can cause
bad checksums to be written under some circumstances.

   I'm not 100% certain, but I _think_ that making your VM images
nocow (create an empty file with touch; use chattr +C; extend the file
to the right size) may help prevent these problems.
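
   Spelled out as commands, that suggestion looks roughly like this. A
sketch only: the path and size in the example are placeholders,
truncate is just one way of extending the file, and chattr +C will
fail on filesystems that don't support the flag. make_nocow_image is a
hypothetical name.

```shell
# Create an empty file, mark it nocow, then grow it to the wanted size.
make_nocow_image() {
    path=$1
    size=$2
    touch "$path" || return 1
    chattr +C "$path" || return 1   # set the flag before any data is written
    truncate -s "$size" "$path"
}

# Example (placeholder path and size):
#   make_nocow_image /var/lib/vm/disk.img 20G
```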

 I checked the SMART values, they all seem OK. The harddisks in this
 machine are less than a month old. I replaced them after seeing
 similar messages on the old disks.
 
 Is the only logical explanation for this some kind of hardware failure
 (SATA controller, power supply...), or could there be something more
 to this?

   As above, there's some direct-IO problems with data changing
in-flight that can lead to bad checksums. Fixing the issue would cause
some fairly serious slow-downs in performance for that case, which is
rather against what direct-IO is trying to do, so I think it's
unlikely the behaviour will be changed.

   Of course, I could be completely wrong about all this, and you've
got bad RAM or a bad PSU or something...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- What are we going to do tonight? The same thing we do --- 
every night, Pinky.  Try to take over the world!


Re: Btrfs prog

2013-09-04 Thread Hugo Mills
On Wed, Sep 04, 2013 at 01:57:42PM +0200, Giuseppe Fierro wrote:
 I'm using btrfs on ubuntu 13.04 with btrfs prog v0.20-rc1
 This is my configuration using 2 disks in raid1 mode:
 
  gspe@jura:/mnt$ sudo btrfs f show
  Label: 'UbuntuDSK'  uuid: f4a3c832-f6ab-4b1d-9eb7-f9ba7d1cba01
  Total devices 2 FS bytes used 205.41GB
  devid 1 size 2.70TB used 214.03GB path /dev/sdb2
  devid 2 size 2.70TB used 214.01GB path /dev/sda2
  Btrfs v0.20-rc1
 
 
 Some btrfs commands behave strangely:
 If I want to check free space using df, I get:
 
  gspe@jura:/mnt$ sudo btrfs filesystem df /
  Data, RAID1: total=212.00GB, used=204.42GB
  Data: total=8.00MB, used=0.00
  System, RAID1: total=8.00MB, used=36.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=2.00GB, used=1010.04MB
  Metadata: total=8.00MB, used=0.00

   What do you think is wrong with this output? It looks OK to me:

   From the btrfs fi show at the top, you have 214 GB allocated on
each device. The btrfs fi df shows you how that allocation is used:
212 GB (*2, because it's RAID-1) is allocated to data, with 204 GB
holding useful data. The remaining 2 GB (*2) is allocated to metadata,
and 1 GB of that is actually used.
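
   That accounting can be checked against the numbers in the two
outputs. A rough sketch, assuming the total=<number><unit> format
shown above, and assuming that the small non-RAID chunks all sit on one
device (that's a guess, though it does match the 214.03GB and 214.01GB
figures). allocated_per_device is a hypothetical helper.

```shell
# Sum the total= figures from "btrfs filesystem df" output (stdin), in GB.
allocated_per_device() {
    LC_ALL=C awk '
        /total=/ {
            t = $0
            sub(/.*total=/, "", t)     # drop everything before the number
            sub(/,.*/, "", t)          # drop the used= part
            unit = substr(t, length(t) - 1)
            val = substr(t, 1, length(t) - 2) + 0
            if (unit == "TB") val = val * 1024
            else if (unit == "MB") val = val / 1024
            else if (unit == "KB") val = val / (1024 * 1024)
            if ($0 ~ /RAID1/) mirrored += val   # one copy on each device
            else single += val                  # lives on one device only
        }
        END {
            printf "RAID1 chunks per device: %.2fGB\n", mirrored
            printf "plus single chunks: %.2fGB\n", mirrored + single
        }'
}

allocated_per_device <<'EOF'
Data, RAID1: total=212.00GB, used=204.42GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=36.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=2.00GB, used=1010.04MB
Metadata: total=8.00MB, used=0.00
EOF
```

   For the output quoted above this prints 214.01GB and 214.03GB, the
same as the two "used" figures in btrfs fi show.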

 If I want to list the subvolumes, I get
 
  gspe@jura:/mnt$ sudo btrfs subvolume list /
  gspe@jura:/mnt$

 nothing is shown!!!

   Try using the -a option. It got added a while ago, and has been a
complete pain in the neck ever since...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- The English language has the mot juste for every occasion. ---


Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1

2013-09-05 Thread Hugo Mills
On Thu, Sep 05, 2013 at 09:06:19PM +0600, Roman Mamedov wrote:
 On Thu, 5 Sep 2013 15:54:07 +0100
 Hugo Mills h...@carfax.org.uk wrote:
 
  On Thu, Sep 05, 2013 at 05:43:27PM +0300, Тимофей Титовец wrote:
   Hello guys, I am trying to convert an ext4 volume, but btrfs-convert shows me an error:
   No valid Btrfs found on file
   unable to open ctree
   conversion aborted.
   Ubuntu 13.04
   Kernel: 3.11
   btrfs-progs git version 0.20-git20130822~194aa4a13
   
   way to reproduce error:
   $ truncate -s 4G file
   $ mkfs.ext4 file #say yes to create fs on non block device.
   $ btrfs-convert file
No valid Btrfs found on file
unable to open ctree
conversion aborted.
  
 I'm guessing here, but I suspect you will need to create a loopback
  device so that btrfs-convert can look at it as a block device rather
  than as a file:
  
  # losetup -f --show file
  /dev/loop0
  # btrfs-convert /dev/loop0
  
 Hugo.
  
 
 Nope, just today I saw someone report the same problem in a blog comment:
 http://popey.com/blog/2013/09/02/fun-with-btrfs-on-ubuntu/#comment-9704

   It's the same person, in fact. I'd not seen that the one on popey's
blog was doing it with block devices. This does indeed look like a
fairly drastic bug...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Is it true that last known good on Windows XP --- 
boots into CP/M? 


Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1

2013-09-05 Thread Hugo Mills
On Thu, Sep 05, 2013 at 05:43:27PM +0300, Тимофей Титовец wrote:
 Hello guys, I am trying to convert an ext4 volume, but btrfs-convert shows me an error:
 No valid Btrfs found on file
 unable to open ctree
 conversion aborted.
 Ubuntu 13.04
 Kernel: 3.11
 btrfs-progs git version 0.20-git20130822~194aa4a13
 
 Way to reproduce the error:
 $ truncate -s 4G file
 $ mkfs.ext4 file   # say yes to create the fs on a non-block device
 $ btrfs-convert file
 No valid Btrfs found on file
 unable to open ctree
 conversion aborted.

   I'm guessing here, but I suspect you will need to create a loopback
device so that btrfs-convert can look at it as a block device rather
than as a file:

# losetup -f --show file
/dev/loop0
# btrfs-convert /dev/loop0

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---  
 headline (possibly apocryphal)  




Re: [GIT PULL] Btrfs

2013-09-13 Thread Hugo Mills
On Fri, Sep 13, 2013 at 09:07:36AM -0400, Ric Wheeler wrote:
 On 09/12/2013 11:36 AM, Chris Mason wrote:
 Mark Fasheh's offline dedup work is also here.  In this case offline
 means the FS is mounted and active, but the dedup work is not done
 inline during file IO.  This is a building block where utilities are
 able to ask the FS to dedup a series of extents.  The kernel takes
 care of verifying the data involved really is the same.  Today this
 involves reading both extents, but we'll continue to evolve the patches.
 
 Nice feature!
 
 Just a note, the "offline" label is really confusing. In other
 storage products, they typically call this "out of band", since you
 are online but not during the actual write in a synchronous way :)

   I knew there was a specific term for this, but couldn't remember
what it was. I've now updated the btrfs website's description(s) of
the feature to include "out-of-band" and "in-band".

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Once is happenstance; twice is coincidence; three times --- 
is enemy action. 




Re: [raidX vs single/dup]

2013-09-26 Thread Hugo Mills
On Thu, Sep 26, 2013 at 12:22:49PM +, miaou sami wrote:
 Hi btrfs guys,
 
 could someone explain to me the differences in mkfs.btrfs:
 
 - between -d raid0 and -d single

   In RAID0, data is striped across all the devices, so the first 64k
of a file will go on device 1, the next 64k will go on device 2, and
so on. With single, files are allocated linearly on one device.

   (This assumes smallish files and a filesystem with lots of space.
Even with single, files can still end up scattered over multiple
devices -- but with RAID0, even non-fragmented files are striped.)
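
   As a rough illustration of the striping described above, here is a
minimal Python sketch. It is an idealised model, not the btrfs allocator:
the function names are made up for this example, it assumes the file
starts on the first device, and it ignores chunk allocation entirely.

```python
# Idealised model of RAID0 vs single placement (not the real btrfs allocator).
STRIPE_SIZE = 64 * 1024  # 64 KiB stripe unit, as in the description above


def raid0_device_for_offset(offset, num_devices, stripe_size=STRIPE_SIZE):
    """Return the 0-based device index holding the byte at `offset` under RAID0."""
    if offset < 0 or num_devices < 1:
        raise ValueError("offset must be >= 0 and num_devices >= 1")
    # Consecutive stripe units rotate round-robin across the devices.
    return (offset // stripe_size) % num_devices


def single_device_for_offset(offset, home_device=0):
    """Under 'single', a small unfragmented file sits linearly on one device."""
    if offset < 0:
        raise ValueError("offset must be >= 0")
    return home_device


if __name__ == "__main__":
    for unit in range(4):
        off = unit * STRIPE_SIZE
        print(off, raid0_device_for_offset(off, 2), single_device_for_offset(off))
```

   With two devices, successive 64k units alternate between device 0 and
device 1 in the RAID0 model, while the single model keeps them together.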

 - between -m raid1 and -m dup

   In both cases, there are two copies of each metadata block. With
RAID1, it *requires* the two copies to live on different devices. With
DUP, it allows the two copies to live on the same device (e.g. if
there's only one device).

 - between -m raid0 and -m single

   As for -draid0 and -dsingle, but for metadata instead of data.

 My understanding is that raidX should be used in case of multi
 devices and single/dup should be used in case of single device to
 allow duplication, but it is not 100% clear to me...

 As btrfs raid concepts are quite different from traditional raid,
 shouldn't we use the words "striped" and "mirrored" instead of
 raid0/raid1? Or even "single" and "duplicated"?
 Then there would be no difference between single/raid0 and
 duplicated/raid1...

   But there _are_ differences between them, as explained above. :)

   I posted a patch a while ago to change the names to something more
logical and expressive, but it didn't get merged.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Stick them with the pointy end. --- 




Re: [raidX vs single/dup]

2013-09-26 Thread Hugo Mills
On Thu, Sep 26, 2013 at 01:40:57PM +, miaou sami wrote:
 Thank you, it is quite clear now.
 
 
 I guess that on multi device, raid0 vs single would be a matter of 
 performance vs ease of low level hardware data recovery.
 
 
 The wiki 
 https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices 
 says:
 When you have drives with differing sizes and want to use the full capacity 
 of each drive, you have to use the single profile for the data blocks.
 Let's assume the following configuration: 1x10GB disk and 2x5GB disks
 -- Does it mean I cannot use the full capacity AND have a duplication of my 
 data in the configuration above? (full capacity would be 10GB here)

   No, that will give you the full usable space. A 20 GB drive and two
5 GB drives would not, though.

 -- If I try to setup either -d raid1 or -d dup on that
 configuration, what will I get?

   Try it for yourself in the space simulator:

http://carfax.org.uk/btrfs-usage/

 -- Is there any behavior difference between raid1 / dup in that case?

   If you have multiple disks, I think DUP gets automatically upgraded
to RAID-1 (i.e. the "different copies on different devices"
requirement is enforced). So, no.

 -- Can raid1 ensure that data are always duplicated on different devices AND 
 take advantage of all available space?

   Depends on the relative sizes of the devices. If your largest
device is bigger than the rest put together, then you'll lose some
space.
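
   The rule of thumb above can be written down as a small Python sketch.
This is only the simple arithmetic for two-copy RAID-1, under the
assumption that every block needs two copies on two different devices; it
is not the code behind the space simulator, and the function name is made
up for this example.

```python
def raid1_usable(sizes):
    """Estimate usable space for two-copy RAID-1 over devices of the given sizes.

    Every block needs one copy on each of two different devices, so the
    usable space is limited both by half the total and by how much the
    other devices can mirror of the largest one.
    """
    if len(sizes) < 2:
        raise ValueError("RAID-1 needs at least two devices")
    total = sum(sizes)
    largest = max(sizes)
    return min(total // 2, total - largest)


if __name__ == "__main__":
    for layout in ([10, 5, 5], [20, 5, 5]):
        usable = raid1_usable(layout)
        wasted = sum(layout) - 2 * usable
        print(layout, "usable:", usable, "GB, unusable raw space:", wasted, "GB")
```

   For 10+5+5 GB this gives 10 GB usable with nothing wasted; for 20+5+5 GB
it also gives 10 GB usable, leaving 10 GB of the large device unused.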

   Hugo.

 Regards,
 Sam
 
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Nothing right in my left brain. Nothing left in --- 
 my right brain. 




Re: [raidX vs single/dup]

2013-09-26 Thread Hugo Mills
On Thu, Sep 26, 2013 at 02:55:38PM +, miaou sami wrote:
 OK, that's clear.
 Nice space simulator btw :-) you should add a link somewhere in btrfs wiki...

   There is one, linked from the first line of the relevant section in
the FAQ.

   Hugo.

 Thanks
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The trouble with you, Ibid, is you think you know everything. ---  
 




Re: csum questions

2013-09-27 Thread Hugo Mills
On Fri, Sep 27, 2013 at 04:22:16PM +0200, Tom Gundersen wrote:
 Hi guys,
 
 I have some questions about btrfs' handling of invalid csums.
 
 For the sake of argument I'm assuming no raid or anything like that
 (so only one copy exists of every file).
 
 When I try to access a file whose csum does not match, btrfs logs an
 error and refuses access to the file. I have two questions about this:
 
 1) What happens to the file. Will btrfs just leave it alone, or will
 it be deleted from disk (I seem to remember reading this somewhere,
 just want to confirm)?

   It's left there.

 2) How may I tell btrfs to ignore all csums and just assume they are
 all correct? The reason for wanting this is in case the csum is
 garbled and the file is intact, or the csum is correct and the file is
 only partially garbled, but may still contain useful data.

   You can't, right now. There's discussion on IRC about this very
point right now. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A clear conscience.  Where did you get this taste ---
 for luxuries,  Bernard? 




extlinux and btrfs RAID-1

2013-09-27 Thread Hugo Mills
   I'm trying to get a system booting, and I'm having something of a
hard time with it. I'd like to check whether anyone's managed to do
what I'm attempting, and whether I'm doing something silly, or just
need to upgrade something.

   I've got two disks, /dev/sda and /dev/sdb, each partitioned the
same way, with GPTs. The second partition on each is part of a RAID-1
(data and metadata) btrfs, with no compression.

# btrfs fi show
Label: 'amelia'  uuid: cba252b5-af1b-4f31-9f8f-191ef66f777d
   Total devices 2 FS bytes used 1.03GB
   devid 1 size 275.48GB used 3.04GB path /dev/sda2
   devid 2 size 275.48GB used 3.03GB path /dev/sdb2

   I have the gptmbr.bin from extlinux installed on the boot sector of
each device:

# cat /usr/lib/syslinux/gptmbr.bin > /dev/sda
# cat /usr/lib/syslinux/gptmbr.bin > /dev/sdb

   I've attempted to install extlinux from a chroot:

# extlinux --install /boot/extlinux

   This is extlinux 4.05, which claims (on the syslinux website) to
support btrfs.

   When I boot the machine from its disks, I'm being told that
extlinux only supports single-disk btrfs. Is this still the case? Or
am I just using a version that's far too old? (Looks like there's a
v6.01 available). I can't see a list of the limitations and
capabilities of syslinux and btrfs on the syslinux website.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Great oxymorons of the world, no. 9: Standard Deviation --- 




Re: extlinux and btrfs RAID-1

2013-09-27 Thread Hugo Mills
On Fri, Sep 27, 2013 at 02:12:36PM -0600, Chris Murphy wrote:
 
 On Sep 27, 2013, at 1:36 PM, Hugo Mills h...@carfax.org.uk wrote:
  
  When I boot the machine from its disks, I'm being told that
  extlinux only supports single-disk btrfs. Is this still the case?
 
 I'm pretty sure the answer is yes. The last time I looked, not that long
 ago, the multiple device scenario wasn't supported.

   Dammit.

   Thanks for the info. At least this means I don't have to struggle
with the syslinux error I've been getting... Back to grub, then.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- It used to take a lot of talent and a certain type of ---  
upbringing to be perfectly polite and have filthy manners
at the same time. Now all it needs is a computer.




Re: extlinux and btrfs RAID-1

2013-09-27 Thread Hugo Mills
On Fri, Sep 27, 2013 at 03:04:22PM -0600, Chris Murphy wrote:
 
 On Sep 27, 2013, at 2:44 PM, Hugo Mills h...@carfax.org.uk wrote:
 
  On Fri, Sep 27, 2013 at 02:12:36PM -0600, Chris Murphy wrote:
  
  On Sep 27, 2013, at 1:36 PM, Hugo Mills h...@carfax.org.uk wrote:
  
   When I boot the machine from its disks, I'm being told that
  extlinux only supports single-disk btrfs. Is this still the case?
  
  I'm pretty sure the answer is yes. The last time I looked, not that long
  ago, the multiple device scenario wasn't supported.
  
Dammit.
  
Thanks for the info. At least this means I don't have to struggle
  with the syslinux error I've been getting... Back to grub, then.
 
 I'm seeing in changelogs that 4.0 brought btrfs support, 4.06
 brought subvolume support. Nothing in changelogs for versions 5 and
 6 so far inclusive. And interestingly enough, Fedora's koji only has
 4.05 current for F20 and rawhide.

   Yeah, Debian have 4.05 in everything except experimental (which is
6.02~pre16).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---  
   a _poor_ corrupt official.




Re: Questions regarding logging upon fsync in btrfs

2013-09-28 Thread Hugo Mills
On Sun, Sep 29, 2013 at 01:46:23AM +0200, Aastha Mehta wrote:
 I am using linux kernel 3.1.10-1.16, just to let you know.

   Not that it invalidates the questions below, but that's a really
old kernel. You should update to something recent (3.11, or 3.12-rc2)
as soon as possible. There are major problems in 3.1 (and most of the
subsequent kernels) that have been fixed in 3.11. Of course, there are
still major problems in 3.11 that haven't been fixed yet, but we don't
know about very many of those. :) (And when we do, we'll be
recommending that you upgrade to whatever has them fixed...)

   Hugo.

 Thanks
 
 On 29 September 2013 01:35, Aastha Mehta aasth...@gmail.com wrote:
  Hi,
 
  I have a few questions regarding logging triggered by calling fsync in BTRFS:
 
  1. If I understand correctly, fsync will log the entire inode in
  the log tree. Does this mean that the data extents are also logged
  into the log tree? Are they copied into the log tree, or just
  referenced? Are they copied into the subvolume's extent tree again
  upon replay?
 
  2. During replay, when the extents are added into the extent
  allocation tree, do they acquire the physical extent number during
  replay? Does the physical extent allocated to the data in the log
  tree differ from that in the subvolume?
 
  3. I see there is a mount option of notreelog available. After
  disabling tree logging, does fsync still lead to flushing of buffers
  to the disk directly?
 
  4. Is it possible to selectively identify certain files in the log
  tree and flush them to disk directly, without waiting for the replay
  to do it?
 
  Thanks
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Diablo-D3 My code is never released,  it escapes from the ---   
  git repo and kills a few beta testers on the way out.  




Re: btrfs raid0

2013-10-04 Thread Hugo Mills
On Fri, Oct 04, 2013 at 04:15:22PM +, ray clancy wrote:
 How can I verify the read speed of a btrfs raid0 pair in Arch Linux?
 
 I assume raid0 means striped activity in a parallel mode, at least
 similar to raid0 in mdadm.
 
 How can I measure the btrfs read speed, since it is copy-on-write,
 which is not the norm in mdadm raid0?

   Testing read speed... you're not writing, so there's no
copy-on-write involved there. Just test reading the way you would for
anything else.

 Perhaps I cannot use the same approach in btrfs to determine the
 performance.
 
 Secondly, I see a methodology for raid10 using the command
 mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd...
 
 Can I apply the parameters above for -m and -d for raid0?

   I'd certainly recommend it for testing RAID-0. :)

   Actually, a slightly more realistic test would be to use RAID-0 for
data and RAID-1 for metadata, because that's what most [default] users
of the FS will end up with.

 If I'm using raid0 on two devices and add another device, is it striped
 as raid0 as well, or does the system change it to raid1?

   No, it'll remain as RAID-0. If you rebalance, then the data will
get striped across three devices instead of two.

 What happens to the speed of the system when a new device is added?
 Is it increased ?

   Assuming the FS is reading from all the devices, yes.

 Much of what I have at hand is for mdadm software raid0, where it doubles
 the read speed.  What parallel exists in raid0 btrfs?  Or is it completely
 off base to expect a speed increase?

   In theory, you should be able to get the sum of the bandwidths of
all the devices (assuming sequential streaming reads). We don't have
any good benchmarks of this kind of thing, so when you do your tests,
please (a) make sure you do a decent experimental design, and (b)
publish the results. :)
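
   For a starting point on the sequential-read side, here is a minimal
Python sketch that times an unbuffered streaming read of one large file.
The function name and the 1 MiB block size are choices made for this
example. It is not a proper benchmark: results are only meaningful if the
file is not already in the page cache (on Linux, writing 3 to
/proc/sys/vm/drop_caches as root drops it), and a decent experimental
design would repeat the run several times.

```python
import time


def sequential_read(path, block_size=1024 * 1024):
    """Stream `path` from start to end; return (bytes_read, MiB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the kernel page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mib = total / (1024 * 1024)
    rate = mib / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

   Running the same function on a file on the RAID-0 filesystem and on a
file on a single-device filesystem gives a first, crude comparison.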

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I always felt that as a C programmer, I --- 
 was becoming typecast.  




Re: Some questions after devices addition to existing raid 1 btrfs filesystem

2013-10-07 Thread Hugo Mills
On Mon, Oct 07, 2013 at 01:45:29PM +0200, Laurent Humblet wrote:
 I have added 2x2Tb to my existing 2x2Tb raid 1 btrfs filesystem and
 then ran a balance:
 
 # btrfs filesystem show
 Total devices 4 FS bytes used 1.74TB
 devid 3 size 1.82TB used 0.00 path /dev/sdd
 devid 4 size 1.82TB used 0.00 path /dev/sde
 devid 2 size 1.82TB used 1.75TB path /dev/sdc
 devid 1 size 1.82TB used 1.75TB path /dev/sdb
 # btrfs filesystem balance btrfs_root/
 # btrfs filesystem show
 Total devices 4 FS bytes used 1.74TB
 devid 3 size 1.82TB used 892.00GB path /dev/sdd
 devid 4 size 1.82TB used 892.00GB path /dev/sde
 devid 2 size 1.82TB used 891.03GB path /dev/sdc
 devid 1 size 1.82TB used 891.04GB path /dev/sdb
 
 It took 59 hours to complete the balance.
 
 I checked on a couple of files and all seems fine but I have some questions:
 - is there some kind of 'overall filesystem health/integrity check'
 that I should do on the filesystem now that the balance is done?

   See btrfs scrub start

 - also, I ran the command while some of the btrfs subvolumes were
 mounted (as well as the btrfs_root/ of course), does this impact on
 the balance job?

   No.

 - the mounted btrfs devices were mounted using -o
 space_cache,inode_cache but the btrfs_root/ was not, also, does this
 impact on the balance job?

   No.

 - about those options: a few months ago, I often had
 btrfs-cache-1/btrfs-endio-met processes taking some CPU/HD time.  I
 was advised to mount with -o space_cache,inode_cache, which seems to
 have quieted the processes down.  Are those options still necessary now?

   No, once you've mounted with them once (and had the caches rebuilt)
they're not necessary to use any more.

 - as the job took 60+ hours but the CPU rarely went above 10%, the
 computer still seemed usable.  I left it to do its job, of course, but
 could I have accessed or written anything on the subvolumes while the
 balance was running, and if so, would this have had any impact on the
 filesystem?

   Absolutely, yes, you could have done. It would probably be slower
than normal to access the files while the balance is happening,
because the balance is using up I/O bandwidth, but other than that
there should be no impact.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- argc, argv, argh! ---




Re: [PATCH v3 10/12] Btrfs-progs: add '--block-size' option to control print result

2013-10-08 Thread Hugo Mills
On Wed, Oct 09, 2013 at 12:54:03AM +0800, Shilong Wang wrote:
 Hi David,
 
 2013/10/8 David Sterba dste...@suse.cz:
  On Mon, Oct 07, 2013 at 03:21:46PM +0800, Wang Shilong wrote:
  You can use it like:
btrfs qgroup show --block-size=m mnt
 
  Here, block size supports k/K/m/M/g/G/t/T/p/P/e/E.

k = SI prefix, kilo
K = ? (IEEE prefix kibi?)
m = SI prefix, milli
M = SI prefix, mega
g = SI unit, grams
G = SI prefix, giga
t = ?
T = SI prefix, tera
p = SI prefix, pico
P = SI prefix, peta
e = ?
E = SI prefix, exa

   Some confusion here, I think. :)

  There is no distinction between the 1000 and 1024 based prefixes, also
  no way to get the raw values in bytes. I don't have a suggestion how to
  do that, merely letting you know that this could go separately (this and
  the -h patch, the rest shall be integrated).
 
 I implement this like the command 'du'.
 
 By default, we print the result in bytes, and the block size doesn't
 imply a byte unit.

 Also, I don't know why we need to distinguish between 1000 and 1024; I
 don't have any ideas about this.

   Because when you have a terabyte of data, the difference between
the two is about 10%. If you're putting in this kind of infrastructure,
it's not much of an addition to report in either SI decimal or IEEE
binary scales.
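
   To make the size of that gap concrete, here is a small Python sketch
that computes how far the 1024-based value drifts from the 1000-based one
at each prefix. The function name is made up for this example; the
arithmetic is just (1024/1000)^n - 1.

```python
def binary_vs_decimal_gap(power):
    """Relative gap between 1024**power and 1000**power (e.g. power=4 for tera)."""
    if power < 1:
        raise ValueError("power must be >= 1")
    return (1024 ** power) / (1000 ** power) - 1.0


if __name__ == "__main__":
    # k, M, G, T, P, E correspond to powers 1 through 6.
    for power, prefix in enumerate(["k", "M", "G", "T", "P", "E"], start=1):
        print(f"{prefix}: {binary_vs_decimal_gap(power) * 100:.2f} %")
```

   At the tera scale the gap is just under 10%, and it keeps growing with
each further prefix.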

  Also, the numbers in the table should be aligned to the right:
 
 Yes, this should be fixed.
 
 Thanks,
 Wang
 
  $ btrfs qgroup show -h -p /mnt/
  qgroupid rfer      excl      parent
  -------- ----      ----      ------
  0/5      900.00KiB 900.00KiB ---
  0/267    688.00KiB 12.00KiB  1/5
  0/268    684.00KiB 8.00KiB   1/5
  0/269    6.71GiB   4.00KiB   1/1
  0/277    6.71GiB   4.00KiB   1/1
  0/278    39.74GiB  39.74GiB  1/2
  1/1      6.71GiB   6.71GiB   ---
  1/2      39.74GiB  39.74GiB  ---
  1/5      696.00KiB 696.00KiB ---

   Note that the SI mandate a space between the value and the unit.
Note also, for future reference, that SI use k for 10^3, whereas IEEE
use Ki for 2^10.
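
   A minimal Python sketch of a formatter following those two conventions
(a space between value and unit, and k versus Ki) might look like this.
It is not the btrfs-progs implementation; the function name and the two
decimal places are choices made for this example.

```python
def format_size(num_bytes, binary=True):
    """Format a byte count with a space before the unit.

    binary=True  -> 1024-based prefixes (Ki, Mi, Gi, ...)
    binary=False -> 1000-based SI prefixes (k, M, G, ...)
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be >= 0")
    if binary:
        base, prefixes = 1024, ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]
    else:
        base, prefixes = 1000, ["", "k", "M", "G", "T", "P", "E"]
    value = float(num_bytes)
    index = 0
    # Step up one prefix at a time until the value fits below the base.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {prefixes[index]}B"


if __name__ == "__main__":
    print(format_size(921600))                # binary scale
    print(format_size(921600, binary=False))  # decimal scale
```

   The same byte count, 921600, comes out as 900.00 KiB on the binary
scale and 921.60 kB on the decimal scale.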

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---  
   a _poor_ corrupt official.




Re: [PATCH v3 10/12] Btrfs-progs: add '--block-size' option to control print result

2013-10-08 Thread Hugo Mills
On Tue, Oct 08, 2013 at 06:01:57PM +0100, Hugo Mills wrote:
 On Wed, Oct 09, 2013 at 12:54:03AM +0800, Shilong Wang wrote:
  Hi David,
  
  2013/10/8 David Sterba dste...@suse.cz:
   On Mon, Oct 07, 2013 at 03:21:46PM +0800, Wang Shilong wrote:
   You can use it like:
 btrfs qgroup show --block-size=m mnt
  
   Here, block size supports k/K/m/M/g/G/t/T/p/P/e/E.
 
 k = SI prefix, kilo
 K = ? (IEEE prefix kibi?)

   ... or SI unit, kelvin

 t = ?

   SI-accepted unit, tonne

 e = ?

   SI-accepted unit, charge on the electron

   Hugo. :)

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  




warn_slowpath in clean_tree_block

2009-02-16 Thread Hugo Mills
   I've just started playing with Btrfs, and I'm getting a log full of
kernel warnings that look something like this:

Feb 16 09:02:17 vlad kernel: ------------[ cut here ]------------
Feb 16 09:02:17 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815 clean_tree_block+0x9d/0xbb [btrfs]()
Feb 16 09:02:17 vlad kernel: Hardware name: System Product Name
Feb 16 09:02:17 vlad kernel: Modules linked in: btrfs zlib_deflate tcp_diag 
inet_diag kqemu cpufreq_userspace ipv6 nfsd nfs lockd nfs_acl auth_rpcgss 
sunrpc af_packet bridge stp llc xfs exportfs it87 hwmon_vid powernow_k8 sbp2 
ieee1394 ide_generic ide_gd_mod ide_cd_mod pcspkr evdev k8temp hwmon i2c_viapro 
i2c_core button dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod 
usbhid usb_storage libusual sg sr_mod cdrom via82cxxx floppy via_rhine mii 
ehci_hcd uhci_hcd usbcore pata_via ide_pci_generic ide_core sd_mod thermal 
processor fan unix
Feb 16 09:02:17 vlad kernel: Pid: 24129, comm: btrfs-endio-wri Tainted: G        W  2.6.29-rc4 #1
Feb 16 09:02:17 vlad kernel: Call Trace:
Feb 16 09:02:17 vlad kernel: [80228d7d] warn_slowpath+0xd8/0x111
Feb 16 09:02:17 vlad kernel: [80251d9e] __alloc_pages_internal+0xd2/0x3ec
Feb 16 09:02:17 vlad kernel: [8024d55d] add_to_page_cache_locked+0x52/0x9e
Feb 16 09:02:17 vlad kernel: [8024d5e9] add_to_page_cache_lru+0x40/0x58
Feb 16 09:02:17 vlad kernel: [8024dbd0] find_or_create_page+0x62/0x88
Feb 16 09:02:17 vlad kernel: [80313244] rb_insert_color+0xba/0xe2
Feb 16 09:02:17 vlad kernel: [a03f992a] alloc_extent_buffer+0x268/0x2ec [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e1b18] clean_tree_block+0x9d/0xbb [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d5eaf] btrfs_init_new_buffer+0x99/0xf3 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d849e] btrfs_alloc_free_block+0x83/0x8c [btrfs]
Feb 16 09:02:17 vlad kernel: [a03cda8b] split_leaf+0x159/0xa0a [btrfs]
Feb 16 09:02:17 vlad kernel: [a03f0de5] btrfs_item_offset+0xb3/0xbe [btrfs]
Feb 16 09:02:17 vlad kernel: [a03c96bc] leaf_space_used+0xb5/0xe8 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d0ebd] btrfs_search_slot+0x917/0x99b [btrfs]
Feb 16 09:02:17 vlad kernel: [a03ef06a] btrfs_drop_extents+0xa75/0xab3 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03d14ee] btrfs_insert_empty_items+0x7f/0x49d [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e6f0f] insert_reserved_file_extent+0xd9/0x230 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fa44f] set_extent_bit+0x220/0x277 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fadce] lock_extent+0x46/0x95 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03e885c] btrfs_finish_ordered_io+0xfe/0x198 [btrfs]
Feb 16 09:02:17 vlad kernel: [a03fb4ff] end_bio_extent_writepage+0xa9/0x1b1 [btrfs]
Feb 16 09:02:17 vlad kernel: [a04023e4] worker_loop+0x5f/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [a0402385] worker_loop+0x0/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [a0402385] worker_loop+0x0/0x15e [btrfs]
Feb 16 09:02:17 vlad kernel: [80238269] kthread+0x47/0x73
Feb 16 09:02:17 vlad kernel: [8020c03a] child_rip+0xa/0x20
Feb 16 09:02:17 vlad kernel: [80238222] kthread+0x0/0x73
Feb 16 09:02:17 vlad kernel: [8020c030] child_rip+0x0/0x20
Feb 16 09:02:17 vlad kernel: ---[ end trace a315082d5647b979 ]---

   They're not all identical -- there are bits in the middle of the
trace that change. They tend to arrive in groups of 4-8 warnings very
close together, separated by 15-20 seconds without a warning. The
workload was encoding a video from another filesystem, onto the Btrfs
filesystem. It's quiet when there's nothing accessing the filesystem.

   The filesystem is 19GiB in size, residing on LVM-on-RAID-1 on a
2.6.29-rc4 kernel. It was created on 2.6.29-rc3, 20GiB in size using
btrfs tools 0.18-ge3b0f66, and shrunk online to its current size.

   I haven't found any similar reports on the mailing list, which
means either I've got something unusual, or something so blindingly
expected that nobody's bothered to mention it. I suspect the latter,
but I'm reporting it in case it's the former.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You got very nice eyes, Deedee. Never noticed them ---   
   before. They real?   




btrfs: warn_slowpath in clean_tree_block and others

2009-02-24 Thread Hugo Mills
   This is essentially a repost of a mail I sent last week, to which I
didn't get a reply.

   I'm getting huge numbers of kernel warnings whilst using
btrfs. They're all warn_slowpath, and all seem to be in
fs/btrfs/disk-io.c. I've included one typical example at the end of
this mail.

   Kernel versions are 2.6.29-rc2, -rc4 and -rc6.

   If I do lots of writes to my btrfs filesystem (e.g. video
encoding), I end up with a syslog in the tens-of-megabytes range. This
makes logcheck an unhappy bunny...

   I don't know if this behaviour is expected, and everyone using
btrfs simply puts up with it for now, or if it's something unusual
that needs investigating. On the chance that it's the latter, I'm
reporting it here.

   Hugo.

Feb 23 21:45:42 vlad kernel: ------------[ cut here ]------------
Feb 23 21:45:42 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815 clean_tree_block+0x9d/0xbb [btrfs]()
Feb 23 21:45:42 vlad kernel: Hardware name: System Product Name
Feb 23 21:45:42 vlad kernel: Modules linked in: tun ext3 jbd btrfs zlib_deflate 
tcp_diag inet_diag kqemu cpufreq_userspace ipv6 nfsd nfs lockd nfs_acl 
auth_rpcgss sunrpc af_packet bridge stp llc xfs exportfs it87 hwmon_vid 
powernow_k8 sbp2 ieee1394 ide_generic ide_gd_mod ide_cd_mod pcspkr evdev k8temp 
hwmon i2c_viapro i2c_core button dm_mirror dm_region_hash dm_log dm_snapshot 
dm_mod raid1 md_mod usbhid usb_storage libusual sg sr_mod cdrom via82cxxx 
floppy via_rhine mii ehci_hcd uhci_hcd usbcore pata_via ide_pci_generic 
ide_core sd_mod thermal processor fan unix
Feb 23 21:45:42 vlad kernel: Pid: 27034, comm: hdparm Tainted: GW  
2.6.29-rc4 #1
Feb 23 21:45:42 vlad kernel: Call Trace:
Feb 23 21:45:42 vlad kernel: [80228d7d] warn_slowpath+0xd8/0x111
Feb 23 21:45:42 vlad kernel: [80312f11] radix_tree_insert+0xd7/0x19f
Feb 23 21:45:42 vlad kernel: [8024d55d] 
add_to_page_cache_locked+0x52/0x9e
Feb 23 21:45:42 vlad kernel: [8024d5e9] 
add_to_page_cache_lru+0x40/0x58
Feb 23 21:45:42 vlad kernel: [8024dbd0] find_or_create_page+0x62/0x88
Feb 23 21:45:42 vlad kernel: [a03f992a] 
alloc_extent_buffer+0x268/0x2ec [btrfs]
Feb 23 21:45:42 vlad kernel: [a03e1b18] clean_tree_block+0x9d/0xbb 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03d5eaf] 
btrfs_init_new_buffer+0x99/0xf3 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d849e] 
btrfs_alloc_free_block+0x83/0x8c [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cb2f8] __btrfs_cow_block+0x1ff/0x87e 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03cc125] btrfs_cow_block+0x1e7/0x1f6 
[btrfs]
Feb 23 21:45:42 vlad kernel: [80251d9e] 
__alloc_pages_internal+0xd2/0x3ec
Feb 23 21:45:42 vlad kernel: [a03d0915] btrfs_search_slot+0x36f/0x99b 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03d14ee] 
btrfs_insert_empty_items+0x7f/0x49d [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d825d] 
__btrfs_alloc_reserved_extent+0x19f/0x2bb [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d83f0] btrfs_alloc_extent+0x77/0xa2 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03d847f] 
btrfs_alloc_free_block+0x64/0x8c [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cb2f8] __btrfs_cow_block+0x1ff/0x87e 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03d7532] 
finish_current_insert+0x514/0x528 [btrfs]
Feb 23 21:45:42 vlad kernel: [a03d7bf9] 
del_pending_extents+0xa5/0x33d [btrfs]
Feb 23 21:45:42 vlad kernel: [a03cc125] btrfs_cow_block+0x1e7/0x1f6 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03e436d] 
btrfs_commit_tree_roots+0x53/0x1ba [btrfs]
Feb 23 21:45:42 vlad kernel: [80403a3e] schedule_timeout+0xa1/0xbc
Feb 23 21:45:42 vlad kernel: [a03e55dd] 
btrfs_commit_transaction+0x322/0x6e5 [btrfs]
Feb 23 21:45:42 vlad kernel: [802385fb] 
autoremove_wake_function+0x0/0x2e
Feb 23 21:45:42 vlad kernel: [a03e4809] join_transaction+0x129/0x147 
[btrfs]
Feb 23 21:45:42 vlad kernel: [a03c8788] btrfs_sync_fs+0x70/0x78 
[btrfs]
Feb 23 21:45:42 vlad kernel: [8026f332] sync_filesystems+0xa8/0xde
Feb 23 21:45:42 vlad kernel: [80287256] do_sync+0x25/0x50
Feb 23 21:45:42 vlad kernel: [8028728f] sys_sync+0xe/0x13
Feb 23 21:45:42 vlad kernel: [8020b25b] system_call_fastpath+0x16/0x1b
Feb 23 21:45:42 vlad kernel: ---[ end trace a315082d564863a6 ]---
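
   With tens of megabytes of these in syslog, it helps to reduce each trace to its warning site and call chain so that near-identical warnings can be counted together. A rough sketch follows; the regular expressions are my own guesses at the layout seen in this thread (addresses in square brackets, symbol+offset/size), and they tolerate a line break between the address and the symbol, which mail archives tend to introduce.

```python
import re

# file:line and function named on the "WARNING: at" line
WARNING_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+)\s+(?P<func>\w+)\+"
)
# one stack frame: [address] symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[[0-9a-f]+\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Excerpt of the trace above, line-wrapped as a mail archive might render it.
SAMPLE = """\
Feb 23 21:45:42 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815
clean_tree_block+0x9d/0xbb [btrfs]()
Feb 23 21:45:42 vlad kernel: Call Trace:
Feb 23 21:45:42 vlad kernel: [80228d7d] warn_slowpath+0xd8/0x111
Feb 23 21:45:42 vlad kernel: [8024d55d]
add_to_page_cache_locked+0x52/0x9e
Feb 23 21:45:42 vlad kernel: [a03e1b18] clean_tree_block+0x9d/0xbb
[btrfs]
"""


def warning_site(text):
    """Return (file, line, function) of the first WARNING in text, or None."""
    m = WARNING_RE.search(text)
    return (m["file"], int(m["line"]), m["func"]) if m else None


def trace_functions(text):
    """Return the function names of all stack frames in text, in order."""
    return [m["func"] for m in FRAME_RE.finditer(text)]


if __name__ == "__main__":
    print(warning_site(SAMPLE))
    print(" <- ".join(trace_functions(SAMPLE)))
```

   Counting the (site, call chain) pairs with collections.Counter would then show how many genuinely different warnings are hiding in the flood.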


-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---  
 headline (possibly apocryphal)  




Re: btrfs: warn_slowpath in clean_tree_block and others

2009-02-25 Thread Hugo Mills
On Wed, Feb 25, 2009 at 11:05:58AM -0500, Lee Trager wrote:
 But what are you doing to the filesystem when it crashes? How did you
 mount it?

   In my case, it's mounted with this fstab entry:

/dev/media/scratch  /media/vlad/video/video btrfs   noatime,nosuid,nodev  0 0

and I can trigger hundreds (literally) of these backtraces with a
single touch /media/vlad/video/video/foo. If I encode a video to the
FS, the backtraces come in bursts at intervals of, say, 20 seconds
(it's not perfectly regular).

   Hugo.

 On Wed, Feb 25, 2009 at 08:03:01AM -0600, Mitch Harder (aka DontPanic) wrote:
  I've been creating a local git repository of full btrfs-unstable sources.
  
  I'll create a new branch off the master branch, and apply the patch
  supplied in the Feb. 11 message to the M/L.
  
  I then create a kernel module based on the results in /fs/btrfs/
  
  I have also tried replicating the experimental branch, and merging the
  patch into that branch, but I get the same results.
  
  On Wed, Feb 25, 2009 at 12:26 AM, Lee Trager l...@cs.drexel.edu wrote:
   Mitch, I haven't seen any problems using BTRFS and my patch on 2.6.28 or
   2.6.27, what are you doing to cause this error? Are you using the latest
   sources from btrfs-unstable?
  
   Lee
  
   Mitch Harder (aka DontPanic) wrote:
   I have also been getting similar warnings filling up my logs.
  
   However, in my case, I have been experimenting with back-porting btrfs
   to a 2.6.28 kernel.  So I've been waiting for the back-porting efforts
   to get a little further along.
  
   But I thought I'd respond in case this information helps.
  
   Here's an example of the warnings I've been seeing:
  
   [80577.151167] [ cut here ]
   [80577.151169] WARNING: at
   /var/tmp/portage/sys-fs/btrfs-9998/work/btrfs-9998/disk-io.c:860
   clean_tree_block+0xa4/0xb0 [btrfs]()
   [80577.151172] Modules linked in: btrfs snd_pcm_oss snd_mixer_oss
   snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
   ipv6 ppdev snd_intel8x0 snd_ac97_codec parport_pc nvidia(P) ac97_bus
   snd_pcm snd_timer ohci_hcd ssb shpchp pci_hotplug pcmcia i2c_nforce2
   snd forcedeth sr_mod pcspkr parport i2c_core snd_page_alloc nvidia_agp
   sl811_hcd pcmcia_core uhci_hcd ehci_hcd
   [80577.151190] Pid: 11503, comm: cp Tainted: P        W  2.6.28-sabayon-r10 #1
   [80577.151192] Call Trace:
   [80577.151195]  [c011e77f] warn_on_slowpath+0x5f/0x90
   [80577.151203]  [c043c427] rb_insert_color+0x77/0xe0
   [80577.151221]  [f8c28e9e] alloc_extent_buffer+0x1fe/0x300 [btrfs]
   [80577.151238]  [f8c08d54] clean_tree_block+0xa4/0xb0 [btrfs]
   [80577.151253]  [f8bf665d] btrfs_init_new_buffer+0x7d/0x130 [btrfs]
   [80577.151269]  [f8bfb6f4] btrfs_alloc_free_block+0x104/0x110 [btrfs]
   [80577.151285]  [f8bef3da] __btrfs_cow_block+0x22a/0x8b0 [btrfs]
   [80577.151300]  [f8bed212] generic_bin_search+0x162/0x1c0 [btrfs]
   [80577.151315]  [f8bf00e6] btrfs_cow_block+0x156/0x200 [btrfs]
   [80577.151330]  [f8bf3267] btrfs_search_slot+0x1a7/0x910 [btrfs]
   [80577.151333]  [c01230e7] irq_exit+0x27/0x60
   [80577.151336]  [c01052cb] do_IRQ+0x6b/0x80
   [80577.151354]  [f8c24a55] read_extent_buffer+0xd5/0x170 [btrfs]
   [80577.151369]  [f8bf3f7d] btrfs_insert_empty_items+0x6d/0x410 [btrfs]
   [80577.151385]  [f8bf8f4f] btrfs_find_block_group+0xff/0x1a0 [btrfs]
   [80577.151402]  [f8c0fa1d] btrfs_new_inode+0x18d/0x360 [btrfs]
   [80577.151420]  [f8c135a9] btrfs_create+0x189/0x2a0 [btrfs]
   [80577.151423]  [c04162d9] security_capable+0x9/0x10
   [80577.151427]  [c0197f3d] vfs_create+0xcd/0x160
   [80577.151430]  [c019ad6f] do_filp_open+0x5af/0x7d0
   [80577.151433]  [c01932e9] cp_new_stat64+0xf9/0x110
   [80577.151436]  [c018e40e] do_sys_open+0x4e/0xe0
   [80577.151439]  [c018e51c] sys_open+0x2c/0x40
   [80577.151442]  [c0103165] sysenter_do_call+0x12/0x21
   [80577.151444] ---[ end trace 79cdc48bc88dedf7 ]---
  
  
   On Tue, Feb 24, 2009 at 5:02 PM, Hugo Mills hugo-l...@carfax.org.uk 
   wrote:
  
      This is essentially a repost of a mail I made last week, to which I
   didn't get a reply.
  
      I'm getting huge numbers of kernel warnings whilst using
   btrfs. They're all warn_slowpath, and all seem to be in
   fs/btrfs/disk-io.c. I've included one typical example at the end of
   this mail.
  
      Kernel versions are 2.6.29-rc2, -rc4 and -rc6.
  
      If I do lots of writes to my btrfs filesystem (e.g. video
   encoding), I end up with a syslog in the tens-of-megabytes range. This
   makes logcheck an unhappy bunny...
  
      I don't know if this behaviour is expected, and everyone using
   btrfs simply puts up with it for now, or if it's something unusual
   that needs investigating. On the chance that it's the latter, I'm
   reporting it here.
  
      Hugo.
  
   Feb 23 21:45:42 vlad kernel: [ cut here ]
   Feb 23 21:45:42 vlad kernel: WARNING: at fs/btrfs/disk-io.c:815 
   clean_tree_block+0x9d/0xbb [btrfs]()
   Feb 23

Entirely unexpected ENOSPC?

2009-03-04 Thread Hugo Mills
 
[dm_mod]
Mar  4 01:55:52 vlad kernel: [80284afa] ? 
generic_sync_sb_inodes+0x287/0x3e4
Mar  4 01:55:52 vlad kernel: [80284dbe] ? writeback_inodes+0x68/0xa1
Mar  4 01:55:52 vlad kernel: [80252e10] ? wb_kupdate+0x8b/0xfd
Mar  4 01:55:52 vlad kernel: [8025374b] ? pdflush+0x0/0x1b5
Mar  4 01:55:52 vlad kernel: [8025374b] ? pdflush+0x0/0x1b5
Mar  4 01:55:52 vlad kernel: [80253869] ? pdflush+0x11e/0x1b5
Mar  4 01:55:52 vlad kernel: [80252d85] ? wb_kupdate+0x0/0xfd
Mar  4 01:55:52 vlad kernel: [802383f1] ? kthread+0x47/0x73
Mar  4 01:55:52 vlad kernel: [8020c07a] ? child_rip+0xa/0x20
Mar  4 01:55:52 vlad kernel: [802383aa] ? kthread+0x0/0x73
Mar  4 01:55:52 vlad kernel: [8020c070] ? child_rip+0x0/0x20
Mar  4 01:55:52 vlad kernel: Code: 8b 83 b8 00 00 00 48 8d 98 48 ff ff ff 48 8b 
83 b8 00 00 00 0f 18 08 48 8d 83 b8 00 00 00 48 39 c5 75 b0 4c 89 e7 e8 63 42 
fe df 0f 0b eb fe 48 83 c4 38 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3
Mar  4 01:55:52 vlad kernel: RIP  [a0256b6b] 
__btrfs_reserve_extent+0x296/0x2ab [btrfs]
Mar  4 01:55:52 vlad kernel: RSP 88003ea618d0
Mar  4 01:55:52 vlad kernel: ---[ end trace eb8a7132a207a474 ]---

   Now, to my untrained eye, this looks like it might be an ENOSPC
problem, and thus wouldn't be entirely unexpected, except for one
thing:

h...@vlad:~ $ df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
/dev/mapper/media-scratch
                       41G   17G   25G  42% /media/vlad/video/video

   The filesystem was nowhere near full, and I wasn't expecting it to
become anywhere near full. The only thing that writes to the
filesystem is deliberately coded to leave several gigabytes of space
free.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Nothing wrong with being written in Perl... Some of my best ---   
  friends are written in Perl.   




Re: Entirely unexpected ENOSPC?

2009-03-04 Thread Hugo Mills
On Wed, Mar 04, 2009 at 01:50:53PM -0500, Josef Bacik wrote:
 On Wed, Mar 04, 2009 at 06:06:19PM +, Hugo Mills wrote:
 Last night, this event jammed up a good chunk of my server:
  
  Mar  4 01:51:36 vlad kernel: btrfs searching for 1716224 bytes, num_bytes 
  1716224, loop 2, allowed_alloc 1
  Mar  4 01:51:36 vlad kernel: btrfs searching for 860160 bytes, num_bytes 
  860160, loop 2, allowed_alloc 1
  [lots of this...]
  Mar  4 01:55:52 vlad kernel: btrfs searching for 4096 bytes, num_bytes 
  4096, loop 2, allowed_alloc 1
  Mar  4 01:55:52 vlad kernel: btrfs allocation failed flags 1, wanted 4096
  Mar  4 01:55:52 vlad kernel: space_info has 0 free, is full
  Mar  4 01:55:52 vlad kernel: block group 12582912 has 8388608 bytes, 
  8388608 used 0 pinned 0 reserved
  Mar  4 01:55:52 vlad kernel: 0 blocks of free space at or bigger than bytes 
  is
  Mar  4 01:55:52 vlad kernel: block group 1103101952 has 1073741824 bytes, 
  1073741824 used 0 pinned 0 reserved
  Mar  4 01:55:52 vlad kernel: 0 blocks of free space at or bigger than bytes 
  is
  [30 more lines of this]
 
 So yeah, that's expected: you ran out of space.  The key thing is this:
 
 Mar  4 01:55:52 vlad kernel: space_info has 0 free, is full
 
 If space_info has 0 free and is full, then there is no space to allocate for it
 and it's completely used.  I'd recommend switching to the -rc7 kernel, since that
 has things in place to keep this from happening as often.  Thanks,

   I'll do that.

   However, what's confusing me is that the filesystem was reported as
less than half full (17/41GiB used) at the time that it decided it had
no space. Is there any likely explanation for that behaviour?
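
   One way to look at the dump quoted above is to work out how much room each listed block group actually has left. This is a sketch under my own assumption that "free" can be approximated as size minus used, pinned and reserved; the regular expression simply mirrors the message format seen in the log, and it accepts a line break after "bytes," because the archive wraps there.

```python
import re

BG_RE = re.compile(
    r"block group (?P<start>\d+) has (?P<size>\d+) bytes,\s+"
    r"(?P<used>\d+) used (?P<pinned>\d+) pinned (?P<reserved>\d+) reserved"
)

# The two block-group lines quoted in this message.
SAMPLE = (
    "block group 12582912 has 8388608 bytes, 8388608 used 0 pinned 0 reserved\n"
    "block group 1103101952 has 1073741824 bytes, 1073741824 used 0 pinned 0 reserved\n"
)


def block_group_free(log_text):
    """Map each dumped block group's start offset to its approximate free bytes."""
    free = {}
    for m in BG_RE.finditer(log_text):
        size, used, pinned, reserved = (
            int(m[k]) for k in ("size", "used", "pinned", "reserved")
        )
        free[int(m["start"])] = size - used - pinned - reserved
    return free


if __name__ == "__main__":
    for start, free in block_group_free(SAMPLE).items():
        print(f"block group {start}: {free} bytes free")
```

   Both groups shown come out at zero, which matches "space_info has 0 free" but does not by itself explain why no further block groups were allocated from the space that df reports as available.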

   I've used btrfsctl to resize it online several times: shrink by
1GiB, then enlarge by 12, 10, 10GiB. Might that have been a factor?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- How do you become King?  You stand in the marketplace and ---
  announce you're going to tax everyone. If you get out  
   alive, you're King.   




Online resize vs ENOSPC

2009-03-09 Thread Hugo Mills
   After an online resize, the filesystem reports its new size, but
still runs out of space at the old size:

Mar  9 08:12:59 vlad kernel: no space left, need 4096, 380928 delalloc bytes, 
51509866496 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 
may use51510247424 total
[...]
Mar  9 08:14:21 vlad kernel: no space left, need 4096, 0 delalloc bytes, 
51510247424 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 
may use51510247424 total

h...@vlad:~ $ df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
/dev/mapper/media-scratch
                       70G   48G   23G  68% /media/vlad/video/video

   This was online resized from 50G to 70G, using:

$ sudo lvresize media/scratch -L 70G
$ sudo btrfsctl -r 70G /media/vlad/video/video

   Version numbers:

$ btrfsctl
[...]
Btrfs v0.18-ge3b0f66
$ uname -a
Linux vlad 2.6.29-rc7 #1 Fri Mar 6 23:32:13 GMT 2009 x86_64 GNU/Linux

   Unmounting and remounting the filesystem seems to make the new
space available for use again.

   This is the second time I've had this happen to me now, so it seems
to be more-or-less reproducible, although I haven't deliberately tried
to trigger the behaviour yet.
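
   To make the mismatch concrete, the "no space left" message can be parsed and its total converted to GiB. A small sketch follows; the field names come straight from the message, while the parsing code and function name are mine. Note that the log really does run "may use" into the following number, so the pattern for the total allows for that.

```python
import re

FIELD_RE = re.compile(
    r"(\d+) (delalloc bytes|bytes_used|bytes_reserved|bytes_pinned|"
    r"bytes_readonly|may use)"
)
TOTAL_RE = re.compile(r"(\d+) total")

# Second message from the log above, joined onto one line.
MESSAGE = (
    "no space left, need 4096, 0 delalloc bytes, 51510247424 bytes_used, "
    "0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 "
    "may use51510247424 total"
)


def parse_no_space(msg):
    """Return the numeric fields of a 'no space left' message as a dict."""
    fields = {name: int(value) for value, name in FIELD_RE.findall(msg)}
    fields["total"] = int(TOTAL_RE.search(msg).group(1))
    return fields


if __name__ == "__main__":
    info = parse_no_space(MESSAGE)
    print(f"used  {info['bytes_used'] / 2**30:.2f} GiB")
    print(f"total {info['total'] / 2**30:.2f} GiB")
```

   The total works out to just under 48 GiB, in line with the 48G that df reports as used and well short of the 70G size, so the allocator appears to be working from a total that predates the resize.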

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Klytus! Are your men on the right pills? Maybe you should ---
 execute their trainer!  




Re: Entirely unexpected ENOSPC?

2009-03-09 Thread Hugo Mills
On Mon, Mar 09, 2009 at 07:08:16AM -0600, Yien Zheng wrote:
 At this point I'm wondering if this is an anomaly or if it has anything
 to do with using an SSD.  It seems the pre-2.6.29-rc7 code had a hard
 stop at 85%.  But the recent patch doesn't seem to have solved the
 issue for me.  Is there another issue that makes btrfs want to reserve
 2G free?  I see another email with someone growing their filesystem
 from 48G to 70G because they ran out of space on their 50G disk, which
 should still have 2G free.

   Not quite -- I had some 5G free on a 50G filesystem, without
errors. I expanded the filesystem online to 70G because I knew I would
run out within the next few hours. Despite the expansion, it still ran
out at (just short of) 50G.

   Unless you've resized your filesystem online, I think we're seeing
different problems.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Do not meddle in the affairs of system administrators,  for ---   
  they are subtle,  and quick to anger.  




Re: Online resize vs ENOSPC

2009-03-09 Thread Hugo Mills
On Mon, Mar 09, 2009 at 10:31:41AM +, Hugo Mills wrote:
After an online resize, the filesystem reports its new size, but
 still runs out of space at the old size:
[...]
Unmounting and remounting the filesystem seems to make the new
 space available for use again.
 
This is the second time I've had this happen to me now, so it seems
 to be more-or-less reproducible, although I haven't deliberately tried
 to trigger the behaviour yet.

   Just to confirm, I can indeed reproduce it trivially:

$ sudo lvcreate scratch -n testresize -L 5G
$ sudo mkfs.btrfs /dev/scratch/testresize
$ sudo mount /dev/scratch/testresize /mnt
$ sudo chmod ug+w /mnt
$ sudo chown hrm. /mnt
$ cd /mnt
$ dd if=/dev/zero of=foo.txt bs=1M count=4096
$ sudo lvextend scratch/testresize -L 9G
$ sudo btrfsctl -r 9G /mnt
$ dd if=/dev/zero of=foo2.txt bs=1M count=4096

and I get an out-of-space error within a few hundred blocks.

$ cd ..
$ sudo umount /mnt
$ sudo mount /dev/scratch/testresize /mnt
$ cd /mnt
$ dd if=/dev/zero of=foo2.txt bs=1M count=4096

and then I can write the full 4G of data.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There are three mistaiks in this sentance. ---




Problem with renaming devices

2009-04-06 Thread Hugo Mills
   There seems to be some issue with changing the name of the device
that a btrfs filesystem lives on:

# lvcreate scratch -n fstest -L 2G
  Logical volume fstest created
# mkfs -t btrfs /dev/scratch/fstest

WARNING! - Btrfs v0.18-ge3b0f66 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/scratch/fs1
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
Btrfs v0.18-ge3b0f66

# mount /dev/scratch/fstest /mnt
# umount /mnt

# lvrename scratch fstest derek
  Renamed fstest to derek in volume group scratch
# mount /dev/scratch/derek /mnt
mount: /dev/mapper/scratch-derek: can't read superblock

# lvrename scratch derek fstest
  Renamed derek to fstest in volume group scratch
# mount /dev/scratch/fstest /mnt
[success]

   The rename works properly on a completely virgin filesystem, but
not on one that's been mounted and unmounted (as above).

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Happiness is mandatory.  Are you happy? --- 




Re: ENOSPC at 94% full -- and causing BUGs elsewhere?

2009-10-12 Thread Hugo Mills
On Sun, Oct 04, 2009 at 08:06:30AM -0400, Chris Mason wrote:
 On Sat, Oct 03, 2009 at 05:55:32PM -0400, Josef Bacik wrote:
  On Sat, Oct 03, 2009 at 01:21:09PM +0100, Hugo Mills wrote:
  I've just had the following on my home server. I believe that it's
   btrfs that's responsible, as the machine wasn't doing much other than
   reading/writing on a btrfs filesystem. The process that was doing so
   is now stuck in D+ state, and can't be killed. The timing of the oops
   at the end is also suggestive of being involved in the same incident.
   This is the only btrfs filesystem on the machine.
  
  Patches have gone to Linus to fix the enospc problems.  You can try running the
  enospc branch of Chris's git tree and it should behave better for you.
  Thanks,
 
 The right tree for this is the master branch of btrfs-unstable for
 2.6.31.

   Thanks, Josef and Chris. I've now found the time to check out and
build the btrfs-unstable tree, and it is indeed handling the ENOSPC
condition much more cleanly.

   However, it seems to have got into a position where I have lots of
free space reported by df (over 10% of the size of the volume -- 185
GiB free of 1474 GiB total), but still refuses to write anything to
the filesystem. Do you have any suggestions for what I could try?

   The original ENOSPC error I reported above happened at
approximately 85/1370 GiB free; I then added 100 GiB more space
online, had another failure (same kernel: 2.6.31 mainline), and then
rebooted into master from btrfs-unstable.
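
   For reference, the percentages behind those figures, as a trivial sketch (the helper name is mine; the numbers are the ones given above):

```python
def percent_free(free_gib, total_gib):
    """Free space as a percentage of the total volume size."""
    return 100.0 * free_gib / total_gib


if __name__ == "__main__":
    # First failure: about 85 GiB free of 1370 GiB.
    print(f"{percent_free(85, 1370):.1f}% free")
    # Now, after adding 100 GiB: 185 GiB free of 1474 GiB.
    print(f"{percent_free(185, 1474):.1f}% free")
```

   That is roughly 6% free (hence the "94% full" in the subject line) at the first failure, and about 12.6% free now.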

   Just for the record, I'm now using this kernel:

Linux vlad 2.6.31-47417-gac6889c #1 Sun Oct 11 14:27:06 BST 2009 x86_64 GNU/Linux

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---  
   a _poor_ corrupt official.




Re: ENOSPC at 94% full -- and causing BUGs elsewhere?

2009-10-13 Thread Hugo Mills
On Tue, Oct 13, 2009 at 10:58:12PM +0800, Yan, Zheng  wrote:
 On Tue, Oct 13, 2009 at 10:50 PM, Hugo Mills hugo-l...@carfax.org.uk wrote:
  On Tue, Oct 13, 2009 at 06:31:45AM -0400, Chris Mason wrote:
  On Mon, Oct 12, 2009 at 03:09:35PM +0100, Hugo Mills wrote:
   On Sun, Oct 04, 2009 at 08:06:30AM -0400, Chris Mason wrote:
On Sat, Oct 03, 2009 at 05:55:32PM -0400, Josef Bacik wrote:
 On Sat, Oct 03, 2009 at 01:21:09PM +0100, Hugo Mills wrote:
     I've just had the following on my home server. I believe that 
  it's
  btrfs that's responsible, as the machine wasn't doing much other 
  than
  reading/writing on a btrfs filesystem. The process that was doing 
  so
  is now stuck in D+ state, and can't be killed. The timing of the 
  oops
  at the end is also suggestive of being involved in the same 
  incident.
  This is the only btrfs filesystem on the machine.

 Patches have gone to Linus to fix the enospc problems.  You can try 
 running the
 enospc branch of Chris's git tree and it should behave better for 
 you.  Thanks,
   
The right tree for this is the master branch of btrfs-unstable for
2.6.31.
  
      Thanks, Josef and Chris. I've now found the time to check out and
   build the btrfs-unstable tree, and it is indeed handling the ENOSPC
   condition much more cleanly.
  
      However, it seems to have got into a position where I have lots of
   free space reported by df (over 10% of the size of the volume -- 185
   GiB free of 1474 GiB total), but still refuses to write anything to
   the filesystem. Do you have any suggestions for what I could try?
 
  You've probably got most of that 10GB free allocated as metadata.  You
  could try btrfs-vol -b.
 
    I moved some 13 GiB of data off the filesystem, and ran
  btrfs-vol -b. As I reported on IRC, I then got this in my syslog:
 
  Oct 13 13:16:19 vlad kernel: btrfs: relocating block group 1401224691712 
  flags 1
  Oct 13 13:17:02 vlad kernel: btrfs: found 123 extents
  Oct 13 13:17:10 vlad kernel: btrfs: found 123 extents
  Oct 13 13:17:11 vlad kernel: btrfs: found 28 extents
  Oct 13 13:17:21 vlad kernel: btrfs: found 28 extents
  Oct 13 13:17:25 vlad kernel: btrfs: found 28 extents
  Oct 13 13:17:26 vlad kernel: btrfs: found 27 extents
  Oct 13 13:17:36 vlad kernel: btrfs: found 27 extents
  Oct 13 13:17:39 vlad kernel: btrfs: found 27 extents
  Oct 13 13:17:48 vlad kernel: btrfs: found 27 extents
  ... repeat forever (or at least for 50 minutes or so).
 
    The btrfs-vol -b process didn't respond to ^C, so on advice of
  yanzheng on IRC I rebooted the machine. I'm currently running a
  btrfsck on the filesystem, and will try btrfs-vol -b again when that's
  done.
 
 Don't do that; it will run into an infinite loop again.

   I got this from the btrfsck:

h...@vlad:~ $ sudo btrfsck /dev/media/scratch 
root 5 inode 3949 errors 2000
found 1366552736241 bytes used err is 1
total csum bytes: 1336783032
total tree bytes: 1944158208
total fs tree bytes: 20267008
btree space waste bytes: 462357950
file data blocks allocated: 1368865824768
 referenced 1368851816448
Btrfs Btrfs v0.19

   I guess that means that there were errors found -- is the btrfs-vol
-b still going to cause an infinite loop, or is it worth trying that
again?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Quantum Mechanics: the dreams stuff is made of. --- 


signature.asc
Description: Digital signature


Re: To loop or not to loop with btrfs

2009-11-18 Thread Hugo Mills
On Wed, Nov 18, 2009 at 10:31:53PM +0100, Jan Engelhardt wrote:
 This left me puzzled for a while:
 
 22:29 borg:/ # losetup /dev/loop1 /.B.disk
 22:29 borg:/ # mount /dev/loop1 /B
 mount: /dev/loop1: can't read superblock
 22:29 borg:/ # blkid /dev/loop1
 /dev/loop1: UUID=e19fe89b-cde3-4ccc-bc70-b759a57bd1c9
 UUID_SUB=f29c6218-d040-4546-a227-4dd2d2142817 TYPE=btrfs 
 22:29 borg:/ # losetup -d /dev/loop1
 22:29 borg:/ # losetup /dev/loop2 /.B.disk 
 22:29 borg:/ # mount /dev/loop2 /B
 (success)
 
 So the btrfs volume is tied to loop2? That certainly is not good.
 Even real disks (/dev/sd*) can move around, the more so USB flash
 gadgets and loop devices.

   This looks like it might be related to [1]? (I suspect it slipped
Chris's mind back in April, and nobody's really noticed it since).

   Hugo.

[1] http://article.gmane.org/gmane.comp.file-systems.btrfs/2817

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Most administrators wouldn't give their users the time of ---
  day.  That's what NTP is for.  




Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-22 Thread Hugo Mills
On Wed, Dec 22, 2010 at 12:39:15PM -0800, Andy Isaacson wrote:
 On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote:
  On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson a...@hexapodia.org wrote:
   I have a directory with 1.2M files in it, which makes readdir very slow
   on btrfs with cold caches (although it's reasonably fast with hot caches
   as in the first example below):
  
  Sounds like:
  
  Bug 21562 - btrfs is dead slow due to fragmentation
  https://bugzilla.kernel.org/show_bug.cgi?id=21562
 
 Hmmm, how do I look at the btree layout for a given inode?

   There's documentation on the tree structures at [1] and [2]. If you
know the inode number of the object you're interested in, you need to
look in the FS tree for the subvolume it's in and find the
(inode_number, EXTENT_DATA, ...) keys for the file. Each of those
records will reference an individual disk extent -- and you can get
the disk start position and length of the extent from the data stored
under the key.
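
   As a toy illustration of that lookup (an in-memory model only, not something that reads a real filesystem): keys are (objectid, type, offset) triples, and for file extents the objectid is the inode number and the offset is the position within the file. The key type value 108 for EXTENT_DATA and the disk_bytenr/disk_num_bytes field names follow the on-disk format as I understand it; the tree contents below are invented, and inline or compressed extents are ignored.

```python
from typing import NamedTuple

INODE_ITEM = 1     # key type of the inode item itself
EXTENT_DATA = 108  # key type of a file extent item


class Key(NamedTuple):
    objectid: int  # inode number, for items in an FS tree
    type: int
    offset: int    # for EXTENT_DATA: byte offset within the file


class FileExtent(NamedTuple):
    disk_bytenr: int     # start of the extent on disk
    disk_num_bytes: int  # length of the extent on disk


# Hypothetical FS tree contents, made up for this example.
FS_TREE = {
    Key(257, INODE_ITEM, 0): None,
    Key(257, EXTENT_DATA, 0): FileExtent(12582912, 4096),
    Key(257, EXTENT_DATA, 4096): FileExtent(29360128, 8192),
    Key(258, EXTENT_DATA, 0): FileExtent(1103101952, 4096),
}


def extents_for_inode(fs_tree, inode):
    """List (file_offset, disk_start, disk_length) for one inode, in key order."""
    return [
        (key.offset, item.disk_bytenr, item.disk_num_bytes)
        for key, item in sorted(fs_tree.items(), key=lambda kv: kv[0])
        if key.objectid == inode and key.type == EXTENT_DATA
    ]


if __name__ == "__main__":
    for offset, start, length in extents_for_inode(FS_TREE, 257):
        print(f"file offset {offset}: disk {start} +{length}")
```

   A file whose consecutive extents sit far apart on disk, as the two for inode 257 do here, is what fragmentation looks like at this level.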

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Btree_Items
[2] https://btrfs.wiki.kernel.org/index.php/Data_Structures

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hail and greetings.  We are a flat-pack invasion force from ---   
 Planet Ikea. We come in pieces. 




Re: open_ctree failed, unable to mount the fs

2011-01-07 Thread Hugo Mills
On Fri, Jan 07, 2011 at 08:01:47PM +0100, Tomasz Chmielewski wrote:
 I got a power cycle, after which I'm no longer able to mount btrfs
 filesystem:
 
 
 device fsid x-y devid 1 transid 169686 /dev/vda3
 device fsid x-y devid 1 transid 169686 /dev/vda3
 parent transid verify failed on 3260289024 wanted 169686 found 169685
 parent transid verify failed on 3260289024 wanted 169686 found 169685
 parent transid verify failed on 3260289024 wanted 169686 found 169685
 btrfs: open_ctree failed
 
 
 Tried to get that mounted with 2.6.35 and 2.6.37, without success.
 
 Is there a way to fix it?

   The forthcoming[1] btrfsck tool should handle that particular
error, I believe.

   To prevent it from happening again, ensure that you have working
barriers on your disks, or that you turn off write caching on the
drives at every boot.
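
   For anyone collecting these from logs, here is a small sketch that pulls the numbers out of the "parent transid verify failed" lines. The parsing code is mine; my reading of the message is that the block found on disk belongs to transaction 169685 while its parent expects 169686, i.e. it is one transaction behind.

```python
import re

TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)

# The lines quoted above.
SAMPLE = """\
parent transid verify failed on 3260289024 wanted 169686 found 169685
parent transid verify failed on 3260289024 wanted 169686 found 169685
parent transid verify failed on 3260289024 wanted 169686 found 169685
btrfs: open_ctree failed
"""


def transid_mismatches(log_text):
    """Return the distinct (bytenr, wanted, found) triples seen in the log."""
    seen = []
    for m in TRANSID_RE.finditer(log_text):
        triple = (int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
        if triple not in seen:
            seen.append(triple)
    return seen


if __name__ == "__main__":
    for bytenr, wanted, found in transid_mismatches(SAMPLE):
        print(f"block {bytenr}: {wanted - found} transaction(s) behind")
```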

   Hugo.

[1] out real soon now

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours.  But remember, the ---
  roof is ours!  




Re: Synching a Backup Server

2011-01-09 Thread Hugo Mills
On Sun, Jan 09, 2011 at 08:57:12PM +, Alan Chandler wrote:
 On 09/01/11 18:30, Hugo Mills wrote:
 
 No, subvolumes are a part of the whole filesystem. In btrfs, there
 is only one filesystem. There are 6 main B-trees that store metadata
 in btrfs (plus a couple of others). One of those is the filesystem
 tree (or FS tree), which contains all the metadata associated with
 the normal POSIX directory/file namespace (basically all the inode and
 xattr data). When you create a subvolume, a new FS tree is created,
 but it shares *all* of the other btrfs B-trees.
 
 There is only one filesystem, but there may be distinct namespaces
 within that filesystem that can be mounted as if they were
 filesystems. Think of it more like NFSv4, where there's one overall
 namespace exported per server, but clients can mount subsections of
 it.

 I think this explanation is still missing the key piece that has
 confused me despite trying very hard to understand it by reading the
 wiki.  You talk about Distinct Namespaces, but what I learnt from
 further up the thread is that this namespace is also inside the
 namespace that makes up the whole filesystem.  I mount the
 whole filesystem, and all my subvolumes are automatically there (at
 least that is what I find in practice).  It's this duality of
 namespace that is the difficult concept.  I am still not sure if
 there is a default subvolume, and the other subvolumes are defined
 within its namespace, or whether there is an overall filesystem
 namespace and subvolumes defined within it and if you mount the
 default subvolume you would then lose the overall filesystem
 namespace and hence no longer see the subvolumes.

   There is a root subvolume namespace (subvolid=0), which may contain
files, directories, and other subvolumes. This root subvolume is what
you see when you mount a newly-created btrfs filesystem.

   The default subvolume is simply what you get when you mount the
filesystem without a subvol or subvolid parameter to mount. Initially,
the default subvolume is set to be the root subvolume. If another
subvolume is set to be the default, then the root subvolume can only
be mounted with the subvolid=0 mount option.
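
   The rule in the last two paragraphs can be written down as a toy model. This is purely illustrative: the class and method names are hypothetical, the subvolume ID 256 and its name are made up, and it only models which subvolume a mount would expose, using subvolid=0 for the root subvolume as described above.

```python
class ToyFilesystem:
    """Toy model of how a mount chooses which subvolume to expose."""

    ROOT_SUBVOLID = 0  # the root subvolume, as described above

    def __init__(self):
        self.subvolumes = {self.ROOT_SUBVOLID: "<root>"}
        self.default = self.ROOT_SUBVOLID  # initially the root subvolume

    def create_subvolume(self, subvolid, name):
        self.subvolumes[subvolid] = name

    def set_default(self, subvolid):
        if subvolid not in self.subvolumes:
            raise KeyError(subvolid)
        self.default = subvolid

    def mount(self, subvolid=None):
        """Return the name of the subvolume that a mount would show."""
        chosen = self.default if subvolid is None else subvolid
        return self.subvolumes[chosen]


if __name__ == "__main__":
    fs = ToyFilesystem()
    fs.create_subvolume(256, "home")
    print(fs.mount())            # no option given: the default subvolume
    fs.set_default(256)
    print(fs.mount())            # the default is now "home"
    print(fs.mount(subvolid=0))  # the root is still reachable explicitly
```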

  I find the wiki
 also confusing because it talks about subvolumes having to be at the
 first level of the filesystem, but again further up this thread
 there is an example which is used for real of it not being at the
 first level, but at one level down inside a directory.

   Try it, see what happens, and fix the wiki where it's wrong? :)

   Or at least say what page this is on, and I can try the experiment
and fix it later...

 What it means is that I don't have a mental picture of how this all
 works, and all use cases could then be worked out by following this
 mental picture.  I think it would be helpful if the Wiki contained
 some of the use cases that we have been talking about in this thread
 - but with more detailed information - like the actual commands used
 to mount the filesystems like this, and information as to in what
 circumstances you would perform each action.

   I've written a chunk of text about how btrfs's storage, RAID and
subvolumes work. At the moment, though, the wiki is somewhat broken
and I can't actually create the page to put it on...

   There's also a page of recipes[1], which is probably the place that
the examples you mentioned should go.

 The main awkward piece of btrfs terminology is the use of RAID to
 describe btrfs's replication strategies. It's not RAID, and thinking
 of it in RAID terms is causing lots of confusion. Most of the other
 things in btrfs are, I think, named relatively sanely.
 
 I don't find this AS confusing, although there is still information
 missing which I asked in another post that wasn't answered.  I still
 can't understand whether it's possible to initialise a filesystem in
 degraded mode. If you create the filesystem so that -m RAID1 and -d
 RAID1 but only have one device - it implies that it writes two
 copies of both metadata and data to that one device.  However if you
 successfully create the filesystem on two devices and then fail one
 and mount it -o degraded it appears to suggest it will only write
 the one copy.

   From trying it a while ago, I don't think it is possible to create
a filesystem in degraded mode. Again, I'll try it again when I have
the time to do some experimentation and see what actually happens.

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/UseCases

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A clear conscience.  Where did you get this taste ---
 for luxuries,  Bernard? 




Filesystem creation in degraded mode

2011-01-12 Thread Hugo Mills
   I've had a go at determining exactly what happens when you create a
filesystem without enough devices to meet the requested replication
strategy:

# mkfs.btrfs -m raid1 -d raid1 /dev/vdb
# mount /dev/vdb /mnt
# btrfs fi df /mnt
Data: total=8.00MB, used=0.00
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=153.56MB, used=24.00KB
Metadata: total=8.00MB, used=0.00

   The data section is single-copy-only; system and metadata are DUP.
This is good. Let's add some data:

# cp develop/linux-image-2.6.3* /mnt
# btrfs fi df /mnt
Data: total=315.19MB, used=250.58MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=153.56MB, used=364.00KB
Metadata: total=8.00MB, used=0.00

   Again, much as expected. Now, add in a second device, and balance:

# btrfs dev add /dev/vdc /mnt
# btrfs fi bal /mnt
# btrfs fi df /mnt
Data, RAID0: total=1.20GB, used=250.58MB
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=128.00MB, used=308.00KB

   This is bad, though. Data has reverted to RAID-0.

   Now, just to check, what happens when we create a filesystem with
enough devices, fail one, and re-add it?

# mkfs.btrfs -d raid1 -m raid1 /dev/vdb /dev/vdc
# mount /dev/vdb /mnt
# # Copy some data into it
# btrfs fi df /mnt
Data, RAID1: total=1.50GB, used=1.24GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=307.19MB, used=1.80MB
Metadata: total=8.00MB, used=0.00
# umount /mnt

   OK, so what happens if we fail one drive?

# dd if=/dev/zero of=/dev/vdb bs=1M count=16
# mount /dev/vdc /mnt -o degraded
# btrfs dev add /dev/vdd /mnt
# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 2495fe15-174f-4aaa-8317-c2cfb4dade1f
   Total devices 3 FS bytes used 1.25GB
   devid    2 size 3.00GB used 1.81GB path /dev/vdc
   devid    3 size 3.00GB used 0.00 path /dev/vdd
   *** Some devices missing

Btrfs Btrfs v0.19
# btrfs fi bal /mnt
# btrfs fi df /mnt
Data, RAID1: total=1.50GB, used=1.24GB
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=128.00MB, used=1.41MB

   This looks all well and good. So it looks like it's just the
create-in-degraded-mode idea that doesn't work.

   Kernel is btrfs-unstable, up to 65e5341b (plus my balance-progress
patches, but those shouldn't affect this).

   Hugo.

PS. I haven't tried with RAID-10 yet, but I suspect that it'll be much
the same.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You are demons,  and I am in Hell! Well, technically, it's ---  
   London,  but it's an easy mistake to make.   




[PATCH RFC] Add ioctl for balancing a subset of the full filesystem.

2011-01-18 Thread Hugo Mills
   This is a patch purely for comment. There are several things wrong
with it that I need to fix (at minimum, it has too much debugging
output, the __balance_chunk_filters function takes the wrong set of
parameters to make it properly extensible, and the progress counter is
broken).

   I'm planning on adding at least two more filters, once this basic
infrastructure is reasonably stable: one to filter on a range of
(virtual) addresses, and one to work on device IDs (i.e. was any part
of this block group stored on device $n?).

   With the additional filters written, you'll be able to specify any
conjunctive set of filters, i.e. "This block group is RAID1, *and* was
stored on devid 4". Disjunctions ("or") aren't supported, and probably
won't be with this API. The filter data for additional filters will go
at the end of struct btrfs_ioctl_balance_start, ensuring extensibility
and backwards-compatibility (or at least, proper error reporting of
unsupported features).
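
   As a rough illustration of the conjunctive semantics described
above, the chunk-type test could look something like the sketch below.
The struct mirrors the fields of struct btrfs_ioctl_balance_start from
this patch; the helper name chunk_passes_filters() is made up for the
example and is not the patch's __balance_chunk_filters().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Filter selector bit, as defined in the patch's ioctl.h hunk. */
#define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x1

/* Userspace mirror of struct btrfs_ioctl_balance_start (uint64_t for __u64). */
struct balance_filter {
	uint64_t flags;           /* which filters are active */
	uint64_t chunk_type;      /* flag bits required */
	uint64_t chunk_type_mask; /* mask of bits to examine */
};

/*
 * Hypothetical helper: returns true if a block group with the given
 * flags should be balanced. Every active filter must accept the block
 * group, which is what makes the filter set conjunctive.
 */
static bool chunk_passes_filters(uint64_t chunk_flags,
				 const struct balance_filter *f)
{
	if (f == NULL)
		return true; /* no filter: balance everything */

	if (f->flags & BTRFS_BALANCE_FILTER_CHUNK_TYPE) {
		if ((chunk_flags & f->chunk_type_mask) != f->chunk_type)
			return false;
	}
	/* Further filters would be tested here, each able to reject. */
	return true;
}
```

   A disjunction cannot be expressed this way, because one struct
carries only a single required-bits/mask pair per filter type.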

   Questions for the panel:
   
 * Is the ioctl API reasonably sane, extensible, future-proof?
 * What other block group filters could be useful for this API?

   Hugo.

There are situations, such as restarting an interrupted balance, where
it is not necessary or desired to balance all of the block groups in the
filesystem. This patch adds the basic infrastructure for filtering
block groups during a balance. It also adds a single filter method,
allowing the caller to select block groups with specific usage and
replication strategies.
---
 fs/btrfs/ioctl.c   |   44 +-
 fs/btrfs/ioctl.h   |   15 ++
 fs/btrfs/volumes.c |   76 +++
 fs/btrfs/volumes.h |3 +-
 4 files changed, 124 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6d50d24..a2dd60c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2243,6 +2243,46 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
 	return btrfs_wait_for_commit(root, transid);
 }
 
+/* Balance the filesystem unconditionally */
+long btrfs_ioctl_balance(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_balance(fs_info->dev_root, NULL);
+}
+
+/* Balance particular chunks in the filesystem */
+long btrfs_ioctl_balance_filtered(
+	struct btrfs_fs_info *fs_info,
+	struct btrfs_ioctl_balance_start __user *user_filters)
+{
+	int ret = 0;
+	struct btrfs_ioctl_balance_start *dest;
+
+	dest = kmalloc(sizeof(struct btrfs_ioctl_balance_start), GFP_KERNEL);
+	if (!dest)
+		return -ENOMEM;
+
+	if (copy_from_user(dest, user_filters, sizeof(struct btrfs_ioctl_balance_start))) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+	printk("Starting balance with filter: %llx %llx %llx\n",
+	       dest->flags, dest->chunk_type, dest->chunk_type_mask);
+
+	/* Basic sanity checking */
+	if (dest->flags & ~BTRFS_BALANCE_FILTER_MASK) {
+		ret = -ENOTSUPP;
+		goto error;
+	}
+
+	/* Do the balance */
+	ret = btrfs_balance(fs_info->dev_root, dest);
+
+error:
+	kfree(dest);
+	return ret;
+}
+
 /*
  * Return the current status of any balance operation
  */
@@ -2335,11 +2375,13 @@ long btrfs_ioctl(struct file *file, unsigned int
 	case BTRFS_IOC_RM_DEV:
 		return btrfs_ioctl_rm_dev(root, argp);
 	case BTRFS_IOC_BALANCE:
-		return btrfs_balance(root->fs_info->dev_root);
+		return btrfs_ioctl_balance(root->fs_info);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
 	case BTRFS_IOC_BALANCE_CANCEL:
 		return btrfs_ioctl_balance_cancel(root->fs_info);
+	case BTRFS_IOC_BALANCE_FILTERED:
+		return btrfs_ioctl_balance_filtered(root->fs_info, argp);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 4f73d11..7c0c69c 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -154,6 +154,19 @@ struct btrfs_ioctl_balance_progress {
 	__u64 completed;
 };
 
+/* Types of balance filter */
+#define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x1
+#define BTRFS_BALANCE_FILTER_MASK 0x1
+
+/* All the possible options for a filter */
+struct btrfs_ioctl_balance_start {
+	__u64 flags; /* Bit field indicating which fields of this struct are filled */
+
+	/* For FILTER_CHUNK_TYPE */
+	__u64 chunk_type;  /* Flag bits required */
+	__u64 chunk_type_mask; /* Mask of bits to examine */
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -201,4 +214,6 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_BALANCE_PROGRESS 

[PATCH RFC] Initial implementation of userspace interface for filtered balancing.

2011-01-18 Thread Hugo Mills
   This is the userspace side of the filtered balance patch, again
purely for comment at this stage. The command-line invocation will
look something like this:

$ sudo btrfs fi bal --filter type=meta,~raid1 /mnt

   This will balance all metadata block groups that are not replicated
with RAID1. Once I've implemented additional filter types, they can be
specified with extra --filter options, with the semantics of "and"
between each --filter option.
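
   To make the "type=meta,~raid1" example concrete, here is a minimal
sketch of turning the comma-separated flag list into the chunk_type and
chunk_type_mask pair that the kernel side compares against. The
function parse_type_filter() and the BG_* names are invented for this
illustration, and the bit values are assumed to match the kernel's
BTRFS_BLOCK_GROUP_* definitions; it is not the parser from the patch
below.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>

/* Block group flag bits; values assumed to match the kernel's ctree.h. */
#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)
#define BG_RAID0    (1ULL << 3)
#define BG_RAID1    (1ULL << 4)
#define BG_DUP      (1ULL << 5)
#define BG_RAID10   (1ULL << 6)

struct flag_name {
	const char *keyword;
	uint64_t bit;
};

static const struct flag_name flag_names[] = {
	{ "data", BG_DATA }, { "sys", BG_SYSTEM }, { "meta", BG_METADATA },
	{ "raid0", BG_RAID0 }, { "raid1", BG_RAID1 },
	{ "raid10", BG_RAID10 }, { "dup", BG_DUP },
	{ NULL, 0 }
};

/*
 * Hypothetical parser for the value of a "type=" filter, e.g.
 * "meta,~raid1". Returns 0 on success, -1 on an unknown flag name.
 */
static int parse_type_filter(const char *spec, uint64_t *type, uint64_t *mask)
{
	char buf[128];
	char *save, *tok;

	*type = 0;
	*mask = 0;

	/* strtok_r modifies its input, so work on a bounded copy. */
	strncpy(buf, spec, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok_r(buf, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		int negate = (tok[0] == '~');
		const struct flag_name *fn;

		if (negate)
			tok++;
		for (fn = flag_names; fn->keyword; fn++)
			if (strcmp(tok, fn->keyword) == 0)
				break;
		if (!fn->keyword)
			return -1;

		*mask |= fn->bit;          /* always examine this bit */
		if (negate)
			*type &= ~fn->bit; /* bit must be clear */
		else
			*type |= fn->bit;  /* bit must be set */
	}
	return 0;
}
```

   With these assumed values, "meta,~raid1" yields a mask covering the
metadata and RAID1 bits and a required value with only the metadata bit
set, so a block group matches when it is metadata and not RAID1.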

   (Yes, Goffredo, I know I need to update the man pages for this
patch... :) )

   This patch, and the preceding kernel one, both apply on top of my
previous balance progress/cancel patches.

   Hugo.


It is useful to be able to balance a subset of the full filesystem.
This patch implements the infrastructure for filtering block groups on
different criteria when balancing the filesystem.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c  |4 +-
 btrfs_cmds.c |  132 --
 ioctl.h  |   15 +++
 3 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 7b42658..19b0e56 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -92,8 +92,8 @@ static struct Command commands[] = {
 		"Show space usage information for a mount point\n."
 	},
 	{ do_balance, -1,
-	  "filesystem balance", "[-w|--wait] <path>\n"
-		"Balance the chunks across the device."
+	  "filesystem balance", "[-w|--wait] [-f|--filter=filter:...] <path>\n"
+		"Balance chunks across the devices. --filter=help for help on filters.\n"
 	},
 	{ do_balance, -1,
 	  "balance start", "[-w|--wait] <path>\n"
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index fadcb4f..f7bd835 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -756,26 +756,74 @@ int do_add_volume(int nargs, char **args)
 
 const struct option balance_options[] = {
 	{ "wait", 0, NULL, 'w' },
+	{ "filter", 1, NULL, 'f' },
 	{ NULL, 0, NULL, 0 }
 };
 
+struct filter_class_desc {
+	char *keyword;
+	char *description;
+	int flag;
+};
+
+const struct filter_class_desc filter_class[] = {
+	{ "type",
+	  "type=[~]flagname[,...]\n"
+	  "\tWhere flagname is one of:\n"
+	  "\t\tmeta, sys, data, raid0, raid1, raid10, dup\n"
+	  "\tPrefix a flagname with ~ to negate the match.\n",
+	  BTRFS_BALANCE_FILTER_CHUNK_TYPE },
+	{ NULL, NULL, 0 }
+};
+
+struct type_filter_desc {
+	char *keyword;
+	__u64 mask;
+	__u64 set;
+	__u64 unset;
+};
+
+#define BTRFS_BLOCK_GROUP_SINGLE \
+	BTRFS_BLOCK_GROUP_RAID0 | \
+	BTRFS_BLOCK_GROUP_RAID1 | \
+	BTRFS_BLOCK_GROUP_RAID10 | \
+	BTRFS_BLOCK_GROUP_DUP
+
+const struct type_filter_desc type_filters[] = {
+	{ "data", BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_DATA, 0 },
+	{ "sys", BTRFS_BLOCK_GROUP_SYSTEM, BTRFS_BLOCK_GROUP_SYSTEM, 0 },
+	{ "meta", BTRFS_BLOCK_GROUP_METADATA, BTRFS_BLOCK_GROUP_METADATA, 0 },
+	{ "raid0", BTRFS_BLOCK_GROUP_RAID0, BTRFS_BLOCK_GROUP_RAID0, 0 },
+	{ "raid1", BTRFS_BLOCK_GROUP_RAID1, BTRFS_BLOCK_GROUP_RAID1, 0 },
+	{ "raid10", BTRFS_BLOCK_GROUP_RAID10, BTRFS_BLOCK_GROUP_RAID10, 0 },
+	{ "dup", BTRFS_BLOCK_GROUP_DUP, BTRFS_BLOCK_GROUP_DUP, 0 },
+	{ "single", BTRFS_BLOCK_GROUP_SINGLE, 0, BTRFS_BLOCK_GROUP_SINGLE },
+	{ NULL, 0, 0, 0 }
+};
+
 int do_balance(int argc, char **argv)
 {
 	int fdmnt, ret=0;
 	int background = 1;
-	struct btrfs_ioctl_vol_args args;
+	struct btrfs_ioctl_balance_start *args;
 	char *path;
+	char *filters_string = NULL;
+	char *this_filter_string;
+	char *saveptr;
 	int ttyfd;
 
 	optind = 1;
 	while(1) {
-		int c = getopt_long(argc, argv, "w", balance_options, NULL);
+		int c = getopt_long(argc, argv, "wf:", balance_options, NULL);
 		if (c < 0)
 			break;
 		switch(c) {
 		case 'w':
 			background = 0;
 			break;
+		case 'f':
+			filters_string = optarg;
+			break;
 		default:
 			fprintf(stderr, "Invalid arguments for balance\n");
 			free(argv);
@@ -796,6 +844,82 @@ int do_balance(int argc, char **argv)
 		return 12;
 	}
 
+	args = malloc(4096);
+	if (!args) {
+		fprintf(stderr, "ERROR: Not enough memory\n");
+		return 13;
+	}
+
+	/* Parse the filters string, if there is one */
+	this_filter_string = strtok_r(filters_string, ":", &saveptr);
+	while(this_filter_string) {
+		char *subsave;
+		char *part;
+		char *type = strtok_r(this_filter_string, "=,", &subsave);
+		int class_id = -1;
+
+		/* Work out what filter type we're looking at */
+		if(strcmp(type

Re: Possible Kernel BUG regarding BTRFS

2011-01-20 Thread Hugo Mills
 
 Jan 19 20:05:00 Desktop kernel: [ 2091.228432] 0 88023649dd68
 0003Jan 19 20:06:12 Desktop kernel: imklog 4.2.0, log
 source = /proc/kmsg started.
 
 /var/log/kern.log
 Jan 19 20:05:00 Desktop kernel: [ 2091.228274] device fsid
 b849836048fddcda-fdb584bb7dae7bb1 devid 1 transid 97123 /dev/sdb2
 Jan 19 20:05:00 Desktop kernel: [ 2091.228294] BUG: unable to handle
 kernel NULL pointer dereference at 0128
 Jan 19 20:05:00 Desktop kernel: [ 2091.228298] IP: []
 btrfs_test_super+0x10/0x30 [btrfs]
 Jan 19 20:05:00 Desktop kernel: [ 2091.228309] PGD 2338f8067 PUD 235875067 
 PMD 0
 Jan 19 20:05:00 Desktop kernel: [ 2091.228313] Oops:  [#2] SMP
 Jan 19 20:05:00 Desktop kernel: [ 2091.228316] last sysfs file:
 /sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda2/uevent
 Jan 19 20:05:00 Desktop kernel: [ 2091.228319] CPU 7
 Jan 19 20:05:00 Desktop kernel: [ 2091.228320] Modules linked in:
 btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs
 vfat msdos fat jfs xfs exportfs reiserfs cryptd aes_x86_64 aes_generic
 xt_multiport binfmt_misc parport_pc ppdev dm_crypt
 snd_hda_codec_atihdmi snd_hda_codec_realtek ipt_REJECT xt_comment
 xt_limit xt_tcpudp ipt_addrtype xt_state ip6table_filter ip6_tables
 nf_nat_irc snd_hda_intel nf_conntrack_irc nf_nat_ftp nf_nat
 snd_hda_codec nf_conntrack_ipv4 snd_hwdep nf_defrag_ipv4 snd_seq_midi
 snd_pcm snd_rawmidi nf_conntrack_ftp nf_conntrack snd_seq_midi_event
 iptable_filter snd_seq gspca_zc3xx gspca_main ip_tables snd_timer
 snd_seq_device x_tables psmouse videodev v4l1_compat
 v4l2_compat_ioctl32 serio_raw snd i7core_edac soundcore snd_page_alloc
 edac_core lp parport hid_apple usbhid hid radeon firewire_ohci ttm
 firewire_core drm_kms_helper crc_itu_t pata_jmicron ahci usb_storage
 r8169 libahci mii drm i2c_algo_bit
 Jan 19 20:05:00 Desktop kernel: [ 2091.228381]
 Jan 19 20:05:00 Desktop kernel: [ 2091.228384] Pid: 3248, comm: mount
 Tainted: G D 2.6.35-24-generic #42-Ubuntu MSI X58 Pro (MS-7522)
 /MS-7522
 Jan 19 20:05:00 Desktop kernel: [ 2091.228387] RIP: 0010:[] []
 btrfs_test_super+0x10/0x30 [btrfs]
 Jan 19 20:05:00 Desktop kernel: [ 2091.228395] RSP:
 0018:88023649dd18 EFLAGS: 00010283
 Jan 19 20:05:00 Desktop kernel: [ 2091.228397] RAX: 
 RBX: a05cd000 RCX: 880236918d80
 Jan 19 20:05:00 Desktop kernel: [ 2091.228400] RDX: 81154a00
 RSI: 880236918d80 RDI: 880204a4e800
 Jan 19 20:05:00 Desktop kernel: [ 2091.228402] RBP: 88023649dd18
 R08:  R09: 0001
 Jan 19 20:05:00 Desktop kernel: [ 2091.228404] R10: 880236918deJan
 19 20:06:12 Desktop kernel: imklog 4.2.0, log source = /proc/kmsg
 started.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Someone's been throwing dead sheep down my Fun Well ---   




Re: Btrfs balance

2011-01-20 Thread Hugo Mills
On Thu, Jan 20, 2011 at 03:53:41PM +0100, Andreas Philipp wrote:
 On 20.01.2011 14:39, Hugo Mills wrote:
  On Thu, Jan 20, 2011 at 02:07:23PM +0100, Andreas Philipp wrote:
  Hi,
 
  Maybe it is a very stupid question but I want to ask it anyway. In
  general, 'btrfs filesystem balance' takes very long to finish and
  produces lots of IO. So what are the classical usage scenarios, when
  it is (really) worth doing a balance?
 The primary use-cases for balancing are to even out the filesystem
  after adding, removing or changing the size of one of the underlying
  volumes.
 Ok, so this is a little bit like for example resyncing a classical
 raid after it was in degraded mode etc.

   Pretty much exactly that.

 It will also be of use when we finally get around to allowing you
  to change RAID settings on the whole volume, to implement the
  requested changes to the RAID level.
 
 Definitely, a nice feature.
 I'm in the process of implementing balance filters, so that some
  other cases where balancing is useful (reclaiming unused block groups)
  can be run more efficiently by only balancing the bits that need
  doing.
 I have seen your post on balance filters. So then it will be (much)
 faster just because less is done? 

   Yes, that's the idea. If you've lost and replaced a drive from a
2-drive RAID-1 array, there's not much that filters can do for you:
all your data will have to be read and rebuilt. However, if you're
changing just your metadata from DUP to RAID-1, say, or recovering
from the loss of one drive in an 8-drive RAID-1 array, it should be an
awful lot faster with filters.

 When you have a version for trying it out and you need someone for
 testing I will give it a try.

   Thanks. I've got quite a bit reworked now to support multiple
filter types, but I need to do a full review of what I'm doing, and
test it myself first. I probably won't have much time to work on it
before Monday, now.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- emacs: Eats Memory and Crashes. --- 




Re: Encryption

2011-01-20 Thread Hugo Mills
On Thu, Jan 20, 2011 at 07:05:52AM -0800, Carl Cook wrote:
 
 Does BTRFS have subvolume encryption built in?  If not, why?

   Not at the moment.

   My opinion on why: Getting crypto right is *hard*. There are far
easier features that people are asking for that we can implement
first.

   There may be technical issues that make it hard to implement within
btrfs, although being able to do compression is harder from a FS
structure point of view, so I suspect that the issues are more about
ensuring correctness of the crypto implementation (not just the basic
symmetric algorithm, because we've got those in the kernel, but all
the key management and block chaining and probably a bunch of things I
don't know about because I'm not a cryptographer -- all of which makes
a big difference to the security of the final system).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Once is happenstance; twice is coincidence; three times --- 
is enemy action. 




Re: Shrinking virtual disk with btrfs on it

2011-01-21 Thread Hugo Mills
On Fri, Jan 21, 2011 at 10:20:34AM -0700, Rodney Beede wrote:
 Any tools to go about zeroing out the free space on a btrfs file
 system so I can shrink the VMware vmdk virtual disk?
 
 I ran the VMware command, but the dynamic disk is still really big.  I
 presume it is due to free space that isn't zeroed out.

   One solution I've used before is to write a single very large file
full of zeroes, filling the filesystem, then delete it.

$ dd if=/dev/zero of=/mountpoint/foo.dat; rm /mountpoint/foo.dat

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  




Re: Synching a Backup Server

2011-01-22 Thread Hugo Mills
On Fri, Jan 21, 2011 at 11:28:19AM -0800, Freddie Cash wrote:
 On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
  On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote:
  Let's see if I can match up the terminology and layers a bit:
 
  LVM Physical Volume == Btrfs disk == ZFS disk / vdevs
  LVM Volume Group == Btrfs filesystem == ZFS storage pool
  LVM Logical Volume == Btrfs subvolume == ZFS volume
  'normal' filesystem == Btrfs subvolume (when mounted) == ZFS filesystem
 
  Does that look about right?
 
    Kind of. The thing is that the way that btrfs works is massively
  different to the way that LVM works (and probably massively different
  to the way that ZFS works, but I don't know much about ZFS, so I can't
  comment there). I think that trying to think of btrfs in LVM terms is
  going to lead you to a large number of incorrect conclusions. It's
  just not a good model to use.
 
 My biggest issue trying to understand Btrfs is figuring out the layers 
 involved.
 
 With ZFS, it's extremely easy:
 
 disks --> vdev --> pool --> filesystems
 
 With LVM, it's fairly easy:
 
 disks -> volume group --> volumes --> filesystems
 
 But, Btrfs doesn't make sense to me:
 
 disks --> filesystem --> sub-volumes???
 
 So, is Btrfs pooled storage or not?  Do you throw 24 disks into a
 single Btrfs filesystem, and then split that up into separate
 sub-volumes as needed?

   Yes, except that the subvolumes aren't quite as separate as you
seem to think that they are. There's no preallocation of storage to a
subvolume (in the way that LVM works), so you're only limited by the
amount of free space in the whole pool. Also, data stored in the pool
is actually free for use by any subvolume, and can be shared (see the
deeper explanation below).

  From the looks of things, you don't have to
 partition disks or worry about sizes before formatting (if the space
 is available, Btrfs will use it).  But it also looks like you still
 have to manage disks.
 
 Or, maybe it's just that the initial creation is done via mkfs (as in,
 formatting a partition with a filesystem) that's tripping me up after
 using ZFS for so long (zpool creates the storage pool, manages the
 disks, sets up redundancy levels, etc;  zfs creates filesystems and
 volumes, and sets properties; no newfs/mkfs involved).

   So potentially zpool -> mkfs.btrfs, and zfs -> btrfs. However, I
don't know enough about ZFS internals to know whether this is a
reasonable analogy to make or not.

 It looks like ZFS, Btrfs, and LVM should work in similar manners, but
 the overloaded terminology (pool, volume, sub-volume, filesystem are
 different in all three) and new terminology that's only in Btrfs is
 confusing.
 
  Just curious, why all the new terminology in btrfs for things that
  already existed?  And why are old terms overloaded with new meanings?
  I don't think I've seen a write-up about that anywhere (or I don't
  remember it if I have).
 
    The main awkward piece of btrfs terminology is the use of RAID to
  describe btrfs's replication strategies. It's not RAID, and thinking
  of it in RAID terms is causing lots of confusion. Most of the other
  things in btrfs are, I think, named relatively sanely.
 
 No, the main awkward piece of btrfs terminology is overloading
 "filesystem" to mean "collection of disks" and creating "sub-volume"
 to mean "filesystem".  At least, that's how it looks from way over
 here.  :)

   As I've tried to explain, that's the wrong way of looking at it.
Let me have another go in more detail.

   There's *one* filesystem. It contains:

 - *One* set of metadata about the underlying disks (the dev tree).
 - *One* set of metadata about the distribution of the storage pool on
   those disks (the chunk tree).
 - *One* set of metadata about extents within that storage pool (the
   extent tree).
 - *One* set of metadata about checksums for each 4k chunk of data
   within an extent (the checksum tree).
 - *One* set of metadata about where to find all the other metadata
   (the root tree).

   Note that an extent is a sequence of blocks which is both
contiguous on disk, and contiguous within one *or more* files.

   In addition to the above globally-shared metadata, there are
multiple metadata sets, each representing a mountable namespace --
these are the subvolumes. Each of these subvolumes holds a directory
structure, and all of the POSIX information for each file name within
that structure. For each file within a subvolume, there's a sequence
of pointers to the shared extent pool, indicating what blocks on disk
are actually holding the data for that file.
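
   The pointer arrangement described above can be pictured with a toy
model like the one below. It is purely illustrative: the struct and
function names are invented here, the reference count is an assumption
added to make the sharing visible, and none of it reflects the real
on-disk format.

```c
#include <stddef.h>

#define MAX_EXTENTS_PER_FILE 8

/* One entry in the filesystem-wide extent pool (hypothetical layout). */
struct extent {
	unsigned long long start; /* first block on disk */
	unsigned long long len;   /* length in blocks */
	unsigned int refs;        /* how many file slots point here */
};

/* A file within a subvolume: an ordered list of pointers into the pool. */
struct file_item {
	struct extent *extents[MAX_EXTENTS_PER_FILE];
	size_t nr_extents;
};

/* Append an extent to a file, taking a reference on it. Returns 0 or -1. */
static int file_add_extent(struct file_item *f, struct extent *e)
{
	if (f->nr_extents >= MAX_EXTENTS_PER_FILE)
		return -1;
	f->extents[f->nr_extents++] = e;
	e->refs++;
	return 0;
}

/* Make dst refer to the same extents as src; no file data is copied. */
static int file_share_extents(struct file_item *dst,
			      const struct file_item *src)
{
	size_t i;

	dst->nr_extents = 0;
	for (i = 0; i < src->nr_extents; i++)
		if (file_add_extent(dst, src->extents[i]) != 0)
			return -1;
	return 0;
}
```

   In this model the two files could live in different subvolumes; the
only thing each one owns is its list of pointers, while the extents
themselves belong to the single shared pool.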

   Note that the actual file data, and the management of its location
on the disk (and its replication), is completely shared across
subvolumes. The same extent may be used multiple times by different
files, and those files may be in any subvolumes on the filesystem. In
theory, the same extent could even appear several times in the same
file. This sharing is how snapshots and COW copies

Re: v0.19-35-g1b444cd btrfsck says snapshots have errors

2011-01-23 Thread Hugo Mills
On Sun, Jan 23, 2011 at 05:44:34AM -0500, Ian! D. Allen wrote:
 On Fri, Jan 21, 2011 at 09:15:49AM +0800, Yan, Zheng  wrote:
  On Fri, Jan 21, 2011 at 6:52 AM, Ian! D. Allen idal...@idallen.ca wrote:
   Still getting btrfsck errors with this:
   git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
   unresolved ref root 256 dir 256 index 2 namelen 5 name snap1 error 600
   found 49152 bytes used err is 1
  These are caused by a design flaw, you can safely ignore them.
 
 If it isn't an error, shouldn't btrfsck be ignoring it, not me?
 At minimum it could say "warning" and not "err is 1".

   Yes, it probably should, but there's not a great deal of point in
fixing this particular issue, because Chris is working on the all-new
(offline) repairing fsck, which should replace the current
checking-only fsck very soon now.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- O tempura! O moresushi! --- 




Re: Bug in mkfs.btrfs?!

2011-01-23 Thread Hugo Mills
   Hi, Felix,

On Sat, Jan 22, 2011 at 04:56:12PM +0100, Felix Blanke wrote:
 It was a simple:
 
 mkfs.btrfs -L backup -d single /dev/loop2
 
 But it also happens without the options, like:
 
 mkfs.btrfs /dev/loop2
 
 
 /dev/loop2 is a loop device, which is aes encrypted. The output of losetup 
 /dev/loop2:
 
 /dev/loop2: [0010]:5324 
 (/dev/disk/by-id/ata-WDC_WD6400AAKS-22A7B2_WD-WCASY7780706-part3) 
 encryption=AES128
 
 
 Thank you for looking into this!
 While writing this I read your second mail. The strace output is attached.

   OK, I've traced through the functions being called, and I really
can't see where it could be truncating the name, unless your system
has a stupidly small value of PATH_MAX.

   Can you apply the following patch (to the next branch of the
btrfs-progs git repo), rebuild, and try again? It's just adding some
debugging output to track what it's looking at.

   Hugo.


diff --git a/mkfs.c b/mkfs.c
index 2e99b95..51a5096 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -422,6 +422,7 @@ int main(int ac, char **av)
 	printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
 
 	file = av[optind++];
+	printf("Checking whether %s is part of a mounted filesystem\n", file);
 	ret = check_mounted(file);
 	if (ret < 0) {
 		fprintf(stderr, "error checking %s mount status\n", file);
diff --git a/utils.c b/utils.c
index fd894f3..7fa3149 100644
--- a/utils.c
+++ b/utils.c
@@ -610,12 +610,16 @@ int resolve_loop_device(const char* loop_dev, char* loop_file, int max_len)
 	int ret_ioctl;
 	struct loop_info loopinfo;
 
+	printf("Resolving loop device %s (length %d)\n", loop_dev, max_len);
+
 	if ((loop_fd = open(loop_dev, O_RDONLY)) < 0)
 		return -errno;
 
 	ret_ioctl = ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo);
 	close(loop_fd);
 
+	printf("Loop name = %s\n", loopinfo.lo_name);
+
 	if (ret_ioctl == 0)
 		strncpy(loop_file, loopinfo.lo_name, max_len);
 	else
@@ -639,6 +643,9 @@ int is_same_blk_file(const char* a, const char* b)
 		return -errno;
 	}
 
+	printf("Realpath of %s was %s\n", a, real_a);
+	printf("Realpath of %s was %s\n", b, real_b);
+
 	/* Identical path? */
 	if(strcmp(real_a, real_b) == 0)
 		return 1;
@@ -680,6 +687,9 @@ int is_same_loop_file(const char* a, const char* b)
 	const char* final_b;
 	int ret;
 
+	printf("is_same_loop_file: %s and %s\n", a, b);
+	printf("PATH_MAX = %d\n", PATH_MAX);
+
 	/* Resolve a if it is a loop device */
 	if((ret = is_loop_device(a)) < 0) {
 	   return ret;
@@ -784,8 +794,10 @@ int check_mounted(const char* file)
 			if(strcmp(mnt->mnt_type, "btrfs") != 0)
 				continue;
 
+			printf("Testing if btrfs device is in the dev list: %s\n", mnt->mnt_fsname);
 			ret = blk_file_in_dev_list(fs_devices_mnt, mnt->mnt_fsname);
 		} else {
+			printf("Testing if non-btrfs device is block or regular: %s\n", mnt->mnt_fsname);
 			/* ignore entries in the mount table that are not
 			   associated with a file*/
 			if((ret = is_existing_blk_or_reg_file(mnt->mnt_fsname)) < 0)
diff --git a/volumes.c b/volumes.c
index 7671855..2496fbd 100644
--- a/volumes.c
+++ b/volumes.c
@@ -130,6 +130,8 @@ static int device_list_add(const char *path,
 		device->fs_devices = fs_devices;
 	}
 
+	printf("Device added with name %s\n", device->name);
+
 	if (found_transid > fs_devices->latest_trans) {
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
@@ -223,6 +225,7 @@ int btrfs_scan_one_device(int fd, const char *path,
 	*total_devs = btrfs_super_num_devices(disk_super);
 	uuid_unparse(disk_super->fsid, uuidbuf);
 
+	printf("Adding device %s to list\n", path);
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
 
 error_brelse:


-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Doughnut furs ache me, Omar Dorlin. ---   




Re: Cannot Create Partition

2011-01-23 Thread Hugo Mills
On Sun, Jan 23, 2011 at 10:07:54AM -0800, cac...@quantum-sci.com wrote:
 
 On /dev/sda I have sda1 which is my / bootable filesystem for Debian 
 formatted ext4.  This is 256MB on a 2TB drive.
 
 I want to set up the rest of the drive as BTRFS for various functions, and I 
 presume that I first have to create a partition using fdisk for this?  Since 
 my first part is ext4?  So I:
 # fdisk /dev/sda
 WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk 
 doesn't support GPT. Use GNU Parted.

   I think the above may be the root cause of your problem. You're
using the new GPT partition table format, not the traditional DOS one,
and fdisk is claiming that it can't handle it.
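
   For reference, the "ee" partition type in the fdisk listing is the
marker for a GPT protective MBR. A minimal sketch of how a tool might
detect it from the first sector is below; is_protective_mbr() is a
made-up name for this example, and real tools would also check the GPT
header in the following sector.

```c
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE          512
#define MBR_TABLE_OFFSET  446  /* four 16-byte partition entries start here */
#define MBR_ENTRY_SIZE    16
#define MBR_TYPE_OFFSET   4    /* partition type byte within an entry */
#define MBR_TYPE_GPT_PROT 0xEE /* "GPT protective" partition type */

/*
 * Returns 1 if sector 0 looks like a GPT protective MBR, 0 otherwise.
 * Hypothetical helper for illustration only.
 */
static int is_protective_mbr(const uint8_t sector[MBR_SIZE])
{
	size_t i;

	/* The boot signature 0x55 0xAA must close the sector. */
	if (sector[510] != 0x55 || sector[511] != 0xAA)
		return 0;

	/* Any of the four primary entries may carry the 0xEE type. */
	for (i = 0; i < 4; i++) {
		const uint8_t *entry =
			sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;

		if (entry[MBR_TYPE_OFFSET] == MBR_TYPE_GPT_PROT)
			return 1;
	}
	return 0;
}
```

   The single protective entry spans the whole disk, which fits the
"No free sectors available" message further down: as far as an MBR-only
tool can tell, the disk is already full.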

 WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
  switch off the mode (command 'c') and change display units to
  sectors (command 'u').
 Command (m for help): p
 Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
 255 heads, 63 sectors/track, 243201 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 Disk identifier: 0x
Device Boot  Start End  Blocks   Id  System
 /dev/sda1   1  243202  1953514583+  ee  GPT
 Command (m for help): n
 Command action
e   extended
p   primary partition (1-4)
 p
 Partition number (1-4): 2
 No free sectors available
 Command (m for help):
 -
 Whaa?
 
 Maybe it's possible that I just mkfs.btrfs /dev/sda and it will set
 up -only- the remaining space, but I'm afraid that this may destroy
 my OS.

   No, that will almost certainly destroy your existing partitioning,
and hence, as you say, your OS install.

 Also, what if I want to set up the whole drive as BTRFS?  Could this
 be bootable, and can the canned Debian kernel load the BTRFS driver
 for boot at install?  Or would I boot to the CD, mkfs.btrfs the
 drive, then install Debian?  Anyone tried this?

   As far as I know, GRUB2 doesn't yet support btrfs (although there
was some work done on it, I don't know what the status of that work
is). This means that you need a filesystem of some other type to boot
off -- even if it only holds the contents of /boot. There are
certainly people around who've done this, although I'm not one of
them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Dullest spy film ever: The Eastbourne Ultimatum ---




Re: Bug in mkfs.btrfs?!

2011-01-23 Thread Hugo Mills
On Sun, Jan 23, 2011 at 11:02:16PM +0100, Goffredo Baroncelli wrote:
 On 01/23/2011 07:18 PM, Hugo Mills wrote:
 Hi, Felix,
  
  On Sat, Jan 22, 2011 at 04:56:12PM +0100, Felix Blanke wrote:
  It was a simple:
 
  mkfs.btrfs -L backup -d single /dev/loop2
 
  But it also happens without the options, like:
 
  mkfs.btrfs /dev/loop2
 
 
  /dev/loop2 is a loop device, which is aes encrypted. The output of 
  losetup /dev/loop2:
 
  /dev/loop2: [0010]:5324 
  (/dev/disk/by-id/ata-WDC_WD6400AAKS-22A7B2_WD-WCASY7780706-part3) 
  encryption=AES128
 
 
  Thank you for looking into this!
  While writing this I read your second mail. The strace output is attached.
  
 OK, I've traced through the functions being called, and I really
  can't see where it could be truncating the name, unless your system
  has a stupidly small value of PATH_MAX.
 
 It seems that when mkfs.btrfs checks whether the block device it was
 given is already mounted, it uses the LOOP_GET_STATUS ioctl [1], which
 takes a struct loop_info as its argument.
 
 This ioctl should return information about the back-end of the loop
 device. The file name is returned via the lo_name field, which is an
 array of 64 chars... [2]

   Good catch, Goffredo. I completely missed that.

   Interestingly, on my system, lo_name is indeed defined as 64 chars,
but I don't see Felix's problem. When I do losetup on the
/dev/disk/by-id/... link, my version of losetup seems to be following
the link:

# losetup /dev/loop1 /dev/disk/by-id/dm-uuid-LVM-XRQLHQNa0xEeIZL4ofuBGIcfkr1Dhry8YHhkjaw4bvZA4meDFQfEMy5elIsVNeWl
 
# losetup -a
/dev/loop1: [0005]:1423915 (/dev/mapper/ruthven-btemp)

   I'm running Debian, and the mount package version 2.17.2-5 (losetup
is part of mount, it seems).
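
   To make the 64-char limit concrete, here is a small Python model of
what a fixed-size name field does to the two paths above. It is only a
sketch: the function name stored_lo_name is made up, and it assumes
that one byte of the array is reserved for a terminating NUL, so at
most 63 bytes of the name survive.

```python
LO_NAME_SIZE = 64  # size of the lo_name array, as discussed above


def stored_lo_name(path, size=LO_NAME_SIZE):
    """Model how a fixed-size, NUL-terminated char array holds `path`.

    Assumption: one byte is reserved for the terminating NUL, so at most
    size - 1 bytes of the name are kept.
    """
    return path.encode()[: size - 1].decode(errors="ignore")


long_link = ("/dev/disk/by-id/dm-uuid-LVM-XRQLHQNa0xEeIZL4ofuBGIcfkr1Dhry8"
             "YHhkjaw4bvZA4meDFQfEMy5elIsVNeWl")
short_node = "/dev/mapper/ruthven-btemp"

for name in (long_link, short_node):
    kept = stored_lo_name(name)
    # original length, stored length, and whether the name survived intact
    print(len(name), len(kept), kept == name)
```

   Under that model the by-id link would be cut short, while the
resolved /dev/mapper name fits with room to spare.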

 Felix, what is the output of the following command?
 
   /sbin/losetup -a
 
 If my analysis is correct, this command should also return the filename
 truncated at the 64th character.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Sometimes, when I'm alone, I Google myself. ---   




Re: Bug in mkfs.btrfs?!

2011-01-24 Thread Hugo Mills
On Mon, Jan 24, 2011 at 02:29:36PM +, Hugo Mills wrote:
If, instead, the initial losetup call tracked the symlinks back to
 the original device node (i.e. something like /dev/sdb3, or
 /dev/mapper/ruthven-btest in my example), then the name that's
 stored in the kernel would be shorter, and we'd be less likely to see
 the truncation. This is what my copy of losetup seems to be doing. I
 can't see any distribution-specific patches in the source for
 util-linux that would do this, though.

   Hmm... Just had a thought: is
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GC_CVPO939201JX160AGN-part3 on
your system a symlink or a device node? What does ls -l say?
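
   The same question can be asked from a script. The sketch below uses
only the Python standard library to report whether a path is a symlink
and what it finally resolves to; the helper name describe and the file
names in the demo are invented for illustration.

```python
import os
import tempfile


def describe(path):
    """Report whether `path` is a symlink and what it finally resolves to."""
    return os.path.islink(path), os.path.realpath(path)


# Self-contained demo in a scratch directory; the file names are made up
# and stand in for a device node and its /dev/disk/by-id/ alias.
with tempfile.TemporaryDirectory() as scratch:
    node = os.path.join(scratch, "sdb3")
    alias = os.path.join(scratch, "ata-EXAMPLE_DISK-part3")
    open(node, "w").close()
    os.symlink(node, alias)

    is_link, target = describe(alias)
    print(is_link, os.path.basename(target))  # True sdb3
```

   If the by-id entry turns out to be a real device node rather than a
link, there is nothing shorter for losetup to resolve it to.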

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people,  / what else is there to ---  
 eat?  / If the Juju had meant us not to eat people / he 
 wouldn't have made us of meat.  




Re: Kernel error during btrfs balance

2011-01-26 Thread Hugo Mills
On Wed, Jan 26, 2011 at 10:04:02AM +0100, Erik Logtenberg wrote:
 Hi,
 
 It took me a couple of days, because I needed to patch my kernel first
 and then issue a rebalance, which ran for more than two days.
 Nevertheless, the rebalance succeeded without any kernel BUG-messages,
 so apparently your patch works!
 
 I noticed that at first, the messages were like this:
 
 [79329.526490] btrfs: found 1939 extents
 [79375.950834] btrfs: found 1939 extents
 [79376.083599] btrfs: relocating block group 352220872704 flags 1
 [80052.940435] btrfs: found 3786 extents
 [80108.439657] btrfs: found 3786 extents
 [80112.325548] btrfs: relocating block group 351147130880 flags 1
 
 Just like I saw during previous balance-runs. Then all of a sudden the
 messages changed to:
 
 [104178.827594] btrfs allocation failed flags 1, wanted 2013265920
 [104178.827599] space_info has 4271198208 free, is not full
 [104178.827602] space_info total=214748364800, used=210440957952,
 pinned=0, reserved=36208640, may_use=3168993280, readonly=0
 [104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144
 used 0 pinned 0 reserved
 [104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
 [104178.827612] entry offset 1855827968, bytes 20480, bitmap no
 [104178.827614] entry offset 1855852544, bytes 20480, bitmap no
 [104178.827617] block group has cluster?: no
 [104178.827618] 0 blocks of free space at or bigger than bytes is
 [104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024
 used 0 pinned 0 reserved
 [104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
 [104178.827626] block group has cluster?: no
 [104178.827628] 0 blocks of free space at or bigger than bytes is
 [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120
 used 0 pinned 0 reserved
 [104178.827634] block group has cluster?: no
 
 And so on.
 
 Does this indicate an error of any sort, or is this expected behaviour?

   As far as I know, it means that you've run out of space, and not
every block group has been rewritten by the balance process.
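
   The numbers in the log can be checked by hand. The Python sketch
below redoes the arithmetic from the space_info line; the formula is an
assumption, chosen because it reproduces the logged "free" figure
exactly when may_use is left out.

```python
# Figures copied from the space_info line in the log above (bytes).
total = 214748364800
used = 210440957952
pinned = 0
reserved = 36208640
readonly = 0
wanted = 2013265920

# Assumption: the "free" figure is total minus used, pinned, reserved and
# readonly; may_use is left out because that reproduces the logged value.
free = total - used - pinned - reserved - readonly
print(free)                      # 4271198208, as in the log
print(f"{used / total:.1%} used")
print(wanted <= free)            # True
```

   So the total free space is larger than the allocation that was
wanted, but the per-block-group lines only list small free entries
(86016, 20480 and 4096 bytes). That would be consistent with the free
space being too fragmented to satisfy the request, although the log
alone doesn't prove it.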

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- In one respect at least, the Martians are a happy people: ---
  they have no lawyers.  




Re: Corrupt filesystem after power failure

2011-02-10 Thread Hugo Mills
:40 linux-wuce kernel: [  341.617173] Call Trace:
Feb 10 21:57:40 linux-wuce kernel: [  341.617309]  [a07d66e8] 
replay_one_buffer+0x2e8/0x3b0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617421]  [a07d3d85] 
walk_down_log_tree+0x375/0x540 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617529]  [a07d4053] 
walk_log_tree+0x103/0x280 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617637]  [a07d8223] 
btrfs_recover_log_trees+0x223/0x310 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617748]  [a079f049] 
open_ctree+0x1269/0x18e0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617793]  [a077cc0e] 
btrfs_get_sb+0x31e/0x430 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617809]  [811271e0] 
vfs_kern_mount+0x80/0x210
Feb 10 21:57:40 linux-wuce kernel: [  341.617819]  [811273e3] 
do_kern_mount+0x53/0x130
Feb 10 21:57:40 linux-wuce kernel: [  341.617829]  [81141f20] 
do_mount+0x200/0x250
Feb 10 21:57:40 linux-wuce kernel: [  341.617839]  [8114205a] 
sys_mount+0x9a/0xf0
Feb 10 21:57:40 linux-wuce kernel: [  341.617851]  [81002ffb] 
system_call_fastpath+0x16/0x1b
Feb 10 21:57:40 linux-wuce kernel: [  341.617863]  [7fb2ed4f0ffa] 
0x7fb2ed4f0ffa
Feb 10 21:57:40 linux-wuce kernel: [  341.617866] Code: f4 52 96 e0 31 c0 48 81 
c4 98 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 b8 fe ff ff ff eb e7 0f 0b 0f 
0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 ec 28 
01 00 00 48
Feb 10 21:57:40 linux-wuce kernel: [  341.617921] RIP  [a07d5eb3] 
add_inode_ref+0x4a3/0x4b0 [btrfs]
Feb 10 21:57:40 linux-wuce kernel: [  341.617941]  RSP 8800b09af8b8
Feb 10 21:57:40 linux-wuce kernel: [  341.617972] ---[ end trace 
22bed547f3298140 ]---
Feb 10 21:57:45 linux-wuce kernel: [  346.284240] rtl8192se_update_ratr_table: 
ratr_index=0 ratr_table=0x0ff5
Feb 10 21:58:08 linux-wuce kernel: [  369.474197] 
SetHwReg8192SE():HW_VAR_AC_PARAM eACI:0:a425

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Someone's been throwing dead sheep down my Fun Well ---   




Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?

2011-02-13 Thread Hugo Mills
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
 Hi list!
 
 It seems I have found a serious regression in compressed btrfs in
 kernel 2.6.37. When creating a small file (less than the block size)
 and then cp/mv it to *another* file system, an appropriate number of
 zeroes gets written to the destination file. Case in point:

[snip]

 I'm currently running on 2.6.37, x86_64 using Arch Linux -testing with
 coreutils 8.10. Filesystem is mounted from LVM2 to /usr/src with -o
 noatime,compress
 
 This only seems to occur with compressed file systems (either zlib or
 LZO). A person on IRC also reproduced the same problem in 2.6.28-rc.
 I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.

   This would seem to be the same effect that we've had reported on
IRC by at least two Gentoo users, of files full of zeroes in their
build system. We'll follow up with them over there and see if it's the
same bug.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I must be musical:  I've got *loads* of CDs ---   




Re: Question on subvolumes and mount options

2011-02-13 Thread Hugo Mills
On Sun, Feb 13, 2011 at 05:46:46PM +0100, Yuri D'Elia wrote:
 Hi everyone, I'm experimenting with btrfs but I have some questions
 regarding subvolumes.
 
 First: In the / filesystem I create a subvolume named /home. As soon as
 the subvolume is created, I can already see the entry point in /home
 without having to mount it separately. Is that expected?

   Yes.

 Mounting the subvolume with mount -o subvol=home /dev/x /home also works
 as expected.
 
 So, which is best? Looks like mounting subvolumes is not necessary.

   I would recommend putting nothing in the root of the filesystem
*except* subvolumes. i.e. create a root subvolume in / that contains
your root filesystem, and make that the default. Then you can mount
your btrfs root subvolume (i.e. the thing that contains all the other
subvolumes) somewhere like /media/btrfs-root, for purposes of managing
subvolumes.

 Is it possible to change mount options in a subvolume? Suppose I would
 like to use nodatasum except for /home, will the following work?
 
 mount -o nodatasum /dev/x /
 btrfs subvolume create /home
 mount -o datasum,subvol=home /dev/x /home

   I'd expect that to work, although I haven't tried it myself.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I always felt that as a C programmer, I --- 
 was becoming typecast.  




Re: Question on subvolumes and mount options

2011-02-13 Thread Hugo Mills
On Sun, Feb 13, 2011 at 06:49:58PM +0100, Yuri D'Elia wrote:
 On Sun, 13 Feb 2011 17:30:59 +, Hugo Mills wrote:
  First: In the / filesystem I create a subvolume named /home. As soon as
  the subvolume is created, I can already see the entry point in /home
  without having to mount it separately. Is that expected?
 
 Yes.
 
 What happens if I mount the home subvolume into a different point, like:
 
 mount -o subvol=home /dev/x /home2
 
 and then change a file in /home (which is accessible through the default
 subvolume)?
 
 Will the change be reflected on both mount points? Or the inverse
 (change /home2)?

   Yes, it's the same piece of storage, just appearing at more than
one point in your overall filesystem. Similar to the way that bind
mounts work.

  So, which is best? Looks like mounting subvolumes is not necessary.
 
 I would recommend putting nothing in the root of the filesystem
  *except* subvolumes. i.e. create a root subvolume in / that contains
  your root filesystem, and make that the default. Then you can mount
  your btrfs root subvolume (i.e. the thing that contains all the other
  subvolumes) somewhere like /media/btrfs-root, for purposes of managing
  subvolumes.
 
 So you would recommend creating both /root and /home subvolumes, to be
 mounted separately, or create /root and /root/home subvolumes?

   The former.

  like to use nodatasum except for /home, will the following work?
  
  mount -o nodatasum /dev/x /
  btrfs subvolume create /home
  mount -o datasum,subvol=home /dev/x /home
 
 I'd expect that to work, although I haven't tried it myself.
 
 What if I remount the /home subvol into /home2. What happens when I
 touch a file through /home (nodatasum) and what happens when I use
 /home2 - since both are available at the same time?

   They'll stay in sync with respect to the files written to either
one. I'm not sure what the behaviour of nodatasum is with different
mounts of the same subvolume.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  




Re: Copied Files from Btrfs partition larger than original

2011-02-14 Thread Hugo Mills
On Mon, Feb 14, 2011 at 12:41:46AM -0800, MOB wrote:
 So as I'm doing some maintenance on my personal video server, I'm noticing 
 that when I'm copying files off of my btrfs partitions, they are getting 
 larger...
 
 First partition is the original:
 http://pastebin.com/GM5xWetR
 
 I have 3 affected partitions. This appears to have started with 2.6.37 but
 could have started happening before. I have ~3300 video files, of which ~840
 are on btrfs partitions that randomly get shuffled on/off for free space
 distribution.

   Pastebins aren't forever. For the archives:

-- (begin)
ls -lah /mnt/store-p00/1280x720/~NCIS\ Los\ Angeles~2010-05-11~720.mov
-rw-rw-r-- 1 root hdhr 1.7G Nov  6 18:39 /mnt/store-p00/1280x720/~NCIS Los Angeles~2010-05-11~720.mov

ls -lah /hdhr/demux/1280x720/~NCIS\ Los\ Angeles~2010-05-11~720.mov
-rw-rw-r-- 1 root hdhr 3.7G Nov  6 18:39 /hdhr/demux/1280x720/~NCIS Los Angeles~2010-05-11~720.mov
-- (end)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Attempted murder, now honestly, what is that?  Do they give a ---  
  Nobel Prize for attempted chemistry?   




Re: combining 2 RAID 10 pools to one filesystem

2011-02-16 Thread Hugo Mills
On Wed, Feb 16, 2011 at 10:50:57PM +0200, Gal Buki wrote:
 I have RAID 10 using 4 times 500GB drives (1TB of storage).
 Is it possible to create another RAID 10 with 4 times 250GB drives
 (500GB of storage) and then combine those two RAIDs to one file
 system so that I would be able to get 1.5TB?
 
 If I create one RAID 10 with all 8 drives I would only be able to
 use 8 times 250GB /2 = 1TB, right?

   No, just add the new drives to the existing btrfs pool, and run a
balance, and you should get a btrfs filesystem with 1.5TB of
mirrored/striped storage.
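
   The 1.5TB figure can be sanity-checked with a toy model. The Python
sketch below is a simplification and not the real btrfs allocator: it
assumes each allocation round stripes across every device that still
has free space (an even number, at least four) until the emptiest of
them is full, and that every byte is stored twice. The function name
raid10_usable is made up for the example.

```python
def raid10_usable(device_sizes, min_devices=4, copies=2):
    """Estimate usable space under a simplified btrfs-style RAID-10 model.

    Assumptions (a simplification, not the real allocator): each round
    stripes across every device that still has free space, using an even
    number of them and at least `min_devices`, until the emptiest of
    those devices is full. Every byte is stored `copies` times.
    """
    free = sorted(device_sizes, reverse=True)
    raw_allocated = 0
    while True:
        candidates = [f for f in free if f > 0]
        count = len(candidates) - len(candidates) % 2  # even device count
        if count < min_devices:
            break
        step = candidates[count - 1]  # smallest free space among those used
        raw_allocated += step * count
        free = [f - step for f in candidates[:count]] + candidates[count:]
        free.sort(reverse=True)
    return raw_allocated // copies


sizes_gb = [500] * 4 + [250] * 4
print(raid10_usable(sizes_gb))               # 1500
print(min(sizes_gb) * len(sizes_gb) // 2)    # 1000, fixed-geometry RAID-10
```

   The second figure is the "8 times 250GB / 2" estimate from the
question, which holds for an array that treats every member as being as
small as the smallest drive.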

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Anyone who claims their cryptographic protocol is secure is ---   
 either a genius or a fool.  Given the genius/fool ratio 
 for our species,  the odds aren't good. 




Re: Space used by snapshot

2011-02-17 Thread Hugo Mills
On Thu, Feb 17, 2011 at 12:13:53PM +0100, Roman Kapusta wrote:
 Hello all,
 
 Is there any way to find out how much space is physically allocated
 by a given subvolume?
 I cannot find any. I'm interested in two values:
 
 - physical space allocated by SUBVOLUME INCLUDING all space shared by
 other subvolumes
 
 - physical space allocated by SUBVOLUME EXCLUDING all space shared by
 other subvolumes
 
 Currently I can use only du, which is not reporting what I want to know.

   Not at the moment. It shouldn't be too difficult to implement
(certainly to implement the latter), but it's just not been done yet.
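
   For clarity, the two values being asked for can be described with a
toy model. The Python sketch below does not read any btrfs metadata;
the extent ids, sizes and subvolume names are hypothetical, and the
function subvolume_usage is only an illustration of the accounting.

```python
from collections import Counter


def subvolume_usage(extent_sizes, subvol_extents):
    """Return {subvolume: (inclusive_bytes, exclusive_bytes)}.

    extent_sizes maps an extent id to its size in bytes; subvol_extents
    maps a subvolume name to the set of extent ids it references. An
    extent counts as exclusive when exactly one subvolume references it.
    """
    refs = Counter(e for extents in subvol_extents.values() for e in extents)
    usage = {}
    for name, extents in subvol_extents.items():
        inclusive = sum(extent_sizes[e] for e in extents)
        exclusive = sum(extent_sizes[e] for e in extents if refs[e] == 1)
        usage[name] = (inclusive, exclusive)
    return usage


# Hypothetical data: a subvolume and a snapshot sharing two extents.
sizes = {"a": 4096, "b": 8192, "c": 4096, "d": 16384}
subvols = {"home": {"a", "b", "c"}, "home-snap": {"a", "b", "d"}}
print(subvolume_usage(sizes, subvols))
```

   The "excluding" figure is the cheaper one to reason about, since it
only needs to know which extents have a single reference.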

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What do you give the man who has everything? -- Penicillin is ---  
 a good start... 




Re: raid5 - again

2011-02-19 Thread Hugo Mills
On Sat, Feb 19, 2011 at 10:11:30PM +0100, Roy Sigurd Karlsbakk wrote:
 It's been some two years since I read about raid5 etc. coming to
 btrfs. Since the code is available in linux, why isn't this
 already in btrfs? Is Oracle holding back?

   It's about resourcing and stability. Oracle only employ one person
(AFAIK) on btrfs -- Chris Mason. He does a sterling job of maintaining
and developing the filesystem, but there is only one of him.

   Since well before December, he's been working on a functional fsck,
trying to get it to a state where it won't demolish your filesystem
even more than it already is. This has left the work to integrate the
RAID-5/6 patches behind. He's also been working hard on fixing a great
many other stability issues as they're reported, and integrating
patches from other developers.

   I believe that RAID-5/6 is the next major piece of work that Chris
is intending to integrate, once fsck is ready. However, stability is
better than features at this point. I don't see any commercial benefit
in preventing the integration and deployment of the RAID-5/6 patches;
it's simply that there are other things that are more important. Please
be patient.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Startle, startle, little twink.  How I wonder what you think. ---  




Re: Recovering parent transid verify failed

2011-03-06 Thread Hugo Mills
On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
 Hey,
 I'd start by saying that I know Btrfs is still experimental, and so
 there's no guarantee that anyone will be able to help me at all... But I
 thought I'd try anyway :-)
 
 A few months ago I bought a new laptop and installed ArchLinux on it,
 with Btrfs on the root filesystem... I know, it's not the smartest
 thing to do...
 After a few months I had issues with my hibernation scripts, and one
 day I tried to hibernate my computer but it didn't go that well, and,
 well, ever since then my Btrfs partition has not been accessible.
 I opened up the Btrfs FAQ and saw that the fsck tool should be out by
 the end of 2010, and thought oh well, I could wait until then, and
 went on and installed Ubuntu with Ext4 on another small partition.
 
 But time goes on and the fsck tool is still in development... I've
 tried using the code from Git and it didn't work, and I'm starting to
 wonder (a) if there's any hope at all and (b) what other steps I can
 take to recover my old Btrfs partition.

   Yes, there is hope. This error should be fixable with the new fsck.

 When trying to mount the Btrfs partition I get this in dmesg:
 [105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1
 transid 135714 /dev/sda4
 [105252.818697] parent transid verify failed on 216925220864 wanted
 135714 found 135713
[snip]
 Should I wait for btrfsck to be ready?

   Yes.

 Am I not using it correctly now?

   No, there's not a lot the current version can do right now.

 Is there any way to recover this partition or should I just wipe it and
 reinstall Btrfs only when I'm supposed to?..
 
 Your help is appreciated.

   HTH,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I am the author. You are the audience. I outrank you! --- 



