Re: Help with space

2014-05-03 Thread Austin S Hemmelgarn
On 05/02/2014 03:21 PM, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply
 replace) should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like
 balance: btrfs balance start and btrfs replace start. And you can
 also get a status on it. We don't (yet) have options to stop,
 start, resume, which could maybe come in handy for long rebuilds
 and a reboot is required (?) although maybe that just gets handled
 automatically: set it to pause, then unmount, then reboot, then
 mount and resume.
 
 Well, I'd say two copies if it's only two devices in the raid1...
 would be true raid1.  But if it's say four devices in the raid1,
 as is certainly possible with btrfs raid1, that if it's not
 mirrored 4-way across all devices, it's not true raid1, but
 rather some sort of hybrid raid,  raid10 (or raid01) if the
 devices are so arranged, raid1+linear if arranged that way, or
 some form that doesn't nicely fall into a well defined raid level
 categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify
 three devices, you'll get 3-way mirroring (3 mirrors). But I don't
 know any hardware raid that works this way. They all seem to be
 raid 1 is strictly two devices. At 4 devices it's raid10, and only
 in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is
 something like raid1 (2 copies) + linear/concat. But that
 allocation is round robin. I don't read code but based on how a 3
 disk raid1 volume grows VDI files as it's filled it looks like 1GB
 chunks are copied like this
Actually, MD RAID10 can be configured to work almost the same with an
odd number of disks, except it uses (much) smaller chunks, and it does
more intelligent striping of reads.
 
 Disk1   Disk2   Disk3
 134     124     235
 679     578     689
 
 So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a
 chunk 1; disk 2 and 3 each have a chunk 2, and so on. Total of 9GB
 of data taking up 18GB of space, 6GB on each drive. You can't do
 this with any other raid1 as far as I know. You do definitely run
 out of space on one disk first though because of uneven metadata to
 data chunk allocation.
 
 Anyway I think we're off the rails with raid1 nomenclature as soon
 as we have 3 devices. It's probably better to call it replication,
 with an assumed default of 2 replicates unless otherwise
 specified.
 
 There's definitely a benefit to a 3 device volume with 2
 replicates, efficiency wise. As soon as we go to four disks 2
 replicates it makes more sense to do raid10, although I haven't
 tested odd device raid10 setups so I'm not sure what happens.
 
 
 Chris Murphy
 



Re: Help with space

2014-05-03 Thread Chris Murphy

On May 3, 2014, at 10:31 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:

 On 05/02/2014 03:21 PM, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply
 replace) should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like
 balance: btrfs balance start and btrfs replace start. And you can
 also get a status on it. We don't (yet) have options to stop,
 start, resume, which could maybe come in handy for long rebuilds
 and a reboot is required (?) although maybe that just gets handled
 automatically: set it to pause, then unmount, then reboot, then
 mount and resume.
 
 Well, I'd say two copies if it's only two devices in the raid1...
 would be true raid1.  But if it's say four devices in the raid1,
 as is certainly possible with btrfs raid1, that if it's not
 mirrored 4-way across all devices, it's not true raid1, but
 rather some sort of hybrid raid,  raid10 (or raid01) if the
 devices are so arranged, raid1+linear if arranged that way, or
 some form that doesn't nicely fall into a well defined raid level
 categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify
 three devices, you'll get 3-way mirroring (3 mirrors). But I don't
 know any hardware raid that works this way. They all seem to be
 raid 1 is strictly two devices. At 4 devices it's raid10, and only
 in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is
 something like raid1 (2 copies) + linear/concat. But that
 allocation is round robin. I don't read code but based on how a 3
 disk raid1 volume grows VDI files as it's filled it looks like 1GB
 chunks are copied like this
 Actually, MD RAID10 can be configured to work almost the same with an
 odd number of disks, except it uses (much) smaller chunks, and it does
 more intelligent striping of reads.

The efficiency of storage depends on the file system placed on top. Btrfs will 
allocate space exclusively for metadata, and it's possible much of that space 
either won't or can't be used. So ext4 or XFS on md probably is more efficient 
in that regard; but then Btrfs also has compression options so this clouds the 
efficiency analysis.

For striping of reads, there is a note in man 4 md about the layout with 
respect to raid10: "The 'far' arrangement can give sequential read performance 
equal to that of a RAID0 array, but at the cost of reduced write performance." 
The default layout for raid10 is near=2. I think either the read performance is 
a wash with the defaults, or md reads are better while writes are worse with 
the far layout.

I'm not sure how Btrfs performs reads with multiple devices.

Chris Murphy



Re: Help with space

2014-05-03 Thread Chris Murphy

On May 3, 2014, at 1:09 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On May 3, 2014, at 10:31 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
 
 On 05/02/2014 03:21 PM, Chris Murphy wrote:
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is
 something like raid1 (2 copies) + linear/concat. But that
 allocation is round robin. I don't read code but based on how a 3
 disk raid1 volume grows VDI files as it's filled it looks like 1GB
 chunks are copied like this
 Actually, MD RAID10 can be configured to work almost the same with an
 odd number of disks, except it uses (much) smaller chunks, and it does
 more intelligent striping of reads.
 
 The efficiency of storage depends on the file system placed on top. Btrfs 
 will allocate space exclusively for metadata, and it's possible much of that 
 space either won't or can't be used. So ext4 or XFS on md probably is more 
 efficient in that regard; but then Btrfs also has compression options so this 
 clouds the efficiency analysis.
 
 For striping of reads, there is a note in man 4 md about the layout with 
 respect to raid10: "The 'far' arrangement can give sequential read 
 performance equal to that of a RAID0 array, but at the cost of reduced write 
 performance." The default layout for raid10 is near=2. I think either the 
 read performance is a wash with the defaults, or md reads are better while 
 writes are worse with the far layout.
 
 I'm not sure how Btrfs performs reads with multiple devices.


Also, for unequal sized devices, for example 12G,6G,6G, Btrfs raid1 is OK with 
this and uses the space efficiently, whereas md raid10 does not. First, it 
complains when creating, asking if I want to continue anyway.

Second, it ends up with *less* usable space than if it had 3x 6GB drives.

12G,6G,6G md raid10
# mdadm -C /dev/md0 -n 3 -l raid10 --assume-clean /dev/sd[bcd]
mdadm: largest drive (/dev/sdb) exceeds size (6283264K) by more than 1%.
# mdadm -D /dev/md0 (partial)
 Array Size : 9424896 (8.99 GiB 9.65 GB)
  Used Dev Size : 6283264 (5.99 GiB 6.43 GB)

# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/md09.0G   33M  9.0G   1% /mnt

12G,6G,6G btrfs raid1

# mkfs.btrfs -d raid1 -m raid1 /dev/sd[bcd]
# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb 24G  1.3M   12G   1% /mnt
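As a back-of-the-envelope check on those two results, here is a small sketch.
The formulas are the usual two-copy space heuristics, not anything taken from
the btrfs or md code, and they ignore metadata overhead:

# Rough usable-space estimates for 2-copy layouts on uneven devices (GiB).
# Heuristics only; not btrfs or md code, and metadata overhead is ignored.

def btrfs_raid1_usable(sizes):
    # Two copies on distinct devices: limited by half the total space, and by
    # the fact that the largest device can only mirror against the others.
    total = sum(sizes)
    return min(total // 2, total - max(sizes))

def md_raid10_usable(sizes):
    # md raid10 (near layout) treats every member as the size of the smallest
    # one, then stores two copies of everything.
    return len(sizes) * min(sizes) // 2

print(btrfs_raid1_usable([12, 6, 6]))   # 12 -> matches the ~12G Avail above
print(md_raid10_usable([12, 6, 6]))     # 9  -> matches the ~9G md array size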


For performance workloads, this is probably a pathological configuration since 
it depends on disproportionate reading almost no matter what. But for those who 
happen to have uneven devices available, and favor space usage efficiency over 
performance, it's a nice capability.


Chris Murphy


Re: Help with space

2014-05-02 Thread Duncan
Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:

 On Thu, 1 May 2014, Duncan 1i5t5.dun...@cox.net wrote:
 
 Am I missing something or is it impossible to do a disk replace on BTRFS
 right now?
 
 I can delete a device, I can add a device, but I'd like to replace a
 device.

You're missing something... but it's easy to do as I almost missed it too 
even tho I was sure it was there.

Something tells me btrfs replace (not device replace, simply replace) 
should be moved to btrfs device replace...

 http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
 
 Whether a true RAID-1 means just 2 copies or N copies is a matter of
 opinion. Papers such as the above seem to clearly imply that RAID-1 is
 strictly 2 copies of data.

Thanks for that link. =:^)

My position would be that reflects the original, but not the modern, 
definition.  The paper seems to describe as raid1 what would later come 
to be called raid1+0, which quickly morphed into raid10, leaving the 
raid1 description only covering pure mirror-raid.

And even then, the paper says mirrors in spots without specifically 
defining it as (only) two mirrors, but in others it seems to /assume/, 
without further explanation, just two mirrors.  So I'd argue that even 
then the definition of raid1 allowed more than two mirrors, but that it 
just so happened that the examples and formulae given dealt with only two 
mirrors.

Tho certainly I can see the room for differing opinions on the matter as 
well.

 I don't have a strong opinion on how many copies of data can be involved
 in a RAID-1, but I think that there's no good case to claim that only 2
 copies means that something isn't true RAID-1.

Well, I'd say two copies if it's only two devices in the raid1... would 
be true raid1.  But if it's say four devices in the raid1, as is 
certainly possible with btrfs raid1, that if it's not mirrored 4-way 
across all devices, it's not true raid1, but rather some sort of hybrid 
raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
arranged that way, or some form that doesn't nicely fall into a well 
defined raid level categorization.

But still, opinions can differ.  Point well made... and taken. =:^)

 Surprisingly, after shutting everything down, getting a new AC, and
 letting the system cool for a few hours, it pretty much all came back
 to life, including the CPU(s) (that was pre-multi-core, but I don't
 remember whether it was my dual socket original Opteron, or
 pre-dual-socket for me as well) which I had feared would be dead.
 
 CPUs have had thermal shutdown for a long time.  When a CPU lacks such
 controls (as some buggy Opteron chips did a few years ago) it makes the
 IT news.

That was certainly some years ago, and I remember for awhile, AMD Athlons 
didn't have thermal shutdown yet, while Intel CPUs of the time did.  And 
that was an AMD CPU as I've run mostly AMD (with only specific 
exceptions) for literally decades, now.  But what IDR for sure is whether 
it was my original AMD Athlon (500 MHz), or the Athlon C @ 1.2 GHz, or 
the dual Opteron 242s I ran for several years.  If it was the original 
Athlon, it wouldn't have had thermal shutdown.  If it was the Opterons I 
think they did, but I think the Athlon Cs were in the period when Intel 
had introduced thermal shutdown but AMD hadn't, and Tom's Hardware among 
others had dramatic videos of just exactly what happened if one actually 
tried to run the things without cooling, compared to running an Intel of 
the period.

But I remember being rather surprised that the CPU(s) was/were unharmed, 
which means it very well could have been the Athlon C era, and I had seen 
the dramatic videos and knew my CPU wasn't protected.

 I'd like to be able to run a combination of dup and RAID-1 for
 metadata. ZFS has a copies option, it would be good if we could do
 that.

Well, if N-way-mirroring were possible, one could do more or less just 
that easily enough with suitable partitioning and setting the data vs 
metadata number of mirrors as appropriate... but of course with only two-
way-mirroring and dup as choices... the only way to do it would be 
layering btrfs atop something else, say md/raid.  And without real-time 
checksumming verification at the md/raid level...

 I use BTRFS for all my backups too.  I think that the chance of data
 patterns triggering filesystem bugs that break backups as well as
 primary storage is vanishingly small.  The chance of such bugs being
 latent for long enough that I can't easily recreate the data isn't worth
 worrying about.

The fact that my primary filesystems and their first backups are btrfs 
raid1 on dual SSDs, while secondary backups are on spinning rust, does 
factor into my calculations here.

I ran reiserfs for many years, since I first switched to Linux full time 
in the early kernel 2.4 era in fact, and while it had its problems early 
on, since the introduction of ordered data mode in IIRC 2.6.16 or some 
such, 

Re: Help with space

2014-05-02 Thread Brendan Hide

On 02/05/14 10:23, Duncan wrote:

Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:


On Thu, 1 May 2014, Duncan 1i5t5.dun...@cox.net wrote:
[snip]
http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf

Whether a true RAID-1 means just 2 copies or N copies is a matter of
opinion. Papers such as the above seem to clearly imply that RAID-1 is
strictly 2 copies of data.

Thanks for that link. =:^)

My position would be that reflects the original, but not the modern,
definition.  The paper seems to describe as raid1 what would later come
to be called raid1+0, which quickly morphed into raid10, leaving the
raid1 description only covering pure mirror-raid.

Personally I'm flexible on using the terminology in day-to-day 
operations and discussion due to the fact that the end-result is close 
enough. But ...


The definition of RAID 1 is still only a mirror of two devices. As far 
as I'm aware, Linux's mdraid is the only raid system in the world that 
allows N-way mirroring while still referring to it as RAID1. Due to 
the way it handles data in chunks, and also due to its rampant layering 
violations, *technically* btrfs's RAID-like features are not RAID.


To differentiate from RAID, we're already using lowercase raid and, 
in the long term, some of us are also looking to do away with raid{x} 
terms altogether with what Hugo and I last termed as csp notation. 
Changing the terminology is important - but it is particularly non-urgent.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply replace) 
 should be moved to btrfs device replace…

The syntax for btrfs device is different though; replace is like balance: 
btrfs balance start and btrfs replace start. And you can also get a status on 
it. We don't (yet) have options to stop, start, resume, which could maybe come 
in handy for long rebuilds and a reboot is required (?) although maybe that 
just gets handled automatically: set it to pause, then unmount, then reboot, 
then mount and resume.

 Well, I'd say two copies if it's only two devices in the raid1... would 
 be true raid1.  But if it's say four devices in the raid1, as is 
 certainly possible with btrfs raid1, that if it's not mirrored 4-way 
 across all devices, it's not true raid1, but rather some sort of hybrid 
 raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
 arranged that way, or some form that doesn't nicely fall into a well 
 defined raid level categorization.

Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid that 
works this way. They all seem to be raid 1 is strictly two devices. At 4 
devices it's raid10, and only in pairs.

Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
don't read code but based on how a 3 disk raid1 volume grows VDI files as it's 
filled it looks like 1GB chunks are copied like this

Disk1   Disk2   Disk3
134     124     235
679     578     689

So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
18GB of space, 6GB on each drive. You can't do this with any other raid1 as far 
as I know. You do definitely run out of space on one disk first though because 
of uneven metadata to data chunk allocation.

Anyway I think we're off the rails with raid1 nomenclature as soon as we have 3 
devices. It's probably better to call it replication, with an assumed default 
of 2 replicates unless otherwise specified.

There's definitely a benefit to a 3 device volume with 2 replicates, efficiency 
wise. As soon as we go to four disks 2 replicates it makes more sense to do 
raid10, although I haven't tested odd device raid10 setups so I'm not sure what 
happens.


Chris Murphy



Re: Help with space

2014-05-02 Thread Hugo Mills
On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
  
  Something tells me btrfs replace (not device replace, simply replace) 
  should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like balance: 
 btrfs balance start and btrfs replace start. And you can also get a status on 
 it. We don't (yet) have options to stop, start, resume, which could maybe 
 come in handy for long rebuilds and a reboot is required (?) although maybe 
 that just gets handled automatically: set it to pause, then unmount, then 
 reboot, then mount and resume.
 
  Well, I'd say two copies if it's only two devices in the raid1... would 
  be true raid1.  But if it's say four devices in the raid1, as is 
  certainly possible with btrfs raid1, that if it's not mirrored 4-way 
  across all devices, it's not true raid1, but rather some sort of hybrid 
  raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
  arranged that way, or some form that doesn't nicely fall into a well 
  defined raid level categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
 you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid 
 that works this way. They all seem to be raid 1 is strictly two devices. At 4 
 devices it's raid10, and only in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
 like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
 don't read code but based on how a 3 disk raid1 volume grows VDI files as 
 it's filled it looks like 1GB chunks are copied like this
 
 Disk1   Disk2   Disk3
 134     124     235
 679     578     689
 
 So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
 disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
 far as I know. You do definitely run out of space on one disk first though 
 because of uneven metadata to data chunk allocation.

   The algorithm is that when the chunk allocator is asked for a block
group (in pairs of chunks for RAID-1), it picks the number of chunks
it needs, from different devices, in order of the device with the most
free space. So, with disks of size 8, 4, 4, you get:

Disk 1: 12345678
Disk 2: 1357
Disk 3: 2468

and with 8, 8, 4, you get:

Disk 1: 1234568A
Disk 2: 1234579A
Disk 3: 6789
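A minimal sketch of that most-free-space-first ordering, for illustration only
(this is not the kernel allocator, and ties between equally-free devices may be
broken differently than btrfs actually breaks them):

# Each RAID-1 block group takes one chunk from each of the two devices that
# currently have the most free chunks.
def allocate_raid1(free_chunks):
    layout = {dev: [] for dev in free_chunks}
    group = 0
    while True:
        candidates = sorted((d for d in free_chunks if free_chunks[d] > 0),
                            key=lambda d: free_chunks[d], reverse=True)
        if len(candidates) < 2:          # RAID-1 needs two distinct devices
            break
        group += 1
        for dev in candidates[:2]:
            layout[dev].append(group)
            free_chunks[dev] -= 1
    return layout

print(allocate_raid1({'Disk 1': 8, 'Disk 2': 4, 'Disk 3': 4}))
# {'Disk 1': [1, 2, 3, 4, 5, 6, 7, 8], 'Disk 2': [1, 3, 5, 7], 'Disk 3': [2, 4, 6, 8]}
# i.e. the same picture as the 8, 4, 4 example above.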

   Hugo.

 Anyway I think we're off the rails with raid1 nomenclature as soon as we have 
 3 devices. It's probably better to call it replication, with an assumed 
 default of 2 replicates unless otherwise specified.
 
 There's definitely a benefit to a 3 device volume with 2 replicates, 
 efficiency wise. As soon as we go to four disks 2 replicates it makes more 
 sense to do raid10, although I haven't tested odd device raid10 setups so I'm 
 not sure what happens.
 
 
 Chris Murphy
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Prisoner unknown:  Return to Zenda. ---   




Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 3:08 PM, Hugo Mills h...@carfax.org.uk wrote:

 On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply replace) 
 should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like balance: 
 btrfs balance start and btrfs replace start. And you can also get a status 
 on it. We don't (yet) have options to stop, start, resume, which could maybe 
 come in handy for long rebuilds and a reboot is required (?) although maybe 
 that just gets handled automatically: set it to pause, then unmount, then 
 reboot, then mount and resume.
 
 Well, I'd say two copies if it's only two devices in the raid1... would 
 be true raid1.  But if it's say four devices in the raid1, as is 
 certainly possible with btrfs raid1, that if it's not mirrored 4-way 
 across all devices, it's not true raid1, but rather some sort of hybrid 
 raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
 arranged that way, or some form that doesn't nicely fall into a well 
 defined raid level categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify three 
 devices, you'll get 3-way mirroring (3 mirrors). But I don't know any 
 hardware raid that works this way. They all seem to be raid 1 is strictly 
 two devices. At 4 devices it's raid10, and only in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
 like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
 don't read code but based on how a 3 disk raid1 volume grows VDI files as 
 it's filled it looks like 1GB chunks are copied like this
 
 Disk1   Disk2   Disk3
 134     124     235
 679     578     689
 
 So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
 disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
 far as I know. You do definitely run out of space on one disk first though 
 because of uneven metadata to data chunk allocation.
 
   The algorithm is that when the chunk allocator is asked for a block
 group (in pairs of chunks for RAID-1), it picks the number of chunks
 it needs, from different devices, in order of the device with the most
 free space. So, with disks of size 8, 4, 4, you get:
 
 Disk 1: 12345678
 Disk 2: 1357
 Disk 3: 2468
 
 and with 8, 8, 4, you get:
 
 Disk 1: 1234568A
 Disk 2: 1234579A
 Disk 3: 6789

Sure in my example I was assuming equal size disks. But it's a good example to 
have uneven disks also, because it exemplifies all the more the flexibility 
btrfs replication has, over alternatives, with odd numbered *and* uneven size 
disks.


Chris Murphy



Re: Help with space

2014-04-30 Thread Russell Coker
On Fri, 28 Feb 2014 10:34:36 Roman Mamedov wrote:
  I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in
 
 Do you sleep well at night knowing that if one disk fails, you end up with
 basically a RAID0 of 7x3TB disks? And that if 2nd one encounters unreadable
 sector during rebuild, you lost your data? RAID5 actually stopped working 5
 years ago, apparently you didn't get the memo. :)
 http://hardware.slashdot.org/story/08/10/21/2126252/why-raid-5-stops-working-in-2009

I've just been doing some experiments with a failing disk used for backups (so 
I'm not losing any real data here).  The dup option for metadata means that 
the entire filesystem structure is intact in spite of having lots of errors 
(in another thread I wrote about getting 50+ correctable errors on metadata 
while doing a backup).

My experience is that in the vast majority of disk failures that don't involve 
dropping a disk the majority of disk data will still be readable.  For example 
one time I had a workstation running RAID-1 get too hot in summer and both 
disks developed significant numbers of errors, enough that it couldn't 
maintain a Linux Software RAID-1 (disks got kicked out all the time).  I wrote 
a program to read all the data from disk 0 and read from disk 1 any blocks 
that couldn't be read from disk 0, the result was that after running e2fsck on 
the result I didn't lose any data.

So if you have BTRFS configured to dup metadata on a RAID-5 array (either 
hardware RAID or Linux Software RAID) then the probability of losing metadata 
would be a lot lower than for a filesystem which doesn't do checksums and 
doesn't duplicate metadata.  To lose metadata you would need to have two 
errors that line up with both copies of the same metadata block.
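To put a rough number on that, here is a deliberately naive sketch; the
per-copy failure rate is made up and the independence assumption is optimistic
(real media errors tend to cluster), so treat it purely as an illustration of
why the second copy helps:

# Toy model: fraction of blocks unreadable on the disk, with the two dup
# copies assumed to fail independently (they don't, really).
p_copy_bad = 1e-4

p_lose_metadata_single = p_copy_bad          # only one copy of each block
p_lose_metadata_dup = p_copy_bad ** 2        # both dup copies of a block bad

print(p_lose_metadata_single)   # 0.0001
print(p_lose_metadata_dup)      # 1e-08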

One problem with many RAID arrays is that it seems to only be possible to 
remove a disk and generate a replacement from parity.  I'd like to be able to 
read all the data from the old disk which is readable and write it to the new 
disk.  Then use the parity from other disks to recover the blocks which 
weren't readable.  That way if you have errors on two disks it won't matter 
unless they both happen to be on the same stripe.  Given that BTRFS RAID-5 
isn't usable yet, it seems that the only way to get this result is to use 
RAID-Z on ZFS.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



Re: Help with space

2014-04-30 Thread Duncan
Russell Coker posted on Thu, 01 May 2014 11:52:33 +1000 as excerpted:

 I've just been doing some experiments with a failing disk used for
 backups (so I'm not losing any real data here).

=:^)

 The dup option for metadata means that the entire filesystem
 structure is intact in spite of having lots of errors (in another
 thread I wrote about getting 50+ correctable errors on metadata while
 doing a backup).

TL;DR: Discussion of btrfs raid1 and n-way-mirroring.  Bonus discussion 
of spinning-rust heat-death and death-in-general modes.

That's why I'm running raid1 for both data and metadata here.  I love 
btrfs' data/metadata checksumming and integrity mechanisms, and having 
that second copy to scrub from in the event of an error on one of them is 
just as important to me as the device-redundancy-and-failure-recovery bit.

I could get the latter on md/raid and did run it for some years, but there's 
no way to have it do routine read-time parity cross-checking and scrub (or 
N-way checking and vote, rewriting the bad copy on failure, in the case of 
raid1).  Even tho it has all the cross-checksums already there and available, 
it only actually makes /use/ of them for recovery if a device fails...

My biggest frustration with btrfs ATM is the lack of true raid1, aka 
N-way-mirroring.  Btrfs presently only does pair-mirroring, no matter the 
number of devices in the raid1.  Checksummed-3-way-redundancy really is 
the sweet spot I'd like to hit, and yes it's on the road map, but this 
thing seems to be taking about as long as Christmas does to a five or six 
year old... which is a pretty apt metaphor of my anticipation and the 
eagerness with which I'll be unwrapping and playing with that present 
once it comes! =:^)

 My experience is that in the vast majority of disk failures that don't
 involve dropping a disk the majority of disk data will still be
 readable.  For example one time I had a workstation running RAID-1 get
 too hot in summer and both disks developed significant numbers of
 errors, enough that it couldn't maintain a Linux Software RAID-1 (disks
 got kicked out all the time).  I wrote a program to read all the data
 from disk 0 and read from disk 1 any blocks that couldn't be read from
 disk 0, the result was that after running e2fsck on the result I didn't
 lose any data.

That's rather similar to an experience of mine.  I'm in Phoenix, AZ, and 
outdoor in-the-shade temps can reach near 50C.  Air-conditioning failure 
with a system left running while I was elsewhere.  I came home to the 
hot-car effect, far hotter inside than out, so likely 55-60C ambient 
air temp, very likely 70+ device temps.  The system was still on but 
frozen (broiled?) due to disk head crash and possibly CPU thermal 
shutdown.

Surprisingly, after shutting everything down, getting a new AC, and 
letting the system cool for a few hours, it pretty much all came back to 
life, including the CPU(s) (that was pre-multi-core, but I don't remember 
whether it was my dual socket original Opteron, or pre-dual-socket for me 
as well) which I had feared would be dead.

The disk as well came back, minus the sections that were being accessed 
at the time of the head crash, which I expect were physically grooved.

I only had the one main disk running at the time, but fortunately I had 
partitioned it up and had working and backup partitions for everything 
vital, and of course the backup partitions weren't mounted at the time, 
and they came thru just fine (tho without checksumming, so I'll never know 
if there were bit-flips).  I could boot from the backup / and mount the 
other backups, and a working partition or two that weren't hurt, just 
fine.

But I *DID* have quite a time recovering anyway, primarily because my 
rootfs, /usr/ and /var (which had the system's installed package 
database), were three different partitions that ended up being from three 
different backup dates... on gentoo, with its rolling updates!  IIRC I 
had a current /var including the package database, but the package files 
actually on the rootfs and on /usr were from different package versions 
from what the db in /var was tracking, and were different from each other 
as well.  I was still finding stale package remnants nearly two years 
later!

But I continued running that disk for several months until I had some 
money to replace it, then copied the system, by then current again except 
for the occasional stale file, to the new setup.  I always wondered how 
much longer I could have run the heat-tested one, but didn't want to 
trust my luck any further, so retired it.

Which was when I got into md/raid, first mostly raid6, then later redone 
to raid1, once I figured out the fancy dual checksums weren't doing 
anything but slowing me down in normal operations anyway.

And on my new setup, I used a partitioning policy I continue to this day, 
namely, everything that the package manager touches[1] including its 
installed-pkg 

Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:19 AM, Justin Brown otakujunct...@gmail.com wrote:

 I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in
 need of help.  Disk usage (du) shows 13 tera allocated yet strangely
 enough df shows approx. 780 gigs are free.  It seems, somehow, btrfs
 has eaten roughly 4 tera internally.  I've run a scrub and a balance
 usage=5 with no success, in fact I lost about 20 gigs after the
 balance attempt.  Some numbers:
 
 terra:/var/lib/nobody/fs/ubfterra # uname -a
 Linux terra 3.12.4-2.44-desktop #1 SMP PREEMPT Mon Dec 9 03:14:51 CST
 2013 i686 i686 i386 GNU/Linux

This is on i686?

The kernel page cache is limited to 16TB on i686, so effectively your block 
device is limited to 16TB. While the file system successfully creates, I think 
it's a bug that the mount -t btrfs command is probably a btrfs bug.
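The 16TB figure follows from the 32-bit page-cache index and 4 KiB pages; a
quick check of the arithmetic, just for illustration:

# 32-bit page-cache index, one page per index entry:
PAGE_SIZE = 4096                  # 2**12 bytes on x86
limit = (2**32) * PAGE_SIZE       # 2**44 bytes
print(limit // 2**40, "TiB")      # -> 16 TiB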

The way this works for XFS and ext4 is that the mount fails.

EXT4-fs (sdc): filesystem too large to mount safely on this system
XFS (sdc): file system too large to be mounted on this system.

If you're on a 32-bit OS, the file system might be toast, I'm not really sure. 
But I'd immediately stop using it and only use 64-bit OS for file systems of 
this size.



Chris Murphy



Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 12:27 PM, Chris Murphy li...@colorremedies.com wrote:
 This is on i686?
 
 The kernel page cache is limited to 16TB on i686, so effectively your block 
 device is limited to 16TB. While the file system successfully creates, I 
 think it's a bug that the mount -t btrfs command is probably a btrfs bug.

Yes Chris, circular logic day. It's probably a btrfs bug that the mount command 
succeeds.

So let us know if this is i686 or x86_64, because if it's the former it's a bug 
that should get fixed.


Chris Murphy



Re: Help with space

2014-02-27 Thread otakujunction
Yes it's an ancient 32 bit machine.  There must be a complex bug involved as 
the system, when originally mounted, claimed the correct free space and only as 
used over time did the discrepancy between used and free grow.  I'm afraid I 
chose btrfs because it appeared capable of breaking the 16 tera limit on a 32 
bit system.  If this isn't the case then it's incredible that I've been using 
this file system for about a year without difficulty until now.

-Justin

Sent from my iPad

 On Feb 27, 2014, at 1:51 PM, Chris Murphy li...@colorremedies.com wrote:
 
 
 On Feb 27, 2014, at 12:27 PM, Chris Murphy li...@colorremedies.com wrote:
 This is on i686?
 
 The kernel page cache is limited to 16TB on i686, so effectively your block 
 device is limited to 16TB. While the file system successfully creates, I 
 think it's a bug that the mount -t btrfs command is probably a btrfs bug.
 
 Yes Chris, circular logic day. It's probably a btrfs bug that the mount 
 command succeeds.
 
 So let us know if this is i686 or x86_64, because if it's the former it's a 
 bug that should get fixed.
 
 
 Chris Murphy
 


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:

 Yes it's an ancient 32 bit machine.  There must be a complex bug involved as 
 the system, when originally mounted, claimed the correct free space and only 
 as used over time did the discrepancy between used and free grow.  I'm afraid 
 I chose btrfs because it appeared capable of breaking the 16 tera limit on a 
 32 bit system.  If this isn't the case then it's incredible that I've been 
 using this file system for about a year without difficulty until now.

Yep, it's not a good bug. This happened some years ago on XFS too, where people 
would use the file system for a long time and then at 16TB+1byte written to the 
volume, kablewy! And then it wasn't usable at all, until put on a 64-bit kernel.

http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

I can't tell you if there's a work around for this other than to go to a 64bit 
kernel. Maybe you could partition the raid5 into two 9TB block devices, and 
then format the two partitions with -d single -m raid1. That way it behaves as 
one volume, and alternates 1GB chunks between the two partitions. This should 
perform decently for large files, but otherwise it's possible that you will 
sometimes have the allocator writing to two data chunks on what it thinks are 
two drives at the same time, when it's actually writing to the same physical 
device (the array). Hardware raid should optimize some of this, but I don't 
know what the penalty will be, or if it'll work for your use case.

And I definitely don't know if the kernel page cache limit applies to the block 
device (partition) or if it applies to the file system. It sounds like it 
applies to the block device, so this might be a way around this if you had to 
stick to a 32bit system.


Chris Murphy


Re: Help with space

2014-02-27 Thread Dave Chinner
On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
 
 On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
 
  Yes it's an ancient 32 bit machine.  There must be a complex bug
  involved as the system, when originally mounted, claimed the
  correct free space and only as used over time did the
  discrepancy between used and free grow.  I'm afraid I chose
  btrfs because it appeared capable of breaking the 16 tera limit
  on a 32 bit system.  If this isn't the case then it's incredible
  that I've been using this file system for about a year without
  difficulty until now.
 
 Yep, it's not a good bug. This happened some years ago on XFS too,
 where people would use the file system for a long time and then at
 16TB+1byte written to the volume, kablewy! And then it wasn't
 usable at all, until put on a 64-bit kernel.
 
 http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

Well, no, that's not what I said. I said that it was limited on XFS,
not that the limit was a result of a user making a filesystem too
large and then finding out it didn't work. Indeed, you can't do that
on XFS - mkfs will refuse to run on a block device it can't access the
last block on, and the kernel has the same can I access the last
block of the filesystem sanity checks that are run at mount and
growfs time.

IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux. And,
historically speaking, it didn't even allow it on Irix. Irix on 32
bit systems was limited to 1TB (2^31 sectors of 2^9 bytes = 1TB),
and only as Linux gained sufficient capability on 32 bit systems
(e.g.  CONFIG_LBD) was the limit increased. The limit we are now at
is the address space index being 32 bits, so the size is limited by
2^32 * PAGE_SIZE = 2^44 = 16TB

i.e. back when XFS was still being ported to Linux from Irix in 2000:

#if !XFS_BIG_FILESYSTEMS
        if (sbp->sb_dblocks > INT_MAX || sbp->sb_rblocks > INT_MAX)  {
                cmn_err(CE_WARN,
        "XFS:  File systems greater than 1TB not supported on this system.\n");
                return XFS_ERROR(E2BIG);
        }
#endif

(http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=blob;f=fs/xfs/xfs_mount.c;hb=60a4726a60437654e2af369ccc8458376e1657b9)

So, good story, but it's not true.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 5:12 PM, Dave Chinner da...@fromorbit.com wrote:

 On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
 
 On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
 
 Yes it's an ancient 32 bit machine.  There must be a complex bug
 involved as the system, when originally mounted, claimed the
 correct free space and only as used over time did the
 discrepancy between used and free grow.  I'm afraid I chose
 btrfs because it appeared capable of breaking the 16 tera limit
 on a 32 bit system.  If this isn't the case then it's incredible
 that I've been using this file system for about a year without
 difficulty until now.
 
 Yep, it's not a good bug. This happened some years ago on XFS too,
 where people would use the file system for a long time and then at
 16TB+1byte written to the volume, kablewy! And then it wasn't
 usable at all, until put on a 64-bit kernel.
 
 http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
 
 Well, no, that's not what I said.

What are you thinking I said you said? I wasn't quoting or paraphrasing 
anything you've said above. I had done a google search on this earlier and found 
some rather old threads where some people had this experience of making a large 
file system on a 32-bit kernel, and only after filling it beyond 16TB did they 
run into the problem. Here is one of them:

http://lists.centos.org/pipermail/centos/2011-April/109142.html



 I said that it was limited on XFS,
 not that the limit was a result of a user making a filesystem too
 large and then finding out it didn't work. Indeed, you can't do that
 on XFS - mkfs will refuse to run on a block device it can't access the
 last block on, and the kernel has the same can I access the last
 block of the filesystem sanity checks that are run at mount and
 growfs time.

Nope. What I reported on the XFS list, I had used mkfs.xfs while running 32bit 
kernel on a 20TB virtual disk. It did not fail to make the file system, it 
failed only to mount it. It was the same booted virtual machine, I created the 
file system and immediately mounted it. If you want the specifics, I'll post on 
the XFS list with versions and reproduce steps.


 
 IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux.

OK that's fine, I've only reported what other people said they experienced, and 
it comes as no surprise they might have been confused. Although not knowing the 
size of one's file system would seem to be rare.


Chris Murphy



Re: Help with space

2014-02-27 Thread Dave Chinner
On Thu, Feb 27, 2014 at 05:27:48PM -0700, Chris Murphy wrote:
 
 On Feb 27, 2014, at 5:12 PM, Dave Chinner da...@fromorbit.com
 wrote:
 
  On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
  
  On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
  
  Yes it's an ancient 32 bit machine.  There must be a complex
  bug involved as the system, when originally mounted, claimed
  the correct free space and only as used over time did the
  discrepancy between used and free grow.  I'm afraid I chose
  btrfs because it appeared capable of breaking the 16 tera
  limit on a 32 bit system.  If this isn't the case then it's
  incredible that I've been using this file system for about a
  year without difficulty until now.
  
  Yep, it's not a good bug. This happened some years ago on XFS
  too, where people would use the file system for a long time and
  then at 16TB+1byte written to the volume, kablewy! And then it
  wasn't usable at all, until put on a 64-bit kernel.
  
  http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
  
  Well, no, that's not what I said.
 
 What are you thinking I said you said? I wasn't quoting or
 paraphrasing anything you've said above. I had done a google
 search on this early and found some rather old threads where some
 people had this experience of making a large file system on a
 32-bit kernel, and only after filling it beyond 16TB did they run
 into the problem. Here is one of them:
 
 http://lists.centos.org/pipermail/centos/2011-April/109142.html

sigh

No, he didn't fill it with 16TB of data and then have it fail. He
made a new filesystem *larger* than 16TB and tried to mount it:

| On a CentOS 32-bit backup server with a 17TB LVM logical volume on
| EMC storage.  Worked great, until it rolled 16TB.  Then it quit
| working.  Altogether.  /var/log/messages told me that the
| filesystem was too large to be mounted. Had to re-image the VM as
| a 64-bit CentOS, and then re-attached the RDM's to the LUNs
| holding the PV's for the LV, and it mounted instantly, and we
| kept on trucking.

This just backs up what I told you originally - that XFS has always
refused to mount >16TB filesystems on 32 bit systems.

  I said that it was limited on XFS, not that the limit was a
  result of a user making a filesystem too large and then finding
  out it didn't work. Indeed, you can't do that on XFS - mkfs will
  refuse to run on a block device it can't access the last block
  on, and the kernel has the same can I access the last block of
  the filesystem sanity checks that are run at mount and growfs
  time.
 
 Nope. What I reported on the XFS list, I had used mkfs.xfs while
 running 32bit kernel on a 20TB virtual disk. It did not fail to
 make the file system, it failed only to mount it.

You said no such thing. All you said was you couldn't mount a
filesystem > 16TB - you made no mention of how you made the fs, what
the block device was or any other details.

 It was the same
 booted virtual machine, I created the file system and immediately
 mounted it. If you want the specifics, I'll post on the XFS list
 with versions and reproduce steps.

Did you check to see whether the block device silently wrapped at
16TB? There's a real good chance it did - but you might have got
lucky because mkfs.xfs uses direct IO and *maybe* that works
correctly on block devices on 32 bit systems. I wouldn't bet on it,
though, given it's something we don't support and therefore never
test

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Help with space

2014-02-27 Thread Roman Mamedov
On Thu, 27 Feb 2014 12:19:05 -0600
Justin Brown otakujunct...@gmail.com wrote:

 I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in

Do you sleep well at night knowing that if one disk fails, you end up with
basically a RAID0 of 7x3TB disks? And that if 2nd one encounters unreadable
sector during rebuild, you lost your data? RAID5 actually stopped working 5
years ago, apparently you didn't get the memo. :)
http://hardware.slashdot.org/story/08/10/21/2126252/why-raid-5-stops-working-in-2009

 need of help.  Disk usage (du) shows 13 tera allocated yet strangely
 enough df shows approx. 780 gigs are free.  It seems, somehow, btrfs
 has eaten roughly 4 tera internally.  I've run a scrub and a balance
 usage=5 with no success, in fact I lost about 20 gigs after the

Did you run balance with -dusage=5 or -musage=5? Or both?
What is the output of the balance command?

 terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
 Data, single: total=17.58TiB, used=17.57TiB
 System, DUP: total=8.00MiB, used=1.93MiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=392.00GiB, used=33.50GiB
   ^

If you'd use -musage=5, I think this metadata reserve should have been
shrunk, and you'd gain a lot more free space.

But then as others mentioned it may be risky to use this FS on 32-bit at all,
so I'd suggest trying anything else only after you reboot into a 64-bit kernel.

-- 
With respect,
Roman




Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 9:21 PM, Dave Chinner da...@fromorbit.com wrote:
 
 http://lists.centos.org/pipermail/centos/2011-April/109142.html
 
 sigh
 
 No, he didn't fill it with 16TB of data and then have it fail. He
 made a new filesystem *larger* than 16TB and tried to mount it:
 
 | On a CentOS 32-bit backup server with a 17TB LVM logical volume on
 | EMC storage.  Worked great, until it rolled 16TB.  Then it quit
 | working.  Altogether.  /var/log/messages told me that the
 | filesystem was too large to be mounted. Had to re-image the VM as
 | a 64-bit CentOS, and then re-attached the RDM's to the LUNs
 | holding the PV's for the LV, and it mounted instantly, and we
 | kept on trucking.
 
 This just backs up what I told you originally - that XFS has always
 refused to mount >16TB filesystems on 32 bit systems.

That isn't how I read that at all. It was a 17TB LV, working great (i.e. 
mounted) until it was filled with 16TB, then it quit working and could not 
subsequently be mounted until put on a 64-bit kernel.

I don't see how it's working great if it's not mountable.



 
 I said that it was limited on XFS, not that the limit was a
 result of a user making a filesystem too large and then finding
 out it didn't work. Indeed, you can't do that on XFS - mkfs will
 refuse to run on a block device it can't access the last block
 on, and the kernel has the same can I access the last block of
 the filesystem sanity checks that are run at mount and growfs
 time.
 
 Nope. What I reported on the XFS list, I had used mkfs.xfs while
 running 32bit kernel on a 20TB virtual disk. It did not fail to
 make the file system, it failed only to mount it.
 
 You said no such thing. All you said was you couldn't mount a
 filesystem > 16TB - you made no mention of how you made the fs, what
 the block device was or any other details.

All correct. It wasn't intended as a bug report, it seemed normal. What I 
reported = the mount failure.

VBox 25TB VDI as a single block device, as well as 5x 5TB VDIs in a 20TB 
linear LV, as well as a 100TB virtual size LV using LVM thinp - all can be 
formatted with default mkfs.xfs with no complaints.

3.13.4-200.fc20.i686+PAE
xfsprogs-3.1.11-2.fc20.i686


 
 It was the same
 booted virtual machine, I created the file system and immediately
 mounted it. If you want the specifics, I'll post on the XFS list
 with versions and reproduce steps.
 
 Did you check to see whether the block device silently wrapped at
 16TB? There's a real good chance it did - but you might have got
 lucky because mkfs.xfs uses direct IO and *maybe* that works
 correctly on block devices on 32 bit systems. I wouldn't bet on it,
 though, given it's something we don't support and therefore never
 test….

I did not check to see if any of the block devices silently wrapped; I don't 
know how to do that, although I have a strace of the mkfs on the 100TB virtual 
LV here:

https://dl.dropboxusercontent.com/u/3253801/mkfsxfs32bit100TBvLV.txt


Chris Murphy


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:19 AM, Justin Brown otakujunct...@gmail.com wrote:

 terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
 Data, single: total=17.58TiB, used=17.57TiB
 System, DUP: total=8.00MiB, used=1.93MiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=392.00GiB, used=33.50GiB
 Metadata, single: total=8.00MiB, used=0.00

After glancing at this again, what I thought might be going on might not be 
going on. The fact it has 17+TB already used, not merely allocated, doesn't 
seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.

But then I don't know why du -h is reporting only 13T total used. And I'm 
unconvinced this is a balance issue either. Is anything obviously missing from 
the file system?


Chris Murphy



Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:13 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Feb 27, 2014, at 11:19 AM, Justin Brown otakujunct...@gmail.com wrote:
 
 terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
 Data, single: total=17.58TiB, used=17.57TiB
 System, DUP: total=8.00MiB, used=1.93MiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=392.00GiB, used=33.50GiB
 Metadata, single: total=8.00MiB, used=0.00
 
 After glancing at this again, what I thought might be going on might not be 
 going on. The fact it has 17+TB already used, not merely allocated, doesn't 
 seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.
 
 But then I don't know why du -h is reporting only 13T total used. And I'm 
 unconvinced this is a balance issue either. Is anything obviously missing 
 from the file system?

What are your mount options? Maybe compression?

Clearly du is calculating things differently. I'm getting:

du -sch   = 4.2G
df -h     = 5.4G
btrfs df  = 4.7G data and 620MB metadata (total).

I am using compress=lzo.

Chris Murphy



Re: Help with space

2014-02-27 Thread Duncan
Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:

 But then as others mentioned it may be risky to use this FS on 32-bit at
 all, so I'd suggest trying anything else only after you reboot into a
 64-bit kernel.

Based on what I've read on-list, btrfs is not arch-agnostic, with certain 
on-disk sizes set to native kernel page size, etc, so a filesystem 
created on one arch may well not work on another.

Question: Does this apply to x86/amd64?  Will a filesystem created/used 
on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading 
to 64-bit imply backing up (in this case) double-digit TiB of data to 
something other than btrfs and testing it, doing a mkfs on the original 
filesystem once in 64-bit mode, and restoring all that data from backup?

If the existing 32-bit x86 btrfs can't be used on 64-bit amd64, 
transferring all that data (assuming there's something big enough 
available to transfer it to!) to backup and then restoring it is going to 
hurt!

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Help with space

2014-02-27 Thread Roman Mamedov
On Fri, 28 Feb 2014 07:27:06 + (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

 Based on what I've read on-list, btrfs is not arch-agnostic, with certain 
 on-disk sizes set to native kernel page size, etc, so a filesystem 
 created on one arch may well not work on another.
 
 Question: Does this apply to x86/amd64?  Will a filesystem created/used 
 on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading 
 to 64-bit imply backing up (in this case) double-digit TiB of data to 
 something other than btrfs and testing it, doing a mkfs on the original 
 filesystem once in 64-bit mode, and restoring all that data from backup?

Page size (4K) is the same on both i386 and amd64. It's also the same on ARM.

Problem arises only on architectures like MIPS and PowerPC, some variants of
which use 16K or 64K page sizes.

Other than this page size issue, it has no arch-specific dependencies, e.g.
no on-disk structures with CPU-native integer-sized fields etc.; that'd be too
crazy to be true.

-- 
With respect,
Roman




Re: Help with space

2014-02-27 Thread Justin Brown
Apologies for the late reply, I'd assumed the issue was closed even
given the unusual behavior.  My mount options are:

/dev/sdb1 on /var/lib/nobody/fs/ubfterra type btrfs
(rw,noatime,nodatasum,nodatacow,noacl,space_cache,skip_balance)

I only recently added nodatacow and skip_balance in an attempt to
figure out where the missing space had gone.  I don't know what impact
they might have, if any, on things.  I've got a full balance running at
the moment which, after about a day or so, has managed to process 5%
of the chunks it's considering (988 out of about 18396 chunks balanced
(989 considered), 95% left).  The amount of free space has vacillated
slightly, growing by about a gig only to shrink back.  As far as objects
missing from the file system, I've not seen any such.  I've a lot of files
of various data types; the majority is encoded Japanese animation.
Since I actually play these files via samba from an HTPC, particularly
the more recent additions, I'd hazard a guess that if something were
breaking I'd have tripped across it by now, the unusual used-to-free
space delta being the exception.  My brother also uses this raid for
data storage, he's something of a closet meteorologist and is
fascinated by tornadoes.  He hasn't noticed any unusual behavior
either.  I'm in the process of sourcing a 64 bit capable system in the
hopes that will resolve the issue.  Neither of us are currently
writing anything to the file system for fear of things breaking, but
both have been reading from it without issue, other than the noticeable
impact on performance the balance seems to be having.  Thanks for the
help.

-Justin


On Fri, Feb 28, 2014 at 12:26 AM, Chris Murphy li...@colorremedies.com wrote:

 On Feb 27, 2014, at 11:13 PM, Chris Murphy li...@colorremedies.com wrote:


 On Feb 27, 2014, at 11:19 AM, Justin Brown otakujunct...@gmail.com wrote:

 terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
 Data, single: total=17.58TiB, used=17.57TiB
 System, DUP: total=8.00MiB, used=1.93MiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=392.00GiB, used=33.50GiB
 Metadata, single: total=8.00MiB, used=0.00

 After glancing at this again, what I thought might be going on might not be 
 going on. The fact it has 17+TB already used, not merely allocated, doesn't 
 seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.

 But then I don't know why du -h is reporting only 13T total used. And I'm 
 unconvinced this is a balance issue either. Is anything obviously missing 
 from the file system?

 What are your mount options? Maybe compression?

 Clearly du is calculating things differently. I'm getting:

 du -sch   = 4.2G
 df -h     = 5.4G
 btrfs df  = 4.7G data and 620MB metadata (total).

 I am using compress=lzo.

 Chris Murphy



Re: Help with space

2014-02-27 Thread Justin Brown
Absolutely.  I'd like to know the answer to this, as 13 tera will take
a considerable amount of time to back up anywhere, assuming I find a
place.  I'm considering rebuilding a smaller raid with newer drives
(it was originally built using 16 250 gig Western Digital drives; it's
about eleven years old now, having been in use the entire time without
failure, and I'm considering replacing each 250 gig drive with a 3 tera
alternative).  Unfortunately, between upgrading the host and building
a new raid, the expense isn't something I'm anticipating with
pleasure...

On Fri, Feb 28, 2014 at 1:27 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:

 But then as others mentioned it may be risky to use this FS on 32-bit at
 all, so I'd suggest trying anything else only after you reboot into a
 64-bit kernel.

 Based on what I've read on-list, btrfs is not arch-agnostic, with certain
 on-disk sizes set to native kernel page size, etc, so a filesystem
 created on one arch may well not work on another.

 Question: Does this apply to x86/amd64?  Will a filesystem created/used
 on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading
 to 64-bit imply backing up (in this case) double-digit TiB of data to
 something other than btrfs and testing it, doing a mkfs on the original
 filesystem once in 64-bit mode, and restoring all that data from backup?

 If the existing 32-bit x86 btrfs can't be used on 64-bit amd64,
 transferring all that data (assuming there's something big enough
 available to transfer it to!) to backup and then restoring it is going to
 hurt!

 --
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman
