Re: dup vs raid1 in single disk

2017-02-08 Thread Austin S. Hemmelgarn

On 2017-02-07 17:28, Kai Krakow wrote:

On Thu, 19 Jan 2017 15:02:14 -0500,
"Austin S. Hemmelgarn" wrote:


On 2017-01-19 13:23, Roman Mamedov wrote:

On Thu, 19 Jan 2017 17:39:37 +0100
"Alejandro R. Mosteo"  wrote:


I was wondering, from a point of view of data safety, if there is
any difference between using dup or making a raid1 from two
partitions on the same disk. The idea is to have some
protection against the typical aging HDD that starts to develop bad
sectors.


RAID1 will write slower compared to DUP: any optimization to
make RAID1 devices work in parallel will cause a total performance
disaster for you, as you will start trying to write to both
partitions at the same time, turning all linear writes into random
ones, which are about two orders of magnitude slower than linear on
spinning hard drives. DUP shouldn't have this issue, but it will
still be half as fast as single, since you are writing everything
twice.

As of right now, there will actually be near zero impact on write
performance (or at least, it's way less than the theoretical 50%)
because there really isn't any optimization to speak of in the
multi-device code.  That will hopefully change over time, but it's
not likely to do so any time soon, since nobody appears to be
working on multi-device write performance.
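
For concreteness, here is a minimal sketch of the two layouts being
compared; the device names are placeholders, and on older btrfs-progs you
may need to create the filesystem first and convert data to DUP with
balance instead:

  # DUP: one partition, btrfs itself keeps two copies of every block.
  mkfs.btrfs -m dup -d dup /dev/sdX1

  # "RAID1 on one disk": two partitions on the same drive, treated by
  # btrfs as two devices and mirrored across each other.
  mkfs.btrfs -m raid1 -d raid1 /dev/sdX1 /dev/sdX2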


I think that's only true if you don't account for the seek overhead. In
single-device RAID1 mode you will always seek across half of the device
while writing data, and even when reading, since btrfs picks the mirror
based on odd/even PID. In contrast, DUP mode doesn't guarantee shorter
seeks, but statistically they should be shorter on average. So it
should yield better performance (though I wouldn't expect it to be
observable, depending on your workload).

So, on devices with no seek overhead (i.e. SSDs), it is probably true
(minus bus bandwidth considerations). For HDDs I'd prefer DUP.

From a data safety point of view: it's more likely that adjacent
and nearby sectors go bad together. So DUP carries a higher risk of
both copies being written to bad sectors - which means data loss, or
even file system loss (if metadata hits this problem).

To be realistic: I wouldn't trade space usage for duplicate data on an
already failing disk, no matter whether it's DUP or RAID1. HDD disk space
is cheap, and such a scenario is just a waste of performance AND
space - no matter what. I don't understand the purpose of this. It just
results in false safety.

Better to get two separate devices at half the size. There's a better
chance of getting a good cost/space ratio that way anyway, plus better
performance and safety.


There's also the fact that you're writing more metadata than data
most of the time unless you're dealing with really big files, and
metadata already defaults to DUP mode (unless you are using an SSD), so
the performance hit isn't 50%, it's actually a bit more than half the
ratio of data writes to metadata writes.
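
For anyone who wants to check which profiles a given filesystem is
actually using, the standard tools report it directly (the mount point
below is a placeholder):

  # Shows data/metadata/system profiles (single, DUP, raid1, ...) and usage.
  btrfs filesystem df /mnt
  # More detailed per-device breakdown on newer btrfs-progs.
  btrfs filesystem usage /mnt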



On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a
single copy thus deduplicating them. This negates the purpose of
increased redunancy (sic) and just wastes space"


That ability is vastly overestimated in the man page. There is no
miracle content-addressable storage system working at 500 MB/sec
speeds all within a little cheap controller on SSDs. Likely most of
what it can do is just compress simple stuff, such as runs of
zeroes or other repeating byte sequences.

Most of those that do in-line compression don't implement it in
firmware; they implement it in hardware, and even DEFLATE can hit 500
MB/second if properly implemented in hardware.  The firmware
may control how the hardware works, but it's usually the hardware doing
the heavy lifting in that case, and getting a good ASIC made that can hit
the required performance point for a reasonable compression algorithm
like LZ4 or Snappy is insanely cheap once you've gotten past the VLSI
work.


I still think it's a myth... The overhead of managing inline
deduplication is just way too high to implement it without jumping
through expensive hoops. Most workloads have almost zero deduplication
potential. And even when they do, the duplicates are spaced so far apart
in time that an inline deduplicator won't catch them.
Just like the proposed implementation in BTRFS, it's not complete 
deduplication.  In fact, the only devices I've ever seen that do this 
appear to implement it just like what was proposed for BTRFS, just with 
a much smaller cache.  They were also insanely expensive.


If it were all that easy, btrfs would already have it working in
mainline. I don't even know whether those patches are still being
worked on.

With this in mind, I think dup metadata is still a good thing to have
even on SSDs, and I would always force-enable it.

Agreed.
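
As a hedged illustration of what "force-enable" looks like in practice
(device and mount point are placeholders): mkfs.btrfs picks single
metadata when it detects a non-rotational device, but DUP can be
requested at creation time or converted to later with balance.

  # Force DUP metadata at mkfs time, even on an SSD.
  mkfs.btrfs -m dup /dev/sdX1

  # Or convert the metadata of an existing, mounted filesystem.
  btrfs balance start -mconvert=dup /mnt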


Potential for deduplication is only when using snapshots (which already
are deduplicated when taken) or when handling 

Re: dup vs raid1 in single disk

2017-02-08 Thread Alejandro R. Mosteo

On 07/02/17 23:28, Kai Krakow wrote:

To be realistic: I wouldn't trade space usage for duplicate data on an
already failing disk, no matter whether it's DUP or RAID1. HDD disk space
is cheap, and such a scenario is just a waste of performance AND
space - no matter what. I don't understand the purpose of this. It just
results in false safety.
The disk is already replaced and is no longer my workstation's main drive.
I work with large datasets in my research, and I don't care much about
sustained I/O efficiency, since they're only read when needed. Hence, it's
a matter of squeezing the last life out of that disk instead of
discarding it right away. This way I get one extra piece of local storage
that may spare me a copy from a remote machine, so I prefer to play with
it until it dies. Besides, it affords me a chance to play with btrfs/zfs
in ways that I wouldn't normally risk, and I can also assess their
behavior with a truly failing disk.


In the end, after a destructive write pass with badblocks, the disk's
growing count of uncorrectable sectors has disappeared... go figure. So
right now I have a btrfs filesystem built with the single profile on top
of four differently sized partitions. When/if bad blocks reappear I'll
test some raid configuration; probably raidz unless btrfs raid5 is
somewhat usable by then (why go with half a disk's worth when you can
have 2/3? ;-))
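
A rough sketch of that kind of setup, in case anyone wants to reproduce
it; device and partition names are placeholders, and badblocks -w
destroys everything on the device:

  # Destructive write test: overwrites the whole device with test patterns.
  badblocks -wsv /dev/sdX

  # Single data profile spread across several differently sized partitions
  # (metadata defaults to raid1 when more than one device is given).
  mkfs.btrfs -d single /dev/sdX1 /dev/sdX2 /dev/sdX3 /dev/sdX4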


Thanks for your justified concern though.

Alex.


Better to get two separate devices at half the size. There's a better
chance of getting a good cost/space ratio that way anyway, plus better
performance and safety.


There's also the fact that you're writing more metadata than data
most of the time unless you're dealing with really big files, and
metadata already defaults to DUP mode (unless you are using an SSD), so
the performance hit isn't 50%, it's actually a bit more than half the
ratio of data writes to metadata writes.
  

On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a
single copy thus deduplicating them. This negates the purpose of
increased redunancy (sic) and just wastes space"

That ability is vastly overestimated in the man page. There is no
miracle content-addressable storage system working at 500 MB/sec
speeds all within a little cheap controller on SSDs. Likely most of
what it can do is just compress simple stuff, such as runs of
zeroes or other repeating byte sequences.

Most of those that do in-line compression don't implement it in
firmware; they implement it in hardware, and even DEFLATE can hit 500
MB/second if properly implemented in hardware.  The firmware
may control how the hardware works, but it's usually the hardware doing
the heavy lifting in that case, and getting a good ASIC made that can hit
the required performance point for a reasonable compression algorithm
like LZ4 or Snappy is insanely cheap once you've gotten past the VLSI
work.

I still think it's a myth... The overhead of managing inline
deduplication is just way too high to implement it without jumping
through expensive hoops. Most workloads have almost zero deduplication
potential. And even when they do, the duplicates are spaced so far apart
in time that an inline deduplicator won't catch them.

If it were all that easy, btrfs would already have it working in
mainline. I don't even know whether those patches are still being
worked on.

With this in mind, I think dup metadata is still a good thing to have
even on SSDs, and I would always force-enable it.

Potential for deduplication only exists when using snapshots (which are
already deduplicated when taken) or when handling user data on a file
server in a multi-user environment. Users tend to copy their files all
over the place - multiple directories of multiple gigabytes. There is
also potential when you're working with client machine backups or VM
images. I regularly see deduplication efficiency of 30-60% in such
scenarios - mostly on the file servers I handle. But because duplicate
blocks appear so far apart in time, only offline or nearline
deduplication works here.


And the DUP mode is still useful on SSDs: for cases when one copy
of the DUP gets corrupted in-flight due to a bad controller or RAM
or cable, you can then restore that block from its good-CRC DUP
copy.
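
In practice that repair happens either transparently on read or during a
scrub pass; a minimal example, with the mount point as a placeholder:

  # Verify all checksums and rewrite any bad copy from its good DUP/mirror copy.
  btrfs scrub start -Bd /mnt
  # Check the per-device corruption/IO error counters afterwards.
  btrfs device stats /mnt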

The only window of time during which bad RAM could result in only one
copy of a block being bad is after the first copy is written but
before the second is, which is usually an insanely small amount of
time.  As far as the cabling, the window for errors resulting in a
single bad copy of a block is pretty much the same as for RAM, and if
they're persistently bad, you're more likely to lose data for other
reasons.

It depends on the design of the software. You're right if this memory
block is simply a single block throughout its lifetime in RAM before
being written to storage. But if it is already handled as a duplicated
block in memory, the odds are different. I hope btrfs is doing this
right... ;-)


That 

Re: dup vs raid1 in single disk

2017-02-07 Thread Dan Mons
On 8 February 2017 at 08:28, Kai Krakow  wrote:
> I still think it's a myth... The overhead of managing inline
> deduplication is just way too high to implement it without jumping
> through expensive hoops. Most workloads have almost zero deduplication
> potential. And even when they do, the duplicates are spaced so far apart
> in time that an inline deduplicator won't catch them.
>
> If it were all that easy, btrfs would already have it working in
> mainline. I don't even know whether those patches are still being
> worked on.
>
> With this in mind, I think dup metadata is still a good thing to have
> even on SSDs, and I would always force-enable it.
>
> Potential for deduplication only exists when using snapshots (which are
> already deduplicated when taken) or when handling user data on a file
> server in a multi-user environment. Users tend to copy their files all
> over the place - multiple directories of multiple gigabytes. There is
> also potential when you're working with client machine backups or VM
> images. I regularly see deduplication efficiency of 30-60% in such
> scenarios - mostly on the file servers I handle. But because duplicate
> blocks appear so far apart in time, only offline or nearline
> deduplication works here.

I'm a sysadmin by trade, managing many PB of storage for a media
company.  Our primary storage is Oracle ZFS appliances, and all of
our secondary/nearline storage is Linux+BtrFS.

ZFS's inline deduplication is awful.  It consumes enormous amounts of
RAM that is orders of magnitude more valuable as ARC/Cache, and
becomes immediately useless whenever a storage node is rebooted
(necessary to apply mandatory security patches) and the in-memory
tables are lost (meaning cold data is rarely re-examined, and the
inline dedup becomes less efficient).

Conversely, I use "duperemove" as a one-shot/offline deduplication
tool on all of our BtrFS storage.  It can be set as a cron job to
run outside of business hours, and uses an SQLite database to store
the necessary dedup hash information on disk rather than in RAM.
From the point of view of someone who manages large amounts of long
term centralised storage, this is a far superior way to deal with
deduplication, as it offers more flexibility and far better
space-saving ratios at a lower memory cost.
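
A hedged sketch of that kind of job (paths are placeholders):
duperemove's -d performs the actual deduplication, -r recurses into
subdirectories, and --hashfile keeps the block hashes in an on-disk
SQLite database instead of RAM.

  # Typically run from cron outside business hours.
  duperemove -dr --hashfile=/var/cache/duperemove.hash /srv/storage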

We trialled ZFS dedup for a few months, and decided to turn it off, as
there was far less benefit to ZFS using all that RAM for dedup than
there was for it to be cache.  I've been requesting Oracle offer a
similar offline dedup tool for their ZFS appliance for a very long
time, and if BtrFS ever did offer inline dedup, I wouldn't bother
using it for all of the reasons above.

-Dan


Re: dup vs raid1 in single disk

2017-02-07 Thread Hans van Kranenburg
On 02/07/2017 11:28 PM, Kai Krakow wrote:
> On Thu, 19 Jan 2017 15:02:14 -0500,
> "Austin S. Hemmelgarn" wrote:
> 
>> On 2017-01-19 13:23, Roman Mamedov wrote:
>>> On Thu, 19 Jan 2017 17:39:37 +0100
>>> [...]
>>> And the DUP mode is still useful on SSDs: for cases when one copy
>>> of the DUP gets corrupted in-flight due to a bad controller or RAM
>>> or cable, you can then restore that block from its good-CRC DUP
>>> copy.
>> The only window of time during which bad RAM could result in only one 
>> copy of a block being bad is after the first copy is written but
>> before the second is, which is usually an insanely small amount of
>> time.  As far as the cabling, the window for errors resulting in a
>> single bad copy of a block is pretty much the same as for RAM, and if
>> they're persistently bad, you're more likely to lose data for other
>> reasons.
> 
> It depends on the design of the software. You're right if this memory
> block is simply a single block throughout its lifetime in RAM before
> being written to storage. But if it is already handled as a duplicated
> block in memory, the odds are different. I hope btrfs is doing this
> right... ;-)

In memory, it's just one copy, happily sitting around, getting corrupted
by cosmic rays and other stuff done to it by aliens, after which a valid
checksum is calculated for the corrupt data, after which it goes on its
way to disk, twice. Yay.

>> That said, I do still feel that DUP mode has value on SSD's.  The 
>> primary arguments against it are:
>> 1. It wears out the SSD faster.
> 
> I don't think this is a huge factor, even more so when looking at the TBW
> ratings of modern SSDs. And prices are low enough that it's better to swap
> early than to wait for disaster to hit you. Instead, you can still
> use the old SSD for archival storage (but this has drawbacks - don't
> leave it without power for months or years!) or as a shock-resistant
> USB mobile drive on the go.
> 
>> 2. The blocks are likely to end up in the same erase block, and 
>> therefore there will be no benefit.
> 
> Oh, this is probably a point to really think about... Would ssd_spread
> help here?

I think there was another one: SSD firmware deduplicating writes,
converting the DUP back into a single copy and giving a false idea of it
being DUP.

This is one that can be solved by e.g. using disk encryption, which
causes identical writes to show up as different data on disk.
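
A minimal sketch of that approach, assuming dm-crypt/LUKS (device and
mapper names are placeholders): everything written through the encryption
layer becomes unique ciphertext, so the drive's firmware cannot collapse
the two DUP copies.

  # LUKS on the raw partition, then btrfs with DUP on the mapped device.
  cryptsetup luksFormat /dev/sdX1
  cryptsetup open /dev/sdX1 cryptbtrfs
  mkfs.btrfs -m dup -d dup /dev/mapper/cryptbtrfs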

-- 
Hans van Kranenburg


Re: dup vs raid1 in single disk

2017-02-07 Thread Kai Krakow
On Thu, 19 Jan 2017 15:02:14 -0500,
"Austin S. Hemmelgarn" wrote:

> On 2017-01-19 13:23, Roman Mamedov wrote:
> > On Thu, 19 Jan 2017 17:39:37 +0100
> > "Alejandro R. Mosteo"  wrote:
> >  
> >> I was wondering, from a point of view of data safety, if there is
> >> any difference between using dup or making a raid1 from two
> >> partitions on the same disk. The idea is to have some protection
> >> against the typical aging HDD that starts to develop bad
> >> sectors.
> >
> > RAID1 will write slower compared to DUP: any optimization to
> > make RAID1 devices work in parallel will cause a total performance
> > disaster for you, as you will start trying to write to both
> > partitions at the same time, turning all linear writes into random
> > ones, which are about two orders of magnitude slower than linear on
> > spinning hard drives. DUP shouldn't have this issue, but it will
> > still be half as fast as single, since you are writing everything
> > twice.
> As of right now, there will actually be near zero impact on write
> performance (or at least, it's way less than the theoretical 50%)
> because there really isn't any optimization to speak of in the
> multi-device code.  That will hopefully change over time, but it's
> not likely to do so any time soon, since nobody appears to be
> working on multi-device write performance.

I think that's only true if you don't account for the seek overhead. In
single-device RAID1 mode you will always seek across half of the device
while writing data, and even when reading, since btrfs picks the mirror
based on odd/even PID. In contrast, DUP mode doesn't guarantee shorter
seeks, but statistically they should be shorter on average. So it
should yield better performance (though I wouldn't expect it to be
observable, depending on your workload).

So, on devices with no seek overhead (i.e. SSDs), it is probably true
(minus bus bandwidth considerations). For HDDs I'd prefer DUP.

From a data safety point of view: it's more likely that adjacent
and nearby sectors go bad together. So DUP carries a higher risk of
both copies being written to bad sectors - which means data loss, or even
file system loss (if metadata hits this problem).

To be realistic: I wouldn't trade space usage for duplicate data on an
already failing disk, no matter whether it's DUP or RAID1. HDD disk space
is cheap, and such a scenario is just a waste of performance AND
space - no matter what. I don't understand the purpose of this. It just
results in false safety.

Better to get two separate devices at half the size. There's a better
chance of getting a good cost/space ratio that way anyway, plus better
performance and safety.

> There's also the fact that you're writing more metadata than data
> most of the time unless you're dealing with really big files, and
> metadata already defaults to DUP mode (unless you are using an SSD), so
> the performance hit isn't 50%, it's actually a bit more than half the
> ratio of data writes to metadata writes.
> >  
> >> On a related note, I see this caveat about dup in the manpage:
> >>
> >> "For example, a SSD drive can remap the blocks internally to a
> >> single copy thus deduplicating them. This negates the purpose of
> >> increased redunancy (sic) and just wastes space"  
> >
> > That ability is vastly overestimated in the man page. There is no
> > miracle content-addressable storage system working at 500 MB/sec
> > speeds all within a little cheap controller on SSDs. Likely most of
> > what it can do is just compress simple stuff, such as runs of
> > zeroes or other repeating byte sequences.
> Most of those that do in-line compression don't implement it in
> firmware; they implement it in hardware, and even DEFLATE can hit 500
> MB/second if properly implemented in hardware.  The firmware
> may control how the hardware works, but it's usually the hardware doing
> the heavy lifting in that case, and getting a good ASIC made that can hit
> the required performance point for a reasonable compression algorithm
> like LZ4 or Snappy is insanely cheap once you've gotten past the VLSI
> work.

I still think it's a myth... The overhead of managing inline
deduplication is just way too high to implement it without jumping
through expensive hoops. Most workloads have almost zero deduplication
potential. And even when they do, the duplicates are spaced so far apart
in time that an inline deduplicator won't catch them.

If it were all that easy, btrfs would already have it working in
mainline. I don't even know whether those patches are still being
worked on.

With this in mind, I think dup metadata is still a good thing to have
even on SSDs, and I would always force-enable it.

Potential for deduplication only exists when using snapshots (which are
already deduplicated when taken) or when handling user data on a file
server in a multi-user environment. Users tend to copy their files all
over the place - multiple directories of multiple gigabytes. 

Re: dup vs raid1 in single disk

2017-01-21 Thread Alejandro R. Mosteo

Thanks Austin and Roman for the interesting discussion.

Alex.

On 19/01/17 21:02, Austin S. Hemmelgarn wrote:

On 2017-01-19 13:23, Roman Mamedov wrote:

On Thu, 19 Jan 2017 17:39:37 +0100
"Alejandro R. Mosteo"  wrote:


I was wondering, from a point of view of data safety, if there is any
difference between using dup or making a raid1 from two partitions on
the same disk. The idea is to have some protection against the
typical aging HDD that starts to develop bad sectors.


RAID1 will write slower compared to DUP: any optimization to make RAID1
devices work in parallel will cause a total performance disaster for you,
as you will start trying to write to both partitions at the same time,
turning all linear writes into random ones, which are about two orders of
magnitude slower than linear on spinning hard drives. DUP shouldn't have
this issue, but it will still be half as fast as single, since you are
writing everything twice.
As of right now, there will actually be near zero impact on write 
performance (or at least, it's way less than the theoretical 50%) 
because there really isn't any optimization to speak of in the 
multi-device code.  That will hopefully change over time, but it's not 
likely to do so any time soon, since nobody appears to be 
working on multi-device write performance.


You could consider DUP data for when a disk is already known to be getting
bad sectors from time to time -- but then it's a fringe exercise to try
and keep using such a disk in the first place. Yeah, with DUP data and DUP
metadata you can likely get some more life out of such a disk as a
throwaway storage space for non-essential data, at half capacity, but is
it worth the effort, as it's likely to start failing progressively worse
over time?

In all other cases the performance and storage space penalty of DUP within
a single device is way too great (and the gained redundancy too low)
compared to a proper system of single-profile data + backups, or a RAID5/6
system (not Btrfs-based) + backups.
That really depends on your usage.  In my case, I run DUP data on 
single disks regularly.  I still do backups of course, but the 
performance is worth far less for me (especially in the cases where 
I'm using NVMe SSD's which have performance measured in thousands of 
MB/s for both reads and writes) than the ability to recover from 
transient data corruption without needing to go to a backup.


As long as /home and any other write heavy directories are on a 
separate partition, I would actually advocate using DUP data on your 
root filesystem if you can afford the space simply because it's a 
whole lot easier to recover other data if the root filesystem still 
works.  Most of the root filesystem except some stuff under /var 
follows a WORM access pattern, and even the stuff that doesn't in /var 
is usually not performance critical, so the write performance penalty 
won't have anywhere near as much impact on how well the system runs as 
you might think.
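
For an existing root filesystem this doesn't even require a reformat; a
hedged example of converting in place on a single-device filesystem
(assuming the root is mounted at /):

  # Convert existing data chunks to DUP; add -mconvert=dup for metadata too.
  btrfs balance start -dconvert=dup /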


There's also the fact that you're writing more metadata than data most 
of the time unless you're dealing with really big files, and metadata 
already defaults to DUP mode (unless you are using an SSD), so the 
performance hit isn't 50%, it's actually a bit more than half the ratio 
of data writes to metadata writes.



On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a single
copy thus deduplicating them. This negates the purpose of increased
redunancy (sic) and just wastes space"


That ability is vastly overestimated in the man page. There is no miracle
content-addressable storage system working at 500 MB/sec speeds all within
a little cheap controller on SSDs. Likely most of what it can do is just
compress simple stuff, such as runs of zeroes or other repeating byte
sequences.
Most of those that do in-line compression don't implement it in 
firmware; they implement it in hardware, and even DEFLATE can hit 500 
MB/second if properly implemented in hardware.  The firmware 
may control how the hardware works, but it's usually the hardware doing 
the heavy lifting in that case, and getting a good ASIC made that can hit 
the required performance point for a reasonable compression algorithm 
like LZ4 or Snappy is insanely cheap once you've gotten past the VLSI 
work.


And the DUP mode is still useful on SSDs: for cases when one copy of the
DUP gets corrupted in-flight due to a bad controller or RAM or cable, you
can then restore that block from its good-CRC DUP copy.
The only window of time during which bad RAM could result in only one 
copy of a block being bad is after the first copy is written but 
before the second is, which is usually an insanely small amount of 
time.  As far as the cabling, the window for errors resulting in a 
single bad copy of a block is pretty much the same as for RAM, and if 
they're persistently bad, you're more likely to lose 

Re: dup vs raid1 in single disk

2017-01-19 Thread Austin S. Hemmelgarn

On 2017-01-19 13:23, Roman Mamedov wrote:

On Thu, 19 Jan 2017 17:39:37 +0100
"Alejandro R. Mosteo"  wrote:


I was wondering, from a point of view of data safety, if there is any
difference between using dup or making a raid1 from two partitions on
the same disk. The idea is to have some protection against the
typical aging HDD that starts to develop bad sectors.


RAID1 will write slower compared to DUP: any optimization to make RAID1
devices work in parallel will cause a total performance disaster for you, as
you will start trying to write to both partitions at the same time, turning
all linear writes into random ones, which are about two orders of magnitude
slower than linear on spinning hard drives. DUP shouldn't have this issue, but
it will still be half as fast as single, since you are writing everything
twice.
As of right now, there will actually be near zero impact on write 
performance (or at least, it's way less than the theoretical 50%) 
because there really isn't any optimization to speak of in the 
multi-device code.  That will hopefully change over time, but it's not 
likely to do so any time soon, since nobody appears to be 
working on multi-device write performance.


You could consider DUP data for when a disk is already known to be getting bad
sectors from time to time -- but then it's a fringe exercise to try and keep
using such a disk in the first place. Yeah, with DUP data and DUP metadata you
can likely get some more life out of such a disk as a throwaway storage space
for non-essential data, at half capacity, but is it worth the effort, as it's
likely to start failing progressively worse over time?

In all other cases the performance and storage space penalty of DUP within a
single device is way too great (and the gained redundancy too low) compared
to a proper system of single-profile data + backups, or a RAID5/6 system (not
Btrfs-based) + backups.
That really depends on your usage.  In my case, I run DUP data on single 
disks regularly.  I still do backups of course, but the performance is 
worth far less for me (especially in the cases where I'm using NVMe 
SSD's which have performance measured in thousands of MB/s for both 
reads and writes) than the ability to recover from transient data 
corruption without needing to go to a backup.


As long as /home and any other write heavy directories are on a separate 
partition, I would actually advocate using DUP data on your root 
filesystem if you can afford the space simply because it's a whole lot 
easier to recover other data if the root filesystem still works.  Most 
of the root filesystem except some stuff under /var follows a WORM 
access pattern, and even the stuff that doesn't in /var is usually not 
performance critical, so the write performance penalty won't have 
anywhere near as much impact on how well the system runs as you might think.


There's also the fact that you're writing more metadata than data most 
of the time unless you're dealing with really big files, and metadata 
already defaults to DUP mode (unless you are using an SSD), so the 
performance hit isn't 50%, it's actually a bit more than half the ratio 
of data writes to metadata writes.



On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a single
copy thus deduplicating them. This negates the purpose of increased
redunancy (sic) and just wastes space"


That ability is vastly overestimated in the man page. There is no miracle
content-addressable storage system working at 500 MB/sec speeds all within a
little cheap controller on SSDs. Likely most of what it can do is just
compress simple stuff, such as runs of zeroes or other repeating byte
sequences.
Most of those that do in-line compression don't implement it in 
firmware; they implement it in hardware, and even DEFLATE can hit 500 
MB/second if properly implemented in hardware.  The firmware may 
control how the hardware works, but it's usually the hardware doing the 
heavy lifting in that case, and getting a good ASIC made that can hit the 
required performance point for a reasonable compression algorithm like 
LZ4 or Snappy is insanely cheap once you've gotten past the VLSI work.


And the DUP mode is still useful on SSDs: for cases when one copy of the DUP
gets corrupted in-flight due to a bad controller or RAM or cable, you can
then restore that block from its good-CRC DUP copy.
The only window of time during which bad RAM could result in only one 
copy of a block being bad is after the first copy is written but before 
the second is, which is usually an insanely small amount of time.  As 
far as the cabling, the window for errors resulting in a single bad copy 
of a block is pretty much the same as for RAM, and if they're 
persistently bad, you're more likely to lose data for other reasons.


That said, I do still feel that DUP mode has value on SSD's.  The 
primary arguments against it are:

1. It 

Re: dup vs raid1 in single disk

2017-01-19 Thread Roman Mamedov
On Thu, 19 Jan 2017 17:39:37 +0100
"Alejandro R. Mosteo"  wrote:

> I was wondering, from a point of view of data safety, if there is any
> difference between using dup or making a raid1 from two partitions on
> the same disk. The idea is to have some protection against the
> typical aging HDD that starts to develop bad sectors.

RAID1 will write slower compared to DUP: any optimization to make RAID1
devices work in parallel will cause a total performance disaster for you, as
you will start trying to write to both partitions at the same time, turning
all linear writes into random ones, which are about two orders of magnitude
slower than linear on spinning hard drives. DUP shouldn't have this issue, but
it will still be half as fast as single, since you are writing everything
twice.

You could consider DUP data for when a disk is already known to be getting bad
sectors from time to time -- but then it's a fringe exercise to try and keep
using such a disk in the first place. Yeah, with DUP data and DUP metadata you
can likely get some more life out of such a disk as a throwaway storage space
for non-essential data, at half capacity, but is it worth the effort, as it's
likely to start failing progressively worse over time?

In all other cases the performance and storage space penalty of DUP within a
single device is way too great (and the gained redundancy too low) compared
to a proper system of single-profile data + backups, or a RAID5/6 system (not
Btrfs-based) + backups.

> On a related note, I see this caveat about dup in the manpage:
> 
> "For example, a SSD drive can remap the blocks internally to a single
> copy thus deduplicating them. This negates the purpose of increased
> redunancy (sic) and just wastes space"

That ability is vastly overestimated in the man page. There is no miracle
content-addressable storage system working at 500 MB/sec speeds all within a
little cheap controller on SSDs. Likely most of what it can do is just
compress simple stuff, such as runs of zeroes or other repeating byte
sequences.

And the DUP mode is still useful on SSDs: for cases when one copy of the DUP
gets corrupted in-flight due to a bad controller or RAM or cable, you can
then restore that block from its good-CRC DUP copy.

-- 
With respect,
Roman


Re: Fwd: dup vs raid1 in single disk

2017-01-19 Thread Austin S. Hemmelgarn

On 2017-01-19 11:39, Alejandro R. Mosteo wrote:

Hello list,

I was wondering, from a point of view of data safety, if there is any
difference between using dup or making a raid1 from two partitions on
the same disk. The idea is to have some protection against the
typical aging HDD that starts to develop bad sectors.

On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a single
copy thus deduplicating them. This negates the purpose of increased
redunancy (sic) and just wastes space"

SSDs' failure modes are different (more of an all-or-nothing thing, I'm
told), so it wouldn't apply to the use case above, but I'm curious, for
curiosity's sake, whether there would be any difference too.


On a traditional HDD, there actually is a reasonable safety benefit to 
using 2 partitions in raid1 mode over using dup mode.  This is because 
most traditional HDD firmware still keeps the mapping of physical 
sectors to logical sectors mostly linear, so having separate partitions 
will (usually) mean that the two copies are not located near each other 
on physical media.  A similar but weaker version of the same effect can 
be achieved by using the 'ssd_spread' mount option, but I would not 
suggest relying on that.  This doesn't apply to hybrid drives (because 
they move stuff around however they want like SSD's), or SMR drives 
(because they rewrite large portions of the disk when one place gets 
rewritten, so physical separation of the data copies doesn't get you as 
much protection).


For most SSD's, there is no practical benefit because the FTL in the SSD 
firmware generally maps physical sectors to logical sectors in whatever 
arbitrary way it wants, which is usually not going to be linear.


As far as failure modes on an SSD, you usually see one of two things 
happen, either the whole disk starts acting odd (or stops working), or 
individual blocks a few MB in size (which seem to move around the disk 
as they get over-written) start behaving odd.  The first case is the 
firmware or primary electronics going bad, while the second is 
individual erase blocks going bad.  As a general rule, SSD's will run 
longer as they're going bad than HDD's will, but in both cases you 
should look at replacing the device once you start seeing the error 
counters going up consistently over time (or if you see them suddenly 
jump to a much higher number).
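
A hedged way of trending those counters over time, assuming smartmontools
is installed (the device name is a placeholder):

  # Dump all SMART attributes; reallocated/uncorrectable counts are the
  # ones to watch for steady growth or sudden jumps.
  smartctl -a /dev/sdX
  # Optionally kick off a short self-test as well.
  smartctl -t short /dev/sdX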



Fwd: dup vs raid1 in single disk

2017-01-19 Thread Alejandro R. Mosteo
Hello list,

I was wondering, from a point of view of data safety, if there is any
difference between using dup or making a raid1 from two partitions on
the same disk. The idea is to have some protection against the
typical aging HDD that starts to develop bad sectors.

On a related note, I see this caveat about dup in the manpage:

"For example, a SSD drive can remap the blocks internally to a single
copy thus deduplicating them. This negates the purpose of increased
redunancy (sic) and just wastes space"

SSDs' failure modes are different (more of an all-or-nothing thing, I'm
told), so it wouldn't apply to the use case above, but I'm curious, for
curiosity's sake, whether there would be any difference too.

Thanks,
Alex.