Re: More RAID weirdness: external RAID over network

2023-03-20 Thread Nicolas George
Tim Woodall (12023-03-17):
> Yes. It's possible. Took me about 5 minutes to work out the steps. All
> of which are already mentioned upthread.

All of them, except one.

> mdadm --build ${md} --level=raid1 --raid-devices=2 ${d1} missing

Until now, all suggestions with mdadm started with:

mdadm --create /dev/md0 --level=mirror --force --raid-devices=1 \
--metadata=1.0 /dev/local_dev missing

You suggest --build rather than --create, and indeed:

   Build  Build an array that doesn't have per-device metadata (superblocks).
          For these sorts of arrays, mdadm cannot differentiate between
          initial creation and subsequent assembly of an array. It also
          cannot perform any checks that appropriate components have been
          requested. Because of this, the Build mode should only be used
          together with a complete understanding of what you are doing.

Using --create would have damaged the data; people who suggested this
just had not understood the question.
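
Spelled out for my use case, a minimal sketch adapted from your script
(device names are placeholders, not real paths):

# build a superblock-less RAID1 over the existing device, second member missing
mdadm --build /dev/md0 --level=raid1 --raid-devices=2 /dev/local_dev missing
# attach the remote block device (e.g. iSCSI or NBD) as the mirror
mdadm /dev/md0 --add /dev/remote_dev
# block until the sync has finished
mdadm --wait /dev/md0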

I had not noticed this feature of mdadm, thanks for letting me know
about it.

Regards,

-- 
  Nicolas George



Re: RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-18 Thread Dan Ritter
David Christensen wrote: 
> On 3/17/23 19:25, Gregory Seidman wrote:
> > On Fri, Mar 17, 2023 at 06:05:27PM -0700, David Christensen wrote:
> > > On 3/17/23 12:36, Gregory Seidman wrote:
> > [...]
> > > > This thread has piqued my interest, because I have been lax in doing 
> > > > proper
> > > > backups. I currently run a RAID1 mirroring across three disks (plus a 
> > > > hot
> > > > spare). On top of that is LUKS, and on top of that is LVM. I keep 
> > > > meaning
> > > > to manually fail a disk then store it in a safe deposit box or 
> > > > something as
> > > > a backup, but I have not gotten around to it.
> > > > 
> > > > It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID 
> > > > as
> > > > an additional mirror would be a way to produce the off-site backup I 
> > > > want
> > > > (and LUKS means I am not concerned about encryption in transit). It also
> > > > sounds like you're saying this is not a good backup approach. Ignoring
> > > > cost, what am I missing?
> > > > 
> > > > > Reco
> > > > --Gregory
> > > 
> > > I would not consider using a cloud device as a RAID member -- that sounds
> > > both slow and brittle.  Live data needs to be on local hardware.
> > [...]
> > 
> > Thinking about it more, that makes sense. Maybe the right approach is to
> > split the difference. I can manually fail a mirror, dd it over to an iSCSI
> > target, then re-add it.
> 
> 
> If you are serious about iSCSI, I suggest evaluating it.  Build a RAID1
> using two local disks.  Benchmark it.  Run it through various
> failure-recovery use-cases.  Then add an iSCSI volume on another host in the
> LAN, repeat the benchmarks, and repeat the failure-recovery use-cases.  Then
> add an iSCSI volume in the cloud, repeat the benchmarks, and repeat the
> failure-recovery use-cases.  I would be interested in reading the results.

I can think of a few cases where this would not be horribly
slow, but no cases where performance is likely to be smooth and
uninterrupted.


> > > On 3/17/23 13:52, Dan Ritter wrote:

> Some people have to deal with "audit" and "discovery".

Usually people know if they have to deal with audit when they
set a system up. Occasionally people know that they will be
dealing with discovery; more often it becomes a requirement
post hoc.

> I use old-school ZFS-on-Linux (ZOL), which does not have built-in
> encryption.  So, I encrypt each partition below ZFS.  A pool with many
> partitions could multiply the CPU cryptographic workload.  I make sure to
> buy processors with AES-NI.
> 
> 
> The Debian stable zfs-dkms package is the newer OpenZFS.  It may support
> built-in encryption.  I do not know how the cryptographic efficiency
> compares.

It does. Efficiency is similar.
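
For anyone who wants to check their own hardware, a quick sketch:

grep -m1 -o aes /proc/cpuinfo   # non-empty output means AES-NI is advertised
cryptsetup benchmark            # rough cipher throughput; the aes-xts lines are the relevant ones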

-dsr-



Re: RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-18 Thread David Christensen

On 3/17/23 19:25, Gregory Seidman wrote:

On Fri, Mar 17, 2023 at 06:05:27PM -0700, David Christensen wrote:

On 3/17/23 12:36, Gregory Seidman wrote:

[...]

This thread has piqued my interest, because I have been lax in doing proper
backups. I currently run a RAID1 mirroring across three disks (plus a hot
spare). On top of that is LUKS, and on top of that is LVM. I keep meaning
to manually fail a disk then store it in a safe deposit box or something as
a backup, but I have not gotten around to it.

It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID as
an additional mirror would be a way to produce the off-site backup I want
(and LUKS means I am not concerned about encryption in transit). It also
sounds like you're saying this is not a good backup approach. Ignoring
cost, what am I missing?


Reco

--Gregory


I would not consider using a cloud device as a RAID member -- that sounds
both slow and brittle.  Live data needs to be on local hardware.

[...]

Thinking about it more, that makes sense. Maybe the right approach is to
split the difference. I can manually fail a mirror, dd it over to an iSCSI
target, then re-add it.



If you are serious about iSCSI, I suggest evaluating it.  Build a RAID1 
using two local disks.  Benchmark it.  Run it through various 
failure-recovery use-cases.  Then add an iSCSI volume on another host in 
the LAN, repeat the benchmarks, and repeat the failure-recovery 
use-cases.  Then add an iSCSI volume in the cloud, repeat the 
benchmarks, and repeat the failure-recovery use-cases.  I would be 
interested in reading the results.
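
One possible harness, as a sketch (fio parameters are only illustrative,
and the target must be a scratch array you can overwrite):

# run the same jobs at each stage: local-only, +LAN iSCSI member, +cloud iSCSI member
fio --name=seqwrite --filename=/dev/md0 --rw=write --bs=1M --size=2G --direct=1 --ioengine=libaio
fio --name=randrw --filename=/dev/md0 --rw=randrw --bs=4k --runtime=60 --time_based --direct=1 --ioengine=libaio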




On 3/17/23 13:52, Dan Ritter wrote:

Three different things:

resiliency in the face of storage failure: RAID.


And what I'm really trying to achieve is resiliency in the face of all the
drives failing, e.g. due to a fire or other catastrophe.



I assume you mean "all the drives failing in one computer".


I assume a cloud iSCSI volume is on RAID, so you should only need one 
(unless you are worried about the vendor).




restoration of files that were recently deleted: snapshots.


I don't have automated LVM snapshotting set up, but I could and probably
should. That would cover that use case.



STFW LVM snapshots differ from ZFS snapshots:

1.  ZFS snapshots are read-only.

2.  All of the snapshots for a given ZFS filesystem are automatically 
mounted in a hidden, known subdirectory under the filesystem mount point 
-- .zfs/snapshot.  This makes it easy to retrieve, compare, restore 
from, etc., prior copies of files and/or directories using standard 
userland tools.


3.  A ZFS dataset (filesystem or volume) can be rolled back to a prior 
snapshot, discarding all changes made to the dataset and destroying any 
intermediate snapshots, bookmarks, and/or clones.
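
For example (a sketch; the pool and dataset names are made up):

zfs snapshot tank/home@2023-03-18                               # read-only snapshot
ls /tank/home/.zfs/snapshot/2023-03-18/                         # browse it with ordinary tools
cp -a /tank/home/.zfs/snapshot/2023-03-18/somefile /tank/home/  # restore a single file
zfs rollback -r tank/home@2023-03-18                            # roll the dataset back (destroys newer snapshots)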




complete restoration of a filesystem: backup.


This can be achieved with the same off-site full-disk backup.



I would think recovery of a RAID1 with an off-site iSCSI member would 
involve building a new RAID1 based upon that off-site iSCSI member (?).




(and technically, a fourth: complete restoration of points in
time: archives).


That isn't a use case I've considered, and I don't think it's a use case I
have.



Some people have to deal with "audit" and "discovery".



* Check-summing filesystems (I prefer ZFS-on-Linux).

[...]

With four disks, the OP could use two in a ZFS mirror for live data, use
zfs-auto-snapshot for user-friendly recovery, and use the other two
individually as on-site and off-site backup media.


I do like the checksumming ZFS offers. The main reason I haven't switched
to ZFS, aside from already having a working setup with RAID/LUKS/LVM and
not wanting to fix what isn't broken, is that ZFS encryption is per volume
instead of the entire pool overall. That means that I either need to create
an encrypted ZFS volume for each of my existing LVM filesystems,
multiplying the hassle of unlocking them all, or I need to create a single
encrypted ZFS volume and put LVM on top of it. Is there a better way?


David

--Gregory



If you use ZFS, you will not need mdadm, LVM, ext4, etc..


I use old-school ZFS-on-Linux (ZOL), which does not have built-in 
encryption.  So, I encrypt each partition below ZFS.  A pool with many 
partitions could multiply the CPU cryptographic workload.  I make sure 
to buy processors with AES-NI.
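
In other words, a layering like this (a sketch with made-up device names):

cryptsetup luksFormat /dev/sda2
cryptsetup luksFormat /dev/sdb2
cryptsetup open /dev/sda2 crypt_a
cryptsetup open /dev/sdb2 crypt_b
zpool create tank mirror /dev/mapper/crypt_a /dev/mapper/crypt_b   # ZFS sits on top of LUKS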



The Debian stable zfs-dkms package is the newer OpenZFS.  It may support 
built-in encryption.  I do not know how the cryptographic efficiency 
compares.
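
If it does, my understanding is that child datasets inherit encryption
from their parent, so a single passphrase can unlock a whole tree -- a
sketch (dataset names are made up):

zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt tank/secure
zfs create tank/secure/home    # children inherit the parent's encryption settings
zfs create tank/secure/mail
zfs load-key tank/secure       # one passphrase for the whole encrypted subtree
zfs mount -a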



David



Re: RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-17 Thread Gregory Seidman
On Fri, Mar 17, 2023 at 06:05:27PM -0700, David Christensen wrote:
> On 3/17/23 12:36, Gregory Seidman wrote:
[...]
> > This thread has piqued my interest, because I have been lax in doing proper
> > backups. I currently run a RAID1 mirroring across three disks (plus a hot
> > spare). On top of that is LUKS, and on top of that is LVM. I keep meaning
> > to manually fail a disk then store it in a safe deposit box or something as
> > a backup, but I have not gotten around to it.
> > 
> > It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID as
> > an additional mirror would be a way to produce the off-site backup I want
> > (and LUKS means I am not concerned about encryption in transit). It also
> > sounds like you're saying this is not a good backup approach. Ignoring
> > cost, what am I missing?
> > 
> > > Reco
> > --Gregory
> 
> I would not consider using a cloud device as a RAID member -- that sounds
> both slow and brittle.  Live data needs to be on local hardware.
[...]

Thinking about it more, that makes sense. Maybe the right approach is to
split the difference. I can manually fail a mirror, dd it over to an iSCSI
target, then re-add it.
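
Roughly something like this, I suppose (device names are placeholders):

mdadm /dev/md0 --fail /dev/sdc1
mdadm /dev/md0 --remove /dev/sdc1
dd if=/dev/sdc1 of=/dev/sdX bs=1M conv=fsync status=progress   # /dev/sdX being the iSCSI-attached device
mdadm /dev/md0 --add /dev/sdc1                                 # re-add it and let it resync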

> On 3/17/23 13:52, Dan Ritter wrote:
> > Three different things:
> >
> > resiliency in the face of storage failure: RAID.

And what I'm really trying to achieve is resiliency in the face of all the
drives failing, e.g. due to a fire or other catastrophe.

> > restoration of files that were recently deleted: snapshots.

I don't have automated LVM snapshotting set up, but I could and probably
should. That would cover that use case.

> > complete restoration of a filesystem: backup.

This can be achieved with the same off-site full-disk backup.

> > (and technically, a fourth: complete restoration of points in
> > time: archives).

That isn't a use case I've considered, and I don't think it's a use case I
have.

[...]
> > -dsr-
[...]
> I would add:
> 
> * ECC memory.

In place, yes.

> * Check-summing filesystems (I prefer ZFS-on-Linux).
[...]
> With four disks, the OP could use two in a ZFS mirror for live data, use
> zfs-auto-snapshot for user-friendly recovery, and use the other two
> individually as on-site and off-site backup media.

I do like the checksumming ZFS offers. The main reason I haven't switched
to ZFS, aside from already having a working setup with RAID/LUKS/LVM and
not wanting to fix what isn't broken, is that ZFS encryption is per volume
instead of the entire pool overall. That means that I either need to create
an encrypted ZFS volume for each of my existing LVM filesystems,
multiplying the hassle of unlocking them all, or I need to create a single
encrypted ZFS volume and put LVM on top of it. Is there a better way?

> David
--Gregory



Re: RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-17 Thread David Christensen

On 3/17/23 12:36, Gregory Seidman wrote:

On Fri, Mar 17, 2023 at 06:00:46PM +0300, Reco wrote:
[...]

PS There's that old saying, "RAID is not a substitute for a backup".
What you're trying to do sounds suspiciously similar to an old "RAID
split-mirror" backup technique. Just saying.


This thread has piqued my interest, because I have been lax in doing proper
backups. I currently run a RAID1 mirroring across three disks (plus a hot
spare). On top of that is LUKS, and on top of that is LVM. I keep meaning
to manually fail a disk then store it in a safe deposit box or something as
a backup, but I have not gotten around to it.

It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID as
an additional mirror would be a way to produce the off-site backup I want
(and LUKS means I am not concerned about encryption in transit). It also
sounds like you're saying this is not a good backup approach. Ignoring
cost, what am I missing?


Reco

--Gregory



I would not consider using a cloud device as a RAID member -- that 
sounds both slow and brittle.  Live data needs to be on local hardware.



I have considered putting an encrypted filesystem on top of a cloud 
volume -- but, that sounds brittle; both for live data and for backups.



I have put encrypted tarballs in cloud filesystems (e.g. archives) -- 
KISS; I like it.
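
Something along these lines (a sketch; paths and filenames are made up):

tar -czf - /srv/data | gpg --symmetric --cipher-algo AES256 -o data-2023-03-17.tar.gz.gpg
# then copy the .gpg file to the cloud storage with whatever tool the provider offers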



On 3/17/23 13:52, Dan Ritter wrote:
> Three different things:
>
> resiliency in the face of storage failure: RAID.
>
> restoration of files that were recently deleted: snapshots.
>
> complete restoration of a filesystem: backup.
>
> (and technically, a fourth: complete restoration of points in
> time: archives).
>
> You can combine the approaches, but they can only be substituted
> in particular directions. A RAID alone doesn't give you
> protection against deleted files (or deleted filesystems), which
> is what a backup is for.
>
> -dsr-


+1


I would add:

* ECC memory.

* Check-summing filesystems (I prefer ZFS-on-Linux).

* Multiple backup media in rotation.

* Another computer for taking backups, doing restores, etc..


With four disks, the OP could use two in a ZFS mirror for live data, use 
zfs-auto-snapshot for user-friendly recovery, and use the other two 
individually as on-site and off-site backup media.



David



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Tim Woodall

On Fri, 17 Mar 2023, Nicolas George wrote:


Nicolas George (12023-03-17):

It is not vagueness, it is genericness: /dev/something is anything and
contains anything, and I want a solution that works for anything.


Just to be clear: I KNOW that what I am asking, the ability to
synchronize an existing block device onto another over the network with
only minimal downtime before and after, is possible, because I would be
capable of implementing it if I had a few hundred hours on my hands and
nothing better to do with them.

What I am really asking is if the tool to do it already exists in some
way. Because I do not have a few hundred hours on my hands, and if I
had, I would spend them finishing AVWriter, writing my TODO list
manager, implementing a re-connecting SOCKS proxy and a server for user
applications over websockets (and re-reading Tolkien and Mistborn and
Terre d'Ange).



Yes. It's possible. Took me about 5 minutes to work out the steps. All
of which are already mentioned upthread.

This is a very quick, hacked-together proof-of-concept script. It requires root
to run, and you're on your own if it totally trashes something.

#!/bin/bash

md=/dev/md123

# Create two 10 MiB backing files and attach them as loop devices.
dd if=/dev/zero bs=1k count=10k of=dsk.1
dd if=/dev/zero bs=1k count=10k of=dsk.2

d1=$( losetup -f )
losetup ${d1} dsk.1

d2=$( losetup -f )
losetup ${d2} dsk.2

# Put a filesystem with data on the first device only.
# (Assumes /mnt/fred and /mnt/root already exist.)
mke2fs -j ${d1}

echo "Mounting ${d1}"
mount ${d1} /mnt/fred
# Fill it with random data (deliberately more than fits).
dd if=/dev/random of=/mnt/fred/signature bs=1k count=10k
ls -al /mnt/fred
umount /mnt/fred

# Build a superblock-less RAID1 over the existing device, second member missing.
mdadm --build ${md} --level=raid1 --raid-devices=2 ${d1} missing

echo "Mounting single disk raid"
mount ${md} /mnt/fred
ls -al /mnt/fred

# Add the second device and let it sync while the array stays mounted.
mdadm ${md} --add ${d2}

sleep 10
echo "Done sleeping - sync had better be done!"

# Split the mirror again.
mdadm ${md} --fail ${d2}
mdadm ${md} --remove ${d2}

# Mount the removed member directly and compare it with the live copy.
echo "Mounting ${d2}"
mount ${d2} /mnt/root
ls -al /mnt/root

diff /mnt/fred/signature /mnt/root/signature && echo "Comparison passed"

umount /mnt/root

umount /mnt/fred
mdadm --stop ${md}

losetup -d ${d1}
losetup -d ${d2}

exit 0

And here is the output:
# ./test.sh
10240+0 records in
10240+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0588102 s, 178 MB/s
10240+0 records in
10240+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0588907 s, 178 MB/s
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done
Creating filesystem with 10240 1k blocks and 2560 inodes
Filesystem UUID: a5fc6516-d7ec-488e-a9be-6d01fc153c54
Superblock backups stored on blocks:
8193

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

Mounting /dev/loop1
dd: error writing '/mnt/fred/signature': No space left on device
8755+0 records in
8754+0 records out
8964096 bytes (9.0 MB, 8.5 MiB) copied, 0.080614 s, 111 MB/s
total 8804
drwxr-xr-x  3 root root1024 Mar 17 22:39 .
drwxr-xr-x 20 root root1024 Oct  9 17:11 ..
drwx--  2 root root   12288 Mar 17 22:39 lost+found
-rw-r--r--  1 root root 8964096 Mar 17 22:39 signature
mdadm: array /dev/md123 built and started.
Mounting single disk raid
total 8804
drwxr-xr-x  3 root root1024 Mar 17 22:39 .
drwxr-xr-x 20 root root1024 Oct  9 17:11 ..
drwx--  2 root root   12288 Mar 17 22:39 lost+found
-rw-r--r--  1 root root 8964096 Mar 17 22:39 signature
mdadm: hot added /dev/loop2
Done sleeping - sync had better be done!
mdadm: set /dev/loop2 faulty in /dev/md123
mdadm: hot removed /dev/loop2 from /dev/md123
Mounting /dev/loop2
total 8804
drwxr-xr-x  3 root root1024 Mar 17 22:39 .
drwxr-xr-x 20 root root1024 Oct  9 17:11 ..
drwx--  2 root root   12288 Mar 17 22:39 lost+found
-rw-r--r--  1 root root 8964096 Mar 17 22:39 signature
Comparison passed
mdadm: stopped /dev/md123



So I have a disk full of data; I build it into an array and remount it.
I then add a spare disk, let the sync complete, then fail and remove
that disk, and compare the random data file still on the mounted RAID
with the file on the removed disk.

It depends on the sync completing in 10 seconds - otherwise you won't
get that "Comparison passed" - but given that this is 10MB on an SSD it
takes a fraction of a second to sync.
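
For anything bigger than this toy example you would presumably replace
the fixed sleep with something like:

mdadm --wait ${md}    # blocks until the resync/recovery has finished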

Tim.



Re: RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-17 Thread Dan Ritter
Gregory Seidman wrote: 
> On Fri, Mar 17, 2023 at 06:00:46PM +0300, Reco wrote:
> [...]
> > PS There's that old saying, "RAID is not a substitute for a backup".
> > What you're trying to do sounds suspiciously similar to an old "RAID
> > split-mirror" backup technique. Just saying.

...


> It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID as
> an additional mirror would be a way to produce the off-site backup I want
> (and LUKS means I am not concerned about encryption in transit). It also
> sounds like you're saying this is not a good backup approach. Ignoring
> cost, what am I missing?


Three different things:

resiliency in the face of storage failure: RAID.

restoration of files that were recently deleted: snapshots.

complete restoration of a filesystem: backup.

(and technically, a fourth: complete restoration of points in
time: archives).

You can combine the approaches, but they can only be substituted
in particular directions. A RAID alone doesn't give you
protection against deleted files (or deleted filesystems), which
is what a backup is for.

-dsr-



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Nicolas George (12023-03-17):
> It is not vagueness, it is genericness: /dev/something is anything and
> contains anything, and I want a solution that works for anything.

Just to be clear: I KNOW that what I am asking, the ability to
synchronize an existing block device onto another over the network with
only minimal downtime before and after, is possible, because I would be
capable of implementing it if I had a few hundred hours on my hands and
nothing better to do with them.

What I am really asking is if the tool to do it already exists in some
way. Because I do not have a few hundred hours on my hands, and if I
had, I would spend them finishing AVWriter, writing my TODO list
manager, implementing a re-connecting SOCKS proxy and a server for user
applications over websockets (and re-reading Tolkien and Mistborn and
Terre d'Ange).

Regards,

-- 
  Nicolas George




Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Greg Wooledge (12023-03-17):
> > I have a block device on the local host /dev/something with data on it.
   ^^^

There. I have data, therefore, any solution that assumes the data is not
there can only be proposed by somebody who did not read carefully.

> 
> And so on.  In fact, I don't see the word "btrfs" anywhere in the email.

Indeed.

> Oh and hey, as long as we're being meta here, why would you write a
> sentence like:
> 
> > I have a block device on the local host /dev/something with data on it.
> 
> Why wouldn't you give the actual NAME of the block device, and say what
> type of file system is on it?  Assuming it's a file system, and not a
> raw partition used for swap, or an Oracle database, or glob knows what.
> This vagueness serves no purpose at all.

It is not vagueness, it is genericness: /dev/something is anything and
contains anything, and I want a solution that works for anything.

Regards,

-- 
  Nicolas George




RAID1 + iSCSI as backup (was Re: More RAID weirdness: external RAID over network)

2023-03-17 Thread Gregory Seidman
On Fri, Mar 17, 2023 at 06:00:46PM +0300, Reco wrote:
[...]
> PS There's that old saying, "RAID is not a substitute for a backup".
> What you're trying to do sounds suspiciously similar to an old "RAID
> split-mirror" backup technique. Just saying.

This thread has piqued my interest, because I have been lax in doing proper
backups. I currently run a RAID1 mirroring across three disks (plus a hot
spare). On top of that is LUKS, and on top of that is LVM. I keep meaning
to manually fail a disk then store it in a safe deposit box or something as
a backup, but I have not gotten around to it.

It sounds to me like adding an iSCSI volume (e.g. from AWS) to the RAID as
an additional mirror would be a way to produce the off-site backup I want
(and LUKS means I am not concerned about encryption in transit). It also
sounds like you're saying this is not a good backup approach. Ignoring
cost, what am I missing?

> Reco
--Gregory



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Greg Wooledge
On Fri, Mar 17, 2023 at 05:01:57PM +0100, Nicolas George wrote:
> Dan Ritter (12023-03-17):
> > If Reco didn't understand your question, it's because you are
> > very light on details.
> 
> No. Reco's answers contradict the very first sentence of my first
> e-mail.

The first sentence of your first email is:

> Hi.

The second:

> Is this possible: ?

The third:

> I have a block device on the local host /dev/something with data on it.

And so on.  In fact, I don't see the word "btrfs" anywhere in the email.

So, which sentence did Reco contradict by suggesting an alternative
approach?  I'm more than a little bit confused.

Oh and hey, as long as we're being meta here, why would you write a
sentence like:

> I have a block device on the local host /dev/something with data on it.

Why wouldn't you give the actual NAME of the block device, and say what
type of file system is on it?  Assuming it's a file system, and not a
raw partition used for swap, or an Oracle database, or glob knows what.
This vagueness serves no purpose at all.



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Tim Woodall

On Fri, 17 Mar 2023, Nicolas George wrote:


Dan Ritter (12023-03-17):

If Reco didn't understand your question, it's because you are
very light on details.


No. Reco's answers contradict the very first sentence of my first
e-mail.



Is this possible?

How can Reco's answers contradict that?

Reco's answers echoed pretty much exactly what I would have said. If you
don't know how to get from those answers to exactly what you want then
you need to be a lot more precise in the requirements.

There is a mdadm mode with no metadata at all. check the --build option.
I've never used it though...

But as Reco suggested, easiest and safest is probably to make space for
the superblock.



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Dan Ritter (12023-03-17):
> If Reco didn't understand your question, it's because you are
> very light on details.

No. Reco's answers contradict the very first sentence of my first
e-mail.

-- 
  Nicolas George



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Dan Ritter
Nicolas George wrote: 
> Reco (12023-03-17):
> > Well, theoretically you can use Btrfs instead.
> 
> No, I cannot. Obviously.
> 
> > What you're trying to do sounds suspiciously similar to an old "RAID
> > split-mirror" backup technique.
> 
> Absolutely not.
> 
> If you do not understand the question, it is okay to not answer.


If Reco didn't understand your question, it's because you are
very light on details.

So far you've got recommendations on how to solve your expressed
problem with mdadm and drbd. Since you aren't clear about
limitations other than minimizing the number of interruptions
and manual steps, I will also recommend these on-filesystem
level approaches:

rsync
fssync
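
e.g., as a minimal sketch (paths and host are placeholders):

rsync -aHAXx --delete /srv/data/ backuphost:/srv/data-copy/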

-dsr-



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Reco (12023-03-17):
> Well, theoretically you can use Btrfs instead.

No, I cannot. Obviously.

> What you're trying to do sounds suspiciously similar to an old "RAID
> split-mirror" backup technique.

Absolutely not.

If you do not understand the question, it is okay to not answer.

-- 
  Nicolas George



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Reco
On Fri, Mar 17, 2023 at 03:46:54PM +0100, Nicolas George wrote:
> Reco (12023-03-17):
> > Yes, it will destroy the contents of the device, so backup
> 
> No. If I accepted to have to rely on an extra copy of the data, I would
> not be trying to do something complicated like that.

Well, theoretically you can use Btrfs instead.
I recall that there was some way to convert ext4 to btrfs without losing
anything.
Practically, friends do not let friends use btrfs ;)
Unless you're using SLES (*not* OpenSUSE) and you do backups frequently,
in which case it's kind of OK.

In conclusion, implementing mdadm + iSCSI + ext4 would probably be the
best way to achieve whatever you want to do.


PS There's that old saying, "RAID is not a substitute for a backup".
What you're trying to do sounds suspiciously similar to an old "RAID
split-mirror" backup technique. Just saying.

Reco



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Reco (12023-03-17):
> Yes, it will destroy the contents of the device, so backup

No. If I accepted to have to rely on an extra copy of the data, I would
not be trying to do something complicated like that.

-- 
  Nicolas George



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Reco
Hi.

On Fri, Mar 17, 2023 at 01:52:34PM +0100, Nicolas George wrote:
> Reco (12023-03-17):
> > - DRBD
> 
> That looks interesting, with “meta-disk device”.
> 
> > - MDADM + iSCSI
> 
> Maybe possible, but not the way you suggest, see below.
> 
> > - zpool attach/detach
> 
> I do not think that is an option. Can you explain how you think it can
> work?

It's similar to MDADM, but with a small bonus and a pile of drawbacks on
top of it.

Create zpool from your device.
Yes, it will destroy the contents of the device, so backup your files
beforehand, and put them back after the creation of zpool.

Use iSCSI/NBD/FCoE/NVMe (basically any network protocol that can provide
a block device to another host) to make your zpool mirrored.
This is done by zpool attach/detach commands.
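
As a sketch (device names are placeholders):

zpool create tank /dev/local_dev                   # destroys whatever was on the device
zpool attach tank /dev/local_dev /dev/remote_dev   # turns the single disk into a mirror
zpool status tank                                  # watch the resilver
zpool detach tank /dev/remote_dev                  # split the mirror again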

The small bonus I mentioned earlier is that "zpool resilvering"
(synchronization between mirror sides) concerns only the actual data
residing in the zpool. I.e. if you have a 1 TB mirrored zpool filled to
200 GB, you will resync only 200 GB.
In comparison, an mdadm RAID resync will happily read 1 TB from one drive
and write 1 TB to the other *unless* you're using mdadm write-intent bitmaps.
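
Adding a write-intent bitmap to an existing array is a one-liner, e.g.:

mdadm --grow --bitmap=internal /dev/md0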

ZFS/zpool drawbacks are numerous and well-documented, but I'll mention a
single one: you do not fill a zpool to 100%. In fact, even 90% capacity
usually means trouble.


> 
> > mdadm --create /dev/md0 --level=mirror --force --raid-devices=1 \
> > --metadata=1.0 /dev/local_dev missing
> > 
> > --metadata=1.0 is highly important here, as it's one of the few mdadm
> > metadata formats that keeps said metadata at the end of the device.
> 
> Well, I am sorry to report that you did not read my message carefully
> enough: keeping the metadata at the end of the device is no more an
> option than keeping it at the beginning or in the middle: there is
> already data everywhere on the device.

Not unless you know the magic trick. See below.

> Also, the mdadm command you just gave is pretty explicit that it will
> wipe the local device.

You mean, like this?

# mdadm --create /dev/md127 --level=mirror --force --raid-devices=2 \
--metadata=1.0 /dev/loop0 missing
mdadm: /dev/loop0 appears to contain an ext2fs file system
   size=1048512K  mtime=Thu Jan  1 00:00:00 1970
Continue creating array?


mdadm lies to you :) This is how it's done.

# tune2fs -l /dev/loop0 | grep 'Block count'
Block count:  262144
# resize2fs /dev/loop0 262128
resize2fs 1.46.2 (28-Feb-2021)
Resizing the filesystem on /dev/loop0 to 262128 (4k) blocks.
The filesystem on /dev/loop0 is now 262128 (4k) blocks long.
# mdadm --create /dev/md127 --level=mirror --force --raid-devices=2 \
--metadata=1.0 /dev/loop0 missing
mdadm: /dev/loop0 appears to contain an ext2fs file system
   size=1048512K  mtime=Thu Jan  1 00:00:00 1970
Continue creating array? y
mdadm: array /dev/md127 started.

# fsck -f /dev/md127
fsck from util-linux 2.36.1
e2fsck 1.46.2 (28-Feb-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md127: 11/65536 files (0.0% non-contiguous), 12955/262128 blocks


And the main beauty of it is that the kernel will forbid you to run
"resize2fs /dev/local_dev" as long as the MD array is assembled, and
"resize2fs /dev/md127" will take into account those 16 4k blocks at
the end.

And I'm pretty sure you can afford to reduce your filesystem by 16 4k blocks.

That --metadata=1.0 is the main part of the trick. One can easily shrink
the filesystem from its tail, but it's much harder to do the same from
its head (which you'd have to do with --metadata=1.2).
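
One way to see exactly how much space the 1.0 superblock reserves,
rather than guessing, is to compare sizes once the array exists, e.g.:

blockdev --getsize64 /dev/loop0    # raw device size
blockdev --getsize64 /dev/md127    # usable size inside the array
# the difference is what metadata 1.0 reserves at the end; the filesystem
# only has to be shrunk by at least that much before creating the array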

Reco



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Nicolas George
Reco (12023-03-17):
> - DRBD

That looks interesting, with “meta-disk device”.

> - MDADM + iSCSI

Maybe possible, but not the way you suggest, see below.

> - zpool attach/detach

I do not think that is an option. Can you explain how you think it can
work?

> mdadm --create /dev/md0 --level=mirror --force --raid-devices=1 \
>   --metadata=1.0 /dev/local_dev missing
> 
> --metadata=1.0 is highly important here, as it's one of the few mdadm
> metadata formats that keeps said metadata at the end of the device.

Well, I am sorry to report that you did not read my message carefully
enough: keeping the metadata at the end of the device is no more an
option than keeping it at the beginning or in the middle: there is
already data everywhere on the device.

Also, the mdadm command you just gave is pretty explicit that it will
wipe the local device.

Thanks for pointing DRBD.

-- 
  Nicolas George



Re: More RAID weirdness: external RAID over network

2023-03-17 Thread Reco
Hi.

On Fri, Mar 17, 2023 at 11:09:09AM +0100, Nicolas George wrote:
> Is this possible: ?

Actually, there are at least three ways of doing it:

- DRBD
- MDADM + iSCSI
- zpool attach/detach

But DRBD was designed with continuous replication in mind, and ZFS has
severe processor architecture restrictions, and somewhat unusual design
decisions for the filesystem storage.
So let's keep it on MDADM + iSCSI for now.


> What I want to do:
> 
> 1. Stop programs and umount /dev/something
> 
> 2. mdadm --create /dev/md0 --level=mirror --force --raid-devices=1 \
>   --metadata-file /data/raid_something /dev/something

a) Replace that with:

mdadm --create /dev/md0 --level=mirror --force --raid-devices=1 \
--metadata=1.0 /dev/local_dev missing


--metadata=1.0 is highly important here, as it's one of the few mdadm
metadata formats that keeps said metadata at the end of the device.

b) Nobody forbids you to run degraded RAID1 all the time. Saves you
unmounting and mounting again.


> → Now I have everything running again completely normally after a very
> short service interruption. But behind the scenes files operations go
> through /dev/md0 before reaching /dev/something. If I want to go back, I
> de-configure /dev/md0 and can start using /dev/something directly again.
> 
> 4. mdadm --add /dev/md0 remote:/dev/something && mdadm --grow /dev/md0 
> --raid-devices=2

And "remote:/dev/something" is merely "iscsiadm --mode node --targetname
xxx --portal remote --login".
Then add the resulting block device as planned.


That assumes that "remote" runs configured iSCSI target ("tgt" in
current stable is perfectly fine for that), "local" can reach "remote"
via tcp:3260, and you do not care about data encryption for the data in
transmission.
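
Spelled out, that might look like this (the target name is a placeholder):

# on "local": discover and log in to the remote target
iscsiadm --mode discovery --type sendtargets --portal remote
iscsiadm --mode node --targetname iqn.2023-03.example:something --portal remote --login
# a new /dev/sdX appears for the remote block device; add it and grow the mirror
mdadm /dev/md0 --add /dev/sdX
mdadm --grow /dev/md0 --raid-devices=2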

Reco