On November 24, 2014 1:48:48 AM EST, Anshuman Aggarwal 
<anshuman.aggar...@gmail.com> wrote:
>Sandeep,
> This isn't exactly RAID4 (only thing in common is a single parity
>disk but the data is not striped at all). I did bring it up on the
>linux-raid mailing list and have had a short conversation with Neil.
>He wasn't too excited about device mapper but didn't indicate why or
>why not.

If it was early in your proposal, it may simply be that he didn't understand it.

The delayed writes to the parity disk you described would have been tough for 
device mapper to manage.  It doesn't typically maintain its own longer-term 
buffers, so that may have been what gave him pause.  The only benefit you 
cited was reduced wear and tear on the parity drive.

Reduced wear and tear is a red herring here: the kernel already buffers writes 
to the data disks, so there is no need to buffer parity writes separately.

>I would like to have this as a layer for each block device on top of
>the original block devices (intercepting write requests to the block
>devices and updating the parity disk). Is device mapper the write
>interface?

I think yes, but dm and md are actually separate subsystems.  I think of dm as 
a subset of md, but if you are really going to do this you will need to learn 
the details better than I know them:

https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt

You will need to add code to both the dm and md kernel code.

I assume you know that the mdraid (mdadm) and lvm userspace tools are used to 
manage md and device mapper respectively, so you would have to add userspace 
support to mdadm and/or lvm as well.

> What are the others? 

Well, btrfs, as an example, incorporates a lot of RAID capability into the 
filesystem.  Btrfs is thus a monolithic driver that has absorbed much of what 
the dm/md layer does.  I can't speak to why they are doing that, but I find it 
troubling.  Monolithic subsystems are something the Linux kernel has 
traditionally avoided.

> Also if I don't store the metadata on
>the block device itself (to allow the block device to be unaware of
>the RAID4 on top...how would the kernel be informed of which devices
>together form the Split RAID.

I don't understand the question.

I haven't thought the process through, but with mdraid/lvm you would first 
identify the physical drives as under dm control (mdadm for md, pvcreate for 
dm), then configure the split-raid setup.

Have you gone through the process of creating a raid5 with mdadm?  If not, at 
least read a howto about it:

https://raid.wiki.kernel.org/index.php/RAID_setup

I assume you would have mdadm form your multi-disk split-raid volume from all 
the physical disks, then use lvm commands to define a block range on the first 
drive as an lv (logical volume), and the same for the other data drives.

Then use mkfs to put a filesystem on each lv.
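For reference, the analogous workflow with today's tools (a plain raid5, since 
split raid doesn't exist yet) would look something like this.  Device names, 
volume group names, and sizes are all illustrative:

```shell
# Build a 3-disk RAID5 array out of the physical partitions
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

# Hand the md device to LVM
pvcreate /dev/md0
vgcreate vg_raid /dev/md0

# Carve out one logical volume per intended filesystem
lvcreate -L 100G -n lv_data1 vg_raid
lvcreate -L 100G -n lv_data2 vg_raid

# Put a filesystem on each lv
mkfs.ext4 /dev/vg_raid/lv_data1
mkfs.ext4 /dev/vg_raid/lv_data2
```

For your split raid you would replace the first step with whatever new 
personality/level you add, but the lvm and mkfs steps would stay the same.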

The filesystem has no knowledge that there is a split raid below it.  It simply 
reads and writes to the device; device mapper, layered below, issues the 
required i/o.

I.e., a read is a straight passthrough.  For a write, the old data and old 
parity have to be read in, modified, and written back out.  Device mapper 
already does this for raid 4/5/6, so most of the code is in place.
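The parity math behind that read-modify-write cycle is just XOR.  A minimal 
sketch (plain Python, not kernel code; the function name is my own):

```python
def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write parity update for a single-parity (RAID4-style) stripe.

    XOR is self-inverse, so XORing the old data block out of the parity and
    XORing the new data block in gives the same result as recomputing parity
    over the whole stripe -- without touching the other data disks.
    """
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))


# Three data "disks" and their parity (XOR of all data blocks)
disks = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = bytes(a ^ b ^ c for a, b, c in zip(*disks))

# Overwrite disk 0 and update parity by reading only disk 0 and the parity disk
new_block = bytes([9, 9, 9])
parity = update_parity(disks[0], new_block, parity)
disks[0] = new_block

# Parity now matches a full recomputation over the new stripe
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*disks))
```

This is why only the target data disk and the parity disk spin up for a write 
in your scheme; the other data disks are never read.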

>Appreciate the help.
>
>Thanks,
>Anshuman

I just realized I replied to a top post.

Seriously, don't do that on kernel lists if you want to be taken seriously.  It 
immediately marks you as unfamiliar with kernel mailing list netiquette.

Greg
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
