On Fri, Aug 26, 2022 at 7:26 AM Dale <rdalek1...@gmail.com> wrote:
>
> I looked into the Raspberry and the newest version, about $150 now, doesn't 
> even have SATA ports.

The Pi4 is definitely a step up from the previous versions in terms of
IO, but it is still pretty limited.  It has USB3 and gigabit, and they
don't share a USB host or anything like that, so you should get close
to full performance out of both.  The CPU is of course pretty limited,
as is RAM.  The biggest benefit is the super-low power consumption, and
that is something I take seriously, since for a lot of cheap hardware
that runs 24x7 the power cost rapidly exceeds the purchase price.  I see
people buying old servers for $100 or whatever and those things will
often go through $100 worth of electricity in a few months.
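
To put rough numbers on that (purely illustrative: assume an old server
drawing ~200W and electricity at ~$0.15/kWh):

    0.2 kW x 24 h x 30 days = 144 kWh/month, or about $22/month

so a $100 server really can burn through its purchase price in
electricity in four or five months, while a Pi4 idles at a few watts.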

How many hard drives are you talking about?  There are two general
routes to go for something like this.  The simplest and most
traditional way is a NAS box of some kind, with RAID.  The issue with
this approach is that you're limited by the number of hard drives
you can run off of one host, and of course if anything other than a
drive fails you're offline.  The other approach is a distributed
filesystem.  That ramps up the learning curve quite a bit, but for
something like media where IOPS doesn't matter it eliminates the need
to try to cram a dozen hard drives into one host.  Ceph can also do
IOPS but you're talking 10GbE + NVMe and big bucks, and that is how
modern server farms would do it.

I'll describe the traditional route since I suspect that is where
you're going to end up.  If you only had 2-4 drives total you could
probably get away with a Pi4 and USB3 drives, but if you want
encryption or anything CPU-intensive you're probably going to
bottleneck on the CPU.  It would be fine if you're more concerned with
capacity than performance.
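
If you do go that route and want a quick sanity check on the encryption
question, cryptsetup has a built-in benchmark that measures raw cipher
throughput on the CPU (in memory only, no disk involved):

    # rough AES-XTS numbers for this CPU; compare against what the
    # drives and the USB3/gigabit links can actually move
    cryptsetup benchmark

If the AES-XTS figures come out well below what the drives can deliver,
the CPU is going to be the bottleneck.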

For more drives than that, or just to be more robust, then any
standard amd64 build will be fine.  Obviously a motherboard with lots
of SATA ports will help here.  However, SATA port count is almost
always limited on consumer gear, and the typical solution for SATA
is a host bus adapter.  They're expensive new, but cheap on ebay (I've
had them fail though, which is probably why companies tend to sell
them while they're still working).  They also use a ton of power -
I've measured them using upwards of 60W - they're designed for servers
where nobody seems to care.  A typical HBA can provide 8-32 SATA
ports, via mini-SAS breakout cables (one mini-SAS port can provide 4
SATA ports).  HBAs tend to use a lot of PCIe lanes - you don't
necessarily need all of them if you only have a few drives and they're
spinning disks, but it is probably easiest if you get a CPU with
integrated graphics and use the 16x slot for the HBA.  That or get a
motherboard with two large slots (the second slot usually isn't
electrically 16x, and even 4-8x slots aren't that common on consumer
motherboards).
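
If you want to see how many lanes a card actually negotiated, lspci
will tell you (the 01:00.0 address below is just a placeholder; find
the real one with plain lspci first):

    # LnkCap is what the card supports, LnkSta is what it negotiated
    sudo lspci -vv -s 01:00.0 | grep -i -E 'lnkcap|lnksta'

For a handful of spinning disks even a x4 link is far more bandwidth
than the drives can use.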

For software I'd use mdadm plus LVM.  ZFS or btrfs are your other
options, and those can run on bare metal, but btrfs is immature and
ZFS cannot be reshaped the way mdadm can, so there are tradeoffs.  If
you want to use your existing drives and either don't have a backup to
restore from or want to do the migration live, then the easiest option
is to add one new drive to the system to expand capacity.  Put mdadm on
that drive
as a degraded raid1 or whatever, then put LVM on top, and migrate data
from an existing disk live over to the new one, freeing up one or more
existing drives.  Then put mdadm on those and LVM and migrate more
data onto them, and so on, until everything is running on top of
mdadm.  Of course you need to plan how you want the array to look and
have enough drives that you get the desired level of redundancy.  You
can start with degraded arrays (which is no worse than what you have
now), then when enough drives are freed up they can be added as pairs
to fill it out.
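
As a rough sketch of one round of that, assuming the existing drives
are already LVM physical volumes in a volume group (I'm using "myvg"
and /dev/sdb, /dev/sdc as placeholders; if the old drives aren't under
LVM you'd be doing a filesystem-level copy instead):

    # degraded two-way RAID1 on the new drive, second member "missing"
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb missing

    # layer LVM on top and add it to the existing volume group
    pvcreate /dev/md0
    vgextend myvg /dev/md0

    # live-migrate everything off one old drive, then drop it
    pvmove /dev/sdc /dev/md0
    vgreduce myvg /dev/sdc
    pvremove /dev/sdc

    # once it's free, /dev/sdc can become the second mirror
    mdadm --manage /dev/md0 --add /dev/sdc

Repeat with the next freed-up drive(s) until everything lives on mdadm,
and let each array resync before you pull anything else.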

If you want to go the distributed storage route then CephFS is the
canonical solution at this point but it is RAM-hungry so it tends to
be expensive.  It is also complex, but there are Ansible playbooks and
so on to manage that (though playbooks with 100+ plays in them make me
nervous).  For something simpler MooseFS or LizardFS are probably
where I'd start.  I'm running LizardFS but they've been on the edge of
death for years upstream and MooseFS licensing is apparently better
now, so I'd probably look at that first.  I did a talk on LizardFS
recently: https://www.youtube.com/watch?v=dbMRcVrdsQs

-- 
Rich
