RE: [gentoo-user] Synchronous writes over the network.

2021-12-28 Thread Laurence Perkins


>> -----Original Message-----
>> From: Rich Freeman  
>> Sent: Thursday, December 23, 2021 9:50 AM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Synchronous writes over the network.
>> 
>> On Thu, Dec 23, 2021 at 12:39 PM Mark Knecht  wrote:
>> >
>> > I'll respond to Rich's points in a bit but on this point I think 
>> > you're both right - new SSDs are very very reliable and I'm not overly 
>> > worried, but it seems a given that forcing more and more writes to an 
>> > SSD has to up the probability of a failure at some point. Zero writes 
>> > is almost no chance of failure; trillions of writes eventually wear 
>> > something out.
>> >
>> 
>> Every SSD has a rating for total writes.  This varies and the ones that cost 
>> more will get more writes (often significantly more), and wear pattern 
>> matters a great deal.  Chia fortunately seems to have died off pretty 
>> quickly but there is still a ton of data from those who were speculating on 
>> it, and they were buying high end SSDs and treating them as expendable 
>> resources - and plotting Chia is actually a fairly ideal use case as you 
>> write a few hundred GB and then you trim it all when you're done, so the 
>> entirety of the drive is getting turned over regularly.  People plotting 
>> Chia were literally going through cases of high-end SSDs due to write wear, 
>> running them until failure in a matter of weeks.
>> 
>> Obviously if you just write something and read it back constantly then wear 
>> isn't an issue.
>> 
>> Just googled the Samsung Evo 870 and they're rated to 600x their capacity in 
>> writes, for example.  If you write 600TB to the 1TB version of the drive, 
>> then it is likely to fail on you not too long after.
>> 
>> Sure, it is a lot better than it used to be, and for typical use cases I 
>> agree that they last longer than spinning disks.  However, a ZIL is not a 
>> "typical use case" as such things are measured.
>> 
>> --
>> Rich
>> 
>> 

This is also why the video surveillance industry still uses spinning rust for 
anything beyond a very minimal capacity.  

Rotating drives wear out primarily based on running time, as long as you're not 
thrashing the heads all the time by running Windows (Windows 10 seems to have 
given up even trying to optimize reads and writes on non-SSD media).

SSDs wear out primarily based on total write volume, and anything with a large 
data turnover can easily wear one out in a matter of months.

LMP


RE: [gentoo-user] Synchronous writes over the network.

2021-12-28 Thread Laurence Perkins


>> -----Original Message-----
>> From: Wols Lists  
>> Sent: Thursday, December 23, 2021 9:54 AM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Synchronous writes over the network.
>> 
>> > As always I'm interested in your comments about what works or 
>> > doesn't work about this sort of setup.
>> > 
>> My main desktop/server currently has two 4TB drives split 1TB/3TB. The two 
>> 3TB partitions are raid-5'd with a 3TB drive to give me 6TB of /home space.
>> 
>> I'm planning to buy an 8TB drive as a backup. The plan is it will go into a 
>> test-bed machine, that will be used for all sorts of stuff, but it will at 
>> least keep a copy of my data off my main machine.
>> 
>> But you get the idea. If you get two spare drives you can back up on to 
>> them. I don't know what facilities ZFS offers for sync'ing filesystems, but 
>> if you go somewhere regularly, where you can stash a hard disk (even a 
>> shed down the bottom of the garden :-), you back up onto disk 1, swap it for 
>> disk 2, back up onto disk 2, swap back to disk 1 ...
>> 
>> AND YOUR BACKUP IS OFF SITE!
>> 
>> Cheers,
>> Wol

Data does not exist unless it exists in at least three places.  Assume your 
most recent backup will also have a problem and plan accordingly with regard to 
time and number of copies.

Back in the day the rule of thumb was that whatever losing the data would cost 
you, you should probably spend about a third of that on redundancy.

LMP


RE: [gentoo-user] Synchronous writes over the network.

2021-12-28 Thread Laurence Perkins

>> -----Original Message-----
>> From: Wols Lists  
>> Sent: Thursday, December 23, 2021 2:29 PM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Synchronous writes over the network.
>> 
>> On 23/12/2021 21:50, Mark Knecht wrote:
>> > In the case of astrophotography I will have multiple copies of the 
>> > original photos. The process of stacking the individual photos can 
>> > create gigabytes of intermediate files but as long as the originals 
>> > are safe then it's just a matter of starting over. In my 
>> > astrophotography setup I create about 50Mbyte per minute and take 
>> > pictures for hours so a set of photos coming in at 1-2GB and up to 
>> > maybe 10GB isn't uncommon. I might create 30-50GB of intermediate 
>> > files which eventually get deleted but they can reside on the server 
>> > while I'm working. None of that has to be terribly fast.
>> 
>> :-)
>> 
>> Seeing as I run lvm, that sounds a perfect use case. Create an LV, dump the 
>> files on it, when you're done unmount and delete the LV.
>> 
>> I'm thinking of pulling the same stunt with wherever gentoo dumps its build 
>> files etc. Let it build up til I think I need a clearout, then create a new 
>> lv and scrap the old one.
>> 
>> Cheers,
>> Wol
>> 
>> 

The locations where Portage drops build files, package files, and source 
archives are set in make.conf if you really want to do this.
But the build files are deleted automatically when a build finishes unless there 
was an error or you specifically told Portage not to, and the "eclean" tool will 
clean up stale files in the other locations without deleting anything that's 
actually still useful.
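
If you do want to relocate them, the variables look something like this 
(paths illustrative; check make.conf(5) for your actual defaults):

    # /etc/portage/make.conf
    PORTAGE_TMPDIR="/var/tmp"       # where builds happen
    DISTDIR="/var/cache/distfiles"  # downloaded source archives
    PKGDIR="/var/cache/binpkgs"     # built binary packages

and the cleanup tool is in app-portage/gentoolkit:

    eclean distfiles   # prune source archives nothing references anymore
    eclean packages    # same for stale binary packages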

LMP


Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Wols Lists

On 23/12/2021 21:50, Mark Knecht wrote:

In the case of astrophotography I will have multiple copies of the
original photos. The process of stacking the individual photos can
create gigabytes of intermediate files but as long as the originals
are safe then it's just a matter of starting over. In my astrophotography
setup I create about 50Mbyte per minute and take pictures for hours
so a set of photos coming in at 1-2GB and up to maybe 10GB isn't
uncommon. I might create 30-50GB of intermediate files which
eventually get deleted but they can reside on the server while I'm
working. None of that has to be terribly fast.


:-)

Seeing as I run lvm, that sounds a perfect use case. Create an LV, dump 
the files on it, when you're done unmount and delete the LV.


I'm thinking of pulling the same stunt with wherever gentoo dumps its 
build files etc. Let it build up til I think I need a clearout, then 
create a new lv and scrap the old one.
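
Something like this, off the top of my head (VG name and size made up):

    lvcreate -L 50G -n scratch vg0        # throwaway volume
    mkfs.ext4 /dev/vg0/scratch
    mount /dev/vg0/scratch /mnt/scratch
    # ... let the cruft pile up ...
    umount /mnt/scratch
    lvremove vg0/scratch                  # and it's all gone in one go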


Cheers,
Wol



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Mark Knecht
On Thu, Dec 23, 2021 at 10:27 AM Rich Freeman  wrote:
>
> On Thu, Dec 23, 2021 at 11:56 AM Mark Knecht  wrote:
> >

>
> > Instead
> > of a ZIL in machine 1 the SSD becomes a ZLOG cache most likely holding
> > a cached copy of the currently active astrophotography projects.
>
> I think you're talking about L2ARC.  I don't think "ZLOG" is a thing,
> and a log device in ZFS is just another name for ZIL (since that's
> what it is - a high performance data journal).
>

Thank you. Yes, L2ARC.

> L2ARC drives don't need to be mirrored and their failure is harmless.
> They generally only improve things, but of course they do nothing to
> improve write performance - just read performance.
>
> >As always I'm interested in your comments about what works or
> > doesn't work about this sort of setup.
>
> Ultimately it all comes down to your requirements and how you use
> stuff.  What is the impact to you if you lose this real-time audio
> recording?  If you will just have to record something over again but
> that isn't a big deal, then what you're doing sounds fine to me.

Actually, no.

>   If
> you are recording stuff that is mission-critical and can't be repeated
> and you're going to lose a lot of money or reputation if you lose a
> recording, then I'd have that recording machine be pretty reliable
> which means redundant everything (server grade hardware with fault
> tolerance and RAID/etc, or split the recording onto two redundant sets
> of cheap consumer hardware).

Closer to mission critical.

When recording live music, most especially in situations with
lots of musicians, you don't want to miss a good take. In cases where
you are just capturing a band playing, it's just about getting it on disk.
However, in cases where you are adding to music that's already on disk,
say a vocalist singing live over the top of music the band played earlier,
having the hardware screw up a good take is really a downer.

>
> I do something similar - all the storage I care about is on
> Linux/ZFS/lizardfs with redundancy and backup.  I do process
> photos/video on a windows box on an NVMe, but that is almost never the
> only copy of my data.  I might offload media to the windows box from
> my camera, but if I lose that then I still have the camera.  I might
> do some processing on windows like generating thumbnails/etc on NVMe
> before I move it to network storage.  In the end though it goes to zfs
> on linux and gets backed up and so on.  If I need to process some
> videos I might copy data back to a windows NVMe for more performance
> if I don't want to directly spool stuff off the network, but my risks
> are pretty minimal if that goes down at any point.  And this is just
> personal stuff - I care about it and don't want to lose it, but it
> isn't going to damage my career if I lose it.  If I were dealing with
> data professionally it still wouldn't be a bad arrangement but I might
> invest in a few things differently.
>

In the case of recording audio it just comes down to how large a
project you are working on. Three-minute pop songs aren't much of an
issue; 10-20 stereo tracks at 96 kHz isn't all that large, and for those
the audio might fit in DRAM. However, if you're working on some
wonderful 30-minute prog rock piece with 100 or more stereo tracks
it can get a lot larger, but (in my mind anyway) the main desktop
machine will have some sort of M.2 drive, the project fits in there,
it gets read off hard disk before the session starts, and there's
probably no problem.
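
Rough numbers, assuming 24-bit/96 kHz stereo tracks (3 bytes per sample,
2 channels), 100 tracks, 30 minutes:

    # bytes/sample * channels * sample rate * tracks * seconds
    echo $(( 3 * 2 * 96000 * 100 * 1800 / 1000000000 )) GB    # ~103 GB

so a session like that is on the order of 100GB.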

I haven't given this a huge amount of worry because my current
machine does an almost perfect job with 8-9 year old technology.

In the case of astrophotography I will have multiple copies of the
original photos. The process of stacking the individual photos can
create gigabytes of intermediate files but as long as the originals
are safe then it's just a matter of starting over. In my astrophotography
setup I create about 50Mbyte per minute and take pictures for hours
so a set of photos coming in at 1-2GB and up to maybe 10GB isn't
uncommon. I might create 30-50GB of intermediate files which
eventually get deleted but they can reside on the server while I'm
working. None of that has to be terribly fast.

> Just ask yourself what hardware needs to fail for you to lose
> something you care about at any moment of time.  If you can tolerate
> the loss of just about any individual piece of hardware that's a
> pretty good first step for just about anything, and is really all you
> need for most consumer stuff.  Backups are fine as long as they're
> recent enough and you don't mind redoing work.
>
Agreed.

Thanks,
Mark



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Wols Lists

On 23/12/2021 16:56, Mark Knecht wrote:

Rich & Wols,
Thanks for the responses. I'll post a single response here. I had
thought of the need to mirror the ZIL but didn't have enough physical
disk slots in the backup machine for the 2nd SSD. I do think this is a
critical point if I was to use the ZIL at all.


Okay, how heavily are you going to hammer the server writing to it? If 
you aren't going to stress it, don't bother with the ZIL.


Based on inputs from the two of you I'm investigating a different
overall setup for my home network:

Previously - a new main desktop that holds all my data. Lots of disk
space, lots of data. All of my big data work - audio recording
sessions and astrophotography - are done on this machine. Two
__backup__ machines. Desktop machines are backed up to machine 1,
machine 1 backed up to machine 2, machine 2 eventually backed up to
some cloud service.

Now - a new desktop machine that holds audio recording data currently
being recorded and used due to real-time latency requirements.


Sounds good...

Two new
network machines: Machine 1 would be both a backup machine as well as
a file server. The file server portion of this machine holds
astrophotography data and recorded video files. PixInsight running on
my desktop accesses and stores over the network to machine 1. Instead
of a ZIL in machine 1 the SSD becomes a ZLOG cache most likely holding
a cached copy of the currently active astrophotography projects.


Actually, it sounds like the best use of the SSD would be your working 
directory in your desktop.



Machine 1 may also run a couple of VMs over time.


Whatever :-) Just make sure that it's easy to back up! I'd be inclined 
to have a bunch of raid-5'd disks ...


Machine 2 is a pure
backup machine of everything on Machine 1.

I'd say don't waste your money. You don't need a *third* machine. Spend 
the money on some large disk drives, an eSATA card for machine 1, and a 
hard disk docking station ...



FYI - Machine 1 will always be located close to my desktop machines
and use the 1 Gb/s wired network. iperf suggests I get about 850 Mb/s on
and off of Machine 1. Machine 2 will be remote and generally backed up
overnight using wireless.

As always I'm interested in your comments about what works or
doesn't work about this sort of setup.

My main desktop/server currently has two 4TB drives split 1TB/3TB. The 
two 3TB partitions are raid-5'd with a 3TB drive to give me 6TB of /home 
space.


I'm planning to buy an 8TB drive as a backup. The plan is it will go 
into a test-bed machine, that will be used for all sorts of stuff, but 
it will at least keep a copy of my data off my main machine.


But you get the idea. If you get two spare drives you can back up on to 
them. I don't know what facilities ZFS offers for sync'ing filesystems, 
but if you go somewhere regularly, where you can stash a hard disk 
(even a shed down the bottom of the garden :-), you back up onto disk 1, 
swap it for disk 2, back up onto disk 2, swap back to disk 1 ...
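
(I gather the ZFS way is snapshots plus send/receive, something like the
following with made-up pool names, but don't quote me, I don't run it:

    zfs snapshot tank/home@today
    zfs send -i tank/home@lastweek tank/home@today | zfs receive backup/home

where -i sends only the changes since the previous snapshot.)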


AND YOUR BACKUP IS OFF SITE!

Cheers,
Wol



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Rich Freeman
On Thu, Dec 23, 2021 at 12:39 PM Mark Knecht  wrote:
>
> I'll respond to Rich's points in a bit but on this point I think
> you're both right - new SSDs are very very reliable and I'm not overly
> worried, but it seems a given that forcing more and more writes to an
> SSD has to up the probability of a failure at some point. Zero writes
> is almost no chance of failure; trillions of writes eventually wear
> something out.
>

Every SSD has a rating for total writes.  This varies and the ones
that cost more will get more writes (often significantly more), and
wear pattern matters a great deal.  Chia fortunately seems to have
died off pretty quickly but there is still a ton of data from those
who were speculating on it, and they were buying high end SSDs and
treating them as expendable resources - and plotting Chia is actually
a fairly ideal use case as you write a few hundred GB and then you
trim it all when you're done, so the entirety of the drive is getting
turned over regularly.  People plotting Chia were literally going
through cases of high-end SSDs due to write wear, running them until
failure in a matter of weeks.

Obviously if you just write something and read it back constantly then
wear isn't an issue.

Just googled the Samsung Evo 870 and they're rated to 600x their
capacity in writes, for example.  If you write 600TB to the 1TB
version of the drive, then it is likely to fail on you not too long
after.
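
If you want to know where a given drive stands, smartctl shows the wear
counters, though the attribute names vary by vendor:

    smartctl -A /dev/sda     # SATA: look for Wear_Leveling_Count,
                             # Total_LBAs_Written, or similar
    smartctl -A /dev/nvme0   # NVMe: "Percentage Used" and "Data Units
                             # Written" are standardized fields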

Sure, it is a lot better than it used to be, and for typical use cases
I agree that they last longer than spinning disks.  However, a ZIL is
not a "typical use case" as such things are measured.

-- 
Rich



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Mark Knecht
On Thu, Dec 23, 2021 at 10:35 AM Wols Lists  wrote:
>
> On 23/12/2021 17:26, Rich Freeman wrote:
> > Plus it is an SSD that you're forcing a lot of writes
> > through, so that is going to increase your risk of failure at some
> > point.
>
> A lot of people can't get past the fact that early SSDs weren't
> that good. And I won't touch micro-SD for that reason. But all the
> reports now are that a decent SSD is likely to outlast spinning rust.
>
> Cheers,
> Wol
>

I'll respond to Rich's points in a bit but on this point I think
you're both right - new SSDs are very very reliable and I'm not overly
worried, but it seems a given that forcing more and more writes to an
SSD has to up the probability of a failure at some point. Zero writes
is almost no chance of failure; trillions of writes eventually wear
something out.

Mark



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Wols Lists

On 23/12/2021 17:26, Rich Freeman wrote:

Plus it is an SSD that you're forcing a lot of writes
through, so that is going to increase your risk of failure at some
point.


A lot of people can't get past the fact that early SSDs weren't 
that good. And I won't touch micro-SD for that reason. But all the 
reports now are that a decent SSD is likely to outlast spinning rust.


Cheers,
Wol



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Rich Freeman
On Thu, Dec 23, 2021 at 11:56 AM Mark Knecht  wrote:
>
>Thanks for the responses. I'll post a single response here. I had
> thought of the need to mirror the ZIL but didn't have enough physical
> disk slots in the backup machine for the 2nd SSD. I do think this is a
> critical point if I was to use the ZIL at all.

Yeah, I wouldn't run ZIL non-mirrored, especially if your underlying
storage is mirrored.  The whole point of sync is to sacrifice
performance for reliability, and if all it does is force the write to
the one device in the array that isn't mirrored, that isn't helping.
Plus if you're doing a lot of syncs then that ZIL could have a lot of
data on it.  Plus it is an SSD that you're forcing a lot of writes
through, so that is going to increase your risk of failure at some
point.

Nobody advocates for non-mirrored ZIL, at least if your array itself
is mirrored.
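
If you do add a log vdev, make it a mirror from the start; roughly
(device names hypothetical):

    zpool add tank log mirror \
        /dev/disk/by-id/nvme-ssd0 /dev/disk/by-id/nvme-ssd1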

> Instead
> of a ZIL in machine 1 the SSD becomes a ZLOG cache most likely holding
> a cached copy of the currently active astrophotography projects.

I think you're talking about L2ARC.  I don't think "ZLOG" is a thing,
and a log device in ZFS is just another name for ZIL (since that's
what it is - a high performance data journal).

L2ARC drives don't need to be mirrored and their failure is harmless.
They generally only improve things, but of course they do nothing to
improve write performance - just read performance.
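
That also makes them cheap to experiment with (device name hypothetical):

    zpool add tank cache /dev/disk/by-id/nvme-ssd2    # add L2ARC
    zpool remove tank /dev/disk/by-id/nvme-ssd2       # drop it again, harmless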

>As always I'm interested in your comments about what works or
> doesn't work about this sort of setup.

Ultimately it all comes down to your requirements and how you use
stuff.  What is the impact to you if you lose this real-time audio
recording?  If you will just have to record something over again but
that isn't a big deal, then what you're doing sounds fine to me.  If
you are recording stuff that is mission-critical and can't be repeated
and you're going to lose a lot of money or reputation if you lose a
recording, then I'd have that recording machine be pretty reliable
which means redundant everything (server grade hardware with fault
tolerance and RAID/etc, or split the recording onto two redundant sets
of cheap consumer hardware).

I do something similar - all the storage I care about is on
Linux/ZFS/lizardfs with redundancy and backup.  I do process
photos/video on a windows box on an NVMe, but that is almost never the
only copy of my data.  I might offload media to the windows box from
my camera, but if I lose that then I still have the camera.  I might
do some processing on windows like generating thumbnails/etc on NVMe
before I move it to network storage.  In the end though it goes to zfs
on linux and gets backed up and so on.  If I need to process some
videos I might copy data back to a windows NVMe for more performance
if I don't want to directly spool stuff off the network, but my risks
are pretty minimal if that goes down at any point.  And this is just
personal stuff - I care about it and don't want to lose it, but it
isn't going to damage my career if I lose it.  If I were dealing with
data professionally it still wouldn't be a bad arrangement but I might
invest in a few things differently.

Just ask yourself what hardware needs to fail for you to lose
something you care about at any moment of time.  If you can tolerate
the loss of just about any individual piece of hardware that's a
pretty good first step for just about anything, and is really all you
need for most consumer stuff.  Backups are fine as long as they're
recent enough and you don't mind redoing work.

-- 
Rich



Re: [gentoo-user] Synchronous writes over the network.

2021-12-23 Thread Mark Knecht
On Mon, Dec 20, 2021 at 12:52 PM Rich Freeman  wrote:
>
> On Mon, Dec 20, 2021 at 1:52 PM Mark Knecht  wrote:
> >
> > I've recently built 2 TrueNAS file servers. The first (and main) unit
> > runs all the time and serves to back up my home user machines.
> > Generally speaking I (currently) put data onto it using rsync but it
> > also has an NFS mount that serves as a location for my Raspberry Pi to
> > store duplicate copies of astrophotography pictures live as they come
> > off the DSLR in the middle of the night.
> >
> > ...
> >
> > The thing is that the ZIL is only used for synchronous writes and I
> > don't know whether anything I'm doing to back up my user machines,
> > which currently is just rsync commands, is synchronous or could be
> > made synchronous, and I do not know if the NFS writes from the R_Pi
> > are synchronous or could be made so.
> >
>
> Disclaimer: some of this stuff is a bit arcane and the documentation
> isn't very great, so I could be missing a nuance somewhere.
>
> First, one of your options is to set sync=always on the zfs dataset,
> if synchronous behavior is strongly desired.  That will force ALL
> writes at the filesystem level to be synchronous.  It will of course
> also normally kill performance but the ZIL may very well save you if
> your SSD performs adequately.  This still only applies at the
> filesystem level, which may be an issue with NFS (read on).
>
> I'm not sure how exactly you're using rsync from the description above
> (rsyncd, directly client access, etc).  In any case I don't think
> rsync has any kind of option to force synchronous behavior.  I'm not
> sure if manually running a sync on the server after using rsync will
> use the ZIL or not.  If you're using sync=always then that should
> cover rsync no matter how you're doing it.
>
> Nfs is a little different as both the server-side and client-side have
> possible asynchronous behavior.  By default the nfs client is
> asynchronous, so caching can happen on the client before the file is
> even sent to the server.  This can be disabled with the mount option
> sync on the client side.  That will force all data to be sent to the
> server immediately.  Any nfs server or filesystem settings on the
> server side will not have any impact if the client doesn't transmit
> the data to the server.  The server also has a sync setting which
> defaults to on, and it additionally has another layer of caching on
> top of that which can be disabled with no_wdelay on the export.  Those
> server-side settings probably delay anything getting to the filesystem
> and so they would have precedence over any filesystem-level settings.
>
> As you can see you need to use a bit of a kill-it-with-fire approach
> to get synchronous behavior, as it traditionally performs so poorly
> that everybody takes steps to try to prevent it from happening.
>
> I'll also note that the main thing synchronous behavior protects you
> from is unclean shutdown of the server.  It has no bearing on what
> happens if a client goes down uncleanly.  If you don't expect server
> crashes it may not provide much benefit.
>
> If you're using ZIL you should consider having the ZIL mirrored, as
> any loss of the ZIL devices will otherwise cause data loss.  Use of
> the ZIL is also going to create wear on your SSD so consider that and
> your overall disk load before setting sync=always on the dataset.
> Since the setting is at the dataset level you could have multiple
> mountpoints and have a different sync policy for each.  The default is
> normal POSIX behavior which only syncs when requested (sync, fsync,
> O_SYNC, etc).
>
> --
> Rich
>

Rich & Wols,
   Thanks for the responses. I'll post a single response here. I had
thought of the need to mirror the ZIL but didn't have enough physical
disk slots in the backup machine for the 2nd SSD. I do think this is a
critical point if I was to use the ZIL at all.

   Based on inputs from the two of you I'm investigating a different
overall setup for my home network:

Previously - a new main desktop that holds all my data. Lots of disk
space, lots of data. All of my big data work - audio recording
sessions and astrophotography - are done on this machine. Two
__backup__ machines. Desktop machines are backed up to machine 1,
machine 1 backed up to machine 2, machine 2 eventually backed up to
some cloud service.

Now - a new desktop machine that holds audio recording data currently
being recorded and used due to real-time latency requirements. Two new
network machines: Machine 1 would be both a backup machine as well as
a file server. The file server portion of this machine holds
astrophotography data and recorded video files. PixInsight running on
my desktop accesses and stores over the network to machine 1. Instead
of a ZIL in machine 1 the SSD becomes a ZLOG cache most likely holding
a cached copy of the currently active astrophotography projects.
Machine 1 may also run a couple of VMs over time. Machine 2 is a pure
backup machine of everything on Machine 1.

Re: [gentoo-user] Synchronous writes over the network.

2021-12-20 Thread Rich Freeman
On Mon, Dec 20, 2021 at 2:52 PM Wols Lists  wrote:
>
> And it might also mean blocking writes, which is why you don't want it
> on spinning rust. But it also means that it is (almost) guaranteed to
> get to permanent storage, which is why you do want it for mail,
> databases, etc.
>

The reason that mail/databases/etc use synchronous behavior isn't
because it is "almost" guaranteed to make it to storage.  The reason
they use it is because you have multiple hosts, and each host can
guarantee non-loss of data internally, but synchronous behavior is
necessary to ensure that data is not lost on a handoff.

Take a mail server.  If your SMTP connection goes down for any reason
before the server communicates that the mail was accepted then the
sender will assume the mail was not delivered, and will try again.  So
if the network goes down, or the SMTP server crashes, then the client
will cache the mail and try again.  Most mail servers will have the
data already on-disk before even attempting to deliver mail, so even
if all the computers involved go down during this handoff nothing is
lost as it is still in the client cache on-disk.

On the other hand, once the server confirms delivery then
responsibility is handed off and the client can forget about the mail.
It is important that the mail server not communicate that the mail was
received until it can guarantee that it won't lose the mail.  That is
usually accomplished by the server syncing the mail file to the
on-disk spool and blocking until that is successful before
communicating back to the client that the mail was delivered.

Database transactions behave similarly.

If the userspace application either does a write on a file opened with
O_SYNC or does an fsync system call on the file, and the system call
returns, then the data is present on-disk and will be persistent even
if the power is lost at the very next moment.  It is acceptable for a
filesystem to return the call if the data is in a persistent journal,
which is what the ZIL is, as long as it is flushed to disk.
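
You can see the same contract from the shell; a sketch, with a
hypothetical spool path and $msgfile standing in for the received mail:

    # conv=fsync makes dd call fsync() on the output before exiting, so
    # we only acknowledge once the data is on stable storage
    dd if="$msgfile" of=/var/spool/mail/incoming conv=fsync status=none \
        && echo "250 OK"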

Of course, you can still accept mail or implement a database
asynchronously, but you lose a number of data protections that are
otherwise designed into the software (well, assuming you're not
storing your data in MyISAM...).  :)

-- 
Rich



Re: [gentoo-user] Synchronous writes over the network.

2021-12-20 Thread Wols Lists

On 20/12/2021 18:52, Mark Knecht wrote:

The thing is that the ZIL is only used for synchronous writes and I
don't know whether anything I'm doing to back up my user machines,
which currently is just rsync commands, is synchronous or could be
made synchronous, and I do not know if the NFS writes from the R_Pi
are synchronous or could be made so.


"Synchronous writes" basically means "in the order they were written".

And it might also mean blocking writes, which is why you don't want it 
on spinning rust. But it also means that it is (almost) guaranteed to 
get to permanent storage, which is why you do want it for mail, 
databases, etc.


Your typical (asynchronous) app calls "write", chucks it at the kernel, 
and forgets about it. Hence "asynchronous" - "without regard to time".


Your app which has switched on synchronicity will block until the write 
has completed.


Your understanding about the ZIL sounds about right - whatever you throw 
at the NAS will be saved to the ZIL before it gets written properly 
later. Your apps (rsync etc) don't need to worry, the kernel will cache 
stuff, flood it through to the ZIL, and the NAS will take it from there.


The only thing I'd worry about is how "bursty" the data being chucked 
at the NAS is. A backup is likely to be a sustained stream that could easily 
overwhelm the buffers, and that's not good. Do you have an rsync daemon 
on the NAS? The more you can make the writes smaller and burstier the 
better, and running an rsync daemon is one of the ways to do that.
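
A daemon module is only a few lines; something like this, with the module
name and path invented:

    # /etc/rsyncd.conf on the NAS
    [backup]
        path = /tank/backup
        read only = false

and then clients push with "rsync -a ~/data/ nas::backup/" instead of
going over ssh.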


Cheers,
Wol



Re: [gentoo-user] Synchronous writes over the network.

2021-12-20 Thread Rich Freeman
On Mon, Dec 20, 2021 at 1:52 PM Mark Knecht  wrote:
>
> I've recently built 2 TrueNAS file servers. The first (and main) unit
> runs all the time and serves to back up my home user machines.
> Generally speaking I (currently) put data onto it using rsync but it
> also has an NFS mount that serves as a location for my Raspberry Pi to
> store duplicate copies of astrophotography pictures live as they come
> off the DSLR in the middle of the night.
>
> ...
>
> The thing is that the ZIL is only used for synchronous writes and I
> don't know whether anything I'm doing to back up my user machines,
> which currently is just rsync commands, is synchronous or could be
> made synchronous, and I do not know if the NFS writes from the R_Pi
> are synchronous or could be made so.
>

Disclaimer: some of this stuff is a bit arcane and the documentation
isn't very great, so I could be missing a nuance somewhere.

First, one of your options is to set sync=always on the zfs dataset,
if synchronous behavior is strongly desired.  That will force ALL
writes at the filesystem level to be synchronous.  It will of course
also normally kill performance but the ZIL may very well save you if
your SSD performs adequately.  This still only applies at the
filesystem level, which may be an issue with NFS (read on).
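
Concretely (pool/dataset names hypothetical):

    zfs set sync=always tank/backups    # every write treated as synchronous
    zfs get sync tank/backups           # verify: standard | always | disabled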

I'm not sure how exactly you're using rsync from the description above
(rsyncd, directly client access, etc).  In any case I don't think
rsync has any kind of option to force synchronous behavior.  I'm not
sure if manually running a sync on the server after using rsync will
use the ZIL or not.  If you're using sync=always then that should
cover rsync no matter how you're doing it.

Nfs is a little different as both the server-side and client-side have
possible asynchronous behavior.  By default the nfs client is
asynchronous, so caching can happen on the client before the file is
even sent to the server.  This can be disabled with the mount option
sync on the client side.  That will force all data to be sent to the
server immediately.  Any nfs server or filesystem settings on the
server side will not have any impact if the client doesn't transmit
the data to the server.  The server also has a sync setting which
defaults to on, and it additionally has another layer of caching on
top of that which can be disabled with no_wdelay on the export.  Those
server-side settings probably delay anything getting to the filesystem
and so they would have precedence over any filesystem-level settings.
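
Spelled out, that stack looks something like this (host and paths
hypothetical):

    # client side, /etc/fstab: force synchronous client behavior
    nas:/tank/nfs  /mnt/nas  nfs  sync,hard  0 0

    # server side, /etc/exports: sync plus no write-gathering delay
    /tank/nfs  192.168.1.0/24(rw,sync,no_wdelay)

followed by exportfs -ra on the server to apply it.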

As you can see you need to use a bit of a kill-it-with-fire approach
to get synchronous behavior, as it traditionally performs so poorly
that everybody takes steps to try to prevent it from happening.

I'll also note that the main thing synchronous behavior protects you
from is unclean shutdown of the server.  It has no bearing on what
happens if a client goes down uncleanly.  If you don't expect server
crashes it may not provide much benefit.

If you're using ZIL you should consider having the ZIL mirrored, as
any loss of the ZIL devices will otherwise cause data loss.  Use of
the ZIL is also going to create wear on your SSD so consider that and
your overall disk load before setting sync=always on the dataset.
Since the setting is at the dataset level you could have multiple
mountpoints and have a different sync policy for each.  The default is
normal POSIX behavior which only syncs when requested (sync, fsync,
O_SYNC, etc).

-- 
Rich



[gentoo-user] Synchronous writes over the network.

2021-12-20 Thread Mark Knecht
I wonder if someone can help me get educated about synchronous writes
to a file server over the network? Is this something that's designed
into specific apps or is this something that I have control of at the
sys admin level?

I've recently built 2 TrueNAS file servers. The first (and main) unit
runs all the time and serves to back up my home user machines.
Generally speaking I (currently) put data onto it using rsync but it
also has an NFS mount that serves as a location for my Raspberry Pi to
store duplicate copies of astrophotography pictures live as they come
off the DSLR in the middle of the night.

The second TrueNAS machine serves to back up this first machine but
resides at the other end of the house to protect data in case of fire.
Eventually I'll probably backup all of this offsite but for now it's
two old computers and a bunch of disks.

The question about synchronous writes comes in the configuration of
TrueNAS. TrueNAS supports what it calls a ZIL (ZFS Intent Log) which
is a smaller SSD at the front end of the write data flow. The idea (as
I understand it) is that the ZIL allows writes to the server to be
cached quickly onto, in my case, an SSD, and then eventually written
to spinning drives when the system gets around to it. Once new data
arrives at the ZIL it remains until it's written and verified at which
time the entries in the ZIL are removed. The ZIL does not do anything
to speed up reads from the file server.

The thing is that the ZIL is only used for synchronous writes and I
don't know whether anything I'm doing to back up my user machines,
which currently is just rsync commands, is synchronous or could be
made synchronous, and I do not know if the NFS writes from the R_Pi
are synchronous or could be made so.
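
From what I've read, zpool iostat should at least tell me whether anything
is landing on the log device:

    zpool iostat -v tank 5    # per-vdev I/O every 5 s; writes showing up
                              # on the log line mean the ZIL is in use

but I still don't know which of my workloads can be made to go that way.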

If someone can point me in the right direction in terms of reading and
study I'd appreciate it.

Thanks,
Mark