Hello,

There seem to be some differences of opinion about Bacula's spooling.

Spooling was implemented for a number of reasons:
1. Primarily to reduce tape shoe shining (particularly due to Clients
doing Incremental backups)
2. To allow multiple jobs to write to the tape in larger chunks to
improve restore times.

Note: there is no fundamental reason to have a spool file as large as
the tape.  However, it is a good idea to have the spool file as large
as the "Maximum File Size" value in the Device resource in the
bacula-sd.conf file.
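
For reference, the relevant directives look roughly like this (a
minimal sketch; the device name, paths and sizes are only illustrative,
not recommendations):

  # bacula-sd.conf -- Device resource (fragment)
  Device {
    Name = LTO-Drive                # illustrative name
    Archive Device = /dev/nst0
    Media Type = LTO
    Spool Directory = /var/spool/bacula
    Maximum Spool Size = 200G       # total spool space this device may use
    Maximum Job Spool Size = 50G    # optional per-job cap
    Maximum File Size = 50G         # size of each file (chunk) written to tape
  }

Spooling itself is enabled per job with "Spool Data = yes" in the Job
resource of bacula-dir.conf.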

When spooling is done, blocks are written to the spool file as they
come in.  There is one spool file per job in the spool directory.  When
a job terminates or reaches the maximum spool size, the spooled data is
despooled, that is, read back and written to the tape.  While a job is
despooling, it is the only job writing to the tape, so its blocks are
all contiguous on the tape.  Other jobs may still spool while
despooling is being done.

At the time the code was initially written, the principal idea of
spooling was to prevent shoe shining caused by slow input of data from
the File daemons.  With faster and faster tape drives, however, shoe
shining may occur even with spooling if the spool device itself is
slow.  In fact, with the most modern drives you will need either SSD
spooling devices or some sort of RAID to ensure that data can be
delivered from the spool device to the tape fast enough.
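
To put rough, illustrative numbers on that (ballpark figures, not
measurements): a current LTO drive writes on the order of 300 MB/s at
native speed, while a single spinning spool disk often sustains well
under 200 MB/s on large sequential reads.  In that case the drive
drains the spool faster than the disk can feed it, so it stops and
repositions just as it would with a slow client; a spool device that
can sustain the drive's native rate avoids this.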

Best regards,
Kern

On 2/7/19 9:50 AM, Adam Nielsen wrote:
>> I have a different understanding of the function.  There is no need
>> to read "data off clients as fast as possible" - if your clients are
>> fast, they have no problem providing the data at lower rates, too.
>> It's the other way round: the problem is when your clients are so
>> slow that they cannot feed the data fast enough to keep the tape
>> streaming.  This often happens when you run, for example, incremental
>> backups over a big data set (say, millions of files) with only a few
>> changes.
> It may also help in this case, but so would buffering instead of
> spooling.  I was under the impression that in the case of slow clients,
> Bacula is designed to read from many clients at the same time, so that
> it can get the throughput required for the tape without spooling.
>
>> In this case the client gets the time it needs to traverse the file
>> directory tree, and when done, you have all the data to be backed up
>> collected in the spool area, which is then fast enough to keep the
>> tape happily streaming.
> This is true, but the drawback of the spool file is that you need
> enough disk space to hold a full tape's worth of data for it to
> perform optimally.  If the spool file is not an exact multiple of the
> tape size, performance will drop.
>
>>> Consequently spooling works best when the spool file is large
>>> enough to contain one whole tape's worth of data, and you have
>>> enough clients backing up that there is always a complete spool
>>> file ready to write out to tape.  
>> This is not necessary.  Or only one possible special case.
> It's not necessary, but if you do not do this, performance will suffer
> and your tape will shoe shine.
>
>>> Anything less than this and spooling will slow things down.  
>> This is not correct, if you consider incremental and differential
>> backups.
> I am only referring to getting data from the spool file onto tape.
> Let's say you have a 100GB spool file and you are writing to an 800GB
> tape.  The process will go like this:
>
>  * Read 100GB from client, tape is idle
>  * Write 100GB to tape, pause tape
>  * Read next 100GB from client while tape is paused
>  * Start up tape again and write next 100GB
>
> Thus even if your clients can keep up with the tape 100% of the time,
> you will still introduce extra shoe shining if your spool file is not
> exactly one tape in size.
>
> (If your spool file is larger than one tape, then you will fill up one
> tape in one continuous operation which is perfect, but then the second
> tape will pause once the end of the spool file has been reached which
> is not ideal either.)
>
> So you can see that using a spool file is typically worse for a tape
> drive, as it will almost always introduce additional stop/start cycles
> (shoe shining) which would not be there otherwise, unless you have a
> very slow client.
>
> This is why in my opinion buffering is better, because a small FIFO
> buffer can read data from clients *while* writing to tape, so there is
> no extra shoe shining.  A buffer will also not harm performance if it
> is not required, however using a spool file when one is not needed will
> make performance worse.
>
> With my own experience writing data to tape using mbuffer and tar, a
> buffer of 4GB was enough to prevent all shoe shining, and it did not
> slow down the process at all.  However with Bacula, my spool file must
> be 800GB to achieve the same result, and even this makes the process
> take much longer because the tape is idle while the spool file is
> filling up the first time.
>
> I don't have 800GB available for the spool file either, which means my
> choices are:
>
>   1.  Use a smaller spool file and live with tape shoe shine.
>
>   2.  Don't use a spool file at all and live with tape shoe shine
>       caused by slow clients.
>
>   3.  Buy more disk I can't use for real storage because it must be
>       reserved for Bacula scratch space, and live with shoe shine as
>       well because tapes are never exactly 800GB.
>
>   4.  Implement buffering support in Bacula so that I can eliminate
>       shoe shining and speed up my backups, without buying new hardware.
>
> I definitely favour #4 because having support for large tape buffers in
> Bacula would provide some big performance benefits.
>
> Cheers,
> Adam.
>
>




_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
