Re: [Bacula-users] LTO tape performances, again...

2024-01-25 Thread Josh Fisher via Bacula-users


On 1/24/24 12:48, Marco Gaiarin wrote:

My new IBM LTO9 tape unit has a data sheet performance of:


https://www.ibm.com/docs/it/ts4500-tape-library/1.10.0?topic=performance-lto-specifications

so in the worst case (compression disabled) it should sustain 400 MB/s on an
LTO9 tape.


In practice with Bacula I get 70-80 MB/s. I've just:


1) followed:


https://www.bacula.org/9.6.x-manuals/en/problems/Testing_Your_Tape_Drive_Wit.html#SECTION00422000

  getting 237.7 MB/s on random data (worst case).


2) checked disk performance (data comes only from local disk); I currently have
  3 servers, some perform better, some worse, but the best one has pretty
decent read performance: at least 200 MB/s on random access (1500 MB/s
sequential).



Disk that is local to the server does not mean it is local to the 
bacula-sd process or tape drive. If the connection is 1 gigabit 
Ethernet, then max rate is going to be 125 MB/s.
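The link-speed ceiling is simple arithmetic (decimal units, ignoring TCP and framing overhead, so real throughput is somewhat lower):

```shell
# Link speed in bits/s, divided by 8 for bytes/s, divided by 10^6 for MB/s.
echo "1 GbE : $(( 1000000000 / 8 / 1000000 )) MB/s"    # 125 MB/s
echo "10 GbE: $(( 10000000000 / 8 / 1000000 )) MB/s"   # 1250 MB/s
```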



3) disabled data spooling, of course; as just stated, data comes only from
  local disks. Enabled attribute spooling.



That is probably not what you want to do. You want the bacula-sd 
process to spool data on its local disk so that when it is despooled to 
the tape drive it is reading only from local disk, not from a small RAM 
buffer that is being filled through a network socket. Even with a 10 G 
Ethernet network it is better to spool data for LTO tape drives, since 
the client itself might not be able to keep up with the tape drive, or 
is busy, or the network is congested, etc.
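For reference, data spooling is enabled per job in bacula-dir.conf, and the spool location and size limit live in the SD's Device resource. A minimal sketch (the job name and values are examples, the device name is taken from this thread's logs; adjust to your setup):

```conf
# bacula-dir.conf
Job {
  Name = "BackupClient1"      # example name
  ...
  Spool Data = yes            # spool to SD-local disk before writing to tape
  Spool Attributes = yes
}

# bacula-sd.conf
Device {
  Name = LTO9Storage0
  ...
  Spool Directory = /var/spool/bacula   # fast disk local to the SD
  Maximum Spool Size = 200G             # cap so the spool disk cannot fill up
}
```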






Clearly I can expect some performance penalty from Bacula and mixed files, but
70 MB/s is really slow...


What else can I look into?


Thanks.
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Autochangers and unload timeout...

2024-01-25 Thread Bill Arlofski via Bacula-users

On 1/24/24 10:13, Marco Gaiarin wrote:


I've reached my first tape change on my autochangers, yeah!


But...

  24-Jan 17:22 cnpve3-sd JobId 16234: [SI0202] End of Volume "AAJ661L9" at 333:49131 on 
device "LTO9Storage0" (/dev/nst0). Write of 524288 bytes got -1.
  24-Jan 17:22 cnpve3-sd JobId 16234: Re-read of last block succeeded.
  24-Jan 17:22 cnpve3-sd JobId 16234: End of medium on Volume "AAJ661L9" 
Bytes=17,846,022,566,912 Blocks=34,038,588 at 24-Jan-2024 17:22.
  24-Jan 17:22 cnpve3-sd JobId 16234: 3307 Issuing autochanger "unload Volume 
AAJ661L9, Slot 2, Drive 0" command.
  24-Jan 17:28 cnpve3-sd JobId 16234: 3995 Bad autochanger "unload Volume AAJ661L9, 
Slot 2, Drive 0": ERR=Child died from signal 15: Termination
Results=Program killed by Bacula (timeout)
  24-Jan 17:28 cnpve3-sd JobId 16234: 3304 Issuing autochanger "load Volume 
AAJ660L9, Slot 1, Drive 0" command.
  24-Jan 17:29 cnpve3-sd JobId 16234: 3305 Autochanger "load Volume AAJ660L9, Slot 
1, Drive 0", status is OK.

So, the unload timed out, but the subsequent load command works as expected (and
backups are continuing...).

Can I safely ignore this? Or better to tune the timeout parameters in the
/etc/bacula/scripts/mtx-changer.conf script?


Thanks.


Hello Marco,

It looks like the mtx command (called by the mtx-changer script) is taking more than 6 minutes to return, so the process is 
being killed.


But, it then looks like it *must* have succeeded since the load command loads a 
new tape into the now empty drive.

You can try a few things to debug this.

First, I would stop the SD, and then manually load/unload tapes into your drive 
with the mtx command:

# mtx -f /dev/tape/by-id/ status


If this shows a tape loaded in, for example, drive 0, unload it:

# mtx -f /dev/tape/by-id/ unload X 0  (where X is the slot 
reported loaded in the drive)


Then, try loading a different tape:

# mtx -f /dev/tape/by-id/ load Y 0  (where Y is a slot that 
has a tape in it, of course :)


By doing these manual steps, at least you can find out how long your tape library takes for these operations, and then you can 
adjust mtx-changer.conf as Pierre explained.
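To get concrete numbers, a tiny wrapper can time each mtx call (the device path below is a placeholder, just as in the commands above):

```shell
# Report wall-clock seconds for any command, preserving its exit status.
time_cmd() {
  local start rc
  start=$(date +%s)
  "$@"
  rc=$?
  echo "elapsed: $(( $(date +%s) - start ))s" >&2
  return $rc
}

# Hardware-dependent examples (placeholder device path):
# time_cmd mtx -f /dev/tape/by-id/<your-changer> unload 2 0
# time_cmd mtx -f /dev/tape/by-id/<your-changer> load 1 0
```

If the unload alone takes longer than the mtx-changer timeout, that is the value to raise.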



Additionally, if you are feeling brave and like playing the part of guinea pig, you can try replacing the default mtx-changer 
bash/perl script in your Autochanger's "ChangerCommand" with my `mtx-changer-python.py` script. It is a drop-in replacement 
with better logging and some additional features (with more planned). It is very configurable, and logs everything very 
clearly - including mtx changer errors, etc. (log level is configurable, of course).


It needs a few Python modules installed, and as far as I know very few people have even tried it (maybe no one, lol) - but I 
have been running it in the Bacula Systems lab with our tape library since this past summer and it "just works"™


If you are interested, you can find it in my GitHub account, where I have shared it and a few other scripts: 
https://github.com/waa



Best regards,
Bill

--
Bill Arlofski
w...@protonmail.com





Re: [Bacula-users] LTO tape performances, again...

2024-01-25 Thread Pierre Bernhardt

On 25.01.24 at 10:06, Marco Gaiarin wrote:



2) checked disk performance (data comes only from local disk); I currently have
  3 servers, some perform better, some worse, but the best one has pretty
decent read performance: at least 200 MB/s on random access (1500 MB/s
sequential).


Jim Pollard asked me in a private email about controllers: I hadn't specified,
sorry, but the LTO units are connected to a dedicated SAS controller, not the
disks' one.


I have also registered lower than expected write performance.
My LTO-6 drive should handle 160 MB/s of uncompressible random data.
However, after writing a sequence Bacula mostly reports a transfer speed
of around 80 MB/s.
I haven't investigated yet, but normally it should go faster. The job is spooled
to /tmp and swap is not in use, so the transfer should be much faster.

My suggestion now is:

Create a big random-data file, like a spool file, in /tmp.
Copy it with dd from /tmp to /dev/null.
Copy from /dev/random to tape.
Copy from /tmp to tape.

Any suggestions about bs usage or something else?
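The suggested sequence could be sketched like this (small sizes so it runs quickly; for a real benchmark use a file larger than RAM so the page cache does not inflate the read numbers, and /dev/nst0 is only an assumed tape device path, as elsewhere in the thread):

```shell
SPOOL=/tmp/spooltest.dat

# 1) create a random (incompressible) spool-like file in /tmp
#    (/dev/urandom used as a practical stand-in for /dev/random)
dd if=/dev/urandom of="$SPOOL" bs=1M count=64

# 2) read it back to /dev/null to measure pure disk read speed
dd if="$SPOOL" of=/dev/null bs=1M

# 3) write random data straight to tape (needs real hardware, so commented):
# dd if=/dev/urandom of=/dev/nst0 bs=512k

# 4) write the spool file to tape:
# dd if="$SPOOL" of=/dev/nst0 bs=512k
```

On bs: 512k matches the 524288-byte blocks Bacula writes according to the log earlier in the thread. Note that /dev/urandom itself can be the bottleneck in step 3, so step 4 is the more meaningful tape number.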

Pierre







Re: [Bacula-users] Clarification on incremental files...

2024-01-25 Thread Martin Simmons
> On Wed, 24 Jan 2024 18:00:56 +0100, Marco Gaiarin said:
> 
> Suppose that the '5' or '1' options depend on the Signature setting; e.g., it
> is totally useless to have:
>   Options {
> Signature = MD5
> accurate = <...>1
>   }
> 
> so, calculating MD5 but checking SHA1.

In fact, I think that will check MD5.  The implementation can only store one
type of checksum in the catalog and incremental backups just check whatever was
stored (and assume it is the same type as in the original backup).

> But what does option 'i' mean? Compare THE inode number? Or the number of
> inodes of that particular file?

'i' is the st_ino field in the stat, i.e. the number that uniquely identifies
the data for the file in the file system.  Note that there is always exactly 1
inode that references the data for each file in a UNIX file system.

> Also, does 'n' mean soft or hard links? What is the relation between the 'i'
> and 'n' options?

'n' is the st_nlink field in the stat, i.e. the number of hard links to the
inode.  Nothing records the number of soft links.
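A quick illustration of the two fields on Linux (GNU stat, where %i prints st_ino and %h prints st_nlink):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a"
ln "$tmpdir/a" "$tmpdir/b"    # hard link: a second name for the same inode

stat -c 'inode=%i nlink=%h' "$tmpdir/a"
stat -c 'inode=%i nlink=%h' "$tmpdir/b"
# Both lines show the same inode number and nlink=2: one inode, two
# directory entries. A soft link would get its own inode and not bump nlink.
rm -rf "$tmpdir"
```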

> Anyway, I'm currently trying:
> 
>   Options {
> Signature = MD5
> accurate = pugsmcd5
>   }

I think you need to remove 'c' otherwise you will get the same results as
before when the ctime changes.

__Martin




Re: [Bacula-users] newbie: errors in baculum

2024-01-25 Thread Stefan G. Weichinger

seems I hit this:

https://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg72759.html

Is it advisable to use Bacularis instead now?




Re: [Bacula-users] newbie: errors in baculum

2024-01-25 Thread Stefan G. Weichinger

Am 25.01.24 um 11:58 schrieb Stefan G. Weichinger:


"Error code: 1000

Message: Internal error. TDbCommand failed to execute the query SQL " 
SELECT conname, consrc, contype, "


Googled that, unsure about it.



I find 
https://www.mail-archive.com/search?l=bacula-users@lists.sourceforge.net=subject:%22Re%5C%3A+%5C%5BBacula%5C-users%5C%5D+Baculum+api+installs%2C+but+throws+Error+1000%22=newest=1


but the linked PR/patch doesn't seem to match exactly ... it's 3 years old, so 
I assume it's no longer valid in my environment.


This is PostgreSQL-15.5 (Debian 15.5-0+deb12u1)




Re: [Bacula-users] LTO tape performances, again...

2024-01-25 Thread Marco Gaiarin


> 2) checked disk performance (data comes only from local disk); I currently have
>  3 servers, some perform better, some worse, but the best one has pretty
> decent read performance: at least 200 MB/s on random access (1500 MB/s
> sequential).

Jim Pollard asked me in a private email about controllers: I hadn't specified,
sorry, but the LTO units are connected to a dedicated SAS controller, not the
disks' one.

-- 
  ...I'm only a bit sorry for the Swiss: yesterday they played a lousy
  match, and today they get Bossi...(Piccia, 27/6/2006)






[Bacula-users] newbie: errors in baculum

2024-01-25 Thread Stefan G. Weichinger



greetings, bacula-users

I am a complete beginner with Bacula and am testing things on a Debian 12.4 
machine.


I can already write to and read from tape, using an autochanger ... 
looks good.


I maybe have a "mixed" setup: at first I installed from the debian 
repos, then I found the specific bacula repos for Debian.


Which repos to use for bacula and bookworm?

I added baculum:

# cat /etc/apt/sources.list.d/baculum.list
deb http://www.bacula.org/downloads/baculum/stable-11/debian bullseye main
deb-src http://www.bacula.org/downloads/baculum/stable-11/debian bullseye main


and was able to set up API and web ... although in the baculum-GUI I get 
database-related errors like:


"Error code: 1000

Message: Internal error. TDbCommand failed to execute the query SQL " 
SELECT conname, consrc, contype, "


Googled that, unsure about it.

Do I maybe have a DB that is too old now, coming from bacula-9.6?
Is it compatible at all?

Should I somehow upgrade bacula?

Should I start from scratch?

pls advise, thanks in advance, Stefan




Re: [Bacula-users] Autochangers and unload timeout...

2024-01-25 Thread Pierre Bernhardt

On 24.01.24 at 18:13, Marco Gaiarin wrote:

  24-Jan 17:22 cnpve3-sd JobId 16234: [SI0202] End of Volume "AAJ661L9" at 333:49131 on 
device "LTO9Storage0" (/dev/nst0). Write of 524288 bytes got -1.
  24-Jan 17:22 cnpve3-sd JobId 16234: Re-read of last block succeeded.
  24-Jan 17:22 cnpve3-sd JobId 16234: End of medium on Volume "AAJ661L9" 
Bytes=17,846,022,566,912 Blocks=34,038,588 at 24-Jan-2024 17:22.
  24-Jan 17:22 cnpve3-sd JobId 16234: 3307 Issuing autochanger "unload Volume 
AAJ661L9, Slot 2, Drive 0" command.
  24-Jan 17:28 cnpve3-sd JobId 16234: 3995 Bad autochanger "unload Volume AAJ661L9, 
Slot 2, Drive 0": ERR=Child died from signal 15: Termination
Results=Program killed by Bacula (timeout)
  24-Jan 17:28 cnpve3-sd JobId 16234: 3304 Issuing autochanger "load Volume 
AAJ660L9, Slot 1, Drive 0" command.
  24-Jan 17:29 cnpve3-sd JobId 16234: 3305 Autochanger "load Volume AAJ660L9, Slot 
1, Drive 0", status is OK.

So, the unload timed out, but the subsequent load command works as expected (and
backups are continuing...).

In mtx-changer.conf you can set debug_log=1 to create an mtx.log in the bacula
home dir, which should be /var/lib/bacula.
I'd set debug_level=100 to log everything.

Maybe the offline time is too low. I simply give it 900 seconds, to protect
against the drive needing more time than expected, although it almost always
needs less than 60 seconds.

offline_sleep should be 1.
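Putting those suggestions together, the relevant mtx-changer.conf entries might look like this (variable names as used in this thread; check them against the copy shipped with your version before relying on them):

```conf
# /etc/bacula/scripts/mtx-changer.conf
debug_log=1        # write mtx.log in the bacula home dir (/var/lib/bacula)
debug_level=100    # log everything
offline_sleep=1    # wait after the offline command before unloading
offline_time=900   # generous ceiling; the drive usually needs < 60 seconds
```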

By the way, I have been using the mtx-changer script untouched for years; I
found that in mine the parameters are not used in the waiting loop:

# The purpose of this function to wait a maximum
#   time for the drive. It will
#   return as soon as the drive is ready, or after
#   waiting a maximum of 900 seconds.
# Note, this is very system dependent, so if you are
#   not running on Linux, you will probably need to
#   re-write it, or at least change the grep target.
#   We've attempted to get the appropriate OS grep targets
#   in the code at the top of this script.
#
wait_for_drive() {
  i=0
  while [ $i -le 900 ]; do  # Wait max 900 seconds
if mt -f $1 status 2>&1 | grep "${ready}" >/dev/null 2>&1; then
  stinit 2>/dev/null >/dev/null
  break
fi
debug $dbglvl "Device $1 - not ready, retrying..."
sleep 1
i=`expr $i + 1`
  done
}

By the way, I'm no longer sure that this is still the state of the distributed
mtx-changer script.
Normally I would expect something like this in the while statement:

   while [ ${offline_sleep} -eq 1 ] && [ $i -le ${offline_time} ]; do  # Wait
max ${offline_time} seconds

(untested)

Cheers,
Pierre



