[Bacula-users] ongoing problems with v 9.0.3

2018-06-13 Thread Jerry Lowry
Hi,

Each time I have to change a disk during my offsite backups I get errors
from the job that is running and it fails. My storage and pool definitions
follow below, they have not changed for the last 10 years and have been
working without any error or problems up until I upgraded to 9.0.3 of
bacula and migrated the database to MariaDB 10.2.8-1.
If any other configuration files are needed I can add them.  I lose data
on each of these backups because of these errors.

Any help with this would be great,

thanks,
jerry

# Definition of file storage device
Storage {
  Name = midswap  # offsite disk
# Do not use "localhost" here
  #Address = kilchis  # N.B. Use a fully qualified name here
  Address = kilchis  # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = ""
  Device = MidSwap
  Media Type = File
}
# File Pool definition
Pool {
  Name = OffsiteMid
  Pool Type = Copy
  Next Pool = OffsiteMid
  Storage = midswap
  Recycle = yes   # Bacula can automatically recycle Volumes
  AutoPrune = yes # Prune expired volumes
  Volume Retention = 30 years # thirty years
  Maximum Volume Bytes = 1800G   # Limit Volume to disk size
  Maximum Volumes = 10   # Limit number of Volumes in Pool
}

---

emails sent at disk full message:


13-Jun 17:52 kilchis JobId 37853: Job BackupUsers.2018-06-12_23.47.07_32 is
waiting. Cannot find any appendable volumes.

Please use the "label" command to create a new Volume for:

Storage:  "MidSwap" (/MidSwap)

Pool: OffsiteMid

Media type:   File


13-Jun 17:52 kilchis JobId 37851: Fatal error: Out of freespace caused End
of Volume "homeMS-5" at 981661189531 on device "MidSwap" (/MidSwap). Write
of 64512 bytes got 10853.

13-Jun 17:52 kilchis JobId 37851: Elapsed time=02:59:41, Transfer
rate=67.40 M Bytes/second

12-Jun 23:47 kilchis-dir JobId 37850: Copying using JobId=37780
Job=BackupUsers.2018-06-09_20.05.00_18

13-Jun 14:52 kilchis-dir JobId 37850: Start Copying JobId 37850,
Job=CopyHMDiskToDisk.2018-06-12_23.47.07_29

13-Jun 14:52 kilchis-dir JobId 37850: Using Device "Home" to read.

13-Jun 14:52 kilchis JobId 37850: Ready to read from volume "home-6" on
File device "Home" (/engineering/Home).

13-Jun 14:52 kilchis JobId 37850: Forward spacing Volume "home-6" to
addr=824369125834

13-Jun 17:39 kilchis JobId 37850: End of Volume "home-6"
at addr=1503238496266 on device "Home" (/engineering/Home).

13-Jun 17:39 kilchis JobId 37850: Ready to read from volume "home-7" on
File device "Home" (/engineering/Home).

13-Jun 17:39 kilchis JobId 37850: Forward spacing Volume "home-7" to
addr=215

13-Jun 17:52 kilchis JobId 37850: Error: bsock.c:649 Write error
sending 65540 bytes to client:10.20.10.21:9103: ERR=Connection reset by
peer

13-Jun 17:52 kilchis JobId 37850: Fatal error: read.c:277 Error
sending to File daemon. ERR=Connection reset by peer

13-Jun 17:52 kilchis JobId 37850: Elapsed time=02:59:42, Transfer
rate=67.39 M Bytes/second

13-Jun 17:52 kilchis JobId 37850: Error: bsock.c:537 Socket has errors=1 on
call to client:10.20.10.21:9103

13-Jun 17:52 kilchis JobId 37850: Error: bsock.c:537 Socket has errors=1 on
call to client:10.20.10.21:9103

13-Jun 17:52 kilchis-dir JobId 37850: Error: Bacula kilchis-dir 9.0.6 (20Nov17):




  Build OS:               x86_64-pc-linux-gnu redhat
  Prev Backup JobId:      37780
  Prev Backup Job:        BackupUsers.2018-06-09_20.05.00_18
  New Backup JobId:       37851
  Current JobId:          37850
  Current Job:            CopyHMDiskToDisk.2018-06-12_23.47.07_29
  Backup Level:           Full
  Client:                 kilchis-fd
  FileSet:                "Mid Set" 2011-04-11 13:13:32
  Read Pool:              "HomePool" (From Command input)
  Read Storage:           "home" (From Job resource)
  Write Pool:             "OffsiteMid" (From Command input)
  Write Storage:          "midswap" (From Command input)
  Catalog:                "MyCatalog" (From Client resource)
  Start time:             13-Jun-2018 14:52:20
  End time:               13-Jun-2018 17:52:04
  Elapsed time:           2 hours 59 mins 44 secs
  Priority:               10
  SD Files Written:       1,784,587
  SD Bytes Written:       726,665,971,203 (726.6 GB)
  Rate:                   67383.7 KB/s
  Volume name(s):         homeMS-5
  Volume Session Id:      82
  Volume Session Time:    1528397911
  Last Volume Bytes:      981,661,189,531 (981.6 GB)
  SD Errors:              3
  SD termination status:  Error
  Termination:            *** Copying Error ***
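
The figures in the report above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not a diagnosis: it compares "Last Volume Bytes" against the pool's Maximum Volume Bytes and recomputes the reported rate. It assumes that Bacula reads the "G" suffix as 2**30 bytes; if your version treats it as 10**9, the fill fraction comes out slightly higher but the conclusion is the same.

```python
# Assumption: Bacula interprets the "G" size suffix as 2**30 bytes.
GIB = 2**30

max_volume_bytes = 1800 * GIB          # Maximum Volume Bytes = 1800G (Pool resource)
last_volume_bytes = 981_661_189_531    # "Last Volume Bytes" from the job report
sd_bytes_written = 726_665_971_203     # "SD Bytes Written" from the job report
elapsed_seconds = 2 * 3600 + 59 * 60 + 44  # "Elapsed time: 2 hours 59 mins 44 secs"

# How full was homeMS-5, relative to its configured cap, when the disk ran out?
fill_fraction = last_volume_bytes / max_volume_bytes

# Recompute the transfer rate in KB/s (1 KB = 1000 bytes, matching the report).
rate_kb_per_s = sd_bytes_written / elapsed_seconds / 1000

print(f"volume was {fill_fraction:.1%} of Maximum Volume Bytes at failure")
print(f"rate: {rate_kb_per_s:.1f} KB/s")
```

Under that assumption the volume was only about half of its configured maximum when the "Out of freespace" error was raised, which suggests the device filled up before Bacula reached the volume size limit and could ask for the next volume.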
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-use

Re: [Bacula-users] Trying to restore, but can't

2018-06-13 Thread Ryan Butler
I did some looking into the index thing and found that I had some extra indices 
on the File table that apparently were crippling performance.

These two pages led me in the right direction:

http://www.bacula.org/7.0.x-manuals/en/main/Catalog_Maintenance.html#SECTION004392000

http://wiki.bacula.org/doku.php?id=faq#restore_takes_a_long_time_to_retrieve_sql_results_from_mysql_catalog

Once I dropped the extra indices, the "Building directory tree" step took about 
1 minute (even with 3 million files, even with only 16GB RAM).

-ryan
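
For readers who want to repeat the index check described above, here is a minimal sketch. It takes rows shaped like the output of MySQL's `SHOW INDEX FROM File` (the Key_name, Seq_in_index and Column_name columns) and reports any index that is not in an expected set. The expected set is an assumption based on my reading of the catalog maintenance page linked above, and the sample rows are hypothetical; check both against the manual for your Bacula version before dropping anything.

```python
from collections import defaultdict

# Assumption: the indexes the catalog maintenance page recommends for the
# MySQL File table. Verify against the manual for your Bacula version.
EXPECTED = {
    ("FileId",),                        # primary key
    ("JobId",),
    ("JobId", "PathId", "FilenameId"),
}

def extra_indexes(rows):
    """Return {index_name: columns} for indexes not in EXPECTED.

    rows: iterable of (Key_name, Seq_in_index, Column_name) tuples,
    e.g. selected from the output of SHOW INDEX FROM File.
    """
    columns = defaultdict(list)
    for key_name, seq, column in rows:
        columns[key_name].append((seq, column))
    extras = {}
    for key_name, cols in columns.items():
        ordered = tuple(col for _, col in sorted(cols))
        if ordered not in EXPECTED:
            extras[key_name] = ordered
    return extras

# Hypothetical rows for illustration only, not taken from Ryan's catalog.
sample_rows = [
    ("PRIMARY", 1, "FileId"),
    ("JobId", 1, "JobId"),
    ("JobId_2", 1, "JobId"),
    ("JobId_2", 2, "PathId"),
    ("JobId_2", 3, "FilenameId"),
    ("PathId", 1, "PathId"),
    ("FilenameId", 1, "FilenameId"),
]

print(extra_indexes(sample_rows))
```

The function only reports candidates; whether to drop an index is still a decision to make with the manual in hand.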

From: Josh Fisher 
Sent: Wednesday, June 13, 2018 3:59 AM
To: Ryan Butler ; bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Trying to restore, but can't


On 6/12/2018 12:35 PM, Ryan Butler wrote:
Hello all,

After a failed upgrade to one of our organization's web apps, I'm trying to 
restore the web app directory from Bacula. I went into bconsole, restore, 
option 3 (enter jobid), and have been sitting on "Building directory tree for 
JobId(s) 56810 ..." for the past 12 hours. The MySQL process is using 100% CPU 
on one core, but other than that, I have no indication that anything is 
happening. And this is something that I need to figure out why it is happening, 
but what I really need right now is to just get the web app restored so users 
don't start getting angry.

100% CPU on one core with MySQL is often due to a missing index.



I really just need to restore a single directory (and all of its files and 
subdirectories), so I've been looking at option 11 (enter a list of directories 
to restore for jobid), and it looks like I'd have to manually specify all the 
subdirectories, which is doable. However, would that option also need to "Build 
directory tree", and therefore I'd be back at the original problem?

Well, option 11 should use a different query than option 3, so if the problem 
is a missing index, then option 11 may well work better than option 3.



We're on version 7.0.5, which I know is old, and I have plans to update it 
after this disaster.

I suggest moving the catalog to postgres on SSD storage, while you're at it.



Here's the details on that particular Job:

JobFiles: 2,927,789
JobBytes: 54,769,194,913 (54 GB)

Any help is appreciated!!!

Ryan Butler
Systems Administrator



Re: [Bacula-users] Trying to restore, but can't

2018-06-13 Thread Dimitri Maziuk via Bacula-users

On 6/13/2018 5:59 AM, Josh Fisher wrote:


I suggest moving the catalog to postgres on SSD storage, while you're at it.


Mine's on spinning rust and it still works OK given enough RAM.

I think another issue here is the expectations: we ran a one-CPU 
(relatively) low-RAM test server for a while and saw that in testing. So 
when we started using bacula in production we already knew restores 
could take a while.


(And ditch mysql: allegedly it can be tuned up for tolerable 
performance, but why bother when there's postgres.)


Dima



[Bacula-users] Why are copy jobs creating incremental jobs?

2018-06-13 Thread Mariusz Mazur
Hi, I'm running bacula 7.4.7 and have a job to occasionally copy recent
full jobs from my full pool to my tapes (Full-Pool -> Tape-Pool).

The gist of the copy job is here:

Name = CopyFull2Tape
Type = Copy
Level = Full
Pool = Full-Pool
Selection Type = SQL Query
Selection Pattern = "
select max(j.jobid) from job j, pool p where
p.name='Full-Pool' and j.poolid=p.poolid and
j.jobstatus='T' and j.type='B' and j.level='F' and j.jobbytes>0 and
starttime>now()-'3 weeks'::interval
group by j.name;"

So I'm explicitly only copying completed full jobs. And yet, the director
gives me this:

27888  Copy Full  0 0  CopyFull2Tape is waiting on max Storage jobs
27902  Copy Full  0 0  CopyFull2Tape is running
27903  Back Incr  0 0  ca2-regular   is running
27904  Copy Full  0 0  CopyFull2Tape is waiting on max Storage jobs
27905  Back Incr  0 0  db5-regular   is waiting execution
27906  Copy Full  0 0  CopyFull2Tape is waiting on max Storage jobs
27907  Back Incr  0 0  dbc1n1-cfgis waiting execution
27908  Copy Full  0 0  CopyFull2Tape is waiting on max Storage jobs

What are those 'Back Incr' jobs? It's confusing.
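
One pattern in the listing is worth checking before drawing conclusions: each 'Back' entry has a JobId exactly one higher than a 'Copy' entry. The copy report in the first message of this digest shows the same shape, with Current JobId 37850 and New Backup JobId 37851. The sketch below only verifies that numbering pattern on the JobIds copied from the listing; it does not prove that the 'Back Incr' entries are the backup records created by the copy jobs, which nobody in this thread confirms.

```python
# (JobId, type) pairs copied from the director status listing above.
jobs = [
    (27888, "Copy"), (27902, "Copy"), (27903, "Back"), (27904, "Copy"),
    (27905, "Back"), (27906, "Copy"), (27907, "Back"), (27908, "Copy"),
]

def copy_backup_pairs(jobs):
    """Return (copy_id, back_id) pairs where a Back JobId directly follows a Copy JobId."""
    copy_ids = {job_id for job_id, kind in jobs if kind == "Copy"}
    return [
        (job_id - 1, job_id)
        for job_id, kind in jobs
        if kind == "Back" and job_id - 1 in copy_ids
    ]

def unpaired_backups(jobs):
    """Return Back JobIds that do not directly follow a Copy JobId."""
    paired = {back_id for _, back_id in copy_backup_pairs(jobs)}
    return [job_id for job_id, kind in jobs if kind == "Back" and job_id not in paired]

print(copy_backup_pairs(jobs))
print(unpaired_backups(jobs))
```

Every 'Back' entry in this listing pairs with a preceding 'Copy' entry, and none is left over, which is consistent with the jobs being created in pairs.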


Re: [Bacula-users] Trying to restore, but can't

2018-06-13 Thread Josh Fisher


On 6/12/2018 12:35 PM, Ryan Butler wrote:


Hello all,

After a failed upgrade to one of our organization’s web apps, I’m 
trying to restore the web app directory from Bacula. I went into 
bconsole, restore, option 3 (enter jobid), and have been sitting on 
“Building directory tree for JobId(s) 56810 …” for the past 12 hours. 
The MySQL process is using 100% CPU on one core, but other than that, 
I have no indication that anything is happening. And this is something 
that I need to figure out why it is happening, but what I really need 
right now is to just get the web app restored so users don’t start 
getting angry.




100% CPU on one core with MySQL is often due to a missing index.

I really just need to restore a single directory (and all of its files 
and subdirectories), so I’ve been looking at option 11 (enter a list 
of directories to restore for jobid), and it looks like I’d have to 
manually specify all the subdirectories, which is doable. However, 
would that option also need to “Build directory tree”, and therefore 
I’d be back at the original problem?




Well, option 11 should use a different query than option 3, so if the 
problem is a missing index, then option 11 may well work better than 
option 3.


We’re on version 7.0.5, which I know is old, and I have plans to 
update it after this disaster.




I suggest moving the catalog to postgres on SSD storage, while you're at it.


Here’s the details on that particular Job:

JobFiles: 2,927,789

JobBytes: 54,769,194,913 (54 GB)

Any help is appreciated!!!

*Ryan Butler*

Systems Administrator
