Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread Marco van Wieringen
Felix Schwarz writes:

> 
> > First of all, SQLite is just a proof of concept and should not be
> > used for production; use a proper database like MySQL or PostgreSQL.
> 
> I feared as much. I like sqlite for "simple databases" (and I figured Bacula's
> DB would be one of those) as it is completely maintenance-free. Well, I
> switched to PostgreSQL 8.4 now.
That is still a seriously old PostgreSQL version, but better than SQLite.
There is nothing wrong with SQLite as such; it is just not very good in
Bacula when there are lots of files.

> 
> Still a full backup after a base backup does not work for me... With Postgres
> it's a bit different though:
> I see that the fd sends some data back but after some minutes the fd goes
> somehow into "idle":
>  - backup is still running according to bconsole (for dir+sd+fd)
>  - almost no CPU/IO on dir/sd/fd anymore
> 
> strace on the client (Bacula 5.2.11 on Fedora 17) shows that it just does some
> writes until it fails with EAGAIN and then waits in a select again. The client
> doesn't seem to recover.
> 
strace is about the worst way of determining what is going on. If you really
want to know what is going on, I would run the fd/sd/dir with -f -d 100
(foreground, debug level 100) to debug it.
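
For reference, the invocation suggested above can be spelled out per daemon.
The following is a minimal sketch that only assembles and prints the command
lines. The configuration paths are assumptions (typical package defaults) and
are not taken from this thread; -f (foreground), -d (debug level) and -c
(config file) are the daemons' usual command-line switches.

```python
import shlex

# Hypothetical config locations; adjust to wherever your packages install them.
DAEMONS = {
    "bacula-dir": "/etc/bacula/bacula-dir.conf",
    "bacula-sd": "/etc/bacula/bacula-sd.conf",
    "bacula-fd": "/etc/bacula/bacula-fd.conf",
}


def debug_command(daemon, debug_level=100):
    """Build the argv for running one daemon in the foreground with debug output."""
    return [daemon, "-f", "-d", str(debug_level), "-c", DAEMONS[daemon]]


# Print one command line per daemon; run each in its own terminal.
for name in DAEMONS:
    print(shlex.join(debug_command(name)))
```

Stop the regular service for a daemon before starting it by hand this way,
otherwise the two instances will compete for the same port.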

Marco


--
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread Felix Schwarz
On 05.11.2012 15:20, Felix Schwarz wrote:
> Btw: What does 'Spool Attributes = yes/no' do? I wasn't able to find much info
> in the docs on that.

Scratch that, I found it ("SpoolAttributes") after I remembered that
whitespace is not significant in the option names.
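
The rule mentioned here can be illustrated with a toy parser. This is only a
sketch of the idea, not Bacula's actual parser: it assumes that directive
keywords are compared after removing whitespace and ignoring case, so both
spellings map to the same key. The function names are made up for the example.

```python
def normalize_keyword(keyword):
    """Drop all whitespace and fold case so spelling variants compare equal."""
    return "".join(keyword.split()).lower()


def parse_directive(line):
    """Split 'Keyword = value' into (normalized keyword, value)."""
    keyword, _, value = line.partition("=")
    return normalize_keyword(keyword), value.strip()


# Both spellings end up under the same key.
print(parse_directive("Spool Attributes = yes"))
print(parse_directive("SpoolAttributes = yes"))
```

This is also why searching the manual for "Spool Attributes" with a space can
miss a section that spells the directive as one word.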

fs




Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread Felix Schwarz
On 05.11.2012 11:12, Marco van Wieringen wrote:
> Looks like it's busy updating the SQLite database with backup info.
> You probably have configured the system with attribute spooling, which
> means that after the backup is done the director needs to update the
> database with all the file data to be able to restore your backup.

Btw: What does 'Spool Attributes = yes/no' do? I wasn't able to find much info
in the docs on that.

> First of all, SQLite is just a proof of concept and should not be
> used for production; use a proper database like MySQL or PostgreSQL.

I feared as much. I like sqlite for "simple databases" (and I figured Bacula's
DB would be one of those) as it is completely maintenance-free. Well, I
switched to PostgreSQL 8.4 now.

Still a full backup after a base backup does not work for me... With Postgres
it's a bit different though:
I see that the fd sends some data back but after some minutes the fd goes
somehow into "idle":
 - backup is still running according to bconsole (for dir+sd+fd)
 - almost no CPU/IO on dir/sd/fd anymore

strace on the client (Bacula 5.2.11 on Fedora 17) shows that it just does some
writes until it fails with EAGAIN and then waits in a select again. The client
doesn't seem to recover.

I'll try to play around with the accurate/spool settings to see if that makes
any difference.

fs

[pid 20195] select(6, NULL, [5], NULL, {10, 0} <unfinished ...>
[pid 20212] <... select resumed> )  = 0 (Timeout)
[pid 20212] select(6, [5], NULL, NULL, {5, 0} <unfinished ...>
[pid 19992] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 19992] futex(0x35f8a5ea20, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 19992] futex(0x35f8a5ea64, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 93, {1352124663, 548817000}, <unfinished ...>
[pid 20195] <... select resumed> )  = 1 (out [5], left {2, 191603})
[pid 20195] write(5, "\367W\205\232/t\2619C\253\354\232^;\377\0322\262\273\25G\264\22\364\4\n\232\371Q\223;\240"..., 74) = 74
[pid 20195] fcntl(5, F_SETFL, O_RDWR)   = 0
[pid 20195] fcntl(5, F_GETFL)   = 0x2 (flags O_RDWR)
[pid 20195] fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 20195] write(5, "\27\3\1\0 \1\315\247\267\321\353\341+\274\27\340\251\302q\250\16\230\376\7\211|\320ds\353e\377"..., 154) = 80
[pid 20195] write(5, "\367W\205\232/t\2619C\253\354\232^;\377\0322\262\273\25G\264\22\364\4\n\232\371Q\223;\240"..., 74) = -1 EAGAIN (Resource temporarily unavailable)
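
The write/EAGAIN/select pattern in the trace is what a non-blocking socket
does when the peer stops reading: the kernel send buffer fills, write() fails
with EAGAIN, and select() waits for the socket to become writable again. A
minimal sketch that reproduces the pattern with a local socket pair (it does
not involve Bacula or TLS, and the helper name is hypothetical):

```python
import select
import socket


def fill_until_eagain(sock, chunk=b"x" * 65536):
    """Write to a non-blocking socket until the kernel buffer is full."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK
            return sent


writer, reader = socket.socketpair()
writer.setblocking(False)

# The reader does not read yet, so the writer eventually hits EAGAIN.
sent = fill_until_eagain(writer)
writable_when_full = bool(select.select([], [writer], [], 0.2)[1])
print(f"sent {sent} bytes before EAGAIN, writable now: {writable_when_full}")

# Once the peer drains its side, select() reports the socket writable again.
drained = 0
while drained < sent:
    drained += len(reader.recv(65536))
writable_after_drain = bool(select.select([], [writer], [], 1.0)[1])
print(f"writable after peer drained: {writable_after_drain}")

writer.close()
reader.close()
```

Seen this way, the EAGAIN by itself is not an error in the client. The open
question is why the other end of the connection stopped reading.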



Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread lst_hoe02

Quoting Radosław Korzeniewski:

> Hello,
>
> 2012/11/5 
>>
>> No, we have around 3.5 billion files in some jobs without problems,
>>
>
> Wow, 3.5 _billion_ files on a single job. Amazing.
>

Sorry, I was confused by million/billion (short/long scale). I should have
been more precise: 3.5 x 10^6 (German "Million", not "Milliarde").

>> but we use PostgreSQL as backend db and don't do base jobs.
>
>
> Could you share some information about your setup? It is very interesting
> how Bacula and PostgreSQL can handle this kind of _huge_ backup job. What
> hardware? What archive device (disk? tape? both?). I can imagine that you
> should have no less than 10 _billion_ files in your catalog. The largest
> Bacula deployment I have ever seen was a few hundred million files in a whole
> catalog, not a single job.

So we are, depending on the short/long scale you have in mind, smaller than
expected by a factor of 1,000 or 1,000,000 ;-)

The real numbers are only 1.45 TB and 3,500,000 files for one machine.
The others are somewhat smaller, but there is still another with
1,200,000 files.

The whole catalog is around 7GB as of now, so no problem.

So let's repeat: I will never use ambiguous "billion" again...
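
The size of the mix-up is easy to check. A small worked example, assuming the
usual definitions (short-scale billion = 10^9, which is the German
"Milliarde"; long-scale billion = 10^12, which is the German "Billion"):

```python
MILLION = 10**6
BILLION = {"short scale": 10**9, "long scale": 10**12}

# 3.5 x 10^6 files on the largest client, as stated in the message.
actual_files = 35 * MILLION // 10

factors = {}
for scale, billion in BILLION.items():
    misread = 35 * billion // 10  # what "3.5 billion" means on this scale
    factors[scale] = misread // actual_files
    print(f"{scale}: 3.5 billion = {misread:,} files, "
          f"{factors[scale]:,} times the real count")
```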

Regards

Andreas





Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread Radosław Korzeniewski
Hello,

2012/11/5 
>
> No, we have around 3.5 billion files in some jobs without problems,
>

Wow, 3.5 _billion_ files on a single job. Amazing.


> but we use PostgreSQL as backend db and don't do base jobs.


Could you share some information about your setup? It is very interesting
how Bacula and PostgreSQL can handle this kind of _huge_ backup job. What
hardware? What archive device (disk? tape? both?). I can imagine that you
should have no less than 10 _billion_ files in your catalog. The largest
Bacula deployment I have ever seen was a few hundred million files in a whole
catalog, not a single job.

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net


Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread lst_hoe02

Quoting Felix Schwarz:

> Hi Luis,
>
> On 03.11.2012 12:57, Luis H. Forchesatto wrote:
>> Quantos arquivos são backupeados?
> "How many files are backed up?"
>
> (I used Google translate, let's hope I understood you correctly)
>
> I checked the base backup job and that one had ~700k files with a total of
> ~65 GB (uncompressed). That shouldn't be too hard for Bacula, should it?
>

No, we have around 3.5 billion files in some jobs without problems,
but we use PostgreSQL as the backend DB and don't do base jobs. You should
have a look at your SQLite settings and first test without base jobs.
If your backups work, add base jobs to the setup and see what happens.

Regards

Andreas





Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-05 Thread Marco van Wieringen
Felix Schwarz writes:

> "\r\0\0\0\3\0\241\0\2\335\1\277\0\241\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
> [pid  1827] lseek(10, 40020992, SEEK_SET <unfinished ...>
> [pid  2636] <... nanosleep resumed> NULL) = 0
> [pid  1827] <... lseek resumed> )   = 40020992
> [pid  1468] <... nanosleep resumed> NULL) = 0
> [pid  2636] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
> [pid  1827] read(10, <unfinished ...>
> [pid  1468] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
> [pid  2636] <... access resumed> )  = 0
> [pid  1827] <... read resumed> "\r\0\0\0\3\0\226\0\2\340\1\275\0\226\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
> [pid  1468] <... access resumed> )  = 0
> 
Looks like it's busy updating the SQLite database with backup info.
You probably have configured the system with attribute spooling, which
means that after the backup is done the director needs to update the
database with all the file data to be able to restore your backup.

First of all, SQLite is just a proof of concept and should not be
used for production; use a proper database like MySQL or PostgreSQL.
The SQLite code is also not very smart when it comes to doing lots
of updates, and those databases are known to need some bandwidth from
your disks. Given that it is also using a journal, there could very
well be some severe contention on that journal when inserting lots of
files.
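
To illustrate the journal effect described above, the following sketch
inserts the same rows into an on-disk SQLite database twice: once committing
after every row, once inside a single transaction. It is not Bacula's catalog
code; the one-column table, the file names and the row count are made up for
the example, and the timings depend entirely on the disk.

```python
import os
import sqlite3
import tempfile
import time


def timed_insert(db_path, names, one_transaction):
    """Insert names into a toy table and return (row count, seconds taken)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS file_demo (name TEXT)")
    conn.commit()
    start = time.perf_counter()
    if one_transaction:
        with conn:  # one transaction, one commit, one pass through the journal
            conn.executemany(
                "INSERT INTO file_demo (name) VALUES (?)", [(n,) for n in names]
            )
    else:
        for n in names:
            conn.execute("INSERT INTO file_demo (name) VALUES (?)", (n,))
            conn.commit()  # every commit creates and syncs the journal again
    elapsed = time.perf_counter() - start
    rows = conn.execute("SELECT COUNT(*) FROM file_demo").fetchone()[0]
    conn.close()
    return rows, elapsed


names = [f"/data/file{i}" for i in range(200)]
with tempfile.TemporaryDirectory() as tmp:
    per_row_rows, per_row_time = timed_insert(
        os.path.join(tmp, "per_row.db"), names, one_transaction=False
    )
    batched_rows, batched_time = timed_insert(
        os.path.join(tmp, "batched.db"), names, one_transaction=True
    )

print(f"commit per row:     {per_row_rows} rows in {per_row_time:.3f}s")
print(f"single transaction: {batched_rows} rows in {batched_time:.3f}s")
```

Both runs store the same rows; only the number of commits differs. With
several hundred thousand file records per job, that difference is where a
journal-based database can spend most of its time.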

Marco




Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-04 Thread Felix Schwarz
Hi Luis,

On 03.11.2012 12:57, Luis H. Forchesatto wrote:
> Quantos arquivos são backupeados?
"How many files are backed up?"

(I used Google translate, let's hope I understood you correctly)

I checked the base backup job and that one had ~700k files with a total of ~65
GB (uncompressed). That shouldn't be too hard for Bacula, should it?

fs



Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-03 Thread Luis H. Forchesatto
Greetings.

How many files are being backed up?

Depending on that, it could be related to the number of files it has to save
in the database and/or process. Combine that with possible compression or
encryption and you have a practically unusable server.

Regards.

2012/11/2 Felix Schwarz 

> Hey,
>
> On 01.11.2012 22:16, Domen Kožar wrote:
> > Can you show configuration? Do you have compression enabled?
>
> Here's the fileset:
>
> FileSet {
>   Name = "data"
>   Include {
>     Options {
>       signature = SHA1
>       compression = GZIP
>       portable = yes
>       aclsupport = yes
>       xattrsupport = yes
>     }
>
> The (regular) job has 'Accurate = yes', the base job doesn't.
>
> Which config do you need in addition?
>
> fs
>
>





Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-02 Thread Felix Schwarz
Hey,

On 01.11.2012 22:16, Domen Kožar wrote:
> Can you show configuration? Do you have compression enabled?

Here's the fileset:

FileSet {
  Name = "data"
  Include {
    Options {
      signature = SHA1
      compression = GZIP
      portable = yes
      aclsupport = yes
      xattrsupport = yes
    }

The (regular) job has 'Accurate = yes', the base job doesn't.

Which config do you need in addition?

fs



Re: [Bacula-users] Bacula director consumes all CPU after full backup

2012-11-01 Thread Domen Kožar
Can you show configuration? Do you have compression enabled?


On Thu, Nov 1, 2012 at 9:45 PM, Felix Schwarz
wrote:

> Hi,
>
> I'm running Bacula 5.2.12 (RPMs distributed by Simone Caronni on
> repos.fedorapeople.org) on CentOS 6 (x86_64) with a sqlite catalog.
>
> I defined a base backup job and a regular backup job. The base job worked
> fine, but after the subsequent full backup the Bacula director "hangs": it
> consumes all CPU (for at least 4 hours) and I can't start new jobs through
> bconsole.
>
> Some observations/background info:
> - The client/sd tell me that the backup was successful; however, the
>   director lists the job as still running (with status 'terminated').
> - With a new bconsole connection I can issue some queries on the director,
>   so it's at least partially alive.
> - My setup is a pretty simple one currently: disk-based backup, about
>   60-100 GB of data, all communication encrypted with TLS certificates
>   (though not the backups themselves).
>
> I attached the status output as well as a short snippet of an strace on the
> bacula-director.
>
> My questions:
> - Is that a known bug? (or a bug at all?)
> - Which additional information should I provide you?
>
> fs
>
>


[Bacula-users] Bacula director consumes all CPU after full backup

2012-11-01 Thread Felix Schwarz
Hi,

I'm running Bacula 5.2.12 (RPMs distributed by Simone Caronni on
repos.fedorapeople.org) on CentOS 6 (x86_64) with a sqlite catalog.

I defined a base backup job and a regular backup job. The base job worked fine,
but after the subsequent full backup the Bacula director "hangs": it consumes
all CPU (for at least 4 hours) and I can't start new jobs through bconsole.

Some observations/background info:
- The client/sd tell me that the backup was successful; however, the director
  lists the job as still running (with status 'terminated').
- With a new bconsole connection I can issue some queries on the director, so
  it's at least partially alive.
- My setup is a pretty simple one currently: disk-based backup, about
  60-100 GB of data, all communication encrypted with TLS certificates (though
  not the backups themselves).

I attached the status output as well as a short snippet of an strace on the
bacula-director.

My questions:
- Is that a known bug? (or a bug at all?)
- Which additional information should I provide you?

fs
ws3-client Version: 5.2.11 (10 September 2012)  x86_64-redhat-linux-gnu redhat Miracle)
Daemon started 25-Okt-12 14:12. Jobs: run=1 running=0.
...
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
    10  Full    732,660    1.967 G  OK       01-Nov-12 17:24 ws3


server4-sd Version: 5.2.12 (12 September 2012) x86_64-redhat-linux-gnu redhat Enterprise release
Daemon started 01-Nov-12 16:06. Jobs: run=3, running=0.
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
...
    10  Full    732,660    2.117 G  OK       01-Nov-12 17:24 ws3


server4-dir Version: 5.2.12 (12 September 2012) x86_64-redhat-linux-gnu redhat Enterprise release
Daemon started 01-Nov-12 16:48. Jobs: run=2, running=1 mode=0,0
Running Jobs:
Console connected at 01-Nov-12 17:26
Console connected at 01-Nov-12 17:28
Console connected at 01-Nov-12 17:29
Console connected at 01-Nov-12 21:11
 JobId Level   Name                       Status
======================================================================
    10 Full    ws3.2012-11-01_16.59.29_07 has terminated


[pid  1827] read(10, "\r\0\0\0\3\0t\0\2\322\1\240\0t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40014848, SEEK_SET) = 40014848
[pid  1827] read(10, "\r\0\0\0\3\0u\0\2\317\1\243\0u\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40015872, SEEK_SET) = 40015872
[pid  1827] read(10, "\r\0\0\0\3\0{\0\2\321\1\250\0{\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40016896, SEEK_SET) = 40016896
[pid  1827] read(10, "\r\0\0\0\3\0w\0\2\322\1\245\0w\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40017920, SEEK_SET) = 40017920
[pid  1827] read(10, "\r\0\0\0\3\0\222\0\2\333\1\271\0\222\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40018944, SEEK_SET) = 40018944
[pid  1827] read(10, "\r\0\0\0\3\0\210\0\2\332\1\263\0\210\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40019968, SEEK_SET) = 40019968
[pid  1827] read(10, "\r\0\0\0\3\0\241\0\2\335\1\277\0\241\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1827] lseek(10, 40020992, SEEK_SET <unfinished ...>
[pid  2636] <... nanosleep resumed> NULL) = 0
[pid  1827] <... lseek resumed> )   = 40020992
[pid  1468] <... nanosleep resumed> NULL) = 0
[pid  2636] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
[pid  1827] read(10, <unfinished ...>
[pid  1468] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
[pid  2636] <... access resumed> )  = 0
[pid  1827] <... read resumed> "\r\0\0\0\3\0\226\0\2\340\1\275\0\226\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid  1468] <... access resumed> )  = 0
