Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/9/2010 8:16 PM, Dan Langille wrote:
> On 8/9/2010 4:23 PM, Rory Campbell-Lange wrote:
>> On 04/08/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
>>> On 03/08/10, Dan Langille (d...@langille.org) wrote:
>>>> On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
>>>>> Yes, a batch insert is faster than a specific insert, but the latter
>>>>> should be done at the "written-to-tape" transaction time, and could be
>>>>> done asynchronously, but in a transaction.
>>>>
>>>> So... don't use batch-insert. Go the single insert approach. I dare
>>>> you. ;)
>>>
>>> Yes, I'm trying! I'm trying to do it properly by recompiling Debian
>>> stable backports and I'm running into library dependency problems.
>>
>> Done. I've recompiled with batch inserts set to off.
>>
>> I tested spooling to see how this would work, although it isn't strictly
>> necessary for my situation (a single server with AoE and internal
>> storage and a locally attached tape drive). The backup started running
>> off disk at around 100MB/s and then spooling at around 100MB/s. The disk
>> copy slowed down dramatically over the weekend due to contention from
>> some external MD5 audits and rsync processes.
>
> Well, spooling data to disk first makes sense if your network cannot
> keep up with your tape drive. You want to avoid start/stop on the tape.
>
> Spooling attributes is different. You may want to try that on and then
> off to see how things go. Off is what we wanted to avoid, I think.

Oh... Spooling Attributes = No is what you want to use, I think.

--
Dan Langille - http://langille.org/

--
This SF.net email is sponsored by
Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/9/2010 4:23 PM, Rory Campbell-Lange wrote:
> On 04/08/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
>> On 03/08/10, Dan Langille (d...@langille.org) wrote:
>>> On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
>>>> Yes, a batch insert is faster than a specific insert, but the latter
>>>> should be done at the "written-to-tape" transaction time, and could be
>>>> done asynchronously, but in a transaction.
>>>
>>> So... don't use batch-insert. Go the single insert approach. I dare
>>> you. ;)
>>
>> Yes, I'm trying! I'm trying to do it properly by recompiling Debian
>> stable backports and I'm running into library dependency problems.
>
> Done. I've recompiled with batch inserts set to off.
>
> I tested spooling to see how this would work, although it isn't strictly
> necessary for my situation (a single server with AoE and internal
> storage and a locally attached tape drive). The backup started running
> off disk at around 100MB/s and then spooling at around 100MB/s. The disk
> copy slowed down dramatically over the weekend due to contention from
> some external MD5 audits and rsync processes.

Well, spooling data to disk first makes sense if your network cannot
keep up with your tape drive. You want to avoid start/stop on the tape.

Spooling attributes is different. You may want to try that on and then
off to see how things go. Off is what we wanted to avoid, I think.

> I'm now going to play with non-spooled backups and some options such as
> noatime.
> Build OS:               x86_64-pc-linux-gnu debian 5.0.5
> JobId:                  1
> Job:                    HAbkp.2010-08-06_15.46.23_03
> Backup Level:           Full
> Client:                 "clwbackup-fd" 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,5.0.5
> FileSet:                "HAfileset" 2010-08-06 15:46:23
> Pool:                   "HAPool" (From Job resource)
> Catalog:                "MyCatalog" (From Client resource)
> Storage:                "CLW_LTO4" (From Job resource)
> Scheduled time:         06-Aug-2010 15:46:07
> Start time:             06-Aug-2010 15:46:25
> End time:               09-Aug-2010 18:23:16
> Elapsed time:           3 days 2 hours 36 mins 51 secs
> Priority:               10
> FD Files Written:       7,706,717
> SD Files Written:       7,706,717
> FD Bytes Written:       7,337,839,018,613 (7.337 TB)
> SD Bytes Written:       7,339,475,824,330 (7.339 TB)
> Rate:                   27317.7 KB/s
> Software Compression:   None
> VSS:                    no
> Encryption:             no
> Accurate:               no
> Volume name(s):         HA-01|HA-02|HA-03|HA-04|HA-05|HA-06|HA-07
> Volume Session Id:      1
> Volume Session Time:    1281090914
> Last Volume Bytes:      896,114,386,944 (896.1 GB)
> Non-fatal FD errors:    0
> SD Errors:              0
> FD termination status:  OK
> SD termination status:  OK
> Termination:            Backup OK

--
Dan Langille - http://langille.org/
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 04/08/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> On 03/08/10, Dan Langille (d...@langille.org) wrote:
>> On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
>>> Yes, a batch insert is faster than a specific insert, but the latter
>>> should be done at the "written-to-tape" transaction time, and could be
>>> done asynchronously, but in a transaction.
>>
>> So... don't use batch-insert. Go the single insert approach. I dare
>> you. ;)
>
> Yes, I'm trying! I'm trying to do it properly by recompiling Debian
> stable backports and I'm running into library dependency problems.

Done. I've recompiled with batch inserts set to off.

I tested spooling to see how this would work, although it isn't strictly
necessary for my situation (a single server with AoE and internal
storage and a locally attached tape drive). The backup started running
off disk at around 100MB/s and then spooling at around 100MB/s. The disk
copy slowed down dramatically over the weekend due to contention from
some external MD5 audits and rsync processes.

I'm now going to play with non-spooled backups and some options such as
noatime.
Build OS:               x86_64-pc-linux-gnu debian 5.0.5
JobId:                  1
Job:                    HAbkp.2010-08-06_15.46.23_03
Backup Level:           Full
Client:                 "clwbackup-fd" 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,5.0.5
FileSet:                "HAfileset" 2010-08-06 15:46:23
Pool:                   "HAPool" (From Job resource)
Catalog:                "MyCatalog" (From Client resource)
Storage:                "CLW_LTO4" (From Job resource)
Scheduled time:         06-Aug-2010 15:46:07
Start time:             06-Aug-2010 15:46:25
End time:               09-Aug-2010 18:23:16
Elapsed time:           3 days 2 hours 36 mins 51 secs
Priority:               10
FD Files Written:       7,706,717
SD Files Written:       7,706,717
FD Bytes Written:       7,337,839,018,613 (7.337 TB)
SD Bytes Written:       7,339,475,824,330 (7.339 TB)
Rate:                   27317.7 KB/s
Software Compression:   None
VSS:                    no
Encryption:             no
Accurate:               no
Volume name(s):         HA-01|HA-02|HA-03|HA-04|HA-05|HA-06|HA-07
Volume Session Id:      1
Volume Session Time:    1281090914
Last Volume Bytes:      896,114,386,944 (896.1 GB)
Non-fatal FD errors:    0
SD Errors:              0
FD termination status:  OK
SD termination status:  OK
Termination:            Backup OK

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/4/2010 6:50 AM, Rory Campbell-Lange wrote:
> On 04/08/10, Dan Langille (d...@langille.org) wrote:
>> I think stored procedures would be a good idea. That way, we could
>> improve the functionality without changing the application.
>>
>> That said, we do not at present use stored procedures. We'd have to
>> proceed carefully.
>
> Fair enough. Can restores run off marked columns in database tables?
> This exercise could be taken out of the main core of Bacula into a
> little Django app (perhaps) and not affect the main body of Bacula at
> all except for some way of executing restores.

I have no idea what you mean by that question.

--
Dan Langille - http://langille.org/

--
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 04/08/10, Dan Langille (d...@langille.org) wrote:
> I think stored procedures would be a good idea. That way, we could
> improve the functionality without changing the application.
>
> That said, we do not at present use stored procedures. We'd have to
> proceed carefully.

Fair enough. Can restores run off marked columns in database tables?
This exercise could be taken out of the main core of Bacula into a
little Django app (perhaps) and not affect the main body of Bacula at
all except for some way of executing restores.

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/4/2010 6:28 AM, Rory Campbell-Lange wrote:
> On 03/08/10, Dan Langille (d...@langille.org) wrote:
>> On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
>>> Yes, a batch insert is faster than a specific insert, but the latter
>>> should be done at the "written-to-tape" transaction time, and could be
>>> done asynchronously, but in a transaction.
>>
>> So... don't use batch-insert. Go the single insert approach. I dare
>> you. ;)
>
> Yes, I'm trying! I'm trying to do it properly by recompiling Debian
> stable backports and I'm running into library dependency problems.
>
>>> This is a kludge (with an inefficient correlated subquery!) that could
>>> easily miss paths which exist from previous, unrelated backups. A
>>> continuous insert process against a job and mediaid simply wouldn't need
>>> to do this.
>>
>> If you want to patch it, we'll certainly look at it.
>
> I'll look at some of the post-batch-insert queries and look to optimise
> them somewhat with our database development team in the next month.
>
>>> More native support for postgres would also allow, for instance, faster
>>> and more powerful searching of catalogues for retrieves, rather than the
>>> strange restore procedure required through bconsole.
>>
>> Sure, it would. We're always looking for more people to take on the
>> heavy lifting.
>
> Would it be helpful if we developed some proof-of-concept plpgsql
> functions for searching catalogues? We could also do a bit of work on
> UTF8 conversion/support.

I think stored procedures would be a good idea. That way, we could
improve the functionality without changing the application.

That said, we do not at present use stored procedures. We'd have to
proceed carefully.

--
Dan Langille - http://langille.org/
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 04/08/10, Marc Cousin (cousinm...@gmail.com) wrote:
>> Changes to one part of Bacula have to be compatible with all other parts
>> of Bacula. For example, given that we support SQLite, PostgreSQL, and
>> MySQL, we have to keep each in mind. Yes, it's a compromise.
> Moreover, yes, PostgreSQL is a robust DB, capable of doing very high-end
> transactional work. But that's not what Bacula requires. Bacula requires
> very high throughput (some of us have to insert 50 million rows a day).

Touché -- I've been thinking too much of my single server backing itself
up.

>> The running out of space is a PostgreSQL issue.
> I wouldn't say that. I would say that configuring a 10GB /var for the
> database filesystem when doing 7TB backups is the issue. Give it more
> space and stop wasting time.
> ...
> I totally agree with Dan: try the insert mode. And don't forget to
> post your results back.

OK.

>>> This is a kludge (with an inefficient correlated subquery!) that could
>>> easily miss paths which exist from previous, unrelated backups. A
>>> continuous insert process against a job and mediaid simply wouldn't need
>>> to do this.
>>
>> If you want to patch it, we'll certainly look at it.
> Yes, so what? Do you propose doing a transaction for every file
> backed up? It is used as a data dump, because that is what the
> catalog is (for the tape part). And we have to be fast when inserting
> there; there are some configurations with hundreds of clients
> spooling, not just one backup.

Fair enough. Thanks for your comments.

Regards
Rory

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 03/08/10, Dan Langille (d...@langille.org) wrote:
> On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
>> Yes, a batch insert is faster than a specific insert, but the latter
>> should be done at the "written-to-tape" transaction time, and could be
>> done asynchronously, but in a transaction.
>
> So... don't use batch-insert. Go the single insert approach. I dare
> you. ;)

Yes, I'm trying! I'm trying to do it properly by recompiling Debian
stable backports and I'm running into library dependency problems.

>> This is a kludge (with an inefficient correlated subquery!) that could
>> easily miss paths which exist from previous, unrelated backups. A
>> continuous insert process against a job and mediaid simply wouldn't need
>> to do this.
>
> If you want to patch it, we'll certainly look at it.

I'll look at some of the post-batch-insert queries and look to optimise
them somewhat with our database development team in the next month.

>> More native support for postgres would also allow, for instance, faster
>> and more powerful searching of catalogues for retrieves, rather than the
>> strange restore procedure required through bconsole.
>
> Sure, it would. We're always looking for more people to take on the
> heavy lifting.

Would it be helpful if we developed some proof-of-concept plpgsql
functions for searching catalogues? We could also do a bit of work on
UTF8 conversion/support.

Rory

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
> Yes, a batch insert is faster than a specific insert, but the latter
> should be done at the "written-to-tape" transaction time, and could be
> done asynchronously, but in a transaction. It's pretty crazy for a >7TB
> tape backup to fail because of a temporary file/table problem at the END
> of the backup process rather than during it.

Perhaps this will help:

* Batch insert determines the method to be used.
* Spool Attributes determines when this method is used.

Does that help?

--
Dan Langille - http://langille.org/
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 8/3/2010 7:09 PM, Rory Campbell-Lange wrote:
> Actually, this is what I don't get. Postgresql is a highly scalable,
> robust database system, and it is being used as a data dump rather than
> a working tool for creating a transaction-based working catalogue.

Changes to one part of Bacula have to be compatible with all other parts
of Bacula. For example, given that we support SQLite, PostgreSQL, and
MySQL, we have to keep each in mind. Yes, it's a compromise.

> Yes, a batch insert is faster than a specific insert, but the latter
> should be done at the "written-to-tape" transaction time, and could be
> done asynchronously, but in a transaction.

So... don't use batch-insert. Go the single insert approach. I dare
you. ;)

> It's pretty crazy for a >7TB tape backup to fail because of a temporary
> file/table problem at the END of the backup process rather than during
> it.

The running out of space is a PostgreSQL issue.

> Also the copy writes to a temporary table and then some rather curious
> inserts are done into the Bacula tables. E.g.:
>
>     INSERT INTO Path (Path)
>     SELECT a.Path
>     FROM (
>         SELECT DISTINCT Path FROM batch
>     ) AS a
>     WHERE NOT EXISTS
>         (SELECT Path FROM Path WHERE Path = a.Path)
>
> This is a kludge (with an inefficient correlated subquery!) that could
> easily miss paths which exist from previous, unrelated backups. A
> continuous insert process against a job and mediaid simply wouldn't need
> to do this.

If you want to patch it, we'll certainly look at it.

> More native support for postgres would also allow, for instance, faster
> and more powerful searching of catalogues for retrieves, rather than the
> strange restore procedure required through bconsole.

Sure, it would. We're always looking for more people to take on the
heavy lifting.

> I'm delighted to be using Bacula (particularly after our tribulations
> with Amanda) but it seems to me that Bacula could lean much more heavily
> on Postgresql.

Yep. I was this close to deploying Amanda when I found Bacula. I
switched immediately.

When the PostgreSQL module was written, it was based exactly upon the
MySQL module. Why? Ease. Walk first. Run later. As time has gone on,
more PostgreSQL-specific code has been added. Now that we have extensive
regression testing, such changes are easier to evaluate.

--
Dan Langille - http://langille.org/
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 03/08/10, Marc Cousin (cousinm...@gmail.com) wrote:
>>>> 3. Why is Bacula using a batch file at all? Why not simply do a
>>>>    straight insert?
>>>
>>> Because 7,643,966 inserts would be much slower.
>>
>> Really? I've logged Bacula's performance on the server and the inserts
>> run at around 0.35 ms and updates at around 0.5 ms.
> What is traced, usually, is execution time. You won't easily get:
> - Parse time of the query. It is basically 0 with batch insert, where it
>   is very measurable with insert.
> - Round-trip duration and overhead. This one, even if everything is
>   running on the same machine, is where the cost savings are high with
>   batch insert: if you run everything on inserts, the inserting process
>   has to wait for the database to acknowledge each operation before
>   submitting the next one. And inserting records in bacula isn't all
>   about inserts. There are some selects too, to look up pathid and
>   filenameid. You also pay a penalty because you send back data to the
>   caller (how many inserted records and the like).
>
> To give you a very simplified simulation, I've tried inserting 1 million
> integer values the way the batch insert works (copy). It takes 3.5
> seconds, mostly IO bound.
>
> With inserts, 77s, mostly CPU bound.
>
> The gains are lower with bacula, because data inserted is more complex,
> bacula itself is more complex, there are indexes to maintain, but it
> gives you an idea of why there is a batch mode.

Actually, this is what I don't get. Postgresql is a highly scalable,
robust database system, and it is being used as a data dump rather than a
working tool for creating a transaction-based working catalogue.

Yes, a batch insert is faster than a specific insert, but the latter
should be done at the "written-to-tape" transaction time, and could be
done asynchronously, but in a transaction. It's pretty crazy for a >7TB
tape backup to fail because of a temporary file/table problem at the END
of the backup process rather than during it.

Also the copy writes to a temporary table and then some rather curious
inserts are done into the Bacula tables. E.g.:

    INSERT INTO Path (Path)
    SELECT a.Path
    FROM (
        SELECT DISTINCT Path FROM batch
    ) AS a
    WHERE NOT EXISTS
        (SELECT Path FROM Path WHERE Path = a.Path)

This is a kludge (with an inefficient correlated subquery!) that could
easily miss paths which exist from previous, unrelated backups. A
continuous insert process against a job and mediaid simply wouldn't need
to do this.

More native support for postgres would also allow, for instance, faster
and more powerful searching of catalogues for retrieves, rather than the
strange restore procedure required through bconsole.

I'm delighted to be using Bacula (particularly after our tribulations
with Amanda) but it seems to me that Bacula could lean much more heavily
on Postgresql.

--
Rory Campbell-Lange
r...@campbell-lange.net
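[Editor's note: for readers who want to poke at the dedup query discussed above, here is a minimal runnable sketch. Python's built-in sqlite3 stands in for PostgreSQL, the Path and batch table names follow the Bacula catalog but the columns are trimmed, and all rows are invented.]

```python
# Sketch of the dedup-insert pattern quoted above, using sqlite3 so it
# runs standalone. Tables mimic the Bacula catalog; data is made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT)")
con.execute("CREATE TABLE batch (FileIndex INTEGER, Path TEXT)")

# A catalog entry left over from an earlier, unrelated backup.
con.execute("INSERT INTO Path (Path) VALUES ('/home/')")

# Spooled attributes for the current job, with duplicate paths.
con.executemany("INSERT INTO batch VALUES (?, ?)",
                [(1, '/home/'), (2, '/etc/'), (3, '/etc/')])

# The query under discussion: insert only paths not already present.
con.execute("""
    INSERT INTO Path (Path)
    SELECT a.Path
    FROM (SELECT DISTINCT Path FROM batch) AS a
    WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)
""")

paths = [row[0] for row in con.execute("SELECT Path FROM Path ORDER BY Path")]
print(paths)  # '/etc/' is new; the pre-existing '/home/' is not duplicated
```

As written the pattern does avoid duplicating paths from earlier backups; the inefficiency complaint is about the correlated NOT EXISTS probe per distinct path, not about correctness.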
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 03/08/10, Martin Simmons (mar...@lispworks.com) wrote:
>>>> 2. Can I stop this needless after-backup insertion? I tried setting
>>>>    Spool Attributes to NO but it did not work
>>>
>>> You need to rebuild Bacula with the --disable-batch-insert option, but
>>> it might run quite slowly. Setting synchronous_commit = off in
>>> postgresql.conf might help to make it faster.
>>
>> Thanks for the note about the --disable-batch-insert compile-time
>> option. Changing the synchronous_commit flag to off will speed up
>> inserts into the database, which is great, but it won't affect the
>> size of the batch file. Please clarify why you are suggesting this.
>
> Sorry, I wasn't clear. I was only suggesting that you change
> synchronous_commit if you use --disable-batch-insert.
...
>> Perhaps this is something the developers could consider?
>
> You can do that already by compiling with --disable-batch-insert and
> setting Spool Attributes to NO.

Thanks very much for your comments, Martin. I'm about to try with this
compile option and setting Spool Attributes to NO. I'll report back.

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
> On Tue, 3 Aug 2010 13:17:25 +0100, Rory Campbell-Lange said:
>
> Thanks very much for your response, Martin.
>
> On 03/08/10, Martin Simmons (mar...@lispworks.com) wrote:
>>> On Tue, 3 Aug 2010 10:15:18 +0100, Rory Campbell-Lange said:
>>> I have 3.4GB free in /var where Postgresql is located. At the end of a
>>> large backup job (7,643,966 files taking up 7.265TB of space) Postgres
>>> bails out copying a batch file into the File table due to a mysterious
>>> "no space left on device" error.
>>>
>>> Questions:
>>> 1. How should I size my postgresql partition?
>>
>> I expect 7.6 million records to need at least 800MB when inserted and
>> the batch tables will need a similar amount during the backup. It is
>> difficult to predict what the hash-join temporary file will need
>> because it depends on the internals of PostgreSQL.
>>
>> Firstly though I suggest running df frequently during the backup to
>> verify that the problem really is /var filling up.
>
> My server logs over the backup period still show over 2GB free in /var
> (where postgresql is held) and 8GB in /tmp. Thanks however for the
> rule-of-thumb sizes for the records.

It isn't clear to me if your logs cover the minutes and seconds up to
the point of failure. After the failure, the temporary table/file will
have been deleted so the free space will appear to be fine again.

>>> 2. Can I stop this needless after-backup insertion? I tried setting
>>>    Spool Attributes to NO but it did not work
>>
>> You need to rebuild Bacula with the --disable-batch-insert option, but
>> it might run quite slowly. Setting synchronous_commit = off in
>> postgresql.conf might help to make it faster.
>
> Thanks for the note about the --disable-batch-insert compile-time
> option. Changing the synchronous_commit flag to off will speed up
> inserts into the database, which is great, but it won't affect the
> size of the batch file. Please clarify why you are suggesting this.

Sorry, I wasn't clear. I was only suggesting that you change
synchronous_commit if you use --disable-batch-insert.

>>> 3. Why is Bacula using a batch file at all? Why not simply do a
>>>    straight insert?
>>
>> Because 7,643,966 inserts would be much slower.
>
> Really? I've logged Bacula's performance on the server and the inserts
> run at around 0.35 ms and updates at around 0.5 ms. 8 million inserts
> at 0.35ms will take about 46 minutes.

The batch insert code made a noticeable difference for me. It takes 2 to
3 minutes to process 950,000 file records, but I don't have the figures
for the non-batch insert now. "Process" here is the whole operation, not
just inserting into the File table (it has to query and update Filename
and Path too).

> But it would be quite possible for Bacula to do this asynchronously
> while it does the job of writing data from disk to tape, which in this
> case takes several days.

That's true. Moreover, your average file size is quite large, so the
per-file time to insert the records may not be so important anyway.

> Perhaps this is something the developers could consider?

You can do that already by compiling with --disable-batch-insert and
setting Spool Attributes to NO.

__Martin
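[Editor's note: the batch-vs-single-insert gap that Martin and Marc describe can be reproduced in miniature. The sketch below uses Python's sqlite3 rather than PostgreSQL, so it models only per-statement and per-commit overhead, not client/server round trips; the row count is arbitrary and the timings are illustrative, not Bacula's.]

```python
# Rough sketch of why row-at-a-time inserts lose to batched inserts.
# sqlite3 stands in for PostgreSQL, so network round trips are not
# modelled -- only per-statement and per-commit overhead.
import sqlite3
import time

N = 20_000

def per_row(con):
    # One statement and one commit per row, as a naive insert loop does.
    for i in range(N):
        con.execute("INSERT INTO t VALUES (?)", (i,))
        con.commit()

def batched(con):
    # All rows in one statement batch and one transaction, loosely
    # analogous to Bacula's batch/COPY path.
    con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(N)))
    con.commit()

results = {}
for name, fn in (("per-row", per_row), ("batched", batched)):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (i INTEGER)")
    start = time.perf_counter()
    fn(con)
    results[name] = time.perf_counter() - start
    assert con.execute("SELECT COUNT(*) FROM t").fetchone()[0] == N

print({name: f"{secs:.3f}s" for name, secs in results.items()})
```

Against a real PostgreSQL server the gap widens further, because each single-row INSERT also pays parse time and a network acknowledgement, as described above.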
Re: [Bacula-users] Bacula + Postgres : copy batch problem
Thanks very much for your response, Martin.

On 03/08/10, Martin Simmons (mar...@lispworks.com) wrote:
>> On Tue, 3 Aug 2010 10:15:18 +0100, Rory Campbell-Lange said:
>> I have 3.4GB free in /var where Postgresql is located. At the end of a
>> large backup job (7,643,966 files taking up 7.265TB of space) Postgres
>> bails out copying a batch file into the File table due to a mysterious
>> "no space left on device" error.
>>
>> Questions:
>> 1. How should I size my postgresql partition?
>
> I expect 7.6 million records to need at least 800MB when inserted and
> the batch tables will need a similar amount during the backup. It is
> difficult to predict what the hash-join temporary file will need
> because it depends on the internals of PostgreSQL.
>
> Firstly though I suggest running df frequently during the backup to
> verify that the problem really is /var filling up.

My server logs over the backup period still show over 2GB free in /var
(where postgresql is held) and 8GB in /tmp. Thanks however for the
rule-of-thumb sizes for the records.

>> 2. Can I stop this needless after-backup insertion? I tried setting
>>    Spool Attributes to NO but it did not work
>
> You need to rebuild Bacula with the --disable-batch-insert option, but
> it might run quite slowly. Setting synchronous_commit = off in
> postgresql.conf might help to make it faster.

Thanks for the note about the --disable-batch-insert compile-time
option. Changing the synchronous_commit flag to off will speed up
inserts into the database, which is great, but it won't affect the
size of the batch file. Please clarify why you are suggesting this.

>> 3. Why is Bacula using a batch file at all? Why not simply do a
>>    straight insert?
>
> Because 7,643,966 inserts would be much slower.

Really? I've logged Bacula's performance on the server and the inserts
run at around 0.35 ms and updates at around 0.5 ms. 8 million inserts at
0.35ms will take about 46 minutes. But it would be quite possible for
Bacula to do this asynchronously while it does the job of writing data
from disk to tape, which in this case takes several days.

Perhaps this is something the developers could consider?

In the mean time I will move Postgres to a 10G dedicated XFS partition
and try again.

--
Rory Campbell-Lange
r...@campbell-lange.net
Re: [Bacula-users] Bacula + Postgres : copy batch problem
> On Tue, 3 Aug 2010 10:15:18 +0100, Rory Campbell-Lange said: > > I'm fairly desperate for some advice on this issue. > > I have 3.4GB free in /var where Postgresql is located. At the end of a > large backup job (7,643,966 files taking up 7.265TB of space) Postgres > bails out copying a batch file into the File table due to a mysterious > "no space left on device" error. > > Questions: > 1. How should I size my postgresql partition? I expect 7.6 million records to need at least 800MB when inserted and the batch tables will need a similar amount during the backup. It is difficult to predict what the hash-join temporary file will need because it depends on the internals of PostgreSQL. Firstly though I suggest running df frequently during the backup to verify that the problem really is /var filling up. > 2. Can I stop this needless after-backup insertion? I tried setting >Spool Attributes to NO but it did not work You need to rebuild Bacula with the --disable-batch-insert option, but it might run quite slowly. Setting synchronous_commit = off in postgresql.conf might help to make it faster. > 3. Why is Bacula using a batch file at all? Why not simply do a straight >insert? Because 7,643,966 inserts would be much slower. __Martin > I'm keen to get this 7TB backup out, but I have to reconfigure Bacula to > suit. Help much appreciated! > > Rory > > > On 02/08/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote: > > I turned off spooling and set the Spool Attributes directive to no and > > reran the backup. The backup job completes but the database insert bails > > out. 
> >
> > clwbackup-dir JobId 8: Fatal error: sql_create.c:894 Fill File table
> > Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
> > LStat, MD5) SELECT batch.FileIndex, batch.JobId, Path.PathId,
> > Filename.FilenameId, batch.LStat, batch.MD5 FROM batch JOIN Path ON
> > (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
> > ERR=ERROR: could not write to hash-join temporary file: No space left
> > on device
> >
> > I don't understand why bacula isn't writing to the database
> > continuously. Why is a batch file needed?
>
> --
> Rory Campbell-Lange
> r...@campbell-lange.net
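Putting Martin's two PostgreSQL suggestions together, a minimal
postgresql.conf sketch for 8.3 might look like the fragment below. Both
settings exist in 8.3; the tablespace name is a hypothetical example,
and you would create it first with CREATE TABLESPACE on a partition
with more room than /var.

```
# postgresql.conf (PostgreSQL 8.3) -- sketch only, tune for your system

# Speeds up the row-at-a-time inserts you get after rebuilding with
# --disable-batch-insert, at the cost of possibly losing the last few
# commits after a crash.
synchronous_commit = off

# Send sort/hash-join temporary files to a roomier partition instead
# of letting them fill /var. 'bigtemp' is a hypothetical tablespace:
#   CREATE TABLESPACE bigtemp LOCATION '/bigdisk/pgtemp';
temp_tablespaces = 'bigtemp'
```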
Re: [Bacula-users] Bacula + Postgres : copy batch problem
I'm fairly desperate for some advice on this issue.

I have 3.4GB free in /var where Postgresql is located. At the end of a
large backup job (7,643,966 files taking up 7.265TB of space) Postgres
bails out copying a batch file into the File table due to a mysterious
"no space left on device" error.

Questions:
1. How should I size my postgresql partition?
2. Can I stop this needless after-backup insertion? I tried setting
   Spool Attributes to NO but it did not work.
3. Why is Bacula using a batch file at all? Why not simply do a straight
   insert?

I'm keen to get this 7TB backup out, but I have to reconfigure Bacula to
suit. Help much appreciated!

Rory

On 02/08/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> I turned off spooling and set the Spool Attributes directive to no and
> reran the backup. The backup job completes but the database insert
> bails out.
>
> clwbackup-dir JobId 8: Fatal error: sql_create.c:894 Fill File table
> Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
> LStat, MD5) SELECT batch.FileIndex, batch.JobId, Path.PathId,
> Filename.FilenameId, batch.LStat, batch.MD5 FROM batch JOIN Path ON
> (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
> ERR=ERROR: could not write to hash-join temporary file: No space left
> on device
>
> I don't understand why bacula isn't writing to the database
> continuously. Why is a batch file needed?

--
Rory Campbell-Lange
r...@campbell-lange.net
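A back-of-envelope answer to question 1 can be sketched as below. The
~150 bytes per File row and the x3 multiplier (File table plus batch
table plus hash-join temporary files) are assumptions for illustration,
not measured figures; measure your own catalog to refine them.

```shell
# Rough catalog sizing for this job: 7,643,966 File rows.
# BYTES_PER_ROW and the x3 factor are assumptions, not measurements.
FILES=7643966
BYTES_PER_ROW=150
EST_MB=$(( FILES * BYTES_PER_ROW * 3 / 1024 / 1024 ))
echo "rough space needed: ${EST_MB} MB"
```

That lands at roughly 3.3GB, which is consistent with the 3.4GB free on
/var not being enough.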
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 30/07/10, Dan Langille (d...@langille.org) wrote:
> On 7/30/2010 3:53 AM, Rory Campbell-Lange wrote:
> >
> > Fatal error: sql_create.c:843 Batch end postgresql.c:748 error ending
> > batch mode: ERROR: could not extend relation 1663/17472/17828:
> > wrote only 4096 of 8192 bytes at block 98374
> > HINT: Check free disk space.
...
> > The database itself is only just over 500MB.
>
> I think you're suggesting Bacula used up 3.4GB in a query..
>
> How many files are you backing up?

7,643,966 files taking up 7.265T of space on tape.

...

> Umm, or it could be a problem with the way you have your computer
> system configured. :) It's a matter of perspective. In a 6.5TB
> backup, I'm going to guess there are a large number of files, given
> that var filled up. Can you extend var? Or create a symlink to
> another filesystem to give PostgreSQL the space it needs.

I'm intrigued about this batch file. Where is it and how big is it? I
can increase the size of /var, or move the postgresql database mount
point, but it is difficult to know how much space it may require.

> > Is it not possible to change this
> > arrangement to use sequential inserts instead?
>
> Look at the Spool Attributes directive. Set it to know. This way,
> the details of each file will be added to the database right after
> that file is backed up.

I turned off spooling and set the Spool Attributes directive to no and
reran the backup. The backup job completes but the database insert bails
out.

clwbackup-dir JobId 8: Fatal error: sql_create.c:894 Fill File table
Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
LStat, MD5) SELECT batch.FileIndex, batch.JobId, Path.PathId,
Filename.FilenameId, batch.LStat, batch.MD5 FROM batch JOIN Path ON
(batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
ERR=ERROR: could not write to hash-join temporary file: No space left
on device

I don't understand why bacula isn't writing to the database
continuously. Why is a batch file needed?
--
Rory Campbell-Lange
r...@campbell-lange.net
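For reference, here is what the Spool Attributes setting discussed above
looks like in a Director Job resource. This is a sketch: the job, client,
fileset, and storage names are placeholders, not taken from the thread.

```
# bacula-dir.conf fragment -- sketch; resource names are hypothetical
Job {
  Name = "BigArchive"          # placeholder job name
  Type = Backup
  Client = clwbackup-fd        # placeholder
  FileSet = "Full Set"         # placeholder
  Storage = TapeStorage        # placeholder
  Pool = Default
  Spool Data = yes             # keep the tape drive streaming
  Spool Attributes = no        # send file attributes to the catalog as
                               # files are written, not in one batch at
                               # the end of the job
}
```

Note that, per the thread, with batch inserts compiled in this directive
alone did not avoid the batch table; the rebuild with
--disable-batch-insert was still needed.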
Re: [Bacula-users] Bacula + Postgres : copy batch problem
Thanks very much for your response, Eric.

On 30/07/10, Eric Bollengier (eric.bolleng...@baculasystems.com) wrote:
> It depends on how many files you backup, but a Bacula catalog requires
> some space, especially if you handle many files. I think you need help
> to configure your PostgreSQL catalog. 3.4GB free is not enough in any
> case.

Do you mean this is not enough space for postgres in the long run? I
understand that.

> > Clearly there is a problem with the size of the temporary file used
> > during the batch copy update. Since there are successful inserts into
> > the log table milliseconds later this clearly points to a problem in
> > the way Bacula inserts data in batch mode. Is it not possible to
> > change this arrangement to use sequential inserts instead?
>
> You can try --disable-batch-insert; the insertion process will take
> several hours or days instead of minutes, but if you don't have disk
> space...

> > I'm also keen to know if I can append to this large job to try and
> > retrieve the set of data, or do I have to start again?
>
> You can also try to change the source code by uncommenting the "db
> changes" check line in src/cats/sql_create.c
> db_create_file_attributes_record().
>
> It allows committing the batch session every X records. It should
> work, and can help in your situation, but this code is untested, so I
> advise you to do some restore tests. Feedback will be appreciated.

Thanks for the suggestion. I'm going to first try Dan Langille's
suggestion of setting "Spool Attributes" to "no".

--
Rory Campbell-Lange
r...@campbell-lange.net
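Eric's --disable-batch-insert is a configure-time option, so the rebuild
goes roughly as below. Only --disable-batch-insert comes from the
thread; the other flags and the install step are illustrative, so check
./configure --help for your version before relying on them.

```shell
# Rebuild Bacula without batch inserts -- sketch, not a verified recipe.
cd bacula-source-tree
./configure --with-postgresql --disable-batch-insert
make
make install    # or rebuild your distribution package, as Rory did
                # for the Debian stable backports
```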
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 7/30/2010 8:22 AM, Dan Langille wrote:
> Look at the Spool Attributes directive. Set it to know.

know? Try no. See
http://www.bacula.org/5.0.x-manuals/en/main/main/Configuring_Director.html

--
Dan Langille - http://langille.org/
Re: [Bacula-users] Bacula + Postgres : copy batch problem
On 7/30/2010 3:53 AM, Rory Campbell-Lange wrote:
> Bacula has bailed out near the end of a 6.5TB backup (which is really
> frustrating!)
>
> Fatal error: sql_create.c:843 Batch end postgresql.c:748 error ending
> batch mode: ERROR: could not extend relation 1663/17472/17828:
> wrote only 4096 of 8192 bytes at block 98374
> HINT: Check free disk space.
>
> on postgresql 8.3.
>
> This is the same issue as Holger Rauch's problems reported here:
> http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg41952.html
>
> This is with a backup spooling to a local holding disk. The job spool
> sizes are set at 50G on a spool directory size of 300G. No problems
> there.
>
> My database is here:
> /dev/sda3  9.2G  5.4G  3.4G  62%  /var
> Only 1% of the inodes are used.
>
> The database itself is only just over 500MB.

I think you're suggesting Bacula used up 3.4GB in a query..

How many files are you backing up?

> I've done some searching and it appears that the best response to this
> problem is from Postgresql's Tom Lane:
> http://markmail.org/message/shclbb4iaphypswv
> His suggestion is that the query made a massive temporary file that
> caused /var to overfill. Also see
> http://www.mail-archive.com/pgsql-performa...@postgresql.org/msg31231.html
>
> Clearly there is a problem with the size of the temporary file used
> during the batch copy update. Since there are successful inserts into
> the log table milliseconds later this clearly points to a problem in
> the way Bacula inserts data in batch mode.

Umm, or it could be a problem with the way you have your computer
system configured. :) It's a matter of perspective. In a 6.5TB backup,
I'm going to guess there are a large number of files, given that var
filled up. Can you extend var? Or create a symlink to another
filesystem to give PostgreSQL the space it needs.

> Is it not possible to change this
> arrangement to use sequential inserts instead?

Look at the Spool Attributes directive. Set it to know.
This way, the details of each file will be added to the database right
after that file is backed up.

> I'm also keen to know if I can append to this large job to try and
> retrieve the set of data, or do I have to start again?

Start again.

--
Dan Langille - http://langille.org/
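Since /var filled up mid-query here, it is worth watching free space on
the catalog partition while a big job runs. A minimal check along those
lines (the mount point and the 2GB threshold are assumptions for this
setup, not values from the thread):

```shell
#!/bin/sh
# Warn when free space on the PostgreSQL partition drops below a limit.
# MOUNTPOINT and LIMIT_KB are assumptions; adjust for your layout.
MOUNTPOINT=/var
LIMIT_KB=$((2 * 1024 * 1024))   # warn below 2GB free (1K blocks)
FREE_KB=$(df -Pk "$MOUNTPOINT" | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt "$LIMIT_KB" ]; then
    echo "WARNING: only ${FREE_KB}K free on $MOUNTPOINT"
fi
```

Run it from cron every few minutes during the backup to confirm whether
the hash-join temporary file is what exhausts the partition.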