Re: [Bacula-users] restores not working
On 25/03/2010 11:36 a.m., James Harper wrote:
>> We've used Legato networker before (we still do, as we've not yet
>> successfully completed the migration; and it's looking more and more
>> grim prospect by the day), and on approximately the same dataset (of
>> about 500 million records spread over 100 servers) and somewhat
>> weaker hardware, it would allow user to start selecting files to
>> restore in matter of *seconds* (and it was using its simple db6
>> files, no server/database tuning required at all)
>>
>> Now with bacula 5.0.1, we have to wait several *hours* before we can
>> start selecting files to restore, and it is considered "normal" ?!
>
> I've always thought that Bacula could do this a bit better. If you are
> selecting files rather than restoring everything then the chances are
> you only want a small subset of all files (always exceptions of course),
> so why read the whole tree in at once? Why not read it in as required,
> or read it in 'layer by layer' in the background so the user can start
> selecting files immediately.
>
> Complexity is probably the reason why it hasn't been done, but it would
> be an interesting project.

Actually it's not that hard (I've done it), but some of the queries can be quite slow, particularly the one to find all the subdirectories of a given chosen directory. I'm working on our own internal Web GUI for doing restores (ExtJS with a tree-based view of the filesystem for selecting files); I've found that you can either:

a) Do it with low memory usage (not building a tree, and doing ad-hoc queries as you go, recursing through the selected directories), but it'll be quite slow, or
b) Use memory and pre-build time to build the directory tree in memory, then select relatively quickly.

(a) is better for small restores, (b) is better for restores of more than a few hundred files; if your database is grinding to a halt building the tree, it's gonna truly suck doing the lots of small queries of large datasets required for (a). In the end I'm going to give the users a choice of which method, so they can make a human decision. It's hard to make that choice in code, because when the user has just selected a top level directory, the code doesn't know how deep the tree below that is. Could be 10 files, could be 10 million :)

And for anyone interested: doing (b) in PHP is not a good idea. 160 bytes of overhead just to create an empty object, another 60+ bytes per stored integer. Blech. Perl is not amazingly better, but can be wrangled down to more tolerable memory sizes with some trickery. The tree data storage really needs to be done in some language/mechanism that actually only uses 4 bytes to store a 32-bit integer :)

Craig Miskell

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
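To illustrate the point about per-integer overhead in option (b), here is a minimal Python sketch that keeps the tree in parallel typed arrays rather than one object per node. It is not Craig's implementation or anything from Bacula: the class name `CompactTree`, the node numbering, and the `path_id` field are all assumptions made for the example. The `array` module stores each unsigned int in a fixed-size slot (typically 4 bytes on common platforms), which is the kind of storage the message asks for.

```python
from array import array


class CompactTree:
    """Directory tree held in parallel typed arrays (hypothetical sketch).

    Node numbers are plain indexes into the arrays; node 0 is the root.
    """

    def __init__(self):
        # parent[i] is the node number of i's parent; the root points at itself.
        self.parent = array("I", [0])
        # path_id[i] is an illustrative catalog identifier for node i.
        self.path_id = array("I", [0])

    def add(self, parent_node, path_id):
        """Append a node under parent_node and return its node number."""
        self.parent.append(parent_node)
        self.path_id.append(path_id)
        return len(self.parent) - 1

    def children(self, node):
        """Direct children of node (linear scan, kept simple for the sketch)."""
        return [i for i in range(1, len(self.parent)) if self.parent[i] == node]

    def subtree(self, node):
        """Every node below node, found with an explicit stack."""
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for child in self.children(current):
                found.append(child)
                stack.append(child)
        return found


if __name__ == "__main__":
    tree = CompactTree()
    home = tree.add(0, 10)
    tree.add(home, 11)
    tree.add(home, 12)
    print("bytes per stored integer:", tree.parent.itemsize)
    print("nodes below home:", sorted(tree.subtree(home)))
```

The linear scan in `children` trades speed for brevity; a real tree of millions of nodes would want a child index (for example, nodes sorted by parent) on top of the same arrays.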
Re: [Bacula-users] restores not working
> We've used Legato networker before (we still do, as we've not yet
> successfully completed the migration; and it's looking more and more
> grim prospect by the day), and on approximately the same dataset (of
> about 500 million records spread over 100 servers) and somewhat
> weaker hardware, it would allow user to start selecting files to
> restore in matter of *seconds* (and it was using its simple db6
> files, no server/database tuning required at all)
>
> Now with bacula 5.0.1, we have to wait several *hours* before we can
> start selecting files to restore, and it is considered "normal" ?!

I've always thought that Bacula could do this a bit better. If you are selecting files rather than restoring everything then the chances are you only want a small subset of all files (always exceptions of course), so why read the whole tree in at once? Why not read it in as required, or read it in 'layer by layer' in the background so the user can start selecting files immediately.

Complexity is probably the reason why it hasn't been done, but it would be an interesting project.

James
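The "read it in as required" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `dirs` table with a `parent_id` column is a hypothetical, simplified layout invented for the example (it is not Bacula's catalog schema), and `LazyTree` and `make_demo_catalog` are made-up names. Each directory's children are fetched with one small indexed query the first time the user expands it, then cached.

```python
import sqlite3


def make_demo_catalog():
    """Build a tiny in-memory stand-in for a catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dirs (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    conn.execute("CREATE INDEX dirs_parent ON dirs (parent_id)")
    conn.executemany(
        "INSERT INTO dirs VALUES (?, ?, ?)",
        [
            (1, None, "/"),
            (2, 1, "etc"),
            (3, 1, "home"),
            (4, 3, "alice"),
            (5, 3, "bob"),
        ],
    )
    return conn


class LazyTree:
    """Loads one level of the directory tree per request and caches it."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # dir id -> list of (id, name) rows
        self.queries = 0  # how many times the database was actually hit

    def children(self, dir_id):
        if dir_id not in self.cache:
            self.queries += 1
            self.cache[dir_id] = self.conn.execute(
                "SELECT id, name FROM dirs WHERE parent_id = ? ORDER BY name",
                (dir_id,),
            ).fetchall()
        return self.cache[dir_id]


if __name__ == "__main__":
    tree = LazyTree(make_demo_catalog())
    print(tree.children(1))  # expand the root
    print(tree.children(3))  # expand one subdirectory
    print("queries issued:", tree.queries)
```

Only the directories the user actually opens cost a query, which is why this suits small restores; selecting a whole deep subtree would still mean many small queries.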
Re: [Bacula-users] restores not working
> You desperately need more ram. I have 4GB ram on my 64 bit bacula
> servers I built in 2004 and I probably have 1/2 that many files. And I
> said servers. My database is not on the same machine as the director.

It's already on order from Crucial!
Re: [Bacula-users] restores not working
> On a modest Dell rackmount with an E4500 2.2GHz CPU, 1GB of RAM & 2
> raid1 SAS disks, retention 5 weeks, about 20 clients and about 100mil
> rows in File, the restore query for 'building directory tree' would
> still not be done after 24 hours with 5.0.x, whereas for 3.x.x the
> time to do this was not noticeable. Maybe 2 min?

You desperately need more RAM. I have 4GB of RAM on my 64-bit bacula servers I built in 2004 and I probably have 1/2 that many files. And I said servers. My database is not on the same machine as the director.

John
Re: [Bacula-users] restores not working
On Wed, Mar 24, 2010 at 8:32 AM, Matija Nalis wrote:
>>> It is probably not hung, but just very, very slow.
>>
>> Yes. You probably need a LOT more ram and to tune mysql's parameters.

It appears so! I've actually spent a large amount of time tweaking MySQL, so it must be RAM.

> (Before you ask, we've had to upgrade because there were [and still
> are in 5.0.1, although somewhat rarer] bugs with director stopping
> working -- otherwise we would've downgraded back to 3.0.x)

We had similar problems, but as best I could figure at the time they were issues with the Red Hat distro MySQL RPMs. The MySQL database would just be hung, and have to be killed and restarted.

> We'd try tweaking key_buffer (and converting to InnoDB and tweaking
> innodb_buffer_pool_size), join_buffer_size, max_heap_table_size,
> tmp_table_size, sort_buffer_size, read_buffer_size, read_rnd_buffer_size
> but in the end we've had to reduce retention to just a few weeks in order
> to make the restoration happen in reasonable times (ie. getting the file
> selection in less than 10 minutes).

On a modest Dell rackmount with an E4500 2.2GHz CPU, 1GB of RAM & 2 raid1 SAS disks, retention 5 weeks, about 20 clients and about 100mil rows in File, the restore query for 'building directory tree' would still not be done after 24 hours with 5.0.x, whereas for 3.x.x the time to do this was not noticeable. Maybe 2 min?

>> Even with 48Gb ram, a few restores on our system (~255 million File
>> records: up to 4 million files on some full backups but most are under
>> 100k entries) could take an hour to get past the "building directory
>> tree" stage.

wow.

>> It's a _lot_ faster with postgresql and moderate tuning (My other gripes
>> about the changeover notwithstanding, those are annoyances, not
>> showstoppers)

I am in the middle of repopulating our database with bscan into a postgres database, hoping to find the same speed increase. I've given up on recovering the database or solving the problem using MySQL.

> As it is, it is *much faster* for us if we need to restore one file
> to do a complete restore of whole server and then delete 99.999% of
> the files, than to use the file catalog to select few files to
> restore.

I wish I had that avenue, but a few of our servers have a TB of user data, which makes it not viable for us.
Re: [Bacula-users] restores not working
On Tue, Mar 23, 2010 at 04:07:17PM +, Alan Brown wrote:
> Matija Nalis wrote:
>> It is probably not hung, but just very, very slow.
>
> Yes. You probably need a LOT more ram and to tune mysql's parameters.

Or maybe someone should tune the SQL queries (no, I'm not volunteering, it's not my forte) and/or the way bacula stores the catalog?

The main issue is that for us 3.0.3 was about 20-100 *times* faster than 5.0.1 for approximately the same dataset. I do understand that 5.0.0 added BaseFiles support, but IMHO such a speed drop is not an acceptable tradeoff (especially if one cannot turn it off in order to get faster queries again. I'd gladly compile with "--disable-basefiles" if that gave me 2 orders of magnitude speedup).

Queries that never took more than 3-5 *minutes* with 3.0.3 have started taking more than several *hours* with 5.0.0.

(Before you ask, we've had to upgrade because there were [and still are in 5.0.1, although somewhat rarer] bugs with director stopping working -- otherwise we would've downgraded back to 3.0.x)

We'd try tweaking key_buffer (and converting to InnoDB and tweaking innodb_buffer_pool_size), join_buffer_size, max_heap_table_size, tmp_table_size, sort_buffer_size, read_buffer_size, read_rnd_buffer_size but in the end we've had to reduce retention to just a few weeks in order to make the restoration happen in reasonable times (ie. getting the file selection in less than 10 minutes).

> Even with 48Gb ram, a few restores on our system (~255 million File
> records: up to 4 million files on some full backups but most are under
> 100k entries) could take an hour to get past the "building directory
> tree" stage.

That is really terrible, I really think the developers should look into it.

We've used Legato networker before (we still do, as we've not yet successfully completed the migration; and it's looking more and more grim prospect by the day), and on approximately the same dataset (of about 500 million records spread over 100 servers) and somewhat weaker hardware, it would allow user to start selecting files to restore in matter of *seconds* (and it was using its simple db6 files, no server/database tuning required at all)

Now with bacula 5.0.1, we have to wait several *hours* before we can start selecting files to restore, and it is considered "normal" ?!

Several minutes might be tolerated by our users (although even that is almost a hundred times slower than they were used to!), but several hours most certainly isn't (and a retention drop from several months to several weeks as the alternative also isn't making them extremely happy)

> It's a _lot_ faster with postgresql and moderate tuning (My other gripes
> about the changeover notwithstanding, those are annoyances, not
> showstoppers)

Waiting several hours to choose a file for restoring might not be an issue for you; but we have users who were used to waiting just several seconds to select files to restore (and a few more minutes for the restore to happen), and they are not impressed at all with bacula.

As it is, it is *much faster* for us if we need to restore one file to do a complete restore of whole server and then delete 99.999% of the files, than to use the file catalog to select few files to restore. That is a ridiculous situation.

> The lesson for us was that mysql doesn't scale to huge datasets well and
> we should have switched to postgres much earlier.

That might be, and it seems we'll try converting to PostgreSQL (there are issues with moving bacula data from MySQL to PostgreSQL).
Re: [Bacula-users] restores not working
Matija Nalis wrote:
> It is probably not hung, but just very, very slow.

Yes. You probably need a LOT more ram and to tune mysql's parameters.

> We've had the same
> issue, with about 500 million records in File (and about 120GB on
> disk for File.ibd) on (mostly dedicated to mysql) machine (8gig RAM,
> 8x2.33 Xeon, different configurations with about 3-6GB for mysql
> buffers) -- it could take several hours for 5.0.1 until it completed
> and the system was ready for selecting few files to restore. :-(

Even with 48Gb ram, a few restores on our system (~255 million File records: up to 4 million files on some full backups but most are under 100k entries) could take an hour to get past the "building directory tree" stage.

It's a _lot_ faster with postgresql and moderate tuning (My other gripes about the changeover notwithstanding, those are annoyances, not showstoppers)

The lesson for us was that mysql doesn't scale to huge datasets well and we should have switched to postgres much earlier.

AB
Re: [Bacula-users] restores not working
On Mon, Mar 22, 2010 at 01:25:38PM -0500, Jonathan R. Dundas wrote:
> I have a RHEL 5 x86_64 patched-current install with bacula RPMs built
> from the sourceforge src RPMs. I'm running MySQL community edition
> 5.1.43-1 mysql.com RPMs. I have updated bacula and tried this same
> operation with bacula 5.0.0, 5.0.1 and 5.0.2. When I try to restore a
> backup, bconsole appears to hang here:
>
> [...]
>
> You have selected the following JobIds: 8518,8529,8550,8571,8593,8613
> Building directory tree for JobId(s) 8518,8529,8550,8571,8593,8613 ...
>
> and it stays there, never completing.
> The MySQL database seems hung on this query:

It is probably not hung, but just very, very slow. We've had the same issue, with about 500 million records in File (and about 120GB on disk for File.ibd) on a machine mostly dedicated to mysql (8gig RAM, 8x2.33 Xeon, different configurations with about 3-6GB for mysql buffers) -- it could take several hours for 5.0.1 until it completed and the system was ready for selecting few files to restore. :-(

We switched from MyISAM to InnoDB; it didn't help. Greatly reducing the retention times did help (as it made the mysql dataset much smaller).

See the bacula wiki at http://tinyurl.com/yg37ujf or http://wiki.bacula.org/doku.php?id=faq#restore_takes_a_long_time_to_retrieve_sql_results_from_catalog for more info.

And let the list and/or wiki know if you manage to fix it, please!

--
Opinions above are GNU-copylefted.
[Bacula-users] restores not working
I have a RHEL 5 x86_64 patched-current install with bacula RPMs built from the sourceforge src RPMs. I'm running MySQL community edition 5.1.43-1 mysql.com RPMs. I have updated bacula and tried this same operation with bacula 5.0.0, 5.0.1 and 5.0.2. When I try to restore a backup, bconsole appears to hang here:

Automatically selected FileSet: helpdesk_fileset
+-------+-------+----------+---------------+---------------------+------------+
| JobId | Level | JobFiles | JobBytes      | StartTime           | VolumeName |
+-------+-------+----------+---------------+---------------------+------------+
| 8,518 | F     |   53,265 | 3,209,269,371 | 2010-03-13 10:07:17 | 26L4       |
| 8,529 | D     |      417 | 1,426,524,684 | 2010-03-13 11:09:14 | 20L4       |
| 8,550 | I     |      132 | 1,286,991,462 | 2010-03-14 00:14:50 | 20L4       |
| 8,571 | I     |      318 | 1,394,149,909 | 2010-03-14 23:40:37 | 20L4       |
| 8,593 | I     |      261 | 1,290,652,644 | 2010-03-15 23:43:57 | 20L4       |
| 8,613 | I     |    5,638 | 1,523,482,045 | 2010-03-17 00:48:46 | 20L4       |
+-------+-------+----------+---------------+---------------------+------------+
You have selected the following JobIds: 8518,8529,8550,8571,8593,8613
Building directory tree for JobId(s) 8518,8529,8550,8571,8593,8613 ...

and it stays there, never completing.
The MySQL database seems hung on this query:

SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5
FROM (
    SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId,
           File.FilenameId AS FilenameId, LStat, MD5
    FROM Job, File, (
        SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId
        FROM (
            SELECT JobTDate, PathId, FilenameId
            FROM File JOIN Job USING (JobId)
            WHERE File.JobId IN (8518,8529,8550,8571,8593,8613)
            UNION ALL
            SELECT JobTDate, PathId, FilenameId
            FROM BaseFiles
                JOIN File USING (FileId)
                JOIN Job ON (BaseJobId = Job.JobId)
            WHERE BaseFiles.JobId IN (8518,8529,8550,8571,8593,8613)
        ) AS tmp
        GROUP BY PathId, FilenameId
    ) AS T1
    WHERE (Job.JobId IN (
               SELECT DISTINCT BaseJobId FROM BaseFiles
               WHERE JobId IN (8518,8529,8550,8571,8593,8613))
           OR Job.JobId IN (8518,8529,8550,8571,8593,8613))
      AND T1.JobTDate = Job.JobTDate
      AND Job.JobId = File.JobId
      AND T1.PathId = File.PathId
      AND T1.FilenameId = File.FilenameId
) AS Temp
JOIN Filename ON (Filename.FilenameId = Temp.FilenameId)
JOIN Path ON (Path.PathId = Temp.PathId)
WHERE FileIndex > 0
ORDER BY Temp.JobId, FileIndex ASC

I have checked and double-checked that I have the right indexes. Here are a few; if I'm missing one, please let me know and I will list it:

mysql> show index from File;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| File  |          0 | PRIMARY        |            1 | FileId      | A         |   109205565 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            1 | JobId       | A         |         645 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            2 | PathId      | A         |     4200214 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            3 | FilenameId  | A         |   109205565 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | idxPathId      |            1 | PathId      | A         |     1050053 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | idxFilenameId  |            1 | FilenameId  | A         |     2800142 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | file_jobid_idx |            1 | JobId       | A         |         645 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> show index from Path;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Path  |          0 | PRIMARY  |            1 | PathId      | A         |     3086385 |     NULL | NULL   |      | BTREE      |         |
| Pa