Re: [Bacula-users] restores not working

2010-03-25 Thread Craig Miskell
On 25/03/2010 11:36 a.m., James Harper wrote:
>>
>> We've used Legato networker before (we still do, as we've not yet
>> successfully completed the migration; and it's looking like a more and
>> more grim prospect by the day), and on approximately the same dataset (of
>> about 500 million records spread over 100 servers) and somewhat
>> weaker hardware, it would allow a user to start selecting files to
>> restore in a matter of *seconds* (and it was using its simple db6
>> files, no server/database tuning required at all)
>>
>> Now with bacula 5.0.1, we have to wait several *hours* before we can
>> start selecting files to restore, and it is considered "normal" ?!
>>
>
> I've always thought that Bacula could do this a bit better. If you are
> selecting files rather than restoring everything then the chances are
> you only want a small subset of all files (always exceptions of course),
> so why read the whole tree in at once? Why not read it in as required,
> or read it in 'layer by layer' in the background so the user can start
> selecting files immediately.
>
> Complexity is probably the reason why it hasn't been done, but it would
> be an interesting project.
Actually it's not that hard (I've done it), but some of the queries can 
be quite slow, particularly the one to find all the subdirectories of a 
given chosen directory.  I'm working on our own internal Web GUI for 
doing restores (ExtJS with a tree-based view of the filesystem for 
selecting files); I've found that you can either:
a) Do it with low memory usage (not building a tree, but doing ad-hoc 
queries as you go, recursing through the selected directories), but 
it'll be quite slow, or
b) Use memory and pre-build time to build the directory tree in memory, 
then select relatively quickly.

(a) is better for small restores, (b) is better for restores of more 
than a few hundred files; if your database is grinding to a halt 
building the tree, it's gonna truly suck doing lots of small queries of 
large datasets required for (a).  In the end I'm going to give the users 
a choice of which method, so they can make a human decision.  It's hard 
to make that choice in code, because when the user has just selected a 
top level directory, the code doesn't know how deep the tree below that 
is.  Could be 10 files, could be 10 million :)
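
To make the trade-off between (a) and (b) concrete, here is a minimal Python sketch. It is not Bacula's code or schema: the `catalog` list and the helpers `children_on_demand` and `build_tree` are hypothetical names, and a plain list of path strings stands in for the catalog database.

```python
from collections import defaultdict

# Hypothetical stand-in for the catalog: one full path per backed-up file.
catalog = [
    "/etc/passwd",
    "/etc/ssh/sshd_config",
    "/home/alice/notes.txt",
    "/home/alice/mail/inbox",
    "/home/bob/todo.txt",
]

def children_on_demand(paths, directory):
    """Approach (a): scan the catalog each time a directory is expanded.

    Nothing is kept in memory between calls, but every expansion costs a
    full pass over the data (the analogue of one ad-hoc query).
    """
    prefix = directory.rstrip("/") + "/"
    names = set()
    for path in paths:
        if path.startswith(prefix):
            # First path component below the expanded directory.
            names.add(path[len(prefix):].split("/", 1)[0])
    return sorted(names)

def build_tree(paths):
    """Approach (b): one up-front pass mapping each directory to its entries."""
    tree = defaultdict(set)
    for path in paths:
        parent = "/"
        for part in path.strip("/").split("/"):
            tree[parent].add(part)
            parent = parent.rstrip("/") + "/" + part
    return tree

tree = build_tree(catalog)
print(children_on_demand(catalog, "/home"))  # scans the whole catalog
print(sorted(tree["/home"]))                 # a single dictionary lookup
```

With (a) the cost is paid again on every expansion; with (b) it is paid once, in time and memory, before the first selection can be made.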

And for anyone interested: doing (b) in PHP is not a good idea. 
There's 160 bytes of overhead just to create an empty object, and another 
60+ bytes per stored integer. Blech.  Perl is not amazingly better, but can 
be wrangled down to more tolerable memory sizes with some trickery.  The 
tree data storage really needs to be done in some 
language/mechanism that actually only uses 4 bytes to store a 32-bit 
integer :)
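
As an illustration of the storage point, here is a sketch using Python's standard `array` module, which packs integers into one contiguous C array instead of allocating an object per value. The parent-pointer layout and the node numbering are assumptions made for the example, not anything taken from Bacula. The item size of the "i" type code is 4 bytes on common platforms, but the language only guarantees a minimum of 2.

```python
from array import array

# Hypothetical tree stored as a parent-pointer table: parent[i] is the
# node number of node i's parent, and -1 marks the root.
#   0: /   1: /etc   2: /home   3: /home/alice   4: /home/bob
parent = array("i", [-1, 0, 0, 2, 2])

def depth(node):
    """Count the steps from a node up to the root."""
    steps = 0
    while parent[node] != -1:
        node = parent[node]
        steps += 1
    return steps

def children(node):
    """Return the nodes whose parent is `node` (linear scan of the table)."""
    return [i for i, p in enumerate(parent) if p == node]

# Bytes used by the packed integers themselves, excluding the small
# fixed-size array header.
payload_bytes = parent.itemsize * len(parent)
print(parent.itemsize, payload_bytes, depth(4), children(2))
```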

Craig Miskell

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] restores not working

2010-03-24 Thread James Harper
> 
> We've used Legato networker before (we still do, as we've not yet
> successfully completed the migration; and it's looking like a more and
> more grim prospect by the day), and on approximately the same dataset (of
> about 500 million records spread over 100 servers) and somewhat
> weaker hardware, it would allow a user to start selecting files to
> restore in a matter of *seconds* (and it was using its simple db6
> files, no server/database tuning required at all)
> 
> Now with bacula 5.0.1, we have to wait several *hours* before we can
> start selecting files to restore, and it is considered "normal" ?!
> 

I've always thought that Bacula could do this a bit better. If you are
selecting files rather than restoring everything then the chances are
you only want a small subset of all files (always exceptions of course),
so why read the whole tree in at once? Why not read it in as required,
or read it in 'layer by layer' in the background so the user can start
selecting files immediately.

Complexity is probably the reason why it hasn't been done, but it would
be an interesting project.

James



Re: [Bacula-users] restores not working

2010-03-24 Thread Jonathan R. Dundas
> You desperately need more ram. I have 4GB ram on my 64 bit bacula
> servers I built in 2004 and I probably have 1/2 that many files. And I
> said servers. My database is not on the same machine as the director.

It's already on order from Crucial!


Re: [Bacula-users] restores not working

2010-03-24 Thread John Drescher
> On a modest Dell rackmount with an E4500 2.2GHz CPU, 1GB of RAM & 2
> raid1 SAS disks, retention 5 weeks, about 20 clients and about 100mil
> rows in File, the restore query for 'building directory tree' would
> still not be done after 24 hours with 5.0.x, whereas for 3.x.x the
> time to do this was not noticeable.  Maybe 2 min?
>
You desperately need more ram. I have 4GB ram on my 64 bit bacula
servers I built in 2004 and I probably have 1/2 that many files. And I
said servers. My database is not on the same machine as the director.

John



Re: [Bacula-users] restores not working

2010-03-24 Thread Jonathan R. Dundas
On Wed, Mar 24, 2010 at 8:32 AM, Matija Nalis wrote:
>>> It is probably not hung, but just very, very slow.
>>
>> Yes. You probably need a LOT more ram and to tune mysql's parameters.
It appears so!  I've actually spent a large amount of time tweaking
MySQL, so it must be RAM.

> (Before you ask, we've had to upgrade because there were [and still
> are in 5.0.1, although somewhat rarer] bugs with director stopping
> working -- otherwise we would've downgraded back to 3.0.x)
We had similar problems, but as best I could figure at the time they
were issues with the Red Hat distro's MySQL RPMs.  The MySQL database
would just hang, and have to be killed and restarted.

> We tried tweaking key_buffer (and converting to InnoDB and tweaking
> innodb_buffer_pool_size), join_buffer_size, max_heap_table_size,
> tmp_table_size, sort_buffer_size, read_buffer_size, read_rnd_buffer_size
> but in the end we've had to reduce retention to just a few weeks in order
> to make the restoration happen in reasonable times (i.e. getting the file
> selection in less than 10 minutes).
On a modest Dell rackmount with an E4500 2.2GHz CPU, 1GB of RAM & 2
raid1 SAS disks, retention 5 weeks, about 20 clients and about 100mil
rows in File, the restore query for 'building directory tree' would
still not be done after 24 hours with 5.0.x, whereas for 3.x.x the
time to do this was not noticeable.  Maybe 2 min?

>> Even with 48Gb ram, a few restores on our system (~255 million File
>> records: up to 4 million files on some full backups but most are under
>> 100k entries) could take an hour to get past the "building directory
>> tree" stage.
wow.

>> It's a _lot_ faster with postgresql and moderate tuning (My other gripes
>> about the changeover notwithstanding, those are annoyances, not
>> showstoppers)
I am in the middle of repopulating our database with bscan into a
postgres database, hoping to find the same speed increase.  I've given
up on recovering the database or solving the problem while using MySQL.

> As it is, when we need to restore one file it is *much faster* for us
> to do a complete restore of the whole server and then delete 99.999% of
> the files than to use the file catalog to select a few files to
> restore.
I wish I had that avenue, but a few of our servers have a TB of user
data, which makes it not viable for us.



Re: [Bacula-users] restores not working

2010-03-24 Thread Matija Nalis
On Tue, Mar 23, 2010 at 04:07:17PM +, Alan Brown wrote:
> Matija Nalis wrote:
>> It is probably not hung, but just very, very slow.
>
> Yes. You probably need a LOT more ram and to tune mysql's parameters.

Or maybe someone should tune the SQL queries (no, I'm not
volunteering, it's not my forte) and/or the way bacula stores the
catalog ?

The main issue is that for us 3.0.3 was about 20-100 *times* faster
than 5.0.1 for approximately the same dataset. I do understand that 5.0.0
added BaseFiles support, but IMHO such a speed drop is not an
acceptable tradeoff (especially if one cannot turn it off in order to
get faster queries again. I'd gladly compile with "--disable-basefiles"
if that gave me 2 orders of magnitude speedup).

Queries that never took more than 3-5 *minutes* with 3.0.3 have
started taking several *hours* with 5.0.0.

(Before you ask, we've had to upgrade because there were [and still
are in 5.0.1, although somewhat rarer] bugs with director stopping
working -- otherwise we would've downgraded back to 3.0.x)

We tried tweaking key_buffer (and converting to InnoDB and tweaking
innodb_buffer_pool_size), join_buffer_size, max_heap_table_size,
tmp_table_size, sort_buffer_size, read_buffer_size, read_rnd_buffer_size 
but in the end we've had to reduce retention to just a few weeks in order 
to make the restoration happen in reasonable times (i.e. getting the file 
selection in less than 10 minutes).

> Even with 48Gb ram, a few restores on our system (~255 million File  
> records: up to 4 million files on some full backups but most are under  
> 100k entries) could take an hour to get past the "building directory  
> tree" stage.

That is really terrible, I really think the developers should look
into it.

We've used Legato networker before (we still do, as we've not yet
successfully completed the migration; and it's looking like a more and
more grim prospect by the day), and on approximately the same dataset (of
about 500 million records spread over 100 servers) and somewhat
weaker hardware, it would allow a user to start selecting files to
restore in a matter of *seconds* (and it was using its simple db6
files, no server/database tuning required at all)

Now with bacula 5.0.1, we have to wait several *hours* before we can
start selecting files to restore, and it is considered "normal" ?!

Several minutes might be tolerated by our users (although even that
is almost a hundred times slower than they were used to!), but several
hours most certainly isn't (and a retention drop from several months
to several weeks as an alternative isn't making them extremely
happy either).

> It's a _lot_ faster with postgresql and moderate tuning (My other gripes  
> about the changeover notwithstanding, those are annoyances, not  
> showstoppers)

Waiting several hours to choose a file for restoring might not be an
issue for you; but we have users who were used to waiting just
several seconds to select files to restore (and a few more minutes
for the restore to happen), and they are not impressed at all with
bacula.

As it is, when we need to restore one file it is *much faster* for us
to do a complete restore of the whole server and then delete 99.999% of
the files than to use the file catalog to select a few files to
restore. That is a ridiculous situation.

> The lesson for us was that mysql doesn't scale to huge datasets well and  
> we should have switched to postgres much earlier.

That might be, and it seems we'll try converting to PostgreSQL (although
there are issues with moving bacula data from MySQL to PostgreSQL).




Re: [Bacula-users] restores not working

2010-03-23 Thread Alan Brown
Matija Nalis wrote:

> It is probably not hung, but just very, very slow.

Yes. You probably need a LOT more ram and to tune mysql's parameters.

> We've had the same
> issue, with about 500 million records in File (and about 120GB on
> disk for File.ibd) on a (mostly dedicated to mysql) machine (8gig RAM,
> 8x2.33 Xeon, different configurations with about 3-6GB for mysql
> buffers) -- it could take several hours for 5.0.1 until it completed
> and the system was ready for selecting a few files to restore. :-(

Even with 48Gb ram, a few restores on our system (~255 million File 
records: up to 4 million files on some full backups but most are under 
100k entries) could take an hour to get past the "building directory 
tree" stage.

It's a _lot_ faster with postgresql and moderate tuning (My other gripes 
about the changeover notwithstanding, those are annoyances, not 
showstoppers)

The lesson for us was that mysql doesn't scale to huge datasets well and 
we should have switched to postgres much earlier.

AB





Re: [Bacula-users] restores not working

2010-03-23 Thread Matija Nalis
On Mon, Mar 22, 2010 at 01:25:38PM -0500, Jonathan R. Dundas wrote:
> I have a RHEL 5 x86_64 patched-current install with bacula RPMs built
> from the sourceforge src RPMs.  I'm running MySQL community edition
> 5.1.43-1 mysql.com RPMs.  I have updated bacula and tried this same
> operation with bacula 5.0.0, 5.0.1 and 5.0.2.  When I try to restore a
> backup, bconsole appears to hang here:
>
> [...]
> 
> You have selected the following JobIds: 8518,8529,8550,8571,8593,8613
> Building directory tree for JobId(s) 8518,8529,8550,8571,8593,8613 ...
>
> and it stays there, never completing.
> The MySQL database seems hung on this query:

It is probably not hung, but just very, very slow. We've had the same
issue, with about 500 million records in File (and about 120GB on
disk for File.ibd) on a (mostly dedicated to mysql) machine (8gig RAM,
8x2.33 Xeon, different configurations with about 3-6GB for mysql
buffers) -- it could take several hours for 5.0.1 until it completed
and the system was ready for selecting a few files to restore. :-(
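
A quick back-of-the-envelope check on those figures shows how little of the table such buffers can hold. The numbers are the ones given above; treating GB as 10^9 bytes is an assumption made only for this sketch, and the per-row figure covers indexes as well as data, since an InnoDB .ibd file holds both.

```python
# Figures from the message above; GB is taken as 10**9 bytes (assumption).
GB = 10**9
file_rows = 500_000_000      # records in the File table
file_ibd_bytes = 120 * GB    # size of File.ibd on disk

# On-disk bytes per File row, indexes included.
bytes_per_row = file_ibd_bytes / file_rows

# Share of File.ibd that fits in the 3-6 GB given to MySQL buffers,
# ignoring everything else those buffers have to hold.
low_share = 3 * GB / file_ibd_bytes
high_share = 6 * GB / file_ibd_bytes

print(f"{bytes_per_row:.0f} bytes per row")
print(f"{low_share:.1%} to {high_share:.1%} of File.ibd fits in the buffers")
```

So at best about one twentieth of the table can be cached, and a query that touches most of File has to go to disk for the rest.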

We switched from MyISAM to InnoDB, but it didn't help. Greatly reducing
the retention times did help (as it made the mysql dataset much smaller).

See the bacula wiki at http://tinyurl.com/yg37ujf or
http://wiki.bacula.org/doku.php?id=faq#restore_takes_a_long_time_to_retrieve_sql_results_from_catalog

for more info. 

And let the list and/or wiki know if you manage to fix it, please !

-- 
Opinions above are GNU-copylefted.



[Bacula-users] restores not working

2010-03-22 Thread Jonathan R. Dundas
I have a RHEL 5 x86_64 patched-current install with bacula RPMs built
from the sourceforge src RPMs.  I'm running MySQL community edition
5.1.43-1 mysql.com RPMs.  I have updated bacula and tried this same
operation with bacula 5.0.0, 5.0.1 and 5.0.2.  When I try to restore a
backup, bconsole appears to hang here:

Automatically selected FileSet: helpdesk_fileset
+-------+-------+----------+---------------+---------------------+------------+
| JobId | Level | JobFiles | JobBytes      | StartTime           | VolumeName |
+-------+-------+----------+---------------+---------------------+------------+
| 8,518 | F     |   53,265 | 3,209,269,371 | 2010-03-13 10:07:17 | 26L4       |
| 8,529 | D     |      417 | 1,426,524,684 | 2010-03-13 11:09:14 | 20L4       |
| 8,550 | I     |      132 | 1,286,991,462 | 2010-03-14 00:14:50 | 20L4       |
| 8,571 | I     |      318 | 1,394,149,909 | 2010-03-14 23:40:37 | 20L4       |
| 8,593 | I     |      261 | 1,290,652,644 | 2010-03-15 23:43:57 | 20L4       |
| 8,613 | I     |    5,638 | 1,523,482,045 | 2010-03-17 00:48:46 | 20L4       |
+-------+-------+----------+---------------+---------------------+------------+
You have selected the following JobIds: 8518,8529,8550,8571,8593,8613

Building directory tree for JobId(s) 8518,8529,8550,8571,8593,8613 ...



and it stays there, never completing.


The MySQL database seems hung on this query:

SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5
FROM (
  SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId,
         File.FilenameId AS FilenameId, LStat, MD5
  FROM Job, File, (
    SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId
    FROM (
      SELECT JobTDate, PathId, FilenameId
      FROM File JOIN Job USING (JobId)
      WHERE File.JobId IN (8518,8529,8550,8571,8593,8613)
      UNION ALL
      SELECT JobTDate, PathId, FilenameId
      FROM BaseFiles JOIN File USING (FileId)
      JOIN Job ON (BaseJobId = Job.JobId)
      WHERE BaseFiles.JobId IN (8518,8529,8550,8571,8593,8613)
    ) AS tmp
    GROUP BY PathId, FilenameId
  ) AS T1
  WHERE (Job.JobId IN (
           SELECT DISTINCT BaseJobId FROM BaseFiles
           WHERE JobId IN (8518,8529,8550,8571,8593,8613))
         OR Job.JobId IN (8518,8529,8550,8571,8593,8613))
    AND T1.JobTDate = Job.JobTDate
    AND Job.JobId = File.JobId
    AND T1.PathId = File.PathId
    AND T1.FilenameId = File.FilenameId
) AS Temp
JOIN Filename ON (Filename.FilenameId = Temp.FilenameId)
JOIN Path ON (Path.PathId = Temp.PathId)
WHERE FileIndex > 0
ORDER BY Temp.JobId, FileIndex ASC
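
Reading the statement itself, its core is a "latest version of each file" selection: for every (PathId, FilenameId) pair it keeps the row from the job with the greatest JobTDate. Below is a much-simplified sketch of that one step, using Python's built-in sqlite3 rather than MySQL. The toy tables and sample rows are invented for illustration, and the BaseFiles branch and the Path/Filename joins are left out, so this is not the statement Bacula runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Job (JobId INTEGER PRIMARY KEY, JobTDate INTEGER)")
conn.execute(
    "CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER, "
    "PathId INTEGER, FilenameId INTEGER, FileIndex INTEGER)"
)

# Toy data: job 1 is a full backup, job 2 a later incremental.
conn.executemany("INSERT INTO Job VALUES (?, ?)", [(1, 100), (2, 200)])
conn.executemany(
    "INSERT INTO File VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, 1, 1, 1),  # file A, saved by the full backup
        (11, 1, 1, 2, 2),  # file B, saved by the full backup
        (12, 2, 1, 1, 1),  # file A again, changed before the incremental
    ],
)

# Inner query: newest JobTDate per (PathId, FilenameId) among the chosen jobs.
# Outer query: fetch the File row that belongs to that newest job.
rows = conn.execute(
    """
    SELECT File.FileId, File.JobId, File.PathId, File.FilenameId
    FROM File
    JOIN Job ON Job.JobId = File.JobId
    JOIN (
        SELECT MAX(Job.JobTDate) AS JobTDate, File.PathId, File.FilenameId
        FROM File JOIN Job ON Job.JobId = File.JobId
        WHERE File.JobId IN (1, 2)
        GROUP BY File.PathId, File.FilenameId
    ) AS T1
      ON T1.JobTDate = Job.JobTDate
     AND T1.PathId = File.PathId
     AND T1.FilenameId = File.FilenameId
    WHERE File.JobId IN (1, 2) AND File.FileIndex > 0
    ORDER BY File.JobId, File.FileIndex
    """
).fetchall()

print(rows)
```

The older copy of file A (FileId 10) drops out because a newer job also saved it. The grouping step has to consider every File row of the selected jobs before any result can be returned.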


I have checked and double-checked that I have the right indexes. Here are
a few; if I'm missing one, please let me know and I will list it:
mysql> show index from File;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| File  |          0 | PRIMARY        |            1 | FileId      | A         |   109205565 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            1 | JobId       | A         |         645 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            2 | PathId      | A         |     4200214 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId          |            3 | FilenameId  | A         |   109205565 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | idxPathId      |            1 | PathId      | A         |     1050053 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | idxFilenameId  |            1 | FilenameId  | A         |     2800142 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | file_jobid_idx |            1 | JobId       | A         |         645 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
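
One thing the File listing above allows checking mechanically is whether any index is a leftmost prefix of another, since a B-tree index on (JobId, PathId, FilenameId) can generally also serve lookups on JobId alone. The sketch below hard-codes the index columns as read off the listing. The helper name `prefix_duplicates` is hypothetical, and whether dropping such an index would help with the slow restore is not something this thread establishes.

```python
# Index name -> ordered column list, transcribed from the listing above.
file_indexes = {
    "PRIMARY": ["FileId"],
    "JobId": ["JobId", "PathId", "FilenameId"],
    "idxPathId": ["PathId"],
    "idxFilenameId": ["FilenameId"],
    "file_jobid_idx": ["JobId"],
}

def prefix_duplicates(indexes):
    """Return (shorter, longer) pairs where the shorter index's columns
    are a leftmost prefix of the longer index's columns."""
    pairs = []
    for shorter, short_cols in indexes.items():
        for longer, long_cols in indexes.items():
            if shorter != longer and long_cols[:len(short_cols)] == short_cols:
                pairs.append((shorter, longer))
    return pairs

print(prefix_duplicates(file_indexes))
```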
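mysql> show index from Path;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Path  |          0 | PRIMARY  |            1 | PathId      | A         |     3086385 |     NULL | NULL   |      | BTREE      |         |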
  18.
  | Pa