subject:"\[Bacula\-users\] Deadlock error"

Re: [Bacula-users] Deadlock error

2015-08-10 Thread Craig Shiroma

Hi Ana,

Did you monitor resource usage during the backups? The list of files
generated for accurate backups is kept in memory (by both director and
client), so this should cause resource use (CPU, memory, etc.) to increase
on both hosts.

I did monitor these items earlier, but did not notice a huge drop in usage
in these items.  However, I was using Zabbix to monitor which takes a
reading every so many minutes.  It's possible it didn't take a reading when
the problem occurred.  I should've used something like top instead or
VMware's performance tools.  I'll take a look at the graphs again...I may
have missed it.
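
[Editor's sketch] A poller that reads every few minutes can miss a short spike, as described above. One way to take readings every few seconds during the backup window is a small script like the one below. It is only a sketch, assuming a Linux host with /proc/meminfo and /proc/loadavg; the function names (parse_meminfo, sample, monitor) are made up for illustration and are not part of Bacula or Zabbix.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def sample(meminfo_path="/proc/meminfo", loadavg_path="/proc/loadavg"):
    """Take one reading: available memory, free swap and 1-minute load average."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    with open(loadavg_path) as f:
        load1 = float(f.read().split()[0])
    return {
        "time": time.strftime("%H:%M:%S"),
        # Older kernels lack MemAvailable, so fall back to MemFree.
        "mem_available_kb": mem.get("MemAvailable", mem.get("MemFree")),
        "swap_free_kb": mem.get("SwapFree"),
        "load1": load1,
    }


def monitor(interval_s=5, count=3):
    """Print one reading every interval_s seconds, count times."""
    for _ in range(count):
        print(sample())
        time.sleep(interval_s)
```

Running monitor(interval_s=5, count=720) on the director and on a client would cover roughly an hour of the backup window at five-second resolution, which should be fine-grained enough to catch a brief memory or load spike.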

However, as usual your suggestions helped!  I increased the amount of
memory on Director and added more CPUs.  I also reduced the number of
concurrent jobs from 50 to 25.  So far, two days of incremental backups
have not produced any deadlock errors.  All my backups finished
successfully.

Thank you so much for the help!  (again)  :-)   Your advice is always,
always so helpful.

-craig


On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda emiliaarr...@gmail.com
 wrote:

 Hi Craig,

 On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hi Ana,

 Thank you for the suggestion!


 You're welcome!



 I'll look into adding more CPU and memory to director, although I didn't
 see much of an impact on either between a non-accurate run and an accurate
 run.  For example, there was no large depletion of available memory, no
 swapping, and no high load.


 Did you monitor resource usage during the backups? The list of files
 generated for accurate backups is kept in memory (by both director and
 client), so this should cause resource use (CPU, memory, etc.) to increase
 on both hosts.



 I did add more memory to the catalog server and turned Accurate back on
 for the same hosts.  I had no deadlocks.  Last night was mostly Fulls,
 though.  Not sure if that makes a difference.  Would you know if Bacula
 would use fewer resources when fulls are run because it is going to back up
 everything anyway and no comparison of files needs to be made (I'm
 guessing)?  When a full is done, does Bacula still need to keep a list of
 the files in memory for hosts using Accurate backups?  My first thought is
 no.


 Mine too. With accurate backups, the extra resource use should be
 noticeable when running incremental, differential and full+basejobs
 backups.



 Thanks again for the help!  Your posts are always so helpful.


 Thank you too!
 Best regards,
 Ana



 -craig


 On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda 
 emiliaarr...@gmail.com wrote:

 Hello Craig,

 In one of your posts you mentioned Segmentation violation in the
 director host. Accurate backups require more resources than normal ones.
 Have you checked if CPU and memory resources are enough in director and the
 clients that are configured for using accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
  wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about
 databases -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder
 to install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw
 the error, not Bacula.  Whatever DB you are using is what is having the
 issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please
 be aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it

Re: [Bacula-users] Deadlock error

2015-08-10 Thread Craig Shiroma

Hi Kelvin,

Thank you for the info and help!  Good information to keep in mind.

I think I found the root of the problem thanks to everyone.  See my reply
to Ana.

Thanks again for the post.  It's much appreciated.

-craig


On Fri, Aug 7, 2015 at 10:46 AM, Kelvin Minter kb.min...@gmail.com wrote:

 MyISAM is terrible for transactions. If the deadlock is happening because
 of table locking then switching the engine to InnoDB might help your
 problem.
 MyISAM locks the entire table while InnoDB only locks the rows it is
 updating.

 Check out the link below.

 http://stackoverflow.com/questions/20148/myisam-versus-innodb
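
 [Editor's sketch] Before switching engines it helps to see which engine each catalog table actually uses. The helper below builds a query against MySQL's information_schema.TABLES view and, from its result rows, the ALTER TABLE statements that would convert the remaining tables. This is a sketch under assumptions: the schema name "bacula" and the function names are illustrative, and the generated statements are meant to be reviewed by a DBA, not run blindly on a live catalog.

```python
def engine_query(schema):
    """Return SQL listing each table's storage engine in the given schema.

    The schema name is interpolated directly, so pass only a trusted value.
    """
    return (
        "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = '{}' ORDER BY TABLE_NAME;".format(schema)
    )


def conversion_statements(rows, target="InnoDB"):
    """Build ALTER TABLE statements for tables not already on the target engine.

    rows: iterable of (table_name, engine) pairs, e.g. the result of
    engine_query(). Views report a NULL engine and are skipped.
    """
    return [
        "ALTER TABLE `{}` ENGINE={};".format(table, target)
        for table, engine in rows
        if engine is not None and engine.lower() != target.lower()
    ]


# Sample rows for illustration only, not taken from a real catalog.
sample_rows = [("File", "MyISAM"), ("Job", "InnoDB"), ("Path", "MyISAM")]
for statement in conversion_statements(sample_rows):
    print(statement)
```

 Converting a large table rewrites it, so on a big catalog this would normally be done while no backup jobs are running.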

 On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda 
 emiliaarr...@gmail.com wrote:

 Hello Craig,

 In one of your posts you mentioned Segmentation violation in the
 director host. Accurate backups require more resources than normal ones.
 Have you checked if CPU and memory resources are enough in director and the
 clients that are configured for using accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases
 -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
 install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw
 the error, not Bacula.  Whatever DB you are using is what is having the
 issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please
 be aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com
 wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of
 days and got the same errors on some VMs that were running accurate and
 some that were not.  These hosts were running concurrently.  I would say
 90% of the hosts that were configured to use Accurate finished
 successfully.  However, there were a few that failed with the deadlock
 error -- some that were configured to use accurate and some that were not
 configured to use accurate.  Also, on all of these, a second job started
 for each of the affected hosts right after Bacula detected the deadlock
 even though it said a reschedule would happen 3600 seconds later (the 3600
 seconds is correct).

 Tonight, I disabled accurate on all

Re: [Bacula-users] Deadlock error

2015-08-10 Thread Craig Shiroma

Hi Josip,

Thank you for the advice and for looking that up in the mysql docs. That
was pretty much the error I was getting.

See my reply to Ana.

Again, thank you for the help. I really appreciate it.

-craig

On Thu, Aug 6, 2015 at 9:26 PM, Josip Deanovic djosip+n...@linuxpages.net
wrote:

On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:
Hi Kern,

Thank you for the info! We're using MySQL 5.6 Percona Server, Release
68.0, Revision 656.

Would this setting cause the problem?
innodb_lock_wait_timeout = 100

Is it too high or too low or has no bearing on the problem?

Hi!

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

Documentation says:

-BEGIN-
The timeout in seconds an InnoDB transaction may wait for a row lock
before giving up. The default value is 50 seconds. A transaction that
tries to access a row that is locked by another InnoDB transaction will
hang for at most this many seconds before issuing the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

When a lock wait timeout occurs, the current statement is not executed.
The current transaction is not rolled back.
-END-

So I wouldn't say that decreasing this value would change anything in
your case.

--
Josip Deanovic

--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] Deadlock error

2015-08-10 Thread Ana Emília M. Arruda

Hi Craig,

Good news!

You're welcome (again) :). And thank you for your feedback. It is always
useful.

Best regards,
Ana

On Mon, Aug 10, 2015 at 3:35 PM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 Hi Ana,

 Did you monitor resource usage during the backups? The list of files
 generated for accurate backups is kept in memory (by both director and
 client), so this should cause resource use (CPU, memory, etc.) to increase
 on both hosts.

 I did monitor these items earlier, but did not notice a huge drop in usage
 in these items.  However, I was using Zabbix to monitor which takes a
 reading every so many minutes.  It's possible it didn't take a reading when
 the problem occurred.  I should've used something like top instead or
 VMware's performance tools.  I'll take a look at the graphs again...I may
 have missed it.

 However, as usual your suggestions helped!  I increased the amount of
 memory on Director and added more CPUs.  I also reduced the number of
 concurrent jobs from 50 to 25.  So far, two days of incremental backups
 have not produced any deadlock errors.  All my backups finished
 successfully.

 Thank you so much for the help!  (again)  :-)   Your advice is always,
 always so helpful.

 -craig


 On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda 
 emiliaarr...@gmail.com wrote:

 Hi Craig,

 On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hi Ana,

 Thank you for the suggestion!


 You're welcome!



 I'll look into adding more CPU and memory to director, although I didn't
 see much of an impact on either between a non-accurate run and an accurate
 run.  For example, there was no large depletion of available memory, no
 swapping, and no high load.


 Did you monitor resource usage during the backups? The list of files
 generated for accurate backups is kept in memory (by both director and
 client), so this should cause resource use (CPU, memory, etc.) to increase
 on both hosts.



 I did add more memory to the catalog server and turned Accurate back on
 for the same hosts.  I had no deadlocks.  Last night was mostly Fulls,
 though.  Not sure if that makes a difference.  Would you know if Bacula
 would use fewer resources when fulls are run because it is going to back up
 everything anyway and no comparison of files needs to be made (I'm
 guessing)?  When a full is done, does Bacula still need to keep a list of
 the files in memory for hosts using Accurate backups?  My first thought is
 no.


 Mine too. With accurate backups, the extra resource use should be
 noticeable when running incremental, differential and full+basejobs
 backups.



 Thanks again for the help!  Your posts are always so helpful.


 Thank you too!
 Best regards,
 Ana



 -craig


 On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda 
 emiliaarr...@gmail.com wrote:

 Hello Craig,

 In one of your posts you mentioned Segmentation violation in the
 director host. Accurate backups require more resources than normal ones.
 Have you checked if CPU and memory resources are enough in director and the
 clients that are configured for using accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma 
 shiroma.crai...@gmail.com wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects
 the deadlock for each affected job instead of waiting until the rescheduled
 job runs?  I verified that there were no duplicate jobs in the queue before
 the backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the
 hosts that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server,
 Release 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about
 databases -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder
 to install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com
 wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw
 the error, not Bacula.  Whatever DB you are using is what is having the
 issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please
 be aware that it may be a very good database, maybe

Re: [Bacula-users] Deadlock error

2015-08-08 Thread Ana Emília M. Arruda

Hi Craig,

On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 Hi Ana,

 Thank you for the suggestion!


You're welcome!



 I'll look into adding more CPU and memory to director, although I didn't
 see much of an impact on either between a non-accurate run and an accurate
 run.  For example, there was no large depletion of available memory, no
 swapping, and no high load.


Did you monitor resource usage during the backups? The list of files
generated for accurate backups is kept in memory (by both director and
client), so this should cause resource use (CPU, memory, etc.) to increase
on both hosts.



 I did add more memory to the catalog server and turned Accurate back on
 for the same hosts.  I had no deadlocks.  Last night was mostly Fulls,
 though.  Not sure if that makes a difference.  Would you know if Bacula
 would use fewer resources when fulls are run because it is going to back up
 everything anyway and no comparison of files needs to be made (I'm
 guessing)?  When a full is done, does Bacula still need to keep a list of
 the files in memory for hosts using Accurate backups?  My first thought is
 no.


Mine too. With accurate backups, the extra resource use should be noticeable
when running incremental, differential and full+basejobs backups.



 Thanks again for the help!  Your posts are always so helpful.


Thank you too!
Best regards,
Ana



 -craig


 On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda 
 emiliaarr...@gmail.com wrote:

 Hello Craig,

 In one of your posts you mentioned Segmentation violation in the
 director host. Accurate backups require more resources than normal ones.
 Have you checked if CPU and memory resources are enough in director and the
 clients that are configured for using accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases
 -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
 install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw
 the error, not Bacula.  Whatever DB you are using is what is having the
 issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please
 be aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com
 wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this

Re: [Bacula-users] Deadlock error

2015-08-07 Thread Josip Deanovic

On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:
 Hi Kern,
 
 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.
 
 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100
 
 Is it too high or too low or has no bearing on the problem?

Hi!

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

Documentation says:

-BEGIN-
 The timeout in seconds an InnoDB transaction may wait for a row lock 
before giving up. The default value is 50 seconds. A transaction that 
tries to access a row that is locked by another InnoDB transaction will 
hang for at most this many seconds before issuing the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

When a lock wait timeout occurs, the current statement is not executed.
The current transaction is not rolled back.
-END-


So I wouldn't say that decreasing this value would change anything in
your case.
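
[Editor's sketch] The quoted documentation describes a lock wait timeout (error 1205). A deadlock is a different condition: InnoDB reports it as error 1213, normally as soon as it detects the cycle and without waiting for innodb_lock_wait_timeout to expire, which fits the conclusion that tuning the timeout is unlikely to help. In both cases the usual client-side remedy is to retry the transaction. The sketch below shows that pattern only. DatabaseError is a hypothetical stand-in for a driver exception, run_with_retry is an illustrative name, and nothing here describes how Bacula itself reacts to these errors.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # MySQL ER_LOCK_WAIT_TIMEOUT
DEADLOCK = 1213           # MySQL ER_LOCK_DEADLOCK
RETRYABLE = {LOCK_WAIT_TIMEOUT, DEADLOCK}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Run transaction(); on a retryable lock error, wait and try again.

    Any other error, or a lock error on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay_s * attempt)  # simple linear back-off between tries
```

The sleep parameter is injectable so the back-off can be skipped in tests. With many concurrent writers, a retry loop only hides contention, so lowering the number of simultaneous jobs remains the more direct fix.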


-- 
Josip Deanovic


Re: [Bacula-users] Deadlock error

2015-08-07 Thread Craig Shiroma

Hi Ana,

Thank you for the suggestion!

I'll look into adding more CPU and memory to director, although I didn't
see much of an impact on either between a non-accurate run and an accurate
run.  For example, there was no large depletion of available memory, no
swapping, and no high load.

I did add more memory to the catalog server and turned Accurate back on for
the same hosts.  I had no deadlocks.  Last night was mostly Fulls, though.
Not sure if that makes a difference.  Would you know if Bacula would use
fewer resources when fulls are run because it is going to back up everything
anyway and no comparison of files needs to be made (I'm guessing)?  When a
full is done, does Bacula still need to keep a list of the files in memory
for hosts using Accurate backups?  My first thought is no.

Thanks again for the help!  Your posts are always so helpful.

-craig


On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com
 wrote:

 Hello Craig,

 In one of your posts you mentioned Segmentation violation in the director
 host. Accurate backups require more resources than normal ones. Have you
 checked if CPU and memory resources are enough in director and the clients
 that are configured for using accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases
 -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
 install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a

Re: [Bacula-users] Deadlock error

2015-08-07 Thread Ana Emília M. Arruda

Hello Craig,

In one of your posts you mentioned Segmentation violation in the director
host. Accurate backups require more resources than normal ones. Have you
checked if CPU and memory resources are enough in director and the clients
that are configured for using accurate mode?

Best regards,
Ana

On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases
 -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
 install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts
 configured to use accurate and some configured not to use accurate?  Or, is
 it an all or none thing, meaning all hosts that run concurrently must
 either be using accurate backup or not using accurate backup (cannot mix
 the two)?

 2. It seems like the hosts that get out of the starting gate first are
 the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each

Re: [Bacula-users] Deadlock error

2015-08-07 Thread Josip Deanovic

On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:
 Hi Kern,
 
 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.
 
 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100
 
 Is it too high or too low or has no bearing on the problem?
 
 Thanks again,
 -craig

One more thing...

MySQL used the MyISAM storage engine by default before version 5.5, while
Percona Server uses InnoDB by default.

Maybe this could be the source of the problem you are experiencing.
Unless you have a better idea, I would suggest trying it with the MyISAM
storage engine. I know a few applications that just can't work very well
with InnoDB, and I don't know if bacula has been thoroughly tested with
InnoDB MySQL support.

I am using bacula with both MyISAM and InnoDB with the Accurate option
enabled, but my jobs usually do not run simultaneously. I can afford
that because of the small number of jobs per bacula installation
(less than 100 jobs, and they are relatively small and quick).

An optimized database and storage engine could improve database
performance considerably, but in your case it wouldn't solve the
problem unless something is really, really bad on the database side.

-- 
Josip Deanovic

--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] Deadlock error

2015-08-07 Thread Kelvin Minter

MyISAM does not support transactions. If the deadlock is happening because
of table locking, then switching the engine to InnoDB might help with your
problem.
MyISAM locks the entire table, while InnoDB locks only the rows it is
updating.

Check out the link below.

http://stackoverflow.com/questions/20148/myisam-versus-innodb
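
To make the table-lock versus row-lock distinction above concrete, here is a minimal toy sketch in Python. It is not how MySQL implements either engine; the class and function names are invented for illustration, and the only thing it models is whether a second writer touching a different row has to wait. Note also that the InnoDB monitor output quoted elsewhere in this thread shows a table-level AUTO-INC lock, so InnoDB is not purely row-locked either.

```python
import threading


class TableLockedStore:
    """Toy model of table-level locking: one lock guards the whole table."""

    def __init__(self):
        self._table_lock = threading.Lock()

    def try_lock_row(self, row_id):
        # Every writer needs the single table lock, whatever row it touches.
        return self._table_lock.acquire(blocking=False)


class RowLockedStore:
    """Toy model of row-level locking: one lock per row."""

    def __init__(self):
        self._row_locks = {}
        self._guard = threading.Lock()  # protects the dict of row locks

    def try_lock_row(self, row_id):
        with self._guard:
            lock = self._row_locks.setdefault(row_id, threading.Lock())
        return lock.acquire(blocking=False)


def second_writer_can_proceed(store):
    """Writer A locks row 1; report whether writer B can lock row 2."""
    first = store.try_lock_row(1)   # writer A takes its lock
    assert first
    return store.try_lock_row(2)    # writer B wants a different row


if __name__ == "__main__":
    print("table-level:", second_writer_can_proceed(TableLockedStore()))
    print("row-level:  ", second_writer_can_proceed(RowLockedStore()))
```

With the table-level model the second writer is refused even though it wants a different row; with the row-level model it proceeds.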

On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com
 wrote:

 Hello Craig,

 In one of your posts you mentioned a segmentation violation on the director
 host. Accurate backups require more resources than normal ones. Have you
 checked whether CPU and memory resources are sufficient on the director and
 on the clients that are configured to use accurate mode?

 Best regards,
 Ana

 On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Thanks Kern!  I'll bring in a DBA on our side to have a look.

 Would you have any thoughts on this question posed earlier?

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 This happened on both nights I ran with Accurate turned On on the hosts
 that had failed backups because of the deadlock.

 Regards,
 -craig

 On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases
 -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
 install and a bit harder to configure than MySQL, but it performs much
 better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts
 configured to use accurate and some configured not to use accurate?  Or, is
 it an all or none thing, meaning all hosts

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

Hi Bryn,

Thank you for the translation!  :-)  Much appreciated.  I'll ask our DBA to
take a look at the DB (mysql).  Maybe it needs some tuning for Accurate.
Do you know of any documentation for this?  I only saw a couple of small
sections for Accurate in the manual, mainly how to turn it on and that it
uses more resources.  I haven't had a chance to read the whole manual yet
so I might have missed the section.

-craig

On Thu, Aug 6, 2015 at 6:46 AM, Bryn Hughes li...@nashira.ca wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.

 Bryn


 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on what
 may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are
 the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.


 Bacula is not aware of any SQL internal deadlocks.


 From the INNODB Monitor output:

 TRANSACTION:
 TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
 mysql tables in use 4, locked 4
 9 lock struct(s), heap size 1184, 5 row lock(s)
 MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
 host 192.168.10.99 bacula Sending data
 INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
 DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
 Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name)
 WAITING FOR THIS LOCK TO BE GRANTED:
 TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
 waiting
 WE ROLL BACK TRANSACTION (2)

 I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
 Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
 and Windows Servers 2008 and 2012R2.

 Any help would be much appreciated.

 Warmest regards,
 -craig

 On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 BTW, I suppose there could've been two jobs for the host(s) in the
 scheduling queue.  If this was the case, is there a way to find out after
 the fact?  If this did actually happen, what could cause duplicate jobs to
 be scheduled on the same day at the same time?  I know no one manually ran
 the jobs in question.  Again, this was a problem for only a few of the jobs
 that ran last night, not all of them, and some were set to do accurate
 backup and some were not.

 Regards,
 -craig

 On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
  wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex,
 JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
 batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
 batch.DeltaSeq FROM
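
For anyone who wants to pull the key facts out of the InnoDB monitor excerpt quoted above, the following is a small Python sketch. The excerpt is re-joined from the wrapped mail text, and the regular expressions are assumptions based only on that excerpt; real `SHOW ENGINE INNODB STATUS` output is longer and its layout can vary between server versions, so treat this as a starting point rather than a complete parser.

```python
import re

# Excerpt from the thread, with the mail line wrapping undone.
MONITOR_EXCERPT = """\
TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 4, locked 4
9 lock struct(s), heap size 1184, 5 row lock(s)
WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting
WE ROLL BACK TRANSACTION (2)
"""

# Assumed layout of a table-lock line, taken from the excerpt only.
LOCK_RE = re.compile(
    r"TABLE LOCK table `(?P<db>[^`]+)`\.`(?P<table>[^`]+)` "
    r"trx id (?P<trx>\d+) lock mode (?P<mode>\S+)(?P<waiting> waiting)?"
)
ROLLBACK_RE = re.compile(r"WE ROLL BACK TRANSACTION \((\d+)\)")


def summarize(text):
    """Return the table locks mentioned and which transaction was rolled back."""
    locks = [
        {
            "db": m.group("db"),
            "table": m.group("table"),
            "trx": int(m.group("trx")),
            "mode": m.group("mode"),
            "waiting": m.group("waiting") is not None,
        }
        for m in LOCK_RE.finditer(text)
    ]
    rollback = ROLLBACK_RE.search(text)
    return {
        "table_locks": locks,
        "rolled_back": int(rollback.group(1)) if rollback else None,
    }


if __name__ == "__main__":
    print(summarize(MONITOR_EXCERPT))
```

On the excerpt this reports a single AUTO-INC table lock on `bacula`.`File` that transaction 208788977 was waiting for, and that the transaction listed as (2) in the monitor output was the one rolled back.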

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Kern Sibbald


  
  
On 06.08.2015 21:36, Craig Shiroma wrote:

 Hi Bryn,

 Thank you for the translation!  :-)  Much appreciated.  I'll ask our DBA to
 take a look at the DB (mysql).  Maybe it needs some tuning for Accurate.
 Do you know of any documentation for this?

Type: "tuning mysql for bacula" into your browser ...

 I only saw a couple of small sections for Accurate in the manual, mainly
 how to turn it on and that it uses more resources.  I haven't had a chance
 to read the whole manual yet so I might have missed the section.

 -craig

 On Thu, Aug 6, 2015 at 6:46 AM, Bryn Hughes li...@nashira.ca wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.

 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on what
 may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

Hi Kern,

Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
68.0, Revision 656.

Would this setting cause the problem?
innodb_lock_wait_timeout = 100

Is it too high or too low or has no bearing on the problem?

Thanks again,
-craig
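
A note on the timeout question above, offered as a hedged sketch rather than an answer from the Bacula side: as far as the MySQL documentation describes it, `innodb_lock_wait_timeout` controls how long a transaction waits for a row lock before giving up with error 1205 (lock wait timeout), whereas a deadlock is detected by InnoDB itself and reported as error 1213 after one transaction is rolled back. Client code that wants to tolerate either condition typically re-runs the failed transaction. The Python below illustrates that generic pattern only; `FakeDbError` and `run_with_retry` are hypothetical names, no real database driver is used, and this is not what Bacula does (the thread shows Bacula rescheduling the job instead).

```python
import time

ER_LOCK_WAIT_TIMEOUT = 1205  # MySQL: lock wait timeout exceeded
ER_LOCK_DEADLOCK = 1213      # MySQL: deadlock found when trying to get lock


class FakeDbError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno):
        super().__init__(f"MySQL error {errno}")
        self.errno = errno


def run_with_retry(operation, attempts=3, delay=0.0):
    """Re-run `operation` when it fails with a deadlock or lock wait timeout."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except FakeDbError as exc:
            retryable = exc.errno in (ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT)
            if not retryable or attempt == attempts:
                raise  # give up: other error, or out of attempts
            time.sleep(delay * attempt)  # simple linear backoff


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_insert():
        # Simulated transaction that deadlocks twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise FakeDbError(ER_LOCK_DEADLOCK)
        return "committed"

    print(run_with_retry(flaky_insert), "after", calls["n"], "attempts")
```

The design choice here is to retry only on the two lock-related error codes and to re-raise everything else immediately, so that genuine SQL errors are not hidden behind retries.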


On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with Accurate
 -- the database routines know nothing about accurate and none of the data
 is different.  It is more likely due to the VM environment or to some build
 or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on what
 may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are
 the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.


 Bacula is not aware of any SQL internal deadlocks.


 From the INNODB Monitor output:

 TRANSACTION:
 TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
 mysql tables in use 4, locked 4
 9 lock struct(s), heap size 1184, 5 row lock(s)
 MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
 host 192.168.10.99 bacula Sending data
 INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
 DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
 Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name)
 WAITING FOR THIS LOCK TO BE GRANTED:
 TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
 waiting
 WE ROLL BACK TRANSACTION (2)

 I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
 Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
 and Windows Servers 2008 and 2012R2.

 Any help would be much appreciated.

 Warmest regards,
 -craig

 On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Kern Sibbald


  
  
On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?

Sorry, I am a Bacula programmer, and I do not know much about
databases -- especially MySQL since I use PostgreSQL.  PostgreSQL is
harder to install and a bit harder to configure than MySQL, but it
performs much better.

 Thanks again,
 -craig

 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.

 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern

 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on what
 may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Kern Sibbald


  
  
On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are
 the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

Bacula is not aware of any SQL internal deadlocks.

 From the INNODB Monitor output:

 TRANSACTION:
 TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
 mysql tables in use 4, locked 4
 9 lock struct(s), heap size 1184, 5 row lock(s)
 MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
 host 192.168.10.99 bacula Sending data
 INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
 DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
 Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name)
 WAITING FOR THIS LOCK TO BE GRANTED:
 TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
 waiting
 WE ROLL BACK TRANSACTION (2)

 I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
 Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
 and Windows Servers 2008 and 2012R2.

 Any help would be much appreciated.

 Warmest regards,
 -craig

 On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 BTW, I suppose there could've been two jobs for the host(s) in the
 scheduling queue.  If this was the case, is there a way to find out after
 the fact?  If this did actually happen, what could cause duplicate jobs to
 be scheduled on the same day at the same time?  I know no one manually ran
 the jobs in question.  Again, this was a problem for only a few of the jobs
 that ran last night, not all of them, and some were set to do accurate
 backup and some were not.

 Regards,
 -craig

 On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

Thanks Kern!  I'll bring in a DBA on our side to have a look.

Would you have any thoughts on this question posed earlier?

3. Why is Bacula spinning off a new job right away after it detects the
deadlock for each affected job instead of waiting until the rescheduled job
runs?  I verified that there were no duplicate jobs in the queue before the
backups started running, no jobs were running before the start of the
backups, and I did not start any of these backups manually to cause a
second job to appear.

This happened on both nights I ran with Accurate turned On on the hosts
that had failed backups because of the deadlock.

Regards,
-craig

On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 21:44, Craig Shiroma wrote:

 Hi Kern,

 Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
 68.0, Revision 656.

 Would this setting cause the problem?
 innodb_lock_wait_timeout = 100

 Is it too high or too low or has no bearing on the problem?


 Sorry, I am a Bacula programmer, and I do not know much about databases --
 especially MySQL since I use PostgreSQL.  PostgreSQL is harder to install
 and a bit harder to configure than MySQL, but it performs much better.



 Thanks again,
 -craig


 On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.


 Yes.  That is exactly what I was implying.

 The rest of this is directed to Craig:
 If you are using MariaDB (I have no indication that you are), please be
 aware that it may be a very good database, maybe even better than MySQL,
 but Bacula is built and tested against MySQL, and if you use binaries that
 were built for MySQL, you could run into problems by using MariaDB.  Even
 if your binaries were explicitly built with MariaDB, it may not be
 compatible with the way Bacula works.  Bacula has a tendency to push
 databases to the extreme, and it works well with MySQL and PostgreSQL, but
 possibly not with other databases.  I bring up MariaDB because it has been
 mentioned in another posting to this list.

 I would be very surprised if your problem has anything to do with
 Accurate -- the database routines know nothing about accurate and none of
 the data is different.  It is more likely due to the VM environment or to
 some build or version problem with MySQL (or MariaDB).

 Best regards,
 Kern


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts and
 we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days
 and got the same errors on some VMs that were running accurate and some
 that were not.  These hosts were running concurrently.  I would say 90% of
 the hosts that were configured to use Accurate finished successfully.
 However, there were a few that failed with the deadlock error -- some that
 were configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are
 the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.


 Bacula is not aware of any SQL internal deadlocks.


 From the INNODB Monitor output:

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Bryn Hughes

I think what Kern is getting at is that your database is what threw the 
error, not Bacula.  Whatever DB you are using is what is having the issue.


Bryn

On 2015-08-06 09:11 AM, Craig Shiroma wrote:

Hi Kern,

Thank you very much for the reply!  Would you have any suggestions on 
what may be causing this problem or how I can debug it?  Obviously, 
I'm encountering deadlocks when accurate backup runs on some of our 
hosts and we want to use accurate backup on all of our hosts if possible.


Warmest regards,
-craig

On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:


On 06.08.2015 10:15, Craig Shiroma wrote:

Hello again,

I just thought I'd update this post with more information in
hopes of getting some explanation for the deadlocks.

I ran with Accurate backup on our test VMs (RHEL) for a couple of
days and got the same errors on some VMs that were running
accurate and some that were not.  These hosts were running
concurrently.  I would say 90% of the hosts that were configured
to use Accurate finished successfully.  However, there were a few
that failed with the deadlock error -- some that were configured
to use accurate and some that were not configured to use
accurate.  Also, on all of these, a second job started for each
of the affected hosts right after Bacula detected the deadlock
even though it said a reschedule would happen 3600 seconds later
(the 3600 seconds is correct).

Tonight, I disabled accurate on all hosts and the deadlocks did
not happen.  No errors were detected and all the backups finished
successfully.

Some questions...
1.  Can I back up multiple hosts concurrently with some hosts
configured to use accurate and some configured not to use
accurate?  Or, is it an all or none thing, meaning all hosts that
run concurrently must either be using accurate backup or not
using accurate backup (cannot mix the two)?

2. It seems like the hosts that get out of the starting gate
first are the ones affected.  I am configured to run 50 jobs
concurrently.  Again, no problems with accurate turned off on all
hosts for months now.

3. Why is Bacula spinning off a new job right away after it
detects the deadlock for each affected job instead of waiting
until the rescheduled job runs?  I verified that there were no
duplicate jobs in the queue before the backups started running,
no jobs were running before the start of the backups, and I did
not start any of these backups manually to cause a second job to
appear.


Bacula is not aware of any SQL internal deadlocks.



From the INNODB Monitor output:

TRANSACTION:
TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 4, locked 4
9 lock struct(s), heap size 1184, 5 row lock(s)
MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id
29558637 host 192.168.10.99 bacula Sending data
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat,
MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM
batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON
(batch.Name = Filename.Name)
WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode
AUTO-INC waiting
WE ROLL BACK TRANSACTION (2)

I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage
and Catalog running on separate RHEL 6.6 hosts.  Our clients are
RHEL 6's, 5's and Windows Servers 2008 and 2012R2.

Any help would be much appreciated.

Warmest regards,
-craig

On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma
shiroma.crai...@gmail.com wrote:

BTW, I suppose there could've been two jobs for the host(s)
in scheduling queue.  If this was the case, is there a way to
find out after the fact?  If this did actually happen, what
could cause duplicate jobs to be scheduled on the same day at
the same time?  I know no one manually ran the jobs in
question.  Again, this was a problem for only a few of the
jobs that ran last night, not all of them, and some were set
to do accurate backup and some were not.

Regards,
-craig

On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma
shiroma.crai...@gmail.com wrote:

Hello,

I had a few backups fail last night with the following error:

2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File
(FileIndex, JobId, PathId, FilenameId, LStat, MD5,
DeltaSeq) SELECT batch.FileIndex, batch.JobId,
Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
batch.DeltaSeq FROM batch JOIN Path ON (batch.Path =

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Kern Sibbald


  
  
On 06.08.2015 18:46, Bryn Hughes wrote:

 I think what Kern is getting at is that your database is what threw the
 error, not Bacula.  Whatever DB you are using is what is having the issue.
  


Yes.  That is exactly what I was implying.  

The rest of this is directed to Craig:
If you are using MariaDB (I have no indication that you are), please
be aware that it may be a very good database, maybe even better than
MySQL, but Bacula is built and tested against MySQL, and if you use
binaries that were built for MySQL, you could run into problems by
using MariaDB.  Even if your binaries were explicitly built with
MariaDB, it may not be compatible with the way Bacula works.  Bacula
has a tendency to push databases to the extreme, and it works well
with MySQL and PostgreSQL, but possibly not with other databases.  I
bring up MariaDB because it has been mentioned in another posting to
this list.

I would be very surprised if your problem has anything to do with
Accurate -- the database routines know nothing about accurate and
none of the data is different.  It is more likely due to the VM
environment or to some build or version problem with MySQL (or
MariaDB).

Best regards,
Kern
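
One way to check which server the catalog is running on is to inspect the string returned by SELECT VERSION(). The following is a minimal Python sketch, assuming that MariaDB builds include "MariaDB" in that string. The function name and the sample version strings are hypothetical and are not taken from this thread.

```python
def server_flavor(version_string):
    """Classify the text returned by SELECT VERSION().

    Assumption: MariaDB servers report a version string containing
    "MariaDB"; anything else is treated as MySQL or a MySQL-compatible fork.
    """
    if "mariadb" in version_string.lower():
        return "MariaDB"
    return "MySQL (or another MySQL-compatible server)"


# Hypothetical sample strings, for illustration only.
for sample in ("5.5.44-MariaDB", "5.1.73"):
    print(sample, "->", server_flavor(sample))
```

The version string would come from running SELECT VERSION(); in the mysql client on the catalog host.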


 Bryn

 On 2015-08-06 09:11 AM, Craig Shiroma wrote:

 Hi Kern,

 Thank you very much for the reply!  Would you have any suggestions on
 what may be causing this problem or how I can debug it?  Obviously, I'm
 encountering deadlocks when accurate backup runs on some of our hosts
 and we want to use accurate backup on all of our hosts if possible.

 Warmest regards,
 -craig

 On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days and
 got the same errors on some VMs that were running accurate and some that
 were not.  These hosts were running concurrently.  I would say 90% of the
 hosts that were configured to use Accurate finished successfully.  However,
 there were a few that failed with the deadlock error -- some that were
 configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are the
 ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

Hi Kern,

Thank you very much for the reply!  Would you have any suggestions on what
may be causing this problem or how I can debug it?  Obviously, I'm
encountering deadlocks when accurate backup runs on some of our hosts and
we want to use accurate backup on all of our hosts if possible.

Warmest regards,
-craig

On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote:

 On 06.08.2015 10:15, Craig Shiroma wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days and
 got the same errors on some VMs that were running accurate and some that
 were not.  These hosts were running concurrently.  I would say 90% of the
 hosts that were configured to use Accurate finished successfully.  However,
 there were a few that failed with the deadlock error -- some that were
 configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are the
 ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.


 Bacula is not aware of any SQL internal deadlocks.


 From the INNODB Monitor output:

 TRANSACTION:
 TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
 mysql tables in use 4, locked 4
 9 lock struct(s), heap size 1184, 5 row lock(s)
 MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
 host 192.168.10.99 bacula Sending data
 INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
 DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
 Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name)
 WAITING FOR THIS LOCK TO BE GRANTED:
 TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
 waiting
 WE ROLL BACK TRANSACTION (2)

 I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
 Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
 and Windows Servers 2008 and 2012R2.

 Any help would be much appreciated.

 Warmest regards,
 -craig

 On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 BTW, I suppose there could've been two jobs for the host(s) in scheduling
 queue.  If this was the case, is there a way to find out after the fact?
 If this did actually happen, what could cause duplicate jobs to be
 scheduled on the same day at the same time?  I know no one manually ran the
 jobs in question.  Again, this was a problem for only a few of the jobs
 that ran last night, not all of them, and some were set to do accurate
 backup and some were not.

 Regards,
 -craig

 On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
 JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
 batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
 batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
 Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to
 get lock; try restarting transaction

 The only thing I did yesterday was switch a bunch of backups to use
 Accurate backup and restart bacula-dir and bacula-sd after that.  However,
 the above problem also occurred on some hosts that were not set to use
 Accurate backup.  From the log, it seems like two jobs for this host were
 scheduled to run at 18:00 because the second job started and found a
 duplicate job (job 123984) and canceled the backup.  I know there were no
 jobs running before 18:00 so 123984 was not an old job still running.  Same
 with the other jobs that were

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

Hello again,

I just thought I'd update this post with more information in hopes of
getting some explanation for the deadlocks.

I ran with Accurate backup on our test VMs (RHEL) for a couple of days and
got the same errors on some VMs that were running accurate and some that
were not.  These hosts were running concurrently.  I would say 90% of the
hosts that were configured to use Accurate finished successfully.  However,
there were a few that failed with the deadlock error -- some that were
configured to use accurate and some that were not configured to use
accurate.  Also, on all of these, a second job started for each of the
affected hosts right after Bacula detected the deadlock even though it said
a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

Tonight, I disabled accurate on all hosts and the deadlocks did not
happen.  No errors were detected and all the backups finished successfully.

Some questions...
1.  Can I back up multiple hosts concurrently with some hosts configured to
use accurate and some configured not to use accurate?  Or, is it an all or
none thing, meaning all hosts that run concurrently must either be using
accurate backup or not using accurate backup (cannot mix the two)?

2. It seems like the hosts that get out of the starting gate first are the
ones affected.  I am configured to run 50 jobs concurrently.  Again, no
problems with accurate turned off on all hosts for months now.

3. Why is Bacula spinning off a new job right away after it detects the
deadlock for each affected job instead of waiting until the rescheduled job
runs?  I verified that there were no duplicate jobs in the queue before the
backups started running, no jobs were running before the start of the
backups, and I did not start any of these backups manually to cause a
second job to appear.

From the INNODB Monitor output:

TRANSACTION:
TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 4, locked 4
9 lock struct(s), heap size 1184, 5 row lock(s)
MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
host 192.168.10.99 bacula Sending data
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
Filename.Name)
WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting
WE ROLL BACK TRANSACTION (2)
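
The monitor excerpt above can also be read mechanically. The following is a minimal Python sketch, not part of Bacula or MySQL, that pulls the waiting table lock and the rolled-back transaction number out of such text. The regular expressions and the function name summarize_deadlock are assumptions based only on the lines quoted in this thread; other server versions may format the output differently.

```python
import re

# Excerpt copied from the monitor output quoted in this thread.
MONITOR_EXCERPT = """\
TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 4, locked 4
9 lock struct(s), heap size 1184, 5 row lock(s)
WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
waiting
WE ROLL BACK TRANSACTION (2)
"""

# Assumed line shapes, taken from the excerpt above.
LOCK_WAIT = re.compile(
    r"TABLE LOCK table `(?P<schema>[^`]+)`\.`(?P<table>[^`]+)` "
    r"trx id (?P<trx>\d+) lock mode (?P<mode>\S+) waiting"
)
ROLLBACK = re.compile(r"WE ROLL BACK TRANSACTION \((?P<victim>\d+)\)")


def summarize_deadlock(text):
    """Return the table-lock waits and the rolled-back transaction number."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    waits = [match.groupdict() for match in LOCK_WAIT.finditer(flat)]
    victim = ROLLBACK.search(flat)
    return {
        "waits": waits,
        "victim": int(victim.group("victim")) if victim else None,
    }


summary = summarize_deadlock(MONITOR_EXCERPT)
print(summary)
```

For this excerpt the summary shows transaction 208788977 waiting for an AUTO-INC lock on the File table in the bacula schema, and transaction (2) as the one rolled back. It does not by itself explain why the two transactions collided.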

I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
and Windows Servers 2008 and 2012R2.

Any help would be much appreciated.

Warmest regards,
-craig

On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 BTW, I suppose there could've been two jobs for the host(s) in scheduling
 queue.  If this was the case, is there a way to find out after the fact?
 If this did actually happen, what could cause duplicate jobs to be
 scheduled on the same day at the same time?  I know no one manually ran the
 jobs in question.  Again, this was a problem for only a few of the jobs
 that ran last night, not all of them, and some were set to do accurate
 backup and some were not.

 Regards,
 -craig

 On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
 JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
 batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
 batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
 Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to
 get lock; try restarting transaction

 The only thing I did yesterday was switch a bunch of backups to use
 Accurate backup and restart bacula-dir and bacula-sd after that.  However,
 the above problem also occurred on some hosts that were not set to use
 Accurate backup.  From the log, it seems like two jobs for this host were
 scheduled to run at 18:00 because the second job started and found a
 duplicate job (job 123984) and canceled the backup.  I know there were no
 jobs running before 18:00 so 123984 was not an old job still running.  Same
 with the other jobs that were canceled because of the above situation.

 Anyway, does anyone have an idea what would cause this, especially how
 the second job got shot into the system.  After the deadlock error, Bacula
 said it would reschedule the job.  However the second job started right
 after the deadlock error instead of one hour later which makes me think
 that there were two jobs for this host scheduled to run at 18:00.

 Thank you in advance,
 -craig
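
The error text quoted above ends with "try restarting transaction". As an illustration of that general pattern only, here is a minimal Python sketch of a retry wrapper. It is not how Bacula handles the error: in this thread Bacula fails the job and reschedules it. DatabaseError and run_with_deadlock_retry are hypothetical names; a real database driver raises its own exception type. The value 1213 is the MySQL error code for this deadlock message.

```python
import random
import time

ER_LOCK_DEADLOCK = 1213  # MySQL: "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def run_with_deadlock_retry(transaction, attempts=3, base_delay=0.1,
                            sleep=time.sleep):
    """Call transaction(); on a deadlock error, wait and call it again.

    Any other error, or a deadlock on the final attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.code != ER_LOCK_DEADLOCK or attempt == attempts:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))


# Usage: a hypothetical transaction that deadlocks twice, then succeeds.
state = {"calls": 0}


def flaky_insert():
    state["calls"] += 1
    if state["calls"] < 3:
        raise DatabaseError(ER_LOCK_DEADLOCK, "Deadlock found when trying "
                            "to get lock; try restarting transaction")
    return "committed"


print(run_with_deadlock_retry(flaky_insert, base_delay=0.01))
```

The whole transaction is passed in as a callable because the server has already rolled it back, so every statement in it has to be issued again.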



--

Re: [Bacula-users] Deadlock error

2015-08-06 Thread Craig Shiroma

One thing I missed mentioning was on the second night accurate backup was
used, the director died about 45 minutes into the backup with the following
error.  This did not happen on the first run with accurate enabled.

Aug  4 19:02:27 host bacula-dir: Bacula interrupted by signal 11:
Segmentation violation

Also, as many of you know, since a duplicate job was spun off for each
job with a deadlock, the backup was cancelled.

On Wed, Aug 5, 2015 at 10:15 PM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 Hello again,

 I just thought I'd update this post with more information in hopes of
 getting some explanation for the deadlocks.

 I ran with Accurate backup on our test VMs (RHEL) for a couple of days and
 got the same errors on some VMs that were running accurate and some that
 were not.  These hosts were running concurrently.  I would say 90% of the
 hosts that were configured to use Accurate finished successfully.  However,
 there were a few that failed with the deadlock error -- some that were
 configured to use accurate and some that were not configured to use
 accurate.  Also, on all of these, a second job started for each of the
 affected hosts right after Bacula detected the deadlock even though it said
 a reschedule would happen 3600 seconds later (the 3600 seconds is correct).

 Tonight, I disabled accurate on all hosts and the deadlocks did not
 happen.  No errors were detected and all the backups finished successfully.

 Some questions...
 1.  Can I back up multiple hosts concurrently with some hosts configured
 to use accurate and some configured not to use accurate?  Or, is it an all
 or none thing, meaning all hosts that run concurrently must either be using
 accurate backup or not using accurate backup (cannot mix the two)?

 2. It seems like the hosts that get out of the starting gate first are the
 ones affected.  I am configured to run 50 jobs concurrently.  Again, no
 problems with accurate turned off on all hosts for months now.

 3. Why is Bacula spinning off a new job right away after it detects the
 deadlock for each affected job instead of waiting until the rescheduled job
 runs?  I verified that there were no duplicate jobs in the queue before the
 backups started running, no jobs were running before the start of the
 backups, and I did not start any of these backups manually to cause a
 second job to appear.

 From the INNODB Monitor output:

 TRANSACTION:
 TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
 mysql tables in use 4, locked 4
 9 lock struct(s), heap size 1184, 5 row lock(s)
 MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
 host 192.168.10.99 bacula Sending data
 INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
 DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
 Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name)
 WAITING FOR THIS LOCK TO BE GRANTED:
 TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
 waiting
 WE ROLL BACK TRANSACTION (2)

 I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
 Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
 and Windows Servers 2008 and 2012R2.

 Any help would be much appreciated.

 Warmest regards,
 -craig

 On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 BTW, I suppose there could've been two jobs for the host(s) in scheduling
 queue.  If this was the case, is there a way to find out after the fact?
 If this did actually happen, what could cause duplicate jobs to be
 scheduled on the same day at the same time?  I know no one manually ran the
 jobs in question.  Again, this was a problem for only a few of the jobs
 that ran last night, not all of them, and some were set to do accurate
 backup and some were not.

 Regards,
 -craig

 On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
 wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
 JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
 batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
 batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
 Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to
 get lock; try restarting transaction

 The only thing I did yesterday was switch a bunch of backups to use
 Accurate backup and restart bacula-dir and bacula-sd after that.  However,
 the above problem also occurred on some hosts that were not set to use
 Accurate backup.  From the log, it seems like two jobs for this host were
 scheduled to run at 18:00 because the second job started and found a
 duplicate job (job 123984) and canceled the backup.  I know there were no
 jobs running before 18:00 so 123984 was not an old job still running.  Same
 with

Re: [Bacula-users] Deadlock error

2015-08-04 Thread Craig Shiroma

BTW, I suppose there could've been two jobs for the host(s) in scheduling
queue.  If this was the case, is there a way to find out after the fact?
If this did actually happen, what could cause duplicate jobs to be
scheduled on the same day at the same time?  I know no one manually ran the
jobs in question.  Again, this was a problem for only a few of the jobs
that ran last night, not all of them, and some were set to do accurate
backup and some were not.

Regards,
-craig
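
One possible way to look for duplicates after the fact is to group the catalog's job records by job name and scheduled time. The following is a minimal sketch, assuming the catalog's Job table has JobId, Name and SchedTime columns. It runs the query against an in-memory SQLite table so that it is self-contained. The rows are hypothetical sample data, not taken from this thread, and on a real catalog only the SELECT statement would be run, against MySQL.

```python
import sqlite3

# In-memory stand-in for an assumed subset of the catalog's Job table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Job (JobId INTEGER PRIMARY KEY, Name TEXT, SchedTime TEXT)"
)
# Hypothetical sample rows: host-a has two jobs with the same SchedTime.
conn.executemany(
    "INSERT INTO Job (JobId, Name, SchedTime) VALUES (?, ?, ?)",
    [
        (1, "host-a", "2015-08-03 18:00:00"),
        (2, "host-a", "2015-08-03 18:00:00"),
        (3, "host-b", "2015-08-03 18:00:00"),
    ],
)

# Job names that have more than one job record for the same scheduled time.
DUPLICATES_SQL = """
SELECT Name, SchedTime, COUNT(*) AS Jobs
FROM Job
GROUP BY Name, SchedTime
HAVING COUNT(*) > 1
ORDER BY SchedTime, Name
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
for name, sched_time, jobs in duplicates:
    print(name, sched_time, jobs)
```

Whether a job started by a reschedule carries the same SchedTime as the job it replaces is not established in this thread, so an empty result would not rule out a second job.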

On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com
wrote:

 Hello,

 I had a few backups fail last night with the following error:

 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex, JobId,
 PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
 batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
 batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
 Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to
 get lock; try restarting transaction

 The only thing I did yesterday was switch a bunch of backups to use
 Accurate backup and restart bacula-dir and bacula-sd after that.  However,
 the above problem also occurred on some hosts that were not set to use
 Accurate backup.  From the log, it seems like two jobs for this host were
 scheduled to run at 18:00 because the second job started and found a
 duplicate job (job 123984) and canceled the backup.  I know there were no
 jobs running before 18:00 so 123984 was not an old job still running.  Same
 with the other jobs that were canceled because of the above situation.

 Anyway, does anyone have an idea what would cause this, especially how the
 second job got shot into the system.  After the deadlock error, Bacula said
 it would reschedule the job.  However the second job started right after
 the deadlock error instead of one hour later which makes me think that
 there were two jobs for this host scheduled to run at 18:00.

 Thank you in advance,
 -craig

--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
