Re: [Bacula-users] Deadlock error
Hi Ana,

Did you monitor resource usage during the backups? The list of files generated for accurate backups is kept in memory (by both director and client), so this should cause resource use (CPU, memory, etc.) to increase in both hosts.

I did monitor these items earlier, but did not notice a huge drop in usage in these items. However, I was using Zabbix to monitor, which takes a reading every so many minutes. It's possible it didn't take a reading when the problem occurred. I should've used something like top or VMware's performance tools instead. I'll take a look at the graphs again... I may have missed it.

However, as usual your suggestions helped! I increased the amount of memory on the Director and added more CPUs. I also reduced the number of concurrent jobs from 50 to 25. So far, two days of incremental backups have not produced any deadlock errors. All my backups finished successfully.

Thank you so much for the help! (again) :-) Your advice is always, always so helpful.

-craig

On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote:

Hi Craig,

On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com wrote:

Hi Ana, Thank you for the suggestion!

You're welcome!

I'll look into adding more CPU and memory to director, although I didn't see much of an impact on either between a non-accurate run and an accurate run. For example, there was no large depletion of available memory, no swapping, and no high load.

Did you monitor resource usage during the backups? The list of files generated for accurate backups is kept in memory (by both director and client), so this should cause resource use (CPU, memory, etc.) to increase in both hosts.

I did add more memory to the catalog server and turned Accurate back on for the same hosts. I had no deadlocks. Last night was mostly Fulls, though. Not sure if that makes a difference.
Would you know if Bacula would uses less resources when fulls are run because it is going to back up everything anyway and no comparison of files needs to be made (I'm guessing)? When a full is done, does Bacula still need to keep a list of the files in memory for hosts using Accurate backups? My first thought is no. Mine too. When using accurate backups, the amount of resources used should be noticed when running incremental, differential and full+basejobs backups. Thanks again for the help! Your posts are always so helpful. Thank you too! Best regards, Ana -craig On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? 
innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it
Re: [Bacula-users] Deadlock error
Hi Kelvin, Thank you for the info and help! Good information to keep in mind. I think I found the root of the problem thanks to everyone. See my reply to Ana. Thanks again for the post. It's much appreciated. -craig On Fri, Aug 7, 2015 at 10:46 AM, Kelvin Minter kb.min...@gmail.com wrote: MyISAM is terrible for transactions. If the deadlock is happening because of table locking then switching the engine to InnoDB might help your problem. MyISAM locks the entire table while InnoDB only locks the rows it is updating. Check out the link below. http://stackoverflow.com/questions/20148/myisam-versus-innodb On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? 
innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. 
Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all
Re: [Bacula-users] Deadlock error
Hi Josip,

Thank you for the advice and for looking that up in the mysql docs. That was pretty much the error I was getting. See my reply to Ana. Again, thank you for the help. I really appreciate it.

-craig

On Thu, Aug 6, 2015 at 9:26 PM, Josip Deanovic djosip+n...@linuxpages.net wrote:

On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:

Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem?

innodb_lock_wait_timeout = 100

Is it too high or too low or has no bearing on the problem?

Hi!

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

Documentation says:

-BEGIN-
The timeout in seconds an InnoDB transaction may wait for a row lock before giving up. The default value is 50 seconds. A transaction that tries to access a row that is locked by another InnoDB transaction will hang for at most this many seconds before issuing the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

When a lock wait timeout occurs, the current statement is not executed. The current transaction is not rolled back.
-END-

So I wouldn't say that decreasing this value would change anything in your case.

--
Josip Deanovic

--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Deadlock error
Hi Craig, Good news! You're welcome (again) :). And thank you for your feedback. They are always useful. Best regards, Ana On Mon, Aug 10, 2015 at 3:35 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hi Ana, Did you monitor resource usage during the backups? The list of files generated for accurate backups are kept in memory (by both director and client), so this should cause resource use (CPU, memory, etc.) to increase in both hosts. I did monitor these items earlier, but did not notice a huge drop in usage in these items. However, I was using Zabbix to monitor which takes a reading every so many minutes. It's possible it didn't take a reading when the problem occurred. I should've used something like top instead or VMware's performance tools. I'll take a look at the graphs again...I may have missed it. However, as usual your suggestions helped! I increased the amount of memory on Director and added more CPUs. I also reduced the number of concurrent jobs from 50 to 25. So far, two days of incremental backups have not produced any deadlock errors. All my backups finished successfully. Thank you so much for the help! (again) :-) Your advice is always, always so helpful. -craig On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hi Craig, On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hi Ana, Thank you for the suggestion! You're welcome! I'll look into adding more CPU and memory to director, although I didn't see much of an impact on either between a non-accurate run and an accurate run. For example, there was large depletion of available memory, no swapping, or high load. Did you monitor resource usage during the backups? The list of files generated for accurate backups are kept in memory (by both director and client), so this should cause resource use (CPU, memory, etc.) to increase in both hosts. I did add more memory to the catalog server and turned Accurate back on for the same hosts. 
I had no deadlocks. Last night was mostly Fulls, though. Not sure if that makes a difference. Would you know if Bacula would uses less resources when fulls are run because it is going to back up everything anyway and no comparison of files needs to be made (I'm guessing)? When a full is done, does Bacula still need to keep a list of the files in memory for hosts using Accurate backups? My first thought is no. Mine too. When using accurate backups, the amount of resources used should be noticed when running incremental, differential and full+basejobs backups. Thanks again for the help! Your posts are always so helpful. Thank you too! Best regards, Ana -craig On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. 
Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe
Re: [Bacula-users] Deadlock error
Hi Craig, On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hi Ana, Thank you for the suggestion! You're welcome! I'll look into adding more CPU and memory to director, although I didn't see much of an impact on either between a non-accurate run and an accurate run. For example, there was large depletion of available memory, no swapping, or high load. Did you monitor resource usage during the backups? The list of files generated for accurate backups are kept in memory (by both director and client), so this should cause resource use (CPU, memory, etc.) to increase in both hosts. I did add more memory to the catalog server and turned Accurate back on for the same hosts. I had no deadlocks. Last night was mostly Fulls, though. Not sure if that makes a difference. Would you know if Bacula would uses less resources when fulls are run because it is going to back up everything anyway and no comparison of files needs to be made (I'm guessing)? When a full is done, does Bacula still need to keep a list of the files in memory for hosts using Accurate backups? My first thought is no. Mine too. When using accurate backups, the amount of resources used should be noticed when running incremental, differential and full+basejobs backups. Thanks again for the help! Your posts are always so helpful. Thank you too! Best regards, Ana -craig On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. 
Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. 
I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this
Re: [Bacula-users] Deadlock error
On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:

Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem?

innodb_lock_wait_timeout = 100

Is it too high or too low or has no bearing on the problem?

Hi!

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

Documentation says:

-BEGIN-
The timeout in seconds an InnoDB transaction may wait for a row lock before giving up. The default value is 50 seconds. A transaction that tries to access a row that is locked by another InnoDB transaction will hang for at most this many seconds before issuing the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

When a lock wait timeout occurs, the current statement is not executed. The current transaction is not rolled back.
-END-

So I wouldn't say that decreasing this value would change anything in your case.

--
Josip Deanovic
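To make the behaviour described in the quoted documentation concrete, here is a minimal Python sketch that uses an ordinary thread lock as a stand-in for an InnoDB row lock. It is only an analogy under stated assumptions: `threading.Lock` is not MySQL, and the names `row_lock`, `LOCK_WAIT_TIMEOUT`, `update_row`, `hold_row` and `demo` are hypothetical. It shows a waiter giving up after a bounded wait (the ERROR 1205 case), which is a different condition from a true deadlock, which InnoDB detects and reports as error 1213.

```python
import threading
import time

# Stand-in for innodb_lock_wait_timeout, scaled down to a fraction of a second
# so the demo finishes quickly (hypothetical value, not a recommendation).
LOCK_WAIT_TIMEOUT = 0.2

row_lock = threading.Lock()  # stand-in for a single row lock


def update_row(timeout=LOCK_WAIT_TIMEOUT):
    """Try to take the lock, giving up after `timeout` seconds."""
    if not row_lock.acquire(timeout=timeout):
        # Analogous to ERROR 1205: the statement is abandoned, not executed.
        return "lock wait timeout exceeded"
    try:
        return "row updated"
    finally:
        row_lock.release()


def hold_row(seconds, ready):
    """Hold the lock for `seconds`, like a long-running transaction."""
    with row_lock:
        ready.set()          # signal that the lock is now held
        time.sleep(seconds)


def demo():
    ready = threading.Event()
    holder = threading.Thread(target=hold_row, args=(0.6, ready))
    holder.start()
    ready.wait()             # make sure the holder owns the lock first
    blocked = update_row()   # waits up to 0.2 s, then gives up
    holder.join()            # holder finishes and releases the lock
    free = update_row()      # the lock is free now, so this succeeds
    return blocked, free


if __name__ == "__main__":
    print(demo())
```

In this analogy, raising or lowering the timeout only changes how long the waiter blocks before giving up; it does not change whether the two parties contend for the same lock, which is consistent with the point made above about decreasing the value.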
Re: [Bacula-users] Deadlock error
Hi Ana,

Thank you for the suggestion! I'll look into adding more CPU and memory to director, although I didn't see much of an impact on either between a non-accurate run and an accurate run. For example, there was no large depletion of available memory, no swapping, and no high load.

I did add more memory to the catalog server and turned Accurate back on for the same hosts. I had no deadlocks. Last night was mostly Fulls, though. Not sure if that makes a difference. Would you know if Bacula uses fewer resources when fulls are run, because it is going to back up everything anyway and no comparison of files needs to be made (I'm guessing)? When a full is done, does Bacula still need to keep a list of the files in memory for hosts using Accurate backups? My first thought is no.

Thanks again for the help! Your posts are always so helpful.

-craig

On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote:

Hello Craig,

In one of your posts you mentioned a segmentation violation on the director host. Accurate backups require more resources than normal ones. Have you checked whether CPU and memory resources are sufficient on the director and on the clients that are configured to use accurate mode?

Best regards,
Ana

On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote:

Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier?

3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned on for the hosts that had failed backups because of the deadlock.
Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! 
Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a
Re: [Bacula-users] Deadlock error
Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. 
The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. 
Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each
Re: [Bacula-users] Deadlock error
On Thursday 2015-08-06 09:44:06 Craig Shiroma wrote:

Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem?

innodb_lock_wait_timeout = 100

Is it too high or too low or has no bearing on the problem?

Thanks again,
-craig

One more thing...

MySQL used the MyISAM storage engine by default before version 5.5, while Percona Server uses InnoDB by default. Maybe this could be the source of the problem you are experiencing. Unless you have a better idea, I would suggest trying it with the MyISAM storage engine.

I know a few applications that just can't work very well with InnoDB, and I don't know if Bacula has been thoroughly tested with InnoDB MySQL support.

I am using Bacula with both MyISAM and InnoDB with the Accurate option enabled, but my jobs usually do not run simultaneously; I can afford that because of the small number of jobs per Bacula installation (fewer than 100 jobs, and they are relatively small and quick).

An optimized database and database engine could increase database performance considerably, but in your case it wouldn't solve the problem unless something is really, really bad on the database side.

--
Josip Deanovic
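Before trying a different storage engine, it helps to know which engine each catalog table currently uses. The sketch below is a helper written for illustration only and is not part of Bacula: it builds `ALTER TABLE ... ENGINE=...` statements from (table, engine) pairs such as those returned by a query against `information_schema.TABLES`. The function name `engine_change_statements`, the schema name `bacula`, and the sample rows are all assumptions; check the real catalog before running any generated statement.

```python
# Query that could be run in the mysql client to obtain (table, engine) pairs.
# The schema name 'bacula' is an assumption; substitute the real catalog name.
ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = 'bacula'"
)


def engine_change_statements(tables, target_engine):
    """Return ALTER TABLE statements for tables not already on target_engine.

    `tables` is an iterable of (table_name, engine) pairs. Rows whose engine
    is None (for example views) are skipped.
    """
    statements = []
    for name, engine in tables:
        if engine is None:
            continue  # views and similar objects have no storage engine
        if engine.lower() == target_engine.lower():
            continue  # already on the desired engine
        statements.append(f"ALTER TABLE `{name}` ENGINE={target_engine};")
    return statements


if __name__ == "__main__":
    # Illustrative rows only; these are not taken from the thread.
    sample = [("File", "MyISAM"), ("Job", "InnoDB"), ("Path", "MyISAM")]
    for statement in engine_change_statements(sample, "InnoDB"):
        print(statement)
```

The helper only produces text, so the statements can be reviewed first. Converting large tables rewrites them, so it would be prudent to take a catalog dump and stop running jobs before applying anything.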
Re: [Bacula-users] Deadlock error
MyISAM is terrible for transactions. If the deadlock is happening because of table locking then switching the engine to InnoDB might help your problem. MyISAM locks the entire table while InnoDB only locks the rows it is updating. Check out the link below. http://stackoverflow.com/questions/20148/myisam-versus-innodb On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda emiliaarr...@gmail.com wrote: Hello Craig, In one of your posts you mentioned Segmentation violation in the director host. Accurate backups requires more resources than normal ones. Have you checked if CPU and memory resources are enough in director and the clients that are configured for using accurate mode? Best regards, Ana On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. 
Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. 
I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts
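Since the MyISAM versus InnoDB question comes up in this message, it may help to confirm which engine each catalog table really uses before switching anything. The sketch below assumes the rows come from the standard information_schema.TABLES view; the %s placeholder style is an assumption that matches common Python MySQL drivers, and the sample rows are made up for illustration, not taken from Craig's catalog.

```python
from collections import Counter

# Query against information_schema.TABLES; pass the catalog schema name
# (for example "bacula") as the single parameter.
ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s ORDER BY TABLE_NAME"
)


def summarize_engines(rows):
    """Count tables per storage engine.

    rows is an iterable of (table_name, engine) tuples, as a database
    driver would return them for ENGINE_QUERY.
    """
    return dict(Counter(engine for _table, engine in rows))


def tables_not_using(rows, wanted_engine):
    """Return the sorted names of tables whose engine differs from wanted_engine."""
    return sorted(table for table, engine in rows if engine != wanted_engine)


# Hypothetical sample result, for illustration only.
SAMPLE_ROWS = [("File", "InnoDB"), ("Filename", "MyISAM"), ("Path", "InnoDB")]
```

With the sample rows, summarize_engines reports two InnoDB tables and one MyISAM table, and tables_not_using(SAMPLE_ROWS, "InnoDB") singles out Filename. A mixed result like that would be worth showing to the DBA before changing engines.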
Re: [Bacula-users] Deadlock error
Hi Bryn, Thank you for the translation! :-) Much appreciated. I'll ask our DBA to take a look at the DB (mysql). Maybe it needs some tuning for Accurate. Do you know of any documentation for this? I only saw a couple of small sections for Accurate in the manual, mainly how to turn it on and that it uses more resources. I haven't had a chance to read the whole manual yet so I might have missed the section. -craig On Thu, Aug 6, 2015 at 6:46 AM, Bryn Hughes li...@nashira.ca wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. 
No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. Bacula is not aware of any SQL internal deadlocks. From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. 
Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM
Re: [Bacula-users] Deadlock error
On 06.08.2015 21:36, Craig Shiroma wrote: Hi Bryn, Thank you for the translation! :-) Much appreciated. I'll ask our DBA to take a look at the DB (mysql). Maybe it needs some tuning for Accurate. Do you know of any documentation for this? Type: "tuning mysql for bacula" into your browser ... I only saw a couple of small sections for Accurate in the manual, mainly how to turn it on and that it uses more resources. I haven't had a chance to read the whole manual yet so I might have missed the section. -craig On Thu, Aug 6, 2015 at 6:46 AM, Bryn Hughes li...@nashira.ca wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). 
Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning
Re: [Bacula-users] Deadlock error
Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. 
Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. 
Bacula is not aware of any SQL internal deadlocks. From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma
Re: [Bacula-users] Deadlock error
On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? 
Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished
Re: [Bacula-users] Deadlock error
On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. Bacula is not aware of any SQL internal deadlocks. 
From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03
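The InnoDB monitor excerpt quoted in this message names the lock the rolled-back transaction was waiting for: a table-level AUTO-INC lock on the File table, requested by the batch INSERT ... SELECT. As far as I know, how long InnoDB holds that lock for this kind of bulk insert is governed by the innodb_autoinc_lock_mode setting. That setting is not discussed anywhere in this thread, so treat it as something to raise with the DBA rather than a recommendation. The sketch below only pulls the lock details out of the monitor text; parse_table_locks is a hypothetical helper and the regular expression is written against the single excerpt shown here, not against every form the monitor output can take.

```python
import re

# Excerpt copied from the InnoDB monitor output quoted in this thread.
MONITOR_EXCERPT = (
    "WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` "
    "trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2)"
)

# Matches lines of the form:
#   TABLE LOCK table `schema`.`table` trx id N lock mode MODE [waiting]
TABLE_LOCK_RE = re.compile(
    r"TABLE LOCK table `(?P<schema>[^`]+)`\.`(?P<table>[^`]+)` "
    r"trx id (?P<trx_id>\d+) lock mode (?P<mode>\S+)(?P<waiting> waiting)?"
)


def parse_table_locks(text):
    """Return one dict per table lock mentioned in InnoDB monitor text."""
    locks = []
    for match in TABLE_LOCK_RE.finditer(text):
        locks.append(
            {
                "schema": match.group("schema"),
                "table": match.group("table"),
                "trx_id": int(match.group("trx_id")),
                "mode": match.group("mode"),
                "waiting": match.group("waiting") is not None,
            }
        )
    return locks
```

Run against the excerpt, this yields a single entry for schema bacula, table File, transaction 208788977, lock mode AUTO-INC, marked as waiting. Collecting these entries across several failed nights would show whether the File table's AUTO-INC lock is always the contended one.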
Re: [Bacula-users] Deadlock error
Thanks Kern! I'll bring in a DBA on our side to have a look. Would you have any thoughts on this question posed earlier? 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. This happened on both nights I ran with Accurate turned On on the hosts that had failed backups because of the deadlock. Regards, -craig On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 21:44, Craig Shiroma wrote: Hi Kern, Thank you for the info! We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656. Would this setting cause the problem? innodb_lock_wait_timeout = 100 Is it too high or too low or has no bearing on the problem? Sorry, I am a Bacula programmer, and I do not know much about databases -- especially MySQL since I use PostgreSQL. PostgreSQL is harder to install and a bit harder to configure than MySQL, but it performs much better. Thanks again, -craig On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. 
Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. 
Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. Bacula is not aware of any SQL internal deadlocks. From the INNODB Monitor output:
Re: [Bacula-users] Deadlock error
I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected.
I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. Bacula is not aware of any SQL internal deadlocks. From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question.
Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path =
Re: [Bacula-users] Deadlock error
On 06.08.2015 18:46, Bryn Hughes wrote: I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue. Yes. That is exactly what I was implying. The rest of this is directed to Craig: If you are using MariaDB (I have no indication that you are), please be aware that it may be a very good database, maybe even better than MySQL, but Bacula is built and tested against MySQL, and if you use binaries that were built for MySQL, you could run into problems by using MariaDB. Even if your binaries were explicitly built with MariaDB, it may not be compatible with the way Bacula works. Bacula has a tendency to push databases to the extreme, and it works well with MySQL and PostgreSQL, but possibly not with other databases. I bring up MariaDB because it has been mentioned in another posting to this list. I would be very surprised if your problem has anything to do with Accurate -- the database routines know nothing about accurate and none of the data is different. It is more likely due to the VM environment or to some build or version problem with MySQL (or MariaDB). Best regards, Kern Bryn On 2015-08-06 09:11 AM, Craig Shiroma wrote: Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. 
I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there
Re: [Bacula-users] Deadlock error
Hi Kern, Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible. Warmest regards, -craig On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald k...@sibbald.com wrote: On 06.08.2015 10:15, Craig Shiroma wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. 
Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. Bacula is not aware of any SQL internal deadlocks. From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. 
Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get lock; try restarting transaction The only thing I did yesterday was switch a bunch of backups to use Accurate backup and restart bacula-dir and bacula-sd after that. However, the above problem also occurred on some hosts that was not set to use Accurate backup. From the log, it seems like two jobs for this host was scheduled to run at 18:00 because the second job started and found a duplicate job (job 123984) and canceled the backup. I know there were no jobs running before 18:00 so 123984 was not an old job still running. Same with the other jobs that were
Re: [Bacula-users] Deadlock error
Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3. Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. 
From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. 
Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get lock; try restarting transaction The only thing I did yesterday was switch a bunch of backups to use Accurate backup and restart bacula-dir and bacula-sd after that. However, the above problem also occurred on some hosts that was not set to use Accurate backup. From the log, it seems like two jobs for this host was scheduled to run at 18:00 because the second job started and found a duplicate job (job 123984) and canceled the backup. I know there were no jobs running before 18:00 so 123984 was not an old job still running. Same with the other jobs that were canceled because of the above situation. Anyway, does anyone have an idea what would cause this, especially how the second job got shot into the system. After the deadlock error, Bacula said it would reschedule the job. However the second job started right after the deadlock error instead of one hour later which makes me think that there were two jobs for this host scheduled to run at 18:00. Thank you in advance, -craig --
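For anyone who wants to see at a glance which table a rolled-back transaction was waiting on, here is a minimal sketch in Python (my choice of language, not something Bacula ships) that pulls table-lock waits out of saved InnoDB monitor text such as the excerpt quoted above. The function name and the regular expression are hypothetical and only cover the "TABLE LOCK" line layout that appears in this thread. In practice the text would come from the output of SHOW ENGINE INNODB STATUS rather than a hard-coded string.

```python
import re

# Sample text copied from the InnoDB monitor excerpt quoted in this thread.
SAMPLE = (
    "TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock "
    "mysql tables in use 4, locked 4 "
    "WAITING FOR THIS LOCK TO BE GRANTED: "
    "TABLE LOCK table `bacula`.`File` trx id 208788977 "
    "lock mode AUTO-INC waiting"
)

# Assumption: only the "TABLE LOCK" layout seen in the excerpt is handled.
# Other lock types (for example RECORD LOCKS) use a different layout.
TABLE_LOCK_RE = re.compile(
    r"TABLE LOCK table `(?P<db>[^`]+)`\.`(?P<table>[^`]+)` "
    r"trx id (?P<trx>\d+) lock mode (?P<mode>\S+)(?P<waiting> waiting)?"
)


def find_table_lock_waits(status_text):
    """Return one dict per table lock that a transaction is still waiting on."""
    waits = []
    for match in TABLE_LOCK_RE.finditer(status_text):
        # Locks without the trailing "waiting" are already granted; skip them.
        if match.group("waiting"):
            waits.append({
                "db": match.group("db"),
                "table": match.group("table"),
                "trx_id": int(match.group("trx")),
                "mode": match.group("mode"),
            })
    return waits


if __name__ == "__main__":
    for wait in find_table_lock_waits(SAMPLE):
        print(wait)
```

Run against the excerpt, this reports a single wait: an AUTO-INC table lock on `bacula`.`File`, which matches what the monitor output says about the batch INSERT into File.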
Re: [Bacula-users] Deadlock error
One thing I forgot to mention: on the second night accurate backup was used, the director died about 45 minutes into the backup with the following error. This did not happen on the first run with accurate enabled. Aug 4 19:02:27 host bacula-dir: Bacula interrupted by signal 11: Segmentation violation Also, as many of you know, since a duplicate job was spun off for each job with a deadlock, the backup was cancelled. On Wed, Aug 5, 2015 at 10:15 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello again, I just thought I'd update this post with more information in hopes of getting some explanation for the deadlocks. I ran with Accurate backup on our test VMs (RHEL) for a couple of days and got the same errors on some VMs that were running accurate and some that were not. These hosts were running concurrently. I would say 90% of the hosts that were configured to use Accurate finished successfully. However, there were a few that failed with the deadlock error -- some that were configured to use accurate and some that were not configured to use accurate. Also, on all of these, a second job started for each of the affected hosts right after Bacula detected the deadlock even though it said a reschedule would happen 3600 seconds later (the 3600 seconds is correct). Tonight, I disabled accurate on all hosts and the deadlocks did not happen. No errors were detected and all the backups finished successfully. Some questions... 1. Can I back up multiple hosts concurrently with some hosts configured to use accurate and some configured not to use accurate? Or, is it an all or none thing, meaning all hosts that run concurrently must either be using accurate backup or not using accurate backup (cannot mix the two)? 2. It seems like the hosts that get out of the starting gate first are the ones affected. I am configured to run 50 jobs concurrently. Again, no problems with accurate turned off on all hosts for months now. 3.
Why is Bacula spinning off a new job right away after it detects the deadlock for each affected job instead of waiting until the rescheduled job runs? I verified that there were no duplicate jobs in the queue before the backups started running, no jobs were running before the start of the backups, and I did not start any of these backups manually to cause a second job to appear. From the INNODB Monitor output: TRANSACTION: TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock mysql tables in use 4, locked 4 9 lock struct(s), heap size 1184, 5 row lock(s) MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 host 192.168.10.99 bacula Sending data INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting WE ROLL BACK TRANSACTION (2) I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and Catalog running on separate RHEL 6.6 hosts. Our clients are RHEL 6's, 5's and Windows Servers 2008 and 2012R2. Any help would be much appreciated. Warmest regards, -craig On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma shiroma.crai...@gmail.com wrote: BTW, I suppose there could've been two jobs for the host(s) in scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this only was a problem for a few of the jobs that ran last night, not all of them and some to do accurate backup and some not. 
Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get lock; try restarting transaction The only thing I did yesterday was switch a bunch of backups to use Accurate backup and restart bacula-dir and bacula-sd after that. However, the above problem also occurred on some hosts that was not set to use Accurate backup. From the log, it seems like two jobs for this host was scheduled to run at 18:00 because the second job started and found a duplicate job (job 123984) and canceled the backup. I know there were no jobs running before 18:00 so 123984 was not an old job still running. Same with
Re: [Bacula-users] Deadlock error
BTW, I suppose there could've been two jobs for the host(s) in the scheduling queue. If this was the case, is there a way to find out after the fact? If this did actually happen, what could cause duplicate jobs to be scheduled on the same day at the same time? I know no one manually ran the jobs in question. Again, this was only a problem for a few of the jobs that ran last night, not all of them, and some were set to do accurate backup and some were not. Regards, -craig On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma shiroma.crai...@gmail.com wrote: Hello, I had a few backups fail last night with the following error: 2015-08-03 18:02:46bacula-dir JobId 123984: b INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to get lock; try restarting transaction The only thing I did yesterday was switch a bunch of backups to use Accurate backup and restart bacula-dir and bacula-sd after that. However, the above problem also occurred on some hosts that was not set to use Accurate backup. From the log, it seems like two jobs for this host was scheduled to run at 18:00 because the second job started and found a duplicate job (job 123984) and canceled the backup. I know there were no jobs running before 18:00 so 123984 was not an old job still running. Same with the other jobs that were canceled because of the above situation. Anyway, does anyone have an idea what would cause this, especially how the second job got shot into the system. After the deadlock error, Bacula said it would reschedule the job. However the second job started right after the deadlock error instead of one hour later which makes me think that there were two jobs for this host scheduled to run at 18:00.
Thank you in advance, -craig
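On the question of finding duplicate jobs after the fact, one possible approach is to group the catalog's job records by job name and scheduled time and look for groups with more than one entry. The sketch below is an illustration only: it assumes the catalog's Job table records a Name and a SchedTime per job (check your own schema), and it uses an in-memory SQLite table with made-up rows as a stand-in so that it runs anywhere. The function name is hypothetical.

```python
import sqlite3

# Simplified stand-in for the catalog's Job table. Only the columns used
# here are modelled, and the rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Job (JobId INTEGER PRIMARY KEY, Name TEXT, SchedTime TEXT)"
)
conn.executemany(
    "INSERT INTO Job (JobId, Name, SchedTime) VALUES (?, ?, ?)",
    [
        (1, "hostA-backup", "2015-08-03 18:00:00"),
        (2, "hostA-backup", "2015-08-03 18:00:00"),  # same job, same slot
        (3, "hostB-backup", "2015-08-03 18:00:00"),
    ],
)


def find_duplicate_schedules(connection):
    """Return (Name, SchedTime, count) for jobs recorded more than once per slot."""
    query = """
        SELECT Name, SchedTime, COUNT(*) AS n
        FROM Job
        GROUP BY Name, SchedTime
        HAVING COUNT(*) > 1
        ORDER BY SchedTime, Name
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for name, sched_time, n in find_duplicate_schedules(conn):
        print(f"{name} recorded {n} times at {sched_time}")
```

The same GROUP BY / HAVING query could be run directly against the real catalog. Whether a rescheduled job keeps the original scheduled time is something I have not verified, so treat any hits as a starting point for reading the job logs rather than as proof of a double schedule.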