Hi Kelvin,

Thank you for the info and help! Good information to keep in mind.

I think I found the root of the problem, thanks to everyone. See my reply
to Ana. Thanks again for the post. It's much appreciated.

-craig

On Fri, Aug 7, 2015 at 10:46 AM, Kelvin Minter <kb.min...@gmail.com> wrote:

> MyISAM is terrible for transactions. If the deadlock is happening because
> of table locking, then switching the engine to InnoDB might help your
> problem. MyISAM locks the entire table, while InnoDB only locks the rows
> it is updating.
>
> Check out the link below.
>
> http://stackoverflow.com/questions/20148/myisam-versus-innodb
>
> On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda <
> emiliaarr...@gmail.com> wrote:
>
>> Hello Craig,
>>
>> In one of your posts you mentioned a segmentation violation on the
>> Director host. Accurate backups require more resources than normal ones.
>> Have you checked whether the CPU and memory resources are sufficient on
>> the Director and on the clients that are configured to use Accurate mode?
>>
>> Best regards,
>> Ana
>>
>> On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma <shiroma.crai...@gmail.com>
>> wrote:
>>
>>> Thanks Kern! I'll bring in a DBA on our side to have a look.
>>>
>>> Would you have any thoughts on this question posed earlier?
>>>
>>> 3. Why is Bacula spinning off a new job right away after it detects the
>>> deadlock for each affected job, instead of waiting until the rescheduled
>>> job runs? I verified that there were no duplicate jobs in the queue
>>> before the backups started running, that no jobs were running before the
>>> backups started, and that I did not start any of these backups manually
>>> to cause a second job to appear.
>>>
>>> This happened on both nights I ran with Accurate turned on, for the hosts
>>> whose backups failed because of the deadlock.
>>>
>>> Regards,
>>> -craig
>>>
>>> On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>
>>>> On 06.08.2015 21:44, Craig Shiroma wrote:
>>>>
>>>> Hi Kern,
>>>>
>>>> Thank you for the info!
>>>> We're using MySQL 5.6 Percona Server, Release 68.0, Revision 656.
>>>>
>>>> Would this setting cause the problem?
>>>> innodb_lock_wait_timeout = 100
>>>>
>>>> Is it too high, too low, or does it have no bearing on the problem?
>>>>
>>>>
>>>> Sorry, I am a Bacula programmer, and I do not know much about databases
>>>> -- especially MySQL, since I use PostgreSQL. PostgreSQL is harder to
>>>> install and a bit harder to configure than MySQL, but it performs much
>>>> better.
>>>>
>>>>
>>>> Thanks again,
>>>> -craig
>>>>
>>>>
>>>> On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>
>>>>> On 06.08.2015 18:46, Bryn Hughes wrote:
>>>>>
>>>>> I think what Kern is getting at is that your database is what threw
>>>>> the error, not Bacula. Whatever DB you are using is what is having the
>>>>> issue.
>>>>>
>>>>>
>>>>> Yes. That is exactly what I was implying.
>>>>>
>>>>> The rest of this is directed to Craig:
>>>>> If you are using MariaDB (I have no indication that you are), please
>>>>> be aware that it may be a very good database, maybe even better than
>>>>> MySQL, but Bacula is built and tested against MySQL, and if you use
>>>>> binaries that were built for MySQL, you could run into problems by
>>>>> using MariaDB. Even if your binaries were explicitly built with MariaDB,
>>>>> it may not be compatible with the way Bacula works. Bacula has a
>>>>> tendency to push databases to the extreme, and it works well with MySQL
>>>>> and PostgreSQL, but possibly not with other databases. I bring up
>>>>> MariaDB because it has been mentioned in another posting to this list.
>>>>>
>>>>> I would be very surprised if your problem has anything to do with
>>>>> Accurate -- the database routines know nothing about Accurate, and none
>>>>> of the data is different. It is more likely due to the VM environment or
>>>>> to some build or version problem with MySQL (or MariaDB).
>>>>>
>>>>> Best regards,
>>>>> Kern
>>>>>
>>>>>
>>>>> Bryn
>>>>>
>>>>> On 2015-08-06 09:11 AM, Craig Shiroma wrote:
>>>>>
>>>>> Hi Kern,
>>>>>
>>>>> Thank you very much for the reply! Would you have any suggestions on
>>>>> what may be causing this problem or how I can debug it? Obviously, I'm
>>>>> encountering deadlocks when Accurate backup runs on some of our hosts,
>>>>> and we want to use Accurate backup on all of our hosts if possible.
>>>>>
>>>>> Warmest regards,
>>>>> -craig
>>>>>
>>>>> On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com>
>>>>> wrote:
>>>>>
>>>>>> On 06.08.2015 10:15, Craig Shiroma wrote:
>>>>>>
>>>>>> Hello again,
>>>>>>
>>>>>> I just thought I'd update this post with more information in hopes of
>>>>>> getting some explanation for the deadlocks.
>>>>>>
>>>>>> I ran with Accurate backup on our test VMs (RHEL) for a couple of
>>>>>> days and got the same errors on some VMs that were running Accurate and
>>>>>> some that were not. These hosts were running concurrently. I would say
>>>>>> 90% of the hosts that were configured to use Accurate finished
>>>>>> successfully. However, a few failed with the deadlock error -- some
>>>>>> that were configured to use Accurate and some that were not. Also, on
>>>>>> all of these, a second job started for each of the affected hosts right
>>>>>> after Bacula detected the deadlock, even though it said a reschedule
>>>>>> would happen 3600 seconds later (the 3600 seconds is correct).
>>>>>>
>>>>>> Tonight, I disabled Accurate on all hosts, and the deadlocks did not
>>>>>> happen. No errors were detected and all the backups finished
>>>>>> successfully.
>>>>>>
>>>>>> Some questions...
>>>>>> 1. Can I back up multiple hosts concurrently with some hosts
>>>>>> configured to use Accurate and some configured not to use Accurate?
>>>>>> Or, is it an all-or-none thing, meaning all hosts that run
>>>>>> concurrently must either all use Accurate backup or all not use it
>>>>>> (cannot mix the two)?
>>>>>>
>>>>>> 2. It seems like the hosts that get out of the starting gate first
>>>>>> are the ones affected. I am configured to run 50 jobs concurrently.
>>>>>> Again, there have been no problems with Accurate turned off on all
>>>>>> hosts for months now.
>>>>>>
>>>>>> 3. Why is Bacula spinning off a new job right away after it detects
>>>>>> the deadlock for each affected job, instead of waiting until the
>>>>>> rescheduled job runs? I verified that there were no duplicate jobs in
>>>>>> the queue before the backups started running, that no jobs were
>>>>>> running before the backups started, and that I did not start any of
>>>>>> these backups manually to cause a second job to appear.
>>>>>>
>>>>>>
>>>>>> Bacula is not aware of any SQL internal deadlocks.
>>>>>>
>>>>>>
>>>>>> From the InnoDB Monitor output:
>>>>>>
>>>>>> TRANSACTION:
>>>>>> TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
>>>>>> mysql tables in use 4, locked 4
>>>>>> 9 lock struct(s), heap size 1184, 5 row lock(s)
>>>>>> MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id
>>>>>> 29558637 <host> 192.168.10.99 bacula Sending data
>>>>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>>>>>> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
>>>>>> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>>>>>> Filename.Name)
>>>>>> WAITING FOR THIS LOCK TO BE GRANTED:
>>>>>> TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
>>>>>> waiting
>>>>>> WE ROLL BACK TRANSACTION (2)
>>>>>>
>>>>>> I am running Bacula 7.0.5 on RHEL 6.6 x64, with the Director, Storage
>>>>>> and Catalog running on separate RHEL 6.6 hosts.
>>>>>> Our clients are RHEL 6 and RHEL 5, plus Windows Server 2008 and
>>>>>> 2012 R2.
>>>>>>
>>>>>> Any help would be much appreciated.
>>>>>>
>>>>>> Warmest regards,
>>>>>> -craig
>>>>>>
>>>>>> On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma <
>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>
>>>>>>> BTW, I suppose there could've been two jobs for the host(s) in the
>>>>>>> scheduling queue. If this was the case, is there a way to find out
>>>>>>> after the fact? If this did actually happen, what could cause
>>>>>>> duplicate jobs to be scheduled on the same day at the same time? I
>>>>>>> know no one manually ran the jobs in question. Again, this was only a
>>>>>>> problem for a few of the jobs that ran last night, not all of them,
>>>>>>> and some were doing Accurate backup and some were not.
>>>>>>>
>>>>>>> Regards,
>>>>>>> -craig
>>>>>>>
>>>>>>> On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma <
>>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I had a few backups fail last night with the following error:
>>>>>>>>
>>>>>>>> 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
>>>>>>>> JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
>>>>>>>> batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat,
>>>>>>>> batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path =
>>>>>>>> Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock
>>>>>>>> found when trying to get lock; try restarting transaction
>>>>>>>>
>>>>>>>> The only thing I did yesterday was switch a bunch of backups to use
>>>>>>>> Accurate backup and restart bacula-dir and bacula-sd after that.
>>>>>>>> However, the above problem also occurred on some hosts that were not
>>>>>>>> set to use Accurate backup.
>>>>>>>> From the log, it seems like two jobs for this host were scheduled to
>>>>>>>> run at 18:00, because the second job started, found a duplicate job
>>>>>>>> (job 123984), and canceled the backup. I know there were no jobs
>>>>>>>> running before 18:00, so 123984 was not an old job still running.
>>>>>>>> The same goes for the other jobs that were canceled because of the
>>>>>>>> above situation.
>>>>>>>>
>>>>>>>> Anyway, does anyone have an idea what would cause this, especially
>>>>>>>> how the second job got shot into the system? After the deadlock
>>>>>>>> error, Bacula said it would reschedule the job. However, the second
>>>>>>>> job started right after the deadlock error instead of one hour
>>>>>>>> later, which makes me think that there were two jobs for this host
>>>>>>>> scheduled to run at 18:00.
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> -craig
>>>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bacula-users mailing list
>>>>>> Bacula-users@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
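The error in this thread, "Deadlock found when trying to get lock; try restarting transaction", is MySQL error 1213 (ER_LOCK_DEADLOCK). InnoDB breaks a deadlock by rolling back one of the transactions involved (the "WE ROLL BACK TRANSACTION (2)" line in the monitor output), and the usual client-side remedy is to rerun the whole transaction. According to the MySQL documentation, innodb_lock_wait_timeout governs ordinary lock waits (error 1205), not deadlocks, which InnoDB detects immediately. The Python sketch below only illustrates that generic retry pattern under stated assumptions; it is not Bacula's code, and DBError, run_with_deadlock_retry and flaky_batch_insert are hypothetical names, with DBError standing in for whatever exception a real MySQL driver raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# MySQL server error 1213 (ER_LOCK_DEADLOCK):
# "Deadlock found when trying to get lock; try restarting transaction".
ER_LOCK_DEADLOCK = 1213


class DBError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL error code."""

    def __init__(self, errno: int, msg: str) -> None:
        super().__init__(msg)
        self.errno = errno


def run_with_deadlock_retry(txn: Callable[[], T], attempts: int = 5,
                            base_delay: float = 0.05) -> T:
    """Call txn(), rerunning it if it is chosen as a deadlock victim.

    txn must perform the *whole* transaction (BEGIN ... COMMIT) on each
    call, because InnoDB has already rolled back the victim transaction.
    Any other error, or a deadlock on the final attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DBError as err:
            if err.errno != ER_LOCK_DEADLOCK or attempt == attempts:
                raise
            # Randomized exponential backoff so the competing jobs are less
            # likely to collide again on the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = 0

    def flaky_batch_insert() -> str:
        """Simulated transaction (hypothetical): deadlocks on the first two calls."""
        global calls
        calls += 1
        if calls < 3:
            raise DBError(ER_LOCK_DEADLOCK, "Deadlock found when trying to get lock")
        return "committed"

    print(run_with_deadlock_retry(flaky_batch_insert, base_delay=0.001), calls)
```

Run as a script, it prints `committed 3`: the simulated transaction deadlocks twice and commits on the third call. Only error 1213 is retried; every other error is re-raised unchanged, so the caller still sees problems that a blind retry would not fix.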