Hi Craig,

Good news!
You're welcome (again) :). And thank you for your feedback; it is always
useful.

Best regards,
Ana

On Mon, Aug 10, 2015 at 3:35 PM, Craig Shiroma <shiroma.crai...@gmail.com> wrote:

> Hi Ana,
>
> >Did you monitor resource usage during the backups? The list of files
> >generated for accurate backups is kept in memory (by both director and
> >client), so this should cause resource use (CPU, memory, etc.) to
> >increase in both hosts.
>
> I did monitor these items earlier, but did not notice a huge drop in
> usage for any of them. However, I was using Zabbix to monitor, which takes
> a reading every so many minutes. It's possible it didn't take a reading
> when the problem occurred. I should've used something like top instead, or
> VMware's performance tools. I'll take a look at the graphs again... I may
> have missed it.
>
> However, as usual your suggestions helped! I increased the amount of
> memory on Director and added more CPUs. I also reduced the number of
> concurrent jobs from 50 to 25. So far, two days of incremental backups
> have not produced any deadlock errors. All my backups finished
> successfully.
>
> Thank you so much for the help! (again) :-) Your advice is always,
> always so helpful.
>
> -craig
>
> On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda <emiliaarr...@gmail.com> wrote:
>
>> Hi Craig,
>>
>> On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma <shiroma.crai...@gmail.com> wrote:
>>
>>> Hi Ana,
>>>
>>> Thank you for the suggestion!
>>
>> You're welcome!
>>
>>> I'll look into adding more CPU and memory to director, although I
>>> didn't see much of an impact on either between a non-accurate run and
>>> an accurate run. For example, there was no large depletion of
>>> available memory, no swapping, and no high load.
>>
>> Did you monitor resource usage during the backups? The list of files
>> generated for accurate backups is kept in memory (by both director and
>> client), so this should cause resource use (CPU, memory, etc.) to
>> increase in both hosts.
>>
>>> I did add more memory to the catalog server and turned Accurate back
>>> on for the same hosts. I had no deadlocks. Last night was mostly
>>> Fulls, though. Not sure if that makes a difference. Would you know if
>>> Bacula would use fewer resources when fulls are run because it is
>>> going to back up everything anyway and no comparison of files needs to
>>> be made (I'm guessing)? When a full is done, does Bacula still need to
>>> keep a list of the files in memory for hosts using Accurate backups?
>>> My first thought is no.
>>
>> Mine too. When using accurate backups, the extra resource use should be
>> noticeable when running incremental, differential and full+basejobs
>> backups.
>>
>>> Thanks again for the help! Your posts are always so helpful.
>>
>> Thank you too!
>> Best regards,
>> Ana
>>
>>> -craig
>>>
>>> On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda <emiliaarr...@gmail.com> wrote:
>>>
>>>> Hello Craig,
>>>>
>>>> In one of your posts you mentioned a Segmentation violation in the
>>>> director host. Accurate backups require more resources than normal
>>>> ones. Have you checked if CPU and memory resources are enough in the
>>>> director and the clients that are configured for using accurate mode?
>>>>
>>>> Best regards,
>>>> Ana
>>>>
>>>> On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma <shiroma.crai...@gmail.com> wrote:
>>>>
>>>>> Thanks Kern! I'll bring in a DBA on our side to have a look.
>>>>>
>>>>> Would you have any thoughts on this question posed earlier?
>>>>>
>>>>> 3. Why is Bacula spinning off a new job right away after it detects
>>>>> the deadlock for each affected job instead of waiting until the
>>>>> rescheduled job runs? I verified that there were no duplicate jobs
>>>>> in the queue before the backups started running, no jobs were
>>>>> running before the start of the backups, and I did not start any of
>>>>> these backups manually to cause a second job to appear.
>>>>>
>>>>> This happened on both nights I ran with Accurate turned on, on the
>>>>> hosts that had failed backups because of the deadlock.
>>>>>
>>>>> Regards,
>>>>> -craig
>>>>>
>>>>> On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>>
>>>>>> On 06.08.2015 21:44, Craig Shiroma wrote:
>>>>>>
>>>>>> Hi Kern,
>>>>>>
>>>>>> Thank you for the info! We're using MySQL 5.6 Percona Server,
>>>>>> Release 68.0, Revision 656.
>>>>>>
>>>>>> Would this setting cause the problem?
>>>>>> innodb_lock_wait_timeout = 100
>>>>>>
>>>>>> Is it too high or too low or has no bearing on the problem?
>>>>>>
>>>>>>
>>>>>> Sorry, I am a Bacula programmer, and I do not know much about
>>>>>> databases -- especially MySQL since I use PostgreSQL. PostgreSQL is
>>>>>> harder to install and a bit harder to configure than MySQL, but it
>>>>>> performs much better.
>>>>>>
>>>>>>
>>>>>> Thanks again,
>>>>>> -craig
>>>>>>
>>>>>> On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>>>
>>>>>>> On 06.08.2015 18:46, Bryn Hughes wrote:
>>>>>>>
>>>>>>> I think what Kern is getting at is that your database is what
>>>>>>> threw the error, not Bacula. Whatever DB you are using is what is
>>>>>>> having the issue.
>>>>>>>
>>>>>>>
>>>>>>> Yes. That is exactly what I was implying.
>>>>>>>
>>>>>>> The rest of this is directed to Craig:
>>>>>>> If you are using MariaDB (I have no indication that you are),
>>>>>>> please be aware that it may be a very good database, maybe even
>>>>>>> better than MySQL, but Bacula is built and tested against MySQL,
>>>>>>> and if you use binaries that were built for MySQL, you could run
>>>>>>> into problems by using MariaDB. Even if your binaries were
>>>>>>> explicitly built with MariaDB, it may not be compatible with the
>>>>>>> way Bacula works. Bacula has a tendency to push databases to the
>>>>>>> extreme, and it works well with MySQL and PostgreSQL, but possibly
>>>>>>> not with other databases. I bring up MariaDB because it has been
>>>>>>> mentioned in another posting to this list.
>>>>>>>
>>>>>>> I would be very surprised if your problem has anything to do with
>>>>>>> Accurate -- the database routines know nothing about accurate and
>>>>>>> none of the data is different. It is more likely due to the VM
>>>>>>> environment or to some build or version problem with MySQL (or
>>>>>>> MariaDB).
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Kern
>>>>>>>
>>>>>>>
>>>>>>> Bryn
>>>>>>>
>>>>>>> On 2015-08-06 09:11 AM, Craig Shiroma wrote:
>>>>>>>
>>>>>>> Hi Kern,
>>>>>>>
>>>>>>> Thank you very much for the reply! Would you have any suggestions
>>>>>>> on what may be causing this problem or how I can debug it?
>>>>>>> Obviously, I'm encountering deadlocks when accurate backup runs on
>>>>>>> some of our hosts and we want to use accurate backup on all of our
>>>>>>> hosts if possible.
>>>>>>>
>>>>>>> Warmest regards,
>>>>>>> -craig
>>>>>>>
>>>>>>> On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>>>>
>>>>>>>> On 06.08.2015 10:15, Craig Shiroma wrote:
>>>>>>>>
>>>>>>>> Hello again,
>>>>>>>>
>>>>>>>> I just thought I'd update this post with more information in
>>>>>>>> hopes of getting some explanation for the deadlocks.
>>>>>>>>
>>>>>>>> I ran with Accurate backup on our test VMs (RHEL) for a couple of
>>>>>>>> days and got the same errors on some VMs that were running
>>>>>>>> accurate and some that were not. These hosts were running
>>>>>>>> concurrently. I would say 90% of the hosts that were configured
>>>>>>>> to use Accurate finished successfully. However, there were a few
>>>>>>>> that failed with the deadlock error -- some that were configured
>>>>>>>> to use accurate and some that were not configured to use
>>>>>>>> accurate. Also, on all of these, a second job started for each of
>>>>>>>> the affected hosts right after Bacula detected the deadlock even
>>>>>>>> though it said a reschedule would happen 3600 seconds later (the
>>>>>>>> 3600 seconds is correct).
>>>>>>>>
>>>>>>>> Tonight, I disabled accurate on all hosts and the deadlocks did
>>>>>>>> not happen. No errors were detected and all the backups finished
>>>>>>>> successfully.
>>>>>>>>
>>>>>>>> Some questions...
>>>>>>>> 1. Can I back up multiple hosts concurrently with some hosts
>>>>>>>> configured to use accurate and some configured not to use
>>>>>>>> accurate? Or, is it an all or none thing, meaning all hosts that
>>>>>>>> run concurrently must either be using accurate backup or not
>>>>>>>> using accurate backup (cannot mix the two)?
>>>>>>>>
>>>>>>>> 2. It seems like the hosts that get out of the starting gate
>>>>>>>> first are the ones affected. I am configured to run 50 jobs
>>>>>>>> concurrently. Again, no problems with accurate turned off on all
>>>>>>>> hosts for months now.
>>>>>>>>
>>>>>>>> 3. Why is Bacula spinning off a new job right away after it
>>>>>>>> detects the deadlock for each affected job instead of waiting
>>>>>>>> until the rescheduled job runs? I verified that there were no
>>>>>>>> duplicate jobs in the queue before the backups started running,
>>>>>>>> no jobs were running before the start of the backups, and I did
>>>>>>>> not start any of these backups manually to cause a second job to
>>>>>>>> appear.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bacula is not aware of any SQL internal deadlocks.
>>>>>>>>
>>>>>>>>
>>>>>>>> From the INNODB Monitor output:
>>>>>>>>
>>>>>>>> TRANSACTION:
>>>>>>>> TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
>>>>>>>> mysql tables in use 4, locked 4
>>>>>>>> 9 lock struct(s), heap size 1184, 5 row lock(s)
>>>>>>>> MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637 <host> 192.168.10.99 bacula Sending data
>>>>>>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>>>>>>>> SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,
>>>>>>>> batch.LStat, batch.MD5, batch.DeltaSeq
>>>>>>>> FROM batch JOIN Path ON (batch.Path = Path.Path)
>>>>>>>> JOIN Filename ON (batch.Name = Filename.Name)
>>>>>>>> WAITING FOR THIS LOCK TO BE GRANTED:
>>>>>>>> TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC waiting
>>>>>>>> WE ROLL BACK TRANSACTION (2)
>>>>>>>>
>>>>>>>> I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage
>>>>>>>> and Catalog running on separate RHEL 6.6 hosts. Our clients are
>>>>>>>> RHEL 6's, 5's and Windows Servers 2008 and 2012R2.
>>>>>>>>
>>>>>>>> Any help would be much appreciated.
>>>>>>>>
>>>>>>>> Warmest regards,
>>>>>>>> -craig
>>>>>>>>
>>>>>>>> On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma <shiroma.crai...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> BTW, I suppose there could've been two jobs for the host(s) in
>>>>>>>>> the scheduling queue. If this was the case, is there a way to
>>>>>>>>> find out after the fact? If this did actually happen, what could
>>>>>>>>> cause duplicate jobs to be scheduled on the same day at the same
>>>>>>>>> time? I know no one manually ran the jobs in question. Again,
>>>>>>>>> this only was a problem for a few of the jobs that ran last
>>>>>>>>> night, not all of them, and some were set to do accurate backup
>>>>>>>>> and some were not.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> -craig
>>>>>>>>>
>>>>>>>>> On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma <shiroma.crai...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I had a few backups fail last night with the following error:
>>>>>>>>>>
>>>>>>>>>> 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File
>>>>>>>>>> (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>>>>>>>>>> SELECT batch.FileIndex, batch.JobId, Path.PathId,
>>>>>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq
>>>>>>>>>> FROM batch JOIN Path ON (batch.Path = Path.Path)
>>>>>>>>>> JOIN Filename ON (batch.Name = Filename.Name): ERR=Deadlock
>>>>>>>>>> found when trying to get lock; try restarting transaction
>>>>>>>>>>
>>>>>>>>>> The only thing I did yesterday was switch a bunch of backups to
>>>>>>>>>> use Accurate backup and restart bacula-dir and bacula-sd after
>>>>>>>>>> that. However, the above problem also occurred on some hosts
>>>>>>>>>> that were not set to use Accurate backup. From the log, it
>>>>>>>>>> seems like two jobs for this host were scheduled to run at
>>>>>>>>>> 18:00 because the second job started and found a duplicate job
>>>>>>>>>> (job 123984) and canceled the backup. I know there were no jobs
>>>>>>>>>> running before 18:00 so 123984 was not an old job still
>>>>>>>>>> running. Same with the other jobs that were canceled because of
>>>>>>>>>> the above situation.
>>>>>>>>>>
>>>>>>>>>> Anyway, does anyone have an idea what would cause this,
>>>>>>>>>> especially how the second job got shot into the system? After
>>>>>>>>>> the deadlock error, Bacula said it would reschedule the job.
>>>>>>>>>> However, the second job started right after the deadlock error
>>>>>>>>>> instead of one hour later, which makes me think that there were
>>>>>>>>>> two jobs for this host scheduled to run at 18:00.
>>>>>>>>>>
>>>>>>>>>> Thank you in advance,
>>>>>>>>>> -craig
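On the monitoring point raised in the thread (a reading every few minutes can miss a short spike during an accurate backup), here is a minimal sketch of a finer-grained sampler. It assumes Linux hosts that expose /proc/meminfo and /proc/loadavg; the function names and the one-second default interval are illustrative choices, not part of Bacula or Zabbix.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg-style text."""
    return float(text.split()[0])


def read_file(path):
    with open(path) as handle:
        return handle.read()


def sample():
    """Take one reading of free memory, free swap and load (Linux only)."""
    mem = parse_meminfo(read_file("/proc/meminfo"))
    return {
        "time": time.strftime("%H:%M:%S"),
        # Older kernels may not report MemAvailable, so fall back to MemFree.
        "mem_kb": mem.get("MemAvailable", mem.get("MemFree")),
        "swap_free_kb": mem.get("SwapFree"),
        "load1": parse_loadavg(read_file("/proc/loadavg")),
    }


def monitor(interval_seconds=1.0, count=60):
    """Print one reading per interval; the defaults are arbitrary."""
    for _ in range(count):
        reading = sample()
        print("{time} mem_kb={mem_kb} swap_free_kb={swap_free_kb} "
              "load1={load1}".format(**reading))
        time.sleep(interval_seconds)
```

Running something like monitor(interval_seconds=1.0, count=3600) on the director and on one accurate-mode client during the backup window, with output redirected to a file, should show whether memory or load moves between the coarser monitoring readings.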
------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users