Hi Craig,

Good news!

You're welcome (again) :). And thank you for your feedback. They are always

Best regards,

On Mon, Aug 10, 2015 at 3:35 PM, Craig Shiroma <shiroma.crai...@gmail.com>

> Hi Ana,
> > Did you monitor resource usage during the backups? The list of files
> > generated for accurate backups is kept in memory (by both director and
> > client), so this should cause resource use (CPU, memory, etc.) to increase
> > in both hosts.
> I did monitor these items earlier, but did not notice a huge drop in usage
> in these items.  However, I was using Zabbix to monitor which takes a
> reading every so many minutes.  It's possible it didn't take a reading when
> the problem occurred.  I should've used something like top instead or
> VMware's performance tools.  I'll take a look at the graphs again...I may
> have missed it.
> However, as usual your suggestions helped!  I increased the amount of
> memory on Director and added more CPUs.  I also reduced the number of
> concurrent jobs from 50 to 25.  So far, two days of incremental backups
> have not produced any deadlock errors.  All my backups finished
> successfully.
> Thank you so much for the help!  (again)  :-)   Your advice is always,
> always so helpful.
> -craig
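Craig's point above about Zabbix reading only every few minutes can be worked around with a short-interval sampler. The following is a minimal sketch, not something posted in the thread: it assumes a Linux host that exposes /proc/meminfo, and the function names (parse_meminfo, sample, read_proc_meminfo) are hypothetical.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample(read_text, interval_s=1.0, count=5, sleep=time.sleep):
    """Collect `count` memory readings (in kB), `interval_s` seconds apart."""
    readings = []
    for i in range(count):
        info = parse_meminfo(read_text())
        # MemAvailable may be absent on older kernels; fall back to MemFree.
        readings.append(info.get("MemAvailable", info.get("MemFree")))
        if i < count - 1:
            sleep(interval_s)
    return readings


def read_proc_meminfo():
    """Read the live /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return f.read()
```

On the director or a client, calling sample(read_proc_meminfo, interval_s=1.0, count=60) during a backup window would collect one minute of per-second readings, which is fine-grained enough to catch a short spike that a multi-minute polling interval could miss.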
> On Sat, Aug 8, 2015 at 4:11 PM, Ana Emília M. Arruda <
> emiliaarr...@gmail.com> wrote:
>> Hi Craig,
>> On Fri, Aug 7, 2015 at 3:49 PM, Craig Shiroma <shiroma.crai...@gmail.com>
>> wrote:
>>> Hi Ana,
>>> Thank you for the suggestion!
>> You're welcome!
>>> I'll look into adding more CPU and memory to director, although I didn't
>>> see much of an impact on either between a non-accurate run and an accurate
>>> run.  For example, there was no large depletion of available memory, no
>>> swapping, and no high load.
>> Did you monitor resource usage during the backups? The list of files
>> generated for accurate backups is kept in memory (by both director and
>> client), so this should cause resource use (CPU, memory, etc.) to increase
>> in both hosts.
>>> I did add more memory to the catalog server and turned Accurate back on
>>> for the same hosts.  I had no deadlocks.  Last night was mostly Fulls,
>>> though.  Not sure if that makes a difference.  Would you know if Bacula
>>> would use fewer resources when fulls are run because it is going to back up
>>> everything anyway and no comparison of files needs to be made (I'm
>>> guessing)?  When a full is done, does Bacula still need to keep a list of
>>> the files in memory for hosts using Accurate backups?  My first thought is
>>> no.
>> Mine too. When using accurate backups, the extra resource use should be
>> noticeable when running incremental, differential, and full+basejobs
>> backups.
>>> Thanks again for the help!  Your posts are always so helpful.
>> Thank you too!
>> Best regards,
>> Ana
>>> -craig
>>> On Thu, Aug 6, 2015 at 3:37 PM, Ana Emília M. Arruda <
>>> emiliaarr...@gmail.com> wrote:
>>>> Hello Craig,
>>>> In one of your posts you mentioned Segmentation violation in the
>>>> director host. Accurate backups require more resources than normal ones.
>>>> Have you checked if CPU and memory resources are enough in director and the
>>>> clients that are configured for using accurate mode?
>>>> Best regards,
>>>> Ana
>>>> On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma <
>>>> shiroma.crai...@gmail.com> wrote:
>>>>> Thanks Kern!  I'll bring in a DBA on our side to have a look.
>>>>> Would you have any thoughts on this question posed earlier?
>>>>> 3. Why is Bacula spinning off a new job right away after it detects
>>>>> the deadlock for each affected job instead of waiting until the
>>>>> rescheduled job runs?  I verified that there were no duplicate jobs in
>>>>> the queue before the backups started running, no jobs were running
>>>>> before the start of the backups, and I did not start any of these
>>>>> backups manually to cause a second job to appear.
>>>>> This happened on both nights I ran with Accurate turned On on the
>>>>> hosts that had failed backups because of the deadlock.
>>>>> Regards,
>>>>> -craig
>>>>> On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>>> On 06.08.2015 21:44, Craig Shiroma wrote:
>>>>>> Hi Kern,
>>>>>> Thank you for the info!  We're using MySQL 5.6 Percona Server,
>>>>>> Release 68.0, Revision 656.
>>>>>> Would this setting cause the problem?
>>>>>> innodb_lock_wait_timeout = 100
>>>>>> Is it too high or too low or has no bearing on the problem?
>>>>>> Sorry, I am a Bacula programmer, and I do not know much about
>>>>>> databases -- especially MySQL since I use PostgreSQL.  PostgreSQL is
>>>>>> harder to install and a bit harder to configure than MySQL, but it
>>>>>> performs much better.
>>>>>> Thanks again,
>>>>>> -craig
>>>>>> On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald <k...@sibbald.com>
>>>>>> wrote:
>>>>>>> On 06.08.2015 18:46, Bryn Hughes wrote:
>>>>>>> I think what Kern is getting at is that your database is what threw
>>>>>>> the error, not Bacula.  Whatever DB you are using is what is having the
>>>>>>> issue.
>>>>>>> Yes.  That is exactly what I was implying.
>>>>>>> The rest of this is directed to Craig:
>>>>>>> If you are using MariaDB (I have no indication that you are), please
>>>>>>> be aware that it may be a very good database, maybe even better than
>>>>>>> MySQL, but Bacula is built and tested against MySQL, and if you use
>>>>>>> binaries that were built for MySQL, you could run into problems by
>>>>>>> using MariaDB.  Even if your binaries were explicitly built with
>>>>>>> MariaDB, it may not be compatible with the way Bacula works.  Bacula
>>>>>>> has a tendency to push databases to the extreme, and it works well
>>>>>>> with MySQL and PostgreSQL, but possibly not with other databases.  I
>>>>>>> bring up MariaDB because it has been mentioned in another posting to
>>>>>>> this list.
>>>>>>> I would be very surprised if your problem has anything to do with
>>>>>>> Accurate -- the database routines know nothing about accurate and none
>>>>>>> of the data is different.  It is more likely due to the VM environment
>>>>>>> or to some build or version problem with MySQL (or MariaDB).
>>>>>>> Best regards,
>>>>>>> Kern
>>>>>>> Bryn
>>>>>>> On 2015-08-06 09:11 AM, Craig Shiroma wrote:
>>>>>>> Hi Kern,
>>>>>>> Thank you very much for the reply!  Would you have any suggestions
>>>>>>> on what may be causing this problem or how I can debug it?  Obviously,
>>>>>>> I'm encountering deadlocks when accurate backup runs on some of our
>>>>>>> hosts and we want to use accurate backup on all of our hosts if
>>>>>>> possible.
>>>>>>> Warmest regards,
>>>>>>> -craig
>>>>>>> On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com>
>>>>>>> wrote:
>>>>>>>> On 06.08.2015 10:15, Craig Shiroma wrote:
>>>>>>>> Hello again,
>>>>>>>> I just thought I'd update this post with more information in hopes
>>>>>>>> of getting some explanation for the deadlocks.
>>>>>>>> I ran with Accurate backup on our test VMs (RHEL) for a couple of
>>>>>>>> days and got the same errors on some VMs that were running accurate and
>>>>>>>> some that were not.  These hosts were running concurrently.  I would
>>>>>>>> say 90% of the hosts that were configured to use Accurate finished
>>>>>>>> successfully.  However, there were a few that failed with the deadlock
>>>>>>>> error -- some that were configured to use accurate and some that were
>>>>>>>> not configured to use accurate.  Also, on all of these, a second job
>>>>>>>> started for each of the affected hosts right after Bacula detected the
>>>>>>>> deadlock even though it said a reschedule would happen 3600 seconds
>>>>>>>> later (the 3600 seconds is correct).
>>>>>>>> Tonight, I disabled accurate on all hosts and the deadlocks did not
>>>>>>>> happen.  No errors were detected and all the backups finished
>>>>>>>> successfully.
>>>>>>>> Some questions...
>>>>>>>> 1. Can I back up multiple hosts concurrently with some hosts
>>>>>>>> configured to use accurate and some configured not to use accurate?
>>>>>>>> Or, is it an all or none thing, meaning all hosts that run concurrently
>>>>>>>> must either be using accurate backup or not using accurate backup
>>>>>>>> (cannot mix the two)?
>>>>>>>> 2. It seems like the hosts that get out of the starting gate first
>>>>>>>> are the ones affected.  I am configured to run 50 jobs concurrently.
>>>>>>>> Again, no problems with accurate turned off on all hosts for months
>>>>>>>> now.
>>>>>>>> 3. Why is Bacula spinning off a new job right away after it detects
>>>>>>>> the deadlock for each affected job instead of waiting until the
>>>>>>>> rescheduled job runs?  I verified that there were no duplicate jobs in
>>>>>>>> the queue before the backups started running, no jobs were running
>>>>>>>> before the start of the backups, and I did not start any of these
>>>>>>>> backups manually to cause a second job to appear.
>>>>>>>> Bacula is not aware of any SQL internal deadlocks.
>>>>>>>> From the INNODB Monitor output:
>>>>>>>> TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
>>>>>>>> mysql tables in use 4, locked 4
>>>>>>>> 9 lock struct(s), heap size 1184, 5 row lock(s)
>>>>>>>> MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id
>>>>>>>> 29558637 <host> bacula Sending data
>>>>>>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>>>>>>>> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
>>>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
>>>>>>>> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>>>>>>>> Filename.Name)
>>>>>>>> TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode
>>>>>>>> AUTO-INC waiting
>>>>>>>> I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage
>>>>>>>> and Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL
>>>>>>>> 6's, 5's and Windows Servers 2008 and 2012R2.
>>>>>>>> Any help would be much appreciated.
>>>>>>>> Warmest regards,
>>>>>>>> -craig
>>>>>>>> On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma <
>>>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>>>> BTW, I suppose there could've been two jobs for the host(s) in the
>>>>>>>>> scheduling queue.  If this was the case, is there a way to find out
>>>>>>>>> after the fact?  If this did actually happen, what could cause
>>>>>>>>> duplicate jobs to be scheduled on the same day at the same time?  I
>>>>>>>>> know no one manually ran the jobs in question.  Again, this was only
>>>>>>>>> a problem for a few of the jobs that ran last night, not all of them,
>>>>>>>>> and some were set to do accurate backup and some were not.
>>>>>>>>> Regards,
>>>>>>>>> -craig
>>>>>>>>> On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma <
>>>>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> I had a few backups fail last night with the following error:
>>>>>>>>>> 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File
>>>>>>>>>> (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
>>>>>>>>>> batch.FileIndex, batch.JobId, Path.PathId,
>>>>>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
>>>>>>>>>> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>>>>>>>>>> Filename.Name): ERR=Deadlock found when trying to get lock; try
>>>>>>>>>> restarting transaction
>>>>>>>>>> The only thing I did yesterday was switch a bunch of backups to
>>>>>>>>>> use Accurate backup and restart bacula-dir and bacula-sd after that.
>>>>>>>>>> However, the above problem also occurred on some hosts that were not
>>>>>>>>>> set to use Accurate backup.  From the log, it seems like two jobs for
>>>>>>>>>> this host were scheduled to run at 18:00 because the second job
>>>>>>>>>> started and found a duplicate job (job 123984) and canceled the
>>>>>>>>>> backup.  I know there were no jobs running before 18:00 so 123984 was
>>>>>>>>>> not an old job still running.  Same with the other jobs that were
>>>>>>>>>> canceled because of the above situation.
>>>>>>>>>> Anyway, does anyone have an idea what would cause this,
>>>>>>>>>> especially how the second job got shot into the system?  After the
>>>>>>>>>> deadlock error, Bacula said it would reschedule the job.  However,
>>>>>>>>>> the second job started right after the deadlock error instead of one
>>>>>>>>>> hour later, which makes me think that there were two jobs for this
>>>>>>>>>> host scheduled to run at 18:00.
>>>>>>>>>> Thank you in advance,
>>>>>>>>>> -craig
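The error text quoted in the message above ends with "try restarting transaction", which is MySQL's own suggestion for a deadlock victim (error code 1213, ER_LOCK_DEADLOCK). As an illustration of that advice only, here is a minimal retry-with-backoff sketch. It is not how Bacula handles the error (Kern's reply in this thread says Bacula is not aware of SQL internal deadlocks), and DeadlockError and run_with_retry are hypothetical names standing in for whatever exception a real database driver would raise.

```python
import random
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno=ER_LOCK_DEADLOCK):
        super().__init__("deadlock (errno %d)" % errno)
        self.errno = errno


def run_with_retry(transaction, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call transaction(); on a deadlock, back off and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff with random jitter, so that jobs which
            # collided once do not all retry at the same instant.
            delay = base_delay_s * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

The transaction argument is assumed to be a callable that runs one complete transaction from the start, since a deadlock victim's work has been rolled back and cannot be resumed partway through.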
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> _______________________________________________
>>>>>>>> Bacula-users mailing list
>>>>>>>> Bacula-users@lists.sourceforge.net
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users