Hi Kelvin,

Thank you for the info and help!  Good information to keep in mind.

I think I found the root of the problem thanks to everyone.  See my reply
to Ana.

Thanks again for the post.  It's much appreciated.

-craig


On Fri, Aug 7, 2015 at 10:46 AM, Kelvin Minter <kb.min...@gmail.com> wrote:

> MyISAM does not support transactions and locks the entire table on writes,
> while InnoDB locks only the rows it is updating. If the deadlock is
> happening because of table locking, then switching the engine to InnoDB
> might help your problem.
>
> Check out the link below.
>
> http://stackoverflow.com/questions/20148/myisam-versus-innodb
>
> On Thu, Aug 6, 2015 at 8:37 PM, Ana Emília M. Arruda <
> emiliaarr...@gmail.com> wrote:
>
>> Hello Craig,
>>
>> In one of your posts you mentioned a segmentation violation on the
>> Director host. Accurate backups require more resources than normal ones.
>> Have you checked whether CPU and memory resources are sufficient on the
>> Director and on the clients that are configured to use Accurate mode?
>>
>> Best regards,
>> Ana
>>
>> On Thu, Aug 6, 2015 at 5:36 PM, Craig Shiroma <shiroma.crai...@gmail.com>
>> wrote:
>>
>>> Thanks Kern!  I'll bring in a DBA on our side to have a look.
>>>
>>> Would you have any thoughts on this question posed earlier?
>>>
>>> 3. Why is Bacula spinning off a new job right away after it detects the
>>> deadlock for each affected job instead of waiting until the rescheduled job
>>> runs?  I verified that there were no duplicate jobs in the queue before the
>>> backups started running, no jobs were running before the start of the
>>> backups, and I did not start any of these backups manually to cause a
>>> second job to appear.
>>>
>>> This happened on both nights I ran with Accurate turned On on the hosts
>>> that had failed backups because of the deadlock.
>>>
>>> Regards,
>>> -craig
>>>
>>> On Thu, Aug 6, 2015 at 9:48 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>
>>>> On 06.08.2015 21:44, Craig Shiroma wrote:
>>>>
>>>> Hi Kern,
>>>>
>>>> Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
>>>> 68.0, Revision 656.
>>>>
>>>> Would this setting cause the problem?
>>>> innodb_lock_wait_timeout = 100
>>>>
>>>> Is it too high or too low or has no bearing on the problem?
>>>>
>>>>
>>>> Sorry, I am a Bacula programmer, and I do not know much about databases
>>>> -- especially MySQL since I use PostgreSQL.  PostgreSQL is harder to
>>>> install and a bit harder to configure than MySQL, but it performs much
>>>> better.
>>>>
>>>>
>>>>
>>>> Thanks again,
>>>> -craig
>>>>
>>>>
>>>> On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald <k...@sibbald.com> wrote:
>>>>
>>>>> On 06.08.2015 18:46, Bryn Hughes wrote:
>>>>>
>>>>> I think what Kern is getting at is that your database is what threw
>>>>> the error, not Bacula.  Whatever DB you are using is what is having the
>>>>> issue.
>>>>>
>>>>>
>>>>> Yes.  That is exactly what I was implying.
>>>>>
>>>>> The rest of this is directed to Craig:
>>>>> If you are using MariaDB (I have no indication that you are), please
>>>>> be aware that it may be a very good database, maybe even better than MySQL,
>>>>> but Bacula is built and tested against MySQL, and if you use binaries that
>>>>> were built for MySQL, you could run into problems by using MariaDB.  Even
>>>>> if your binaries were explicitly built with MariaDB, it may not be
>>>>> compatible with the way Bacula works.  Bacula has a tendency to push
>>>>> databases to the extreme, and it works well with MySQL and PostgreSQL, but
>>>>> possibly not with other databases.  I bring up MariaDB because it has been
>>>>> mentioned in another posting to this list.
>>>>>
>>>>> I would be very surprised if your problem has anything to do with
>>>>> Accurate -- the database routines know nothing about accurate and none of
>>>>> the data is different.  It is more likely due to the VM environment or to
>>>>> some build or version problem with MySQL (or MariaDB).
>>>>>
>>>>> Best regards,
>>>>> Kern
>>>>>
>>>>>
>>>>> Bryn
>>>>>
>>>>> On 2015-08-06 09:11 AM, Craig Shiroma wrote:
>>>>>
>>>>> Hi Kern,
>>>>>
>>>>> Thank you very much for the reply!  Would you have any suggestions on
>>>>> what may be causing this problem or how I can debug it?  Obviously, I'm
>>>>> encountering deadlocks when accurate backup runs on some of our hosts and
>>>>> we want to use accurate backup on all of our hosts if possible.
>>>>>
>>>>> Warmest regards,
>>>>> -craig
>>>>>
>>>>> On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com>
>>>>> wrote:
>>>>>
>>>>>> On 06.08.2015 10:15, Craig Shiroma wrote:
>>>>>>
>>>>>> Hello again,
>>>>>>
>>>>>> I just thought I'd update this post with more information in hopes of
>>>>>> getting some explanation for the deadlocks.
>>>>>>
>>>>>> I ran with Accurate backup on our test VMs (RHEL) for a couple of
>>>>>> days and got the same errors on some VMs that were running accurate and
>>>>>> some that were not.  These hosts were running concurrently.  I would say
>>>>>> 90% of the hosts that were configured to use Accurate finished
>>>>>> successfully.  However, there were a few that failed with the deadlock
>>>>>> error -- some that were configured to use accurate and some that were not
>>>>>> configured to use accurate.  Also, on all of these, a second job started
>>>>>> for each of the affected hosts right after Bacula detected the deadlock
>>>>>> even though it said a reschedule would happen 3600 seconds later (the
>>>>>> 3600 seconds is correct).
>>>>>>
>>>>>> Tonight, I disabled accurate on all hosts and the deadlocks did not
>>>>>> happen.  No errors were detected and all the backups finished
>>>>>> successfully.
>>>>>>
>>>>>> Some questions...
>>>>>> 1. Can I back up multiple hosts concurrently with some hosts
>>>>>> configured to use accurate and some configured not to use accurate?
>>>>>> Or is it an all or none thing, meaning all hosts that run concurrently
>>>>>> must either be using accurate backup or not using accurate backup
>>>>>> (cannot mix the two)?
>>>>>>
>>>>>> 2. It seems like the hosts that get out of the starting gate first
>>>>>> are the ones affected.  I am configured to run 50 jobs concurrently.
>>>>>> Again, no problems with accurate turned off on all hosts for months now.
>>>>>>
>>>>>> 3. Why is Bacula spinning off a new job right away after it detects
>>>>>> the deadlock for each affected job instead of waiting until the
>>>>>> rescheduled job runs?  I verified that there were no duplicate jobs in
>>>>>> the queue before the backups started running, no jobs were running
>>>>>> before the start of the backups, and I did not start any of these
>>>>>> backups manually to cause a second job to appear.
>>>>>>
>>>>>>
>>>>>> Bacula is not aware of any SQL internal deadlocks.
>>>>>>
>>>>>>
>>>>>> From the INNODB Monitor output:
>>>>>>
>>>>>> TRANSACTION:
>>>>>> TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
>>>>>> mysql tables in use 4, locked 4
>>>>>> 9 lock struct(s), heap size 1184, 5 row lock(s)
>>>>>> MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id
>>>>>> 29558637 <host> 192.168.10.99 bacula Sending data
>>>>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>>>>>> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
>>>>>> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>>>>>> Filename.Name)
>>>>>> WAITING FOR THIS LOCK TO BE GRANTED:
>>>>>> TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
>>>>>> waiting
>>>>>> WE ROLL BACK TRANSACTION (2)
>>>>>>
>>>>>> I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
>>>>>> Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6 and
>>>>>> RHEL 5 hosts and Windows Server 2008 and 2012 R2 machines.
>>>>>>
>>>>>> Any help would be much appreciated.
>>>>>>
>>>>>> Warmest regards,
>>>>>> -craig
>>>>>>
>>>>>> On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma <
>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>
>>>>>>> BTW, I suppose there could have been two jobs for the host(s) in the
>>>>>>> scheduling queue.  If this was the case, is there a way to find out
>>>>>>> after the fact?  If this did actually happen, what could cause
>>>>>>> duplicate jobs to be scheduled on the same day at the same time?  I
>>>>>>> know no one manually ran the jobs in question.  Again, this was only a
>>>>>>> problem for a few of the jobs that ran last night, not all of them;
>>>>>>> some of the affected jobs were set to do accurate backup and some
>>>>>>> were not.
>>>>>>>
>>>>>>> Regards,
>>>>>>> -craig
>>>>>>>
>>>>>>> On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma <
>>>>>>> shiroma.crai...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I had a few backups fail last night with the following error:
>>>>>>>>
>>>>>>>> 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
>>>>>>>> JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
>>>>>>>> batch.FileIndex, batch.JobId, Path.PathId,
>>>>>>>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
>>>>>>>> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>>>>>>>> Filename.Name): ERR=Deadlock found when trying to get lock; try
>>>>>>>> restarting transaction
>>>>>>>>
>>>>>>>> The only thing I did yesterday was switch a bunch of backups to use
>>>>>>>> Accurate backup and restart bacula-dir and bacula-sd after that.
>>>>>>>> However, the above problem also occurred on some hosts that were not
>>>>>>>> set to use Accurate backup.  From the log, it seems like two jobs for
>>>>>>>> this host were scheduled to run at 18:00, because the second job
>>>>>>>> started, found a duplicate job (job 123984), and canceled the backup.
>>>>>>>> I know there were no jobs running before 18:00, so 123984 was not an
>>>>>>>> old job still running.  The same goes for the other jobs that were
>>>>>>>> canceled because of the above situation.
>>>>>>>>
>>>>>>>> Anyway, does anyone have an idea what would cause this, especially
>>>>>>>> how the second job got into the system?  After the deadlock error,
>>>>>>>> Bacula said it would reschedule the job.  However, the second job
>>>>>>>> started right after the deadlock error instead of one hour later,
>>>>>>>> which makes me think that there were two jobs for this host scheduled
>>>>>>>> to run at 18:00.
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> -craig
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> _______________________________________________
>>>>>> Bacula-users mailing list
>>>>>> Bacula-users@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
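
The error quoted in this thread ("Deadlock found when trying to get lock; try restarting transaction") is MySQL error 1213. When InnoDB detects a deadlock it rolls back one of the transactions, so the usual client-side response is to re-run that whole transaction. The sketch below illustrates that general pattern only. It is not Bacula code and says nothing about how Bacula handles the error; `DeadlockError` and `run_with_deadlock_retry` are hypothetical names made up for this illustration, and a real MySQL driver would raise its own exception type.

```python
import random
import time

# MySQL error code for "Deadlock found when trying to get lock".
MYSQL_ER_LOCK_DEADLOCK = 1213


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=5, base_delay=0.1,
                            sleep=time.sleep):
    """Call transaction() and re-run it if it fails with a deadlock.

    transaction must perform the complete unit of work (begin, statements,
    commit), because the server has already rolled back the failed attempt.
    Errors other than 1213 are re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError as exc:
            if exc.errno != MYSQL_ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so that concurrent jobs do not
            # all retry at the same instant and collide again.
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_insert():
        # Simulated batch insert that deadlocks on its first two attempts.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise DeadlockError(MYSQL_ER_LOCK_DEADLOCK, "Deadlock found")
        return "committed"

    print(run_with_deadlock_retry(flaky_insert, base_delay=0.01))
```

Retrying only helps with occasional deadlocks. If they happen on every run with many concurrent jobs, the locking behaviour on the database side still needs to be examined by someone who knows the server configuration.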
