I think what Kern is getting at is that your database is what threw the error, not Bacula. Whatever DB you are using is what is having the issue.
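
For what it's worth, the MySQL message itself says "try restarting transaction", so from the database's point of view the usual response to a deadlock (MySQL error 1213) is for the client to run the transaction again. Here is a minimal sketch of that idea in Python. It is not what Bacula does internally, and `DeadlockError`, `run_with_retry` and `flaky_insert` are hypothetical names I made up for illustration; a real client would catch its driver's own exception and check the error code.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL error 1213."""


def run_with_retry(txn, max_attempts=3, delay=0.0):
    """Call txn(); if it raises DeadlockError, retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the deadlock
            time.sleep(delay * attempt)  # simple linear back-off before retrying


# Simulated transaction (assumption: fails twice with a deadlock, then succeeds).
calls = {"n": 0}


def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_insert)
```

The point is only that the rolled-back transaction is safe to repeat; whether and when the job gets retried is up to the application.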

Bryn

On 2015-08-06 09:11 AM, Craig Shiroma wrote:
Hi Kern,

Thank you very much for the reply! Would you have any suggestions on what may be causing this problem or how I can debug it? Obviously, I'm encountering deadlocks when accurate backup runs on some of our hosts and we want to use accurate backup on all of our hosts if possible.

Warmest regards,
-craig

On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com <mailto:k...@sibbald.com>> wrote:

    On 06.08.2015 10:15, Craig Shiroma wrote:
    Hello again,

    I just thought I'd update this post with more information in
    hopes of getting some explanation for the deadlocks.

    I ran with Accurate backup on our test VMs (RHEL) for a couple of
    days and got the same errors on some VMs that were running
    accurate and some that were not.  These hosts were running
    concurrently.  I would say 90% of the hosts that were configured
    to use Accurate finished successfully.  However, there were a few
    that failed with the deadlock error -- some that were configured
    to use accurate and some that were not configured to use
    accurate.  Also, on all of these, a second job started for each
    of the affected hosts right after Bacula detected the deadlock
    even though it said a reschedule would happen 3600 seconds later
    (the 3600 seconds is correct).

    Tonight, I disabled accurate on all hosts and the deadlocks did
    not happen.  No errors were detected and all the backups finished
    successfully.

    Some questions...
    1.  Can I back up multiple hosts concurrently with some hosts
    configured to use accurate and some configured not to use
    accurate?  Or, is it an all or none thing, meaning all hosts that
    run concurrently must either be using accurate backup or not
    using accurate backup (cannot mix the two)?

    2. It seems like the hosts that get out of the starting gate
    first are the ones affected.  I am configured to run 50 jobs
    concurrently.  Again, no problems with accurate turned off on all
    hosts for months now.

    3. Why is Bacula spinning off a new job right away after it
    detects the deadlock for each affected job instead of waiting
    until the rescheduled job runs?  I verified that there were no
    duplicate jobs in the queue before the backups started running,
    no jobs were running before the start of the backups, and I did
    not start any of these backups manually to cause a second job to
    appear.

    Bacula is not aware of any SQL internal deadlocks.


    From the INNODB Monitor output:

    TRANSACTION:
    TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
    mysql tables in use 4, locked 4
    9 lock struct(s), heap size 1184, 5 row lock(s)
    MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id
    29558637 <host> 192.168.10.99 bacula Sending data
    INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat,
    MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
    Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM
    batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON
    (batch.Name = Filename.Name)
    WAITING FOR THIS LOCK TO BE GRANTED:
    TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode
    AUTO-INC waiting
    WE ROLL BACK TRANSACTION (2)

    I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage
    and Catalog running on separate RHEL 6.6 hosts.  Our clients are
    RHEL 6's, 5's and Windows Servers 2008 and 2012R2.

    Any help would be much appreciated.

    Warmest regards,
    -craig

    On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma
    <shiroma.crai...@gmail.com <mailto:shiroma.crai...@gmail.com>> wrote:

        BTW, I suppose there could've been two jobs for the host(s)
        in the scheduling queue.  If this was the case, is there a way
        to find out after the fact?  If this did actually happen, what
        could cause duplicate jobs to be scheduled on the same day at
        the same time?  I know no one manually ran the jobs in
        question.  Again, this was only a problem for a few of the
        jobs that ran last night, not all of them; some of the affected
        jobs were set to do accurate backup and some were not.

        Regards,
        -craig

        On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma
        <shiroma.crai...@gmail.com
        <mailto:shiroma.crai...@gmail.com>> wrote:

            Hello,

            I had a few backups fail last night with the following error:

            2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File
            (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
            DeltaSeq) SELECT batch.FileIndex, batch.JobId,
            Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
            batch.DeltaSeq FROM batch JOIN Path ON (batch.Path =
            Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
            ERR=Deadlock found when trying to get lock; try
            restarting transaction

            The only thing I did yesterday was switch a bunch of
            backups to use Accurate backup and restart bacula-dir and
            bacula-sd after that. However, the above problem also
            occurred on some hosts that were not set to use Accurate
            backup.  From the log, it seems like two jobs for this
            host were scheduled to run at 18:00 because the second job
            started and found a duplicate job (job 123984) and
            canceled the backup.  I know there were no jobs running
            before 18:00 so 123984 was not an old job still running.
            Same with the other jobs that were canceled because of
            the above situation.

            Anyway, does anyone have an idea what would cause this,
            especially how the second job got into the system?  After
            the deadlock error, Bacula said it would reschedule the
            job.  However, the second job started right after the
            deadlock error instead of one hour later, which makes me
            think that there were two jobs for this host scheduled to
            run at 18:00.

            Thank you in advance,
            -craig





    
------------------------------------------------------------------------------


    _______________________________________________
    Bacula-users mailing list
    Bacula-users@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/bacula-users



