[Bacula-devel] Possible bug with migration/copy jobs run manually from bconsole.

Alex Bramley Thu, 29 Oct 2009 07:28:16 -0700

Hello list,

I'm implementing disk-to-disk-to-tape backups with Bacula 3.0.2, using
copy jobs to duplicate the disk-to-disk jobs to tape for off-site
storage. When I run the copy job manually, only the *last* of the
selected job IDs is copied to tape.


I'm using this suggestion [1] from the bacula-devel list to copy the
disk-to-disk jobs into my tape pools, which cycle every four weeks and
are written on Mondays, Wednesdays and Fridays. I wrote a PostgreSQL
stored procedure which locates the correct job IDs for a host to use
in the selection pattern [2]. The tape pools are configured as in [3],
and the copy job as in [4]. I haven't got to the point of running these
jobs from schedules yet, as I am still in the testing phase of the
disk-to-tape work.

I am running the copy job with: run pool=W01-copy <hostname>-tape

I can see that my stored procedure is returning the correct job IDs.
Here is the log output from an example run:

29-Oct 09:46 bksrv0-dir JobId 105: The following 4 JobIds were chosen to be copied: 38,53,82,101
29-Oct 09:46 bksrv0-dir JobId 105: Job not run.
29-Oct 09:46 bksrv0-dir JobId 105: Error: Could not start migration job.
29-Oct 09:46 bksrv0-dir JobId 105: Job not run.
29-Oct 09:46 bksrv0-dir JobId 105: Error: Could not start migration job.
29-Oct 09:46 bksrv0-dir JobId 105: Job not run.
29-Oct 09:46 bksrv0-dir JobId 105: Error: Could not start migration job.
29-Oct 09:46 bksrv0-dir JobId 105: Copying using JobId=101 Job=<hostname>.2009-10-28_21.00.00_41
29-Oct 09:46 bksrv0-dir JobId 105: Unable to get Job Volume Parameters. ERR=sql_get.c:442 No volumes found for JobId=101
29-Oct 09:46 bksrv0-dir JobId 105: Previous Job has no data to copy.
<snip>

The last error is not a problem; the last incremental for the host in
question had no data to back up. The problem is the three "Job not
run." errors above it. I have looked through the source [5], and
this is where I see what looks like a bug. On line 867, all but
the last of the jobs are passed on to start_migration_job(). On lines
920-921, inside this subroutine, a new run command line is assembled,
and on line 24 it is run. However, my initial "pool=W01-copy" is *not*
passed on, which causes the job to fail. I have tested
this in bconsole too: running "run pool=W01-copy jobid=53
job=<hostname>-tape" ran a copy job fine, whereas running "run
job=<hostname>-tape jobid=53" failed with the same "Job not run."
error.
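
The suspected behaviour can be modelled with a short Python sketch. It
is an illustration only and is not Bacula's code: the function names
below are hypothetical, "host1" stands in for the real hostname, and
treating a bare word as the job name is an assumption based on the run
command quoted earlier. The only grounded details are the two bconsole
command forms from the test above.

```python
# Illustrative model only; none of these functions exist in Bacula.

def parse_run_command(cmd):
    """Split a bconsole-style 'run key=value ... [jobname]' line into a dict."""
    words = cmd.split()
    if not words or words[0] != "run":
        raise ValueError("expected a 'run' command")
    args = {}
    for word in words[1:]:
        if "=" in word:
            key, value = word.split("=", 1)
            args[key] = value
        else:
            # Assumption: a bare word is taken as the job name.
            args["job"] = word
    return args


def rebuild_without_overrides(args, jobid):
    """Model of the suspected bug: only the job name and jobid survive."""
    return f"run job={args['job']} jobid={jobid}"


def rebuild_with_pool(args, jobid):
    """Model of the expected behaviour: the pool override is carried along."""
    cmd = "run"
    if "pool" in args:
        cmd += f" pool={args['pool']}"
    return cmd + f" jobid={jobid} job={args['job']}"


original = parse_run_command("run pool=W01-copy host1-tape")
print(rebuild_without_overrides(original, 53))  # the form that failed
print(rebuild_with_pool(original, 53))          # the form that worked
```

The first rebuilt command has the same shape as the one that failed in
bconsole; the second has the shape of the one that ran fine.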

In my schedules (which are not live yet) the pool will be specified.
Despite having spent most of the morning looking at the code, I can't
work out whether that pool will then be passed on correctly to the
jobs started by start_migration_job(). I also can't offer a patch to
fix the problem; I wouldn't know where to start :-)

If I can provide any more useful information to help fix the problem,
please just ask!

Many Thanks,
--Alex

[1] http://www.mail-archive.com/[email protected]/msg04599.html

[2]
CREATE OR REPLACE FUNCTION find_backup_jobs(host varchar)
    RETURNS SETOF integer AS $$
DECLARE
    last_full_time integer;
    last_diff_time integer;
BEGIN
    SELECT max(jobtdate) INTO last_full_time FROM job
      WHERE name = host AND level = 'F' AND type = 'B';
    SELECT max(jobtdate) INTO last_diff_time FROM job
      WHERE name = host AND level = 'D' AND type = 'B'
                        AND jobtdate > last_full_time;

    IF last_diff_time IS NOT NULL THEN
        -- differentials exist so this is a monthly backup
        RETURN QUERY
            SELECT jobid FROM job
              WHERE name = host AND type = 'B' AND (
                (level = 'F' AND jobtdate = last_full_time) OR
                (level = 'D' AND jobtdate = last_diff_time) OR
                (level = 'I' AND jobtdate > last_diff_time));
    ELSE
        -- no differentials here so grab full and all incrs
        RETURN QUERY
            SELECT jobid FROM job
              WHERE name = host AND type = 'B' AND (
                (level = 'F' AND jobtdate = last_full_time) OR
                (level = 'I' AND jobtdate > last_full_time));
    END IF;

END;
$$ LANGUAGE plpgsql;
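
For readers who want to check the selection logic without a catalog
database, here is a minimal Python sketch of the same rules. It is a
model under assumptions, not part of the setup: the Job tuple mirrors
only the columns the procedure reads, the jobtdate values and the
differential with jobid 60 are made up, and the results are sorted by
jobid because the SQL above specifies no ordering. The job IDs 38, 53,
82 and 101 are borrowed from the log output.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    jobid: int
    name: str
    level: str     # 'F' (full), 'D' (differential) or 'I' (incremental)
    type: str      # 'B' for backup jobs
    jobtdate: int  # job time as an integer timestamp


def find_backup_jobs(jobs: List[Job], host: str) -> List[int]:
    """Return the job IDs the stored procedure would select for `host`."""
    backups = [j for j in jobs if j.name == host and j.type == "B"]
    full_times = [j.jobtdate for j in backups if j.level == "F"]
    if not full_times:
        # With no full backup, max() is NULL in SQL and nothing matches.
        return []
    last_full = max(full_times)
    diff_times = [j.jobtdate for j in backups
                  if j.level == "D" and j.jobtdate > last_full]
    if diff_times:
        # Differentials exist: last full, last differential, later incrementals.
        last_diff = max(diff_times)
        chosen = [j for j in backups
                  if (j.level == "F" and j.jobtdate == last_full)
                  or (j.level == "D" and j.jobtdate == last_diff)
                  or (j.level == "I" and j.jobtdate > last_diff)]
    else:
        # No differentials: last full plus every incremental since it.
        chosen = [j for j in backups
                  if (j.level == "F" and j.jobtdate == last_full)
                  or (j.level == "I" and j.jobtdate > last_full)]
    return sorted(j.jobid for j in chosen)


# Hypothetical catalog rows: one full followed by three incrementals.
sample = [
    Job(38, "host1", "F", "B", 100),
    Job(53, "host1", "I", "B", 200),
    Job(82, "host1", "I", "B", 300),
    Job(101, "host1", "I", "B", 400),
]
print(find_backup_jobs(sample, "host1"))
```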

[3]
(an example)
Pool {
    Name       = W01
    Pool Type  = Backup
    Auto Prune = Yes
    Recycle    = Yes
    Storage    = TS3100
    Volume Retention      = 27 days
    Recycle Oldest Volume = yes
}
Pool {
    Name      = W01-copy
    Pool Type = Backup
    Next Pool = W01
}

[4]
(this comes from a perl script that generates bacula configuration
from a YAML input file)
Job {
    Name     = "$hostname"
    Client   = "$hostname"
    JobDefs  = "default-job"
    Storage  = "$hostname"
    Pool     = "$hostname"
    Schedule = "$schedule"
    FileSet  = "$fileset"
}

Job {
    Name     = "$hostname-tape"
    Client   = "None"
    JobDefs  = "default-tape"
    Pool     = "None" # overridden by schedule
    Storage  = "$hostname"
#    Schedule = "copy-to-tape"
    Selection Pattern = "select jobid from find_backup_jobs('$hostname') as jobid;"
}

[5] http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/bacula;a=blob;f=bacula/src/dird/migrate.c;h=2ffadb76fc04d502d81d4f33dfee6d96960b9ed8;hb=HEAD
