On Sunday 02 September 2007 06:05, James Harper wrote:
> > Hello,
> >
> > Below, you will find a few notes on my ideas ...
> >
> > On Saturday 01 September 2007 13:55, James Harper wrote:
> > > Here's some thoughts on the way I'd like to see a plugin API
> > > implemented in the bacula file daemon, based on the work I've done on
> > > the Exchange Agent for Bacula.
> > >
> > > Agent loading a 'bacula library', or bacula loading an 'agent
> > > library'?
>
> > I think the most logical solution is a "plugin".
> >
> > > As long as the agent and bacula both support the same version of the
> > > API (and it would be good if they could negotiate a version, although
> > > this may be a heap more work), it would be nice to use dynamically
> > > loadable libraries rather than something compiled in. I think my
> > > preference would be for the bacula filed to load a .dll/.so based on a
> > > configuration option. So if 'plugin exchange.dll' was specified, bacula
> > > would use that rather than its own internal backup/restore code. Or
> > > maybe the normal filesystem backup itself would be a plugin?
> >
> > The most likely mode of working would be to add a directive that
> > defines a plugin directory, from which Bacula would load all plugins.
> > This seems to be rather standard.
>
> The way I imagine it, the config file would identify a plugin to be used
> per job, although it would still make sense to identify a directory
> where the plugins live, and maybe each plugin could return a name which
> would be used in the config file (eg exchange.dll has a 'Name()'
> function exposed which returns 'MicrosoftExchange', so in the config
> file you'd be able to say 'plugin = MicrosoftExchange')

Yes, I am not sure.  In the current code base, there is already partial code 
that allows specification of the name of the plugin for reading and for 
writing, but as I say, I am not sure -- see below.
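
A minimal sketch of the naming idea discussed above, written in Python for brevity (Bacula itself is C++). Everything in it is hypothetical: `PluginRegistry`, `ExchangePlugin`, and the `name()` method are illustrative names, not existing Bacula API. It assumes each loaded plugin reports a name, and that a config value such as 'plugin = MicrosoftExchange' is resolved against those reported names rather than against a file name like exchange.dll.

```python
class ExchangePlugin:
    """Hypothetical plugin; a real one would live in exchange.dll / exchange.so."""

    def name(self):
        # The name the config file would refer to (assumption, not Bacula API).
        return "MicrosoftExchange"


class PluginRegistry:
    """Maps the name each plugin reports to the plugin object."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        key = plugin.name()
        if key in self._plugins:
            raise ValueError(f"duplicate plugin name: {key}")
        self._plugins[key] = plugin

    def lookup(self, name):
        # Returns None when no loaded plugin reports this name.
        return self._plugins.get(name)


# Stand-in for "load every plugin found in the plugin directory".
registry = PluginRegistry()
registry.register(ExchangePlugin())
```

With this shape, the plugin directory and the per-job (or per-Options) plugin name are independent: the directory decides what is loaded, the name decides what is used.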

>
> > > What bacula would provide?
> > >
> > > Bacula would provide all of the network code, and make calls to the
> > > agent as required (eg 'prepare this file to start restoring'), and
> > > provide some helper functions (eg 'is this file in the fileset?'). It
> > > would be responsible for handling compression (it would take a
> > > FILE_DATA stream and turn it into a GZIP_DATA (or other) stream as
> > > directed by the options), and would also add on any signature/digest
> > > information required, and would also provide data encryption if
> > > required.
> > >
> > > What the agent would provide?
> > >
> > > The way I've set up the Exchange agent, there is a base class which
> > > provides roughly what the bacula filed would, and then the actual
> > > Exchange code, which does the talking to exchange.
> > >
> > > For a backup, the following is roughly what I see that needs to
> > > happen:
> > > 1. bacula makes a 'prepare backup' call with the list of fileset
> > > options, and the type of backup being done (full, differential, etc)
> > > 2. bacula makes a 'next file' call to get a handle on the next file to
> > > be backed up, and a list of the stream types to be backed up
> > > 3. agent finds the next file to be backed up, making calls to a 'is
> > > this file in the fileset' function provided by bacula, and when it
> > > finds something, returns that file to bacula.
> > > 4. bacula makes a 'start backup of stream x' call to initiate the
> > > backup of a given stream type
> > > 5. bacula makes a 'read data' call to read the data for the current
> > > stream
> > > 6. loop to 4 until end of file
> > > 7. loop to 3 until all streams are done
> > > 8. loop to 2 until all files are done
> >
> > This is one part that I have not totally worked out.  As you describe
> > it, it is not really complete in that Bacula needs to have a core
> > backup functionality that can be *extended* or *overridden* by plugins.
> > Thus we need some way in the beginning that the plugin can more or less
> > register to receive control for certain files, or it could even receive
> > control for all files and either decide to handle them or not.  IMO the
> > plugin shouldn't really have to deal with a lot of options or the
> > FileSets -- that is base Bacula responsibility.
> >
> > One of the big problems I have not yet figured out is: if you backup a
> > particular file with a plugin, then it should really be restored by the
> > same plugin.  How do you assure that?  What do you do if the plugin is
> > not there? Do you save the name of the plugin on the Volume? ...
>
> My idea of using plugins on a per-job basis solves this. Bacula just has
> to look up the job that backed it up and use the same plugin to restore
> (hmmm... does Bacula know the job that did the backup? What about on a
> bscan?). 

My plan was to have plugins as fine as the Options level (as currently
partially implemented), which is finer grained than the Job level.  That
does not solve the problem at all, and it creates new ones.  The main one is
that unless you figure out how to put these plugin names on the Volume, the
Volume is no longer complete, but requires a *special* conf file to be
properly read.  I consider this a partial implementation that will get the
user into trouble, so it is something I have ruled out implementing -- at
least for the moment.
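
To make the "plugin name on the Volume" point concrete, here is a small Python sketch. It is not Bacula's Volume format: the JSON header line and the function names `write_record`, `read_record`, and `restore` are all assumptions made for illustration. The only idea it shows is that each record carries the name of the plugin that wrote it, so a restore needs nothing from a conf file to know which plugin is required.

```python
import json


def write_record(volume, plugin_name, payload):
    """Append a record whose header line names the plugin that produced it."""
    header = json.dumps({"plugin": plugin_name, "length": len(payload)})
    volume.append(header.encode("utf-8") + b"\n" + payload)


def read_record(record):
    """Return (plugin_name, payload) using only what is stored in the record."""
    header_line, payload = record.split(b"\n", 1)
    header = json.loads(header_line.decode("utf-8"))
    return header["plugin"], payload[: header["length"]]


def restore(record, loaded_plugins):
    """Hand the payload to the plugin named in the record, or fail loudly."""
    plugin_name, payload = read_record(record)
    if plugin_name not in loaded_plugins:
        raise LookupError(f"record needs plugin {plugin_name!r}, which is not loaded")
    return loaded_plugins[plugin_name](payload)


# Hypothetical usage: a list stands in for the Volume.
volume = []
write_record(volume, "MicrosoftExchange", b"log\ndata")
```

A missing plugin is still fatal for the restore, but at least the Volume itself says which plugin is missing.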

> It would be nice to be able to override this though, as for at 
> least MSSQL backups, the format on tape is exactly the same as the
> format stored on disk when you use the "BACKUP DATABASE xxx TO
> DISK='filename.bak'" command, so as a last resort you could just restore
> your SQL backup to a plain file and use MSSQL to do the restore from
> there, eg in disaster recovery mode where you don't have a working
> plugin yet.

It seems to me that if a particular plugin is not available, the user will not 
be able to restore his data.  I believe that you are viewing the problem from 
too narrow a perspective, because in general the kinds of plugins that will 
be written will not be easy to simulate in some sort of disaster recovery 
mode without the plugin.  Were it that simple, I don't think we would need 
the plugins.

>
> The problem with trying to integrate your whole backup (eg files +
> exchange + mssql) into one job is that each deals with different logical
> things. A file backup obviously just deals with plain files, but an
> exchange backup logically deals with storage groups and databases
> (databases consist of multiple files - normally two - and a storage
> group has multiple databases and then multiple log files which hold
> transactions for all databases in that storage group).

At the very lowest level of the code, you are correct, but in reality a "file 
backup" does not just deal with plain files. It deals with a tree of objects.  
Normally you call those objects files, which is OK, but in fact to Bacula 
they are a whole bunch of different types of objects (directories, which form 
the tree; sockets; character files; block files; normal files; FIFOs, ...).  
There is no limit to the number of objects that Bacula can deal with. Each 
one has a different backup and restore method (though the code is not really 
that well organized). The only *current* requirement is that they be in a 
tree relationship.

I'm not sure why the above is a problem, other than finding a proper namespace 
and browsing the backup objects.
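
The "tree of typed objects" view can be sketched as follows, in Python. The `Node` class, the kind strings such as "storage_group", and the handler table are hypothetical and only illustrate the shape of the idea: every object has a kind, each kind has its own backup method, and the only structural requirement is the tree. The kinds "storage_group" and "exchange_db" stand for object types a plugin might contribute.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # e.g. "dir", "file", "fifo", or a plugin-defined kind
    children: list = field(default_factory=list)


def walk(node, handlers, prefix=""):
    """Yield (path, result) for every node, depth first, dispatching on kind."""
    path = f"{prefix}/{node.name}"
    handler = handlers.get(node.kind)
    if handler is None:
        raise KeyError(f"no handler for object kind {node.kind!r}")
    yield path, handler(node)
    for child in node.children:
        yield from walk(child, handlers, path)


# Hypothetical handlers: built-in kinds plus kinds supplied by a plugin.
handlers = {
    "dir": lambda n: "directory entry",
    "storage_group": lambda n: "storage group logs",
    "exchange_db": lambda n: "database files",
}

tree = Node("exchange", "dir", [
    Node("StorageGroup1", "storage_group", [
        Node("Mailbox Store", "exchange_db"),
    ]),
])
```

The paths produced by `walk` double as a namespace for browsing, which is the remaining open question mentioned above.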

>
> But it is through discussions such as this that I'm sure we'll work it
> all out :)
>
> > > One of the things I haven't gotten my head around yet is the
> > > difference between how an incremental or differential backup is
> > > handled in the bacula filed and Exchange. If I understand it
> > > correctly, the director tells the filed when the last backup was done,
> > > and the filed gives it any files that have changed since then (with
> > > obvious problems for deleted or renamed files). Exchange keeps its own
> > > internal numbers on when the backup was last done, and just gives you
> > > all the logfiles since the last full backup (differential) or all the
> > > logfiles since the last incremental. I'm pretty sure it will all still
> > > work just fine, but maybe the director might have some misinformation
> > > about exactly what the incremental/differential backup represents...
> > >
> > > On a completely unrelated note... I wonder if there is a way for any
> > > filesystems to keep some sort of journal (like the journal in
> > > journaling filesystems but with different requirements) around long
> > > enough to be able to track deleted or moved files in a way that would
> > > be useful to bacula to do accurate incremental/differential backups...
> >
> > The problem with such things is that if there are several backup
> > programs, the scheme will fail.  If you want real security, Bacula has
> > to do this itself.
>
> True, but a _lot_ more work for bacula. Although I haven't been
> following the discussions on 'true' incremental/differential backups so
> you may already have worked out a nice solution to this.

Yes, I know how to do project #1 "Accurate restoration of renamed/deleted 
files". The only unknowns are some minor details, mainly concerning 
performance.

Given that we have a proper namespace and can browse plugin backups, the 
techniques should work equally well there too.

>
> Your above statement is also true for Exchange right now, as per my
> previous paragraph. Exchange keeps track internally of when the last
> full backup was done, so bacula needs to somehow know that it doesn't
> have all the control it would like over incremental and differential
> backups.
>
> I guess one of the other things a plugin needs to do is tell bacula
> about its capabilities, eg 'Can do Incremental Backup', 'Can do
> Differential Backup', etc.

I don't think that is quite the right question.  All plugins will have to know 
how to do all implemented backup types.  It doesn't make sense to do a 
partial implementation.  That said, I imagine the plugin may have a certain 
flexibility in the type of backup it does.  If an Incremental is requested, 
there is no real harm if it does a Differential other than efficiency ... 
however, I don't imagine that would be a normal case.
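
As an illustration of that flexibility, the following Python sketch checks whether the level a plugin actually performed is an acceptable substitute for the level requested. The `Level` enum and `is_acceptable` are hypothetical names. The rule that a Full may also stand in for a Differential or an Incremental is an assumption extrapolated from the Incremental/Differential case above, not something stated in this thread.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered by how much data the backup covers (assumed ordering).
    INCREMENTAL = 1
    DIFFERENTIAL = 2
    FULL = 3


def is_acceptable(requested, performed):
    """A performed backup is acceptable if it covers at least what was asked.

    Doing a Differential when an Incremental was requested only costs
    efficiency; doing an Incremental when a Differential was requested
    would leave data out, so it is rejected.
    """
    return performed >= requested
```

For example, `is_acceptable(Level.INCREMENTAL, Level.DIFFERENTIAL)` is true, while `is_acceptable(Level.DIFFERENTIAL, Level.INCREMENTAL)` is false.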

Regards,

Kern

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
