Hello,

It is a little hard to answer this email since I don't remember the details of 
each email thread.  It would help in the future to leave a bit of additional 
info, like who sent the last message.

I'll make a few comments below though ...

On Tuesday 09 October 2007 16:10, David Boyes wrote:
> > My thoughts on this would be to make the SD-MUX a totally separate
> > daemon with perhaps its own DB. And the mux logic be left completely
> > out of the Director.

No, that would be a major change to the way Bacula works, and although I 
didn't design Bacula by looking at existing programs, I notice that at least one 
major commercial backup solution has the same overall architecture as Bacula, 
and they praise themselves a lot for having it -- i.e. the Director is the 
central point of control. The other daemons only know about their specific 
tasks.

>
> The director has to be involved to some degree to ensure that device
> reservations are properly registered (to prevent it from trying to make
> conflicting reservations for devices for non-mux jobs).

Actually, the Volume/drive reservations are now handled in the SD.  The Dir 
passes all the info the SD could ever want, and it figures out if it can get 
a device (or waits if they are all busy) then informs the Dir.

> If we're that 
> far down the road, then having the director tell the sd-mux how to set
> up the sessions isn't that much further to go. I do agree that the
> sd-mux has to be a separate daemon, though -- it can borrow a lot of
> code from the existing sd and fd, though.

There is no advantage to making the sd-mux a separate program, any more than 
there would be for the reservation system.  It could at some point be put into 
a DSO if desired, but I don't consider that urgent.  More below ...

>
> I think there are several key problems to solve here:
>
> 1) having the database record multiple locations for a file

That is not so easy to do.

> 2) having the sd-mux daemon

That is a trivial piece of additional code.  All the necessary DIR code is 
already written, and the DIR<->SD protocol already exists to cover this need.

> 3) having the director understand how to use the sd-mux (eg, how to know
> when one is needed, and how to instruct it what to do)

It already knows how to do this.  The backend (SD-mux) code is just not there.

> 4) modifying the restore process to understand multiple copies and
> restore from the most preferred one

As with #1, that is not so easy to do, though I think I have now worked out 
a first step in the right direction.

>
> #1 is (IMHO) the least difficult problem: the last major rev of the
> database schema provided the data structure to record multiple
> locations. AFAIK, none of the code references anything beyond the first
> entry, but the space is there to record things once there is code to do
> so.

In the first baby step, there is probably no need for a database change.  
However, the key to understanding the difficulties, and something that is not 
going to change, is that Bacula is Job based, not file based.

>
> #2 is essentially a meld of a SD and FD, plus a setup connection to the
> director. I'd suggest this be a daemon controlled by inetd, triggered by
> a connection request from the director to the control session port
> (minimize the # of static ports needed to 1 new port). Inetd would spin
> off a copy of the sd-mux for the director. The director would then
> instruct the sd-mux about the # of streams required and which actual SDs
> are involved. 

IMO inetd is a bad way to go. It will unnecessarily consume an extra port, and 
is a solution that worked well many years ago on small memory systems. Now 
that Microsoft has made 2GB the minimum working RAM for Vista, there is no 
disadvantage to having daemons or more code in the SD (in a DSO if necessary 
at some point).  Doing it with a continuously running daemon avoids problems 
of security, additional ports, the expense of initialization (reading the 
conf file, ...), and persistence (i.e. knowing what the current state of 
everything is).

> The director would then go about the usual device 
> reservation and volume selection process already in place for normal
> jobs. Once the actual SDs report ready, the director informs the real FD
> of the address and port # of the sd-mux, and backups occur as normal,
> with the sd-mux as the target SD for the real FD. The sd-mux acts like a
> FD to the real SDs, thus requiring no protocol changes on real FD or
> SDs. The SDs handle media as normal, signaling the director to notify it
> of volume changes as required. The sd-mux receives data, writes it to
> each real SD, and returns status when all the writes complete. At EOJ,
> the sd-mux handles the shutdown of the sessions to the real SDs, and
> then shuts down the session to the real FD. It then informs the director
> of the EOJ state, and exits.

I think there are few if any changes necessary for the DIR. 95% of the code 
to do the above has been there for many years; it has just been used in 
a "crippled" form.  I may have even added code to prevent users from 
accidentally getting into problems.  If the SD is made a bit smarter the DIR 
can be "uncrippled".
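
As a rough illustration only, the fan-out behaviour described in the quoted 
paragraph above (data is written to each real SD, status is returned once all 
the writes complete, and the job is aborted if a session to a real SD fails) 
might be sketched as follows.  This is not Bacula code: FakeSD, mux_write and 
run_job are hypothetical names, the "SDs" are in-memory stand-ins, and the 
DIR<->SD protocol is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSD:
    """Hypothetical stand-in for a session to one real Storage daemon."""
    name: str
    fail: bool = False                       # simulate a broken session
    blocks: list = field(default_factory=list)

    def write(self, block: bytes) -> bool:
        if self.fail:
            return False
        self.blocks.append(block)
        return True


def mux_write(block: bytes, sds: list) -> bool:
    """Write one block to every SD; succeed only if all writes succeed."""
    results = [sd.write(block) for sd in sds]
    return all(results)


def run_job(blocks, sds) -> str:
    """Feed blocks through the mux and abort on the first failed write."""
    for block in blocks:
        if not mux_write(block, sds):
            return "aborted"
    return "ok"
```

The sketch writes to the SDs one after another for simplicity; whether the 
writes would be sequential or concurrent is not something this thread settles.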

>
> This would also require some minor updates to the real SD logic to test
> for the presence of a file and update its media record rather than
> inserting it (if such code doesn't already exist now).

The SD can easily be enhanced to provide the features you suggest, though in a 
somewhat different way than you imply, one that fits better with the existing 
code.

>
> #3 is somewhat covered in the above description. The sd-mux would need
> to know how many streams to prepare (3 is about the practical maximum
> based on experience with mainframe apps that do this type of work now),
> and the hostname/ip address and port numbers for the real SDs to use for
> this job, based on the reservations made by the director. The sd-mux
> would also need to know how to abort a job if a session to a real SD
> failed during the job.
> The sd-mux would also need to know the range of ports valid on the
> sd-mux host (note that the host running the sd-mux may NOT be the same
> host running the director, and we should design accordingly), and there
> may be a good reason to constrain the available ports on the sd-mux host
> for firewall friendliness reasons.

There is no need for dealing with any additional port.  Unlike other programs, 
Bacula has IANA registered ports, and the current design permits easy 
multiplexing of existing port numbers. 

>
> #4 is pretty simple once all the other things are done...8-) Your idea
> of a priority in the pool definition is a good one; I'd argue that there
> is an implicit method of defining this priority. If the file is available
> in a disk pool (or other random access storage), then we should prefer
> to pull the restored file from the disk. Media pools in the same
> location should have a lower priority, and media with a different
> location value should have an even lower priority. If a volume is marked
> missing or unavailable, it should be automatically skipped.
>
> An alternative method that would require more work, but would be
> ultimately better in terms of self management, would be to measure
> response time of storage daemons in the director over the last 10-20
> requests (eg, time from start of reservation to SD ready) in the
> director database, and choose the fastest responding SD that contains a
> copy of the file (subject to conditions listed above wrt location).
> This would tend over time to spread out the load over multiple SDs at
> the same site.
>
> In a more general sense, this kind of approach would also be helpful in
> implementing multiple site migration jobs (an sd-mux could be used to
> move files between SDs, if a migration job spun off a daemon copy to act
> as a restore FD that immediately turned around and resent the data to an
> sd-mux).
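
For reference, the implicit restore priority proposed in the quoted text for 
#4 (disk first, then media in the same location, then media in a different 
location, skipping volumes marked missing or unavailable) could be sketched 
roughly like this.  The Copy record, its fields, and the rank and pick_copy 
helpers are hypothetical and do not correspond to the Bacula catalog.  The 
sketch also assumes that a disk copy wins regardless of its location, which 
the quoted text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical record of one stored copy of a job's data."""
    volume: str
    on_disk: bool             # disk pool or other random-access storage
    location: str
    available: bool = True    # False if the volume is missing/unavailable


def rank(copy: Copy, local_location: str) -> int:
    """Lower is better: disk, then same-location media, then remote media."""
    if copy.on_disk:
        return 0
    if copy.location == local_location:
        return 1
    return 2


def pick_copy(copies, local_location):
    """Return the preferred available copy, or None if none is usable."""
    usable = [c for c in copies if c.available]
    if not usable:
        return None
    return min(usable, key=lambda c: rank(c, local_location))
```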

Unfortunately due to other constraints (mostly my enterprise initiative), this 
is not something I personally am going to get to very soon.  I'll send an 
email on that in the next few weeks.
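
In the meantime, the alternative suggested in the quoted text above (track 
the response time of each SD over its last 10-20 requests and choose the 
fastest one that holds a copy) might look roughly like the sketch below.  
ResponseTracker and its methods are hypothetical names.  The sketch keeps the 
samples in memory rather than in the director database, and it assumes that 
an SD with no history should sort last.

```python
from collections import deque


class ResponseTracker:
    """Keeps the last `window` response times per SD (hypothetical helper)."""

    def __init__(self, window: int = 10):
        self.window = window
        self.samples = {}

    def record(self, sd_name: str, seconds: float) -> None:
        """Store one measurement, e.g. start of reservation to SD ready."""
        history = self.samples.setdefault(sd_name, deque(maxlen=self.window))
        history.append(seconds)

    def average(self, sd_name: str) -> float:
        times = self.samples.get(sd_name)
        # An SD with no history sorts last rather than being assumed fast.
        return sum(times) / len(times) if times else float("inf")

    def fastest(self, candidates):
        """Pick the candidate SD with the lowest average response time."""
        return min(candidates, key=self.average, default=None)
```

Here `candidates` would be the SDs already known to hold a copy and to satisfy 
the location conditions; that filtering is left out of the sketch.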

Regards,

Kern


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
