>>>>> On Thu, 29 Jun 2006 18:33:12 +0200, Kern Sibbald said:
> 
> On Thursday 29 June 2006 18:09, Martin Simmons wrote:
> > >>>>> On Thu, 29 Jun 2006 18:26:44 +0300, Peteris Krisjanis said:
> > >
> > > I found a solution, or at least something that could be classified as a
> > > working workaround, so I am posting it here for the archives and for
> > > anyone else who has problems with it.
> > >
> > > My setup: the Bacula server (Director and SD) runs on Debian, and the
> > > client (FD) runs on OS X Server. Bacula was installed through Fink (from
> > > the unstable/CVS packages, compiled from tar.gz). I configured it as an
> > > ordinary Bacula client and made sure it had the right permissions to
> > > stream files to the director.
> > >
> > > My problem was that I wanted to exclude files whose names contain Unicode
> > > characters. So I simply typed the Unicode characters into bacula-dir.conf
> > > with Gedit, restarted the director and launched my job. Bacula failed to
> > > match the Unicode characters written in the director's configuration and
> > > backed these files up instead of excluding them.
> > >
> > > First, I experimented with various things, such as the configuration
> > > file, and tried the same setup with a Linux workstation (where this was a
> > > non-issue). Then I googled (and in the meantime also got some informative
> > > replies from the mailing list, thanks everyone for the suggestions) and
> > > figured out that the cause is the way OS X handles Unicode on its HFS+
> > > file systems. OS X composes characters differently (the so-called
> > > canonical decomposed form), so it simply didn't understand what I wanted
> > > from it.
> > >
> > > First of all, I think Bacula should be fixed to support this. But since
> > > that could take quite some time, and I love Bacula and would like to keep
> > > it as my backup solution, I searched for workarounds. Here is one.
> > >
> > > What you need: a graphical terminal with UTF-8 support, such as Konsole
> > > or GNOME Terminal. Open ssh connections to the Debian server and to the
> > > OS X client. On both boxes the locale should be UTF-8 (en_US.UTF-8 on the
> > > Mac, en_US.utf8 on Linux). On the OS X box, run ls -lah (or simply ls) to
> > > get the OS X "version" of the file name in Unicode (the Unicode characters
> > > will mostly look like those in the line above). Copy the name (Ctrl+C or
> > > your terminal's copy command), then go to the Debian box, open
> > > bacula-dir.conf, find the FileSet that should contain this file/directory
> > > name and paste it in. Save, restart bacula-dir and go on with your jobs.
> >
> > I'm glad you found a solution.  It is probably the best one for now.
> >
> > The issue is rather a nightmare, because on Linux you can probably create
> > two files that differ only in their canonicalization. :-(
> >
> > Possibly Bacula needs an option (per FileSet?) that controls Unicode
> > canonicalization/comparison, but this potentially spreads across the FD,
> > Director and any restore GUIs.
> 
> Hello Martin,
> 
> What the devil is Unicode canonicalization?
> 
> Do you mean ensuring that the UTF-8 is proper UTF-8, since it is possible to 
> write incorrect UTF-8 which more or less works (at least for display), or 
> are you talking about something like converting 16-bit Unicode to UTF-8?

No, it is not related to UTF-8 itself; it is a problem inherent in the 16-bit
(or wider) Unicode codes themselves, whatever the encoding.

The issue is that a human reader generally just wants the visual appearance of
text to be right, but Unicode has to represent this as integer codes for
programmatic use and also has to deal with lots of legacy codes.

The result is that there are multiple ways to represent things in Unicode.

In this case, for accented letters, you can have a single (e.g. Latin-1) code
for the accented letter, or a code (e.g. ASCII) for the unaccented letter
followed by one or more special combining codes for the accents.  The
conversion between these forms is called (de)composition.  There are other
cases like this, and together they lead to the need for canonical forms (e.g.
with maximum composition or maximum decomposition, known as NFC and NFD) to
help programmers handle things like comparison of strings.
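A minimal sketch of the two forms, using Python's standard unicodedata module
("é" is just an example letter). Both strings display as "é" and both are
perfectly valid UTF-8, yet they are different code point sequences and compare
unequal until both are normalized to the same canonical form:

```python
import unicodedata

# "é" as one precomposed code point (U+00E9, the Latin-1 letter).
composed = "\u00e9"
# "e" (ASCII) followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

def code_points(s: str) -> str:
    """List the code points of s, e.g. 'U+0065 U+0301'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

# Same appearance, different code points, and both are valid UTF-8.
print(code_points(composed), composed.encode("utf-8"))      # U+00E9 b'\xc3\xa9'
print(code_points(decomposed), decomposed.encode("utf-8"))  # U+0065 U+0301 b'e\xcc\x81'
print(composed == decomposed)                               # False

# Normalizing both to the same canonical form makes them comparable.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True (maximum decomposition)
print(unicodedata.normalize("NFC", decomposed) == composed)  # True (maximum composition)
```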

The OP's problem was that different operating systems handle canonicalization
differently.  Linux (AFAIK) does no canonicalization in the kernel
(applications are expected to do it) whereas the Mac OS X kernel converts all
filenames to canonical decomposed format in the filesystem implementation.

Mixing composed and decomposed strings within the same Director/Catalog leads
to great confusion...
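One way an application could avoid this confusion is to normalize both sides
before comparing, e.g. when matching an exclude pattern from bacula-dir.conf
against a file name sent by an OS X FD. This is only a hypothetical sketch:
canon, is_excluded and the paths are illustrative names, not Bacula code, and
Python's fnmatch stands in for Bacula's own wildcard matching. Also note that
HFS+ actually stores a variant of NFD (based on an older Unicode version), so
plain NFD/NFC is an approximation of what the Mac does.

```python
import fnmatch
import unicodedata
from typing import Iterable

def canon(s: str) -> str:
    """Hypothetical helper: bring s to one canonical form (NFC here)."""
    return unicodedata.normalize("NFC", s)

def is_excluded(path: str, patterns: Iterable[str]) -> bool:
    """Hypothetical: True if path matches any exclude pattern once both
    sides are normalized to the same canonical form."""
    p = canon(path)
    return any(fnmatch.fnmatchcase(p, canon(pat)) for pat in patterns)

# Pattern typed into the config on Linux: precomposed U+00E9.
pattern_from_conf = "/Users/example/Caf\u00e9*"
# Name as an OS X FD would report it: "e" + U+0301 (decomposed).
path_from_fd = "/Users/example/Cafe\u0301 notes.txt"

# A raw comparison misses the file, so it gets backed up anyway...
print(fnmatch.fnmatchcase(path_from_fd, pattern_from_conf))  # False
# ...while the normalized comparison excludes it as intended.
print(is_excluded(path_from_fd, [pattern_from_conf]))  # True
```

Either form works as long as both sides use the same one; NFC is used here so
that a ? wildcard still matches a single accented letter in the common case.
The trade-off is the one described above: on Linux, two distinct files whose
names differ only in composition would both match the same pattern.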

__Martin

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
