On Thursday 29 June 2006 22:23, Martin Simmons wrote:
> >>>>> On Thu, 29 Jun 2006 18:33:12 +0200, Kern Sibbald said:
> >
> > On Thursday 29 June 2006 18:09, Martin Simmons wrote:
> > > >>>>> On Thu, 29 Jun 2006 18:26:44 +0300, Peteris Krisjanis said:
> > > >
> > > > I found a solution, or at least something that could be classified
> > > > as a working workaround, so I am posting it here for the archives
> > > > and for anyone else who has this problem.
> > > >
> > > > So, I have the Bacula server/director/sd on Debian and the client/fd
> > > > on an OS X server. Bacula was installed through Fink (from the
> > > > unstable/CVS packages, compiled from tar.gz). I configured it as an
> > > > ordinary Bacula client and ensured it has the right permissions to
> > > > stream files to the director.
> > > >
> > > > My problem was that I wanted to exclude files with Unicode characters
> > > > in their names. So I simply typed the Unicode characters into
> > > > bacula-dir.conf with Gedit, restarted the director and launched my
> > > > job. It failed to recognize the Unicode characters written in the
> > > > director's file and went on to back up these files instead of
> > > > excluding them.
> > > >
> > > > First, I tried various things, such as changing the configuration
> > > > file and trying the same setup with a Linux workstation (where this
> > > > was a non-issue). Then I googled (and at the same time got some
> > > > informative messages from the mailing list, thanks everyone for the
> > > > suggestions) and figured out that the cause is OS X's different
> > > > handling of Unicode on its HFS+ file systems. OS X composes
> > > > characters differently (the so-called canonical decomposed format),
> > > > so it simply did not understand what I wanted from it.
> > > >
> > > > First of all, I think Bacula should be fixed to support this. But as
> > > > that could take quite some time, and I love Bacula and would like to
> > > > have it as my backup solution, I searched for workarounds. And here
> > > > is one.
> > > >
> > > > What is needed: a graphical terminal such as Konsole or GNOME
> > > > Terminal (or any other terminal with UTF-8 support). Open ssh
> > > > connections to the Debian (server) and OS X (client) boxes. On both
> > > > boxes the locale should be UTF-8 (en_US.UTF-8 on Mac, en_US.utf8 on
> > > > Linux). On the OS X box, do ls -lah or simply ls to get the OS X
> > > > "version" of the file name in Unicode (the Unicode chars will mostly
> > > > look like an upper line). Copy it (Ctrl+C or Copy), then go to the
> > > > Debian box, open bacula-dir.conf, go to the FileSet where you need
> > > > this file/directory name and paste it in. Save, restart bacula-dir
> > > > and go on with your jobs.
> > >
> > > I'm glad you found a solution.  It is probably the best one for now.
> > >
> > > The issue is rather a nightmare, because on Linux you can probably
> > > create two files that differ only in their canonicalization. :-(
> > >
> > > Possibly Bacula needs to have an option (per fileset?) that controls
> > > unicode canonicalization/comparison, but it potentially spreads across
> > > the FD, Director and any restore guis.
> >
> > Hello Martin,
> >
> > What the devil is unicode canonicalization?
> >
> > Do you mean ensuring that the UTF-8 is proper UTF-8 since it is possible
> > to write incorrect UTF-8, which more or less works (at least for
> > display), or are you talking about something like converting 16 bit
> > Unicode to UTF-8?
>
> No, it is not related to UTF-8 itself, but is a problem inherent in the 16
> (or more) bit Unicode codes.
>
> The issue is that a human reader generally just wants the visual appearance
> of text to be right, but Unicode has to represent this as integer codes for
> programmatic use and also has to deal with lots of legacy codes.
>
> The result is that there are multiple ways to represent things in Unicode.
>
> In this case, for accented letters, you can have a single (e.g. Latin-1)
> code for the accented letter or a code (e.g. ASCII) for the unaccented
> letter followed by some special codes for the accents.  The conversion
> between these forms is called (de)composition.  There are other things like
> this, which leads to the need for canonical forms (e.g. with maximum
> composition or maximum decomposition) to help programmers handle things
> like comparison of strings.
>
> The OP's problem was that different operating systems handle
> canonicalization differently.  Linux (AFAIK) does no canonicalization in
> the kernel (applications are expected to do it) whereas the Mac OS X kernel
> converts all filenames to canonical decomposed format in the filesystem
> implementation.
>
> Mixing composed and decomposed strings within the same Director/Catalog
> leads to great confusion...
>
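
For the archives, the composed/decomposed distinction described above can be
seen with a minimal Python sketch. It uses only the standard unicodedata
module; the helper names (to_decomposed, to_composed, same_name) and the
sample file name are made up for illustration and are not part of Bacula.
Note that HFS+ uses its own variant of decomposed form, so plain NFD is only
an approximation of what OS X stores.

```python
import unicodedata


def to_decomposed(name: str) -> str:
    """Return the canonical decomposed form (NFD) of a file name."""
    return unicodedata.normalize("NFD", name)


def to_composed(name: str) -> str:
    """Return the canonical composed form (NFC) of a file name."""
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same canonical form."""
    return to_decomposed(a) == to_decomposed(b)


# Hypothetical file name, written two ways that look identical on screen.
composed = "caf\u00e9"      # e-acute as a single code point (U+00E9)
decomposed = "cafe\u0301"   # plain e followed by combining acute (U+0301)

print(composed == decomposed)            # False: different code points
print(same_name(composed, decomposed))   # True: equal once normalized
print(composed.encode("utf-8"))          # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))        # b'cafe\xcc\x81'
```

The two strings differ byte for byte in UTF-8, which is why a name typed on
Linux need not match the name the OS X client reports. Normalizing both sides
to one form before comparing is one possible approach, assuming every
component that handles the name does the same.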

Egads, thanks for the details.  I had forgotten about that aspect of Unicode
since I rarely work with Windows (or Mac).  Hopefully I or someone else
(Frank?) could summarize this for the manual.  For the moment, the manual
lacks any mention of Unicode/UTF-8, so I will add this to my todo list so
that it does not get lost.


-- 
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
