Re: [Bacula-users] [GENERAL] Catastrophic changes to PostgreSQL 8.4

2009-12-03 Frank Sweetser

On 12/3/2009 3:33 AM, Craig Ringer wrote:

Kern Sibbald wrote:

Hello,

Thanks for all the answers; I am a bit overwhelmed by the number, so I am
going to try to answer everyone in one email.

The first thing to understand is that it is *impossible* to know what the
encoding is on the client machine (FD -- or File daemon).  On, say, a


Or, even worse, which encoding the user or application was thinking of when it 
wrote out a particular file.  There's no guarantee that any two files on a system 
were intended to be looked at with the same encoding.



Unix/Linux system, the user could create filenames in a non-UTF-8 encoding and then
switch to UTF-8, restore files that were tarred on Windows or on a Mac, or simply
copy a Mac directory.  Finally, using system calls to create a file, you can
put *any* character into a filename.
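To make the "impossible to know" point concrete, here is a minimal Python sketch (not from the thread; the filename bytes are a made-up example). It decodes one raw byte string under several candidate encodings. Nothing in the bytes says which successful decoding was intended, and Latin-1 never fails at all, so "it decoded without error" proves nothing.

```python
from typing import Dict, List, Optional


def interpretations(raw: bytes, codecs: List[str]) -> Dict[str, Optional[str]]:
    """Decode the same raw filename bytes under several candidate encodings.

    A codec that rejects the bytes is recorded as None. Every other entry
    is a perfectly valid string; the bytes alone cannot tell you which one
    the file's creator meant.
    """
    results: Dict[str, Optional[str]] = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# Hypothetical filename: the Russian word "Данные" ("data") as saved by a
# machine using Windows code page 1251.
name = "Данные".encode("cp1251")

for codec, text in interpretations(name, ["utf-8", "cp1251", "latin-1"]).items():
    print(f"{codec:8} -> {text!r}")
```

Here UTF-8 rejects the bytes, cp1251 yields "Данные", and Latin-1 happily yields "Äàííûå". A backup tool seeing only the bytes has no principled way to choose between the last two.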


While true in theory, in practice it's pretty unusual to have filenames
encoded with an encoding other than the system LC_CTYPE on a modern
UNIX/Linux/BSD machine.


Unless, of course, you're at a good-sized school with lots of international 
students, and have fileservers holding filenames created on desktops running 
in Chinese, Turkish, Russian, and other locales.


In the end, a filename is (under Linux, at least) just a string of arbitrary 
bytes containing anything except / and NUL.  If Bacula tries to get too 
clever, and munges or misinterprets those byte strings - or, worse yet, if 
the database does it behind your back - then stuff _will_ end up breaking.
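A small Python illustration of how "too clever" breaks things (a sketch, not Bacula's or PostgreSQL's actual code; both helper names are hypothetical). Decoding with replacement characters collapses distinct names into one, while Python's `surrogateescape` error handler (the one `os.fsdecode`/`os.fsencode` use on POSIX) carries undecodable bytes through unchanged.

```python
def lossy_roundtrip(raw: bytes) -> bytes:
    """Decode as UTF-8 with replacement characters, then re-encode.

    This is the 'clever' path: any byte sequence that is not valid UTF-8
    becomes U+FFFD, so the original bytes are gone for good.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def lossless_roundtrip(raw: bytes) -> bytes:
    """Decode as UTF-8 using surrogateescape, then re-encode the same way.

    Undecodable bytes are carried as lone surrogates (U+DC80..U+DCFF) and
    restored on encoding, so the exact byte string survives.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


# Two distinct (hypothetical) one-byte filenames from a cp1251 system:
# b"\xc4" is "Д" and b"\xc5" is "Е" in that code page.
a, b = b"\xc4", b"\xc5"

print(lossy_roundtrip(a) == lossy_roundtrip(b))  # True: two files, one catalog name
print(lossless_roundtrip(a) == a, lossless_roundtrip(b) == b)  # True True
```

If a backup catalog stored the lossy form, a restore could not tell those two files apart. The safe design is to treat the name as opaque bytes end to end and only decode for display.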


(A few years back, someone heavily involved in Linux kernel filesystem work 
was talking about this exact issue, and made the remark that many doing 
internationalization work secretly feel it would be easier to just teach 
everyone English.  Impossible as this may be, I have since come to understand 
what they were talking about...)


--
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
 GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [Bacula-users] [GENERAL] Catastrophic changes to PostgreSQL 8.4

2009-12-03 Frank Sweetser

On 12/03/2009 10:54 AM, Craig Ringer wrote:

Frank Sweetser wrote:


Unless, of course, you're at a good sized school with lots of
international students, and have fileservers holding filenames created
on desktops running in Chinese, Turkish, Russian, and other locales.


What I struggle with here is why they're not using ru_RU.UTF-8,
zh_CN.UTF-8, etc. as their locales. Why mix charsets?


The problem isn't so much what they're using on their unmanaged desktops.  The 
problem is that the server, which is the one getting backed up, holds an 
aggregation of files created by an unknown collection of applications running 
on a mish-mash of operating systems (every large edu has its horror story of 
the 15+-year-old, unpatched, mission-critical machine that no one dares touch) 
with wildly varying charset configurations, no doubt including horribly broken 
and pre-UTF ones.


The end result is a fileset full of filenames created on a hacked Chinese copy 
of XP, a Russian copy of Windows ME, Romanian Red Hat Linux 4.0, and Mac OS 8.
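If you want a rough census of how mixed your own fileset is, one approach is to walk the tree with byte paths so nothing gets decoded up front, and flag names that are not valid UTF-8. This is a minimal Python sketch; `non_utf8_names` is a hypothetical helper, not part of Bacula. It only finds names that certainly were *not* written as UTF-8; a name that does decode may still have been created in some other encoding.

```python
import os
from typing import Iterator


def non_utf8_names(top: bytes) -> Iterator[bytes]:
    """Yield full byte paths whose last component is not valid UTF-8.

    Passing a bytes path to os.walk makes it return bytes for every
    directory and file name, so no decoding happens behind our back.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

Usage is along the lines of `for p in non_utf8_names(b"/srv/export"): print(repr(p))`, where the path is only an example; printing `repr()` shows the raw bytes instead of guessing at an encoding.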


This kind of junk is, sadly, not uncommon in academic environments, where IT 
is often required to support stuff that they don't get to manage.

