On 23.11.2010 08:55, Maho NAKATA wrote:
Hi Bernd,
  
Hi Maho!
How do we know where we are using invalid non-UTF-8 characters?
It's not immediately clear to me.
  
When you are in an environment with UTF-8 encoding set (e.g. a Unix shell with LANG=en_US.UTF-8 in the environment), list your files. If a name shows special characters that do not belong to your language, e.g. just a square, or something like \234 displayed instead of a character, then you know someone has committed those files using a different encoding, e.g. BIG-5.

Another simple test is to try importing the whole CVS archive into an SVN repository with LANG=en_US.UTF-8 or similar set. The "svn import" command would complain about any non-UTF-8 characters in that case.
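
If you prefer not to rely on how your terminal renders the names, the same check can be scripted. The following is only a minimal sketch, not an official tool: the function name find_non_utf8_names is made up for illustration, and it assumes a Unix-like system where filenames are plain byte strings. It walks a checked-out tree and collects every file or directory name whose bytes are not valid UTF-8.

```python
import os


def find_non_utf8_names(root):
    """Return the paths (as bytes) under root whose own name is not valid UTF-8.

    Hypothetical helper for illustration; it only inspects names, never file contents.
    """
    bad = []
    # Walking a bytes path makes os.walk yield raw byte names,
    # so the result does not depend on the current locale.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling find_non_utf8_names(".") from the top of a working copy returns a list of byte paths; printing each one with path.decode("utf-8", errors="backslashreplace") shows the offending bytes as \x.. escapes.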

thanks,
 Nakata Maho
  

Kind regards,
Bernd Eilers

From: Bernd Eilers <bernd.eil...@oracle.com>
Subject: [native-lang] native language webcontent developers please review your filenames for invalid non UTF-8 characters
Date: Mon, 22 Nov 2010 17:49:40 +0100

  
Hi native language communities and esp. the WebContent developers
among you!

I recently stumbled over a few filenames in OpenOffice.org's
webcontent which have invalid non-UTF-8 characters in their filenames.

The character encoding to be used for webcontent checked into
OpenOffice.org's webcontent CVS repository is UTF-8. Please make sure
to use a UTF-8 locale when checking in files with non-US-ASCII
characters. For example, if you are in France and are using some Unix OS,
set LANG=fr_FR.UTF-8 and not LANG=fr_FR.ISO8859-15. GUI CVS clients used
on Windows often allow you to specify the encoding explicitly.

Filenames with other encodings will not work and, what is even worse,
they create a big problem when moving OpenOffice.org to the new
Kenai-based infrastructure.
While CVS does not care much about invalid characters in filenames,
Subversion, which will be used on the new infrastructure, treats
those filenames as erroneous and as a result will not import the whole
project at all.

Could the native language projects' webcontent developers please review
their webcontent and change anything that is currently not UTF-8
compliant? That means not only copying the broken files to new,
valid ones but also deleting the broken filenames from the CVS
repository!
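
When preparing the new, valid names, it can help to compute what the UTF-8 form of a broken name would be. This is a rough sketch under a loud assumption: that the broken name was committed in ISO-8859-15. That is only a guess, the real encoding has to be confirmed per file. The function name propose_utf8_name and the sample name are hypothetical, and the code only computes a name; it does not touch CVS or the file system.

```python
def propose_utf8_name(raw_name, legacy_encoding="iso8859-15"):
    """Return a UTF-8 encoded replacement for a raw byte filename.

    Names that are already valid UTF-8 are returned unchanged.
    legacy_encoding is an assumption about how the broken name was
    originally committed; a wrong guess produces a wrong name.
    """
    try:
        raw_name.decode("utf-8")
        return raw_name
    except UnicodeDecodeError:
        # Reinterpret the bytes in the assumed legacy encoding,
        # then re-encode the resulting text as UTF-8.
        return raw_name.decode(legacy_encoding).encode("utf-8")


# Hypothetical example name, not one taken from the repository.
print(propose_utf8_name(b"caf\xe9"))
```

The new file would then be added under the proposed name and the old, broken name removed from the repository, as requested above.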

For example, there are 2 broken directory names in
fr/www/Documentation/Gallery starting with the letters "fl" and then
some non-UTF-8 encoded character.

Kind regards,
Bernd Eilers
-- 

<http://www.oracle.com/>
Bernd Eilers | Software Engineer
Phone: +49 40 23 646 967

ORACLE Deutschland B.V. & Co. KG | Nagelsweg 55 | 20097 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der
Ven

<http://www.oracle.com/commitment>

	

Oracle is committed to developing practices and products that help
protect the environment



    




  




