Hi!
I would like to know how you are currently handling the
conversion of your systems to UTF-8. Please share your
experience!
* Is there a program which can determine the character set
of a given text file? I know iconv can convert between
character sets, but I don't know of a program which tells
me whether a given file is encoded in ISO-8859-1,
ISO-8859-15, a Windows code page, etc. (see sketch 1
below, after my questions).
* Which program are you using to convert your ISO-8859-1
file systems (directory and file names, not the file
contents!) to UTF-8? Even better would be a program which
can convert from any encoding to UTF-8 using a heuristic,
because my source filenames unfortunately have mixed
encodings: most are ISO-8859-1, but some use various
Windows code pages.
I've written a small perl hack to do this (sketch 2 below),
but I'm not very happy with it because it lacks the
mentioned heuristic.
The gtk file selector has *lots* of trouble with some of my
ISO-8859-1 filenames and becomes unusable; my current
workaround is to start the respective program in a LANG=C
environment.
* I'm not sure how I am supposed to handle all my text files.
All of them use ISO-8859-1 right now, but now that I'm
running Red Hat 8 I'm never sure whether the text editor
saves them as ISO-8859-1 or UTF-8. Did you convert all
your files to UTF-8 (sketch 3 below), or are you using
both character encodings?
* What about other non-UTF-8-aware machines accessing my
files? File "formats" without a text encoding tag are
becoming really problematic now, aren't they?
* Some days ago Chris Kloiber mentioned
unicode_stop ; setfont lat0-sun16
as a way to turn off Unicode support on the console. I
still have problems with the umlaut keys: they keep
generating two-byte codes even when LANG is set to
de_DE.ISO-8859-1. Does this work for you?
* How do you convert ID3 tags? (See sketch 4 below.)
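Here are the sketches I referred to above. All of them
assume the Encode module that ships with perl 5.8.

Sketch 1: I don't know of a ready-made detector, so this is
the check I would script myself. A file that decodes cleanly
as UTF-8 is almost certainly UTF-8; everything else must be
some legacy 8-bit encoding (telling ISO-8859-1 apart from a
Windows code page would still need a real heuristic on top):

  #!/usr/bin/perl -w
  # guess-utf8.pl (sketch): report whether a file is valid UTF-8.
  use strict;
  use Encode qw(decode FB_CROAK);

  local $/;                                # slurp the whole file
  open my $fh, '<', $ARGV[0] or die "open $ARGV[0]: $!";
  my $bytes = <$fh>;

  # decode() with FB_CROAK dies on the first malformed sequence.
  if (eval { decode('UTF-8', $bytes, FB_CROAK); 1 }) {
      print "$ARGV[0]: valid UTF-8\n";
  } else {
      print "$ARGV[0]: not UTF-8 (legacy 8-bit encoding?)\n";
  }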
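Sketch 2: roughly what my filename hack boils down to. It
has no heuristic yet - anything that isn't already valid
UTF-8 is simply assumed to be ISO-8859-1 - so please test it
on a copy of your data first:

  #!/usr/bin/perl -w
  # recode-names.pl (sketch): rename files and directories
  # from ISO-8859-1 to UTF-8. Test on a copy first!
  use strict;
  use File::Find;
  use Encode qw(decode encode FB_CROAK);

  sub is_valid_utf8 {
      my $copy = shift;
      return eval { decode('UTF-8', $copy, FB_CROAK); 1 };
  }

  # finddepth() works bottom-up, so a directory is renamed
  # only after everything inside it has been handled.
  finddepth(sub {
      my $old = $_;
      # ASCII-only names are valid UTF-8 already and get skipped.
      return if $old =~ /^\.\.?\z/ or is_valid_utf8($old);
      my $new = encode('UTF-8', decode('ISO-8859-1', $old));
      rename $old, $new or warn "rename $old: $!\n";
  }, $ARGV[0] || '.');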
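Sketch 3: for the file contents iconv does the job; the perl
equivalent below converts in place and keeps a .bak backup
(foo.txt is just an example name):

  perl -MEncode -i.bak -pe \
      '$_ = encode("UTF-8", decode("ISO-8859-1", $_))' foo.txt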
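Sketch 4: I haven't really solved the ID3 question either;
this only reads the ID3v1 tag and prints the text fields
recoded to UTF-8. Writing back is the hard part, because the
fields are fixed at 30 bytes and ID3v1 has no encoding tag
at all (the same problem as above):

  #!/usr/bin/perl -w
  # id3v1-to-utf8.pl (sketch): dump the ID3v1 text fields of
  # an MP3 recoded to UTF-8. An ID3v1 tag is the fixed
  # 128-byte trailer: "TAG", title(30), artist(30),
  # album(30), year(4), comment(30), genre(1).
  use strict;
  use Encode qw(from_to);

  open my $fh, '<', $ARGV[0] or die "open $ARGV[0]: $!";
  seek $fh, -128, 2 or die "seek: $!";       # 2 = SEEK_END
  read $fh, my $tag, 128 or die "read: $!";
  die "$ARGV[0]: no ID3v1 tag\n" unless $tag =~ /^TAG/;

  my ($title, $artist, $album) = unpack 'x3 A30 A30 A30', $tag;
  for ($title, $artist, $album) {
      from_to($_, 'ISO-8859-1', 'UTF-8');    # recode in place
      print "$_\n";
  }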
bye,
Karsten
PS: The non-working umlauts in pine (the bug is marked
WONTFIX) are a major problem for me.
--
Dipl.-Inf. Karsten Weiss - http://www.machineroom.de/knweiss