Re: Reading and writing Unicode files

Jarrett Billingsley Sat, 28 Feb 2009 08:40:16 -0800

On Sat, Feb 28, 2009 at 1:40 AM, jicman <cabre...@_wrc.xerox.com> wrote:
>
> Greetings.
>
> Sorry guys, please be patient with me.  I am having a hard time understanding
> these Unicode, ANSI, and UTF* ideas.  I know how to take a UTF-8 file and turn
> it into ANSI, and I know how to take an ANSI file and turn it into a UTF file.
> But now I have a Unicode file, and I need to change the content and create a
> new Unicode file with the changes in the content.  I have read all kinds of
> places, and I found mtext, from Chris Miller's site, by reading
>
> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>
> Anyway, what I need is to read a Unicode file, search the strings inside,
> make changes to the file, and write the changes back to a Unicode file.


You seem to be distinguishing between UTF and Unicode; it's kind of
apples to oranges.  Unicode is a character set standard (a mapping
from numbers, called code points, to characters, like ASCII but much
larger).  UTF is a way - or rather, _several_ ways - of encoding
Unicode text as bytes.  There are three major encodings, UTF-8,
UTF-16, and UTF-32 (and the 16- and 32-bit encodings have both
little- and big-endian versions), which correspond to D's char[],
wchar[], and dchar[].
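
To make the "one character set, several encodings" point concrete,
here is a small sketch.  It is in Python rather than D only because it
is easy to run anywhere; the helper name encoded_forms is made up for
this example.  It encodes a single code point, U+00E9, under each of
the encodings named above.  The code unit sizes (1, 2, and 4 bytes)
are the same sizes as D's char, wchar, and dchar.

```python
# One Unicode code point, encoded several different ways.
# encoded_forms is a hypothetical helper, not part of any library.

def encoded_forms(text):
    """Return the bytes of `text` under each major UTF encoding (no BOM)."""
    names = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
    return {name: text.encode(name) for name in names}

ch = "\u00e9"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
forms = encoded_forms(ch)

for name, data in forms.items():
    # Same character every time; only the byte representation differs.
    print(name, data.hex(), len(data), "bytes")
```

The character is the same in every row; only the bytes on disk change,
which is why a file has to be decoded with the encoding it was written
in.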

When you say a "Unicode" file, do you mean it's encoded in UTF-16?
(Windows tools often use "Unicode" to mean little-endian UTF-16.)  If
so, you can just read the file's contents as a wchar[].  If you're
using Phobos, keep in mind that it provides no functionality for
searching or manipulating wchar[]s, which means you'll have to convert
it to UTF-8 (char[]) first and convert back before writing.  If you're
using Tango, you can give tango.io.UnicodeFile a shot - it will
automatically transcode a file from any Unicode encoding to any other,
and if your file has a BOM, it can even automatically detect which
encoding it's in.
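
The read / detect / change / write-back cycle described above can be
sketched roughly as follows.  This is again Python used purely as an
illustration; it is not how tango.io.UnicodeFile is implemented, and
the names detect_bom, replace_in_bytes, and replace_in_file are
invented for the example.  It assumes the file starts with a BOM and
falls back to UTF-8 when there is none.

```python
import codecs

# BOMs to check.  UTF-32 comes first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data, default="utf-8"):
    """Return (encoding, bom) for raw bytes; use `default` if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, bom
    return default, b""

def replace_in_bytes(data, old, new):
    """Decode, replace text, and re-encode in the original encoding, keeping the BOM."""
    encoding, bom = detect_bom(data)
    text = data[len(bom):].decode(encoding)
    return bom + text.replace(old, new).encode(encoding)

def replace_in_file(src, dst, old, new):
    """Read `src` as raw bytes, apply the replacement, and write the result to `dst`."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(replace_in_bytes(data, old, new))
```

The search and replace happens on decoded text, not on raw bytes, so
it works the same way whichever encoding the file uses.  One caveat: a
UTF-16 LE file whose first character after the BOM is U+0000 would be
misread as UTF-32 LE by this check, which is rare in text files but
worth knowing about.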
