James G. Sack (jim) wrote:
> Carl Lowenstein wrote:
>> ..
>> The particular bunch of text I was working with has a 3-byte
>> representation for an apostrophe, E2 80 99, determined by using "od -t
>> x1".  VIM displays this only as an apostrophe-like symbol, in one
>> character cell.  Moving the cursor to that place and entering "ga"
>> gives the result "hex 2019".  Which is not right at all.
> 
> Well, it is Unicode code point U+2019 ("RIGHT SINGLE QUOTATION MARK").
> Check the "General Punctuation" block in gucharmap (the "Character Map"
> application in the GUI menu).
> 
>   UTF-8: 0xE2 0x80 0x99
>   UTF-16: 0x2019
> 
>   C octal escaped UTF-8: \342\200\231
>   XML decimal entity: &#8217;
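For anyone who wants to double-check the list above, here is a small Python 3 sketch that derives each form from the character using only the standard library (the function name representations is just made up for this example). It also shows why vim's "ga" says hex 2019: that is the code point, while od shows the three UTF-8 bytes that encode it.

```python
import unicodedata


def representations(ch: str) -> dict:
    """Return the notations quoted above for a single character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    # Split the UTF-16 bytes into 16-bit code units (two units for non-BMP chars).
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "name": unicodedata.name(ch),
        "code point": f"U+{ord(ch):04X}",
        "UTF-8": " ".join(f"0x{b:02X}" for b in utf8),
        "UTF-16": " ".join(f"0x{u:04X}" for u in units),
        # Each UTF-8 byte as a three-digit C octal escape, e.g. \342
        "C octal escaped UTF-8": "".join(f"\\{b:03o}" for b in utf8),
        "XML decimal entity": f"&#{ord(ch)};",
    }


for label, value in representations("\u2019").items():
    print(f"{label}: {value}")
```

Running it prints the same values as the list above, e.g. "UTF-8: 0xE2 0x80 0x99".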

Other thoughts:

- What is your locale?
Mine includes
  LC_CTYPE="en_US.UTF-8"
as a consequence of the settings in /etc/sysconfig/i18n.

- Does your vim support multi-byte file encodings? Check with
  vim --version | grep multi_byte
I get:
  .. +mouse_xterm +multi_byte +multi_lang ..

Viewing luigi.txt in my vim shows a Unicode apostrophe.

- How about writing a gawk script to replace those UTF-8 sequences with
an ASCII apostrophe and ASCII quote characters? (Standard sed has no
portable way to specify arbitrary binary bytes, but gawk does.) My
Python script may come in handy in the process.
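A rough sketch of that replacement, written in Python rather than gawk (a stand-in, not the gawk script itself). It works on raw bytes, so it matches the same UTF-8 sequences that od shows; the bytes literals use the C octal escapes listed above. The mapping (curly single and double quotes only), the name asciify_quotes and the script name asciify_quotes.py are assumptions for illustration; extend REPLACEMENTS for other characters you care about.

```python
import sys

# Assumed mapping: UTF-8 byte sequences of typographic quotes -> ASCII.
# The octal escapes are the same "C octal escaped UTF-8" form quoted above.
REPLACEMENTS = {
    b"\342\200\230": b"'",  # U+2018 LEFT SINGLE QUOTATION MARK
    b"\342\200\231": b"'",  # U+2019 RIGHT SINGLE QUOTATION MARK
    b"\342\200\234": b'"',  # U+201C LEFT DOUBLE QUOTATION MARK
    b"\342\200\235": b'"',  # U+201D RIGHT DOUBLE QUOTATION MARK
}


def asciify_quotes(data: bytes) -> bytes:
    """Replace each typographic-quote byte sequence with its ASCII counterpart."""
    # The sequences are distinct 3-byte UTF-8 codes, so replacing them
    # one after another cannot create or break a neighbouring match.
    for seq, ascii_char in REPLACEMENTS.items():
        data = data.replace(seq, ascii_char)
    return data


# Only act when given exactly one file name; the result goes to stdout.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.write(asciify_quotes(f.read()))
```

Usage would be something like "python3 asciify_quotes.py luigi.txt > luigi-ascii.txt" (the output file name is only an example).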

Regards,
..jim

-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-newbie