Re: non-ASCII characters in VIM

James G. Sack (jim) Fri, 02 Jan 2009 12:15:44 -0800

Carl Lowenstein wrote:
>..
> 1) vim -b and "set display=uhex" do not seem to be implemented in the
> version of VIM I am using.  It came with CentOS 5.2.
> [...@delta ~]$ vim -h
> VIM - Vi IMproved 7.0 (2006 May 7, compiled Nov 25 2008 11:43:45)


I have vim 7.1 on my F7 (Fedora 7) box, and uhex works there.

> 
> 2) piping the VIM buffer through xxd(1) as found in a Google search
> does indeed give a hex display of the whole text, and it is supposed
> to be reversible.  But it isn't very easy to read, let alone edit.

I use xxd a lot, and always give it the -g1 option to get a DOS-style dump:
16 hex pairs per line, followed by the same 16 bytes as characters (or dots
for unprintable ones). bvi uses that format too (it's what I was calling
"conventional hex editor" format).
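For reference, here is a small Python sketch of that layout, with a matching
reverse step to show why the format round-trips. It only approximates
xxd -g1 output (offset, 16 hex pairs, then printable characters or dots);
hexdump() and unhexdump() are illustrative names, not part of xxd or any
other tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Dump bytes roughly as `xxd -g1` does: offset, hex pairs, then text."""
    hex_width = width * 3 - 1  # "xx " per byte, minus the trailing space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII (0x20..0x7e) is shown as-is, anything else as a dot.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the text column still lines up on a short last line.
        lines.append(f"{offset:08x}: {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


def unhexdump(dump: str, width: int = 16) -> bytes:
    """Reverse hexdump(): read only the hex column and ignore the text column."""
    hex_width = width * 3 - 1
    out = bytearray()
    for line in dump.splitlines():
        hex_part = line[10:10 + hex_width]  # skip the "00000000: " offset field
        out += bytes.fromhex("".join(hex_part.split()))
    return bytes(out)


if __name__ == "__main__":
    sample = "Luigi\u2019s Pizza\n".encode("utf-8")
    print(hexdump(sample))
    assert unhexdump(hexdump(sample)) == sample
```

For Carl's text this prints
00000000: 4c 75 69 67 69 e2 80 99 73 20 50 69 7a 7a 61 0a  Luigi...s Pizza.
so the three dots in the text column are the E2 80 99 bytes. Editing only the
hex column and reading it back is what makes the dump reversible.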

> 
> 3) apropos hex | grep edit comes up with "ghex" which is a GUI hex
> editor.  It seems to be usable, at least for small changes.
> Characters are displayed as 2-digit hex in one pane of the window and
> ASCII on the same line in another pane.  Non-ASCII characters are
> displayed as dots.  Moving the cursor in one half of the display moves
> a ghost cursor to the corresponding place in the other half. You can
> delete the dots and replace them with other characters, and the empty
> space is closed up.  By the way, to invoke ghex from the command line,
> its name is "/usr/bin/ghex2".
> 
> The particular bunch of text I was working with has a 3-byte
> representation for an apostrophe, E2 80 99, determined by using "od -t
> x1".  VIM displays this only as an apostrophe-like symbol, in one
> character cell.  Moving the cursor to that place and entering "ga"
> gives the result "hex 2019".  Which is not right at all.

Well, it is Unicode code point U+2019 ("RIGHT SINGLE QUOTATION MARK"), so
"ga" is reporting the character, not the bytes. Check the "General
Punctuation" block in gucharmap (the "Character Map" application in the
GUI menu).

  UTF-8: 0xE2 0x80 0x99
  UTF-16: 0x2019

  C octal escaped UTF-8: \342\200\231
  XML decimal entity: &#8217;
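The same notations can be computed in Python, which is a quick way to check
any other odd character you run into. A minimal sketch using only the
standard library; describe() is a made-up helper name:

```python
import unicodedata


def describe(ch: str) -> dict:
    """Show one character in the notations listed above."""
    utf8 = ch.encode("utf-8")
    return {
        "name": unicodedata.name(ch),
        "code point": f"U+{ord(ch):04X}",
        "UTF-8": " ".join(f"0x{b:02X}" for b in utf8),
        # Big-endian, no byte-order mark, so a BMP character is just its code point.
        "UTF-16": "0x" + ch.encode("utf-16-be").hex().upper(),
        "C octal escaped UTF-8": "".join(f"\\{b:03o}" for b in utf8),
        "XML decimal entity": f"&#{ord(ch)};",
    }


if __name__ == "__main__":
    for label, value in describe("\u2019").items():
        print(f"{label}: {value}")
```

For U+2019 this reproduces the table above: 0xE2 0x80 0x99, 0x2019,
\342\200\231 and &#8217;.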

> 
> I have not found a hex dump routine that produces as user-friendly a
> display as "od -cb" does for octal.  That is, parallel lines of
> character and numeric representations of each byte, with the same
> horizontal spacing so it is obvious what belongs together.

Is there a way to get hex pairs instead of octal?
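POSIX od accepts several type options at once and prints one output line per
type, so something like "od -c -t x1" should put a hex line under each
character line; whether the columns line up may depend on the od version.
If not, here is a small Python sketch of an "od -cb" lookalike with hex in
place of octal. It pads every field to 4 columns so each hex pair sits under
its character; od_char() and od_char_hex() are hypothetical names.

```python
# Control characters that od -c prints as C-style escapes.
OD_ESCAPES = {0: r"\0", 7: r"\a", 8: r"\b", 9: r"\t",
              10: r"\n", 11: r"\v", 12: r"\f", 13: r"\r"}


def od_char(b: int) -> str:
    """One byte as od -c shows it: printable ASCII, a C escape, or 3 octal digits."""
    if b in OD_ESCAPES:
        return OD_ESCAPES[b]
    if 0x20 <= b < 0x7F:
        return chr(b)
    return f"{b:03o}"


def od_char_hex(data: bytes, width: int = 16) -> str:
    """Like od -cb, but with hex pairs on the second line instead of octal."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Every field is 4 columns wide, so each hex pair lands under its character.
        lines.append(f"{offset:07o}" + "".join(f"{od_char(b):>4}" for b in chunk))
        lines.append(" " * 7 + "".join(f"{b:02x}".rjust(4) for b in chunk))
    lines.append(f"{len(data):07o}")  # like od, finish with the final octal offset
    return "\n".join(lines)


if __name__ == "__main__":
    print(od_char_hex("Luigi\u2019s Pizza\n".encode("utf-8")))
```

On the Luigi text, the character line shows "342 200 231" (od's octal for the
non-ASCII bytes) with "e2  80  99" directly underneath.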

> 
> Here is a uuencoded representation of a small text file, which is
> supposed to say "Luigi's Pizza".
> - - - - - - -
> begin 664 luigi.txt
> 03'5I9VGB@)ES(%!I>GIA"@``
> `
> end
> - - - - - - -

Yeah, that's the Unicode character in there, all right.
I've attached a Python program that lists all the non-ASCII Unicode
characters, which you may find useful. The b=# value is the zero-based
byte offset.

It probably isn't very useful unless the input actually is UTF-8
encoded Unicode. ;-)
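The attachment is not included in this archived copy. As a rough stand-in,
here is a minimal sketch of the same idea, assuming UTF-8 input; the
function names and the exact output layout are guesses, not the attached
program. Offsets are counted in bytes, which is what you need to find the
spot with od or xxd.

```python
import sys
import unicodedata


def non_ascii(data: bytes):
    """Yield (byte_offset, char) for every non-ASCII character in UTF-8 data."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    offset = 0
    for ch in text:
        if ord(ch) > 0x7F:
            yield offset, ch
        offset += len(ch.encode("utf-8"))  # advance by the character's byte length


def report(path: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    for offset, ch in non_ascii(data):
        name = unicodedata.name(ch, "<unnamed>")
        print(f"b={offset} U+{ord(ch):04X} {ch} {name}")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        report(path)
```

Run on luigi.txt it should report a single hit at b=5, U+2019 RIGHT SINGLE
QUOTATION MARK.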

> 
> Side note.  uuencode/uudecode seem to have disappeared from modern
> Linux systems.  Their replacement is called uuenview/uudeview, and is
> almost but not exactly compatible.  Notice the lack of the second
> space after "begin".  I don't know what this might do with a real
> uudecode.

I have uuencode/uudecode from the sharutils RPM package, and it decoded
luigi.txt fine.
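For anyone without sharutils, Python's standard binascii module can decode
uuencoded lines one at a time with binascii.a2b_uu(). A minimal sketch,
assuming a single file with one "begin" header; the uudecode() helper is
hypothetical and does no error recovery:

```python
import binascii

LUIGI_UU = """begin 664 luigi.txt
03'5I9VGB@)ES(%!I>GIA"@``
`
end
"""


def uudecode(text: str):
    """Minimal uudecode: return (filename, data) from one uuencoded block."""
    lines = text.splitlines()
    # Header is "begin <mode> <name>"; split() tolerates one or more spaces.
    _begin, _mode, name = lines[0].split(None, 2)
    data = bytearray()
    for line in lines[1:]:
        if line == "end":
            break
        if line in ("", "`"):  # a zero-length line marks the end of the data
            continue
        data += binascii.a2b_uu(line)
    return name, bytes(data)


if __name__ == "__main__":
    name, data = uudecode(LUIGI_UU)
    print(name, data)
```

Decoding the luigi.txt block gives b"Luigi\xe2\x80\x99s Pizza\n", with the
E2 80 99 sequence sitting between "Luigi" and "s".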

Regards,
..jim

-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-newbie
