Re: gvim and Unicode

2007-02-08 Thread A.J.Mechelynck

Guido Milanese wrote:
I have an additional question concerning this topic, which has been discussed 
several times.


I am happily using (g)vim with files containing several languages, basically 
as an editor for LaTeX, and it works fine with Unicode. I am working in Linux 
Mandriva 2007, with (g)vim 7.0.30.
I still have a minor problem. My default encoding is Unicode (UTF-8): the .vimrc 
file says "set encoding=utf8". The automatic conversion from latin1 to utf8 
is perfect. However, sometimes, for compatibility with other 
programs that still cannot read Unicode files (e.g. LyX), I must leave 
the "latin1" encoding of some files unchanged. What I do is:


1. open gvim
2. :set encoding=latin1
3. :e filename.txt
4. work and save the file, which this way remains in its original encoding.

OR

:e ++enc=latin1 filename.txt

Question:
Is there any difference between the two methods?
Is the original encoding affected in some way by either of the two approaches?

Thanks!
gm


Guido Milanese
http://www.arsantiqua.org



The difference is as follows:

With your method #1, gvim represents all the data of all files internally 
using the Latin1 encoding. There is no conversion, but if you have files in 
other windows, or in hidden buffers, which are not in Latin1, it's anyone's 
guess what might happen to them.


With your method #2, gvim keeps UTF-8 for the internal representation of data. 
Conversion happens on reading and writing; that conversion is lossless and 
doesn't require an external utility such as iconv, because Vim "knows" that 
the codepoints U+0000 to U+00FF of Unicode correspond one-to-one and 
respectively to the characters 0x00 to 0xFF of Latin1. There may be a slight 
"swelling" of the data in memory, depending on the proportion of accented and 
other upper-ASCII characters in your file, since codepoints U+0080 to U+00FF 
occupy two bytes each in memory (while U+0000 to U+007F take one byte each). 
Normally you can afford that swelling. All in all, I would regard this method 
as "safer" because any other files (in other windows or in hidden buffers) 
won't suffer: this method sets the 'fileencoding' of this particular file to 
Latin1 (as with ":setlocal fenc=latin1"), but other buffers (if any) are not 
affected.
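To show why the Latin1 round trip in method #2 is lossless, and how large the in-memory "swelling" can get, here is a small illustration in Python. Python's codecs stand in for Vim's internal conversion, so this is not Vim code. The script checks that every Latin1 byte 0x00-0xFF decodes to the codepoint with the same number and encodes back unchanged. It also counts how many bytes a Latin1 text occupies once converted to UTF-8.

```python
def latin1_roundtrip_is_lossless() -> bool:
    """Check that Latin1 byte N and codepoint U+00NN correspond for all 256 bytes."""
    all_bytes = bytes(range(256))
    text = all_bytes.decode("latin-1")
    # Each byte decodes to the codepoint with the same numeric value...
    same_values = all(ord(ch) == b for ch, b in zip(text, all_bytes))
    # ...and encoding back to Latin1 reproduces the original bytes exactly.
    return same_values and text.encode("latin-1") == all_bytes


def utf8_size_of_latin1(data: bytes) -> int:
    """Number of bytes Latin1 data occupies once converted to UTF-8."""
    # 0x00-0x7F (ASCII) stay one byte; 0x80-0xFF need two bytes in UTF-8.
    return sum(1 if b < 0x80 else 2 for b in data)


if __name__ == "__main__":
    sample = "naïve café".encode("latin-1")
    print(latin1_roundtrip_is_lossless())            # True
    print(len(sample), utf8_size_of_latin1(sample))  # 10 12
```

The extra byte is paid only for characters 0x80-0xFF, so the swelling grows with the share of accented characters in the file.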



Best regards,
Tony.
--
Hummingbirds never remember the words to songs.


Re: gvim and Unicode

2007-02-08 Thread Guido Milanese
I have an additional question concerning this topic, which has been discussed 
several times.

I am happily using (g)vim with files containing several languages, basically 
as an editor for LaTeX, and it works fine with Unicode. I am working in Linux 
Mandriva 2007, with (g)vim 7.0.30.
I still have a minor problem. My default encoding is Unicode (UTF-8): the .vimrc 
file says "set encoding=utf8". The automatic conversion from latin1 to utf8 
is perfect. However, sometimes, for compatibility with other 
programs that still cannot read Unicode files (e.g. LyX), I must leave 
the "latin1" encoding of some files unchanged. What I do is:

1. open gvim
2. :set encoding=latin1
3. :e filename.txt
4. work and save the file, which this way remains in its original encoding.

OR

:e ++enc=latin1 filename.txt

Question:
Is there any difference between the two methods?
Is the original encoding affected in some way by either of the two approaches?

Thanks!
gm


Guido Milanese
http://www.arsantiqua.org


Re: gvim and Unicode

2007-01-26 Thread Bill McCarthy
On Fri 26-Jan-07 4:17pm -0600, Jon Noring wrote:

> In addition, in the documentation and menus, I see nothing
> mentioned about Unicode, UTF-8 encoding, etc.

Hmm, if I simply type (in 7.0.188):

:helpg \

Re: gvim and Unicode

2007-01-26 Thread A.J.Mechelynck

Jon Noring wrote:

I've been a long-time user of vi editors on Windows (lemmy and an older
version of vim) and now am looking for a vi editor for Windows that supports
the Unicode encodings (such as UTF-8, UTF-16, etc.)

So I installed the latest gvim, version 7, but am disappointed that on my
system at least (Windows XP), it doesn't recognize UTF-8 documents, so
characters outside of the ASCII range are not being rendered properly (it
appears gvim assumes the documents are ISO-8859 encoded.) In addition, in
the documentation and menus, I see nothing mentioned about Unicode, UTF-8
encoding, etc.

So what's going on? I was under the impression that in gvim I'd have a UTF-8
capable editor.

Thanks!

Jon Noring 


gvim does support Unicode, but it may be easier or harder depending on your OS 
and its settings. The easiest is of course if you start gvim in a Unicode 
locale, or, on Unix, if you run a version compiled for the GTK2 toolkit (which 
uses Unicode by default). Here is a code snippet which you can paste into your 
vimrc to enable support for Unicode in all versions which have Unicode support 
compiled-in.


if has("multi_byte")              " if not, we need to recompile
  if &enc !~? '^u'                " if the locale 'encoding' starts with u or U
                                  " then Unicode is already set
    if &tenc == ''
      let &tenc = &enc            " save the keyboard charset
    endif
    set enc=utf-8                 " to support Unicode fully, we need to be able
                                  " to represent all Unicode codepoints in memory
  endif
  set fencs=ucs-bom,utf-8,latin1
  setg bomb                       " default for new Unicode files
  setg fenc=latin1                " default for files created from scratch
else
  echomsg 'Warning: Multibyte support is not compiled-in.'
endif

You must also set a 'guifont' which includes the glyphs you will need, but 
most fonts don't cover the whole range of "assigned" Unicode codepoints from 
U+0000 (well, U+0020 since 0-1F are not "printable") to U+10FFFF (well, 
U+10FFFD since anything ending in FFFE or FFFF is invalid). If you are like 
me, you will have to set different fonts at different times depending on what 
languages you're editing at any particular moment. Courier New has (in my 
experience) a wide coverage for "alphabetic" languages (Latin, Greek, 
Cyrillic, Hebrew, Arabic); for Far Eastern scripts you will need some other 
font such as FZ FangSong or MingLiU.


With the above settings, Unicode files will be recognised when possible:
- Any file starting with a BOM will be properly recognised as the appropriate 
Unicode encoding (out of, IIUC, UTF-8, UTF-16be, UTF-16le, UTF-32be and UTF-32le).
- Files with no BOM will still be recognised as UTF-8 if they include nothing 
that is invalid in UTF-8.
- Fallback is to Latin1.
- The above means that 7-bit US-ASCII will be diagnosed as UTF-8; this is not 
a problem as long as you don't add to them any characters with the high bit 
set, since the codepoints U+0000 to U+007F have both the same meaning and the
same representation in ASCII and UTF-8. The first time you add a character 
above 0x7F to such a file, you will have to save it with, for instance,


:setlocal fenc=latin1
:w

if you want it to be encoded in Latin1. From then on, the file (containing one 
or more bytes with high bit set in combinations invalid in UTF-8) will be 
recognised as Latin1 by the 'fileencodings' heuristics set above.
- It also means that for non-UTF-8 Unicode files with no BOM, or in general 
for anything not autodetected (such as 8-bit files other than Latin1), you 
will have to specify the encoding yourself (e.g. ":e ++enc=utf-16le 
filename.txt").
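The detection order above can be imitated outside Vim in a few lines of Python. This is a simplified sketch of the idea under my own assumptions, not Vim's actual code. `guess_fenc` is a hypothetical helper, and the names it returns follow the list above rather than the exact names Vim shows in 'fileencoding'. The sketch looks for a BOM first and then tries a strict UTF-8 decode. If both fail, it falls back to Latin1, which accepts any byte sequence. It also shows why pure ASCII comes out as UTF-8 and why a single Latin1 "é" (byte 0xE9) tips a file over to Latin1.

```python
import codecs

# BOMs to check, longest first: the UTF-32LE BOM begins with the UTF-16LE one.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def guess_fenc(data: bytes) -> str:
    """Rough analogue of 'fileencodings=ucs-bom,utf-8,latin1' (hypothetical helper)."""
    # 1. ucs-bom: a byte-order mark identifies the Unicode encoding.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. utf-8: accepted only if the whole file is valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. latin1: every byte sequence is valid Latin1, so this step never fails.
    return "latin1"


if __name__ == "__main__":
    print(guess_fenc(b"plain 7-bit ASCII"))                             # utf-8
    print(guess_fenc("café".encode("latin-1")))                         # latin1
    print(guess_fenc(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))   # utf-16le
```

A BOM-less UTF-16 file reaches step 3 in this sketch as well, which matches the point above that such files need an explicit ++enc.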


Also with the above settings, new files will be created in Latin1. To create a 
new file in UTF-8, use for instance


:enew
:setlocal fenc=utf-8


See
:help Unicode
:help 'encoding'
:help 'termencoding'
:help 'fileencodings'
:help 'fileencoding'
:help 'bomb'
:help ++opt


HTH,
Tony.


Re: gvim and Unicode

2007-01-26 Thread Mikolaj Machowski
On Saturday, 27 January 2007, vim@vim.org wrote:
> (it appears gvim assumes the documents are ISO-8859 encoded.) In
> addition, in the documentation and menus, I see nothing mentioned about
> Unicode, UTF-8 encoding, etc.

:help utf-8
:help unicode

m.



gvim and Unicode

2007-01-26 Thread Jon Noring

I've been a long-time user of vi editors on Windows (lemmy and an older
version of vim) and now am looking for a vi editor for Windows that supports
the Unicode encodings (such as UTF-8, UTF-16, etc.)

So I installed the latest gvim, version 7, but am disappointed that on my
system at least (Windows XP), it doesn't recognize UTF-8 documents, so
characters outside of the ASCII range are not being rendered properly (it
appears gvim assumes the documents are ISO-8859 encoded.) In addition, in
the documentation and menus, I see nothing mentioned about Unicode, UTF-8
encoding, etc.

So what's going on? I was under the impression that in gvim I'd have a UTF-8
capable editor.

Thanks!

Jon Noring 