Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-06 Thread panshizhu
A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 10:30:54:
  1. Will Vim write a BOM when writing Unicode files? Or is there an
  option for that?

 :setlocal bomb

 When opening a Unicode file, Vim will set or clear the buffer-local 'bomb'
 option according to the presence or absence of a BOM. That option is
 irrelevant for non-Unicode files. You can also set or clear it manually.
 When creating a new Unicode file from scratch, a BOM will be set, or not,
 depending on the corresponding global setting, so if you want your new
 Unicode files to be created with a BOM, you may add

 :setglobal bomb

 to your vimrc.

Thanks, it seems that the BOM is not written by default and can only be set
globally for all Unicode encodings.

But the problem is: my gcc 4.0 complains about a BOM in UTF-8 source
files, while a ucs-2le file must have a BOM so that Vim can recognize it
without adding ucs-2le to fencs (ucs-2le should never be added to fencs
though, since most characters in cpxxx files are also valid as ucs-2le and
it is likely to mess things up).

Is there any way to control the BOM per Unicode encoding, like
the following?
When writing utf-8 files, do not write the BOM.
When writing ucs-2 files, write the BOM. (This should happen even if the
file was opened as utf-8, followed by :set fenc=ucs-2le and then :write.)
--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-06 Thread A.J.Mechelynck

autocmd BufWritePre * if &fenc ==? 'utf-8' || &fenc ==? 'utf8' |
\ setlocal nobomb |
\ elseif &fenc =~? '^u' | setlocal bomb | endif

... will clear 'bomb' just before writing UTF-8 files and set it just before 
writing other Unicode files (UCS-2, UCS-4 or UTF-16, UTF-32, of any 
endianness; I'm not speaking here of the latest PRC encoding, GB18030 I think 
it is called, which is, strictly speaking, also a Unicode encoding).
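
One case a one-line test on 'fileencoding' does not cover is an empty value, which means the buffer is written in 'encoding'. A minimal sketch of a variant that handles it, assuming it is placed in the vimrc; the function name SetBombForWrite and the group name BomByEncoding are made up for this example:

```vim
" Hypothetical helper: choose 'bomb' from the encoding the buffer will be
" written in. An empty 'fileencoding' means 'encoding' is used instead.
function! SetBombForWrite()
  let l:enc = empty(&fileencoding) ? &encoding : &fileencoding
  if l:enc ==? 'utf-8'
    " UTF-8: no BOM, so that tools such as gcc do not complain.
    setlocal nobomb
  elseif l:enc =~? '^\(ucs-\|utf-16\|utf-32\)'
    " UCS-2, UCS-4, UTF-16, UTF-32 of any endianness: write a BOM.
    setlocal bomb
  endif
endfunction

augroup BomByEncoding
  " Clear the group first so re-sourcing the vimrc does not stack autocommands.
  autocmd!
  autocmd BufWritePre * call SetBombForWrite()
augroup END
```

Because the decision is made just before each write, it also applies when a buffer opened as utf-8 is switched with :set fenc=ucs-2le and then written.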



Best regards,
Tony.
--
A man wrapped up in himself makes a very small package.


'fileencodings': Why use ucs-2le for cp936 file?

2007-06-05 Thread panshizhu

Hello,

I have recently been experimenting with 'fileencodings'. What I want is
for Vim to recognize the utf-8, ucs-2le, euc-cn and cp936 encodings.

So I set 'fencs' in my .vimrc:
set fencs=ucs-bom,utf-8,ucs-2le,euc-cn,cp936

However, cp936 files are always recognized as ucs-2le, and everything ends
up garbled.
If I remove ucs-2le:
set fencs=ucs-bom,utf-8,euc-cn,cp936

that works, but ucs-2le files are not recognized at all.

It is said that all Unicode files have a BOM, and cp936 files obviously do
not, so I wonder why cp936 files are recognized as ucs-2le
without any BOM.

I tried to change my 'encoding' setting, but it doesn't affect anything.

Any hints?
--
Sincerely, Pan, Shi Zhu. ext: 2606



Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-05 Thread A.J.Mechelynck

[EMAIL PROTECTED] wrote:

It is said that unicode files all have BOM, and obviously cp936 files do
not have BOM, so I wonder why cp936 files get recognized as ucs-2le file
without any BOM.


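Probably because the cp936 files you tested do not contain any sequence of
bytes that would be illegal under UCS-2le. UCS-2 treats every pair of bytes
as one character, so almost any byte sequence decodes without error.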




Unicode files may or may not have a BOM, depending on who (or which program) 
created them and where they come from. If you remove ucs-2le from your 
'fileencodings', but leave ucs-bom at the start, any Unicode files having a 
BOM will still be recognised and the proper encoding set.
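
As a sketch, that advice might look like this in practice. The file name nobom.txt is only a placeholder, and ++enc is the per-file override described under :help ++opt:

```vim
" In the vimrc: BOM check first, then UTF-8, then the Chinese encodings.
" ucs-2le is left out on purpose: without a BOM it also claims cp936 files.
set fileencodings=ucs-bom,utf-8,euc-cn,cp936

" Typed at the command line, for a UCS-2LE file that has no BOM:
" name the encoding explicitly instead of relying on detection.
edit ++enc=ucs-2le nobom.txt
```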



Best regards,
Tony.
--
Cahn's Axiom:
When all else fails, read the instructions.


Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-05 Thread panshizhu
A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 09:51:51:
 Unicode files may or may not have a BOM, depending on who (or which
 program) created them and where they come from. If you remove ucs-2le
 from your 'fileencodings', but leave ucs-bom at the start, any Unicode
 files having a BOM will still be recognised and the proper encoding set.


 Best regards,
 Tony.

It seems that ucs-2le files with a BOM are recognized now.
But I have some other questions:

1. Will Vim write a BOM when writing Unicode files? Or is there an
option for that?

2. What is the correct way to convert a file's encoding inside Vim?

I opened a file with cp936 encoding, did :set fenc=ucs-2le, then :w
newfile.txt, closed Vim and opened newfile.txt in a new Vim, and
found everything garbled. (gvim 7.1, WinXP)

--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-05 Thread A.J.Mechelynck
[EMAIL PROTECTED] wrote:
 It seems that ucs-2le files with a BOM are recognized now.
 But I have some other questions:

 1. Will Vim write a BOM when writing Unicode files? Or is there an
 option for that?

:setlocal bomb

When opening a Unicode file, Vim will set or clear the buffer-local 'bomb'
option according to the presence or absence of a BOM. That option is
irrelevant for non-Unicode files. You can also set or clear it manually. When
creating a new Unicode file from scratch, a BOM will be set, or not, depending
on the corresponding global setting, so if you want your new Unicode files to
be created with a BOM, you may add

:setglobal bomb

to your vimrc.
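
A short sketch of the difference between the two forms, using nothing beyond the 'bomb' option itself:

```vim
" In the vimrc: the default for Unicode buffers created from scratch.
setglobal bomb

" In an open buffer: show the value Vim detected for this file,
" then change it for this buffer only.
setlocal bomb?
setlocal nobomb
```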

 
 2. what is the correct way of converting a file encoding inside vim?
 
 I opened a file with cp936 encoding, then :set fenc=ucs-2le, then :w
 newfile.txt, close the vim and open the newfile.txt with a new vim, then I
 found everything in a mess. (gvim 7.1 winxp)
 
 --
 Sincerely, Pan, Shi Zhu. ext: 2606
 
 

It should have worked; but if the file had no BOM, maybe its encoding was
detected wrongly: so if it was in UCS-2le but Vim thought it was in GB2312 or
in cp936... a mess would be the result.

Try opening a file in cp936 then doing

:setlocal fenc=ucs-2le bomb
:w

Your other Vim ought to display it correctly then.

See also :help ++opt for another way to set the 'fileencoding' for one file
only.
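
For reference, a sketch of the ++opt route, assuming the source file is known to be in cp936. Here input.txt is a placeholder name, and newfile.txt is the name used in the question:

```vim
" Force the read encoding instead of relying on 'fileencodings'.
edit ++enc=cp936 input.txt

" Check which encoding Vim used for reading before converting.
setlocal fileencoding?

" Write a converted copy: UCS-2 little-endian, with a BOM so that
" another Vim can detect it through ucs-bom.
setlocal bomb
write ++enc=ucs-2le newfile.txt
```

Forcing the encoding on read avoids the wrong-detection case described above, where a file without a BOM is guessed to be in a different encoding than it really is.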


Best regards,
Tony.
-- 
Life would be so much easier if we could just look at the source code.


Re: 'fileencodings': Why use ucs-2le for cp936 file?

2007-06-05 Thread Edward L. Fox

On 6/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


It is said that unicode files all have BOM, and obviously cp936 files do
not have BOM, so I wonder why cp936 files get recognized as ucs-2le file
without any BOM.


Using UCS-2 without a BOM is not recommended, because its encoding is not
easy to detect automatically. Maybe you need an encoding-detection
plugin such as FencView. Although the current version of
FencView cannot handle your problem, I think it could after
some modifications. Please contact Ming Bai
[EMAIL PROTECTED] and tell him about your problem.


Regards,

Edward L. Fox