Re: 'fileencodings': Why use ucs-2le for cp936 file?
A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 10:30:54:

>> 1. Will Vim write a BOM when writing Unicode files? Or is there an option for that?
>
> :setlocal bomb
>
> When opening a Unicode file, Vim will set or clear the buffer-local 'bomb' option according to the presence or absence of a BOM. That option is irrelevant for non-Unicode files. You can also set or clear it manually. When creating a new Unicode file from scratch, a BOM will be set, or not, depending on the corresponding global setting, so if you want your new Unicode files to be created with a BOM, you may add
>
> :setglobal bomb
>
> to your vimrc.

Thanks. It seems that the BOM is not written by default and can only be set globally for all Unicode encodings. But the problem is: my gcc 4.0 complains about a BOM in a UTF-8 source file, while a UCS-2LE file must have a BOM so that Vim can recognize it without adding ucs-2le to 'fencs' (ucs-2le should never be added to 'fencs' anyway, since most byte sequences in cpxxx files are also valid UCS-2LE and it is likely to mess things up).

Is there any way to make the BOM setting depend on the particular Unicode encoding, like the following?

- When writing UTF-8 files, do not write the BOM.
- When writing UCS-2 files, write the BOM.

(This should happen even if the file was opened as UTF-8, then :set fenc=ucs-2le, then :write.)

--
Sincerely,
Pan, Shi Zhu. ext: 2606
Re: 'fileencodings': Why use ucs-2le for cp936 file?
[EMAIL PROTECTED] wrote:
> A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 10:30:54:
>
>>> 1. Will Vim write a BOM when writing Unicode files? Or is there an option for that?
>>
>> :setlocal bomb
>>
>> When opening a Unicode file, Vim will set or clear the buffer-local 'bomb' option according to the presence or absence of a BOM. That option is irrelevant for non-Unicode files. You can also set or clear it manually. When creating a new Unicode file from scratch, a BOM will be set, or not, depending on the corresponding global setting, so if you want your new Unicode files to be created with a BOM, you may add
>>
>> :setglobal bomb
>>
>> to your vimrc.
>
> Thanks. It seems that the BOM is not written by default and can only be set globally for all Unicode encodings. But the problem is: my gcc 4.0 complains about a BOM in a UTF-8 source file, while a UCS-2LE file must have a BOM so that Vim can recognize it without adding ucs-2le to 'fencs' (ucs-2le should never be added to 'fencs' anyway, since most byte sequences in cpxxx files are also valid UCS-2LE and it is likely to mess things up).
>
> Is there any way to make the BOM setting depend on the particular Unicode encoding, like the following?
>
> - When writing UTF-8 files, do not write the BOM.
> - When writing UCS-2 files, write the BOM.
>
> (This should happen even if the file was opened as UTF-8, then :set fenc=ucs-2le, then :write.)
>
> --
> Sincerely,
> Pan, Shi Zhu. ext: 2606

    autocmd BufWritePre * if &fenc ==? 'utf-8' || &fenc ==? 'utf8' |
        \ setlocal nobomb |
        \ elseif &fenc =~? '^u' | setlocal bomb | endif

... will clear 'bomb' just before writing UTF-8 files and set it just before writing other Unicode files (UCS-2, UCS-4 or UTF-16, UTF-32, of any endianness; I'm not speaking here of the latest PRC encoding, GB18030 I think it is called, which is, strictly speaking, also a Unicode encoding).

Best regards,
Tony.
--
A man wrapped up in himself makes a very small package.
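For a vimrc, the same per-encoding rule can be kept in a named augroup, so that re-sourcing the vimrc does not register the autocommand twice. This is a minimal sketch, not Tony's exact code: the function name s:SetBombForEncoding and the group name PerEncodingBOM are hypothetical. It reads the option through &l:fileencoding (a bare fenc in an expression would be taken as a variable name), and it treats an empty 'fileencoding' as meaning 'encoding', because Vim then writes the buffer in 'encoding'.

```vim
" Sketch only: s:SetBombForEncoding and PerEncodingBOM are hypothetical names.
function! s:SetBombForEncoding()
  " An empty 'fileencoding' means the buffer is written in 'encoding'.
  let l:fenc = &l:fileencoding !=# '' ? &l:fileencoding : &encoding
  if l:fenc ==? 'utf-8' || l:fenc ==? 'utf8'
    " No BOM for UTF-8 (the poster's gcc 4.0 complains about it).
    setlocal nobomb
  elseif l:fenc =~? '^u'
    " ucs-2, ucs-4, utf-16, utf-32 and their endian variants: keep a BOM
    " so that 'ucs-bom' in 'fileencodings' can recognize them later.
    setlocal bomb
  endif
endfunction

augroup PerEncodingBOM
  " Drop this group's old autocommands before defining them again.
  autocmd!
  " Autocommands run in the context of the script that defined them,
  " so the s: function is reachable here.
  autocmd BufWritePre * call s:SetBombForEncoding()
augroup END
```

With this in place, :setlocal fenc=ucs-2le followed by :write on a buffer that was read as UTF-8 should produce a file with a BOM, which is the case Pan asks about.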
'fileencodings': Why use ucs-2le for cp936 file?
Hello,

Recently I have been doing some research on 'fileencodings'. What I want is to recognize utf-8, ucs-2le, euc-cn and cp936 encodings, so I set 'fencs' in my .vimrc:

    set fencs=ucs-bom,utf-8,ucs-2le,euc-cn,cp936

However, cp936 files are always recognized as ucs-2le and everything ends up in a mess. If I remove ucs-2le:

    set fencs=ucs-bom,utf-8,euc-cn,cp936

that works, but ucs-2le files cannot be recognized at all.

It is said that Unicode files all have a BOM, and obviously cp936 files do not have one, so I wonder why cp936 files get recognized as ucs-2le without any BOM.

I tried changing my 'encoding' setting, but it doesn't make any difference. Any hints?

--
Sincerely,
Pan, Shi Zhu. ext: 2606
Re: 'fileencodings': Why use ucs-2le for cp936 file?
[EMAIL PROTECTED] wrote:
> Hello,
>
> Recently I have been doing some research on 'fileencodings'. What I want is to recognize utf-8, ucs-2le, euc-cn and cp936 encodings, so I set 'fencs' in my .vimrc:
>
>     set fencs=ucs-bom,utf-8,ucs-2le,euc-cn,cp936
>
> However, cp936 files are always recognized as ucs-2le and everything ends up in a mess. If I remove ucs-2le:
>
>     set fencs=ucs-bom,utf-8,euc-cn,cp936
>
> that works, but ucs-2le files cannot be recognized at all.
>
> It is said that Unicode files all have a BOM, and obviously cp936 files do not have one, so I wonder why cp936 files get recognized as ucs-2le without any BOM.

Probably because the cp936 files you tested do not contain any sequence of bytes that would be illegal under UCS-2le.

> I tried changing my 'encoding' setting, but it doesn't make any difference. Any hints?
>
> --
> Sincerely,
> Pan, Shi Zhu. ext: 2606

Unicode files may or may not have a BOM, depending on who (or which program) created them and where they come from. If you remove ucs-2le from your 'fileencodings' but leave ucs-bom at the start, any Unicode files having a BOM will still be recognised and the proper encoding set.

Best regards,
Tony.
--
Cahn's Axiom: When all else fails, read the instructions.
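With ucs-2le dropped from 'fileencodings' as suggested, a UCS-2LE file that has no BOM can still be opened by naming its encoding explicitly with the ++enc argument (see :help ++opt). A minimal sketch, assuming data.txt is a placeholder file name; the set line belongs in the vimrc, the rest are typed as Ex commands:

```vim
" vimrc: BOM check first, then UTF-8, then the Chinese 8-bit encodings.
" ucs-2le is deliberately left out so cp936 files are not misread.
set fileencodings=ucs-bom,utf-8,euc-cn,cp936

" Open a BOM-less UCS-2LE file by stating its encoding explicitly.
" (data.txt is a placeholder name.)
:edit ++enc=ucs-2le data.txt

" If the current buffer was already read with the wrong encoding,
" re-read it; the ! discards any unsaved changes.
:edit! ++enc=ucs-2le
```

The ++enc value applies only to that one read, so the detection order in 'fileencodings' stays unchanged for every other file.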
Re: 'fileencodings': Why use ucs-2le for cp936 file?
A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 09:51:51:

> Unicode files may or may not have a BOM, depending on who (or which program) created them and where they come from. If you remove ucs-2le from your 'fileencodings' but leave ucs-bom at the start, any Unicode files having a BOM will still be recognised and the proper encoding set.
>
> Best regards,
> Tony.

It seems that ucs-2le files with a BOM get recognized now. But I have some other questions:

1. Will Vim write a BOM when writing Unicode files? Or is there an option for that?

2. What is the correct way of converting a file's encoding inside Vim? I opened a file with cp936 encoding, then did :set fenc=ucs-2le, then :w newfile.txt, closed Vim and opened newfile.txt in a new Vim, and found everything in a mess. (gvim 7.1, WinXP)

--
Sincerely,
Pan, Shi Zhu. ext: 2606
Re: 'fileencodings': Why use ucs-2le for cp936 file?
[EMAIL PROTECTED] wrote:
> A.J.Mechelynck [EMAIL PROTECTED] wrote on 2007-06-06 09:51:51:
>
>> Unicode files may or may not have a BOM, depending on who (or which program) created them and where they come from. If you remove ucs-2le from your 'fileencodings' but leave ucs-bom at the start, any Unicode files having a BOM will still be recognised and the proper encoding set.
>>
>> Best regards,
>> Tony.
>
> It seems that ucs-2le files with a BOM get recognized now. But I have some other questions:
>
> 1. Will Vim write a BOM when writing Unicode files? Or is there an option for that?

:setlocal bomb

When opening a Unicode file, Vim will set or clear the buffer-local 'bomb' option according to the presence or absence of a BOM. That option is irrelevant for non-Unicode files. You can also set or clear it manually. When creating a new Unicode file from scratch, a BOM will be set, or not, depending on the corresponding global setting, so if you want your new Unicode files to be created with a BOM, you may add

:setglobal bomb

to your vimrc.

> 2. What is the correct way of converting a file's encoding inside Vim? I opened a file with cp936 encoding, then did :set fenc=ucs-2le, then :w newfile.txt, closed Vim and opened newfile.txt in a new Vim, and found everything in a mess. (gvim 7.1, WinXP)
>
> --
> Sincerely,
> Pan, Shi Zhu. ext: 2606

It should have worked; but if the file had no BOM, maybe its encoding was detected wrongly: if it was in UCS-2le but Vim thought it was in GB2312 or in cp936... a mess would be the result. Try opening a file in cp936 and then doing

    :setlocal fenc=ucs-2le bomb
    :w

Your other Vim ought to display it correctly then. See also

    :help ++opt

for another way to set the 'fileencoding' for one file only.

Best regards,
Tony.
--
Life would be so much easier if we could just look at the source code.
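The conversion described above can be written out as a short sequence of Ex commands. This is a sketch, assuming the source really is cp936 and that old.txt and newfile.txt are placeholder names; forcing ++enc=cp936 on the first read rules out a wrong guess from 'fileencodings' before anything is converted:

```vim
" 1. Read the source file as cp936 instead of relying on detection.
:edit ++enc=cp936 old.txt
" 2. Choose the output encoding and ask for a BOM, so that a Vim with
"    ucs-bom at the head of 'fileencodings' recognizes the result.
:setlocal fileencoding=ucs-2le bomb
" 3. Write the converted copy (use :write! if newfile.txt already exists).
:write newfile.txt
" 4. In the Vim that reopens newfile.txt, check what was detected.
:setlocal fileencoding? bomb?
```

Step 4 should report ucs-2le and bomb if the BOM was written and picked up; if it reports cp936 instead, the file was most likely written without a BOM.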
Re: 'fileencodings': Why use ucs-2le for cp936 file?
On 6/6/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> Hello,
>
> Recently I have been doing some research on 'fileencodings'. What I want is to recognize utf-8, ucs-2le, euc-cn and cp936 encodings, so I set 'fencs' in my .vimrc:
>
>     set fencs=ucs-bom,utf-8,ucs-2le,euc-cn,cp936
>
> However, cp936 files are always recognized as ucs-2le and everything ends up in a mess. If I remove ucs-2le:
>
>     set fencs=ucs-bom,utf-8,euc-cn,cp936
>
> that works, but ucs-2le files cannot be recognized at all.
>
> It is said that Unicode files all have a BOM, and obviously cp936 files do not have one, so I wonder why cp936 files get recognized as ucs-2le without any BOM.

Using UCS-2 without a BOM is not recommended: it is not easy to detect such a file's encoding automatically. Maybe you need an encoding-detection plugin, such as FencView. Although the current version of FencView cannot handle your problem, I think it will be able to after some modifications. Please contact Ming Bai [EMAIL PROTECTED] and tell him about your problem.

> I tried changing my 'encoding' setting, but it doesn't make any difference. Any hints?
>
> --
> Sincerely,
> Pan, Shi Zhu. ext: 2606

Regards,
Edward L. Fox