Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Gabriele F

On 10/12/2020 14.04, A. Wik wrote:

> I just like to keep things "8-bit clean". As long as all tools used
> to process the files are also 8-bit clean, nothing gets corrupted.
> Alas, it does mean files are sometimes displayed incorrectly.  But in
> my experience, it gets messy when I introduce UTF-8.


OK, my experience is instead that a lot of tools do mess up
encodings, and it's hard to recognize those mess-ups promptly when not
using a UTF encoding. I guess it comes down to one's usual tools, needs
and habits.




> There is something to it.  People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.


Maybe that was true in the early years of UTF-8; by now several tests
have shown that UTF-8 is fairly efficient even for Asian languages, so I
think it's generally well accepted and the controversy is just about the BOM.
Anyway, I don't think anyone who needs non-English characters has ever
favoured any old non-Unicode encoding; Unicode is a blessing precisely for
them.


--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups "vim_use" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_use+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/dee925c6-31e9-4d3d-4a9d-83a1f4f20070%40tiscali.it.


Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Gabriele F

On 10/12/2020 5.58, Tony Mechelynck wrote:

> The problem with ":setg fenc=utf8 bomb" is that *every* new text file
> will start with 0xEF 0xBB 0xBF unless you explicitly turn it off for
> that file by means of ":setl nobomb" or ":setl fenc=latin1" or similar
> before writing it.


That's the point, indeed




> For C sources this will confuse the compiler
> (generating an error and preventing successful compilation) and for
> anything starting with a shebang (shell scripts, perl sources, etc.)
> it will prevent the #! shebang leader from being recognized. OTOH for


It's true that it depends on what you mostly do in the editor: if you
frequently need to create files that cannot have a BOM in them, it's most
likely inconvenient. Maybe use more than one editor, or aliases with
different configurations...?


Personally, I use text editors mostly for ordinary text or web
files, use mostly IDEs for programming, and rarely edit shell scripts; it
may well be that I usually left 'bomb' disabled when using
Unices...


Anyway, for textual files or filetypes that do support the BOM, I 
believe it's more beneficial to include it, and that it should not be 
discouraged.




Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Gabriele F

On 09/12/2020 21.19, Gabriele F wrote:
> Completely off-topic, if you don't have particular needs I'd advise
> you to use UTF-8 with BOMs for all your new files ('set bomb', 'set
> encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
> prevent any future encoding problem for at least them.
>
> I've been doing so for more than a decade and pretty much never had
> problems, and sigh with relief every time I see I'm working with one of
> them.


I should have specified that during that time I mostly used other text
editors, and on Windows; I've been using Vim for only a few years and I
still use other editors more frequently.
Although I do have "set bomb" in my vimrc, I have less experience with
it in Vim, and I'm still on Windows most of the time.




Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Boyko Bantchev
On Thu, 10 Dec 2020 at 15:04, A. Wik  wrote:
>
> On Wed, 9 Dec 2020 at 20:20, Gabriele F  wrote:
> ..
> > I imagine most of the critics are from countries that never needed more
> > than ASCII
>
> There is something to it.  People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.

To people who use only ASCII the distinction between ASCII and
UTF-8 is totally irrelevant, because in their case UTF-8 is precisely ASCII
by definition.

But people like me, who regularly use scripts other than Latin, and who
also like to indulge themselves with mathematical and other ‘special’
characters in plain text – they are those who really appreciate and
praise the advent of Unicode and UTF-8.



Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Gabriele F

On 09/12/2020 20.35, Gabriele F wrote:

> That :%!cat is indeed a neat (if hacky) idea!


It should be noted, though, that it works only as long as the 'shelltemp'
option is on, which is the default.


'shelltemp' makes Vim use a temporary file for the filtering instead of 
a pipe, which is evidently the (probably accidental) cause of the 
effects on the encoding.




Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Gabriele F
I should add that those tests were all made with 'encoding' set in my 
vimrc to utf-8, I haven't tried with the default latin1 or other values. 
I don't know if this influenced something.


That's the setting that A. Wik said he has as well, anyway.



Re: Changing encoding of an already loaded buffer

2020-12-10 Thread Tony Mechelynck
On Thu, Dec 10, 2020 at 2:04 PM A. Wik  wrote:
>
> On Wed, 9 Dec 2020 at 20:20, Gabriele F  wrote:
> >
> > On 09/12/2020 18.47, A. Wik wrote:
> > > I don't include utf8 in my default fencs setting because that has the
> > > side effect of using utf8 for any newly created files.
> >
> > Completely off-topic, if you don't have particular needs ...
>
> I just like to keep things "8-bit clean".  As long as all tools used
> to process the files are also 8-bit clean, nothing gets corrupted.
> Alas, it does mean files are sometimes displayed incorrectly.  But in
> my experience, it gets messy when I introduce UTF-8.
>
> > I imagine most of the critics are from countries that never needed more
> > than ASCII
>
> There is something to it.  People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.
>
> -aw

- For pure 7-bit ASCII, all three of us-ascii, Latin1 and UTF-8 are
equivalent, they represent the data identically.
- For "Western Latin" (French, Spanish, etc.) Latin1 is slightly more
economical than UTF-8. How much more depends on the percent abundance
of accented letters not found in ASCII.
- When mixing several scripts (at least two of Latin, Greek, Cyrillic,
Hebrew, Arabic, CJK ideographic, etc.) within a single document, I
know no better encoding than UTF-8. In an 8-bit charset like Latin1
you have only (at most) 256 different valid character values, and that
is much too few as soon as you start mixing scripts: be it for a
juxtalinear edition of the Bible (with the original Hebrew, Aramaic or
Greek text next to a translation and/or commentary) or for a
Greek-Russian or Russian-Finnish dictionary. And of course even for a
single CJK script, no 8-bit script can do the job.
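
The three cases above can be checked with a few lines of Python. This is only an illustrative sketch: the sample strings and the helper name `encoded_size` are made up for the example, and the exact byte counts depend entirely on the text chosen.

```python
# Sample texts for the three cases (arbitrary, chosen for illustration).
samples = {
    "ascii": "plain ASCII text",
    "french": "déjà vu à l'été",
    "mixed": "λόγος и слово",  # Greek and Cyrillic in one string
}


def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for name, text in samples.items():
    # Columns: sample name, size in Latin-1, size in UTF-8.
    print(name, encoded_size(text, "latin-1"), encoded_size(text, "utf-8"))
```

For the ASCII sample both sizes are equal, for the French sample Latin-1 is smaller by one byte per accented letter, and for the mixed-script sample Latin-1 cannot encode the text at all.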

Best regards,
Tony.



Re: Changing encoding of an already loaded buffer

2020-12-10 Thread A. Wik
On Wed, 9 Dec 2020 at 20:20, Gabriele F  wrote:
>
> On 09/12/2020 18.47, A. Wik wrote:
> > I don't include utf8 in my default fencs setting because that has the
> > side effect of using utf8 for any newly created files.
>
> Completely off-topic, if you don't have particular needs ...

I just like to keep things "8-bit clean".  As long as all tools used
to process the files are also 8-bit clean, nothing gets corrupted.
Alas, it does mean files are sometimes displayed incorrectly.  But in
my experience, it gets messy when I introduce UTF-8.

> I imagine most of the critics are from countries that never needed more
> than ASCII

There is something to it.  People who use only ASCII seem to like
UTF-8 better than those who frequently use non-English characters.
I've seen claims that UTF-8 is "compact" but compared to strictly
8-bit character sets like Latin-1 it is not.

-aw



Re: Changing encoding of an already loaded buffer

2020-12-09 Thread Tony Mechelynck
On Wed, Dec 9, 2020 at 9:20 PM Gabriele F  wrote:
>
> On 09/12/2020 18.47, A. Wik wrote:
> > I don't include utf8 in my default fencs setting because that has the
> > side effect of using utf8 for any newly created files.
>
> Completely off-topic, if you don't have particular needs I'd advise you
> to use UTF-8 with BOMs for all your new files ('set bomb', 'set
> encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
> prevent any future encoding problem for at least them.
>
> I've been doing so for more than a decade and pretty much never had
> problems, and sigh with relief every time I see I'm working with one of them.
>
> I heard many protest the BOMs in UTF-8, but they are the first thing
> ever to allow a reliable encoding detection and they solve a lot more
> problems than they can cause (if they cause problems they usually do so
> immediately and noticeably, much better than discovering years later
> that you irremediably botched the encoding of some file). So I find it
> absurd to disparage them, and delusive to think that we'll ever get to a
> point when non-utf8 files will be rare enough that we won't need to
> handle them.
> I imagine most of the critics are from countries that never needed more
> than ASCII

IIUC the criticism comes from people who do a lot of programming, either
in C (where sources are supposed to be in Latin1; they may be in UTF-8
if characters above U+007F are used only in alphanumeric literals, but
they cannot start with a BOM) or in Perl, Python, Unix shell script
language, etc. (where the first two bytes of a source file must be #!
in that order):

The problem with ":setg fenc=utf8 bomb" is that *every* new text file
will start with 0xEF 0xBB 0xBF unless you explicitly turn it off for
that file by means of ":setl nobomb" or ":setl fenc=latin1" or similar
before writing it. For C sources this will confuse the compiler
(generating an error and preventing successful compilation) and for
anything starting with a shebang (shell scripts, perl sources, etc.)
it will prevent the #! shebang leader from being recognized. OTOH for
"well-behaved" filetypes like Vim scripts (if not run by means of a
shebang), HTML pages, CSS style sheets, etc., there is no problem. So
whether or not to set it should depend on what types of files you
write most often. I use it because most of the files I write are HTML
or CSS, followed by Vim scripts; but then when I write a shell script
I have to remember to turn the 'bomb' setting off for that file.
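
The shebang problem is easy to see at the byte level. The following Python sketch is not related to how Vim writes files; it uses Python's own "utf-8-sig" codec, which prepends the UTF-8 BOM, as a stand-in for writing a file with 'bomb' set. The script text is made up for the example.

```python
import codecs

script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")         # no BOM
with_bom = script.encode("utf-8-sig")  # same text, preceded by the UTF-8 BOM

print(codecs.BOM_UTF8.hex(" "))        # ef bb bf
print(plain.startswith(b"#!"))         # True
print(with_bom.startswith(b"#!"))      # False: the first bytes are the BOM
print(with_bom.decode("utf-8-sig") == script)  # True: a BOM-aware reader recovers the text
```

Anything that inspects the first two bytes of the file sees 0xEF 0xBB instead of "#!", while a reader that knows about the BOM gets the original text back.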

Best regards,
Tony.



Re: Changing encoding of an already loaded buffer

2020-12-09 Thread Gabriele F

On 09/12/2020 18.47, A. Wik wrote:

> I don't include utf8 in my default fencs setting because that has the
> side effect of using utf8 for any newly created files.


Completely off-topic: if you don't have particular needs, I'd advise you
to use UTF-8 with BOMs for all your new files ('set bomb', 'set
encoding=utf-8' and 'fenc' left at the default in your vimrc); it will
prevent any future encoding problems, at least for them.


I've been doing so for more than a decade and have pretty much never had
problems, and I sigh with relief every time I see I'm working with one of them.


I've heard many protest against BOMs in UTF-8, but they are the first thing
ever to allow reliable encoding detection, and they solve a lot more
problems than they can cause (if they cause problems they usually do so
immediately and noticeably, which is much better than discovering years later
that you irreparably botched the encoding of some file). So I find it
absurd to disparage them, and delusional to think that we'll ever get to a
point when non-utf8 files will be rare enough that we won't need to
handle them.
I imagine most of the critics are from countries that never needed more
than ASCII.




Re: Changing encoding of an already loaded buffer

2020-12-09 Thread Gabriele F

On 08/12/2020 17.47, Bram Moolenaar wrote:

> > This works:
> > :set fencs=utf8
> > :%!cat
> > although "fenc" remains "latin1".
>
> Yeah, for an existing buffer and filtering the first entry in 'fencs' is
> used to read the filter output, but 'fenc' isn't set.  That's a bit
> strange, but I'm not sure what would break if we change this.  It might
> actually be good to fix this, since if you write that file it might get
> messed up.


I performed a couple of tests trying to write the result to a file after 
doing the above (using a correct UTF-8 file as source):
- if you leave fenc to latin1 the new file will be in latin1 (with all 
the characters correctly encoded)
- if you set fenc to utf8 *after* the %!cat (but of course before 
writing the file) the new file will be in UTF-8 with all the characters 
correctly encoded
- if you set fenc to utf8 *before* the %!cat (and of course before
writing the file) the new file will be... a mess: by all appearances Vim
thinks that the individual bytes of the UTF-8 file are individual latin1
characters, and it then converts them to UTF-8, so you get a UTF-8
encoded file with the wrong characters. E.g. a "C3 B2" sequence in the
original file, which stands for a UTF-8 encoded "ò" (Unicode code point
F2), becomes a "C3 83 C2 B2" sequence in the written file: "C3" is an
"Ã" in latin1 (and yes, in Unicode too), and "Ã" is encoded as "C3 83"
in UTF-8; "B2" is a "²" in latin1 (and Unicode), and "²" is encoded as
"C2 B2" in UTF-8. (In case someone noticed it, don't let yourself get
confused by the fact that C3 and B2 occur both in the source and in the
translated sequence; that's largely just an unfortunate coincidence of
my example.)


Given that Unicode is identical to latin1 in the first 256 characters, 
to better confirm what happened I also tried using another charset 
(cp850) instead of latin1 in the above tests (fencs=cp850 in my vimrc 
and setting fenc=cp850 in the second and third tests), still using a 
correct UTF-8 file as a source; the results are analogous, with a 
correct cp850 file in the first test, a correct UTF-8 one in the second 
and a UTF-8 one with the original file's bytes interpreted as cp850 and 
then converted to UTF-8 in the third (the original "ò", "C3 B2", becomes
a "E2 94 9C E2 96 93" sequence, given that "C3" is a "├" symbol in
cp850, Unicode code point 251C -> "E2 94 9C" in UTF-8, and "B2" is a "▓",
Unicode code point 2593 -> "E2 96 93" in UTF-8).
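
The byte sequences from the third test can be reproduced outside Vim. The following Python sketch only imitates the effect described above (UTF-8 bytes misread as an 8-bit charset and then converted to UTF-8); it says nothing about what Vim does internally, and the helper name `misread_and_reencode` is invented for the example.

```python
def misread_and_reencode(text, wrong_charset):
    """Encode text as UTF-8, misread the bytes as wrong_charset, re-encode as UTF-8."""
    utf8_bytes = text.encode("utf-8")           # "ò" -> C3 B2
    misread = utf8_bytes.decode(wrong_charset)  # each byte taken as one 8-bit character
    return misread.encode("utf-8")              # the "double-encoded" result


for charset in ("latin-1", "cp850"):
    mangled = misread_and_reencode("ò", charset)
    print(charset, mangled.hex(" ").upper())
# latin-1 C3 83 C2 B2
# cp850 E2 94 9C E2 96 93
```

The damage in the latin1 case is reversible as long as nothing else touched the file: encoding the mangled text back to latin1 and decoding it as UTF-8 gives the original character again.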


Yes, I... ahem, had a lot of fun this afternoon :D


Cheers



Re: Changing encoding of an already loaded buffer

2020-12-09 Thread Gabriele F

On 08/12/2020 14.58, A. Wik wrote:

> Thanks a lot for the "%!"-idea!  That's what I needed.
>
> This works:
> :set fencs=utf8
> :%!cat


That :%!cat is indeed a neat (if hacky) idea!



Re: Changing encoding of an already loaded buffer

2020-12-09 Thread Gabriele F

On 08/12/2020 10.47, A. Wik wrote:

> Hi all,
>
> I tried a few things:
>
> (1) gvim -f ++enc=utf8 -
> result: "E492: Not an editor command: +enc=utf8"
> (2) gvim -f +enc=utf8 -
> result: see (1)
> (3) gvim -f +"set fenc=utf8" -
> result: no error message; sets fenc to "utf-8", but file is loaded as
> if with latin1.
> (4) gvim -f -c "set fenc=utf8" -
> result: see (3)
> (5) gvim -f --cmd "set fenc=utf8" -
> result: no error message; fenc remains "latin1"


Yes, I tried things like that while perusing the manual a hundred times.
It can't work, and that is more or less stated at some points in the
documentation; :h fenc is a jungle, and I seem to remember that it's
also not completely correct. Basically 'fenc' is only looked at when
writing a file, and who knows what the output of that write will be.


So essentially, besides 'fencs', the ++enc "opt" (which **has nothing to
do with the 'enc' option!!!**) is the only thing that can have an effect
when reading a file, and after it's read you'd better forget about fixing
its encoding.


The only way forward in my opinion would be to deprecate 'enc', 'fenc', 
++enc and probably 'fencs', giving warnings when they do get used, and 
introduce completely different options and commands.




Re: Changing encoding of an already loaded buffer

2020-12-09 Thread A. Wik
On Tue, 8 Dec 2020 at 16:47, Bram Moolenaar  wrote:
>
>
> Albert Wik wrote:
> >
> > Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> > not going to change the "fenc" accordingly?
>
> When reading a file (or filter output) the values in 'fencs' are tried
> one by one.  Normally when something fails then the next one is tried,
> but since reading filter output from a pipe doesn't allow for a retry,
> it will always use the first one.

Thanks, that is useful to know.

> The real problem is that 'fencs' was set to "latin1" at first, thus Vim
> didn't even try to use another encoding.  Perhaps it also works if you
> do that on the command line:
> somecommand | vim - -c 'set fencs=utf8,latin1'

No, because (according to --help) the command is run after loading the
first file.  Meanwhile, "--cmd <command>" does not work because it
runs the command before sourcing any vimrc file, and so, the new fencs
setting gets overwritten by the vimrc.  It would be useful to have an
option to run a command just *before* loading the first file but after
any rc-files.

I don't include utf8 in my default fencs setting because that has the
side effect of using utf8 for any newly created files.

-aw



Re: Changing encoding of an already loaded buffer

2020-12-08 Thread Bram Moolenaar


Albert Wik wrote:

> > > Right.  The only way I've found is to use a temporary file.
> > > Incidentally, the zsh shell makes that easy:
> > > % gvim -f =(man llseek)
> >
> > Assuming that loading the text as latin1 didn't mess it up (since it's
> > an 8 bit encoding it should be OK), then you can convert it to utf-8
> > with:
> > :set fencs=utf-8,latin1
> > :%!iconv -f latin1 -t utf-8
> >
> > Vim might recognize the utf-8 encoding; if not, set 'fenc':
> > :set fenc=utf8
> >
> > Hopefully that works.
> 
> Thanks a lot for the "%!"-idea!  That's what I needed.
> 
> This works:
> :set fencs=utf8
> :%!cat
> although "fenc" remains "latin1".

Yeah, for an existing buffer and filtering the first entry in 'fencs' is
used to read the filter output, but 'fenc' isn't set.  That's a bit
strange, but I'm not sure what would break if we change this.  It might
actually be good to fix this, since if you write that file it might get
messed up.
 
> It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
> fact corrupt the data!) because the data is already in UTF-8, and that
> is why it is not displayed properly in Vim (because Vim thinks it is
> in Latin-1); in particular, the short dash character is shown as
> "â<80><90>".  When it is displayed properly, a "‐" is shown; putting
> the cursor at it and doing "ga" reports that this is character number
> 0x2010.
> 
> Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> not going to change the "fenc" accordingly?

When reading a file (or filter output) the values in 'fencs' are tried
one by one.  Normally when something fails then the next one is tried,
but since reading filter output from a pipe doesn't allow for a retry,
it will always use the first one.
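
As a rough model of the behaviour described here (and only a model: this is not Vim's code, and both function names are invented), the difference between "try each entry in turn" and "only the first entry is ever used" can be sketched in Python as follows. What happens on a decoding failure in the no-retry case is an assumption of this sketch; it simply substitutes replacement characters.

```python
def decode_with_fallback(data, encodings):
    """Try each encoding in order; return (text, encoding that succeeded)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this entry failed, try the next one
    raise ValueError("no encoding in the list could decode the data")


def decode_first_only(data, encodings):
    """No-retry model: only the first entry is used, whatever the data is."""
    enc = encodings[0]
    # Assumption of this sketch: undecodable bytes become replacement characters.
    return data.decode(enc, errors="replace"), enc


utf8_data = "\u2010".encode("utf-8")  # the hyphen U+2010 as UTF-8 bytes
latin1_data = "é".encode("latin-1")   # a lone 0xE9 byte, not valid UTF-8

print(decode_with_fallback(utf8_data, ["utf-8", "latin-1"])[1])    # utf-8
print(decode_with_fallback(latin1_data, ["utf-8", "latin-1"])[1])  # latin-1
print(decode_first_only(utf8_data, ["latin-1"])[1])                # latin-1
```

With "latin-1" as the only (or first) entry, every byte sequence decodes without error, so no other encoding is ever tried, which matches the situation described in the next paragraph.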

The real problem is that 'fencs' was set to "latin1" at first, thus Vim
didn't even try to use another encoding.  Perhaps it also works if you
do that on the command line:
somecommand | vim - -c 'set fencs=utf8,latin1'

Didn't try it.  Should at least work if you set 'fencs' in your .vimrc.


-- 
If an elephant is left tied to a parking meter, the parking fee has to be paid
just as it would for a vehicle.
[real standing law in Florida, United States of America]

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///



Re: Changing encoding of an already loaded buffer

2020-12-08 Thread A. Wik
On Tue, 8 Dec 2020 at 12:55, Bram Moolenaar  wrote:
>
>
> Albert Wik wrote:
> >
> > Right.  The only way I've found is to use a temporary file.
> > Incidentally, the zsh shell makes that easy:
> > % gvim -f =(man llseek)
>
> Assuming that loading the text as latin1 didn't mess it up (since it's
> an 8 bit encoding it should be OK), then you can convert it to utf-8
> with:
> :set fencs=utf-8,latin1
> :%!iconv -f latin1 -t utf-8
>
> Vim might recognize the utf-8 encoding; if not, set 'fenc':
> :set fenc=utf8
>
> Hopefully that works.

Thanks a lot for the "%!"-idea!  That's what I needed.

This works:
:set fencs=utf8
:%!cat
although "fenc" remains "latin1".

It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
fact corrupt the data!) because the data is already in UTF-8, and that
is why it is not displayed properly in Vim (because Vim thinks it is
in Latin-1); in particular, the short dash character is shown as
"â<80><90>".  When it is displayed properly, a "‐" is shown; putting
the cursor at it and doing "ga" reports that this is character number
0x2010.
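
The bytes involved can be inspected with a short Python sketch. It only imitates the two situations described above (UTF-8 bytes read as Latin-1, and those bytes then converted from Latin-1 to UTF-8 as iconv would do); the variable names are arbitrary.

```python
dash = "\u2010"                 # HYPHEN, the character "ga" reports as 0x2010
raw = dash.encode("utf-8")      # the bytes actually present in the data

shown = raw.decode("latin-1")   # Latin-1 reading: "â" followed by 0x80 and 0x90
double = shown.encode("utf-8")  # effect of converting latin1 -> utf8 on UTF-8 data

print(raw.hex(" "))                  # e2 80 90
print([hex(ord(c)) for c in shown])  # ['0xe2', '0x80', '0x90']
print(double.hex(" "))               # c3 a2 c2 80 c2 90

# Reinterpreting the same bytes as UTF-8 (no conversion) gives the hyphen back.
print(shown.encode("latin-1").decode("utf-8") == dash)  # True
```

So the data needs to be reinterpreted, not converted: the bytes are already right, and only the encoding used to read them is wrong.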

Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
not going to change the "fenc" accordingly?

Cheers,
Albert.



Re: Changing encoding of an already loaded buffer

2020-12-08 Thread Bram Moolenaar


Albert Wik wrote:

> On Mon, 7 Dec 2020 at 20:49, Gabriele F  wrote:
> >
> > The actual "correct" way to "change" the encoding of a buffer is, I
> > believe, with the "++enc" option, added either to :e (e.g. `:e
> > ++enc=utf8`) or several similar commands such as indeed :vi (`:vi
> > ++enc=utf8`).
> 
> Thanks, I didn't know about that.  It's more convenient than changing
> the "fileencodings".
> 
> > However I couldn't find a way to make it work with a file-less buffer,
> > such as your pipe example:
> 
> Right.  The only way I've found is to use a temporary file.
> Incidentally, the zsh shell makes that easy:
> % gvim -f =(man llseek)

Assuming that loading the text as latin1 didn't mess it up (since it's
an 8 bit encoding it should be OK), then you can convert it to utf-8
with:
:set fencs=utf-8,latin1
:%!iconv -f latin1 -t utf-8

Vim might recognize the utf-8 encoding; if not, set 'fenc':
:set fenc=utf8

Hopefully that works.

-- 
You can be stopped by the police for biking over 65 miles per hour.
You are not allowed to walk across a street on your hands.
[real standing laws in Connecticut, United States of America]

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///



Re: Changing encoding of an already loaded buffer

2020-12-08 Thread A. Wik
On Mon, 7 Dec 2020 at 20:49, Gabriele F wrote:
>
> The actual "correct" way to "change" the encoding of a buffer is, I
> believe, with the "++enc" option, added either to :e (e.g. `:e
> ++enc=utf8`) or several similar commands such as indeed :vi (`:vi
> ++enc=utf8`).

Thanks, I didn't know about that.  It's more convenient than changing
the "fileencodings".

> However I couldn't find a way to make it work with a file-less buffer,
> such as your pipe example:

Right.  The only way I've found is to use a temporary file.
Incidentally, the zsh shell makes that easy:
% gvim -f =(man llseek)
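
The same temporary-file workaround can be sketched outside zsh. The following Python is only an illustration under assumptions: the helper name `spool_to_tempfile` and the `.txt` suffix are made up for the example. It copies a byte stream, untouched, to a named temporary file that an editor could then open (and reload with a different encoding) like any ordinary file:

```python
import io
import os
import shutil
import tempfile


def spool_to_tempfile(stream, suffix=".txt"):
    """Copy a binary stream (for example a pipe) to a temp file and return its path."""
    # delete=False: the file must outlive this function so an editor can open it.
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)  # copy raw bytes, no decoding involved
        return tmp.name


# Usage with an in-memory stream standing in for a command's output.
path = spool_to_tempfile(io.BytesIO("caf\u00e9\n".encode("utf-8")))
with open(path, "rb") as fh:
    assert fh.read() == b"caf\xc3\xa9\n"
os.remove(path)  # the caller is responsible for cleanup
```

With a real pipe one would pass `sys.stdin.buffer` instead of the `BytesIO` object.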

Regards,
Albert.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/CALPW7mQKZ1DPRYc%2B_bz%3D8mTFUWfnz2KhDthX7-oDBZE7eY_2BA%40mail.gmail.com.


Re: Changing encoding of an already loaded buffer

2020-12-08 Thread A. Wik
Hi all,

I tried a few things:

(1) gvim -f ++enc=utf8 -
result: "E492: Not an editor command: +enc=utf8"
(2) gvim -f +enc=utf8 -
result: see (1)
(3) gvim -f +"set fenc=utf8" -
result: no error message; sets fenc to "utf-8", but file is loaded as
if with latin1.
(4) gvim -f -c "set fenc=utf8" -
result: see (3)
(5) gvim -f --cmd "set fenc=utf8" -
result: no error message; fenc remains "latin1"

A different approach:
(6) (man llseek ; echo 'vim:fenc=utf8:') | gvim -f -
result: no error message; fenc gets set to "utf-8"; file is loaded as
if with latin1
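
Results (3), (4) and (6) are consistent with the encoding label being changed only after the bytes have already been interpreted. A toy Python model of that effect follows; it is a sketch under assumptions, not Vim's implementation, and the `Buffer` class and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical stand-in for a loaded buffer: decoded text plus an encoding label."""
    text: str
    fenc: str

    @classmethod
    def load(cls, raw: bytes, fenc: str) -> "Buffer":
        # Decoding happens once, at load time, with the encoding in effect then.
        return cls(text=raw.decode(fenc), fenc=fenc)

    def write(self) -> bytes:
        # The label only matters again when the text is encoded for writing.
        return self.text.encode(self.fenc)


raw = "caf\u00e9".encode("utf-8")      # UTF-8 bytes arriving on stdin
buf = Buffer.load(raw, "latin-1")      # loaded "as if with latin1"
buf.fenc = "utf-8"                     # relabelling after the fact
assert buf.text == "caf\u00c3\u00a9"   # the text is still mojibake
assert buf.write() != raw              # in this model, writing now double-encodes
```

In this model the only way to get the right text is to pick the encoding before `load` runs, or to go back to the raw bytes.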

See also below:

On Tue, 8 Dec 2020 at 01:45, Tony Mechelynck wrote:
>
> If you find out after loading the stdin that it was opened in the
> wrong encoding, then it's too late; but if you know the file's
> encoding in advance, there should be a way, especially if your
> 'encoding' (the charset used internally by Vim) is UTF-8 and if your
> Vim is compiled with +iconv.

Both conditions hold true.

> To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
> set fileencodings=ucs-bom,utf-8,latin1

I tried that months ago.  The result was that new files were assumed
to have fenc=utf-8, for reasons you mention below.  This is not
acceptable, so I use "fileencodings=ucs-bom,latin1,cp437" (yes, I know
the trailing ",cp437" is pointless).

> somewhere in your vimrc (the s at the end of fileencodings is
> important); but this isn't enough for files in cp437, especially if
> Vim gets them on stdin. For those, load them with (untested)
>   someprogram | view ++enc=cp437 -

I tested it; see top of message.

> The above will detect files in 7-bit us-ascii encoding as utf-8 rather
> than Latin1. This is not a bug, because the 128 characters which are
> valid in us-ascii are represented identically in all three encodings:
> us-ascii, Latin1 and UTF-8.

Right!

Cheers,
Albert.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/CALPW7mQiUGf4-PEUU%2Bi3efpj0VWG7nmueO-OedxKUcij6_MTVA%40mail.gmail.com.


Re: Changing encoding of an already loaded buffer

2020-12-07 Thread Tony Mechelynck
On Mon, Dec 7, 2020 at 5:40 PM A. Wik wrote:
>
> Hi all,
>
> I sometimes need to change the encoding used for a file.  I have the
> default set to latin1 except for files with a ucs-bom.  However, when
> I load a file encoded in UTF-8 or CP-437 the default is wrong.  What I
> do then is normally to ":set fencs=utf8" and ":vi" to reload the file.
>
> However, what can I do about a file that cannot be reloaded?  Eg:
>
> $ man llseek | gvim -f -
>
> To work around it, I have to do this:
>
> $ man llseek > llseek.man
> $ gvim llseek.man
>
> Is there another way?
>
> Regards,
> Albert.

If you find out after loading the stdin that it was opened in the
wrong encoding, then it's too late; but if you know the file's
encoding in advance, there should be a way, especially if your
'encoding' (the charset used internally by Vim) is UTF-8 and if your
Vim is compiled with +iconv.

To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
set fileencodings=ucs-bom,utf-8,latin1
somewhere in your vimrc (the s at the end of fileencodings is
important); but this isn't enough for files in cp437, especially if
Vim gets them on stdin. For those, load them with (untested)
  someprogram | view ++enc=cp437 -
(the minus sign at the end is important) which means that you have to
know the file's encoding before starting Vim if it is other than UTF-8
or Latin1. Using "view" instead of "vim" on the command-line avoids
problems with the 'modified' flag; for ++enc see ":help ++enc".

The above will detect files in 7-bit us-ascii encoding as utf-8 rather
than Latin1. This is not a bug, because the 128 characters which are
valid in us-ascii are represented identically in all three encodings:
us-ascii, Latin1 and UTF-8.
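
The fallback order behind `fileencodings=ucs-bom,utf-8,latin1` can be modelled roughly as follows. This is a simplified Python sketch under assumptions: the function name `guess_encoding` is made up, only the UTF-8 and UTF-16 byte order marks are checked, and Vim's real detection handles more cases. It tries a BOM first, then strict UTF-8, and falls back to Latin-1, which accepts any byte sequence:

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Mimic the order ucs-bom, utf-8, latin1 on a byte string (simplified)."""
    # 1. ucs-bom: a byte order mark identifies the encoding directly.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. utf-8: accepted only if every byte sequence is valid UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. latin1: never fails, so it has to come last.
        return "latin-1"


assert guess_encoding(b"plain ascii") == "utf-8"   # pure ASCII is valid UTF-8
assert guess_encoding("caf\u00e9".encode("latin-1")) == "latin-1"
assert guess_encoding("caf\u00e9".encode("utf-8")) == "utf-8"
# ASCII bytes decode to the same text under all three encodings.
assert b"abc".decode("ascii") == b"abc".decode("latin-1") == b"abc".decode("utf-8")
```

Because the Latin-1 step accepts every input, any encoding listed after it would never be reached in this model.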

Best regards,
Tony.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/CAJkCKXtd5YiRQv3wa7GAOwy%3Dq9P1zcGKv0rgQRpr1sw2qO2A0Q%40mail.gmail.com.


Re: Changing encoding of an already loaded buffer

2020-12-07 Thread Gabriele F
Ah yes, I had also tried passing "-" as the filename in my reload 
attempts, but no: it was interpreted as an actual file named "-"...


To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/c3639889-8787-da73-ed90-e7bdbea86fd4%40tiscali.it.


Re: Changing encoding of an already loaded buffer

2020-12-07 Thread Gabriele F
The actual "correct" way to "change" the encoding of a buffer is, I 
believe, with the "++enc" option, added either to :e (e.g. `:e 
++enc=utf8`) or several similar commands such as indeed :vi (`:vi 
++enc=utf8`).


However I couldn't find a way to make it work with a file-less buffer, 
such as your pipe example:


If I use `:e! ++enc=utf8` I'm given an «E32: No file name» error.

I thought of passing "%" or "#n" as the filename for :e (`:e ++enc=utf8 
%`), but it doesn't work, I'm given a «E499: Empty file name for '%' or 
'#', only works with ":p:h"» error (and indeed the `:h _%` stuff is 
described as standing for "file names", not for the actual buffers).


Then I tried adding a filename, with `:file whatever`, but once that's 
done :e! loads a new empty buffer named "whatever"...


So there doesn't seem to be a way to really reload (possibly with 
different encoding options) the current buffer, only to reload the file 
from which the current buffer was loaded, and so for file-less buffers 
no way at all.


However under Linux and other systems there may well be a way to access 
the buffer's file's descriptor (/dev/fd/0 ?), so it might work by 
passing that as the filename.


And there's probably some other way by copying the text around.

By the way, apparently this also means that you can't even set the 
encoding of a pipe that you haven't yet created, from the shell, since 
to the best of my knowledge the only way to set the encoding of a file 
from the shell, before opening it, is `vim +":e ++enc= 
"` (which actually means to open it from inside vim). But 
maybe you can with some more intricate command.



I'm far from being a Vim expert, however, so I might well be missing 
something (or a lot).



And encoding stuff is in general quite a mess in Vim, I'll grumble about 
it one time or another... :/



Cheers

To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/07941846-edc4-431c-3889-0c7020254157%40tiscali.it.