Re: utf8 file corruption after transmission over email

2009-05-12 Thread Rocco Rutte
Hi,

* Jussi Peltola wrote:
> On Fri, May 08, 2009 at 06:23:15PM -0700, zion wrote:

> > if LC_CTYPE is unset, file doesn't get corrupted.

In that case, what does ':set ?charset' in mutt report?

> I think mutt is reading your file, assuming it's KOI8-R as stated in
> your locale, and converting it to UTF-8 for sending.
> 
> It has to do that; plain text won't tell it what charset it's in and it
> has to guess.

Yes. If mutt is recent enough, $attach_charset can help (it specifies
the charsets to use for guessing for text media type attachments).
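
A rough sketch of what such guessing amounts to, written in Python rather than taken from mutt's source. The function name guess_charset and the candidate lists are illustrative assumptions; the only idea carried over from above is that a list of charsets is tried against the attachment:

```python
def guess_charset(data: bytes, candidates: list[str]) -> str:
    """Return the first candidate charset that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("no candidate charset fits the data")


# U+25A1 stored as UTF-8 (three bytes), standing in for the attachment.
sample = "\u25a1".encode("utf-8")

# Order matters: KOI8-R assigns a character to every byte value, so it never
# fails to decode and shadows UTF-8 when it is listed first.
print(guess_charset(sample, ["utf-8", "koi8-r"]))
print(guess_charset(sample, ["koi8-r", "utf-8"]))
```

The practical point is that an 8-bit charset such as KOI8-R accepts any input, so a stricter encoding like UTF-8 probably wants to come earlier in the list.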

Rocco


Re: utf8 file corruption after transmission over email

2009-05-09 Thread Jussi Peltola
On Fri, May 08, 2009 at 06:23:15PM -0700, zion wrote:
> On Fri, May 08, 2009 at 04:34:14PM -0700, zion wrote:
> > Well, I just captured smtp session of loopback interface (same box where
> > mutt is running). Here is the relevant part:
> >   03d0: 746f 3e38 353c 2f74 6f3e 0d0a 0909 093c  to>85</to>.....<
> >   03e0: 7265 6164 3e21 d091 e288 9ae2 9591 3c2f  read>!п.Б..Б..</
> >   03f0: 7265 6164 3e0d 0a09 0909 3c77 7269 7465  read>.....<write
> > As you can see, this character is already messed up before reaching
> > server. So, @gmail is not guilty here ;-).
> Turns out it's my locale. having this causes the problem:
> LC_CTYPE=ru_RU.KOI8-R
> if LC_CTYPE is unset, file doesn't get corrupted.

I think mutt is reading your file, assuming it's KOI8-R as stated in
your locale, and converting it to UTF-8 for sending.

It has to do that; plain text won't tell it what charset it's in and it
has to guess. If you want to send files over email byte-for-byte, renaming
them to .bin or something else that has the mime type of
application/octet-stream should work better.
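
The captured bytes can be used to check this theory. The sketch below is Python, not anything mutt runs; it only assumes that the eight bytes between "<read>!" and "</read>" in the dump are the damaged character:

```python
# The bytes captured between "<read>!" and "</read>" in the SMTP dump.
captured = bytes.fromhex("d091e2889ae29591")

# Undo the suspected conversion: take the captured UTF-8 text and re-encode it
# as KOI8-R. If the theory holds, this gives the bytes that were on disk.
original_bytes = captured.decode("utf-8").encode("koi8-r")
print(original_bytes.hex())                 # e296a1
print(repr(original_bytes.decode("utf-8"))) # one character, U+25A1

# Replay it forwards: read the UTF-8 file as KOI8-R, then send it as UTF-8.
resent = original_bytes.decode("koi8-r").encode("utf-8")
print(resent == captured)                   # True
```

The three recovered bytes form exactly one valid 3-byte UTF-8 character, which fits the description of the file and supports the KOI8-R explanation.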

-- 
Jussi Peltola


Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
On Fri, May 08, 2009 at 04:34:14PM -0700, zion wrote:
> Well, I just captured smtp session of loopback interface (same box where
> mutt is running). Here is the relevant part:
>   03d0: 746f 3e38 353c 2f74 6f3e 0d0a 0909 093c  to>85</to>.....<
>   03e0: 7265 6164 3e21 d091 e288 9ae2 9591 3c2f  read>!п.Б..Б..</
>   03f0: 7265 6164 3e0d 0a09 0909 3c77 7269 7465  read>.....<write
> As you can see, this character is already messed up before reaching
> server. So, @gmail is not guilty here ;-).
Turns out it's my locale. having this causes the problem:
LC_CTYPE=ru_RU.KOI8-R
if LC_CTYPE is unset, file doesn't get corrupted.
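
For reference, a small Python sketch of how a program typically arrives at a charset from these variables. The precedence (LC_ALL, then LC_CTYPE, then LANG) and the name layout language[_territory][.codeset][@modifier] are standard POSIX behaviour; the function names and the LANG value in the example are assumptions for illustration only:

```python
from typing import Mapping, Optional


def effective_ctype(env: Mapping[str, str]) -> Optional[str]:
    """Pick the locale that governs character handling: LC_ALL > LC_CTYPE > LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return None


def codeset_of(locale_name: str) -> Optional[str]:
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]


# Hypothetical environment: LANG is assumed, LC_CTYPE is the value from above.
with_ctype = {"LANG": "en_US.UTF-8", "LC_CTYPE": "ru_RU.KOI8-R"}
without_ctype = {"LANG": "en_US.UTF-8"}

print(codeset_of(effective_ctype(with_ctype)))     # KOI8-R
print(codeset_of(effective_ctype(without_ctype)))  # UTF-8
```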


Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
Well, I just captured smtp session of loopback interface (same box where
mutt is running). Here is the relevant part:
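  03d0: 746f 3e38 353c 2f74 6f3e 0d0a 0909 093c  to>85</to>.....<
  03e0: 7265 6164 3e21 d091 e288 9ae2 9591 3c2f  read>!п.Б..Б..</
  03f0: 7265 6164 3e0d 0a09 0909 3c77 7269 7465  read>.....<write
As you can see, this character is already messed up before reaching
server. So, @gmail is not guilty here ;-).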

Re: utf8 file corruption after transmission over email

2009-05-08 Thread zion
On Fri, May 08, 2009 at 06:04:42PM -0500, Kyle Wheeler wrote:
> On Friday, May  8 at 03:00 PM, quoth Aaron S.:
> > I have a mystery that I'm trying to solve to no avail.
> 
> Hopefully we can help!
> 
> > I got a little sample XML (utf-8) encoded file that I'm trying to 
> > send as attachment. When I attach it, mutt correctly identifies it: 
> > [text/plain, 8bit, utf-8, 0.3K], since there are non-ASCII 
> > characters, in this case there is only 1 such character.
> 
> Well, actually, that's an incorrect identification. It's NOT a 
> text/plain file, it's an xml file. According to RFC 3023, it should 
> either be sent as application/xml or as text/xml.
> 
> Now, that misidentification shouldn't cause the problem you're having, 
> but correcting it *probably* will fix the problem. I bet that if you 
> add the following to your ~/.mime.types file, the problem goes away:
> 
>  application/xml jff
> 
> > After I send it, this attached file becomes corrupt.
> 
> I tried sending your file to myself, both with and without that line 
> in my mime.types file, and the file didn't get corrupted either way.
> 
> My guess is that this is ACTUALLY your mail server's fault (did you 
> send it through an MSFT Exchange server maybe? They're really bad 
> about this). Here's what I think happened: you have configured mutt to 
> send things in 8-bit mode (i.e. $allow_8bit). Thus, when sending a 
> utf-8 file attachment with an unusual character in it, mutt sent it 
> completely unmodified, because that's supposed to be safe to do when 
> sending in 8-bit mode. But some servers (and I've had this happen more 
> often than not with Exchange servers) attempt to convert all messages 
> into 7-bit form. Unfortunately, they're often very bad at it. I've had 
> several messages corrupted by Exchange servers simply because they 
> couldn't handle curly-quotes correctly. It's happened often enough 
> that I finally just unset allow_8bit so that mutt would always take 
> care of encoding my messages in a 7-bit safe manner, because mutt is 
> so much better at it than they are.
> 
> Anyway, does that help?
Hello,
Well, it was sent from @gmail to another @gmail
account. I have no idea what they run there at google. I thought about
adding that to mime.types and it does work.
What bothers me is that now I have to pay much closer attention as to
when I'm attaching strange files.
I'm gonna have to think of a way to intercept whatever mutt is sending
out to make sure it's not mutt that messes up this 3byte UTF-8
character.

In any case, thanks for your pointers.


Re: utf8 file corruption after transmission over email

2009-05-08 Thread Kyle Wheeler
On Friday, May  8 at 03:00 PM, quoth Aaron S.:
> I have a mystery that I'm trying to solve to no avail.

Hopefully we can help!

> I got a little sample XML (utf-8) encoded file that I'm trying to 
> send as attachment. When I attach it, mutt correctly identifies it: 
> [text/plain, 8bit, utf-8, 0.3K], since there are non-ASCII 
> characters, in this case there is only 1 such character.

Well, actually, that's an incorrect identification. It's NOT a 
text/plain file, it's an xml file. According to RFC 3023, it should 
either be sent as application/xml or as text/xml.

Now, that misidentification shouldn't cause the problem you're having, 
but correcting it *probably* will fix the problem. I bet that if you 
add the following to your ~/.mime.types file, the problem goes away:

 application/xml jff
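
The effect of such a mapping can be tried out with Python's mimetypes module, which uses the same extension-to-type idea as mime.types. This is only an illustration of the lookup, not of mutt's own code, and the file name sample.jff is made up:

```python
import mimetypes

# A private table, so the experiment does not touch the global one.
db = mimetypes.MimeTypes()

# Before the mapping exists the extension is most likely unknown.
print(db.guess_type("sample.jff"))

# Equivalent of the mime.types line above: map the .jff extension to XML.
db.add_type("application/xml", ".jff")
print(db.guess_type("sample.jff"))
```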

> After I send it, this attached file becomes corrupt.

I tried sending your file to myself, both with and without that line 
in my mime.types file, and the file didn't get corrupted either way.

My guess is that this is ACTUALLY your mail server's fault (did you 
send it through an MSFT Exchange server maybe? They're really bad 
about this). Here's what I think happened: you have configured mutt to 
send things in 8-bit mode (i.e. $allow_8bit). Thus, when sending a 
utf-8 file attachment with an unusual character in it, mutt sent it 
completely unmodified, because that's supposed to be safe to do when 
sending in 8-bit mode. But some servers (and I've had this happen more 
often than not with Exchange servers) attempt to convert all messages 
into 7-bit form. Unfortunately, they're often very bad at it. I've had 
several messages corrupted by Exchange servers simply because they 
couldn't handle curly-quotes correctly. It's happened often enough 
that I finally just unset allow_8bit so that mutt would always take 
care of encoding my messages in a 7-bit safe manner, because mutt is 
so much better at it than they are.
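
To make "7-bit safe" concrete, here is a short Python sketch using the two standard MIME transfer encodings, quoted-printable and base64. It does not show which of the two mutt would choose for a given part; the sample content is an assumption modelled on the attachment:

```python
import base64
import quopri

# Sample content with one non-ASCII character (three bytes above 0x7F).
raw = "<read>\u25a1</read>".encode("utf-8")

qp = quopri.encodestring(raw)   # quoted-printable
b64 = base64.b64encode(raw)     # base64


def is_7bit(data: bytes) -> bool:
    """True when every byte fits in 7 bits, i.e. survives a 7-bit-only relay."""
    return all(byte < 0x80 for byte in data)


print(is_7bit(raw), is_7bit(qp), is_7bit(b64))

# Both encodings are reversible, so the receiver gets the exact original bytes.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

Once the sender has encoded the part this way, a relay has no 8-bit data left to convert, which removes the opportunity for the kind of damage described above.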

Anyway, does that help?

~Kyle
-- 
Anyway, have fun.
And don't bother reporting any bugs for the next few days. I won't 
care anyway.
-- Linus Torvalds, when kernel 2.4 came out


Re: utf8 console font

2008-05-07 Thread Chris Bannister
On Wed, May 07, 2008 at 07:55:43AM -0500, Kyle Wheeler wrote:
> On Wednesday, May  7 at 09:29 PM, quoth Chris Bannister:
> > Had one, an SE30, based on the 68030, great machine. Support - 
> > bloody horrible, unless you had a fat wallet. Backwards 
> > compatibility - bloody horrible, they make their money from the 
> > hardware. Frankly, they are just too expensive.
> 
> A basic Mac mini starts at $600, which seems pretty reasonable to me 
> (similar machines from Dell are $650, but come with a monitor, similar 
> machines from HP are $550 and don't come with a monitor). Then again, 
> you *can* get bargain-bin stuff for $350 from Dell.
> 
> To each their own, I suppose.

Oh, ok, I'll keep that in mind, thanks.

-- 
Chris.
==
"One, with God, is always a majority, but many a martyr has been burned
   at the stake while the votes were being counted."  -- Thomas B. Reed


Re: utf8 console font

2008-05-07 Thread Kyle Wheeler
On Wednesday, May  7 at 09:29 PM, quoth Chris Bannister:
> Had one, an SE30, based on the 68030, great machine. Support - 
> bloody horrible, unless you had a fat wallet. Backwards 
> compatibility - bloody horrible, they make their money from the 
> hardware. Frankly, they are just too expensive.

A basic Mac mini starts at $600, which seems pretty reasonable to me 
(similar machines from Dell are $650, but come with a monitor, similar 
machines from HP are $550 and don't come with a monitor). Then again, 
you *can* get bargain-bin stuff for $350 from Dell.

To each their own, I suppose.

~Kyle
-- 
Habits of thought persist through the centuries; and while a healthy 
brain may reject the doctrine it no longer believes, it will continue 
to feel the same sentiments formerly associated with that doctrine.
-- Charlotte Perkins Gilman


Re: utf8 console font

2008-05-07 Thread Chris Bannister
On Tue, May 06, 2008 at 12:03:50PM -0500, Kyle Wheeler wrote:
> On Tuesday, May  6 at 11:04 PM, quoth Chris Bannister:
> >Hi,
> >
> >Just wondering which console font people are using in an utf8 locale.
> 
> Monaco.

The font families were named after cities, yeah now I remember.

> Tips? Get a mac. ;)

Had one, an SE30, based on the 68030, great machine. Support - bloody
horrible, unless you had a fat wallet. Backwards compatibility - bloody
horrible, they make their money from the hardware. Frankly, they are
just too expensive.

-- 
Chris.
==
"One, with God, is always a majority, but many a martyr has been burned
   at the stake while the votes were being counted."  -- Thomas B. Reed


Re: utf8 console font

2008-05-06 Thread Vladimir Marek
> > > Just wondering which console font people are using in an utf8 locale.
> > 
> > Terminus
> > 
> > http://www.is-vn.bg/hamster/
> 
> Yeah, I have heard of it and am installing it now.
> 
> Installed "Uni3-TerminusBold16" and it seems to be displaying more
> foreign characters than "chavo", although it does remind me of the early
> personal computers in the '70s :-), ah nostalgia ...

I use fullscreen terminal (no borders around). I think it looks nice :)
I installed the same font to my solaris, linux and windows machines.

-- 
Vlad


Re: utf8 console font

2008-05-06 Thread Chris Bannister
On Tue, May 06, 2008 at 01:05:05PM +0200, Vladimir Marek wrote:
> > Just wondering which console font people are using in an utf8 locale.
> 
> Terminus
> 
> http://www.is-vn.bg/hamster/

Yeah, I have heard of it and am installing it now.

Installed "Uni3-TerminusBold16" and it seems to be displaying more
foreign characters than "chavo", although it does remind me of the early
personal computers in the '70s :-), ah nostalgia ...

-- 
Chris.
==
"One, with God, is always a majority, but many a martyr has been burned
   at the stake while the votes were being counted."  -- Thomas B. Reed


Re: utf8 console font

2008-05-06 Thread Kyle Wheeler
On Tuesday, May  6 at 11:04 PM, quoth Chris Bannister:
>Hi,
>
>Just wondering which console font people are using in an utf8 locale.

Monaco.

Tips? Get a mac. ;)

~Kyle
-- 
Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies.
 -- Jay


Re: utf8 console font

2008-05-06 Thread Ken Moffat
On Tue, May 06, 2008 at 11:04:38PM +1200, Chris Bannister wrote:
> Hi,
> 
> Just wondering which console font people are using in an utf8 locale.
 sigma-general-8x16 ;) [ It's from sigma-consolefonts which is my own
assemblage, derived from etl16, and includes a number of different
maps (the maps decide which characters are available in a particular
psfu font). ]
 http://homepage.ntlworld.com/zarniwhoop/consolefonts/sigma.html .

 End of blowing my own trumpet.  I must remember to update that font
with a few minor fixes.
> 
> Any tips?
> 
 Screen fonts fall into two types - the 'vga' fonts derived from
what you get from the video card (usually very bold), and the others
(often less bold, which might be a problem for some people).  You
then have the "256 glyphs and bold colours, or 512 glyphs without the
bold colours" decision.  There is also a size consideration (fewer
lines of text, or more lines of text).  FWIW, mine is up to 512 glyphs
and «less bold» with only an 8x16 "cell".

 The big question is, which languages do you want to support in the
console ?  Most fonts are limited to only a few languages.  If you
already have UTF-8 text in all the languages of interest to you, or
you can create some (remember punctuation, quotation marks, currency
symbols as well as the accented letters), then try using the
different fonts ('setfont') to see how well they cope.  For
non-ascii text, creating it is usually a lot easier in a graphical
environment.
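
 When preparing such test text it helps to know exactly which non-ASCII characters it contains, so you know what to look for on screen. A small Python sketch for that; the helper name and the sample string are made up, and it says nothing about what any particular font actually provides:

```python
import unicodedata


def non_ascii_inventory(text: str) -> list[tuple[str, str, str]]:
    """List each distinct non-ASCII character with its code point and name."""
    chars = sorted({ch for ch in text if ord(ch) > 0x7F})
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in chars
    ]


# Hypothetical sample: quotation marks, a currency symbol, an accented letter.
sample = "\u00abless bold\u00bb, 5 \u20ac, Trag\u00f6die"

for char, code_point, name in non_ascii_inventory(sample):
    print(code_point, char, name)
```

 Running it over real text from each language of interest gives a checklist to compare against what each font shows after 'setfont'.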

 I assume you already know that only alphabetic languages can be
displayed in the regular linux console.

 Of course, there are also visual choices for any font, such as
'what letter form for "g"?', and "how many rows of the character cell
are used for the text?" and even "what do you display when the glyph
isn't available?".  Many console fonts use different glyphs for
cyrillic and latin letters (e.g. Aa|Аа B|В and perhaps C|С) which to
me looks strange.  There is definitely no "one size fits all".

ĸen
-- 
das eine Mal als Tragödie, das andere Mal als Farce


Re: utf8 console font

2008-05-06 Thread Vladimir Marek
> Just wondering which console font people are using in an utf8 locale.

Terminus

http://www.is-vn.bg/hamster/

-- 
Vlad


Re: utf8 encoding problem, but which part?

2007-10-29 Thread Kyle Wheeler
On Tuesday, October 16 at 09:11 AM, quoth Raphael Brunner:
> my problem isn't clear for me,  what it is exactly.

Nor is it for anyone you're describing it to.

> I became a mail from russia with koi8-u encoding (in mailheader).

You mean you *received* such an email?

> My system is debian stable. In muttrc, I defined 'set 
> charset=utf-8',

It's typically (unless you know what you're doing) better to let mutt 
detect the charset, so take that out of your muttrc.

> the terminal is mlterm (I tried also xterm with the same result)

I don't know anything about mlterm, but xterm can't display the full 
utf-8 character set unless you do two things: first, you must have it 
using a font that has all the right characters in it, and second, you 
must run xterm with the correct options (e.g. use the wrapper script 
`uxterm`).

> and locale output (see below) shows good. Now, the mailcontent in 
> mutt is right,

Good.

> but if I reply to this (mails open with vim), then there are strange 
> signs there (but only the not-ascii). Regardless if I set ':set 
> encoding=utf-8' in vim or not, vim don't show this characters 
> correct.

Firstly, setting the encoding in vim to be utf-8 only changes vim's 
internal representation of characters, it does NOT change vim's 
understanding of the file it's editing. Here's a clip from vim's 
documentation:

    NOTE: Changing this option will not change the encoding of the
    existing text in Vim.  It may cause non-ASCII text to become invalid.
    It should normally be kept at its default value, or set when Vim
    starts up.  See |multibyte|.

More helpfully:

    The character encoding of files can be different from 'encoding'.
    This is specified with 'fileencoding'.  The conversion is done with
    iconv() or as specified with 'charconvert'.

    Normally 'encoding' will be equal to your current locale.  This will
    be the default if Vim recognizes your environment settings.  If
    'encoding' is not set to the current locale, 'termencoding' must be
    set to convert typed and displayed text.  See |encoding-table|.

So you want to leave 'encoding' alone, and change 'fileencoding'.

What you need to do is figure out what character set mutt is saving 
its temporary files as. I believe (but I haven't checked) that mutt 
doesn't do any conversion, and saves the temporary files in the same 
character set as the email.
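
If that guess is right, the symptom is easy to reproduce outside the editor. The Python sketch below assumes a temporary file holding KOI8-U bytes; the sample word is made up and nothing here is taken from mutt:

```python
text = "Привіт"                   # hypothetical message body
on_disk = text.encode("koi8_u")   # what an unconverted temp file would hold

# An editor that assumes UTF-8 cannot make sense of these bytes.
try:
    on_disk.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False
print(readable_as_utf8)

# Telling the decoder the real charset recovers the text, and from there it
# can be re-encoded as UTF-8 for a UTF-8 terminal.
as_utf8 = on_disk.decode("koi8_u").encode("utf-8")
print(as_utf8.decode("utf-8"))
```

This mirrors the 'fileencoding' point: the bytes are fine, the reader just has to be told which charset they are in.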

> I tried also emacs, it display exact the same like vim.

Probably because it assumed (just like vim) that the text was encoded 
in either us-ascii or utf-8.

Hope that helps,
~Kyle
-- 
When ideas fail, words come in very handy.
  -- Johann Wolfgang von Goethe


Re: utf8

2007-09-19 Thread Kyle Wheeler
On Wednesday, September 19 at 03:57 PM, quoth Matthias Apitz:
>What would be the best way to make visible mails arriving in my
>mail-folder as:
>
>  --=_NextPart_001_2090AD_01C7FA92.54AF4C00
>  Content-Type: text/plain;
>          charset="utf-8"
>  Content-Disposition: inline
>  Content-Transfer-Encoding: base64

Hmm, the best way would be to get mutt to decode those. :) I don't 
know how, though - probably something you need to pester the devel 
list for.

>$ /usr/local/bin/decode-base64 attachment > attachment.ascii
>
>but I can't get 'decode-base64' compiled on FreeBSD 6.2R

OpenSSL can also decode base64 stuff:

openssl base64 -d < attachment > attachment.ascii
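
If Python is available, its standard email package can do the same and also honour the declared charset. A sketch under the assumption that the part looks like the headers quoted above; the body text is made up for the example:

```python
import base64
from email import message_from_string

# Build a stand-in for one MIME part with the headers quoted above.
body = "Gr\u00fc\u00dfe aus M\u00fcnchen\n"
encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")

raw_part = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Disposition: inline\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded + "\n"
)

part = message_from_string(raw_part)

# decode=True undoes the Content-Transfer-Encoding and returns raw bytes.
payload = part.get_payload(decode=True)

# The charset parameter says how to turn those bytes into text.
charset = part.get_content_charset() or "us-ascii"
print(payload.decode(charset), end="")
```

Unlike a bare base64 decoder, this reads the charset from the headers, so the result comes out as proper text and not just a byte dump.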

~Kyle
-- 
If you are going through hell, keep going.
   -- Winston Churchill