Re: Handling mislabeled emails encoded with Windows-1252

2018-07-24 Thread Sebastian Poeplau
Hi Jeff,

> GMime actually comes with a stream filter (GMimeFilterWindows) which can 
> auto-detect this situation.
>
> In this particular case, you'd instantiate the GMimeFilterWindows like this:
>
> filter = g_mime_filter_windows_new ("iso-8859-1");
>
> "iso-8859-1" being the charset that the content claims to be in.
>
> Then you'd pipe the raw (decoded but not converted to utf-8) content through 
> the filter and afterward call g_mime_filter_windows_real_charset (filter) 
> which would return, in this user's case, "windows-1252".

Nice, this is exactly what I was looking for! Somehow I missed it when
checking GMime. I'll adapt my local fix and post the results here.

Thanks,
Sebastian
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: Handling mislabeled emails encoded with Windows-1252

2018-07-24 Thread Jeffrey Stedfast
Hi all (sent this to David already using Reply instead of Reply-All, d'oh!),

GMime actually comes with a stream filter (GMimeFilterWindows) which can 
auto-detect this situation.

In this particular case, you'd instantiate the GMimeFilterWindows like this:

filter = g_mime_filter_windows_new ("iso-8859-1");

"iso-8859-1" being the charset that the content claims to be in.

Then you'd pipe the raw (decoded but not converted to utf-8) content through the 
filter and afterward call g_mime_filter_windows_real_charset (filter) which 
would return, in this user's case, "windows-1252".
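
As a rough illustration of the idea only (this is not GMime code, and the class below is a hypothetical stand-in rather than the GMimeFilterWindows API), the behaviour described above can be sketched in Python: feed the raw content through in chunks, remember whether any byte in the range 0x80 to 0x9F was seen, and report "windows-1252" if the content claimed to be "iso-8859-1".

```python
# Hypothetical helper, NOT part of GMime. It only mirrors the behaviour
# described in this thread for content labelled iso-8859-1.
C1_RANGE = range(0x80, 0xA0)  # bytes 0x80..0x9F inclusive


class WindowsCharsetDetector:
    def __init__(self, claimed_charset: str) -> None:
        self.claimed_charset = claimed_charset.lower()
        self.saw_c1_byte = False

    def feed(self, chunk: bytes) -> None:
        """Inspect one chunk of raw (transfer-decoded) content."""
        if not self.saw_c1_byte:
            self.saw_c1_byte = any(b in C1_RANGE for b in chunk)

    def real_charset(self) -> str:
        """Return the charset the content most likely uses."""
        if self.claimed_charset == "iso-8859-1" and self.saw_c1_byte:
            return "windows-1252"
        return self.claimed_charset


detector = WindowsCharsetDetector("iso-8859-1")
for chunk in (b"He said \x93hello\x94", b" and left.\n"):
    detector.feed(chunk)
print(detector.real_charset())  # windows-1252
```

The sketch handles only the iso-8859-1 case discussed here; any other claimed charset is returned unchanged.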

Hope that helps,

Jeff

On 7/23/18, 9:49 PM, "notmuch on behalf of David Bremner" wrote:

Sebastian Poeplau  writes:

> Hi,
>
> This email is to suggest a minor change in how notmuch handles text
> encoding when displaying emails. The motivation is the following: I keep
> receiving emails that are encoded with Windows-1252 but claim to be
> ISO 8859-1. The two character sets only differ in the range between 0x80
> and 0x9F where Windows-1252 contains special characters (e.g. “quotation
> marks”) while ISO 8859-1 only has non-printable ones. The mislabeling
> thus causes some special characters in such emails to be displayed with
> a replacement symbol for non-printable characters.
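
To make the difference concrete, here is a small Python sketch (the sample bytes are made up for illustration) that decodes the same two-quote byte sequence under both labels. With the ISO 8859-1 label the bytes 0x93 and 0x94 become C1 control characters; with Windows-1252 they become curly quotation marks.

```python
# Made-up sample: 0x93 and 0x94 are the curly double quotes in Windows-1252.
raw = b"\x93quotation marks\x94"

as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("windows-1252")

# Under ISO 8859-1 the two bytes map to the control characters U+0093/U+0094.
print(ascii(as_latin1))  # '\x93quotation marks\x94'

# Under Windows-1252 they map to U+201C and U+201D.
print(ascii(as_cp1252))  # '\u201cquotation marks\u201d'
```

One caveat: Python's "windows-1252" codec leaves a few bytes in this range (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so decoding arbitrary input may need errors="replace".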

Hi Sebastian;

Everyone's mail situation is unique, but I haven't noticed this
problem. Do you have a mechanical (e.g. scripted) way of detecting such
mails? I suppose it could just look for characters in the range 0x80 to
0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
own mail would help me think about this problem, I think.

David




Re: Handling mislabeled emails encoded with Windows-1252

2018-07-24 Thread Sebastian Poeplau
Hi again,

>> Everyone's mail situation is unique, but I haven't noticed this
>> problem. Do you have a mechanical (e.g. scripted) way of detecting such
>> mails? I suppose it could just look for characters in the range 0x80 to
>> 0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
>> own mail would help me think about this problem, I think.
>
> Yes, I guess that should be a good enough heuristic for detecting
> affected mail. I'll try to come up with a simple script and post it
> here.

Attached is a Python script that checks individual message files and
prints their name if it finds them to contain mislabeled Windows-1252
text. The heuristic seems to work well on my mail - let me know if you
encounter any issues!
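
The attachment itself is not reproduced in this archive. For readers who want to run a similar census, the following is a minimal sketch of how such a check might look. It is not the attached script, the function names are hypothetical, and it assumes the heuristic discussed above: a text part labelled ISO 8859-1 whose decoded body contains a byte between 0x80 and 0x9F.

```python
import email
import sys
from email import policy


def message_is_mislabeled(msg) -> bool:
    """True if any text part claims iso-8859-1 but contains bytes 0x80..0x9F."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        if (part.get_content_charset() or "").lower() != "iso-8859-1":
            continue
        # decode=True undoes base64/quoted-printable but keeps the raw bytes.
        payload = part.get_payload(decode=True) or b""
        if any(0x80 <= b <= 0x9F for b in payload):
            return True
    return False


def file_is_mislabeled(path: str) -> bool:
    """Parse one message file and apply the heuristic to it."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    return message_is_mislabeled(msg)


if __name__ == "__main__":
    # Print the name of every message file given on the command line
    # that looks like mislabeled Windows-1252.
    for path in sys.argv[1:]:
        if file_is_mislabeled(path):
            print(path)
```

It takes message file paths as arguments, so it can be combined with something like the output of "notmuch search --output=files" to survey a whole mail store.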

Cheers,
Sebastian




find_mislabeled_cp1252.py
Description: Binary data


Removing notmuch-wash-excerpt-citations from hook breaks fontlocking

2018-07-24 Thread Sean Whitton
Hello,

Steps to reproduce:

1. emacs -q

2. M-x load-library notmuch RET

3. M-x customize-variable notmuch-show-insert-text/plain-hook RET

4. Untick notmuch-wash-excerpt-citations

5. Open notmuch-show on a message

Expected result:

Quoted text is fontlocked, i.e. a different colour to body text.

Actual result:

Quoted text is not fontlocked.

Thanks.

-- 
Sean Whitton




Re: how to search for Morse code?

2018-07-24 Thread Ben Oliver

On 18-07-23 15:16:07, Ben Oliver wrote:

On 18-07-23 14:20:41, Gregor Zattler wrote:

Hello,

today I searched for emails containing

-... --- .-. . -.. ..--.. ...-.-



Heh

I suppose the problem is that Xapian won't take two periods ".." even
in quotes.

I asked on their IRC about how to escape it, but it's quiet.

So it seems like Morse code would not be indexed, which makes sense.

Sorry!

