Re: Handling mislabeled emails encoded with Windows-1252
Hi Jeff,

> GMime actually comes with a stream filter (GMimeFilterWindows) which can
> auto-detect this situation.
>
> In this particular case, you'd instantiate the GMimeFilterWindows like this:
>
> filter = g_mime_filter_windows_new ("iso-8859-1");
>
> "iso-8859-1" being the charset that the content claims to be in.
>
> Then you'd pipe the raw (decoded but not converted to utf-8) content through
> the filter and afterward call g_mime_filter_windows_real_charset (filter)
> which would return, in this user's case, "windows-1252".

Nice, this is exactly what I was looking for! Somehow I missed it when checking GMime. I'll adapt my local fix and post the results here.

Thanks,
Sebastian
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
Re: Handling mislabeled emails encoded with Windows-1252
Hi all (sent this to David already using Reply instead of Reply-All, d'oh!),

GMime actually comes with a stream filter (GMimeFilterWindows) which can auto-detect this situation.

In this particular case, you'd instantiate the GMimeFilterWindows like this:

filter = g_mime_filter_windows_new ("iso-8859-1");

"iso-8859-1" being the charset that the content claims to be in.

Then you'd pipe the raw (decoded but not converted to utf-8) content through the filter and afterward call g_mime_filter_windows_real_charset (filter) which would return, in this user's case, "windows-1252".

Hope that helps,

Jeff

On 7/23/18, 9:49 PM, "notmuch on behalf of David Bremner" wrote:

> Sebastian Poeplau writes:
>
> > Hi,
> >
> > This email is to suggest a minor change in how notmuch handles text
> > encoding when displaying emails. The motivation is the following: I keep
> > receiving emails that are encoded with Windows-1252 but claim to be
> > ISO 8859-1. The two character sets only differ in the range between 0x80
> > and 0x9F where Windows-1252 contains special characters (e.g. “quotation
> > marks”) while ISO 8859-1 only has non-printable ones. The mislabeling
> > thus causes some special characters in such emails to be displayed with
> > a replacement symbol for non-printable characters.
>
> Hi Sebastian;
>
> Everyone's mail situation is unique, but I haven't noticed this
> problem. Do you have a mechanical (e.g. scripted) way of detecting such
> mails? I suppose it could just look for characters in the range 0x80 to
> 0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
> own mail would help me think about this problem, I think.
>
> David
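Inside notmuch, the GMimeFilterWindows route described above is the natural fit. To make the idea behind it concrete without depending on GMime, here is a minimal plain-C sketch of the same kind of check, assuming the only case of interest is content labeled ISO-8859-1: if any byte falls in the range 0x80 to 0x9F, the text is reported as Windows-1252. The names guess_real_charset and same_charset_name are hypothetical; this is not GMime's implementation or API, and aliases such as "latin1" are deliberately not recognized.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two charset names (ASCII only).
 * Returns 1 if the names match, 0 otherwise. */
int same_charset_name(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical helper, not part of GMime: given the charset a part claims
 * and its raw (transfer-decoded, not yet converted) bytes, return the
 * charset it most likely uses. Only the ISO-8859-1 case is handled. */
const char *guess_real_charset(const char *claimed,
                               const unsigned char *buf, size_t len)
{
    if (!same_charset_name(claimed, "iso-8859-1"))
        return claimed;

    for (size_t i = 0; i < len; i++) {
        /* 0x80-0x9F are C1 control codes in ISO-8859-1 but mostly
         * printable characters (curly quotes, dashes, the euro sign)
         * in Windows-1252. */
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            return "windows-1252";
    }
    return claimed;
}
```

For example, a body containing curly quotes encoded as the bytes 0x93 and 0x94 makes guess_real_charset ("iso-8859-1", body, len) return "windows-1252", while a plain Latin-1 body such as "caf\xe9" keeps its claimed charset.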
Re: Handling mislabeled emails encoded with Windows-1252
Hi again,

>> Everyone's mail situation is unique, but I haven't noticed this
>> problem. Do you have a mechanical (e.g. scripted) way of detecting such
>> mails? I suppose it could just look for characters in the range 0x80 to
>> 0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
>> own mail would help me think about this problem, I think.
>
> Yes, I guess that should be a good enough heuristic for detecting
> affected mail. I'll try to come up with a simple script and post it
> here.

Attached is a Python script that checks individual message files and prints their name if it finds them to contain mislabeled Windows-1252 text. The heuristic seems to work well on my mail; let me know if you encounter any issues!

Cheers,
Sebastian

Attachment: find_mislabeled_cp1252.py
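Once a message has been flagged, whether by a script like the one attached above or by GMimeFilterWindows, displaying it correctly means decoding it as Windows-1252 instead of ISO-8859-1. Normally a charset converter such as iconv would do this; the sketch below spells out by hand where the two encodings differ. cp1252_to_utf8 is a hypothetical helper, not a notmuch or GMime function, and it assumes the caller supplies an output buffer of at least 3 * len + 1 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode code points for Windows-1252 bytes 0x80-0x9F; 0 marks the five
 * bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) that Windows-1252 leaves undefined.
 * Every other byte value maps to the same code point as in ISO-8859-1. */
static const uint16_t cp1252_c1[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Hypothetical helper: decodes len bytes of Windows-1252 into UTF-8.
 * out must have room for 3 * len + 1 bytes; returns the UTF-8 length
 * (excluding the terminating NUL). Undefined bytes become U+FFFD. */
size_t cp1252_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0x80 && cp <= 0x9F)
            cp = cp1252_c1[cp - 0x80] ? cp1252_c1[cp - 0x80] : 0xFFFD;

        /* All code points here are below U+10000, so at most 3 bytes. */
        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Bytes outside 0x80 to 0x9F decode exactly as they would under ISO-8859-1, which is why only that range needs a lookup table; for instance, 0x93 becomes U+201C (“) instead of the C1 control U+0093.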
Removing notmuch-wash-excerpt-citations from hook breaks fontlocking
Hello,

Steps to reproduce:

1. emacs -q
2. M-x load-library notmuch RET
3. M-x customize-variable notmuch-show-insert-text/plain-hook RET
4. Untick notmuch-wash-excerpt-citations
5. Open notmuch-show on a message

Expected result: Quoted text is fontlocked, i.e. a different colour to body text.

Actual result: Quoted text is not fontlocked.

Thanks.

-- 
Sean Whitton
Re: how to search for Morse code?
On 18-07-23 15:16:07, Ben Oliver wrote:
> On 18-07-23 14:20:41, Gregor Zattler wrote:
> > Hello,
> >
> > today I searched for emails containing -... --- .-. . -.. ..--.. ...-.-
>
> Heh, I suppose the problem is that Xapian won't take two periods ("..") even in quotes. I asked on their IRC channel how to escape it, but it's quiet.

So it seems like Morse code would not be indexed, which makes sense. Sorry!
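As a rough illustration of why the search comes up empty, the sketch below assumes, as a simplification, that the indexer only builds terms from runs of letters and digits; under that assumption a string made only of dots, dashes and spaces yields no terms at all. count_indexable_words is hypothetical, and Xapian's real TermGenerator is Unicode-aware with additional rules, so treat this as a model rather than a description of Xapian.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of a tokenizer: counts maximal runs of ASCII
 * letters and digits, i.e. the "words" a simplified indexer would turn
 * into search terms. Punctuation and whitespace only separate words. */
size_t count_indexable_words(const char *text)
{
    size_t words = 0;
    int in_word = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (isalnum((unsigned char)*p)) {
            if (!in_word)
                words++; /* start of a new word */
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
    return words;
}
```

Under this model, count_indexable_words ("-... --- .-. . -.. ..--.. ...-.-") returns 0, while "today I searched" gives 3.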