from:"Steven Winikoff"

Re: Unsupported nroff macros on MacOS X

2023-04-03 Steven Winikoff

>I am ... concerned about depending on pandoc, because of this:
>
>  Pandoc is available in lxplus, aiadm and most RPM repositories. It's
>  written in Haskell, which means it relies on hundreds of megabytes of
>  library dependencies.

That's certainly fair, but wouldn't it need to be used only once, after
which the documentation could be maintained in markdown format?  I suppose
that would require a tool to go from markdown to man, but at least it's a
thought.


>I have no objection to Markdown but I'm not sure what it would gain us
>exactly, other than maybe someone younger than 35 could edit the
>documentation.

That may be the point -- or not, I suppose, depending on one's point of
view.  (I'm far past the point of being under 35 myself, for what that's
worth.)

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "The cure for boredom is curiosity.
s...@smwonline.ca |  There is no cure for curiosity."
http://smwonline.ca  |
 |  - Dorothy Parker

Re: Unsupported nroff macros on MacOS X

2023-04-03 Steven Winikoff

>In a more practical sense, I am not sure there is anyone with the free
>cycles to convert the current man pages into some other markup language.

This seems like the sort of thing that should be possible to automate, and
that question has been raised before.  A quick search turned up the
following, among others:

   https://stackoverflow.com/questions/13433903/convert-all-linux-man-pages-to-text-html-or-markdown
   https://jeromebelleman.gitlab.io/posts/publishing/manpages/

 - Steven
-- 
___
Steven Winikoff  | "Science is built upon facts, as a house is
Montreal, QC, Canada |  built of stones; but an accumulation of
s...@smwonline.ca |  facts is no more a science than a heap of
http://smwonline.ca  |  stones is a house."
 |   - Henri Poincaré

Re: new release

2022-04-20 Steven Winikoff

On Manjaro (21.2.6, "Qonos"):

   ===
   All 118 tests passed
   ===

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "It's amazing how much 'mature wisdom'
s...@smwonline.ca |  resembles being too tired."
http://smwonline.ca  |
 |  - Robert Heinlein

Re: mhfixmsg character set conversion

2022-02-26 Steven Winikoff

>This should fix it for you, Steven:

It does.

Thanks again!  I really appreciate your help with this.

 - Steven
-- 
___
Steven Winikoff  | Sometimes you will never know the value
Montreal, QC, Canada | of a moment until it becomes a memory.
s...@smwonline.ca |
http://smwonline.ca  | - Dr. Seuss

Re: mhfixmsg character set conversion

2022-02-22 Steven Winikoff

>> So having searched and found it, don't send it on.  :-)
>
>Very good advice.  Another good reason to retain the unmodified message.

...which I always do.

Sending it on isn't the point anyway (for me, I mean), and is something
I rarely do -- and when I do it, I usually quote selectively rather than
forward an entire message.  For the few exceptions, an unmodified copy of
the original message is available as a backup.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "What I want is all of the power and none
s...@smwonline.ca |  of the responsibility."
http://smwonline.ca  |
 |  - fortune(6)

Re: mhfixmsg character set conversion

2022-02-21 Steven Winikoff

>To be fair ... that's completely permitted according to the spec!

That doesn't make it a good idea. :-)


>The decoder used by the format engine can deal with that (but as
>David mentioned, it's only designed to convert stuff to the native
>character set).

The problem here isn't the character set conversion, but decoding from
quoted-printable (in this case, or whatever encoding is used in general).

The goal is, and has always been, to save the message in a format that can
be searched easily at a later date.

Anything that interferes with that unnecessarily is perverse, at least in
my opinion. :-/
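For what it's worth, undoing quoted-printable needs nothing exotic. As a rough Python sketch (not what mhfixmsg does internally; the function name and sample body are made up for illustration), the standard quopri module reverses the encoding, soft line breaks included, after which a plain substring search finds the text:

```python
import quopri

def qp_to_text(encoded: bytes, charset: str = "utf-8") -> str:
    """Undo quoted-printable encoding, then decode using the part's charset."""
    # quopri.decodestring() turns "=C3=A9" into b"\xc3\xa9" and removes
    # soft line breaks (an "=" at the very end of a line).
    raw = quopri.decodestring(encoded)
    return raw.decode(charset, errors="replace")

# Hypothetical body: "répertoire" is split by a soft line break and its
# accented letter is hex-escaped, so grep for "répertoire" would miss it.
body = b"Fwd: r=C3=A9per=\ntoire des ensembles musicaux\n"
print(qp_to_text(body))
```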

 - Steven
-- 
___
Steven Winikoff  | "The man who has ceased to learn ought
Montreal, QC, Canada |  not to be allowed to wander around
s...@smwonline.ca |  loose in these dangerous days."
http://smwonline.ca  |
 |  - M. M. Coady

Re: mhfixmsg character set conversion

2022-02-21 Steven Winikoff

>>Subject: Re: [KCBExec]
>> =?utf-8?q?Fwd=3A_r=C3=A9pertoire_des_ensembles_musicau?=
>> =?utf-8?q?x?=
>
>Well, mhfixmsg doesn't expect a mix of unencoded and encoded text.

And it really shouldn't have to.

But some senders are really, really perverse. :-(
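For comparison, Python's standard email package copes with this mix. The sketch below is only an illustration, not a proposed mhfixmsg change, and the decode_rfc2047 name is made up. It decodes the folded Subject quoted above: decode_header() splits the field body into unencoded and encoded chunks, and make_header() joins them again, which also merges "musicau" and "x" from the two adjacent encoded-words into one word:

```python
from email.header import decode_header, make_header

def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words, leaving unencoded text as it is."""
    # decode_header() returns (bytes-or-str, charset) chunks; make_header()
    # reassembles them and str() yields ordinary Unicode text.
    return str(make_header(decode_header(value)))

# The Subject field body from this thread, with its original folding.
subject = ("Re: [KCBExec]\n"
           " =?utf-8?q?Fwd=3A_r=C3=A9pertoire_des_ensembles_musicau?=\n"
           " =?utf-8?q?x?=")
print(decode_rfc2047(subject))
```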


>I'll look into it.

Thank you!

 - Steven
-- 
_______
Steven Winikoff  | Sometimes you will never know the value
Montreal, QC, Canada | of a moment until it becomes a memory.
s...@smwonline.ca |
http://smwonline.ca  | - Dr. Seuss

Re: mhfixmsg character set conversion

2022-02-21 Steven Winikoff

>Thank you for reporting the issue you observed and working to improve
>mhfixmsg!

I'm very happy to do what I can.

...but I'm less happy to have to report that I just ran into a new
problem.  In particular, I just received a message with this header:

   Subject: Re: [KCBExec]
    =?utf-8?q?Fwd=3A_r=C3=A9pertoire_des_ensembles_musicau?=
    =?utf-8?q?x?=

I ran that through

   mhfixmsg -decodetext 8bit -decodetypes text -textcharset utf-8 \
            -reformat -fixcte -fixboundary -noreplacetextplain \
            -fixtype application/octet-stream -noverbose \
            -decodeheaderfieldbodies utf-8 \
            -file "${source}" -outfile "${tf}.fixed"

...but the Subject header came through unchanged.  What am I missing?

 - Steven
-- 
_______
Steven Winikoff  | "Science is built upon facts, as a house is
Montreal, QC, Canada |  built of stones; but an accumulation of
s...@smwonline.ca |  facts is no more a science than a heap of
http://smwonline.ca  |  stones is a house."
 |   - Henri Poincaré

Re: mhfixmsg character set conversion

2022-02-20 Steven Winikoff

>Commit a73f7f08a07e09200f320a734233ab0293e8f428.  Steven, this
>should decode your ASCII-encoded header field bodies.

I just tested it, and I can confirm that it does.


>Of course it didn't end up being that simple.

Nothing ever does. :-/


>A possible future enhancement would be to convert to any specified
>charset.  And maybe repurpose the argument of the mhfixmsg
>-decodeheaderfieldbodies switch to specify the destination charset.

Those sound like good ideas in general, but not ones I'd personally expect
to use.

...but I'm very happy with how mhfixmsg behaves right now. :-)

Thank you!

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "If at first you don't succeed, transform
s...@smwonline.ca |  your data set."
http://smwonline.ca  |
 |  - fortune(6)

Re: mhfixmsg character set conversion

2022-02-14 Steven Winikoff

>>[ regarding decoding of encoded ASCII in headers]
>
>Ok, I'll add support for it to mhfixmsg -decodeheaderfieldbodies utf8.

Thank you!


>When I look at the message in the lists.nongnu.org archive [1], the
>line isn't too long.  But it's not folded, either.  The continuation
>is on a separate line with no leading whitespace.

Something got lost in translation.

In the original message (as saved by procmail before being munged in any
way), it was one long line, with exactly one space between the end of the
first encoded portion and the beginning of the second one.

The relevant excerpt (with parts elided to keep the whole short enough here
for purposes of illustration) is

   Subject: =?US-ASCII?Q?Using_[...]_to_mak?= =?US-ASCII?Q?e_text_[...]?=

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "Earth is 98% full.  Please delete anyone
s...@smwonline.ca |  you can."
http://smwonline.ca  |
 |   - fortune(6)

Re: mhfixmsg character set conversion

2022-02-13 Steven Winikoff

>That's because -decodeheaderfieldbodies utf8 only decodes UTF-8 text.

That makes sense.  I'd forgotten that utf-8 is a mandatory argument for
-decodeheaderfieldbodies.


>There was a reason for only allowing decoding of UTF-8 header field
>bodies.  If any character set could be decoded, it would be possible
>to produce header field bodies with embedded nulls,

I didn't know that.


>we could decode ASCII because 1) we've seen it in the wild, 2) it seems as
>harmless as it is pointless to encode ASCII as ASCII, assuming no NULs,
>and 3) it's a proper subset of UTF-8 so it doesn't interfere with the
>semantics of the "-decodeheaderfieldbodies utf8" switch.

That also makes sense.
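Those rules are easy to express. Here is a rough Python sketch of such a policy (hypothetical names; it is not nmh's code, which is written in C): decode only when every encoded-word is labelled UTF-8 or US-ASCII and none of them decodes to a NUL, and otherwise leave the field body exactly as received. As RFC 2047 requires, the whitespace between two adjacent encoded-words is dropped when they are joined, which is what turns "...to_mak?= =?US-ASCII?Q?e_text..." back into "to make text":

```python
from email.header import decode_header, make_header

# Hypothetical policy: decode only UTF-8 and its proper subset US-ASCII.
ALLOWED_CHARSETS = {"utf-8", "us-ascii"}

def decode_if_safe(value: str) -> str:
    """Decode RFC 2047 encoded-words, or return value unchanged if unsafe."""
    chunks = decode_header(value)  # [(bytes or str, charset or None), ...]
    for payload, charset in chunks:
        if charset is None:
            continue  # unencoded text is passed through as-is
        if charset.lower() not in ALLOWED_CHARSETS:
            return value  # another charset: leave the field body encoded
        if b"\x00" in payload:
            return value  # decoding would embed a NUL in the header
    return str(make_header(chunks))

subject = ("=?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?= "
           "=?US-ASCII?Q?e_text_more_readable_=7C_Network_World?=")
print(decode_if_safe(subject))
```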


>Any other suggestions?

No, but then I've never noticed any encoded headers that weren't utf-8 or
ASCII.  And I agree that encoding ASCII seems pointless, but that doesn't
stop my Android mail app from doing it anyway. :-/


>So I'm curious, why is the ASCII encoded as ASCII?  Why not just fold
>the header as usual?

I have absolutely no idea.  This isn't a configurable choice as far as I'm
aware; it's just something that the app does.  If you're curious, it's
called "K-9 Mail":

   https://play.google.com/store/apps/details?id=com.fsck.k9
   https://k9mail.app/
   https://github.com/k9mail/k-9


>This line is too long, I'm not sure if that is related or if it's a
>separate issue:

It's probably related.  I can't prove that, but in general, shorter subject
lines appear to be passed through without encoding.

Regardless, this kind of thing is exactly what I'm trying to eliminate in
my saved messages.  I just realized that my decode_headers program doesn't
detect the second encoded string in the same header, but I'm about to go
fix that. :-)

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "Any teacher who _can_ be replaced by a
s...@smwonline.ca |  machine, _should_ be."
http://smwonline.ca  |
 |- Arthur C. Clarke

Re: mhfixmsg character set conversion

2022-02-13 Steven Winikoff

>> [ re -decodeheaderfieldbodies ]
>
>Ok, couple of issues, both due to very limited support of
>encoded formats by -decodeheaderfieldbodies.  I'll work on
>them.

Thank you.


>Note that the only encoded headers in your message are
>us-ascii, that seems pointless.

In the case of that particular message, the encoded headers are ones that
I'd almost never want to search for anyway.

But today I sent myself a message using an IMAP-based app on my phone,
resulting in the appended, and I'd definitely want to decode the Subject:
header.

Unfortunately, running it through mhfixmsg results in the message coming
back unchanged.  Is that specifically about -decodeheaderfieldbodies, or
is mhfixmsg doing nothing because the message body is already unencoded
text/plain?

 - Steven


8<-   cut here   >8
>From s...@smwonline.ca Sun Feb 13 15:03:01 2022
Return-Path: 
Received: from server03.4goodhosting.com (198.178.116.238:993) by
  mort.smwonline.ca with IMAP4-SSL; 13 Feb 2022 20:03:01 -
Delivered-To: s...@smwonline.ca
Received: from server03.4goodhosting.com
by server03.4goodhosting.com with LMTP
id qHlDCdljCWKQfgAA2eRUeQ
(envelope-from )
for ; Sun, 13 Feb 2022 15:02:33 -0500
Envelope-to: s...@smwonline.ca
Delivery-date: Sun, 13 Feb 2022 15:02:33 -0500
Received: from mort.smwonline.ca ([206.248.137.116]:59412 helo=[127.0.0.1])
by server03.4goodhosting.com with esmtpsa  (TLS1.2) tls 
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
(Exim 4.94.2)
(envelope-from )
id 1nJL4r-Oz-0y
for s...@smwonline.ca; Sun, 13 Feb 2022 15:02:33 -0500
Date: Sun, 13 Feb 2022 15:02:32 -0500
From: Steven Winikoff 
To: s...@smwonline.ca
Subject: =?US-ASCII?Q?Using_the_Linux_fold_command_to_mak?= 
=?US-ASCII?Q?e_text_more_readable_=7C_Network_World?=
User-Agent: K-9 Mail for Android
Message-ID: <43c6f911-0ded-4953-897b-3a5cffaf9...@smwonline.ca>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=utf-8
Content-Transfer-Encoding: 7bit
X-getmail-retrieved-from-mailbox: INBOX

https://www.networkworld.com/article/3646748/using-the-linux-fold-command-to-make-text-more-readable.amp.html
8<-   cut here   >8
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "Believe that life is worth living, and
s...@smwonline.ca |  your belief will help create the fact."
http://smwonline.ca  |
 |- William James

Re: mhfixmsg character set conversion

2022-02-12 Steven Winikoff

>>1) What's the best replacement for elinks?
>
>mhn.defaults.sh looks for text/html helpers in this order:
>1. w3m
>2. lynx
>3. elinks
>
>I don't know if one is necessarily "better" than another.

I've tried all of w3m, elinks, links and lynx at different times, and I'd
settled on elinks as the one that reproduced the HTML most accurately
(subject to the obvious limitations of a text terminal, but you know what I
mean).

I no longer remember what the differences were, but they probably had to do
with HTML  constructions.

Anyhow, accuracy/faithfulness to the original is what I meant by "best".

I've settled on w3m for now, subject to further testing.


>If you have suggestions on how to improve the arguments that mhn.defaults.sh
>uses for elinks, please let us know.

If I can make elinks do what I need, I certainly will; however, at least
for now it looks like I'm unable to accomplish that, which is why I've
switched to w3m.  For the record, my w3m invocation looks like this:

   w3m -I ${cset} -T text/html -dump -s -o display_link_number=1 \
   -o color=1 -graph ${html} | sed 's/^   //;s/[   ]*$//'

...where ${cset} is the character set assigned in .mh_profile, and ${html}
contains the HTML code to be rendered.


>>2) Should I replace my 1.7.1 installation by the version I just built?
>>   Basically I'm asking what benefits the current snapshot has over
>>   1.7.1,
>
>See docs/pending-release-notes.

Thanks, I will.


>>   and how far away the next numbered release might be.
>
>Unknown.  Ken appears to be busy.  One of us here could push it out.  It's
>been almost 4 years so I think that would be a good idea.  Perhaps after
>things here settle down a bit.

Please let me clarify that I wasn't trying to rush anything or put pressure
on anyone; I was just asking for an estimate, because that would help me
decide whether or not to wait for it.


>>3) How can I guarantee that messages will be saved with quoted-printable
>>   or base64 parts decoded, without patching mhfixmsg to deal with
>>   messages in which the decoded text would be more than 998 characters
>>   long?
>
>I don't know your reason for patching mhfixmsg.

At the time I didn't understand how or why to use -decodetext binary, so
the patch was the only way I could find to guarantee that text/html parts
would be decoded, no matter how badly formatted the HTML is (and by then
I'd already discovered just how bad that can be :-/).


>IIRC, you were using -decodetext 8bit; binary instead of 8bit might help.

Yes, I understand that now, though I still have the question you answered
below about the practical difference between binary and 8bit.


>>  - Why wasn't the text/html part converted to utf-8?
>
>mhfixmsg only converts the character set of text/plain.  That was a
>design decision.  Other subtypes can be extracted with mhstore and run
>through iconv.  If there's a use for converting them in place in
>mhfixmsg, it wouldn't be difficult but I'm not sure how useful it
>would be.

It would be useful for me, because some messages don't have a text/plain
part, and my main motivation for storing the decoded text is the ability
to search it with grep and mairix.

...but I can modify my shell script to run mhstore and iconv as you
suggest, so for me having a modified mhfixmsg would be nice but not
actually necessary.
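For the searching use case, a rough Python equivalent of "mhstore, then iconv" might look like the sketch below. It is only an assumption about one way to do it (the function name and sample message are hypothetical, and it extracts text rather than rewriting the message in place): each text/html part has its transfer encoding undone and is then decoded from whatever charset it declares, so the result can be written out as UTF-8 for grep or mairix.

```python
from __future__ import annotations

import email

def html_parts_as_text(raw: bytes) -> list[str]:
    """Return each text/html part of a message as a Unicode string."""
    msg = email.message_from_bytes(raw)
    texts = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts

# Minimal hypothetical message with an ISO 8859-1, quoted-printable HTML part.
sample = (b"MIME-Version: 1.0\n"
          b"Content-Type: multipart/alternative; boundary=XYZ\n\n"
          b"--XYZ\n"
          b"Content-Type: text/plain; charset=utf-8\n\n"
          b"plain version\n"
          b"--XYZ\n"
          b"Content-Type: text/html; charset=iso-8859-1\n"
          b"Content-Transfer-Encoding: quoted-printable\n\n"
          b"<p>Veuillez ne pas r=E9pondre</p>\n"
          b"--XYZ--\n")

for html in html_parts_as_text(sample):
    print(html.encode("utf-8"))  # UTF-8 bytes, ready to write to a file
```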


>>  - Regardless of the answer to the previous question, after a
>>message has been refiled (and assuming I'm not planning to
>>resend it to anyone), is there a practical difference between
>>binary and 8bit encoding?
>
>"Note that -decodetext binary can produce messages that are not compliant
>with RFC 5322, §2.1.1."

Understood (you made it clear when I first asked about the 998-character
limit that my patch has the same effect), but I don't care; I'm storing
messages in case I need to reread them later, and if I ever need to resend
something that wouldn't be compliant (and so far I can't remember that ever
happening), I'd be sending the converted plain text anyway.
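If it ever matters which saved messages would need re-encoding before being resent, finding the offending lines is straightforward. A minimal Python sketch (hypothetical helper) follows; note that it counts octets rather than characters, since that is what RFC 5322 section 2.1.1 limits:

```python
MAX_LINE_OCTETS = 998  # RFC 5322 section 2.1.1, not counting the CRLF

def overlong_lines(raw: bytes) -> list:
    """Return the 1-based numbers of lines longer than 998 octets."""
    bad = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        if len(line.rstrip(b"\r")) > MAX_LINE_OCTETS:
            bad.append(number)
    return bad

# 500 "é" characters are 1000 octets in UTF-8, so line 2 is over the limit
# even though it is only 500 characters long; line 3 is exactly 998.
message = b"Subject: test\n" + "é".encode("utf-8") * 500 + b"\n" + b"x" * 998 + b"\r\n"
print(overlong_lines(message))
```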


>Is it a proper MIME message (does mhfixmsg return with a non-zero exit
>status)?  If so, can you send it to me off-line?

It's the same message I already sent to you, that I've been using as a test
case all through this discussion.  I just checked, and mhfixmsg returns a
zero exit status for it.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "It is never too late to be what you might
s...@smwonline.ca |  have been."
http://smwonline.ca  | - George Eliot

Re: mhfixmsg character set conversion

2022-02-12 Steven Winikoff

>Thanks again, that has
>
>$ g 'seems not' *patch
>+L"Warning: Locale seems not configured\n");
>$
>
>Note the ‘L’; it's wide.
>
>$ tr -d \\000 > LC_ALL=C egrep -oa '[ -~]*seems not configured'
>Warning: Locale seems not configured
>$
>$ sed -n l paraur | egrep -A7 'W[\\0]+a[\\0]+r[\\0]+n'
>\000\000\000\000\000\000\000W\000\000\000a\000\000\000r\000\000\000n\
>\000\000\000i\000\000\000n\000\000\000g\000\000\000:\000\000\000 \000\
>\000\000L\000\000\000o\000\000\000c\000\000\000a\000\000\000l\000\000\
>\000e\000\000\000 \000\000\000s\000\000\000e\000\000\000e\000\000\000\
>m\000\000\000s\000\000\000 \000\000\000n\000\000\000o\000\000\000t\
>\000\000\000 \000\000\000c\000\000\000o\000\000\000n\000\000\000f\000\
>\000\000i\000\000\000g\000\000\000u\000\000\000r\000\000\000e\000\000\
>\000d\000\000\000$
>$

I think that clears up the last mystery remaining from the questions I
raised.  Thank you for your help throughout!

 - Steven
-- 
___
Steven Winikoff  | "To do each day two things one dislikes is
Montreal, QC, Canada |  a precept I have followed scrupulously;
s...@smwonline.ca |  every day I have got up and I have gone
http://smwonline.ca  |  to bed."
 |  - W. Somerset Maugham

Re: mhfixmsg character set conversion

2022-02-12 Steven Winikoff

>The file has UTF-8 and later ISO 8859-1.

Another point that should have been obvious to me, and is in hindsight, is
that I can't expect vim to detect the character set properly for something
like this. :-/


>There's no BOM so ucs-bom fails.  The ISO 8859-1 bytes don't happen to
>be valid UTF-8.  ‘default’ means use your environment, which is probably
>UTF-8 again; fails.  Which means we arrive at ‘latin1’, AKA ISO 8859-1,
>which is happy.

Happy, and just as half-correct as utf-8 would have been.

Meanwhile, I did a web search based on what you wrote here, and discovered

   https://vim.fandom.com/wiki/Working_with_Unicode

...which confirms everything you wrote, but also

   https://stackoverflow.com/questions/25115752/vim-encodings-latin1-and-utf-8

...which suggests using this command in vim to force it to reload the file
in utf-8 encoding:

   :e ++enc=utf-8 path_to_file

Of course this can also be done directly from the command line as

   vim -c "e ++enc=utf-8" path_to_file

More interestingly, when vim reopens the file (or just opens it, in the
latter case) in utf-8, it emits this status line message:

   "/tmp/nmh_testing/bad" [ILLEGAL BYTE in line 289] 336 lines, 49366 bytes

...and of course the line in question contains accented characters encoded
in ISO 8859-1, so everything is consistent.
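The same check can be scripted. A minimal Python sketch (hypothetical helper) reports every line of a file that is not valid UTF-8, not just the first one, which is a quick way to locate the ISO 8859-1 lines in a message that mixes the two. Pure-ASCII lines are valid in both, so only lines with non-ASCII ISO 8859-1 bytes are reported:

```python
def non_utf8_lines(raw: bytes) -> list:
    """Return the 1-based numbers of lines that fail to decode as UTF-8."""
    bad = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad

# Line 1 is UTF-8; line 2 is the same word encoded in ISO 8859-1.
sample = "répondre\n".encode("utf-8") + "répondre\n".encode("iso-8859-1")
print(non_utf8_lines(sample))
```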


>> ...but in bash, although the line gets pasted, the newline at the end
>> of it somehow doesn't.
>
>Another difference is the pasted text is normally highlighted in some
>way, e.g. inverse video, until it's committed with Enter.

In my experience with tcsh, the inverse video highlighting stays in place
even after the paste is committed, and remains so until something else is
highlighted.

This appears to be the case for me in bash (invoked as sh just now), at
least with enable-bracketed-paste turned off.

 - Steven
-- 
___
Steven Winikoff  | "Science is built upon facts, as a house is
Montreal, QC, Canada |  built of stones; but an accumulation of
s...@smwonline.ca |  facts is no more a science than a heap of
http://smwonline.ca  |  stones is a house."
 |   - Henri Poincaré

Re: mhfixmsg character set conversion

2022-02-12 Steven Winikoff

 = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/gconv/gconv-modules", O_RDONLY|O_CLOEXEC) = 5
openat(AT_FDCWD, "/usr/lib/gconv/ISO8859-1.so", O_RDONLY|O_CLOEXEC) = 5
openat(AT_FDCWD, "/home/smw/Mail/mhfixmsguEmroo", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
openat(AT_FDCWD, "/home/smw/Mail/mhfixmsgkbIBCl", O_RDONLY) = 6
mhfixmsg: /home/smw/Mail/mhfixmsgsLWrjg part 2, convert UTF-8 to UTF-8
openat(AT_FDCWD, "/home/smw/Mail/mhfixmsguEmroo", O_RDONLY) = 5
openat(AT_FDCWD, "/home/smw/Mail/mhfixmsgnhCjdt", O_RDONLY) = 5
+++ exited with 0 +++
8<-   cut here   >8
-- 
___
Steven Winikoff  | "When things are not as they appear to be,
Montreal, QC, Canada |  it's because they're actually simpler
s...@smwonline.ca |  than you think them to be."
http://smwonline.ca  | - Robert Rankin, in The Hollow
 |   Chocolate Bunnies of the Apocalypse

Re: mhfixmsg character set conversion

2022-02-12 Steven Winikoff

>> > Can you try searching par again, this time with
>> > 
>> > file /usr/bin/par
>> > env LC_ALL=C egrep -boa 'seems not configured' /usr/bin/par
>>
>> Done, but this still produces no output.
>
>I'd have least expected some output from file(1).  ;-)

Details. :-)

   $ file /usr/bin/paraur
   /usr/bin/paraur: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e91dc39316ca8e55a7e0037200805128b4e8b5a6, for GNU/Linux 3.2.0, stripped

That's par from AUR, renamed after installing 1.53.0 from source.


>If you still have the par from AUR which you've been using all this
>time, could you make it available to me and I'll have a look.  An email
>off list, or a URL, or... I don't mind.

I'll send it to you privately, but we can do better than that.  I just
thought to check archive.org for the patch, and found it at

   https://web.archive.org/web/20211124173449if_/http://sysmic.org/dl/par/par-1.52-i18n.4.patch

...so you can examine the source code directly.

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | Life is uncertain.
s...@smwonline.ca | Eat dessert first!
http://smwonline.ca  |

Re: mhfixmsg character set conversion

2022-02-11 Steven Winikoff

>I would do this if you haven't already:
>1. download nmh HEAD, build, and install somewhere
>2. move your $(mhpath +)/mhn.defaults
>3. move your profile and create one with just a Path: entry
>4. run the "mhfixmsg -file original_copy -out -" from 1. and see if the
>   output looks good or bad

I just tried this, and a couple of other things, but only after installing
par 1.53.0 from source and using that to replace the AUR binary.  Here's
what I learned:

   1) Replacing par does indeed fix one of the three failed tests.  I can
  send you the details, but I seem to recall that you already have them
  from Valdis Klētnieks; please let me know if I should forward them
  anyway.

   2) After running make install, the newly built mhfixmsg produces correct
  output.  But so does nmh-1.7.1 mhfixmsg when compiled without my patch.

   3) Step (3) above was the key, and it turned out that I was being misled
  by this .mh_profile entry:

 mhshow-show-text/html:  html_to_text %F | cat -

  ...where html_to_text is a shell script that basically just runs this
  command:

 elinks -force-html -dump -dump-charset utf-8 ${html}

  Removing this profile entry causes the message to be displayed
  correctly -- both the original, unmodified version, and the one that
  was saved after being converted by my patched version of nmh-1.7.1
  mhfixmsg.  That's pretty conclusive evidence that I'd been looking
  in the wrong place all along. :-(

  The man page for elinks describes -dump-charset as follows:

 -dump-charset (alias for document.dump.codepage)
 Codepage used when formatting dump output.

  Interestingly, when I restored the mhshow-show-text/html .mh_profile
  entry and modified my shell script to run elinks without this option,
  I still saw the same doubly encoded output.

  So next I tried passing the character set to my script as follows:

 mhshow-show-text/html:  html_to_text %{charset} %F

  ...and changed the script to use the provided character set rather
  than forcing utf-8:

 elinks -force-html -dump -dump-charset $1 ${html}

  This failed differently.  Instead of marking undisplayable characters
  with '�', it marked them with '*'.  Somehow, I don't consider that to
  be much of an improvement. :-/

...so clearly I need to replace elinks in my html_to_text script, and doing
that will solve the problem that prompted this discussion, leaving the
following questions:

   1) What's the best replacement for elinks?

   2) Should I replace my 1.7.1 installation by the version I just built?
  Basically I'm asking what benefits the current snapshot has over
  1.7.1, and how far away the next numbered release might be.

   3) How can I guarantee that messages will be saved with quoted-printable
  or base64 parts decoded, without patching mhfixmsg to deal with
  messages in which the decoded text would be more than 998 characters
  long?

  I used the current mhfixmsg with the test message I've been using
  throughout this discussion, with this command line:

 /tmp/nmh/root/bin/mhfixmsg \
 -decodeheaderfieldbodies utf-8 -decodetext binary \
 -decodetypes text -textcharset UTF-8 -reformat \
 -fixcte -fixboundary -noreplacetextplain  \
 -fixtype application/octet-stream \
 -verbose -file $source -outfile $destination

  ...and that resulted in these headers after decoding:

 - for the text/plain part:

  Content-Transfer-Encoding: 8bit
  Content-Type: text/plain; charset="UTF-8"

 - for the text/html part:

  Content-Transfer-Encoding: binary
  Content-Type: text/html; charset=iso-8859-1

  That raises some further questions:

 - Why wasn't the text/html part converted to utf-8?

 - Regardless of the answer to the previous question, after a
   message has been refiled (and assuming I'm not planning to
   resend it to anyone), is there a practical difference between
   binary and 8bit encoding?

 - Why are the headers of the decoded message identical to those
   of the input, despite the use of -decodeheaderfieldbodies?

   (...and yes, the unmodified version of the message does contain
some encoded headers that my decode_headers program found and
decoded; mhfixmsg appears not to have done so).

   Thanks,

 - Steven
-- 
_______
Steven Winikoff  | "'Somebody, SOMEBODY
Montreal, QC, Canada | Has to, you see.'
s...@smwonline.ca | Then she picked out two Somebodies.
http://smwonline.ca  | Sally and me."
 |- Dr. Seuss

Re: mhfixmsg character set conversion

2022-02-11 Steven Winikoff

>  | in bash, although the line gets pasted, the newline at the end of it
>  | somehow doesn't.  When 
>
>This is a recent bash (IMO mis-)feature -

I absolutely agree with your opinion on this topic.


>I believe there's an option (run time) to return to sanity, as I think I
>set it ...  but just now I cannot find it!

A quick web search for "bash xterm paste newline" turns up

   https://unix.stackexchange.com/questions/633370/bash-needs-another-newline-to-execute-pasted-lines

The summary is to add the following entry to ~/.inputrc:

   set enable-bracketed-paste off

I just tried this, and it works.  Thank you for pointing that out!

 - Steven
-- 
___
Steven Winikoff  | "Science is built upon facts, as a house is
Montreal, QC, Canada |  built of stones; but an accumulation of
s...@smwonline.ca |  facts is no more a science than a heap of
http://smwonline.ca  |  stones is a house."
 |   - Henri Poincaré

Re: mhfixmsg character set conversion

2022-02-11 Steven Winikoff

>>- run ~smw/bin/decode_headers using $source as stdin (this explicitly
>>  decodes headers which are RFC 2047-encoded, and passes the body
>>  through unchanged)
>
>This sounds like the kind of thing which might insert bytes which alter
>vim's idea of the ‘fileencoding’.  Given
>
>To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= 
>
>as taken from RFC 2047, is it going to put in a byte 0xf8 for ISO 8859-1
>encoding, or 0xc3 0xb8 for UTF-8?

I didn't know, so I just tried it.  Here's what happens:

   # decode_headers < rfc2407_test_header > converted_rfc2407_header
   # cat converted_rfc2407_header
   To: Keld Jørn Simonsen 

   # hexdump -C converted_rfc2407_header
   00000000  54 6f 3a 20 4b 65 6c 64  20 4a c3 b8 72 6e 20 53  |To: Keld J..rn S|
   00000010  69 6d 6f 6e 73 65 6e 20  3c 6b 65 6c 64 40 64 6b  |imonsen <keld@dk|
   [...]
   00000028

...so it writes 0xc3 0xb8, which I believe is what it should be doing.
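For comparison, both possible answers can be shown directly. In this short Python sketch (illustrative only), the standard email.header module decodes the RFC 2047 example to its ISO 8859-1 bytes, where ø is the single byte 0xf8, and re-encoding the resulting text as UTF-8 gives the 0xc3 0xb8 seen in the hexdump above:

```python
from email.header import decode_header

word = "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="
[(payload, charset)] = decode_header(word)  # one encoded-word, one chunk

name = payload.decode(charset)   # text, independent of any encoding
as_utf8 = name.encode("utf-8")   # what a UTF-8 decoder should write

print(charset, payload)  # iso-8859-1 b'Keld J\xf8rn Simonsen'
print(as_utf8)           # b'Keld J\xc3\xb8rn Simonsen'
```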

 - Steven
-- 
___
Steven Winikoff  | "The most exciting phrase to hear in
Montreal, QC, Canada |  science, the one that heralds new
s...@smwonline.ca |  discoveries, is not 'Eureka!' (I found
http://smwonline.ca  |  it!), but 'That's funny...'"
 | - Isaac Asimov

Re: mhfixmsg character set conversion

2022-02-11 Steven Winikoff

>I assume vim(1) will read up to a certain amount until it either makes up
>its mind or assumes the default.

That makes sense.


>Try this to remove the boring ASCII bytes and see what's left.
>
>tr -d ' -~' https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
>describes ‘�’ and it's being seen above because cut(1) is cutting bytes
>and the ‘108:’ at the start of the line has shifted the 68/69 cut-off
>point to part-way through the UTF-8 for a single code point AKA rune.

For me, this falls into the category of "things that are perfectly obvious,
but only after they've been explained".  Thank you for explaining it.
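The effect is easy to reproduce. In this Python sketch (the sample line is made up, modelled on the French text in this thread), cutting the encoded bytes one byte into the two-byte UTF-8 sequence for "é" leaves an incomplete sequence, which a tolerant decoder shows as U+FFFD, the '�' replacement character:

```python
line = "108:Veuillez ne pas répondre".encode("utf-8")

# Cut after the first byte of "é" (UTF-8 0xc3 0xa9), as a byte-based cut(1)
# range can when a prefix like "108:" shifts the column positions.
cut_at = line.index("é".encode("utf-8")) + 1
fragment = line[:cut_at]

print(fragment)                                    # ends with the lone byte 0xc3
print(fragment.decode("utf-8", errors="replace"))  # ends with U+FFFD
```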


>Try
>
>sh
>LC_ALL=C; export LC_ALL
>locale
>perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet

Done, and I just learned something interesting.  First, the output looks
like this:

   sh-5.1$ LC_ALL=C; export LC_ALL
   sh-5.1$ locale
   LANG=en_CA.UTF-8
   LC_CTYPE="C"
   LC_NUMERIC="C"
   LC_TIME="C"
   LC_COLLATE="C"
   LC_MONETARY="C"
   LC_MESSAGES="C"
   LC_PAPER="C"
   LC_NAME="C"
   LC_ADDRESS="C"
   LC_TELEPHONE="C"
   LC_MEASUREMENT="C"
   LC_IDENTIFICATION="C"
   LC_ALL=C
   sh-5.1$ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
   Veuillez ne pas rpondre au prsent courriel. Il a 
t gnr

Second, the problem with the original command appearing to hang turns out
to be an interaction between bash and xterm's pasting mechanism(!).

I'm accustomed to pasting a command line by triple-clicking to select the
whole line, then middle-clicking to paste it.  That's how xterm has worked
since I first started using it  years ago.

...and it still works exactly this way, and the line gets pasted just as I
expect, in tcsh.

...but in bash, although the line gets pasted, the newline at the end of it
somehow doesn't.  When 

   LC_ALL=C perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet

originally seemed to hang, in fact it was just waiting for me to press the
Enter key!  I still don't know why this is happening, but at least I'm
comforted by the fact that my bash binary isn't totally broken. :-/
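For reference, the perl one-liner used above can be approximated in Python. This is only a sketch (the function name is hypothetical): it works on raw bytes and replaces each byte outside the printable ASCII range 0x20 to 0x7e with its hex value in angle brackets, so every byte of a multi-byte UTF-8 character is shown separately:

```python
import re

def mark_non_printable(data: bytes) -> bytes:
    """Replace every byte outside printable ASCII (0x20-0x7e) with <xx>."""
    return re.sub(rb"[^ -~]",
                  lambda m: f"<{m.group()[0]:02x}>".encode("ascii"),
                  data)

sample = "ne pas répondre".encode("utf-8")
print(mark_non_printable(sample).decode("ascii"))  # ne pas r<c3><a9>pondre
```

To filter a whole file line by line, as perl -lpe does, the function could be applied to each line read from sys.stdin.buffer after stripping its trailing newline.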


>Beware that invoking bash(1) as ‘sh’ is not the same as running ‘bash’.

I did know that, but thank you for mentioning it just in case.


>Might not make a difference in this case, but in general it's better to
>run whichever is desired.

Right, but in this case sh was what was desired.  As I understand it,
when invoked that way bash behaves more like a real Bourne shell than
when invoked as bash.


>> I propose to forget this particular clupea harengus of the crimson
>> variety unless you find it interesting in and of itself.
>
>It is odd.  And odd might affect other things, including to do with nmh.
>:-)

Odd indeed, but apparently only when used interactively with xterm, so nmh
is unlikely to be affected.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "The reward of a thing well
s...@smwonline.ca |  done is to have done it."
http://smwonline.ca  |
 |   - Emerson


tr_output.pdf
Description: tr_output.pdf

Re: mhfixmsg character set conversion

2022-02-11 Thread Steven Winikoff

>Three tests failed.  You can run "make install" in the nmh directory
>to install.

Thanks.  I'll try that later tonight.


>If things are still broken with that nmh, I would remove your par[1]
>executable and rebuild mhn.defaults.  That might at least allow
>make check to pass.

This seems like a worthwhile thing to test at the same time even if
everything else works, so I'll try that also.


>And while we seem to have eliminated par and lynx, the symptoms are
>consistent with one or both of them being used by mhfixmsg.

...except that my strace of mhfixmsg shows no external programs being run.
What am I missing?

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "He who has imagination without learning
s...@smwonline.ca |  has wings but no feet."
http://smwonline.ca  |
 |   - fortune(6)

Re: mhfixmsg character set conversion

2022-02-11 Thread Steven Winikoff

>That ‘i18n’ smells given the nature of the other patch I found earlier.

I now understand what you're referring to, and unsurprisingly you're right.

I didn't realize that par isn't a Manjaro package at all, but in fact
something I installed directly from the Arch User Repository.  It's
clear that you already found the AUR page for par, but for the record
I'll quote this very interesting comment from the package maintainer:

   @ifreund notified me back in march via the out-of-date mechanism that a
   new version (1.53) was available upstream¹ (yup, after 19 years since
   the 1.52 release) \o/.

   Unfortunately, the i18n patch that we are applying to the 1.52, and
   which confers par the ability to deal with UTF-8, does not apply cleanly
   to 1.53. The par author has introduced some fixes to the locale handling
   for (single byte) charsets other than US ASCII, but no support for
   multibyte encodings² yet.

   Until the i18n patch gets updated to apply to 1.53 (any uptakers?), I
   would say that we are better off as we are.

I would have guessed that that patch would be a good thing, but apparently
the author of par agrees with you that it isn't, given that the patch was
offered and not accepted.

I'll build 1.53 from source myself before continuing with my testing.


>Assuming Manjaro is just picking this up from Arch Linux,

That's how things work for packages which are included in the Manjaro
repositories, which are separate from those of Arch; however, the AUR is a
community effort which maintains packages not included in the repositories
for Arch (and thus, not in Manjaro or other Arch-derived distributions).

AUR packages are typically downloaded and built from source, although a few
also offer binary downloads -- but par isn't one of those, and since the
patch is no longer available online, the AUR package for it won't even
build anymore.  Clearly the patch was available at the time I installed par
as part of the OS installation on this machine, back in December 2019.


>I think this is the shell script which builds the package.
>https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=par

It is.


>Can you try searching par again, this time with
>
>file /usr/bin/par
>env LC_ALL=C egrep -boa 'seems not configured' /usr/bin/par

Done, but the egrep command still produces no output.

 - Steven
-- 
___
Steven Winikoff  | "I don't want to run the world; I merely
Montreal, QC, Canada |  want to own a substantial portion of the
s...@smwonline.ca |  preferred stock."
http://smwonline.ca  |
 |- Alan Dean Foster,  Cat-A-Lyst

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>> >I would look at output from mime_helper and see if it's UTF-8.
>>
>> Please forgive me for having to ask this, but how is mime_helper even
>> involved?  Isn't that used only when I read the message?  It isn't in
>> the procmail chain that saves the original copy, and it's the original
>> copy that we've been looking at.
>
>I don't know how mime_helper might fit in.  The lynx invocation is still
>my pick for the root cause but you said you're not clear on how it is
>involved.

I understand how it's involved for reading a message; the part I don't
understand is how it's involved in the sequence of steps that occurs when
a new message is received.

Specifically, to the best of my knowledge:

   1) sendmail hands the message off to procmail

   2) this procmail recipe is activated:

 :0 HBfw
 * ^Content-Type:.*text/
 | /home/smw/bin/email_decoder

I'll append a copy of email_decoder, but the gist of it is:

   - explicitly unset LC_ALL and set LANG to en_CA.UTF-8

   - save the incoming standard input in $source (a file in /tmp)

   - run ~smw/bin/decode_headers using $source as stdin (this explicitly
 decodes headers which are RFC 2047-encoded, and passes the body
 through unchanged)

   - feed stdout from decode_headers into the same mhfixmsg command
 I've already quoted a few times; I'll quote it again here to
 keep everything in one place:

mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
 -reformat -fixcte -fixboundary -noreplacetextplain \
 -fixtype application/octet-stream -noverbose -file -  \
 -outfile "${tf}.fixed"

 ...where ${tf}.fixed is another, newly created file in /tmp

   - use cmp to compare $source and ${tf}.fixed; if they differ, save
 $source as a new message in +reformatted

The file which started this discussion is the one from +reformatted, and
I still can't see how lynx would have been involved in its creation.


>I would do this if you haven't already:
>1. download nmh HEAD, build, and install somewhere

I got this far, but I've been unable to proceed since the build failed as
described previously.  (To be fair, I also haven't had time to try to get
farther as yet.)


>2. move your $(mhpath +)/mhn.defaults
>3. move your profile and create one with just a Path: entry
>4. run the "mhfixmsg -file original_copy -out -" from 1. and see if the
>   output looks good or bad
>
>If it's good, then start adding things back in one at a time in reverse
>order (starting with mhfixmsg switches) until it's bad.

This sounds like an excellent plan, and I intend to follow through with it
on Friday; unfortunately I'll be busy with other things until then.

...although I may need help getting past the build problem.

 - Steven


8<-   cut here   ---->8
#!/bin/sh
#
#  email_decoder -- rewrite quoted-printable and base64 text in a message
#
#  Steven Winikoff
#  2008/09/11
#  2010/01/22 -- use mhshow to decode
#  2014/05/19 -- always exit with status 0 (see note below)
#  2018/01/22 -- rewrite using mhfixmsg to do the heavy lifting
#  2019/10/17 -- ...and use ~smw/bin/decode_headers to decode RFC 2047
#                headers (for use with procmail, grep and mairix)
#
#  Given an email message on standard input with at least one portion
#  containing text encoded in base64 or quoted-printable format, the
#  object of the game is to send the same message back to stdout with
#  the text part(s) decoded.
#
#  A copy of the original message will also be saved in +reformatted
#  (AKA ~smw/Mail/reformatted/) unless the -t (test mode) option is
#  specified.
#
#  This is intended to be invoked in a procmail filter recipe.
#
#  Note that this is the reason why we always exit with status 0, even
#  when something goes wrong; this prevents procmail from cluttering its
#  log with messages similar to these:
#
#   procmail: Program failure (3) of "/home/smw/bin/email_decoder"
#   procmail: Rescue of unfiltered data succeeded
#
#  usage:  email_decoder [-t]
#
#--
#  setup:

PATH="/local/paths:/bin:/usr/bin:$PATH"
export PATH

unset LC_ALL; LANG="en_CA.UTF-8"; export LC_ALL LANG

tf="/tmp/decoder.`date +%Y%m%d.%H%M%S.$$`"
trap 'rm -rf ${tf}* >/dev/null 2>&1' 1 2 3 15

save_folder="+reformatted"

test_mode=0


#--
#  are we operating in test mode?

if [ ! -z "${1}" ]
then
   # officially test mode is indicated by the -t option, but in
   # practice we'll accept any argument at all to mean test mode

   test_mode=1
fi


#--
#  save a cop

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>Typically for me (at least) bad encoded files have been processed to find
>'thing' and converted to the Microsoft belief you meant to use the real
>pair of quote marks they prefer.

Thank you.  That helps.


>processed by super-smart software. the worst kind. "I was only trying to
>help" software.

Of course. :-/

You might enjoy this utility, if you haven't already seen it:

   https://www.fourmilab.ch/webtools/demoroniser/

 - Steven
-- 
_______
Steven Winikoff  | "There are millions of chords. There are
Montreal, QC, Canada |  millions of numbers. And everyone forgets
s...@smwonline.ca |  the one that is a zero. But without the
http://smwonline.ca  |  zero, numbers are just arithmetic. Without
 |  the empty chord, music is just noise."
 |  - Terry Pratchett (Soul Music)

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>> I think Steven says he's running Manjaro which is an Arch Linux spin off, and
>> Archers prefer to pass on upstream code unaltered where possible.
>
>Except that par has been altered?

Not by me, at any rate.


>I use this version, unaltered:
>$ par version
>par 1.53.0

   $ par version
   1.52-i18n.4

   $ pacman -Qi par
   Name            : par
   Version         : 1.52-8
   Description     : Paragraph reformatter
   Architecture    : x86_64
   URL             : http://www.nicemice.net/par/
   Licenses        : custom
   Groups          : None
   Provides        : None
   Depends On      : None
   Optional Deps   : None
   Required By     : None
   Optional For    : None
   Conflicts With  : None
   Replaces        : None
   Installed Size  : 98.90 KiB
   Packager        : Unknown Packager
   Build Date      : Mon 06 Jan 2020 12:53:58 AM
   Install Date    : Mon 06 Jan 2020 12:54:19 AM
   Install Reason  : Explicitly installed
   Install Script  : No
   Validated By    : None


>> > Do you have any idea where the following warning comes from?
>>
>> My money's on par(1) given
>>
>> 
>> https://inbox.vuxu.org/voidlinux-github/20191027084150.NZqC6wHlZkyQJ7AkACI7juvuCp0AD_u_IIwftMlDmKs@z/T/
>
>That sure looks like it.

Perhaps, but it isn't.


>> Steven, to confirm, try
>>
>> egrep -l 'seems not configured' /usr/bin/par

   $ egrep -l 'seems not configured' /usr/bin/par
   $ echo $?
   1


>Steven, I would try removing par from the end of your mhbuild-convert-text/html
>entry.

The problem with that is that it's not there in the first place:

   $ grep par ~/.mh_profile
   $ echo $?
   1

In fact,

   $ grep mhbuild ~/.mh_profile
   mhbuild:-maxunencoded 500

   $ grep html ~/.mh_profile
   #: mhshow-show-text/html:   %pmime_helper %F %s %{name}
   mhshow-show-text/html:   html_to_text %F | cat -s
   mhshow_in_browser-show-text/html:  %pmime_helper %F %s "%{name}"
   mhfixmsg-format-text/html:  html_to_text < '%F'

   $ grep -w par ~/bin/html_to_text
   $ echo $?
   1

I'll append the full text of the script in case you'd like to see it, but
I'm pretty sure it's not implicated here.

In fact there are no invocations of par anywhere in my ~/bin directory; the
only occurrences of the word are in some old data files:

   $ grep -lrisw par ~/bin
   /home/smw/bin/mars/reports/data/FMARS/jrn/text/20070729
   /home/smw/bin/mars/reports/data/FMARS/jrn/text/20070718
   /home/smw/bin/mars/reports/data/FMARS/jrn/text/20070719
   /home/smw/bin/mars/reports/data/FMARS/jrn/raw/20070719

...and these files have nothing to do with nmh in any way.

I'm reminded of an old Jackie Mason routine, in which he describes a visit
to a psychiatrist.  After a fair bit of dialog which I won't repeat here,
this snippet occurs:

   psychiatrist:  I see your problem.  You hate your sister.

   Jackie Mason:  I haven't got a sister.

   psychiatrist:  I can't help you if you won't cooperate.

...so I feel a need to apologize for being uncooperative :-/, but I'm at a
loss here.

 - Steven


8<-   cut here   >8
#!/bin/sh
#
#  html_to_text -- convert HTML to plain text
#
#  Steven Winikoff
#  2010/04/28
#
#  note:  this script uses links
# [ http://atrey.karlin.mff.cuni.cz/~clock/twibright/links ]
# because it seems to be the only program available which
# renders tables reasonably
# 
# alternatives (lynx and vilistextum) both show tables one
# column at a time instead of row by row!
#
#
# UPDATE, 2018/08/22:
#
# switched from links to elinks, because links fails when invoked
# via procmail if the source HTML code contains invalid characters
# (as in a file in Windows character encoding which isn't labelled
# as such) -- the symptom is that a properly structured message
# will be converted into one which has an empty HTML part, which
# is a problem if (and only if :-) the HTML part needs to be viewed
# in a graphical browser (see ~smw/bin/view_html_message, as called
# from ~smw/bin/mhread)
#
#--

if [ ! -z "${1}" ]
then
   html="${1}"
else
   # links (as of April 2010, at least) refuses to read standard 
   # input with -dump

   html="/tmp/html_to_text.`date +%Y%m%d.%H%M%S`.$$"
   trap "rm -f ${html} >/dev/null 2>&1; exit 1" 1 2 3 15
   cat > ${html}
fi

elinks -force-html -dump -dump-charset utf-8  ${html} | sed 's/^   //;s/[   ]*$//'
## | cat -s

#
#  w3m -I utf8 -T text/html -dump -s -o display_link_number=1 \
#  -o color=1 -graph ${html} | sed 's/^   //;s/[

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>What platform are you on (uname -a and relevant excerpt from /etc/*-release)?

   $ uname -a
   Linux mort 5.15.6-2-MANJARO #1 SMP PREEMPT Sat Dec 4 11:11:58 UTC 2021 x86_64 GNU/Linux

   $ ls -l /etc/*-release
   lrwxrwxrwx 1 root root  15 Dec 18 10:21 /etc/arch-release -> manjaro-release
   -rw-r--r-- 1 root root 106 Feb  5 02:23 /etc/lsb-release
   -rw-r--r-- 1 root root  14 Dec 18 10:21 /etc/manjaro-release
   lrwxrwxrwx 1 root root  21 Sep 13  2019 /etc/os-release -> ../usr/lib/os-release

   $ cat /etc/lsb-release
   DISTRIB_ID=ManjaroLinux
   DISTRIB_RELEASE=21.2.3
   DISTRIB_CODENAME=Qonos
   DISTRIB_DESCRIPTION="Manjaro Linux"

   $ cat /etc/manjaro-release
   Manjaro Linux

The last one isn't very interesting :-/, but I hope that's enough to give
you the idea. :-)


>What output do you see from these two mhparam commands?
>
>$ mhparam mimetypeproc
>file --brief --dereference --mime-type
>$ mhparam mimeencodingproc
>file --brief --dereference --mime-encoding

Exactly the same thing that you do.


>What are the jpg entries in your profile and mhn.defaults

None; I see no output from

   $ grep -i jpg ~/.mh_profile ~/Mail/mhn.defaults 

However, I do have this entry in .mh_profile:

   mhshow-show-image:   %pmime_helper %F %s "%{name}"

That's the only entry which has anything to do with any type of image
format.

The same thing is repeated, apparently redundantly, in ~/Mail/mhn.defaults:

   $ cat ~/Mail/mhn.defaults
   mhshow-show-application/pdf: %pmime_helper %F %s "%{name}"
   mhshow-show-application: %pmime_helper %F %s "%{name}"
   mhshow-show-audio:   %pmime_helper %F %s "%{name}"
   mhshow-show-video:   %pmime_helper %F %s "%{name}"
   mhshow-show-image:   %pmime_helper %F %s "%{name}"
   mhshow-show-text/richtext:   %pmime_helper %F %s "%{name}"


>Do you have any idea where the following warning comes from?  I don't
>find it using:
>find /bin/ /usr/ /etc/ $HOME -type f -print0 | xargs=0  egrep -l
>'seems not configured'

The same command also returns for me without finding anything.

I've read Ralph's followup suggesting it might be /usr/bin/par, but
apparently that's not the case.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada |  "Yoda was wrong when it comes to
s...@smwonline.ca |   programming.  Do or undo.  There
http://smwonline.ca  |   is always try."
 |  - Ron Jeffries

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>I'm referring to your first email on this topic.
>
>Message-ID: <4155787-1643946141.609...@w322.mcfw.6EvO>
>Date: Thu, 03 Feb 2022 22:42:21 -0500
>Subject: mhfixmsg character set conversion
>
>It also has fields which tie it into an earlier email on another topic.
>
>In-reply-to: <202202011803.211i3f1f2458...@darkstar.fourwinds.com>
>X-In-reply-to: Your message of Tue, 01 Feb 2022 13:03:15 -0500

Yes, I see.  I have no idea why I did that, and I'd completely forgotten
having done it.  I'll try to be more careful next time.

 - Steven
-- 
_______
Steven Winikoff  | It's is not, it isn't ain't, and it's it's,
Montreal, QC, Canada | not its, if you mean it is. If you don't,
s...@smwonline.ca | it's its. Then too, it's hers. It isn't
http://smwonline.ca  | her's. It isn't our's either. It's ours,
 | and likewise yours and theirs.
 |   - Oxford University Press

Re: mhfixmsg character set conversion

2022-02-09 Thread Steven Winikoff

>> Really.  I'm not making this up. :-/
>
>No, I don't think you are.  I think that line in both files is correctly
>UTF-8 encoded.

And now that you've explained what's going on, it's clear that you're
right.


>vim isn't the vi(1) I grew up with, and probably you too.

Definitely.  The first time I used vi was in 1984, on a 68000-based Cadmus
system.


>Try ‘:se fileencoding?’ when vim-ing good and again with bad.

Good point:

   $ vim good
   :set fileencoding
   fileencoding=utf-8

   $ vim bad
   :set fileencoding
   fileencoding=latin1


>I expect the bad file has something earlier on which fixes vim's idea of
>the encoding to ISO 8859-1

That does seem to be the case.  Do you have any idea what kind of thing
that might be?  (I know you can't diagnose a file you haven't seen, but in
general, what sorts of things should I look for?)
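A possible way to find it, sketched under an assumption: with vim's default 'fileencodings' of ucs-bom,utf-8,default,latin1, vim settles on latin1 as soon as any byte sequence anywhere in the file is not valid UTF-8. So the thing to look for is the first ill-formed sequence, typically a lone ISO 8859-1 byte such as 0xe9 for "é". The function first_invalid_utf8() below is a hypothetical helper (not part of nmh or vim) that returns the byte offset of the first such sequence, following the byte ranges in RFC 3629.

```c
#include <stddef.h>

/* Return the offset of the first byte in buf[0..len) that does not begin a
 * well-formed UTF-8 sequence, or len if the whole buffer is valid UTF-8.
 * Overlong forms, UTF-16 surrogates and code points above U+10FFFF are
 * rejected, following the byte ranges in RFC 3629. */
size_t first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;                            /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (c < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0) lo = 0xA0;        /* no overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;        /* no surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0) lo = 0x90;        /* no overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;        /* nothing above U+10FFFF */
        } else {
            return i;                        /* 0x80-0xC1, 0xF5-0xFF can't start */
        }

        if (len - i <= n)                    /* sequence cut off by end of buffer */
            return i;
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return i;
        for (size_t k = 2; k <= n; k++)
            if ((buf[i + k] & 0xC0) != 0x80) /* must be 10xxxxxx */
                return i;
        i += n + 1;
    }
    return len;
}
```

Given the whole file read into a buffer, the returned offset could then be examined with od -t x1c or hexdump -C.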


>> But wait.  It gets worse:
>>
>>$ grep -n ^Veuillez good | cut -c1-68
>>108:Veuillez ne pas répondre au présent courriel. Il a été gén�
>>
>>$ grep -n ^Veuillez bad | cut -c1-68
>>108:Veuillez ne pas répondre au présent courriel. Il a été gén�
>
>The worse being it is the very same line 108 you're seeing in vim which
>grep is also showing?

Exactly, because...


>(The ‘�’ at the end is to be expected.)

...this is still more evidence that you know more about character sets and
conversions than I do.  As if further evidence were needed at this point. :-/

Until now, I've only ever seen that glyph when a character doesn't exist in
the font being used -- but that can't be the case here because that same
character is shown correctly five times in the same line of output.

Why is it to be expected?
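A likely explanation, assuming GNU cut, whose -c option currently counts bytes exactly like -b: grep -n prepends the four bytes "108:", so cut -c1-68 keeps only the first 64 bytes of the line itself, and the 64th byte (offset 63) is 0xc3, the lead byte of the "é" in "généré". A lead byte with no continuation byte is not valid UTF-8, so the terminal substitutes U+FFFD. The hypothetical helper utf8_safe_prefix() below shows how a byte limit could instead be backed off to a character boundary; line108 is the good_snippet line with its newline omitted.

```c
#include <stddef.h>

/* The good_snippet line: 68 bytes of UTF-8, newline not included. */
static const char line108[] =
    "Veuillez ne pas r\xc3\xa9pondre au pr\xc3\xa9sent courriel. "
    "Il a \xc3\xa9t\xc3\xa9 g\xc3\xa9n\xc3\xa9r\xc3\xa9";

/* Return the length of the longest prefix of s[0..len) that is at most max
 * bytes long and does not stop in the middle of a UTF-8 sequence.  Assumes
 * s itself is well-formed UTF-8. */
size_t utf8_safe_prefix(const unsigned char *s, size_t len, size_t max)
{
    size_t n = max < len ? max : len;

    if (n == len)
        return n;                 /* nothing was cut off */
    /* s[n] is the first byte dropped; while it is a continuation byte
     * (10xxxxxx), the cut falls inside a character, so back up. */
    while (n > 0 && (s[n] & 0xC0) == 0x80)
        n--;
    return n;
}
```

With a 64-byte limit this returns 63, dropping the whole "é" rather than leaving half of it behind.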


>>$ LC_ALL=C perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
>> [...]
>
>I don't understand that.  The -p sets up a loop to read a line from
>good_snippet, do the substitution on it, and print the result, until
>EOF.  The -l strips off the linefeed on input and puts it back on the
>output.  The substitution in between changes all bytes, thanks to
>LC_ALL=C, which aren't space to tilde into a ‘<42>’ string representing
>their hex value.

Thank you for explaining that.
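For comparison, a rough C counterpart of that one-liner, assuming the trailing newline has already been removed as perl's -l does. hex_escape() is a hypothetical name; like the perl version under LC_ALL=C it works byte by byte, so each UTF-8 "é" comes out as <c3><a9>.

```c
#include <stdio.h>
#include <string.h>

/* Copy src to dst, replacing every byte outside the printable ASCII range
 * ' '..'~' with "<xx>" (two lowercase hex digits).  Like the perl
 * one-liner under LC_ALL=C, this works on bytes, not characters.  dst must
 * have room for 4 * strlen(src) + 1 bytes. */
void hex_escape(const char *src, char *dst)
{
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        if (*p >= ' ' && *p <= '~')
            *dst++ = (char)*p;
        else
            dst += sprintf(dst, "<%02x>", (unsigned)*p);
    }
    *dst = '\0';
}
```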

Just for fun, I tried the following in tcsh:

   $ setenv LC_ALL C
   $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
   Veuillez ne pas r<c3><a9>pondre au pr<c3><a9>sent courriel. Il a <c3><a9>t<c3><a9> g<c3><a9>n<c3><a9>r<c3><a9>

As expected, this returned pretty much instantly.  Then I tried this:

   $ sh
   $ LC_ALL=C
   $ echo $LC_ALL
   C
   $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet

...and that also hung.  Which in a way is good, because at least it means
bash is behaving consistently.  But also not good, because it's behaving
badly. :-/

On my system, /bin/sh is a symlink to /bin/bash, which is version 5.1.016-2
as packaged by Manjaro.

...but troubleshooting bash is far outside the scope of this discussion, so
I propose to forget this particular clupea harengus of the crimson variety
unless you find it interesting in and of itself.


>Nothing wrong with od(1).  If you have hexdump(1) installed then it with
>-C gives quite nice output.

Yes, I see (or -C? :-).  Thanks for that tip; I hadn't known that hexdump
existed.


>> ...and both snippets are identical!
>
>Well, those lines were identical to start with before snipping.
>You could confirm this with
>
>cmp <(sed -n 108p good) <(sed -n 108p bad)

As written, this also hangs in bash (and is invalid syntax in tcsh).

But it's effectively equivalent to

   $ sed -n 108p good > good.sed
   $ sed -n 108p bad  >  bad.sed 
   $ cmp good.sed bad.sed
   $ echo $?
   0

...which behaves as expected.


>> Strangely, both snippet files look fine in vim.
>
>Because you have chopped off the non-UTF-8 which occurs earlier in bad
>which fixes vim's idea of the file's encoding.

In retrospect this should have been obvious. :-/


>> ...but for the bad file, that becomes
>>
>>"bad" [converted] 336 lines, 49471 bytes 1,1   Top
>
>Ta-da!

Indeed. :-)

Thank you.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | Eschew obfuscation.
s...@smwonline.ca |
http://smwonline.ca  |

Re: mhfixmsg character set conversion

2022-02-08 Thread Steven Winikoff

>No apology is necessary.  This uncovered an issue with mhfixmsg that
>we fixed.

Thank you.


>> The key is the message about the line length being too long.  Seeing that
>> reminded me that I'd modified the stock 1.7.1 mhfixmsg with this patch:
>
>-decodetext binary instead of 8bit would be safer, I expect.  It
>sounds like you might have tried that in the past without success.
>It might help to dig in to that.

That's definitely my plan now that we've gotten this far.


>> ...but when I look at the files with command-line tools such as more or
>> head, *both* versions look correct.
>
>But are they correct?  It sounds like not, based on viewing in text
>editors.

I agree, but I'm running out of things to try in order to understand what's
happening (please see my reply to Ralph if you haven't already).


>> In summary, I now know what's happening and (mostly) what to do about it,
>> but I still don't know why.
>
>I would look at output from mime_helper and see if it's UTF-8.

Please forgive me for having to ask this, but how is mime_helper even
involved?  Isn't that used only when I read the message?  It isn't in
the procmail chain that saves the original copy, and it's the original
copy that we've been looking at.

 - Steven

Re: mhfixmsg character set conversion

2022-02-08 Thread Steven Winikoff

p/nmh/nmh/test/testdir/test-convert530072.expected.
first named test failure: -convertarg with multiple parts and additional text in draft
FAIL: test/repl/test-convert
PASS: test/repl/test-if-str
PASS: test/repl/test-multicomp
PASS: test/repl/test-repl
PASS: test/repl/test-trailing-newline
PASS: test/scan/test-header-parsing
PASS: test/scan/test-scan
PASS: test/scan/test-scan-file
PASS: test/scan/test-scan-multibyte
PASS: test/send/test-sendfrom
PASS: test/sequences/test-flist
PASS: test/sequences/test-mark
PASS: test/sequences/test-out-of-range
PASS: test/show/test-show
PASS: test/slocal/test-slocal
PASS: test/whatnow/test-attach-detach
PASS: test/whatnow/test-cd
PASS: test/whatnow/test-ls
PASS: test/whom/test-whom
PASS: test/cleanup
===
3 of 118 tests failed
Please report to nmh-workers@nongnu.org
===
make[1]: *** [Makefile:4996: check-TESTS] Error 1
make[1]: Leaving directory '/tmp/nmh/nmh'
make: *** [Makefile:5261: check-am] Error 2
8<---------   cut here   >8
-- 
___
Steven Winikoff  | "I have learned
Montreal, QC, Canada | To spell hors d'oeuvres
s...@smwonline.ca | Which still grates on
http://smwonline.ca  | Some people's n'oeuvres."
 | - Warren Knox

Re: mhfixmsg character set conversion

2022-02-08 Thread Steven Winikoff

   060  49  6c  20  61  20  c3  a9  74  c3  a9  20  67  c3  a9  6e  c3
         I   l       a     303 251   t 303 251       g 303 251   n 303
   100  a9  72  c3  a9  0a
       251   r 303 251  \n
   105

   $ od -t x1c bad_snippet
   000  56  65  75  69  6c  6c  65  7a  20  6e  65  20  70  61  73  20
         V   e   u   i   l   l   e   z       n   e       p   a   s
   020  72  c3  a9  70  6f  6e  64  72  65  20  61  75  20  70  72  c3
         r 303 251   p   o   n   d   r   e       a   u       p   r 303
   040  a9  73  65  6e  74  20  63  6f  75  72  72  69  65  6c  2e  20
       251   s   e   n   t       c   o   u   r   r   i   e   l   .
   060  49  6c  20  61  20  c3  a9  74  c3  a9  20  67  c3  a9  6e  c3
         I   l       a     303 251   t 303 251       g 303 251   n 303
   100  a9  72  c3  a9  0a
       251   r 303 251  \n
   105

...and both snippets are identical!  Suddenly I understand even less than
I did when I started writing this reply. :-(

Strangely, both snippet files look fine in vim.  But the original bad file
still looks bad in vim, and I'm at a loss for how to prove that except by
taking a screen shot, so I've done that and attached the result as a 34 Kb
PDF file.

One additional fact, which must be relevant although I don't know enough
to say exactly how, is that the status bar in vim looks like this when
the good file is newly opened:

   "good" 836 lines, 50844 bytes 1,1   Top

...but for the bad file, that becomes

   "bad" [converted] 336 lines, 49471 bytes 1,1   Top

The smaller number of lines is expected (that's the effect of my
no-longer-wanted patch to mhfixmsg), but does that also explain the
different number of bytes?

More importantly, vim explicitly claims that the bad file is "[converted]",
so maybe that's the source of the double encoding?
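As a sketch of what "[converted]" implies, assuming vim has decided the file is latin1: vim then converts every byte to its internal UTF-8 as if it were ISO 8859-1. A correctly encoded "é" (c3 a9) is read as two Latin-1 characters, "Ã" and "©", and stored as c3 83 c2 a9, which is what double encoding looks like on screen. latin1_to_utf8() is a hypothetical helper illustrating that mapping.

```c
#include <stddef.h>

/* Convert ISO 8859-1 bytes to UTF-8.  Each Latin-1 byte is the code point
 * with the same value, so bytes below 0x80 are copied and the rest become
 * two bytes, 110000xx 10xxxxxx.  out needs room for 2 * len bytes; the
 * number of bytes written is returned. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Fed the single Latin-1 byte e9 it yields the correct c3 a9; fed the UTF-8 bytes c3 a9 it yields c3 83 c2 a9.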

The more I try to think about this, the more my head hurts. :-(

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "Do not meddle in the affairs of dragons,
s...@smwonline.ca |  for you are crunchy and good with ketchup."
http://smwonline.ca  |


bad.pdf
Description: bad.pdf

Re: mhfixmsg character set conversion

2022-02-08 Thread Steven Winikoff

>This is an easy way to download and build the latest, assuming that
>you have the prerequisites listed in the MACHINES file (and respond
>to the build_nmh questions):
>
>wget http://git.savannah.gnu.org/cgit/nmh.git/plain/build_nmh
>sh build_nmh -v

Thanks for that.  I just tried it, but unfortunately the build failed
at the test step (details appended).

I don't know whether it matters that I ran the build script from my
regular account rather than as root, with the install prefix set to a
directory under /tmp (because this is a temporary installation for testing
rather than something I'm planning to keep).

In case it matters, the configuration answers below are the same as in my
real installation of version 1.7.1.


>Ah, OK, maybe it wasn't lynx.

It wasn't.  I just realized what it was, and it turns out I owe you an
apology for reasons I'll explain separately in a reply to your message
from last night.

 - Steven


8<-   cut here   >8
$ mkdir -p /tmp/nmh/root
$ cd /tmp/nmh
$ wget http://git.savannah.gnu.org/cgit/nmh.git/plain/build_nmh
$ sh build_nmh -v
Install prefix [/local]: /tmp/nmh/root
Locking type (dot|fcntl|flock|lockf) [determined by configure]: fcntl
MTS (smtp|sendmail/smtp|sendmail/pipe) [smtp]: 
SMTP server [localhost]: 
Cyrus SASL support (y|n) [determined by configure]: no
TLS support (y|n) [determined by configure]: yes
downloading . . .
autoconfiguring . . .
configuring . . .
building . . .
testing . . .
build failed, build log is in nmh/build_nmh.log

$ tail -30 nmh/build_nmh.log
/sbin/sed -f man/man.sed man/show.man > man/show.1
/sbin/sed -f man/man.sed man/slocal.man > man/slocal.1
/sbin/sed -f man/man.sed man/sortm.man > man/sortm.1
/sbin/sed -f man/man.sed man/unseen.man > man/unseen.1
/sbin/sed -f man/man.sed man/whatnow.man > man/whatnow.1
/sbin/sed -f man/man.sed man/whom.man > man/whom.1
./etc/bash_completion_nmh-gen > etc/bash_completion_nmh
./etc/mhn.defaults.sh "/home/smw/bin:/local/paths:/usr/local/sbin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/lib" ./etc/mhn.find.sh > etc/mhn.defaults
/sbin/sed -e 's,%mts%,smtp,' \
   -e 's,%mailspool%,/var/mail,' \
   -e 's,%smtpserver%,localhost,' \
   -e 's,%default_locking%,fcntl,' \
   -e 's,%supported_locks%,fcntl dot flock lockf,' \
< ./etc/mts.conf.in > etc/mts.conf
make[1]: Leaving directory '/tmp/nmh/nmh'
make[1]: *** [Makefile:4996: check-TESTS] Error 1
make: *** [Makefile:5261: check-am] Error 2
===
FAIL: test/mhbuild/test-attach
FAIL: test/mhbuild/test-ext-params
FAIL: test/repl/test-convert
3 of 118 tests failed
===
configure.ac:8: warning: The macro `AC_CONFIG_HEADER' is obsolete.
configure.ac:135: warning: The macro `AC_TRY_COMPILE' is obsolete.
configure.ac:142: warning: The macro `AC_TRY_COMPILE' is obsolete.
configure.ac:188: warning: The macro `AC_TRY_LINK' is obsolete.
configure.ac:213: warning: AC_PROG_LEX without either yywrap or noyywrap is obsolete
make[1]: *** [Makefile:4996: check-TESTS] Error 1
make: *** [Makefile:5261: check-am] Error 2
8<-   cut here   >8
-- 
_______
Steven Winikoff  | "Don't make your decisions because they are
Montreal, QC, Canada |  the easiest, the cheapest, or the most
s...@smwonline.ca |  popular; make them because you know they
http://smwonline.ca  |  are right."
 |  - Theodore Hesburgh

Re: mhfixmsg character set conversion

2022-02-08 Thread Steven Winikoff

>I'm unable to replicate your problem here with the original message,
>and using your mhfixmsg invocation, mhfixmsg-format-text/html, and
>locale.  The only piece I think I'm missing is your mime_helper.
>I would give that a try if you send it to me.

I've attached the script, but (without having looked at it in a while) I
suspect it depends too heavily on other parts of my personal setup to be
usable for anyone else.  It turns out not to be relevant, but perhaps it
might be interesting to someone anyway.


>With nmh-1.7 mhfixmsg:
>mhfixmsg: /home/levine/src/nmh/msg part 2, decode text/plain; charset=iso-8859-1
>mhfixmsg: /home/levine/src/nmh/msg part 1, will not decode because it
>is binary (line length > 998)
>mhfixmsg: /home/levine/src/nmh/msg part 2, convert UTF-8 to UTF-8

...and therein lies the answer.

I owe you an apology about this, and I'm sincerely sorry for wasting your
time on this question.

The key is the message about the line length being too long.  Seeing that
reminded me that I'd modified the stock 1.7.1 mhfixmsg with this patch:

   --- uip/mhfixmsg.c.original 2018-03-06 14:05:56.0 -0500
   +++ uip/mhfixmsg.c  2019-08-17 19:51:25.723267048 -0400
   @@ -2144,13 +2144,13 @@
        int last_char_was_cr = 0;

        for (i = 0, cp = buffer; i < inbytes; ++i, ++cp) {
   -        if (*cp == '\0'  ||  ++line_len > 998  ||
   +        if (*cp == '\0'  ||  ++line_len > 8  ||
                (*cp != '\n'  &&  last_char_was_cr)) {
                encoding = CE_BINARY;
                if (*cp == '\0') {
                    *reason = "null character";
   -            } else if (line_len > 998) {
   -                *reason = "line length > 998";
   +            } else if (line_len > 8) {
   +                *reason = "line length > 8";
                } else if (*cp != '\n'  &&  last_char_was_cr) {
                    *reason = "CR not followed by LF";
                } else {

I remember asking about the 998-character limit on this list, in a thread
from January 2018.  You explained why the limit exists, and suggested
another way to achieve what I was trying to do, which I tried but without
success -- I wasn't able to get what I wanted without this change, but I no
longer remember the details.
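A simplified sketch of the test that patch changes, written as a hypothetical standalone function rather than the actual mhfixmsg.c code, and assuming line length is counted per line excluding CR and LF: a NUL byte, a CR not followed by LF, or a line longer than max_line marks the content as binary. RFC 5322 caps lines at 998 characters excluding CRLF, which is where the stock value comes from; with max_line lowered to 8, nearly any line of ordinary text trips the limit.

```c
#include <stddef.h>

/* Return NULL if buf[0..len) could be carried as 8bit text, or a string
 * giving the reason it has to be treated as binary.  max_line is 998 in
 * stock nmh; the local patch effectively lowered it to 8. */
const char *binary_reason(const char *buf, size_t len, size_t max_line)
{
    size_t line_len = 0;
    int last_char_was_cr = 0;

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];

        if (c == '\0')
            return "null character";
        if (c != '\n' && last_char_was_cr)
            return "CR not followed by LF";
        if (c == '\n')
            line_len = 0;                 /* newline ends the line */
        else if (c != '\r' && ++line_len > max_line)
            return "line length exceeded";
        last_char_was_cr = (c == '\r');
    }
    return NULL;
}
```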

Obviously I need to revisit this question, because I just compiled a copy
of mhfixmsg from 1.7.1 without this patch, and it now behaves as you'd
expect:  it complains about the line length, and then generates correct
output with these headers:

   Content-Type: multipart/alternative;
        boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

With my patch, I get these headers:

   Content-Type: multipart/alternative;
  boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

There's still something going on that I don't understand, however.  The
way I've evaluated the output from mhfixmsg was by viewing it in vim, and
there's no question that the unpatched output looks fine while the patched
output is as I've been describing since the beginning of this thread.

...but when I look at the files with command-line tools such as more or
head, *both* versions look correct.  When I open both files in xed, the
unpatched file is fine, but the patched file generates this message:

   There was a problem opening the file /tmp/nmh_testing/xxx.

   The file you opened has some invalid characters. If you continue editing
   this file you could corrupt this document.

   You can also choose another character encoding and try again.

...with a menu offering "Automatically Detected", "Current Locale (UTF-8)"
and "Western (ISO-8859-15)" as possible character encodings.

In summary, I now know what's happening and (mostly) what to do about it,
but I still don't know why.
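One plausible explanation, offered as an assumption rather than something confirmed in this thread: in the patched output the text/html part keeps its raw ISO-8859-1 bytes under "Content-Transfer-Encoding: 8bit", while the text/plain part is UTF-8. The file as a whole is then no longer valid UTF-8, and an editor that decodes it as UTF-8 will trip over a byte such as 0xE9 ('é' in ISO-8859-1) that isn't followed by UTF-8 continuation bytes. In the unpatched output the HTML part stays quoted-printable, and therefore plain ASCII. The simplified validity check below has a hypothetical name and only checks byte structure. It does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte that does not start a well-formed
   UTF-8 sequence, or -1 if the whole buffer is valid.  Simplified: checks
   lead and continuation byte patterns only. */
static long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t need;                     /* continuation bytes required */

        if (c < 0x80)
            need = 0;                    /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;
        else if ((c & 0xF0) == 0xE0)
            need = 2;
        else if ((c & 0xF8) == 0xF0)
            need = 3;
        else
            return (long) i;             /* stray continuation or bad lead */

        if (need > len - i - 1)
            return (long) i;             /* truncated sequence */
        for (size_t k = 1; k <= need; ++k)
            if ((s[i + k] & 0xC0) != 0x80)
                return (long) i;
        i += need + 1;
    }
    return -1;
}
```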

 - Steven

Re: mhfixmsg character set conversion

2022-02-06 Thread Steven Winikoff

>Ah, OK, maybe it wasn't lynx.  I don't know enough about your
>environment to say exactly what happened.

Quick question:  would it help if I were to run mhfixmsg under control of
strace and send you the output?  That's the most definitive way I can think
of to show you exactly what happens in my environment.

(Yes, the problem occurred when procmail invoked my shell script which
then invoked mhfixmsg, but I see the same behaviour when I run mhfixmsg
directly on the command line with the same input file, so I believe that
strace under those circumstances should capture everything relevant.)

 - Steven
-- 
___
Steven Winikoff  | "The man who has ceased to learn ought
Montreal, QC, Canada |  not to be allowed to wander around
s...@smwonline.ca |  loose in these dangerous days."
http://smwonline.ca  |
 |  - M. M. Coady

Re: mhfixmsg character set conversion

2022-02-06 Thread Steven Winikoff

>Steven wrote:
>
>> That's probably a helpful thing to do, but the question I was wondering
>> about wasn't why the UTF-to-UTF conversion was reported, but rather why
>> the iso-8859-1-to-UTF conversion wasn't reported.
>
>Ok, commit 41ce4490ac5d might fix the problem for you.

Thank you!  I'll check it out tomorrow and see what happens.  Do you
think the patch will apply to 1.7.1, or will I need to install the
latest version from the repository?


>The cause was mismatch between the character set of content generated by
>the external program (lynx, in this case) and mhfixmsg -textcharset UTF-8.
>mhfixmsg wasn't capturing the charset of the generated output, assuming it
>was unchanged.  It then converted it again.

I'm afraid I have to admit I'm not entirely clear on how lynx is even
involved.  I know I have these entries in my system-wide mhn.defaults file:

   $ grep lynx /local/pkg/nmh/root-nmh-1.7.1/etc/mhn.defaults
   mhbuild-convert-text/html: charset="%{charset}"; /sbin/lynx -child -dump 
-force_html ${charset:+--assume_charset} ${charset:+"$charset"} %F | sed 
's/^\(.\)/> \1/; s/^$/>/;' | par 64
   mhfixmsg-format-text/html: charset="%{charset}"; /sbin/lynx -child -dump 
-force_html ${charset:+--assume_charset} ${charset:+"$charset"} %F | expand | 
sed -e 's/^   //' -e 's/  *$//'
   mhshow-show-text/html: charset="%{charset}"; %l/sbin/lynx -child -dump 
-force-html ${charset:+--assume_charset} ${charset:+"$charset"} %F

...and the second of these certainly looks relevant, but:

   - While testing on Friday, I emptied that file out completely and still
 observed the same behaviour.

   - In your message from 12:35 today in the "crufty mhn.default.sh stuff"
 thread, you wrote:

> There is a way.  etcpath() looks for mhn.defaults in this order:
>  * 3) Next, check in nmh Mail directory.
>  * 4) Next, check in nmh `etc' directory.
>
> So if the user puts an mhn.defaults in their Mail directory, then
> only it will be read.  They'd have to copy any entries that they
> do want from /etc/nmh/mhn.defaults to their own.

 I do have an mhn.defaults file in my Mail directory, with (only) these
 entries in it:

mhshow-show-application/pdf: %pmime_helper %F %s "%{name}"
mhshow-show-application: %pmime_helper %F %s "%{name}"
mhshow-show-audio:   %pmime_helper %F %s "%{name}"
mhshow-show-video:   %pmime_helper %F %s "%{name}"
mhshow-show-image:   %pmime_helper %F %s "%{name}"
mhshow-show-text/richtext:   %pmime_helper %F %s "%{name}"

...so while I believe that lynx is involved, I don't know where that
involvement is coming from.

While I'm replying to you anyway, I realize I forgot to reply to your
question from yesterday morning.  You asked:

   Have you tried the -decodeheaderfieldbodies switch to mhfixmsg?

I haven't, mainly because I didn't know that switch existed.  I don't know
what it does (other than what I can infer from the name, of course), and I
can't find any mention of it in the man page for mhfixmsg, or anywhere in
the source code for version 1.7.1.  Was this switch added after 1.7.1 was
released?
   

>The fix is for mhfixmsg to detect the charset of the content,
>using file --brief --mime-encoding if it can.  If it can't, it
>falls back to the -textcharset value.  If that wasn't used, it
>gets it from the locale and advises the user.

That sounds reasonable to me.
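The fallback order in the quoted message can be sketched as follows. This is an illustration under stated assumptions, not the code in commit 41ce4490ac5d. The function name is hypothetical, `detected` stands for whatever `file --brief --mime-encoding` reported (NULL if it couldn't be run), and the locale fallback uses the POSIX `nl_langinfo(CODESET)` call.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: choose the charset to record for generated text.
   Order: detected charset, then the -textcharset value, then the locale's
   codeset, in which case *advise_user is set so the caller can warn. */
static const char *choose_charset(const char *detected,
                                  const char *textcharset, int *advise_user)
{
    assert(advise_user != NULL);
    *advise_user = 0;
    if (detected != NULL && *detected != '\0')
        return detected;
    if (textcharset != NULL && *textcharset != '\0')
        return textcharset;
    setlocale(LC_CTYPE, "");        /* honour LC_ALL / LC_CTYPE / LANG */
    *advise_user = 1;
    return nl_langinfo(CODESET);    /* e.g. "UTF-8" under en_CA.UTF-8 */
}
```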


>I'm not completely sure that this will fix your problem because
>it's aimed at added text/plain parts.  But with -noreplacetextplain
>I think that's the path to your issue.

Please advise on the easiest way to try it (between applying 41ce4490ac5d to
1.7.1, or just downloading and building the current version of the master
branch), and I'll do so tomorrow (I'm unable to do it before then due to
a prior commitment).

 - Steven
-- 
___
Steven Winikoff  | "The thing is, I mean, there's times when
Montreal, QC, Canada |  you look at the universe and you think,
s...@smwonline.ca |  'What about me?' and you can just hear
http://smwonline.ca  |  the universe replying, 'Well, what about
 |  you?'"
 | - Terry Pratchett (Thief of Time)

Re: mhfixmsg character set conversion

2022-02-04 Thread Steven Winikoff

>I am wondering ... do you maybe have some old configuration in mhn.defaults
>or your .mh_profile that does some iso8859-1 to UTF-8 conversion?

Good question!

The easiest way to answer it for .mh_profile was to empty it temporarily of
everything except

   Path: /home/smw/Mail

This made no difference.  As for mhn.defaults, I have both a personal
file in ~smw/Mail/mhn.defaults and the system-wide version in
/local/pkg/nmh/root-nmh-1.7.1/etc/mhn.defaults.

The personal file contains only these entries:

   mhshow-show-application/pdf: %pmime_helper %F %s "%{name}"
   mhshow-show-application: %pmime_helper %F %s "%{name}"
   mhshow-show-audio:   %pmime_helper %F %s "%{name}"
   mhshow-show-video:   %pmime_helper %F %s "%{name}"
   mhshow-show-image:   %pmime_helper %F %s "%{name}"
   mhshow-show-text/richtext:   %pmime_helper %F %s "%{name}"

...where mime_helper is a shell script which I'll be happy to share if
anyone's interested.  In any case these entries seem irrelevant to the
matter at hand, but please let me know if you disagree.

Meanwhile, I have these entries in the system-wide file:

   $ grep mhfixmsg /local/pkg/nmh/root-nmh-1.7.1/etc/mhn.defaults
   mhfixmsg-format-application/ics: mhical -infile %F
   mhfixmsg-format-text/calendar: mhical -infile %F
   mhfixmsg-format-text/html: charset="%{charset}"; /sbin/lynx -child -dump 
-force_html ${charset:+--assume_charset} ${charset:+"$charset"} %F | expand | 
sed -e 's/^   //' -e 's/  *$//'

I apologize for the length of that last line, but it's probably easier to
read as is than it would be if I tried to break it up.

In any case, this looks like it might be relevant, so I tried commenting it
out; when that also made no difference, I used a bigger hammer and emptied
out /local/pkg/nmh/root-nmh-1.7.1/etc/mhn.defaults completely, but even
that made no difference that I could detect.

 - Steven
-- 
___
Steven Winikoff  | "I knew 'Enterprise Computing Systems' were
Montreal, QC, Canada |  evil before I touched an actual computer
s...@smwonline.ca |  for the first time, because I used to
http://smwonline.ca  |  watch Kirk and Spock fighting for control
 |  of it."  - Anthony de Boer

Re: mhfixmsg character set conversion

2022-02-04 Thread Steven Winikoff

>As Robert and Ken pointed out, one explanation could be that the
>content is converted twice, the second time incorrectly.

I saw those replies, but I wasn't sure how to interpret them (as in, the
evidence is compelling, but I have no idea why that would be happening or
what to do about it).


>I don't see at this point how mhfixmsg could do that but this needs more
>investigation.  We can continue this way, or if you want to send me a
>sanitized excerpt of the message, I'd be glad to work with it.

I can't think of a reasonable way to sanitize it, but I'm willing to send
it to you privately.  Should I use your  address for this
purpose?


>> $ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
>>-fixcte -fixboundary -noreplacetextplain \
>>-fixtype application/octet-stream -verbose -file - \
>>-outfile $destination < $source
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8
>>
>> ...which is interesting for more than one reason, including that there's
>> apparently no conversion of iso-8859-1 to UTF-8,
>
>That's strange, unless $source had already been run through mhfixmsg.

It hadn't.  In normal use my procmail-invoked shell script does run the
message through a program I wrote myself, which decodes 2047-encoded
headers -- but that only affects the headers, and passes the body through
unmodified; the relevant excerpt for that is:

   [loop that processes header lines elided]

         /**  an empty input line means the end of the message headers:  **/

         if (strlen(input_line) < 1) break;
      }


      /**  read and write message body:  **/

      /* "input_line_size" is an assumed name for getline()'s size argument */
      while (getline(&input_line, &input_line_size, infile) >= 0)
      {
         fputs(input_line, outfile);
      }


      /**  ...and we're done:  **/

      return(0);
   }


The only change this produces in the problematic message is as follows:

   47,57c47,57
   < X-SG-EID:  =?us-ascii?Q?CePduXinO1TKWf=2FmbcRcIcb5o7KEfW6Q=2FLxIZrPrRA0dtxQ5evb2UIV0M0r6v6?=
   <  =?us-ascii?Q?DfqG=2FoldGlAr6l6p1riD1OEyVdX0=2F57dKo740dz?=
   <  =?us-ascii?Q?NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS?=
   <  =?us-ascii?Q?FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4?=
   <  =?us-ascii?Q?ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M?=
   <  =?us-ascii?Q?G6=2FuEHfZ5+X57rF1w=3D?=
   < X-SG-ID:  =?us-ascii?Q?N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi=2FKHgAsE=2FCUk5eZaRe5Ltr?=
   <  =?us-ascii?Q?cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv?=
   <  =?us-ascii?Q?fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx?=
   <  =?us-ascii?Q?T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx?=
   <  =?us-ascii?Q?5EAyl462xuJc+?=
   ---
   > X-SG-EID:  CePduXinO1TKWf/mbcRcIcb5o7KEfW6Q/LxIZrPrRA0dtxQ5evb2UIV0M0r6v6
   >  DfqG/oldGlAr6l6p1riD1OEyVdX0/57dKo740dz
   >  NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS
   >  FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4
   >  ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M
   >  G6/uEHfZ5+X57rF1w=
   > X-SG-ID:  N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi/KHgAsE/CUk5eZaRe5Ltr
   >  cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv
   >  fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx
   >  T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx
   >  5EAyl462xuJc+

...but in my testing last night and just now, I see the same behavior
when I run mhfixmsg directly on the unmodified original file (my script
always saves an unmodified copy when it makes changes, in case something
goes wrong).
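The diff above shows RFC 2047 "Q" decoding at work: inside an encoded-word, =2F stands for '/', =3D stands for '=', and an underscore would stand for a space. Below is a minimal sketch of decoding the encoded-text part of a single encoded-word. It assumes the surrounding "=?charset?Q?" and "?=" have already been stripped and that the charset is handled separately. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p = (c != '\0') ? strchr(digits, toupper(c)) : NULL;
    return p != NULL ? (int) (p - digits) : -1;
}

/* Decode the encoded-text of an RFC 2047 "Q" encoded-word into out, which
   must hold at least strlen(in) + 1 bytes.  "_" is a space and "=XX" is the
   byte 0xXX; a malformed "=" is copied through unchanged. */
static size_t q_decode(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; ++i) {
        int hi, lo;

        if (in[i] == '_') {
            out[n++] = ' ';
        } else if (in[i] == '='
                   && (hi = hex_digit((unsigned char) in[i + 1])) >= 0
                   && (lo = hex_digit((unsigned char) in[i + 2])) >= 0) {
            out[n++] = (char) (hi * 16 + lo);
            i += 2;                     /* skip the two hex digits */
        } else {
            out[n++] = in[i];
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, decoding "G6=2FuEHfZ5+X57rF1w=3D" yields "G6/uEHfZ5+X57rF1w=", which matches the last X-SG-EID line on the right-hand side of the diff.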


>Conversion to the same charset is a no-op, I'll look into removing the
>verbose output in that case.

That's probably a helpful thing to do, but the question I was wondering
about wasn't why the UTF-to-UTF conversion was reported, but rather why
the iso-8859-1-to-UTF conversion wasn't reported.


>> and that in fact it's part 1 rather than part 2 that gets converted
>> improperly
>
>The part numbers are reversed because that's the order used for display.
>Part 2 is the text/plain part, that's the one that got converted.

Thank you.  That clears up part of my confusion.

 - Steven
-- 
___
Steven Winikoff  | "The thing is, I mean, there's times when
Montreal, QC, Canada |  you look at the universe and you think,
s...@smwonline.ca |  'What about me?' and you can just hear
http://smwonline.ca  |  the universe replying, 'Well, what about
 |  you?'"
 | - Terry Pratchett (Thief of Time)

Re: mhfixmsg character set conversion

2022-02-04 Thread Steven Winikoff

>I expect that your environment is close enough to:
>
>[details snipped]

Pretty much.  Here's what I have:

$ iconv --version
iconv (GNU libc) 2.33

$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC=en_CA.UTF-8
LC_TIME=en_CA.UTF-8
LC_COLLATE=C
LC_MONETARY=en_CA.UTF-8
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_ADDRESS=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_ALL=

...so the only differences are LC_COLLATE=C, which I set because I prefer
the way it sorts, and LC_ALL, which must be being set by a side effect of
something, because I'm not doing so explicitly.


>With this small example:
>
>[snip]


>I see correct conversion of the quoted-printable E9 to UTF-8 C3A9:

So do I, which suggests that there's something in the content of the
specific message I'm working with.
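For comparison, here is a minimal sketch of the quoted-printable body decoding step on its own (RFC 2045, section 6.7), with a hypothetical function name. "=E9" decodes to the single byte 0xE9, which is 'é' in ISO-8859-1. Only a separate charset conversion turns that into the UTF-8 pair C3 A9. A trailing "=" is a soft line break, which is how the two physical lines of the sample above join into one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p = (c != '\0') ? strchr(digits, toupper(c)) : NULL;
    return p != NULL ? (int) (p - digits) : -1;
}

/* Decode a quoted-printable body into out, which must hold at least
   strlen(in) + 1 bytes.  "=XX" becomes the single byte 0xXX, and "=" at the
   end of a line (a soft line break) is dropped together with the line
   ending.  A malformed "=" sequence is copied through unchanged.  No
   charset conversion happens here. */
static size_t qp_decode(const char *in, char *out)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    while (in[i] != '\0') {
        if (in[i] == '=') {
            if (in[i + 1] == '\n') {                       /* soft break, LF */
                i += 2;
                continue;
            }
            if (in[i + 1] == '\r' && in[i + 2] == '\n') {  /* soft break, CRLF */
                i += 3;
                continue;
            }
            int hi = hex_digit((unsigned char) in[i + 1]);
            int lo = (hi >= 0) ? hex_digit((unsigned char) in[i + 2]) : -1;
            if (hi >= 0 && lo >= 0) {
                out[n++] = (char) (hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out[n++] = in[i++];
    }
    out[n] = '\0';
    return n;
}
```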


>Does adding -verbose to your mhfixmsg invocation provide any clues?
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, decode text/plain; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 1, decode text/html; charset=iso-8859-1
>mhfixmsg: /tmp/mhfixmsgUgtVK1 part 2, convert iso-8859-1 to UTF-8

This is the output I receive:

$ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
   -fixcte -fixboundary -noreplacetextplain \
   -fixtype application/octet-stream -verbose -file - \
   -outfile $destination < $source
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8

...which is interesting for more than one reason, including that there's
apparently no conversion of iso-8859-1 to UTF-8, and that in fact it's
part 1 rather than part 2 that gets converted improperly; part 2 still
has

   Content-Type: text/html; charset=iso-8859-1

 - Steven
-- 
___
Steven Winikoff  | "Algebra? [...] But that's far too
Montreal, QC, Canada |  difficult for seven-year-olds!"
s...@smwonline.ca | "Yes, but I didn't tell them that
http://smwonline.ca  |  and so far they haven't found out"
 |
 |  - Terry Pratchett (Thief of Time)

mhfixmsg character set conversion

2022-02-03 Thread Steven Winikoff

I routinely use mhfixmsg to clean up incoming messages, using this command
in a shell script invoked through procmail:

   mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
-reformat -fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -noverbose -file - \
-outfile $destination < $source

This usually does what I expect, but the other day I received a message
with these characteristics:

   - mhlist reports the following structure:

    msg part  type/subtype              size description
     72       multipart/alternative      45K
        1     text/html                  42K
        2     text/plain                1501

   - the top level of the incoming message has this header (before
 mhfixmsg):

Content-Type: multipart/alternative; boundary=01266[...]

   - the alternative parts have these headers:

Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=iso-8859-1

 and

Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=iso-8859-1

   - after mhfixmsg, the top-level header is unchanged, as expected; the
 alternative part headers are changed to

Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="UTF-8"

 and

Content-Transfer-Encoding: 8bit
Content-Type: text/html; charset=iso-8859-1

...but after conversion from iso-8859-1 to UTF-8, the output file is
mangled.

For reference, here's a section of the quoted-printable encoding from the
original message:

   Veuillez ne pas r=E9pondre au pr=E9sent courriel. Il a =E9t=E9 g=E9n=E9r=E9=
automatiquement, nous ne pourrons pas y donner suite.

This should decode to the following (represented in UTF-8):

   Veuillez ne pas répondre au présent courriel. Il a été généré
   automatiquement, nous ne pourrons pas y donner suite.

   (all in one line, but split here for readability).

...but mhfixmsg turns that into

   Veuillez ne pas rÃ©pondre au prÃ©sent courriel. Il a Ã©tÃ© gÃ©nÃ©rÃ©
   automatiquement, nous ne pourrons pas y donner suite.

   (also all in one line, but split here for readability).

Not that I care very much about this particular boilerplate sentence :-/,
but the message contained a lot of other text that I do care about, all of
which was mangled in the same way.
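The mangled text is what you would expect if ISO-8859-1 text were converted to UTF-8 twice. The first pass turns the byte E9 into C3 A9. A second pass treats C3 and A9 as two ISO-8859-1 characters ('Ã' and '©') and encodes each of them again. The sketch below reproduces this with a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8.  Every Latin-1 byte maps to the Unicode
   code point with the same value, so bytes >= 0x80 become two UTF-8 bytes.
   out must hold at least 2 * strlen(in) + 1 bytes. */
static size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *) in; *p != '\0'; ++p) {
        if (*p < 0x80) {
            out[n++] = (char) *p;                      /* ASCII unchanged */
        } else {
            out[n++] = (char) (0xC0 | (*p >> 6));      /* lead byte */
            out[n++] = (char) (0x80 | (*p & 0x3F));    /* continuation byte */
        }
    }
    out[n] = '\0';
    return n;
}
```

Converting "r\xE9" once gives the bytes 72 C3 A9 ("ré"). Feeding that result through the same function again gives 72 C3 83 C2 A9, which a UTF-8 terminal shows as "rÃ©", the same pattern as the output above.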

My questions are then:

1) Is this a bug in mhfixmsg, or am I just using it incorrectly?

2) If the former, is there further information I can supply to help track
   this down, or further tests I can conduct on the message in question?

3) ...or if the latter, what am I doing wrong, and what should I be doing
   instead?

  Thanks,

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | Aleph-null bottles of beer on the wall,
s...@smwonline.ca | Aleph-null bottles of beer...
http://smwonline.ca  |

Re: mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

>‘pacman -Qi mailcap’ will query for information on that package and show
>the upstream URL is https://pagure.io/mailcap.  Pagure is like a
>SourceForge or GitLab and that installation is Fedora's, despite the
>misleading domain name: https://pagure.io/about/.  Fedora took Red Hat's
>source.

Thanks for that!


>I've access to a Manjaro system.  After a ‘sudo -i pacman -Syu’ to
>ensure its packages are up to date, I see
>
>$ pacman -Q file
>file 5.40-2
>$ file -i /usr/share/mathjax2/extensions/a11y/invalid_keypress.mp3
>/usr/share/mathjax2/extensions/a11y/invalid_keypress.mp3: 
> audio/mpegapplication/octet-stream; charset=binary
>$ b2sum -l32 /usr/share/mathjax2/extensions/a11y/invalid_keypress.mp3
>c7d7c71d  /usr/share/mathjax2/extensions/a11y/invalid_keypress.mp3

Right.  Last night I reported that Manjaro had version 5.38-3, but that
was based on what I read at https://discover.manjaro.org/packages/file
rather than what's actually on my machine.  It turns out that I have the
same version you do.


>So the bug is there.  Does it report
>‘audio/mpegapplication/octet-stream’ for lots of your MP3 files?

Yes.  As an experiment, I ran file -i on 2243 MP3 files; two were reported
as application/octet-stream, with all of the remaining 2241 reported as
audio/mpegapplication/octet-stream.


>On both machines, ‘pacman -Qi file’ reports that package's upstream is
>https://www.darwinsys.com/file/.

...which links to https://github.com/file/file

I just downloaded and built the master branch, and it works correctly:

   $ /tmp/file/root/bin/file -i /tmp/session2.mp3 
   /tmp/session2.mp3: audio/mpeg; charset=binary

So that's definitely the root cause.  Thanks again for all your help on
this!

 - Steven
-- 
_______
Steven Winikoff  |
Montreal, QC, Canada | "Do not meddle in the affairs of dragons,
s...@smwonline.ca |  for you are crunchy and good with ketchup."
http://smwonline.ca  |

Re: mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

>Also .deb files just install on Arch, no? (been a long time since I had
>one.)

No, Arch uses its own package manager:

   https://wiki.archlinux.org/title/pacman

Details for the file package in Arch are here:

   https://archlinux.org/packages/core/x86_64/file/

The current version is 5.40-3, and Ralph is right:  the current version
for the same package in Manjaro (which sources packages from Arch, but
releases them from its own repositories after testing) is 5.38-3.

It's 6:02 am in this timezone and I definitely need sleep, but the next
thing to try is to grab the 5.40 package from Arch and see what happens.

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | "If you're not part of the solution,
s...@smwonline.ca |  you're part of the precipitate."
http://smwonline.ca  |
 |- Steven Wright

Re: mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

>I think it's a file(1) bug in the executable, probably known and fixed
>upstream.

That would make sense, and I'll follow it up in that direction (after
getting some sleep :-/).


>What system are you running, e.g. Ubuntu, and its version?

Like Arch, Manjaro is a rolling release, so the number doesn't actually
mean very much -- but for what it's worth,

   $ cat /etc/lsb-release 
   DISTRIB_ID=ManjaroLinux
   DISTRIB_RELEASE=21.0.4
   DISTRIB_CODENAME=Ornara
   DISTRIB_DESCRIPTION="Manjaro Linux"

I do keep up with updates, and as I write this there are none pending.

 - Steven
-- 
_______
Steven Winikoff  | "The difference between the right word and
Montreal, QC, Canada |  the nearly right word is the difference
s...@smwonline.ca |  between the lightning and the lightning
http://smwonline.ca  |  bug."
 |   - Mark Twain

Re: mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

>My debian distro looks for such things in /etc/mailcap.  I'd look there first.

Thanks!

...but I suspect that may be a Clupea harengus of the crimson variety :-),
partly because it was last modified more than a year ago:

   $ ls -l /etc/mailcap
   -rw-r--r-- 1 root root 272 May  5  2020 /etc/mailcap

...but mostly because it's almost empty:

   $ cat /etc/mailcap
   ### 
   ### Begin Red Hat Mailcap
   ###
   
   audio/*; /usr/bin/xdg-open %s
   
   image/*; /usr/bin/xdg-open %s
   
   application/msword; /usr/bin/xdg-open %s
   application/pdf; /usr/bin/xdg-open %s
   application/postscript ; /usr/bin/xdg-open %s
   
   text/html; /usr/bin/xdg-open %s ; copiousoutput

I'm not going to speculate on why an Arch-derived distribution has an
/etc/mailcap sourced from Red Hat. :-/

Just for fun I tried

   $ /usr/bin/xdg-open /tmp/session2.mp3

I'm not sure why xdg-open decided that I want to open .mp3 files in
clementine (that's a question for another time), but in fact it did
so with no output other than debug messages from clementine (and why
so many debug messages are emitted is also a question for another
time).

 - Steven
-- 
___
Steven Winikoff  |
Montreal, QC, Canada | For clarity in writing, be careful about
s...@smwonline.ca | word selection.  For example, never
http://smwonline.ca  | utilize 'utilize' when you can use 'use'.

Re: mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

>The complaint about ‘/octet-stream’ coupled with the trailing
>‘application’ after ‘audio/mpeg’ looks like two things are being
>combined, e.g. ‘audio/mpeg application/octet-stream’.

That makes sense.
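The quoted diagnosis can be illustrated with a simplified parser. This is a hypothetical sketch, not mhbuild's code. It reads an RFC 2045 type token, a '/', and a subtype token, then reports whatever is left before the parameters. For "audio/mpegapplication/octet-stream", the subtype token ends at the second '/', so "/octet-stream" is left over, which is the same fragment mhbuild complained about.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* RFC 2045 token characters: printable ASCII except space and tspecials. */
static int is_token_char(int c)
{
    return c != '\0'
        && (isalnum(c) || strchr("!#$%&'*+-.^_`{|}~", c) != NULL);
}

/* Hypothetical, simplified check: skip "type/subtype" and return a pointer
   to whatever follows it if that is not a parameter list (';') or the end
   of the value.  Returns NULL when nothing extraneous is found. */
static const char *extraneous(const char *ct)
{
    const char *p = ct;
    const char *subtype;

    assert(ct != NULL);
    while (is_token_char((unsigned char) *p))      /* type, e.g. "audio" */
        ++p;
    if (p == ct || *p != '/')
        return p;
    subtype = ++p;
    while (is_token_char((unsigned char) *p))      /* subtype, e.g. "mpeg" */
        ++p;
    if (p == subtype)
        return p;
    while (*p == ' ' || *p == '\t')
        ++p;
    return (*p == '\0' || *p == ';') ? NULL : p;
}
```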


>- How do you attach the MP3 file?

By typing "at /path/to/file.mp3" at the whatnow? prompt.

...and I just checked the man page for whatnow and discovered the -v option:

   What now? at -v /tmp/session2.mp3
   Attaching /tmp/session2.mp3 as a audio/mpegapplication/octet-stream

   What now? s

(I didn't actually send anything just now, but that's what would follow).

The relevant .mh_profile entries (at least, the ones I recognize as being
relevant) are:

   comp:   -form .compform
   send:   -msgid -messageid random -alias .aliases -port 25
   mhbuild:-maxunencoded 500


>- Can we see a draft before mhbuild gets run?

Sure, here's one:

   8<--   cut here   >8
   To: s...@smwonline.ca
   Subject: foo
   Fcc: inbox
   From: Steven Winikoff 
   Reply-to: Steven Winikoff 
   Content-Type: text/plain; charset="UTF-8"
   Nmh-Attach: /tmp/session2.mp3
   

   --
   _______
   Steven Winikoff  |
   Montreal, QC, Canada | "The worst misunderstandings are the
   s...@smwonline.ca |  unspoken ones."
   http://smwonline.ca  |
|  - Spider Robinson
   8<--   cut here   ->8

But I don't think you'll need it, because...


>- What does ‘file -i’ give on the MP3 file?

...Aha!  You nailed it:

   $ file -i /tmp/session2.mp3
   /tmp/session2.mp3: audio/mpegapplication/octet-stream; charset=binary

...and similarly,

   $ file --mime-type /tmp/session2.mp3
   /tmp/session2.mp3: audio/mpegapplication/octet-stream

I hadn't known about the -i option until you suggested it, and I found
--mime-type just now while looking up -i.  With no options, file reports

   $ file /tmp/session2.mp3
   /tmp/session2.mp3: Audio file with ID3 version 2.4.0, contains:MPEG ADTS, layer III, v1, 64 kbps, 48 kHz, Stereo

Running strace on file lists the following openat() calls:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libmagic.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libseccomp.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/liblzma.so.5", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libbz2.so.1.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, 0x555892c2d4f0, O_RDONLY) = 3
openat(AT_FDCWD, 0x153f7cb54848, O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, 0x7ffc2a816a10, O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, 0x7ffc2a819209, O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 3

...which includes none of the files I expected to see.  The magic database
on this system is /usr/share/file/misc/magic.mgc (the text version is
supported by libmagic, but doesn't exist), and suspiciously, it was
modified by a recent system upgrade:

   $ ls -l /usr/share/file/misc/magic.mgc
   -rw-r--r-- 1 root root 7012776 Apr 12 12:20 /usr/share/file/misc/magic.mgc

I have a backup copy from before the upgrade:

   $ ls -l /path/to/backup/of/misc/magic.mgc
   -rw-r--r-- 6 root root 6652192 Jun 16  2020 /path/to/backup/of/misc/magic.mgc

But:

   $ file -i -k -m /path/to/backup/of/misc/magic.mgc /tmp/session2.mp3
   /tmp/session2.mp3: audio/mpegapplication/octet-stream; charset=binary

...and the atime reported by stat(1) confirms that the backup file was
accessed, so there's still something I'm obviously missing.


>- What's ‘folder -version’ yield?

   $ folder -version
   folder -- nmh-1.7.1 built 2019-12-16 03:09:06 + on mort

 - Steven
-- 
___
Steven Winikoff  | "The best executive is one who has sense
Montreal, QC, Canada |  enough to pick good people to do what he
s...@smwonline.ca |  wants done, and self-restraint enough to
http://smwonline.ca  |  keep from meddling with them while they
 |  do it."
 |  - Theodore Roosevelt

mhbuild: extraneous information in message

2021-05-12 Thread Steven Winikoff

Recently I've been seeing this message when sending email with an attached
.mp3 file:

   mhbuild: extraneous information in message /home/smw/Mail/drafts/1's 
Content-Type: field
 (/octet-stream)

I've appended the Fcc copy of a typical message, in which the Content-Type:
field for the attachment is

   Content-Type: audio/mpegapplication; name="session2.mp3"

I'm quite sure this isn't nmh's fault, but rather an error in the MIME
configuration on my Manjaro Linux machine.  I'm hoping for a hint about
where to look for the config file responsible for "audio/mpegapplication;".

   Thanks,

 - Steven


8<-   cut here   >8
To: s...@smwonline.ca
Subject: testing testing testing
From: Steven Winikoff 
Reply-to: Steven Winikoff 
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="- =_aa0"
Content-ID: <408771.1620805880.0@mort>
Date: Wed, 12 May 2021 03:51:21 -0400
Message-ID: <408773-1620805881.053...@ksvt.ghqa.ia-w>

--- =_aa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <408771.1620805880.1@mort>

 - Steven
-- 
___
Steven Winikoff  | "There are millions of chords. There are
Montreal, QC, Canada |  millions of numbers. And everyone forgets
s...@smwonline.ca |  the one that is a zero. But without the
http://smwonline.ca  |  zero, numbers are just arithmetic. Without
 |  the empty chord, music is just noise."
 | - Terry Pratchett

--- =_aa0
Content-Type: audio/mpegapplication; name="session2.mp3"
Content-Description: session2.mp3
Content-Disposition: attachment; filename="session2.mp3"
Content-Transfer-Encoding: base64

SUQzBAAAI1RTU0UPAAADTGF2ZjU4Ljc2LjEwMAAA//tU
[many more lines of base64 encoding deleted to save electrons :-)]
--- =_aa0--
8<-   cut here   >8

Re: displaying Date using local timezone

2021-05-03 Thread Steven Winikoff

>I had an inkling that it might be bad for NMH to try to handle
>DST calculations on its own;

Tom Scott would agree:

   https://www.youtube.com/watch?v=-5wpm-gesOY

This is probably the best explanation I've ever seen of why time
zones and DST calculations induce madness.

 - Steven
-- 
___
Steven Winikoff  | "Some men see things as they are and ask
Montreal, QC, Canada |  'Why?'.  I dream things that never
s...@smwonline.ca |  were and ask, 'Why not?'."
http://smwonline.ca  |
 |- Robert F. Kennedy

Re: coming back to (N)mh after a 15 year hiatus..

2021-04-09 Thread Steven Winikoff

>Have you tried orgrow yet?  If not, it's a swiss army knife of an 
>application and may help you out.

Speaking for myself, this is the first I've heard of it, and I haven't
been able to find any information via web search; the search results are
full of references to marijuana and fertilizer, which I somehow suspect
isn't what you meant. :-)

Would you be willing to share a link?

  Thanks,

 - Steven
-- 
___
Steven Winikoff  | "The man who has ceased to learn ought
Montreal, QC, Canada |  not to be allowed to wander around
s...@smwonline.ca |  loose in these dangerous days."
http://smwonline.ca  |
 |  - M. M. Coady

Re: [nmh-workers] logging outgoing messages

2019-07-10 Thread Steven Winikoff

>But for the larger issue of whether or not you should submit email to
>your own SMTP server or your email provider's ... well, obviously my
>OPINION is that you should submit it to your email provider's server
>directly from nmh (see previous emails on why I think this).  But plenty
>of people disagree with me on this, and that's fine.  If you're the sort
>of person who doesn't have a problem configuring your own SMTP server,
>then fine, you should do that!

Thank you.  Between this and other comments, I've decided to revert to
having post communicate directly with my local SMTP server.


>But I think recommending that to people is a mistake; it creates the
>impression that you need to run your own SMTP server to use nmh, and that
>is absolutely not true.

Understood.  I'm comfortable enough with sendmail that running it on my
home system isn't a problem, but I'm well aware that many people would
prefer not to have to do that.

 - Steven
-- 
_______
Steven Winikoff|
Montreal, QC, Canada   | "Stars are facts; constellations are
s...@smwonline.ca   |  theories."
http://smwonline.ca| - Michael F. Flynn

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: [nmh-workers] logging outgoing messages

2019-07-10 Thread Steven Winikoff

>I agree with that, and even when ifdef's are added, they should be
>positive, not double negative, so
>   #ifndef NOSYSLOG
>is just perverse,

Of course it is.  As I mentioned in my previous message...


>   #ifdef  USE_SYSLOG
>would work just as well (it does mean the name needs to be explicitly
>defined to get the new code,

...I was just too lazy to do that for a proof of concept.  There's no
question that you're right: if such a patch were added in production and
still used #ifdef, it should test a positively named symbol.


>  | - It is not clear to me that you can state with certainty that the
>  |   250 response code will contain the queue identifier
>
>No, you can't, but these days it almost always does.

That matches my experience.


>Personally, I'd just suggest keeping the local MTA, having post deliver
>to that, and let it do the logging

That's exactly what I've always done, from time immemorial until just
about two weeks ago.

Ironically enough I actually prefer to do it this way, but I was under the
impression that this is deprecated in modern configurations.  I'd be happy
to be wrong about that.

 - Steven
-- 
_______
Steven Winikoff|
Montreal, QC, Canada   |  Don't use no double negatives.
s...@smwonline.ca   |
http://smwonline.ca|


Re: [nmh-workers] logging outgoing messages

2019-07-10 Thread Steven Winikoff

>>Is there any interest in adding an improved version of this to the code
>>base?
>
>So ... maybe?  But, some thoughts.

Thank you (and everyone else!) for taking the time to reply to this.

Before I say anything else, I never meant to ask for my patch to be
incorporated as-is -- I know there are many ways in which it would
need to be improved for production use.

I sent it mostly as a proof of concept (it's currently just barely
good enough to do what I personally need :-/), and partly in hopes
it might help anyone else if something like it isn't added to nmh
itself.


>- We don't, in general, want to have any more #ifdefs in the code unless
>  they are completely unavoidable (e.g., operating system differences or
>  optional third-party libraries like OpenSSL).  So this would require
>  some run-time configuration.

Understood, of course.  I used those mostly as an easy way to mark the
code I added -- and for those wondering why I chose to write them in
the negative, that was purely out of laziness (so that I didn't have to
add -DSYSLOG to the configure process).

Again, this was never intended for production use, and I apologize if I
didn't make that clear originally.


>- It is not clear to me that you can state with certainty that the
>  250 response code will contain the queue identifier (that is, in
>  fact, not a concept that appears anywhere that I can find in the SMTP
>  RFCs).

That's unfortunate.  I've mostly worked with sendmail, and I've never
seen a case where the QID wasn't sent back to the originating MTA, so
I wasn't aware that the RFCs don't require that behaviour.
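For illustration only: with sendmail, the reply to DATA normally looks like "250 2.0.0 x49H6dIA003943 Message accepted for delivery", so a best-effort extraction can key on that layout. The sketch below assumes exactly that layout; extract_qid() is a hypothetical helper, not part of nmh. Since the RFCs define no queue ID at all, other MTAs may put something else in that position, and the function simply reports failure when a reply doesn't fit the pattern.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of nmh): best-effort extraction of a
 * queue ID from the SMTP reply to DATA, ASSUMING the sendmail layout
 *     250 2.0.0 <queue-id> Message accepted for delivery
 * The SMTP RFCs define no queue ID, so this is a heuristic only.
 * Returns 1 and fills qid on success, 0 if the reply doesn't fit.
 */
static int extract_qid(const char *reply, char *qid, size_t qidsize)
{
    const char *p = reply;
    size_t n;

    /* must be a 250 reply line: "250 " or "250-" */
    if (strncmp(p, "250", 3) != 0 || (p[3] != ' ' && p[3] != '-'))
        return 0;
    p += 4;

    /* skip an optional enhanced status code such as "2.0.0" (RFC 3463) */
    if (isdigit((unsigned char)p[0]) && p[1] == '.') {
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        while (isspace((unsigned char)*p))
            p++;
    }

    /* take the next whitespace-delimited token as the queue ID */
    n = strcspn(p, " \t\r\n");
    if (n == 0 || n >= qidsize)
        return 0;
    memcpy(qid, p, n);
    qid[n] = '\0';
    return 1;
}
```

A caller could fall back to logging the whole reply text when this returns 0, so the information is kept even for MTAs that use a different format.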


>  As a practical matter I've never had to give anyone the queue
>  identifier of a message (because it's not normally logged on the
>  client; really, most people are happy with recipients and a time, and
>  they are really happy if you have a message-id).

That doesn't match my experience.


>I think this should be a lot more generic.  So ... an alternate proposal.
>
> [ details snipped for brevity, but the summary is to create a
>   "post hook" and use that instead ]

I'd have no problem with that as long as the post hook provides the same
information gathered in my patch (i.e., sender and recipient addresses,
message ID, relay server and port, and resulting status and QID).
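To make that list of fields concrete, here is a minimal sketch of the record such a hook, or a run-time switch replacing the #ifdef, might emit. All names here (struct submission_info, format_submission_log(), log_submission() and its enabled flag) are hypothetical and not part of nmh; the line format just mirrors the one the patch passes to syslog().

```c
#include <stddef.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical record of one submission; the fields mirror the patch. */
struct submission_info {
    const char *from;
    const char *to;       /* recipient list, e.g. comma-separated */
    const char *msgid;
    const char *relay;
    const char *port;
    const char *status;   /* e.g. queue ID on success, SMTP reply on failure */
};

/* Render the record in the same key=value style the patch logged.
 * Returns what snprintf() returns (the length the full line needs). */
static int format_submission_log(char *buf, size_t size,
                                 const struct submission_info *s)
{
    return snprintf(buf, size,
                    "from=%s, to=%s, msgid=%s, relay=%s, port=%s, stat=%s",
                    s->from, s->to, s->msgid, s->relay, s->port, s->status);
}

/* Log only if enabled at run time (e.g. by a profile setting), instead of
 * compiling the code in or out with #ifdef.  Over-long lines are truncated. */
static void log_submission(int enabled, const struct submission_info *s)
{
    char line[1024];

    if (!enabled)
        return;
    format_submission_log(line, sizeof line, s);
    openlog("nmh_smtp", LOG_PID, LOG_MAIL);
    syslog(LOG_NOTICE, "%s", line);
    closelog();
}
```

Keeping the formatting separate from the syslog() call means the same record could just as easily be handed to a post hook as text.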

 - Steven
-- 
___
Steven Winikoff| "...and every single one of them wanted
Montreal, QC, Canada   |  to be involved in the decision-making
s...@smwonline.ca   |  process without necessarily going
http://smwonline.ca|  through the intelligence-using process
   |  first."  - Terry Pratchett


[nmh-workers] logging outgoing messages

2019-07-09 Thread Steven Winikoff

 : "RCPT TO:<%s%s>",
 FENDNULL(path), mbox, host)) {
@@ -717,6 +776,19 @@
 }
 
 for (bp = buffer; bp && len > 0; bp++, len--) {
+#ifndef NOSYSLOG
+if (strncmp(bp, "Message-ID: ", 12) == 0)
+{
+   int i;
+
+   (void)strncpy(syslog_msgid, bp + 12, SYSLOG_FIELD_SIZE);
+   for (i=0; i
 
+#ifndef NOSYSLOG
+   #include <syslog.h>
+#endif
+
 #include 
 
 #ifndef CYRUS_SASL
@@ -1760,6 +1764,15 @@
 }
 
 fflush (stdout);
+
+#ifndef NOSYSLOG
+openlog("nmh_smtp", LOG_PID, LOG_MAIL);
+syslog(LOG_NOTICE,
+   "from=%s, to=%s, msgid=%s, relay=%s, port=%s, stat=%s",
+   syslog_from, syslog_to, syslog_msgid, syslog_server,
+   syslog_port, syslog_qid);
+closelog();
+#endif
 }
 }
 
8<-   cut here   >8
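One thing a production version of the Message-ID capture above would need to handle: header field names are case-insensitive under RFC 5322, so matching only the exact string "Message-ID: " misses variants such as "Message-Id:", and strncpy() on its own doesn't guarantee NUL termination. A minimal sketch under those assumptions (copy_message_id() is a hypothetical name, and folded headers are ignored):

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() is POSIX, declared here */

/*
 * Hypothetical helper (not nmh's code): if line starts a Message-ID
 * header, copy its value into out.  The field name is matched without
 * regard to case, as RFC 5322 requires, so "Message-Id:" also matches.
 * Leading blanks and the trailing CR/LF are dropped, the value is
 * truncated if necessary, and out is always NUL-terminated.  Folded
 * (multi-line) header values are not handled in this sketch.
 * Returns 1 on a match, 0 otherwise.
 */
static int copy_message_id(const char *line, char *out, size_t outsize)
{
    static const char name[] = "Message-ID:";
    const size_t namelen = sizeof name - 1;
    const char *v;
    size_t n;

    if (outsize == 0 || strncasecmp(line, name, namelen) != 0)
        return 0;

    v = line + namelen;
    while (*v == ' ' || *v == '\t')
        v++;

    n = strcspn(v, "\r\n");
    if (n >= outsize)
        n = outsize - 1;          /* truncate instead of overflowing */
    memcpy(out, v, n);
    out[n] = '\0';
    return 1;
}
```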
-- 
___
Steven Winikoff|
Montreal, QC, Canada   | "I'd love to go out with you, but I want
s...@smwonline.ca   |  to spend more time with my blender."
http://smwonline.ca|
   |- fortune(6)


Re: [nmh-workers] Can't forward MIME-encoded message

2019-05-09 Thread Steven Winikoff

>>I didn't know forw had a -mime switch.  Since this is something I'd find
>>very helpful, I just tried it, but it completely failed to work for me.
>
>forw -mime doesn't have a wonderful interface; what it does is generate
>a mhbuild directive and it puts it in the draft message.  You then have
>to run "mime" on the resulting draft for the right thing to happen.

Thank you!  I just tried that, and it worked perfectly.


>This is actually all covered in the man page for forw(1); let me know if
>it is unclear.

No, it's clear enough; I just didn't think to read the man page until you
pointed it out just now.


>I'm not defending this practice; it's the way it's always worked and I am
>unable to come up with a better solution at this time.  Maybe someday ...

That's okay; the two-step process is still much easier than what I'd been
doing until now.

 - Steven
-- 
_______
Steven Winikoff| "Sometimes I think we're alone in the
Montreal, QC, Canada   |  universe, and sometimes I think we're
s...@smwonline.ca   |  not.  In either case, the idea is quite
http://smwonline.ca|  staggering."
   | - Arthur C. Clarke


Re: [nmh-workers] Can't forward MIME-encoded message

2019-05-09 Thread Steven Winikoff

>When you say, "Forward", what _EXACTLY_ do you mean?
>
>I just tried 3 different things with the latest nmh, and they all behaved
>"as designed".
>
>- When I forwarded a message with forw -mime, it generates a new message
>  with a message/rfc822 MIME type; the original message contained
>  in the message/rfc822 has the correct Content-Type header

I didn't know forw had a -mime switch.  Since this is something I'd find
very helpful, I just tried it, but it completely failed to work for me.

The message I tried to forward is described as follows:

   # mhlist +wing 3943
    msg part  type/subtype              size description
   3943       multipart/mixed            16M
        1     text/plain                 326
        2     application/pdf          3836K a_night_at_the_ballet_2nd.pdf
        3     application/pdf          1098K light_vibrations_2nd.pdf
        4     application/pdf          2092K sinatra_in_concert_2nd.pdf
        5     application/pdf          3629K a_night_at_the_ballet_3rd.pdf
        6     application/pdf          1100K light_vibrations_3rd.pdf
        7     application/pdf           389K sinatra_in_concert_3rd.pdf

My test was invoked as follows:

   # forw -mime +wing 3943

The resulting message is

   8<-   cut here   >8
   From s...@smwonline.ca  Thu May  9 13:06:39 2019
   Return-Path: 
   Received: from mort (localhost.localdomain [127.0.0.1])
   by 206-248-137-116.dsl.teksavvy.com (8.15.2/8.15.2/Debian-10) with ESMTP id x49H6dIA003943
   for ; Thu, 9 May 2019 13:06:39 -0400
   To: smw
   Subject: testing /local/paths/forw -mime
   From: Steven Winikoff 
   Reply-to: Steven Winikoff 
   MIME-Version: 1.0
   Content-Type: text/plain; charset="us-ascii"
   Content-ID: <3941.1557421599.1@mort>
   Date: Thu, 09 May 2019 13:06:39 -0400
   Message-ID: <3942.1557421599@mort>
   
   #forw [forwarded message] +/home/smw/Mail/wing 3943
   8<-   cut here   >8

I wondered if this might be caused by a profile entry.

   # grep forw ~/.mh_profile
   forw:   -filter .forwardfilter -form .forwardform
   
...but when I tried deleting this entry, the same thing happened:

   8<-   cut here   >8
   From s...@206-248-137-116.dsl.teksavvy.com  Thu May  9 13:08:15 2019
   Return-Path: 
   Received: from mort (localhost.localdomain [127.0.0.1])
   by 206-248-137-116.dsl.teksavvy.com (8.15.2/8.15.2/Debian-10) with ESMTP id x49H8F46004123
   for ; Thu, 9 May 2019 13:08:15 -0400
   From: Steven Winikoff 
   To: smw
   Subject: Re: FW: A Night at the Ballet - 2 (fwd)
   MIME-Version: 1.0
   Content-Type: text/plain; charset="us-ascii"
   Content-ID: <4121.1557421695.1@mort>
   Date: Thu, 09 May 2019 13:08:15 -0400
   Message-ID: <4122.1557421695@mort>
   
   #forw [forwarded message] +/home/smw/Mail/wing 3943
   8<-   cut here   >8

Here the only difference is the Subject: header; in my first test the
default was empty, and in the second test I left the default value
unchanged.

Please let me know if there's any additional information I can supply
about this.

 - Steven
-- 
___
Steven Winikoff| "I knew 'Enterprise Computing Systems' were
Montreal, QC, Canada   |  evil before I touched an actual computer
s...@smwonline.ca   |  for the first time, because I used to
http://smwonline.ca|  watch Kirk and Spock fighting for control
   |  of it."  - Anthony de Boer


Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-24 Thread Steven Winikoff

>Does the full path to mhn.defaults shown by "man mhfixmsg" match
>/local/pkg/nmh/root-nmh-1.7/etc/mhn.defaults ?

Yes.


>If it does, maybe run mhfixmsg under ltrace or something similar to see
>exactly what file it's trying to open.

I used the strace command Ralph suggested (strace -fe open,openat), and
that solved it.

The problem was that I had a personal mhn.defaults file, and mhfixmsg was
reading that (which I expected) but then not reading the system version
(which I didn't expect -- I would have expected the system one to be
read first unconditionally, to be supplemented and/or overridden by the
personal file).

Ironically, the personal mhn.defaults in question isn't needed and
shouldn't have been there anyway; it's an artifact of the transition
I'm going through right now, from an older, about-to-be-decommissioned
server with nmh-1.6 to my desktop machine running 1.7.

With the personal mhn.defaults file deleted mhfixmsg works as expected
using the system version.


>> >> I thought Ken said the RFC 5322 limit was 998.  But...
>> >
>> >Right.  He also noted that he's had problems with insertion of '!' in long
>> >lines of HTML.
>>
>> What about the idea of reformatting the text/html part to reduce the line
>> width?
>
>Then -maxunencoded wouldn't be necessary.  Though I'm not sure if you're
>talking about outgoing or incoming messages here.

I'm talking about incoming messages.


>> Is there a way to get mhfixmsg to decode the base64 and then run it through
>> tidy with a given set of command-line options?
>
>Yes, via mhfixmsg-format-text/html.  See the mhfixmsg and mhshow man pages.

I did read those man pages, but perhaps I'm still failing to understand
parts of them.  I do know how mhfixmsg-format-text/html specifies the
command which generates the text/plain part from the text/html part, but
I don't see how to do that and also reformat the text/html part.

     - Steven
-- 
___
Steven Winikoff| "If you have built castles in the air,
Concordia University   |  your work need not be lost; that is
Montreal, QC, Canada   |  where they should be.  Now put
steven.winik...@concordia.ca   |  foundations under them."
   |   - Henry David Thoreau


Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-23 Thread Steven Winikoff

>>mhfixmsg: Don't know how to convert /home/smw/Mail/reformatted/17352,
>>  there is no mhfixmsg-format-text/html profile entry
>>
>> ...which makes sense because I don't know what to put in that profile entry.
>
>Is there a mhfixmsg-format-text/html line in your mhn.defaults?

Yes:

   # grep mhfixmsg-format-text/html /local/pkg/nmh/root-nmh-1.7/etc/mhn.defaults
   mhfixmsg-format-text/html: charset=%{charset}; /usr/bin/lynx -child -dump -force_html ${charset:+--assume_charset} ${charset:+"$charset"} %F | expand | sed -e 's/^   //' -e 's/  *$//'

Of course I can just copy this entry into my .mh_profile, and I'll try that
tomorrow when I have some time -- but it sounds like you're suggesting that
the entry in /local/pkg/nmh/root-nmh-1.7/etc/mhn.defaults should be picked
up directly from there, and that isn't happening.


>> I thought Ken said the RFC 5322 limit was 998.  But...
>
>Right.  He also noted that he's had problems with insertion of '!' in long
>lines of HTML.

What about the idea of reformatting the text/html part to reduce the line
width?  I've been playing with tidy (AKA html-tidy), and it's capable of
transforming the HTML message I received last week from a single line of
42187 characters into a version with 1896 lines with a maximum line width
of 138.

Is there a way to get mhfixmsg to decode the base64 and then run it through
tidy with a given set of command-line options?

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   | "The end of the world will occur at
Montreal, QC, Canada   |  3:00 p.m., this Friday, with symposium
steven.winik...@concordia.ca   |  to follow."
   |- fortune(6)


Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-22 Thread Steven Winikoff

>Well, "binary" has a specific meaning in the MIME world.  Specifically,
>it refers to a MIME Content Transfer Encoding of binary, which has no
>restrictions in terms of line length.  So when that message says that
>it can't decode it because the part would have to be binary, THAT is what
>it is referring to.

This helps, but I'm still a bit confused.  (That's an understatement; I'm
really still very much confused. :-()

I just looked up the Content-Transfer-Encoding header, and found what you
already know (but which I'll repeat here, for the record and for my
own future reference):

   The Content-Transfer-Encoding field is designed to specify an invertible
   mapping between the "native" representation of a type of data and
   a representation that can be readily exchanged using 7 bit mail
   transport protocols, such as those defined by RFC 821 (SMTP). This field
   has not been defined by any previous standard. The field's value is
   a single token specifying the type of encoding, as enumerated below.
   Formally:

   Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                "8BIT"   / "7BIT" /
                                "BINARY" / x-token

...so when a message clearly contains

   Content-Transfer-Encoding: base64

shouldn't that mean we don't need to test the decoded content to see if
it's binary or not?  You just said in your previous message that there's
no line length restriction in the content after decoding.


>But David points out that if you tell it to, mhfixmsg will happily
>generate such messages (but the documentation does caution you that the
>resulting messages may not be readable with nmh).

That's good to know, but I really have no plans to create out-of-spec
messages; I just want to be able to read the messages I'm receiving, and
you clearly explained that I should be able to do that, because the encoded
form follows the RFC specification and the decoded form doesn't have to.

Or at least that's what I thought you said.


>Our only general-purpose nmh list is nmh-workers; plenty of people on it
>are not coders, so please don't be concerned on that score.

Thanks.  I've just subscribed.

 - Steven
-- 
___
Steven Winikoff| "Nature is by and large to be found out
Concordia University   |  out of doors, a location where, it
Montreal, QC, Canada   |  cannot be argued, there are never
steven.winik...@concordia.ca   |  enough comfortable chairs."
   |- Fran Leibowitz


Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-22 Thread Steven Winikoff

>Are you saying you received via SMTP a RFC5322 message where there
>was 42027 characters between CR-LF pairs?

I think I might have said that :-/, but whether I did or not you're right
that it isn't what I meant.


>That suggests to me that you in fact received a message that had lines no
>greater than 78 characters between CR-LF pairs, and _after you decoded it_
>it might have had a very long line.

Exactly.

But that's also the situation with the message I received today which
sparked my original question.  That one had only one part, described
with:

   Content-Transfer-Encoding: base64
   Content-Disposition: inline
   Content-Type: text/html;
   charset="UTF-8"
   MIME-Version: 1.0

Before decoding, the body width was 76 characters (some of the headers were
wider, though even those were all under 200 characters wide) -- but when I
tried to decode it, this happened:

mhfixmsg: /tmp/msg, will not decode text/html;  charset="UTF-8" because it is binary (line length > 998)

...but (line length > 998) refers to the decoded text, which really is more
than 998 characters wide.  This is what I was originally asking about (or
trying to :-/, and I apologize for not being clear on that point).


>THAT is completely legal according to the RFCs.  For the most part, it
>doesn't matter what it decodes to; what nmh cares about is that the
>message it is reading is valid according to RFC 5322.  THAT is where the
>998 byte line length limit comes into play.  You could send the entirety
>of "War and Peace" in text/plain part all as one line, and as long as it
>was encoded properly that would be fine.

This suggests to me that removing the 998-character limit in mhfixmsg
(only, and nowhere else) is a reasonable thing to do.

The comment in mhfixmsg which I quoted at the beginning of this thread
seems to be saying that sometimes message components described as text/*
are really binary files, and that the 998-character limit is used in
mhfixmsg (only) as a heuristic to identify this situation.


>>But you're quite right that this code isn't easy to understand.  If I were
>>to modify uip/mhfixmsg.c without touching sbr/m_getfld.c, am I risking
>>anything other than generating messages that nmh won't be able to read?
>
>Good question!  Your use cases seem to be ... well, I don't understand
>them.

That's because I keep being unclear, which in turn is because I don't
know enough to be clearer -- though I'm learning a lot just from this
discussion. :-)

My use case is simply that people keep sending me messages which decode
to HTML with horribly long lines, and I'd prefer to save the decoded text
rather than the encoded version[*].

(Digression:  I'd also prefer to reformat the long lines at the same time.
I'm seriously considering piping the decoded HTML through something like
tidy [ http://www.html-tidy.org/ ] before saving it. :-/)

As it happens, I have 

   mhbuild:  -maxunencoded 900

in my .mh_profile, and have had for a while.

This is a coincidence, in that I was unaware of the 998-character limit
until today, but happily I'm under it anyway. :-)

...so if I were to quote text with wider lines than that the right thing
would happen -- although in practice if I were to quote text with lines that
long, I'd almost certainly run them through fmt first.


>And might I suggest that if you're going to keep asking us questions
>about nmh, you should join the mailing list? :-)

I'd be happy to, as long as it wouldn't be considered as a commitment to
work on the code -- not that I'm opposed to that in principle, but I think
I've already demonstrated I'm not competent to step in and do anything
useful. :-(

The only reason I've been writing to nmh-workers is that I'm unaware
of anywhere else to turn.  Is there a corresponding nmh-users list or
something similar?

 - Steven


[*]
That's because one of the biggest reasons for using nmh, at least for
me, is that it's so useful to be able to manipulate saved email with
standard command-line tools.

For example, I particularly depend on being able to find specific saved
messages using grep or mairix[**] -- and if the message body is saved in
base64 encoding, both of those programs fail completely.


[**]
http://www.rpcurnow.force9.co.uk/mairix/

-- 
___
Steven Winikoff|"Garfield is, for my money at least, the
Concordia University   | shining exemplar of that productive
Montreal, QC, Canada   | laziness that gave us flush plumbing,
steven.winik...@concordia.ca   | clothes washers, dish washers, electric
   | lights, and automated guitar string
   | factories." - Mike Andrews


Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-22 Thread Steven Winikoff

>To answer your larger question (on the subject line):
>
>- MH/nmh doesn't handle lines greater than 998 characters because such
>  messages are not valid according to RFC 5322, and mhfixmsg isn't going
>  to generate a message that nmh cannot handle.  Whether or not nmh SHOULD
>  handle such messages is a different question.

Thank you, that helps.

And I won't presume to suggest what nmh should do, but I will point out
that I recently received a message with a text/html part which was one
single line of 42027 characters.  Clearly there are at least some senders
who have as much respect for RFC 5322 as Microsoft has for standards in
general. :-/

But I'm confused, because I didn't have any problems reading that message.
The structure on it is as follows:

    msg part  type/subtype              size description
      4       multipart/alternative    2213K
        1     multipart/related        2211K
        1.1   text/html                  41K
        1.2   image/jpeg                 28K
        1.3   image/jpeg                 42K
        [...]
        1.33  image/jpeg                 350
        2     text/plain                 808

...and part 1.1 has these headers:

   --Apple-Mail=_7C2BA5CB-FA71-4036-9FAD-C693FF38AF09
   Content-Type: multipart/related;
   type="text/html";
   boundary="Apple-Mail=_B4252506-2E52-4348-A3AD-C92C9A9FBD3D"

   --Apple-Mail=_B4252506-2E52-4348-A3AD-C92C9A9FBD3D
   Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html;
   charset=us-ascii

This part is 670 lines before decoding, and exactly one line afterward.
This arrived before I started using mhfixmsg, but given what I've just
learned I'd certainly expect mhfixmsg to refuse to decode it.


>- The line length limit is imposed by m_getfld(), and that function is ...
>  hairy.  I think changing that might have unexpected consequences; it
>  might be fine, but I don't make any guarantees.  But the fact you said
>  you could "easily modify" it suggests to me that you have not actually
>  LOOKED at the code in question :-)

What I'd looked at was the content_encoding() function in uip/mhfixmsg.c,
where there are a few instances of literal 998 which really would be easy
to change.

You're right that I hadn't looked at the larger context, mostly because
I didn't know there was one.  This is the main reason why I asked before
doing anything.

I just took a quick look at sbr/m_getfld.c.  The first thing that struck me
was this comment at lines 158-163 (of the 1.7 version):

   [...] I considered
   using a Vax "scanc" to locate the end of the field followed by a
   "memmove" but the routine call overhead on a Vax is too large for this
   to work on short names.  If Berkeley ever makes "inline" part of the
   C optimiser (so things like "scanc" turn into inline instructions) a
   change here would be worthwhile.

I'm beginning to get a sense of (and becoming impressed by) just how old
this code base is.

But you're quite right that this code isn't easy to understand.  If I were
to modify uip/mhfixmsg.c without touching sbr/m_getfld.c, am I risking
anything other than generating messages that nmh won't be able to read?

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | Celibacy is hereditary.  If your parents
Montreal, QC, Canada   | didn't have children, chances are you
steven.winik...@concordia.ca   | won't either.


[Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-22 Thread Steven Winikoff

I'm in the middle of integrating mhfixmsg (1.7) into my proxmail setup, and
just discovered this behaviour:

   # mhfixmsg -verbose -textcharset utf8 -fixcte -noreformat -fixboundary -file /tmp/msg -outfile /tmp/501.fm
   mhfixmsg: /tmp/msg, will not decode text/html;  charset="UTF-8" because it is binary (line length > 998)

The comments in the code where this happens are as follows (lines
2118-2123):

   /*
* See if the decoded content is 7bit, 8bit, or binary.  It's binary
* if it has any NUL characters, a CR not followed by a LF, or lines
* greater than 998 characters in length.  If binary, reason is set
*  to a string explaining why.
*/
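The check that comment describes can be approximated as follows. This is an illustrative sketch, not the code in uip/mhfixmsg.c: classify_content() and enum cte_class are hypothetical names, and it assumes that any byte with the high bit set makes otherwise-valid content 8bit.

```c
#include <stddef.h>

enum cte_class { CTE_7BIT, CTE_8BIT, CTE_BINARY };

#define MAX_LINE 998   /* RFC 5322 line limit, not counting the CRLF */

/*
 * Approximation of the check described in the comment: content is
 * binary if it contains a NUL, a CR not followed by LF, or a line
 * longer than 998 characters; otherwise it is 8bit if any byte has
 * the high bit set, else 7bit.  If binary, *reason says why.
 */
static enum cte_class classify_content(const unsigned char *buf, size_t len,
                                       const char **reason)
{
    enum cte_class result = CTE_7BIT;
    size_t linelen = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\0') {
            if (reason) *reason = "contains NUL";
            return CTE_BINARY;
        }
        if (c == '\r' && (i + 1 >= len || buf[i + 1] != '\n')) {
            if (reason) *reason = "CR not followed by LF";
            return CTE_BINARY;
        }
        if (c == '\n') {           /* end of line: reset the counter */
            linelen = 0;
            continue;
        }
        if (c != '\r' && ++linelen > MAX_LINE) {
            if (reason) *reason = "line length > 998";
            return CTE_BINARY;
        }
        if (c & 0x80)
            result = CTE_8BIT;
    }
    return result;
}
```

Run on the decoded text of an HTML body that is one long line, the line-length test alone is enough to return CTE_BINARY, which is the case that triggers the message above.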

I can certainly understand this in the general case, but:

   - The case which tripped this for me specifically involved a text part;
 the headers in the message were:

Content-Transfer-Encoding: base64
Content-Disposition: inline
Content-Type: text/html;
charset="UTF-8"
MIME-Version: 1.0

 The message structure was as simple as it gets:

      # mhlist -file /tmp/msg
       msg part  type/subtype              size description
         0       text/html                  20K

 ...so it was clearly marked as text.  If a sender packages a binary
 file but describes it as text/html, it's already broken, and I really
 don't care if mhfixmsg "damages" it even further. :-/

   - More and more senders these days are using auto-generated HTML in
 which the entire body is a single line of text.  This message wasn't
 even one of those, but the point is that HTML with very long lines
 isn't unusual anymore.

 Instead of leaving it base64-encoded, arguably the right thing to do
 with something like that is to decode it and then run it through an
 HTML pretty-printer, although I acknowledge that that's beyond the
 scope of what mhfixmsg is designed to do. :-)

So I guess what I'm asking is:  I can easily modify my copy of nmh to raise
the 998-character limit, but it's not clear to me what I might break by
doing so.  Would someone please explain what I'm missing here?

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   | "Peter's Principle of Success:
Montreal, QC, Canada   | Get up one more time than
steven.winik...@concordia.ca   | you're knocked down."
   |  - fortune(6)


Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-21 Thread Steven Winikoff

>>#  fixdate -- fix the time zone on a Date: header in an email message
>
>Forgive me if this is a dumb question, but ... why do you care what the
>timezone is in your Date: headers?
>
>If you want them to appear in your local timezone when they are displayed,
>that is trivial to do with mh-format(5)

To begin with, I didn't know that mh-format had a date2local function, so
that would be the main reason why I'm not using it. :-)

But even now that I do know, I still value having the local timezone stored
in the file.  That's because it's not uncommon for me to read an entire
message into a file (for example, when the body of an email message is an
explanation of how to do something and I want to save that explanation for
posterity, including the email headers to show its provenance), and it's
nice to have the local timezone in the file without having to convert it
manually.

I might feel differently if receiving messages from other time zones was
something that happened only once in a while, as in fact it was for most
of my career.

...but a few years ago Concordia moved to Exchange for its central email
system, and that stamps every message which passes through it in UTC.

In general I very much favour the principle of storing times in UTC and
converting to local time for display, but this (for me, at least) is an
exception.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   |  Today is a good day for making decisions.
Montreal, QC, Canada   |  
steven.winik...@concordia.ca   |  ...or is it?


Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-20 Thread Steven Winikoff

>So it seems as if your method of storing David's patch, and of quoting
>his email to reply to it, have both turned «'\''» into «'''».

That's right.


>Hopefully, this is some home-brew script rather than core nmh, but if
>it's the latter then we'd like to know.  :-)

It's nothing to do with nmh, and coincidentally I discovered the problem
myself about three days ago during an email discussion of mounting CIFS
shares on Linux.

The culprit was the appended shell script, shown here in its fixed version.
Specifically, the read statements weren't using the -r option.

My signature quotes are chosen at random from a collection, including the
one on this message.  But sometimes the random choice throws up something
appropriate. :-)

 - Steven


8<-   cut here   >8
#!/bin/sh
#
#  fixdate -- fix the time zone on a Date: header in an email message
#
#  Steven Winikoff 2012/12/12
#
#  it's annoying to view Date: headers marked in a different time zone;
#  that annoyance wasn't important in a world where invalid time zones were
#  infrequent, but EVERY SINGLE MESSAGE from Concordia's new Exchange
#  servers is stamped in UTC :-(
#
#  usage:  fixdate < message
#
#where standard input is the mail message to be fixed; the (possibly
#modified) message will be echoed to standard output
#
#  exit status:  0 if the date was modified, or 1 otherwise
#
#--
#  helper function:  when are two date strings equal? :-)
#
# normally this would be just a simple text comparison, but some
# systems present date stamps such as this one:
#
#Mon, 4 Jun 2012 14:24:06 -0400
#
# this gets canonicalized by /bin/date as follows:
#
#Mon, 04 Jun 2012 14:24:06 -0400
#
# ...and we don't want to bother rewriting the date in this case, so
# detect and eliminate it:

samedate()
{
   #-- dispose of the simplest case first :-)

   [ "${1}" = "${2}" ] && return 0


   #-- the next simplest case occurs when the original date has a suffix;
   #   for example, "Tue, 26 Jun 2012 01:13:26 -0400 (EDT)", which should
   #   be treated as equal to "Tue, 26 Jun 2012 01:13:26 -0400"

   truncated="`echo \"${1}\" | cut -c1-31`"

   [ "${truncated}" = "${2}" ] && return 0


   #-- if the day number has no leading zero, these dates are definitely
   #   different:

   possible_zero="`echo \"${2}\" | cut -c6`"
   [ "${possible_zero}" = "0" ] || return 1


   #-- if we're still here, these dates may be identical except for the
   #   leading zero:

   rest="`echo \"${2}\" | cut -c1-5,7-`"
   test "${1}" = "${rest}"
}


#--
#  process message headers, one line at a time:

IFS='
'
status=1   #-- becomes 0 once a Date: header has actually been rewritten

while read -r line
do
   #-- have we reached the end of the headers yet?

   if [ -z "${line}" ]
   then
      echo
      break
   fi


   #-- if we're here, this line is a header:

   start="`echo \"${line}\" | cut -c1-6`"
   if [ "${start}" != "Date: " ]
   then
      #-- not a Date: header, so just blat to standard output:

      printf '%s\n' "${line}"
   else
      #-- convert to our time zone:

      old="`echo \"${line}\" | unqp | sed 's/^Date: //'`"
      new="`date -d \"${old}\" -R`"

      if [ -z "${new}" ] || samedate "${old}" "${new}"
      then
         #-- unparseable, or already correct:

         printf '%s\n' "${line}"
      else
         #-- use the new date, but keep the old one also:

         echo "Date: ${new}"
         echo "X-Original-Date: ${old}"
         status=0
      fi
   fi
done


#--
#  now emit the body unchanged; cat (unlike a read loop) also keeps a
#  final line that has no trailing newline:

cat

exit ${status}
8<-   cut here   >8
-- 
___
Steven Winikoff| "I really hate this dumb machine; I wish
Concordia University   |  that they would sell it.  It never does
Montreal, QC, Canada   |  quite what I mean, but only what I tell
steven.winik...@concordia.ca   |  it!"
   |- fortune(6)

-- 
Nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: [Nmh-workers] rcvdist with non-default port

2018-01-18 Thread Steven Winikoff

>I'm a little surprised, I thought that mhstore would store the HTML without
>any modification because it just copies the bytes.

It does.


>the original message to verify?  I just tried it on a message here and the
>HTML content was preserved, even an img tag.

Your timing is good:  I just discovered my mistake about a minute before
you sent this. :-)

It turns out that the extracted HTML does contain an <img> tag -- it's
just that I'd missed it because I was searching for ".png" in the source,
and in fact the tag looks like this:

   

Sure enough, the attachment containing the image has these headers:

   Content-Type: image/png
   Content-Transfer-Encoding: base64
   Content-ID: 

Needless to say, until now I'd never seen "src=cid:" in an <img> tag, and
hadn't known that was possible.

So I do have everything I need, I just need to put it together.
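One way to "put it together", sketched under the assumption that the HTML part and the image have already been stored as files: a src="cid:..." reference names the Content-ID of another MIME part (without the < > that surround it in the Content-ID: header), so replacing each cid: reference with the stored image's filename lets a browser load the image locally. The function name, Content-ID and filename below are hypothetical; awk's index() is used so the Content-ID is matched literally rather than as a regular expression.

```shell
#!/bin/sh
# rewrite_cid CONTENT_ID FILENAME < part.html > part-local.html
#
# Hypothetical helper: replaces every literal "cid:CONTENT_ID" in the HTML
# read from standard input with FILENAME.  CONTENT_ID is the Content-ID
# header value with its surrounding < > removed.  Note that awk -v
# interprets backslash escapes, so values containing \ would be altered.
rewrite_cid() {
    awk -v from="cid:$1" -v to="$2" '
    {
        out = ""
        # index() is a plain substring search, so "." or "$" in the
        # Content-ID are not treated as regex syntax.
        while ((i = index($0, from)) > 0) {
            out = out substr($0, 1, i - 1) to
            $0 = substr($0, i + length(from))
        }
        print out $0
    }'
}

# Example with a made-up Content-ID of <part1.abc@example>:
printf '<img src="cid:part1.abc@example">\n' | rewrite_cid 'part1.abc@example' barcode.png
```

The rewritten file can then be opened in a browser from the directory where the image was stored.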

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "The cure for boredom is curiosity.
Montreal, QC, Canada   |  There is no cure for curiosity."
steven.winik...@concordia.ca   |
   |- Dorothy Parker

Re: [Nmh-workers] rcvdist with non-default port

2018-01-18 Thread Steven Winikoff

>Did you know you can run mhstore(1) under another name, e.g. with a
>symlink, and it uses that to look up the .mh_profile entries.  You could
>have a second ...-show-text/html definition in the normal $MH.

No, I didn't know that.  Thank you for pointing it out!

I'm running into another roadblock with the HTML+images viewer that I'm
trying to put together.

My sample HTML message for this has a single image, and when I read it in
an IMAP client such as K-9 Mail (an Android app which I occasionally use
on my phone), the page comes up with the image in place.

When I extract the HTML portion and attachment from the message with
mhstore, I do get both components, but the HTML portion no longer contains
an <img> tag to load the image.  Do you have any advice on how to deal with
this?[*]

  Thanks,

 - Steven


[*]
My test message is an example of the main reason why I care about
viewing HTML messages with images intact.

Specifically, the message is a notice from Canada Post that a package
is available to be picked up at my local post office, and the image
is a barcode of the tracking number.

The message text also contains the tracking number, and it's possible
to collect the package without the barcode, but in that case the
counter clerk has to key in the tracking number manually.  I'm trying
to be kind to them by printing the message with the barcode intact,
and I'd prefer not to have to do that by opening the message in some
other mail client.

-- 
_______
Steven Winikoff|
Concordia University   | "In theory, there is no difference
Montreal, QC, Canada   |  between theory and practice.  In
steven.winik...@concordia.ca   |  practice, there is."
   |- Chuck Reid

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-15 Thread Steven Winikoff

>would a new .mh_profile entry that gave arguments for /usr/bin/tr do the 
>trick? that is, could you as a user do the #,! -> _ or even "-d #!" so 
>that the default would be no editing

Yes, that would work for me.  I'm not sure what the profile entry would look
like, though; would the right hand side be an actual tr command?  If not,
how would nmh parse the entry?

 - Steven
-- 
_______
Steven Winikoff| "I really hate this dumb machine; I wish
Concordia University   |  that they would sell it.  It never does
Montreal, QC, Canada   |  quite what I mean, but only what I tell
steven.winik...@concordia.ca   |  it!"
   |- fortune(6)

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-15 Thread Steven Winikoff

>The SECOND thing is we now have the ability to place MIME parameters into
>some of those command strings, which are from email messages, which is
>where things are "interesting".  We don't normally do that in anything we
>distribute, I think, but here we have a user that did.

I think this is the key observation, and it stems from the fact that the
original MH predated MIME.

I don't know how most nmh users handle incoming attachments, and part of
my problem is that this isn't really documented anywhere.  MIME handling
improved significantly in 1.6 and even more so in 1.7, but almost all the
online documentation I can find is for 1.4 or older (and in most cases,
*much* older, as in MH 6.8!).

What I'm trying to accomplish is what IMAP provides by default, namely the
ability to see the same messages with the same attachments from more than
one place.  If nmh could adapt to using maildir format all my problems
would just disappear, since there are IMAP servers which also understand
that format -- but that's an entirely different can of particularly ugly
worms, and I'm no more inclined to try to open it than I imagine you are.

But that leaves me wanting to be able to open attachments in MH-formatted
messages from multiple systems, and as of this minute I have something
that already does about 98% of what I'm looking for (and the last 2% is
irrelevant to this discussion, so I won't go into that here).

It's just that what I'm doing works better if I can extract the original
filename for a given attachment, and as you point out that's exactly where
the fun starts.


>My proposal is to simply edit out shell metacharacters (add # and ! like
>David suggested) in those strings.  That seems simple and reasonable to
>me.  Well, maybe replace them with an _ or something.

For what it's worth I'd prefer the "replace them with _" option, but even
without it this would do what I'm looking for.
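For illustration, the "replace them with _" option could be sketched in shell as below. The set of characters treated as special is an assumption (it includes the # and ! mentioned above, plus the quote and parentheses from the filename that started this thread); it is not the list nmh actually uses, and sanitize_name is a hypothetical name.

```shell
#!/bin/sh
# Characters treated as shell metacharacters in this sketch (assumed list):
#   ` $ & ; | < > ( ) * ? # ! ~ { } " ' \
# The '"'"' sequence embeds a single quote; tr reads \\ as one backslash.
sh_meta='`$&;|<>()*?#!~{}"'"'"'\\'

# sanitize_name NAME: print NAME with every character in $sh_meta replaced
# by "_".  '[_*]' asks tr to repeat "_" as often as needed to match the
# length of the first set.
sanitize_name() {
    printf '%s' "$1" | tr "$sh_meta" '[_*]'
}

sanitize_name "SEAO - Résultats d'ouverture (002).pdf"
echo
```

Spaces and non-ASCII letters pass through untouched, so the saved name stays recognizable while nothing in it can be interpreted by /bin/sh.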

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   |  Don't use no double negatives.
Montreal, QC, Canada   |
steven.winik...@concordia.ca   |

Re: [Nmh-workers] rcvdist with non-default port

2018-01-15 Thread Steven Winikoff

>Steven, if you haven't been building from git and want to give it a try,
>please let us know.  Thanks again for reporting it.

I haven't been building from git, but an hour ago I backported your patch
into 1.7's rcvdist and tested that.  It seems to be working perfectly.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "If you're not part of the solution,
Montreal, QC, Canada   |  you're part of the precipitate."
steven.winik...@concordia.ca   |
   | - Steven Wright

Re: [Nmh-workers] rcvdist with non-default port

2018-01-15 Thread Steven Winikoff

>All of these things are DOABLE, it's just more complicated than it seems
>at first glance, and working it all out requires some careful thought.
>Welcome to programming nmh! :-/

I guess this is why they say that confidence is the feeling you have before
you understand the problem. :-/

Thank you for taking the time to explain all of that.

 - Steven
-- 
_______
Steven Winikoff| "43rd Law of Computing:
Concordia University   |  Anything that can go wr
Montreal, QC, Canada   |  Segmentation violation -- Core dumped"
steven.winik...@concordia.ca   |
   |   - fortune(6)

Re: [Nmh-workers] rcvdist with non-default port

2018-01-15 Thread Steven Winikoff

>post(8) not reading the profile was a long-standing deliberate design
>decision, and the way the code is implemented it's not possible to
>distinguish between "switches from the profile" and "switches from the
>command line".

This is where it shows that you know the code base and I don't. :-)

Although I don't fully understand this; post already accepts -port on
the command line, the problem is (just) that I can't take advantage of
that because I'm not running post directly.


>Putting things in mts.conf is also a pain; we never really figured out
>a reasonable syntax for port number specification

I don't understand why the syntax would be difficult, though that's
probably only because I'm not familiar with the issues involved.

Still, from the perspective of an outsider who may be unaware of an
obvious reason why this would be a bad idea, what I'd propose is:

   - the (new, optional) mts.conf entry would be specified as

        port:  NNN

     for some integer NNN

   - the entry would be ignored unless mts has a value for which a port
     number is appropriate

   - the specified port number would replace 587 as the default value for
     post, to be overridden if -port NNN is supplied on the command line
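The precedence proposed above can be sketched as a small shell function. This is purely hypothetical: nmh's mts.conf has no "port:" entry today, resolve_port is an invented name, and treating "mts: smtp" as the only setting for which a port applies is an assumption.

```shell
#!/bin/sh
# resolve_port CONF [CLI_PORT]
#
# Hypothetical precedence: a -port value from the command line wins; else a
# "port:" entry in CONF, honoured only when CONF says "mts: smtp"; else 587.
resolve_port() {
    conf=$1
    cli=${2-}
    if [ -n "$cli" ]; then
        echo "$cli"
        return
    fi
    mts=$(sed -n 's/^mts:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf" | tail -n 1)
    port=
    if [ "$mts" = smtp ]; then
        port=$(sed -n 's/^port:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$conf" | tail -n 1)
    fi
    echo "${port:-587}"
}

cfg=$(mktemp)
printf 'mts: smtp\nservers: localhost\nport: 25\n' > "$cfg"
resolve_port "$cfg"          # prints 25 (from the file)
resolve_port "$cfg" 2525     # prints 2525 (command line wins)
rm -f "$cfg"
```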


>Since rcvdist is deficient here, seems to me the right answer is to fix
>that.

I agree.  My point was just that there seems to be some disagreement about
how best to do that, and that it might be nice to be able to take the time
to discuss it thoroughly enough to reach a consensus.

But yes, fixing rcvdist properly (for an agreed-upon value of 'properly' :-)
in time for 1.7.1 would be preferable.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "Any teacher who _can_ be replaced by a
Montreal, QC, Canada   |  machine, _should_ be."
steven.winik...@concordia.ca   |
   | - Arthur C. Clarke

Re: [Nmh-workers] rcvdist with non-default port

2018-01-15 Thread Steven Winikoff

>> My immediate problem could be solved by having post check for a -port
>> switch (with value :-) in .mh_profile;
>
>post doesn't read the user's profile,

I know that (though only because you mentioned it in a previous message
yesterday), but...


>so that wouldn't work.

...the intent of my question was about how difficult it would be to change
that and have post read the profile, if only for that one entry.  Or might
it be easier to add a port entry to mts.conf, to complement the mts and
servers entries?

This is in the spirit of a workaround, even if the only reason for doing it
would be to delay having to fix rcvdist until after 1.7.1.

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   |  Boren's Laws:
Montreal, QC, Canada   | (1) When in charge, ponder.
steven.winik...@concordia.ca   | (2) When in trouble, delegate.
   | (3) When in doubt, mumble.

Re: [Nmh-workers] rcvdist with non-default port

2018-01-15 Thread Steven Winikoff

>But if the user wants to only pass a switch argument to post that does NOT
>take an argument, it's not possible from them to get it right.

Just a thought, but...

My immediate problem could be solved by having post check for a -port
switch (with value :-) in .mh_profile; if doing that wouldn't be too
difficult, would it reduce the urgency of fixing rcvdist and therefore
allow time to decide how to do that in the best possible way?

 - Steven
-- 
___
Steven Winikoff| "You can leave in a taxi.  If you can't
Concordia University   |  get a taxi, you can leave in a huff.
Montreal, QC, Canada   |  If that's too soon, you can leave in a
steven.winik...@concordia.ca   |  minute and a huff."
   |  - Groucho Marx

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-14 Thread Steven Winikoff

>I'm wondering if this is the correct approach.
>
>It seems kind of fragile to me to try quoting these characters, assuming
>we are passing the entire line for mhshow entries to /bin/sh -c, since
>we don't have any idea what that command line looks like

I'm not up to speed on the code in nmh (other than having looked at just
enough of mhshowsbr.c to have proposed the parentheses patch in the first
place).

...but my experience working with /bin/sh in other matters over the years
suggests that the safest thing to do is always to quote shell metacharacters
you aren't deliberately intending to interpret.


>(although ...  I don't think I really understand why Steven is using
>%{name},

I have a script which, given the message part corresponding to an
attachment, copies that attachment to a known directory on the machine
whose console I'm sitting at (which may or may not be the same machine
where nmh is running).

It's certainly possible to use the basename constructed by mhshow, but
I find it more useful to save the attachment under its real name if/when
that name can be determined.  That's why I want the value of %{name} here.

For years I was using single quotes around the values of all of the mhshow
escapes in my .mh_profile, but I recently learned that's not supposed to
be necessary.

...but whether I used single quotes or not, some filenames were causing
problems for me.  That's where this whole discussion began, since the
problem presented itself as an error interpreting ( and ) in the filename.
David convinced me that double-quoting %{name} accomplishes the same goal
as my proposed patch, which is therefore unnecessary.

However, there's certainly still some unexpected behaviour going on; when
I run "%{name}" through an RFC-2047 decoder (using David's suggested usage
of fmttest, or with a standalone python script I tried earlier today), the
entire string passed into my script is single-quoted even though the quote
marks aren't part of the decoded filename.

Whether this (or anything else I may not have run into :-) is actually
a problem which needs to be solved is something I'll leave to you and
others who know the code better than I do.


>I really think to be safe we should simply replace any shell
>metacharacters for those things, because I can imagine some nasty
>security holes that we might encounter.

That's a stronger version of what I was trying to say above. :-)

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "This sentence contradicts itself;
Montreal, QC, Canada   |  well, no, actually it doesn't."
steven.winik...@concordia.ca   |
   |- Douglas Hofstadter

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-14 Thread Steven Winikoff

>That's not right, it should be:
>
>while ((pp = strchr (pp, ''')) && buflen > 3) {

That's what I thought based on your patch.

But it was only after I sent my last message that I noticed the first line
of your patch file:

   diff --git a/uip/mhshowsbr.c b/uip/mhshowsbr.c

...which suggests you really are looking at a newer copy of the source
than I am.


>Seems to me we had another problem with botched patches
>recently.  At this point, I'd say let's not bother with
>it.

No problem.  Thanks for all your help!

 - Steven
-- 
_______
Steven Winikoff| "The reasonable man adapts himself to the
Concordia University   |  world; the unreasonable one persists in
Montreal, QC, Canada   |  trying to adapt the world to himself.
steven.winik...@concordia.ca   |  Therefore all progress depends on the
   |  unreasonable man."
   |  - George Bernard Shaw

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-14 Thread Steven Winikoff

>It's attached to this message.

I got it, but I'm not sure I know what to do with it.

What I did was this:

   % cd /local/pkg/nmh/nmh-1.7
   % patch -p1 < /tmp/qpatch

...but this is what happened:

   patching file uip/mhshowsbr.c
   Hunk #1 FAILED at 980.
   Hunk #2 FAILED at 987.
   Hunk #3 succeeded at 989 (offset -1 lines).
   2 out of 3 hunks FAILED -- saving rejects to file uip/mhshowsbr.c.rej

I see that the first hunk is trying to match on

   while ((pp = strchr (pp, ''')) && buflen > 3) {

...but the corresponding line (line 979, not line 980) in my copy of
uip/mhshowsbr.c is

   while ((pp = strchr (pp, '\'')) && buflen > 3) {

Is it possible that you're starting with a newer version of the source
than I am?

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "My interest is in the future because I
Montreal, QC, Canada   |  am going to spend the rest of my life
steven.winik...@concordia.ca   |  there."
   |  - Charles F. Kettering

Re: [Nmh-workers] rcvdist with non-default port

2018-01-14 Thread Steven Winikoff

>I guess no one has cared up until now.

I find that rcvdist isn't a program I use often, but there are times when
it's exactly what I need.


>I'm not sure if I should thank Steve or curse him for pointing
>this bug out :-)

You're welcome to do both.  I can take a curse or two if it means getting
this fixed. :-)

 - Steven
-- 
_______
Steven Winikoff|"If I traveled to the end of the rainbow
Concordia University   | As Dame Fortune did intend,
Montreal, QC, Canada   | Murphy would be there to tell me
steven.winik...@concordia.ca   | The pot's at the other end."
   |  - Bert Whitney

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-14 Thread Steven Winikoff

>If you want to try the attached patch to mhshowsbr.c,

I'd be happy to, but (ironically in this context :-) I'm not seeing the
attachment.

 - Steven
-- 
___
Steven Winikoff| "Good managers learn to share decisions
Concordia University   |  with others even though they alone must
Montreal, QC, Canada   |  accept responsibility for the results."
steven.winik...@concordia.ca   |
   |- fortune(6)

Re: [Nmh-workers] rcvdist with non-default port

2018-01-14 Thread Steven Winikoff

>Wait, 1.6 rcvdist works for you?  It behaves the same as the 1.7 rcvdist
>for me, not passing switch arguments.

Right, but 1.6 post defaults to port 25, so I don't need to pass the port.


>>- If called with -b, it extracts the HTML part to a file and opens that
>>  with a browser.  (Currently I'm doing this by creating a second profile
>>  file with a different value for mhshow-show-text/html, and selecting
>>  that by changing the value of $MH; I consider this to be ugly, but it
>>  works, and it's the only thing I could think of which does.)
>
>Are you using mhshow to store the HTML part?  mhstore should be more direct.
>
>[...]
>
>mhstore -type text/html -type image or something like that?

Thanks!  That's exactly what I'm working on right now. :-)

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   | "It is never too late to be what you
Montreal, QC, Canada   |  might have been."
steven.winik...@concordia.ca   |  - George Eliot

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-14 Thread Steven Winikoff

>That's just where the shell ran into trouble.
>
>The decoded text is:
>
>'SEAO - Résultats d''ouverture (002).pdf

Yes, I see how that explains all the symptoms.  Thank you for being patient
enough to explain that!


>I'm not sure there's a good way to fix this.  Maybe this?
>
>   'SEAO - Résultats d'"'"'ouverture (002).pdf'

I believe that's the right thing to do, and the only real alternative
to removing the ' characters altogether.
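That quoting generalizes: wrap the whole value in single quotes and turn each embedded ' into '"'"' (end the quoted run, emit a double-quoted single quote, start a new run). A minimal shell sketch with a hypothetical function name; it is not the code mhshow uses. Command substitution strips trailing newlines, so a value ending in a newline would not survive unchanged.

```shell
#!/bin/sh
# quote_for_sh VALUE: print VALUE as a single word that /bin/sh -c will
# read back literally, whatever characters it contains.
quote_for_sh() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\"'\"'/g")"
}

name="SEAO - Résultats d'ouverture (002).pdf"
quote_for_sh "$name"; echo
# -> 'SEAO - Résultats d'"'"'ouverture (002).pdf'

# Round trip through a shell: prints the original filename unchanged.
sh -c "printf '%s\n' $(quote_for_sh "$name")"
```

Because the parentheses end up inside single quotes, this also covers the "syntax error near unexpected token `('" case without escaping them separately.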

Meanwhile, I just ran into an entirely unrelated problem that I hope you
won't mind advising me on.

I don't have access to an SMTP server with an open submission port, so in
1.7 I had to add the '-port 25' option to this .mh_profile entry:

   send:  -alias .aliases -msgid -port 25

This works perfectly.  But last night I tried to use 1.7's rcvdist for the
first time, and ran into this:

   % rcvdist smw < ~/Mail/inbox/18
   post: problem initializing server; [RPLY] 530 5.7.0 Authentication required
   /local/pkg/nmh/root-nmh-1.7/bin/post: exit 1

So of course I added

   rcvdist:  -port 25

But that doesn't help, or at least it doesn't help enough:

   % rcvdist smw < ~/Mail/inbox/18
   post: missing argument to -port
   /local/pkg/nmh/root-nmh-1.7/bin/post: exit 1

I tried adding 

   post:  -port 25

in addition, but that resulted in exactly the same error message.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "It is easier to love humanity than to
Montreal, QC, Canada   |  love one's neighbor."
steven.winik...@concordia.ca   |
   |   - Eric Hoffer

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-13 Thread Steven Winikoff

>> The problem was the embedded parentheses, specifically the (002) part of
>> the filename:
>>
>> =?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?=
>
>I still think it's due to the single quote, which confuses the quoting added
>by mhshow.

I understand why that seems likely, especially given the comments in that
part of the code.

But the error message I received specifically mentioned the parentheses, and
while I concede that my patch isn't necessary when putting double quotes
around %{name}, nevertheless that patch did work without the double quotes,
and it did so without touching the actual quote code.

Of course that doesn't necessarily mean I'm right; it only explains why I
think so.


>> This mostly works, but I'm running into quote-handling weirdness.
>
>Maybe the (ab)use of fmttest in the profile was just a bit too fancy.

But fmttest does the right thing outside .mh_profile...

It's not a problem (I just cleaned up the extraneous ' and \ characters
in my mime_handler script).


>> I don't know why I thought these entries are case-sensitive; are they not?
>
>They aren't, we should note that in the map page.  The comparison is done
>with strcasecmp().

Thank you for confirming that.
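Since the comparison is done with strcasecmp(), a key such as mhshow-suffix-application/PDF matches an entry spelled in lower case. A minimal shell sketch of a case-insensitive lookup in a profile-style file, for illustration only: profile_lookup is a hypothetical name, and unlike nmh it ignores continuation lines.

```shell
#!/bin/sh
# profile_lookup KEY FILE: print the value of the first "KEY: value" line in
# FILE, comparing KEY case-insensitively (in the spirit of strcasecmp()).
profile_lookup() {
    awk -v key="$1" '
    {
        i = index($0, ":")
        if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(key)) {
            v = substr($0, i + 1)
            sub(/^[ \t]+/, "", v)    # drop whitespace after the colon
            print v
            exit
        }
    }' "$2"
}

prof=$(mktemp)
printf 'mhshow-suffix-application/pdf: .pdf\n' > "$prof"
profile_lookup mhshow-suffix-application/PDF "$prof"    # prints .pdf
rm -f "$prof"
```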

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "It is easier to fight for principles
Montreal, QC, Canada   |  than to live up to them."
steven.winik...@concordia.ca   |
   |  - Alfred Adler

Re: [Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-13 Thread Steven Winikoff

>Would you consider this?
>
>mhshow-show-application/pdf: %pmime_helper %F %s "%{name}"

Yes, I would.  I just tested it against the unpatched 1.7, and it works.


>That handles the embedded quote, which I think is why mhshow
>doesn't quote the argument correctly.

The problem was the embedded parentheses, specifically the (002) part of
the filename:

=?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?=

But yes, "%{name}" does the right thing with that.


>And it would be nice if nmh could decode the filename, so your
>mime_helper doesn't have to (if it does).

It certainly would. :-)


>This works, though hopefully there's a better way:
>
>mhshow-show-application/pdf: %pmime_helper %F %s `fmttest -raw -format '%(decode{text})' "%{name}"`

I had to revise it just slightly:

   mhshow-show-application/pdf: %pmime_helper %F %s "`fmttest -raw -format '%(decode{text})' \"%{name}\"`"

This mostly works, but I'm running into quote-handling weirdness.

Specifically, if I run the fmttest command directly, I get this:

   % fmttest -raw -format '%(decode{text})' "=?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?="
   SEAO - Résultats d'ouverture (002).pdf

...but in .mh_profile, the same thing results in mime_helper receiving

   'SEAO - Résultats d'\'ouverture (002).pdf

as its third argument.


>(If you want to save a line in your profile, that
>mhshow-suffix-application/PDF line is in mhn.defaults.)

It's there, but as mhshow-suffix-application/pdf

Likewise, so is mhshow-suffix-application/postscript
but not mhshow-suffix-application/PostScript

I don't know why I thought these entries are case-sensitive; are they not?

     - Steven
-- 
___
Steven Winikoff|
Concordia University   | Cheop's Law:
Montreal, QC, Canada   |
steven.winik...@concordia.ca   |Nothing *ever* gets built on schedule
   |or within budget.

[Nmh-workers] proposed patch for shell metacharacter failure in nmh-1.7

2018-01-13 Thread Steven Winikoff

Yesterday I happened to receive an email message with an attachment
described by these headers:

   Content-Type: application/pdf;
   name="=?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?="
   Content-Description: =?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?=
   Content-Disposition: attachment;
   filename="=?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?=";
   size=503419; creation-date="Fri, 12 Jan 2018 12:44:41 GMT"; 
   modification-date="Fri, 12 Jan 2018 12:50:33 GMT"
   Content-Transfer-Encoding: base64

My .mh_profile has these relevant entries:

   mhshow-suffix-application/PDF: .pdf
   mhshow-show-application/pdf: %pmime_helper %F %s %{name}

...where mime_helper is a shell script which opens attachments with the
relevant application when run locally, or copies attachments to a remote
desktop machine and opens them there via ssh.  I'm happy to share it if
anyone's interested, but it's not the point right now.

The point is that the attachment failed to open, with these messages:

   [ part 2 - application/pdf - =?iso-8859-1?Q?SEAO_-_R=E9sultats_d'ouverture_(002).pdf?= 503.5KB  ]
   /bin/sh: -c: line 0: syntax error near unexpected token `('
   /bin/sh: -c: line 0: `mime_helper '/home/smw/Mail/mhshowdVgoi7.pdf' 'pdf'  '=?iso-8859-1?Q?SEAO_-_R=E9sultats_d'\'ouverture_(002).pdf?= "$@"'

The right fix is probably to educate people not to use such abominable
filenames :-), but meanwhile I worked around it as follows:

8<-   cut here   >8
--- mhshowsbr.c.original2017-11-17 10:01:46.0 -0500
+++ mhshowsbr.c 2018-01-13 16:12:53.270723183 -0500
@@ -803,7 +803,7 @@
   char *file, char *buffer, size_t buflen,
   int multipart) {
 int len, quoted = 0;
-char *bp = buffer, *pp;
+char *bp = buffer, *pp, *sp;
 CI ci = &ct->c_ctinfo;
 
 bp[0] = bp[buflen] = '\0';
@@ -975,6 +975,18 @@
bp++;
quoted = 1;
}
+   /* Escape existing parentheses */
+   sp = pp;
+   while (*sp) {
+   if (buflen && ((*sp == '(') || (*sp == ')'))) {
+   len = strlen (sp);
+   memmove (sp + 1, sp, len+1);
+   *sp++ = '\\';
+   buflen--;
+   bp++;
+   }
+   sp++;
+   }
/* Escape existing quotes */
while ((pp = strchr (pp, '\'')) && buflen > 3) {
len = strlen (pp++);
8<-   cut here   >8

I'm passing this on in case this might be considered worth adopting.

I'm not subscribed to this list, so I'd appreciate replies to my personal
address of steven.winik...@concordia.ca

   Thanks,

 - Steven
-- 
___
Steven Winikoff| "Writing is easy; all you do is sit
Concordia University   |  staring at a blank sheet of paper
Montreal, QC, Canada   |  until the drops of blood form on
steven.winik...@concordia.ca   |  your forehead."
   |   - Gene Fowler

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-27 Thread Steven Winikoff

>> As a non-contributor I don't (and shouldn't :-) get a vote here.
>
>Users get votes here.  :-)

I *like* this project. :-)

Seriously, I really do appreciate what you're doing here.  I can't imagine
having to use any other email client.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "A life spent making mistakes is not
Montreal, QC, Canada   |  only more honorable but more useful
steven.winik...@concordia.ca   |  than a life spent doing nothing."
   |   - George Bernard Shaw

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

>So in Steven's case something should be done, because test failures are
>alarming and would probably stop installation, and perhaps cause
>abandonment.  Him not having to work around /nmh suffixes is 1.8.

As a non-contributor I don't (and shouldn't :-) get a vote here.

But for what it's worth I agree with this completely.

 - Steven
-- 
_______
Steven Winikoff| "The reasonable man adapts himself to the
Concordia University   |  world; the unreasonable one persists in
Montreal, QC, Canada   |  trying to adapt the world to himself.
steven.winik...@concordia.ca   |  Therefore all progress depends on the
   |  unreasonable man."
   |  - George Bernard Shaw

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

>Steven wrote:
>
>> I've run into a few issues, most of which are trivial (and which I'll
>> describe below for the sake of completeness),
>
>Thank you for for detail description.  It looks like Ralph has addressed
>the substantive issues.
>
>> The other issues I mentioned are:
>>
>>   - It would be really nice if configure and Makefile.in didn't force a
>> trailing /nmh on the pathnames I supply for libexecdir and sysconfdir.
>
>Forcing them seems to be a common convention.  It does help avoid namespace
>collisions.

Understood, and when nmh is installed under /usr this makes perfect sense.

In my case I was eager to get my hands on 1.7 rather than wait for it to
be packaged by the OS maintainers, and I purposely keep all my locally
installed packages out of any directory at risk of being overwritten in
a future OS upgrade.

In this case that resulted in the

   /big/local/pkg/nmh

directory referred to in a previous message in this thread.  For
completeness, my installation unpacks the source archive into

   /big/local/pkg/nmh/nmh-1.7

...and all of the installed files go into subdirectories of

   /big/local/pkg/nmh/root-nmh-1.7

This also has the (intentional :-) side effect of making it easy for
multiple versions to coexist.  (On one server I still have 1.4 installed
because at least one user preferred to keep using it rather than learn
how things changed in 1.6 :-/).

Under these circumstances the extra nmh subdirectory isn't helpful, which
is why I wanted to avoid using it.

I know I'm in a minority here, which is why I requested a configure option
rather than an outright change.

  Thanks,

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   | "My interest is in the future because I
Montreal, QC, Canada   |  am going to spend the rest of my life
steven.winik...@concordia.ca   |  there."
   |  - Charles F. Kettering

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

>I think it was something obvious.  Can you add `-P' to the pwd below in
>your copy and see if it passes?

Done, and yes, it did.


>I suspect your /big is a symlink.  :-)

Close. :-)  It isn't, but the level below it is.

On some of my systems, rather than try to figure out how much space to
allocate to /home and to locally installed software (which I put in /local
to protect it from future OS upgrades), I create a single partition
spanning everything not used by the OS, with individual directories for
/home, /local and whatever else happens to fit there.

Then I have /local -> /big/local, /home -> /big/home, etc.

(Yes, I know I could just use LVM2, but even that would require some kind
of guess at the initial sizing.)
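For anyone else who trips over this: with a symlinked directory such as /local -> /big/local, the shell's logical pwd reports the path as it was reached, while pwd -P resolves the symlinks, so a test comparing a remembered path against a resolved one can disagree. A small self-contained demonstration in a scratch directory (the directory names only mimic the layout above; nothing here is taken from the nmh test suite):

```shell
#!/bin/sh
# Build a scratch copy of the layout described above: local -> big/local.
base=$(cd "$(mktemp -d)" && pwd -P)   # resolve in case TMPDIR is itself a symlink
mkdir -p "$base/big/local"
ln -s "$base/big/local" "$base/local"

cd "$base/local" || exit 1
logical=$(pwd)       # path as reached: .../local
physical=$(pwd -P)   # symlinks resolved: .../big/local
echo "logical:  $logical"
echo "physical: $physical"

cd / && rm -rf "$base"
```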

 - Steven
-- 
_______
Steven Winikoff|
Concordia University   | "Quidquid latine dictum sit, altum
Montreal, QC, Canada   |  videtur.  (Whatever is said in Latin
steven.winik...@concordia.ca   |  sounds profound.)"
   |- fortune(6)

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

>>   - I've been using mh for decades (literally!), so I no longer remember
>> why I originally chose to configure using --with-hash-backup.
>
>This is now fixed on the master branch, if Ken or David are happy then
>it can get cherry-picked across to branch 1.7-release.
>http://git.savannah.nongnu.org/cgit/nmh.git/commit/?id=47b86722957cca6057bf5fcd07c9d1f01b4516f8

That was fast. :-)

It turns out there are two more such failures in test-mhfixmsg, which I
didn't see yesterday because I wasn't getting that far.  (Yes, this is
intended to imply that your suggested fix of pwd -P worked perfectly.)
Here they are:

   diff: /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/,11: No such file or directory
   
   ./test/mhfixmsg/test-mhfixmsg: test failed, outputs are in /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/,11 and /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/11.original.
   first named test failure: with no options:  checks backup
   FAIL: test/mhfixmsg/test-mhfixmsg

and

   diff: /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/,21: No such file or directory

   ./test/mhfixmsg/test-mhfixmsg: test failed, outputs are in /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/22 and /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/,21.
   first named test failure: -normmproc
   FAIL: test/mhfixmsg/test-mhfixmsg

You may have fixed these already, but I figured I should mention them just
in case.


>>./test/mhfixmsg/test-mhfixmsg: test failed, outputs are in
>>/big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/31 and
>>/big/local/pkg/nmh/nmh-1.7/test/testdir/test-mhfixmsg2494.actual.
>>first named test failure: pass through message with relative folder
>>path with parse error
>>FAIL: test/mhfixmsg/test-mhfixmsg
>>
>> Is this something I can safely ignore?
>
>Well, if you don't use mhfixmsg, then probably.  :-)

Excellent point. :-)

This is where I admit that until yesterday I hadn't known that mhfixmsg
existed.  Now I see it was also included in 1.6, which I've been using
for over three years, but I must not have read its release notes
carefully enough.

Now that I know, I'll probably start using it in future.

 - Steven
-- 
___
Steven Winikoff|
Concordia University   | "The Universe is not only stranger than
Montreal, QC, Canada   |  we imagine; it is stranger than we can
steven.winik...@concordia.ca   |  imagine."
   |- J.B.S. Haldane

[Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

I'm not an nmh contributor, but I'm currently working on installing nmh-1.7
on one of my servers.

I've run into a few issues, most of which are trivial (and which I'll
describe below for the sake of completeness), but the one which may not
be trivial is this:

   # make check
   [...passed tests and benign failures elided...]
   
   *** /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/31   2017-11-25 21:06:40.117262850 -0500
   --- /big/local/pkg/nmh/nmh-1.7/test/testdir/test-mhfixmsg2494.actual   2017-11-25 21:06:40.125262997 -0500
   ***
   *** 1,15 
   - To: recipi...@example.com
   - From: sen...@example.com
   - Subject: mhfixmsg pass through on parse error
   - MIME-Version: 1.0
   - Content-Type: multipart/mixed; boundary="- =_aa0"
   - 
   - --- =_aa0
   - Content-Type: text/plain; charset="iso-8859-1
   - Content-Disposition: attachment; filename="test1.txt"
   - Content-Transfer-Encoding: quoted-printable
   - 
   - This is the=
   -  text/plain part.
   - 
   - --- =_aa0--
   --- 0 
   
   ./test/mhfixmsg/test-mhfixmsg: test failed, outputs are in /big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/inbox/31 and /big/local/pkg/nmh/nmh-1.7/test/testdir/test-mhfixmsg2494.actual.
   first named test failure: pass through message with relative folder path with parse error
   FAIL: test/mhfixmsg/test-mhfixmsg

Is this something I can safely ignore?

The other issues I mentioned are:

  - It would be really nice if configure and Makefile.in didn't force a
    trailing /nmh on the pathnames I supply for libexecdir and sysconfdir.

    This is trivial because it was easy enough to edit both files myself
    to make the change I wanted, but it would be much nicer if there were
    a way to do that by supplying an option to configure.

  - I've been using mh for decades (literally!), so I no longer remember
    why I originally chose to configure using --with-hash-backup.

    Nevertheless I did so, and I have continued to do so ever since for
    the sake of consistency.  This causes three other tests to fail, because
    they hardcode backup filenames using a comma:

    I know these failures are benign, but to be sure I reconfigured without
    --with-hash-backup and ran make check again; in that configuration,
    mhfixmsg/test-mhfixmsg was the only test that failed, but it did still
    fail.

I'm not subscribed to this list, so I'd appreciate replies to my personal
address of steven.winik...@concordia.ca

   Thanks,

 - Steven
-- 
_______
Steven Winikoff| "The reasonable man adapts himself to the
Concordia University   |  world; the unreasonable one persists in
Montreal, QC, Canada   |  trying to adapt the world to himself.
steven.winik...@concordia.ca   |  Therefore all progress depends on the
   |  unreasonable man."
   |  - George Bernard Shaw

Re: [Nmh-workers] possible problem with mhfixmsg in nmh-1.7

2017-11-26 Thread Steven Winikoff

In my previous message, I wrote:

   This causes three other tests to fail, because they hardcode backup
   filenames using a comma:

I then went on to write the rest of the message, and forgot to go back and
list the three tests in question.  I apologize for that oversight, but at
least I can rectify it here:

   rm: cannot remove '/big/local/pkg/nmh/nmh-1.7/test/testdir/,23036.draft.orig': No such file or directory
   FAIL: test/mhbuild/test-forw
   
   [...passed tests and mhfixmsg failure already described elided...]

   mv: cannot stat '/big/local/pkg/nmh/nmh-1.7/test/testdir/Mail/,draft': No such file or directory
   first named test failure: smtp server doesn't support SMTPUTF8
   FAIL: test/post/test-rfc6531
   
   [...passed tests elided...]
   
   ./test/refile/test-refile: refile -nounlink failed
   FAIL: test/refile/test-refile

To recap, these benign failures are all triggered by calling configure with
the --with-hash-backup option.
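
A minimal sketch of how a test could avoid hardcoding the comma, by deriving
the backup name from a prefix instead.  The backup_name function and its
optional prefix argument are hypothetical, not part of the nmh test suite,
and "#" is assumed (as the option name suggests) to be the prefix that
--with-hash-backup selects.

```shell
#!/bin/sh
# Hypothetical helper: build a backup filename from a prefix instead of
# hardcoding ",".  The "#" prefix for --with-hash-backup is an assumption.
set -eu

backup_name() {
  # $1: path of the original file; $2: backup prefix (defaults to ",")
  prefix=${2:-,}
  dir=$(dirname "$1")
  base=$(basename "$1")
  printf '%s/%s%s\n' "$dir" "$prefix" "$base"
}

backup_name Mail/inbox/11        # prints Mail/inbox/,11
backup_name Mail/inbox/11 '#'    # prints Mail/inbox/#11
```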

 - Steven
-- 
___
Steven Winikoff| Zymurgy's Law of Evolving Systems
Concordia University   | Dynamics:
Montreal, QC, Canada   |Once you open a can of worms,
steven.winik...@concordia.ca   |the only way to recan them is
   |to use a larger can.
