Steffen Nurpmeso <stef...@sdaoden.eu> wrote:

> Hello Mr. Werner Fink.
> 
> Dr. Werner Fink wrote in <20181217134450.ga9...@boole.suse.de>:
>  |On Thu, Dec 13, 2018 at 06:06:24PM +0100, Steffen Nurpmeso wrote:
>  |> Werner Fink wrote in <20181213092255.ga12...@boole.suse.de>:
>  ...
>  |>|First of all I had done/added some patches[0] for the old heirloom-mailx,
>  |>|like my MIME autodetection of the encoding of a piped message even within
>  |>|a clean POSIX locale.  Also some extensions for options, mainly
>  |>|the -R option[1] and the handling of mail addresses before and after
>  |>|the options.
>  |> 
>  |> Ah!  This is the first i see.  A couple of years ago i was
>  |> searching around for packager patches, but could not find just
>  |> anything (but Debian, mostly).  If you do not know where to look,
>  |> you are lost.
>  |
>  |Ah ... never seen a mail about s-mailx/s-nail.  And as mailx seems
> 
> Maybe it is sometimes as easy as just kindly asking a question.
> That i have missed then.
> 
>  |not to be actively maintained I have done a lot of patches to keep my users
>  |and customers happy.  The patches have grown with the bug reports ;)
>  |
>  |> So this is where this ends up..   Your usage of -R, that is
>  |> a pity.  We do support Reply-To: when parsing and such, your -R is
>  |> in effect "-S reply_to=arg".
>  |> Note our builtin getopt does not support optional arguments.
>  |> What can be done about this?
>  |
>  |The problem was that this -R option is very old and Gunnar was faster
>  |with his next version, which had its own -R option.  Therefore I used the
>  |glibc feature of getopt(3) with its double colon, which supports an optional
>  |argument for an option, simply so as not to break customers' scripts using
>  |mailx out there.
> 
> I wonder whether eighteen years later that will break anything.
> In the mail to Ralph i "envisioned" -T HEADNAME=ADDRESS, and this
> i think could be done, it would make sense (if ADDRESS is not
> parsed as a list, which it is for -b, -c etc.).  This would give
> you the option to patch in support for -Rarg on SuSE relatively
> easy, but optional arguments as such: better not.
> (I do not know whether it will be -T yet.  Maybe i will even port
> the small GetOpt class and we have long options in January.  Hmm.
> This will weaken your patch capabilities, then.)
> 
>  |>|Also I'd like to ask how to enable RFC4155 as the default, as I can
>  |>|not break mboxes out there.  This was the first thing I saw in my
>  |>|various tests with my test mbox folders around here, and I'd like to
>  |>|avoid that users/customers have to edit their mbox based folders[2]
>  |> 
>  |> Well, i hope we do not break MBOXes!
>  |> The default parser uses the lax POSIX standard rules for MBOX
>  |> parsing (Vol. 3, Shell and Utilities, mailx), and these specify
>  |> (in OUTPUT FILES) "line beginning with From space" ... "one or
>  |> more header lines" ... "empty line" ... "zero or more body lines,
>  |> followed by empty line".
>  |
>  |What I see is
>  |
>  | There are new messages in the error message ring (denoted by ERROR)
>  |   The `errors' command manages this message ring
>  | ERROR# ?
>  |
>  |for several mailboxes which contain mail bodies with a "From " at
>  |the beginning of a line.  AFAICS none of them use a mail address
>  |or a date after this initial "From " but a normal sentence :)
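
For illustration, a minimal Python sketch (not s-nail's actual parser) of why such body lines cause trouble: a splitter that treats every line beginning with "From " as a message separator, as the lax POSIX wording permits, counts a sentence like "From here on ..." as a new message, while a stricter check in the spirit of RFC 4155 (address, then an asctime-style date) does not. The regular expression, the function names and the example address are assumptions made for this example.

```python
import re

# Approximation of an RFC 4155 style separator line: "From ", an address,
# then an asctime-like timestamp.  Hypothetical pattern, illustration only.
STRICT_FROM = re.compile(
    r"^From \S+ +"
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}$"
)

def is_separator_lax(line):
    """Lax rule: any line that begins with 'From ' starts a message."""
    return line.startswith("From ")

def is_separator_strict(line):
    """Stricter rule: 'From ' must be followed by an address and a date."""
    return STRICT_FROM.match(line) is not None

def count_messages(mbox_text, is_separator):
    """Count the lines that the given rule accepts as message separators."""
    return sum(1 for line in mbox_text.splitlines() if is_separator(line))

MBOX = """\
From werner@example.org  Mon Dec 17 13:22:12 2018
Subject: test

From here on the body is a normal sentence.

"""

print(count_messages(MBOX, is_separator_lax))     # 2
print(count_messages(MBOX, is_separator_strict))  # 1
```

The single message is seen as two under the lax rule and as one under the stricter rule.
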
> 
> Yes, bad things happen all the time and most software simply
> neither deals with it nor gives any feedback.  What you see is
> exactly what happened to me, and suddenly mailboxes were just
> plain broken!  I hope we get better.
> As written, the software now has an improved MBOX parser, even in
> non-*mbox-rfc4155* cases.  Thanks for the suggestion!
> 
>  |> Of course the standard just names the lowest common denominator of
>  |> the MBOX mess which started with UNIX v5 iirc.
>  |> The problem is old, there is MBOXO quoting etc., then there was
>  |> Content-Length: and such, which Zawinski had a humble opinion on
>  |> and off[1].
>  |
>  |The usage of the Content-Length: tag would be very perfect
>  |
>  | grep -E ^Content-Length ~/Mail/mailbox | wc -l
>  | 537
>  |
>  |if available.  AFAIK the sendmail as well as the postfix with procmail
>  |do use it (or you have to use -Y for `traditional Berkeley mailbox \
>  |format').
>  |Whereas mail.local(8) from sendmail and local(8) from postfix do use
>  |`traditional Berkeley mailbox format'. 
> 
> mutt(1) updates these when it writes messages, too.
> We actively strip them (except when *keep-content-length* is set,
> at least until v15, as documented), since we are not able to keep
> them up-to-date.  If others do too, they are not reliable.
> 
> I am opposed to those, because to me MBOX is a standardized
> database format (RFC 4155) with standardized content (RFC 5322);
> to make the latter comply with the former you may either need to
> modify the content of the latter (e.g. MBOXO quoting), or apply
> a reversible MIME encoding.  (We only do the latter right now.)
> If proper messages are written you do not need those fields, they
> only waste space; and their info is trivially calculated (on
> the fly).  Whereas, to use them, you need to fully parse the
> headers as you fly by, which is pretty expensive.  Etc. Etc.
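
The "modify the content" option mentioned here can be sketched as follows: MBOXO quoting prefixes every body line that starts with "From " with a ">". This is a Python illustration only, with made-up function names, and it is not what s-nail does (per the paragraph above it only applies a reversible MIME encoding). The sketch also shows why the scheme is not reversible: a line that already read ">From " in the original cannot be told apart from a quoted one afterwards.

```python
def mboxo_quote(body):
    """Prefix body lines that begin with 'From ' with '>' (MBOXO style)."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )

def mboxo_unquote(body):
    """Naive inverse: strip one '>' from lines that begin with '>From '."""
    return "\n".join(
        line[1:] if line.startswith(">From ") else line
        for line in body.split("\n")
    )

original = ("From here on it is plain text.\n"
            ">From was already written like this by the author.")
stored = mboxo_quote(original)
print(stored)
# False: after unquoting, the second line has lost its original '>'.
print(mboxo_unquote(stored) == original)
```
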
> 
>  |> The standard RFC 4155 defines a stricter format which is harder to
>  |> get wrong, but i cannot enable this by default.
>  |> Also, the way we do it is not good.  We should properly MIME
>  |> reencode the entire message, in order not to mangle actual
>  |> content.
>  |
>  |I'm aware of RFC4155 but it would be very good not to send users
>  |back to the bugzilla in a panic over this if procmail was in use.

Considering the current status of procmail, I am surprised that they are
not already in panic mode.

https://marc.info/?l=openbsd-ports&m=141634350915839&w=2

> 
> I have never used procmail, i do not really know what you mean
> here.  You are saying the mailx has always been broken when used
> on mailboxes generated by procmail?
> Our MBOX parser is now further improved, thanks to your
> suggestion.  The next release is planned for 2019-01-11.
> Current state is a bit wild, i need more time, also for new ASAN
> and Coverity.com scans...  (My development box uses musl which
> does not have support for ASAN, so it is not just a quick thing.)
> 
>   ...
>  |> MIME detection.. great..  Well, the character set stuff we do
>  |> differently, automatic character set detection is quite
>  |> complicated stuff if done right[2,3].
>  ...
>  |The autodetection of utf8 is simple and well defined in ISO/IEC 10646
>  |(also shown in the utf8(7) manual page, section Encoding) and it helps a
> 
> The encoding is well defined, yes :)
> It has not changed since Thompson and Pike designed it on
> a napkin, as far as i know.
> 
>  |lot in pure POSIX locale:
>  |
>  | echo \303\266\303\244\303\274
>  | ??????
>  | echo \303\266\303\244\303\274 | mailx -s test wer...@suse.de
>  | echo $?
>  | echo \303\266\303\244\303\274 | .obj/mailx -s test wer...@suse.de
>  | mailx: Cannot find a usable character set to encode message: No such \
>  | entry, file or directory
>  | /suse/werner/dead.letter 3/45
>  | mailx: ... message not sent
> 
> Well, you can always say "-S ttycharset=utf8", as is most often
> used in the manual page, and it works just fine.
> 
> I mean, if the data would start with a UTF-8 encoded Unicode BOM,
> we could possibly force an input character set of UTF-8 (given
> that the Unicode FAQ says something about very high probability
> for this case, iirc, and maybe because it has to), but i would
> feel very uncomfortable to claim a character encoding just because
> some bytes seem to adhere to a coding scheme.
> 
> We could possibly extend our MIME classifier, and when we have
> seen multiple UTF-8 sequences after reading all the body, and if
> and only if the current locale is "C" a.k.a. "POSIX" and if *ttycharset*
> is n_iconv_name_is_ascii(), _then_ we could, instead of using the
> normal *charset-8bit*, go for UTF-8.
> But that would be the absolute maximum.  What do you think?
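
The heuristic proposed here could look roughly like the following Python sketch. Everything in it is an assumption for illustration and none of it is s-nail code: the function name, the threshold of two multibyte sequences for "multiple", and "iso-8859-1" as a stand-in for *charset-8bit*. The normal path is reduced to returning that stand-in.

```python
def pick_charset(body, locale_is_c, ttycharset_is_ascii,
                 charset_8bit="iso-8859-1", min_sequences=2):
    """Return "utf-8" only in the narrow case described above."""
    if not (locale_is_c and ttycharset_is_ascii):
        # Outside the C locale / ASCII ttycharset case nothing changes.
        return charset_8bit
    try:
        # Strict decoding rejects truncated, overlong and otherwise
        # malformed byte sequences.
        text = body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return charset_8bit
    # Each non-ASCII code point corresponds to one multibyte sequence.
    multibyte = sum(1 for ch in text if ord(ch) >= 0x80)
    return "utf-8" if multibyte >= min_sequences else charset_8bit

# The bytes from the example in this thread (UTF-8 for o-, a-, u-umlaut).
print(pick_charset(b"\303\266\303\244\303\274\n", True, True))   # utf-8
# The same letters in ISO-8859-1 are not valid UTF-8.
print(pick_charset(b"\xf6\xe4\xfc\n", True, True))               # iso-8859-1
# Not in the C locale: the heuristic is not applied at all.
print(pick_charset(b"\303\266\303\244\303\274\n", False, True))  # iso-8859-1
```
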
> 
>  |it is a major feature for my users who use mailx for system mails.
> 
> Just say -Sttycharset=utf8 if you really have to run something in
> LC_ALL=C but want to generate the =charset data.  This always
> works, maybe even for ISO-2022-JP.  (Maybe)
> 
> I mean, our content classifier will need to change and work on
> multibytes.  In v15 i will hopefully have some tools of my (until
> now incomplete) Unicode library available, which will then allow
> something like mbtowc() for a character set (*ttycharset*,
> *charset-8bit*, *charset-7bit*), concurrently.
> I.e., converters which work on character sets, not locales.
> If you have to use the standard libraries you are lost.
> 
>  |As the patched mailx does also catch binary code with the correct
>  |
>  | echo ^A^B | mailx -s test wer...@suse.de
>  |
>  |where ^A and ^B are Ctrl-A (SOH) and Ctrl-B (STX) ... this is seen
>  |by mutt as this
>  |
>  | Date: Mon, 17 Dec 2018 14:22:09 +0100
>  | From wer...@suse.de  Mon Dec 17 13:22:12 2018
>  | From: "Dr. Werner Fink" <wer...@suse.de>
>  | To: wer...@suse.de
>  | Subject: test
>  | Return-Path: <wer...@suse.de>
>  | User-Agent: Heirloom mailx 12.5 7/5/10
>  | 
>  | [-- application/octet-stream is unsupported (use 'v' to view this \
>  | part) --]
> 
> I currently have no heirloom-mailx here, but
> 
>   printf '\x01\x02' | s-nail -:/ -Sexpandaddr -s test - > .X
> 
> works like so.
> 
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
