Steffen Nurpmeso <stef...@sdaoden.eu> wrote:
> Hello Mr. Werner Fink.
>
> Dr. Werner Fink wrote in <20181217134450.ga9...@boole.suse.de>:
> |On Thu, Dec 13, 2018 at 06:06:24PM +0100, Steffen Nurpmeso wrote:
> |> Werner Fink wrote in <20181213092255.ga12...@boole.suse.de>:
> ...
> |>|First of all I had done/added some patches[0] for the old heirloom-mailx,
> |>|like my mime autodetection of the encoding of a piped message even within
> |>|a clean POSIX locale. Also some extensions for options, mainly like
> |>|the -R option[1] and the handling of mail addresses before and after
> |>|the options.
> |>
> |> Ah! This is the first i see. A couple of years ago i was
> |> searching around for packager patches, but could not find just
> |> anything (but Debian, mostly). If you do not know where to look,
> |> you are lost.
> |
> |Ah ... never seen a mail about s-mailx/s-nail. And as mailx seems
>
> Maybe it is sometimes as easy as just kindly asking a question.
> That i have missed then.
>
> |to be not actively maintained I had done a lot of patches to get my users
> |and customers happy. The patches have grown with the bug reports ;)
> |
> |> So this is where this ends up.. Your usage of -R, that is
> |> a pity. We do support Reply-To: when parsing and such, your -R is
> |> in effect "-S reply_to=arg".
> |> Note our builtin getopt does not support optional arguments.
> |> What can be done about this?
> |
> |The problem was that this -R option is very old and Gunnar was faster
> |with his next version with an own -R option. Therefore I had used the
> |glibc feature of getopt(3) with its two colons, which support an optional
> |argument for an option. Simply not to break customers' scripts using
> |mailx out there.
>
> I wonder whether eighteen years later that will break anything.
> In the mail to Ralph i "envisioned" -T HEADNAME=ADDRESS, and this
> i think could be done, it would make sense (if ADDRESS is not
> parsed as a list, which it is for -b, -c etc.). This would give
> you the option to patch in support for -Rarg on SuSE relatively
> easily, but optional arguments as such: better not.
> (I do not know whether it will be -T yet. Maybe i will even port
> the small GetOpt class and we have long options in January. Hmm.
> This will weaken your patch capabilities, then.)
>
> |>|Also I'd like to ask how to enable the RFC4155 as default as I
> |>|cannot break mboxes out there. This was the first thing I've seen on my
> |>|various tests with my test mbox folders around here and I'd like to
> |>|avoid that users/customers have to edit their mbox based folders[2]
> |>
> |> Well, i hope we do not break MBOXes!
> |> The default parser uses the lax POSIX standard rules for MBOX
> |> parsing (Vol. 3, Shell and Utilities, mailx), and these specify
> |> (in OUTPUT FILES) "line beginning with From space" ... "one or
> |> more header lines" ... "empty line" ... "zero or more body lines,
> |> followed by empty line".
> |
> |What I see is
> |
> |  There are new messages in the error message ring (denoted by ERROR)
> |  The `errors' command manages this message ring
> |  ERROR# ?
> |
> |for several mailboxes which contain a mail body including a "From "
> |at the beginning of the line. AFAICS none of them use a mail address
> |nor a date after this initial "From " but a normal sentence :)
>
> Yes, bad things happen all the time and most software simply does
> neither deal with it nor give any feedback. What you see is
> exactly what happened to me, and suddenly mailboxes were just
> plain broken! I hope we get better.
> As written, the software now has an improved MBOX parser, even in
> non-*mbox-rfc4155* cases. Thanks for the suggestion!
>
> |> Of course the standard just names the lowest common denominator of
> |> the MBOX mess which started with UNIX v5 iirc.
> |> The problem is old, there is MBOXO quoting etc., then there was
> |> Content-Length: and such, which Zawinski had a humble opinion on
> |> and off[1].
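
For readers following along, the difference between the lax "From " rule and a stricter separator check can be sketched in a few lines of Python. This is only an illustration and not code from s-nail or heirloom-mailx: the regular expression is an assumed approximation of an RFC 4155 style separator (address plus asctime-like timestamp), and the names `STRICT_FROM`, `count_messages` and `mboxo_quote` are made up for this example.

```python
import re

# Assumed approximation of a strict separator line:
# "From " + address + asctime-like timestamp. Illustrative only.
STRICT_FROM = re.compile(
    r"^From \S+@\S+ +"
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}$"
)


def is_separator(line, prev_blank, strict):
    """Decide whether `line` starts a new message."""
    if not prev_blank or not line.startswith("From "):
        return False
    # Lax rule: "From " after an empty line is enough.
    # Strict rule: the rest of the line must look like address + date.
    return bool(STRICT_FROM.match(line)) if strict else True


def count_messages(text, strict):
    """Count message separators in an mbox-like string."""
    count = 0
    prev_blank = True  # the start of the file counts as "after an empty line"
    for line in text.splitlines():
        if is_separator(line, prev_blank, strict):
            count += 1
        prev_blank = (line == "")
    return count


def mboxo_quote(body):
    """MBOXO-style quoting: prefix body lines that start with 'From ' with '>'."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )


SAMPLE = (
    "From werner@example.org Mon Dec 17 13:22:12 2018\n"
    "Subject: test\n"
    "\n"
    "Hello,\n"
    "\n"
    "From now on this is only a normal sentence.\n"
    "\n"
)
```

With the lax rule `count_messages(SAMPLE, strict=False)` sees two messages, because the body sentence starting with "From " follows an empty line; with `strict=True` it sees one. Running the body through `mboxo_quote` before writing avoids the ambiguity at the cost of modifying the content.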
> |
> |The usage of the Content-Length: tag would be perfect
> |
> |  grep -E ^Content-Length ~/Mail/mailbox | wc -l
> |  537
> |
> |if available. AFAIK sendmail as well as postfix with procmail
> |do use it (or you have to use -Y for `traditional Berkeley mailbox \
> |format').
> |Whereas mail.local(8) from sendmail and local(8) from postfix do use
> |`traditional Berkeley mailbox format'.
>
> mutt(1) updates these when it writes messages, too.
> We actively strip them (except when *keep-content-length* is set,
> at least until v15, as documented), since we are not able to keep
> them up-to-date. If others do too, they are not reliable.
>
> I am opposed to those, because to me MBOX is a standardized
> database format (RFC 4155) with standardized content (RFC 5322);
> to make the latter comply with the former you may either need to
> modify the content of the latter (e.g. MBOXO quoting), or apply
> a reversible MIME encoding. (We only do the latter right now.)
> If proper messages are written you do not need those fields, they
> only waste space; and their info is trivially calculated (on
> the fly). Whereas, to use them, you need to fully parse the
> headers as you fly by, which is pretty expensive. Etc. Etc.
>
> |> The standard RFC 4155 defines a stricter format which is harder to
> |> get wrong, but i cannot enable this by default.
> |> Also, the way we do it is not good. We should properly MIME
> |> reencode the entire message, in order not to mangle actual
> |> content.
> |
> |I'm aware of RFC4155 but it would be very fine not to get users
> |in panic back on the bugzilla due to this if procmail was in use.

Considering the current status of procmail, I suppose they are not
already in panic mode.
https://marc.info/?l=openbsd-ports&m=141634350915839&w=2

>
> I have never used procmail, i do not really know what you mean
> here. You are saying the mailx has always been broken when used
> on mailboxes generated by procmail?
> Our MBOX parser is now further improved, thanks to your
> suggestion. The next release is planned for 2019-01-11.
> Current state is a bit wild, i need more time, also for new ASAN
> and Coverity.com scans... (My development box uses musl which
> does not have support for ASAN, so it is not just a quick thing.)
>
> ...
> |> MIME detection.. great.. Well, the character set stuff we do
> |> differently, automatic character set detection is quite
> |> complicated stuff if done right[2,3].
> ...
> |The autodetection of utf8 is simple and well defined in ISO/IEC 10646
> |(also shown in the utf8(7) manual page, section Encoding) and it helps a
>
> The encoding is well defined, yes :)
> It has not changed since Thompson and Pike designed it on
> a napkin, as far as i know.
>
> |lot in pure POSIX locale:
> |
> |  echo \303\266\303\244\303\274
> |  ??????
> |  echo \303\266\303\244\303\274 | mailx -s test wer...@suse.de
> |  echo $?
> |  echo \303\266\303\244\303\274 | .obj/mailx -s test wer...@suse.de
> |  mailx: Cannot find a usable character set to encode message: No such \
> |  entry, file or directory
> |  /suse/werner/dead.letter 3/45
> |  mailx: ... message not sent
>
> Well, you can always say "-S ttycharset=utf8", as is most often
> used in the manual page, and it works just fine.
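
As a side note on what "well defined" buys here: a minimal Python sketch of a strict UTF-8 check on the bytes from the echo example. It is not the heirloom-mailx patch and not s-nail code; `looks_like_utf8` is a hypothetical helper, and a sequence that passes is only very probably UTF-8, since nothing in the bytes themselves states the encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data has at least one non-ASCII byte and is strictly valid UTF-8."""
    if all(b < 0x80 for b in data):
        return False  # pure ASCII: there is nothing to detect
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The octal bytes from the echo example above, plus a newline.
sample = b"\303\266\303\244\303\274\n"

# The same three letters in ISO-8859-1, for comparison.
latin1 = b"\xf6\xe4\xfc\n"

if __name__ == "__main__":
    print(looks_like_utf8(sample))  # valid multibyte sequences
    print(looks_like_utf8(latin1))  # 0xF6 can never start a UTF-8 sequence
```

The check itself is cheap; whether a mailer should act on it when the locale says otherwise is a policy question, which is what the thread is arguing about.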
>
> I mean, if the data would start with a UTF-8 encoded Unicode BOM,
> we could possibly force an input character set of UTF-8 (given
> that the Unicode FAQ says something about very high probability
> for this case, iirc, and maybe because it has to), but i would
> feel very uncomfortable to claim a character encoding just because
> some bytes seem to adhere to a coding scheme.
>
> We could possibly extend our MIME classifier, and when we have
> seen multiple UTF-8 sequences after reading all the body, and if
> and only if the current locale is "C" a.k.a. "POSIX", and if
> *ttycharset* is n_iconv_name_is_ascii(), _then_ we could, instead
> of using the normal *charset-8bit*, go for UTF-8.
> But that would be the absolute maximum. What do you think?
>
> |it is a major feature for my users who use mailx for system mails.
>
> Just say -Sttycharset=utf8 if you really have to run something in
> LC_ALL=C but want to generate the =charset data. This always
> works, maybe even for ISO-2022-JP. (Maybe)
>
> I mean, our content classifier will need to change and work on
> multibytes. In v15 i will hopefully have some tools of my (until
> now incomplete) Unicode library available, which will then allow
> something like mbtowc() for a character set (*ttycharset*,
> *charset-8bit*, *charset-7bit*), concurrently.
> I.e., converters which work on character sets not locales.
> If you have to use the standard libraries you are lost.
>
> |As the patched mailx does also catch binary code with the correct
> |
> |  echo ^A^B | mailx -s test wer...@suse.de
> |
> |where ^A and ^B are Ctrl-A (SOH) and Ctrl-B (STX) ... this is seen
> |by mutt as this
> |
> |  Date: Mon, 17 Dec 2018 14:22:09 +0100
> |  From wer...@suse.de Mon Dec 17 13:22:12 2018
> |  From: "Dr. Werner Fink" <wer...@suse.de>
> |  To: wer...@suse.de
> |  Subject: test
> |  Return-Path: <wer...@suse.de>
> |  User-Agent: Heirloom mailx 12.5 7/5/10
> |
> |  [-- application/octet-stream is unsupported (use 'v' to view this \
> |  part) --]
>
> I currently have no heirloom-mailx here, but
>
>   printf '\x01\x02' | s-nail -:/ -Sexpandaddr -s test - > .X
>
> works like so.
>
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)