Re: [nmh-workers] Stupid 'pick' question...
Hi Bakul,

> The reason being the body was MIME encoded because of UTF-8 even
> though it is plain text.

I think that's why David and others run all their emails on delivery
through mhfixmsg(1)'s `-decodetext'.

> Content-Transfer-Encoding: base64

Which of us hasn't worked out the possible encodings and searched for
those?  :-)

    $ for p in '' _ __; do base64 <<<${p}foobar; done
    Zm9vYmFyCg==
    X2Zvb2Jhcgo=
    X19mb29iYXIK
    $
    $ for p in nul u to tri; do
    >     base64 <<<${p}foobarxyzzy | egrep 'Zm9vYmFy|Zvb2Jhc|mb29iYX'
    > done
    bnVsZm9vYmFyeHl6enkK
    dWZvb2Jhcnh5enp5Cg==
    dG9mb29iYXJ4eXp6eQo=
    dHJpZm9vYmFyeHl6enkK
    $

-- 
Cheers, Ralph.

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
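The trick in the message above works because base64 packs three input bytes into four output characters, so a search term can appear in one of three encoded forms depending on its byte offset modulo 3. A minimal sketch of deriving the three stable substrings for any term follows. The function name `b64_patterns` is made up for illustration, and the sketch assumes a base64(1) command on the PATH.

```shell
# Print the three base64 substrings a term can encode to, joined with '|'.
# b64_patterns is a hypothetical helper, not part of nmh.
b64_patterns() {
    term=$1
    len=$(printf '%s' "$term" | wc -c)      # length in bytes
    out=
    for k in 0 1 2; do
        # k padding bytes in front; "skip" is how many leading output
        # characters depend on the padding and must be dropped.
        case $k in
            0) pad='';   skip=0 ;;
            1) pad='_';  skip=2 ;;
            2) pad='__'; skip=3 ;;
        esac
        n=$((k + len))
        keep=$((n * 8 / 6))                 # characters fully determined by our bytes
        enc=$(printf '%s%s' "$pad" "$term" | base64 | tr -d '\n')
        part=$(printf '%s' "$enc" | cut -c "$((skip + 1))-$keep")
        out="${out}${out:+|}${part}"
    done
    printf '%s\n' "$out"
}

b64_patterns foobar
```

For `foobar` this prints `Zm9vYmFy|Zvb2Jhc|mb29iYX`, the same alternation used with egrep above. Two caveats: base64 output may contain `+`, which needs escaping in an extended regular expression, and an encoded body is wrapped into lines, so a term that straddles a line break will still be missed.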
Re: [nmh-workers] Stupid 'pick' question...
On 12 Jun 2019 08:46:43 -0600 "Andy Bradford" wrote:
> [ part - text/plain - 577B ]
> Thus said "Valdis Kl?tnieks" on Sat, 08 Jun 2019 21:26:46 -0400:
>
> > In a world of Microsoft Office attachments, is having -search go
> > through the body by default as well still a good idea? Maybe having a
> > separate -searchbody would be better?
>
> Hard to say what it *should* be. In my environment, the majority of
> messages that I use -search with do have searchable content in the body
> of the message, so I've never really been concerned about it.
>
> Andy
> --
> TAI64 timestamp: 40005d011078

It's funny you say that! I did

    pick -search Micro cur

"cur" being your message, and the search failed. The reason being the
body was MIME encoded because of UTF-8 even though it is plain text.
Here is a sample of what you see with show -noshowproc:

    Cc: nmh-workers@nongnu.org
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    Errors-To: nmh-workers-bounces+bakul=bitblocks@nongnu.org
    Sender: "nmh-workers"

    VGh1cyBzYWlkICJWYWxkaXMgS2w/dG5pZWtzIiBvbiBTYXQsIDA4IEp1biAyMDE5IDIxOjI2OjQ2
    IC0wNDAwOgoKPiBJbiAgYSB3b3JsZCAgb2YgTWljcm9zb2Z0ICBPZmZpY2UgIGF0dGFjaG1lbnRz
    LCBpcyAgaGF2aW5nIC1zZWFyY2ggIGdvCj4gdGhyb3VnaCB0aGUgYm9keSBieSBkZWZhdWx0IGFz
    IHdlbGwgIHN0aWxsIGEgZ29vZCBpZGVhPyBNYXliZSBoYXZpbmcgYQo+IHNlcGFyYXRlIC1zZWFy
    Y2hib2R5IHdvdWxkIGJlIGJldHRlcj8KCkhhcmQgdG8gIHNheSB3aGF0ICBpdCAqc2hvdWxkKiAg
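To see why the search above fails and what decoding would change, here is a rough sketch that greps the decoded body of a message like the sample. It assumes a single-part message whose whole body is base64, and a base64(1) that accepts `-d` (the GNU coreutils spelling). `body_matches` is a hypothetical helper name, not an nmh command.

```shell
# Succeed if pattern $1 occurs in the base64-decoded body of message file $2.
# Assumes the entire body (everything after the first blank line) is base64.
body_matches() {
    sed '1,/^$/d' "$2" |        # drop the header, up to the first blank line
        base64 -d 2>/dev/null | # decode the body
        grep -q -- "$1"         # search the decoded text
}

# Usage, from inside the folder directory:
#   body_matches Micro "$(mhpath cur)" && echo "matched after decoding"
```

This handles neither multipart messages nor quoted-printable; it only illustrates that the text is findable once the transfer encoding is undone.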
Re: [nmh-workers] Stupid 'pick' question...
Thus said "Valdis Kl?tnieks" on Sat, 08 Jun 2019 21:26:46 -0400:

> In a world of Microsoft Office attachments, is having -search go
> through the body by default as well still a good idea? Maybe having a
> separate -searchbody would be better?

Hard to say what it *should* be. In my environment, the majority of
messages that I use -search with do have searchable content in the body
of the message, so I've never really been concerned about it.

Andy
-- 
TAI64 timestamp: 40005d011078
Re: [nmh-workers] Stupid 'pick' question...
>The bigger issue is that -search seems to blindly match the
>msg body. It doesn't even do mime decoding. So for example the
>body of my prev msg to nmh is base64 encoded (even though it
>is plain text) and -search fails. Unless I specify a search
>pattern of encoded text such as 9uIEJpZyBlbmRpYW!

pick(1) not doing MIME decoding is clearly a bug, and as far as I know
there has been no disagreement on that. The plan is to fix that in the
Great Mime Rewrite. Someday.

--Ken
Re: [nmh-workers] Stupid 'pick' question...
On Jun 9, 2019, at 1:49 AM, Ralph Corderoy wrote:
>
> Hi Bakul,
>
>> So pick runs -search on header lines as well as the body a header
>> specific option is only run against headers.
>
> pick(1):
>
>     This means that the pattern specified for a -search will be found
>     everywhere in the message, including the header and the body, while the
>     other pattern matching requests are limited to the single specified
>     component.
>
>> And pick matches header line *after* line folding, as tested with
>> Received:
>
>     Pattern matching is performed on a per-line basis. Within the header
>     of the message, each component is treated as one long line, but in the
>     body, each line is separate.
>
>> Conclusion: the man page is not quite accurate! But that is probably
>> ok.
>
> Based on the above extracts, it doesn't seem too bad.
> Is there something in particular you think's missing?

No, it is quite accurate. Thanks for doing a better job of reading than me!
Re: [nmh-workers] Stupid 'pick' question...
Hi Valdis,

> In a world of Microsoft Office attachments, is having -search go
> through the body by default as well still a good idea? Maybe having a
> separate -searchbody would be better?

I think -search should be left alone, but there's previous discussion on
this list about a -header and -body that do -search's work but just on
the appropriate `half'.

If you're doing it a lot and many of the emails have large bodies then
a copy of the folder with all bodies deleted would let you search just
the headers and come up with the same message numbers for the original
folder.

Alternatively, use egrep(1) to do the initial filtering on just the
headers, avoiding paging in all those bodies.

    shopt -s extglob
    cd `mhpath`
    egrep -im1 '^$|from.*ralph' [1-9]*([0-9]) |
    sed -n 's/:..*//p' |
    awk '1; END {if (!NR) print 0}' |
    xargs -r pick -list -from ralph

-- 
Cheers, Ralph.
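The first two stages of that pipeline are worth unpacking. With `-m1`, grep stops at the first line per file that is either the blank line ending the header or a header line matching the pattern. The sed stage then keeps only file names whose reported match is non-empty, which means the header line was reached before the blank line. A small sketch under those assumptions follows. `header_hits` is an illustrative name, and `-m` is an extension found in GNU and BSD grep, not POSIX.

```shell
# List the message files whose header (not body) has a line matching ERE $1.
# header_hits is a hypothetical wrapper around the egrep | sed idea above.
header_hits() {
    pat=$1
    shift
    # /dev/null forces the "file:" prefix even when one file is given.
    grep -E -i -m1 -- "^\$|$pat" "$@" /dev/null |
        sed -n 's/:..*//p'   # "N:" alone (blank line hit first) is discarded
}

# Usage, inside a folder directory:
#   header_hits 'from.*ralph' [1-9]*
```

Only the header of each file is read up to its first hit, so large bodies are never scanned.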
Re: [nmh-workers] Stupid 'pick' question...
Hi Bakul,

> So pick runs -search on header lines as well as the body a header
> specific option is only run against headers.

pick(1):

    This means that the pattern specified for a -search will be found
    everywhere in the message, including the header and the body, while the
    other pattern matching requests are limited to the single specified
    component.

> And pick matches header line *after* line folding, as tested with
> Received:

    Pattern matching is performed on a per-line basis. Within the header
    of the message, each component is treated as one long line, but in the
    body, each line is separate.

> Conclusion: the man page is not quite accurate! But that is probably
> ok.

Based on the above extracts, it doesn't seem too bad.
Is there something in particular you think's missing?

-- 
Cheers, Ralph.
Re: [nmh-workers] Stupid 'pick' question...
Hi kre,

> > Which is what happens at the moment, so it wouldn't be backwards
> > compatible.
>
> No, it wouldn't - but does anyone really think that matters?

Only to the extent it's worthy of a line in the release notes.

I send myself short emails with leading punctuation in the Subject field
to categorise them, e.g. `+'. I don't use `^', but it's not that
impossible given its `up arrow', `top of the list', `important',
appearance.

-- 
Cheers, Ralph.
Re: [nmh-workers] Stupid 'pick' question...
Hi Valdis,

> > -search 'Subject[ \t]*:[ \t]*\[PATCH [45]\.[0-9]'
>
> [~] grep ^Subject Mail/linux-kernel/321805
> Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]'
> -and -from gre...@linuxfoundation.org -list`
> 321805 * Thu 21Feb 7k Greg Kroah-Hartma Re: [PATCH 4.9 04/20]
> net: Fix for_each_netdev_feature on Big endian <
>
> There's still something busticated here. Why did it match even with
> the Re: in there?

Your grep is looking for Subject at the start of a line, your pick isn't.
The email has the original, non-Re:, subject in the email's body.
https://lkml.org/lkml/2019/2/21/975

>A modified grep(1) is used to perform the matching, so the full regular
> expression
>(see ed(1)) facility is available within pattern. With -search,
> pattern is used
>directly, and with the others, the grep pattern constructed is:

That's ugly formatting. I find «export MANOPT='--nh --nj'» helps a lot
with the man(1) here.

> Also, saw this under 'BUG' in the pick manpage:
>
> The pattern syntax '[l-r]' is not supported; each letter to be matched
> must be included within the square brackets.

I think Paul Fox fixed that back in 2006.
http://git.savannah.nongnu.org/cgit/nmh.git/commit/?id=dc0b0be755b41f3c195913631fedf023ad69192e

-- 
Cheers, Ralph.
Re: [nmh-workers] Stupid 'pick' question...
On Jun 8, 2019, at 6:26 PM, Valdis Klētnieks wrote:
>
> On Sat, 08 Jun 2019 17:17:40 -0700, Bakul Shah said:
>>
>> So pick runs -search on header lines as well as the body a header specific
>> option is only run against headers.
>
> In a world of Microsoft Office attachments, is having -search go through the
> body by default as well still a good idea? Maybe having a separate -searchbody
> would be better?

The bigger issue is that -search seems to blindly match the
msg body. It doesn't even do mime decoding. So for example the
body of my prev msg to nmh is base64 encoded (even though it
is plain text) and -search fails. Unless I specify a search
pattern of encoded text such as 9uIEJpZyBlbmRpYW!
Re: [nmh-workers] Stupid 'pick' question...
On Sat, 08 Jun 2019 17:17:40 -0700, Bakul Shah said:

> So pick runs -search on header lines as well as the body a header specific
> option is only run against headers.

In a world of Microsoft Office attachments, is having -search go through the
body by default as well still a good idea? Maybe having a separate -searchbody
would be better?

(Due to events not under my control, I'm currently stuck on a laptop with
a Celeron CPU, and disk I/O is painful (it's managing only 3-5Mbytes/sec even
with an SSD in it... Ouch))
Re: [nmh-workers] Stupid 'pick' question...
On Jun 8, 2019, at 11:03 AM, Valdis Klētnieks wrote:
>
> On Fri, 07 Jun 2019 16:19:15 -0700, Bakul Shah said:
>> You can directly use search as follows:
>>
>> -search 'Subject[ \t]*:[ \t]*\[PATCH [45]\.[0-9]'
>
> [~] grep ^Subject Mail/linux-kernel/321805
> Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]'
> -and -from gre...@linuxfoundation.org -list`
> 321805 * Thu 21Feb 7k Greg Kroah-Hartma Re: [PATCH 4.9 04/20]
> net: Fix for_each_netdev_feature on Big endian <
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[WOMBAT [45]\.[0-9]'
> -and -from gre...@linuxfoundation.org -list`
> pick: no messages match specification
> scan: no messages match specification

Let us look at what you see (after some cleanup):

    $ grep ^Subject Mail/linux-kernel/321805
    Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
    $ pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' -and -from gre...@linuxfoundation.org
    321805
    $ pick +linux-kernel 321805 -search 'Subject: \[WOMBAT [45]\.[0-9]' -and -from gre...@linuxfoundation.org
    pick: no messages match specification

This makes sense. But note that pick does treat header lines specially.
Using your message as an example:

    $ grep ^Subject: `mhpath cur`
    Subject: Re: [nmh-workers] Stupid 'pick' question...
    Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian

The second Subject: line is from the message body.

    $ pick -subj PATCH cur
    pick: no messages match specification
    $ pick -search 'Subject:.*PATCH' cur
    14
    $ pick -search 'Subject:.*nmh' cur
    14

So pick runs -search on header lines as well as the body, while a
header-specific option is only run against headers.

And pick matches header lines *after* line folding, as tested with Received:

    pick --received 'bakul' cur    -- matches, but my name is not on the same line as Received:
    pick --received '\]\)' cur     -- matches )] on the first line
    pick --received '\]\)$' cur    -- no match, even though the first Received: line ends with )]
    pick --received '0400$' cur    -- matches, as 0400 ends the last line of a Received: field

Conclusion: the man page is not quite accurate! But that is probably ok.
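The Received: results above are what one would expect if each folded header field is joined into a single line before matching. The sketch below mimics that with awk. It is only an approximation for experimenting with grep, not nmh's code: `unfold_headers` is a made-up name, and joining continuation lines without the newline is an assumption about how the field is presented to the matcher.

```shell
# Print the header of message file $1 with folded fields joined into one line each.
unfold_headers() {
    awk '
        /^$/     { exit }                   # blank line: end of header
        /^[ \t]/ { buf = buf $0; next }     # continuation line: append to current field
                 { if (buf != "") print buf # new field: flush the previous one
                   buf = $0 }
        END      { if (buf != "") print buf }
    ' "$1"
}

# Usage:
#   unfold_headers "$(mhpath cur)" | grep -i '^received:.*0400$'
```

With the field on one line, a `$` anchor can only match at the end of the last physical line of the field, which is the behaviour observed with `'0400$'` and `'\]\)$'`.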
Re: [nmh-workers] Stupid 'pick' question...
    Date:        Sat, 08 Jun 2019 14:03:12 -0400
    From:        "Valdis Klētnieks"
    Message-ID:  <13857.1560016992@turing-police>

  | I understand why that .* was causing me indigestion. But I'm having a hard
  | time matching "pattern is used directly" with what I'm seeing,
  | unless -search is *also* doing a split into component and pattern

I haven't looked, but it is not impossible that (perhaps any number of)
"Re:" get deleted from Subject headers before matching (or perhaps it
attempts a match both with and without) - that shouldn't be needed with
a -subject search, but with -search it would allow users to easily find
all messages in a thread (with and without Re: stuck in there).

I also wouldn't be surprised to see mailing-list noise elided before
matching is attempted (the '[nmh-workers]' that gets installed in all of
these messages).

I have seen systems in the past which acted like that. It tends to more
often DTRT than otherwise (but when it is otherwise, it is certainly
perplexing).

However, when I try it, I don't see that (either of them) ...

jinx$ scan $( pick -subject "Stupid 'pick" +info/nmh )
11777 Fri "Valdis Kl?tnieks [nmh-workers] Stupid 'pick' question...<<--=
11778 Fri Bakul Shah Re: [nmh-workers] Stupid 'pick' question...<
Re: [nmh-workers] Stupid 'pick' question...
    Date:        Sat, 08 Jun 2019 11:04:27 +0100
    From:        Ralph Corderoy
    Message-ID:  <20190608100427.4d1c921...@orac.inputplus.co.uk>

  | Which is what happens at the moment, so it wouldn't be backwards
  | compatible.

No, it wouldn't - but does anyone really think that matters?

That is, is anyone, anywhere, going to attempt to match a ^ at the start
of a field matching pattern without escaping it (either as \^ or
using [^]) ? (Even if they do know that nmh prepends more pattern to
the start.)

kre
Re: [nmh-workers] Stupid 'pick' question...
On Fri, 07 Jun 2019 16:19:15 -0700, Bakul Shah said:

> You can directly use search as follows:
>
> -search 'Subject[ \t]*:[ \t]*\[PATCH [45]\.[0-9]'

[~] grep ^Subject Mail/linux-kernel/321805
Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
[~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' -and -from gre...@linuxfoundation.org -list`
321805 * Thu 21Feb 7k Greg Kroah-Hartma Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian <
Re: [nmh-workers] Stupid 'pick' question...
Hi kre,

> Maybe it would be possible to look for a leading ^ in the user's
> pattern (for other than -search), and if found, remove it, and replace
> the ".*" that's inserted into the RE with "[ \t]*" ?

Sounds like a good idea.

> Certainly no-one who uses a leading ^ is expecting it to attempt to
> match a literal '^'.

Which is what happens at the moment, so it wouldn't be backwards
compatible.

-- 
Cheers, Ralph.
Re: [nmh-workers] Stupid 'pick' question...
    Date:        Sat, 08 Jun 2019 08:45:17 +0100
    From:        Ralph Corderoy
    Message-ID:  <20190608074517.a586020...@orac.inputplus.co.uk>

  | Bakul answered about the anchors.

Maybe it would be possible to look for a leading ^ in the user's pattern
(for other than -search), and if found, remove it, and replace the ".*"
that's inserted into the RE with "[ \t]*" ?

That would, I suspect, be more useful, and more in accordance with what
users might expect to happen. Certainly no-one who uses a leading ^ is
expecting it to attempt to match a literal '^'.

kre
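The rewrite proposed above can be tried outside nmh with plain grep. This is only a sketch of the idea as described in the message, not a patch to pick: `field_pattern` is a hypothetical helper, and the printf format is used to turn `\t` into a real tab, since grep does not read `\t` inside a bracket expression as a tab.

```shell
# Build a grep pattern for "component: pattern" following the proposal:
# a leading ^ in the user's pattern is removed and the inserted ".*"
# becomes "[ \t]*", so the pattern must start right after the colon.
field_pattern() {
    comp=$1
    pat=$2
    case $pat in
        '^'*) printf '%s[ \t]*:[ \t]*%s\n' "$comp" "${pat#"^"}" ;;
        *)    printf '%s[ \t]*:.*%s\n' "$comp" "$pat" ;;
    esac
}

# Usage against a single header line:
#   printf 'Subject: [PATCH 4.9 04/20] net\n' |
#       grep -i -- "$(field_pattern subject '^\[PATCH [45]\.[0-9]')"
```

With the anchored form, `Subject: [PATCH 4.9 ...` matches and `Subject: Re: [PATCH 4.9 ...` does not, which is the behaviour the original question was after.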
Re: [nmh-workers] Stupid 'pick' question...
Hi Valdis,

> pick -from -subject '\[PATCH [45]\.[0-9]'
...
> However, it *also* catches messages of the form 'Subject: Re: [PATCH
> ' which is unacceptable for the use case in question.

Bakul answered about the anchors. Another approach is to rule out
replies.

    -sub foo -and -not -sub re:
    -sub foo -and -not --references '_*'

-- 
Cheers, Ralph.
Re: [nmh-workers] Stupid 'pick' question...
But more often I just use grep or agrep by cd-ing to the right folder:

    cd ~/Mail/
    pick + -from -and -subj ... -seq foo
    pick foo | xargs agrep -li 'this;that' | xargs scan
Re: [nmh-workers] Stupid 'pick' question...
On Fri, 07 Jun 2019 19:00:52 -0400 "Valdis Klētnieks" wrote:
>
> So trying to work with the linux-kernel mailing list firehose (800-1500
> messages a day), and hitting a problem with 'pick'.
>
> Am trying to match all messages from a given person with a given part of
> a subject line.
>
> pick -from -subject '\[PATCH [45]\.[0-9]'
>
> *almost* does what I want - catch all messages that have '[PATCH 4.9]'
> or '[PATCH 4.14]' or '[PATCH 5.0]'. However, it *also* catches messages
> of the form 'Subject: Re: [PATCH ' which is unacceptable for the use case
> in question.
>
> So I tried an anchored search using -subject '^\[PATCH [45]\.[0-9]' but that
> results in nothing matching. So much for this from the manpage:

man pick to see why. -subject '^foo' is equivalent to

    -search 'subject[ \t]*:.*^foo'

which won't match anything.

> Oddly enough, $ for tail-anchor seems to work:

This makes sense since -subj 'foo$' is

    -search 'subject[ \t]*:.*foo$'

You can directly use search as follows:

    -search 'Subject[ \t]*:[ \t]*\[PATCH [45]\.[0-9]'
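The equivalence described above is easy to check with ordinary grep on a single header line. This is an illustrative sketch, not pick itself: `constructed_pattern` is a made-up helper that pastes the user's pattern after `component[ \t]*:.*`, as the man page extract says, and the printf format supplies a real tab for `\t`.

```shell
# Build the pattern described in pick(1): component[ \t]*:.*pattern
constructed_pattern() {
    printf '%s[ \t]*:.*%s\n' "$1" "$2"
}

hdr='Subject: [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path'

# Tail anchor: $ still sits at the end of the whole pattern, so it works.
printf '%s\n' "$hdr" | grep -i -- "$(constructed_pattern subject 'path$')"

# Head anchor: ^ ends up in the middle of the pattern and cannot match here.
printf '%s\n' "$hdr" | grep -i -- "$(constructed_pattern subject '^\[PATCH')" ||
    echo "no match for the ^-anchored form"
```

The first grep prints the header line; the second finds nothing, because after `.*` the `^` no longer marks the start of the text being matched.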
[nmh-workers] Stupid 'pick' question...
So trying to work with the linux-kernel mailing list firehose (800-1500
messages a day), and hitting a problem with 'pick'.

Am trying to match all messages from a given person with a given part of
a subject line.

    pick -from -subject '\[PATCH [45]\.[0-9]'

*almost* does what I want - catch all messages that have '[PATCH 4.9]'
or '[PATCH 4.14]' or '[PATCH 5.0]'. However, it *also* catches messages
of the form 'Subject: Re: [PATCH ' which is unacceptable for the use case
in question.

So I tried an anchored search using -subject '^\[PATCH [45]\.[0-9]' but that
results in nothing matching. So much for this from the manpage:

    A modified grep(1) is used to perform the matching, so the full regular
    expression (see ed(1)) facility is available within pattern. With -search,
    pattern is used directly, and with the others, the grep pattern
    constructed is:

So is ^ to anchor the search in fact unsupported? Broken? I'm using it wrong?

Oddly enough, $ for tail-anchor seems to work:

18:56:33 0 [~/Mail/kernel-patches] scan `pick +linux-kernel last:3000 -from gre...@linuxfoundation.org -and -subject 'path' -list` | more
416491 * 17:38 +02 6k Greg Kroah-Hartma [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path < [ Upstre
416492 * 17:38 +02 6k Greg Kroah-Hartma [PATCH 4.14 14/69] net: mvneta: Fix err code path of probe < [ Upstre
416502 * 17:39 +02 6k Greg Kroah-Hartma [PATCH 4.14 30/69] USB: sisusbvga: fix oops in error path of sisusb_probe < comm
416537 * 17:38 +02 6k Greg Kroah-Hartma [PATCH 4.19 10/73] USB: sisusbvga: fix oops in error path of sisusb_probe < comm
416689 * 17:38 +02 6k Greg Kroah-Hartma [PATCH 5.1 10/85] USB: sisusbvga: fix oops in error path of sisusb_probe < commi
18:56:52 0 [~/Mail/kernel-patches] scan `pick +linux-kernel last:3000 -from gre...@linuxfoundation.org -and -subject 'path$' -list` | more
416491 * 17:38 +02 6k Greg Kroah-Hartma [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path < [ Upstre
18:57:08 0 [~/Mail/kernel-patches]