Re: [nmh-workers] Stupid 'pick' question...

2019-06-13 Thread Ralph Corderoy
Hi Bakul,

> The reason being the body was MIME encoded because of UTF-8 even
> though it is plain text.

I think that's why David and others run all their emails on delivery
through mhfixmsg(1)'s `-decodetext'.

> Content-Transfer-Encoding: base64

Which of us hasn't worked out the possible encodings and searched for
those?  :-)

$ for p in '' _ __; do base64 <<<${p}foobar; done
Zm9vYmFyCg==
X2Zvb2Jhcgo=
X19mb29iYXIK
$
$ for p in nul u to tri; do
> base64 <<<${p}foobarxyzzy | egrep 'Zm9vYmFy|Zvb2Jhc|mb29iYX'
> done
bnVsZm9vYmFyeHl6enkK
dWZvb2Jhcnh5enp5Cg==
dG9mb29iYXJ4eXp6eQo=
dHJpZm9vYmFyeHl6enkK
$
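
The same trick works for any search word. Below is a minimal sketch, assuming a base64(1) on the PATH and an ASCII-only word; the function name b64_fragments is made up for illustration. For each of the three byte alignments it prints the run of base64 characters that depends only on the word itself, not on its neighbours:

```shell
b64_fragments() (
    # $1: ASCII word to look for inside base64-encoded text.
    word=$1
    for pad in '' _ __; do
        p=${#pad}                       # bytes in front of the word: 0, 1 or 2
        n=$(( p + ${#word} ))           # total bytes that get encoded
        enc=$(printf '%s%s' "$pad" "$word" | base64 | tr -d '\n')
        start=$(( (p * 8 + 5) / 6 + 1 ))  # first char not touched by the padding (1-based)
        end=$(( n * 8 / 6 ))              # last char fully determined by the word
        printf '%s\n' "$enc" | cut -c "$start-$end"
    done
)

b64_fragments foobar
# prints:
# Zm9vYmFy
# Zvb2Jhc
# mb29iYX
```

Joined with `|', the three lines give the egrep pattern used above. A base64 body is line-wrapped, so a word whose encoding straddles a line break would still be missed by a line-oriented search.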

-- 
Cheers, Ralph.

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: [nmh-workers] Stupid 'pick' question...

2019-06-12 Thread Bakul Shah
On 12 Jun 2019 08:46:43 -0600 "Andy Bradford"  wrote:
> [ part  - text/plain -   577B  ]
> Thus said "Valdis Kl?tnieks" on Sat, 08 Jun 2019 21:26:46 -0400:
> 
> > In  a world  of Microsoft  Office  attachments, is  having -search  go
> > through the body by default as well  still a good idea? Maybe having a
> > separate -searchbody would be better?
> 
> Hard to  say what  it *should*  be. In my  environment, the  majority of
> messages that I use -search with  do have searchable content in the body
> of the message, so I've never really been concerned about it.
> 
> Andy
> --
> TAI64 timestamp: 40005d011078

It's funny you say that! I did

pick -search Micro cur

"cur" being your message, and the search failed. The reason
being the body was MIME encoded because of UTF-8 even though
it is plain text.  Here is a sample of what you see with

show -noshowproc

Cc: nmh-workers@nongnu.org
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: nmh-workers-bounces+bakul=bitblocks@nongnu.org
Sender: "nmh-workers" 

VGh1cyBzYWlkICJWYWxkaXMgS2w/dG5pZWtzIiBvbiBTYXQsIDA4IEp1biAyMDE5IDIxOjI2OjQ2
IC0wNDAwOgoKPiBJbiAgYSB3b3JsZCAgb2YgTWljcm9zb2Z0ICBPZmZpY2UgIGF0dGFjaG1lbnRz
LCBpcyAgaGF2aW5nIC1zZWFyY2ggIGdvCj4gdGhyb3VnaCB0aGUgYm9keSBieSBkZWZhdWx0IGFz
IHdlbGwgIHN0aWxsIGEgZ29vZCBpZGVhPyBNYXliZSBoYXZpbmcgYQo+IHNlcGFyYXRlIC1zZWFy
Y2hib2R5IHdvdWxkIGJlIGJldHRlcj8KCkhhcmQgdG8gIHNheSB3aGF0ICBpdCAqc2hvdWxkKiAg
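
For a one-off search, the body can be decoded by hand before matching. Below is a rough sketch, assuming a single-part message whose whole body is base64 text and a base64(1) that accepts -d (as the GNU coreutils one does); search_decoded_body is a hypothetical helper, not an nmh command:

```shell
search_decoded_body() (
    # $1: message file, $2: grep pattern.
    # Drop everything up to the first empty line (the header), decode the
    # rest, and count the matching lines.
    sed '1,/^$/d' "$1" | base64 -d | grep -c -e "$2"
)

# Build a throwaway message in the same shape as the one shown above.
msg=$(mktemp)
{
    printf 'Content-Type: text/plain; charset="utf-8"\n'
    printf 'Content-Transfer-Encoding: base64\n\n'
    printf 'In a world of Microsoft Office attachments\n' | base64
} > "$msg"

grep -c Micro "$msg" || true        # raw file: prints 0
search_decoded_body "$msg" Micro    # decoded body: prints 1
rm -f "$msg"
```

This ignores multipart messages, quoted-printable and charsets altogether, so it is no substitute for proper MIME decoding.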




Re: [nmh-workers] Stupid 'pick' question...

2019-06-12 Thread Andy Bradford
Thus said "Valdis Kl?tnieks" on Sat, 08 Jun 2019 21:26:46 -0400:

> In  a world  of Microsoft  Office  attachments, is  having -search  go
> through the body by default as well  still a good idea? Maybe having a
> separate -searchbody would be better?

Hard to  say what  it *should*  be. In my  environment, the  majority of
messages that I use -search with  do have searchable content in the body
of the message, so I've never really been concerned about it.

Andy
-- 
TAI64 timestamp: 40005d011078




Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Ken Hornstein
>The bigger issue is that -search seems to blindly match the
>msg body. It doesn't even do mime decoding. So for example the
>body of my prev msg to nmh is base64 encoded (even though it
>is plain text) and -search fails. Unless I specify a search
>pattern of encoded text such as 9uIEJpZyBlbmRpYW!

pick(1) not doing MIME decoding is clearly a bug, and as far as I know
there has been no disagreement on that.  The plan is to fix that in the
Great Mime Rewrite.  Someday.

--Ken


Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Bakul Shah
On Jun 9, 2019, at 1:49 AM, Ralph Corderoy  wrote:
> 
> Hi Bakul,
> 
>> So pick runs -search on header lines as well as the body, whereas a
>> header-specific option is only run against headers.
> 
> pick(1):
> 
>   This means that the pattern specified for a -search will be found
>   everywhere in the message, including the header and the body, while the
>   other pattern matching requests are limited to the single specified
>   component.
> 
>> And pick matches header line *after* line folding, as tested with
>> Received:
> 
>   Pattern matching is performed on a per-line basis.  Within the header
>   of the message, each component is treated as one long line, but in the
>   body, each line is separate.
> 
>> Conclusion: the man page is not quite accurate! But that is probably
>> ok. 
> 
> Based on the above extracts, it doesn't seem too bad.
> Is there something in particular you think's missing?

No, it is quite accurate. Thanks for doing a better job
of reading than me!


Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Ralph Corderoy
Hi Valdis,

> In a world of Microsoft Office attachments, is having -search go
> through the body by default as well still a good idea? Maybe having a
> separate -searchbody would be better?

I think -search should be left alone, but there's previous discussion on
this list about a -header and -body that do -search's work but just on
the appropriate `half'.

If you're doing it a lot and many of the emails have large bodies then a
copy of the folder with all bodies deleted would let you search just the
headers and come up with the same message numbers for the original
folder.

Alternatively, use egrep(1) to do the initial filtering on just the
headers, avoiding paging in all those bodies.

shopt -s extglob
cd `mhpath`
egrep -im1 '^$|from.*ralph' [1-9]*([0-9]) |
sed -n 's/:..*//p' |
awk '{print} END {if (!NR) print 0}' |
xargs -r pick -list -from ralph
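
To see what the first two stages do without a real folder, here is a small self-contained sketch. It assumes a grep with -m (GNU and BSD both have it); header_hits and the two sample messages are invented for illustration, and the simpler glob [0-9]* stands in for the extglob pattern:

```shell
header_hits() (
    # $1: folder directory, $2: extended regex to look for in headers only.
    cd "$1" || exit 1
    # -m1 stops at each file's first matching line: either the wanted
    # header line or the empty line that ends the header.
    # /dev/null forces the "file:" prefix even when only one message exists.
    grep -E -i -m1 "^\$|$2" /dev/null [0-9]* |
    # Keep only hits with text after the colon, i.e. not the empty line,
    # and cut them down to the message number.
    sed -n 's/:..*//p'
)

demo=$(mktemp -d)
printf 'From: ralph@example.org\nSubject: one\n\nbody\n' > "$demo/1"
printf 'From: someone@example.org\nSubject: two\n\nfrom ralph, in the body\n' > "$demo/2"
header_hits "$demo" 'from.*ralph'    # prints 1
rm -rf "$demo"
```

Message 2 mentions ralph only in its body, so grep stops at its empty line first and sed discards that hit.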

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Ralph Corderoy
Hi Bakul,

> So pick runs -search on header lines as well as the body, whereas a
> header-specific option is only run against headers.

pick(1):

   This means that the pattern specified for a -search will be found
   everywhere in the message, including the header and the body, while the
   other pattern matching requests are limited to the single specified
   component.

> And pick matches header line *after* line folding, as tested with
> Received:

   Pattern matching is performed on a per-line basis.  Within the header
   of the message, each component is treated as one long line, but in the
   body, each line is separate.

> Conclusion: the man page is not quite accurate! But that is probably
> ok. 

Based on the above extracts, it doesn't seem too bad.
Is there something in particular you think's missing?

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Ralph Corderoy
Hi kre,

> > Which is what happens at the moment, so it wouldn't be backwards
> > compatible.
>
> No, it wouldn't - but does anyone really think that matters?

Only to the extent it's worthy of a line in the release notes.

I send myself short emails with leading punctuation in the Subject field
to categorise them, e.g. `+'.  I don't use `^', but it's not that
impossible given its `up arrow', `top of the list', `important',
appearance.

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-09 Thread Ralph Corderoy
Hi Valdis,

> > -search 'Subject[ \t]:[ \t]*\[PATCH [45]\.[0-9]'
>
> [~] grep ^Subject Mail/linux-kernel/321805
> Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' 
> -and -from gre...@linuxfoundation.org -list`
> 321805  *  Thu 21Feb  7k Greg Kroah-Hartma  Re: [PATCH 4.9 04/20] 
> net: Fix for_each_netdev_feature on Big endian <
>
> There's still something busticated here.  Why did it match even with
> the Re: in there?

Your grep is looking for Subject at the start of a line; your pick isn't.
The email has the original, non-Re:, subject in the email's body.
https://lkml.org/lkml/2019/2/21/975

>A  modified  grep(1)  is used to perform the matching, so the full regular 
> expression
>(see ed(1)) facility is available within pattern.   With  -search,  
> pattern  is  used
>directly, and with the others, the grep pattern constructed is:

That's ugly formatting.  I find «export MANOPT='--nh --nj'» helps a lot
with the man(1) here.

> Also, saw this under 'BUG' in the pick manpage:
>
> The pattern syntax '[l-r]' is not supported; each letter to be matched
> must be included within the square brackets.

I think Paul Fox fixed that back in 2006.
http://git.savannah.nongnu.org/cgit/nmh.git/commit/?id=dc0b0be755b41f3c195913631fedf023ad69192e

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Bakul Shah
On Jun 8, 2019, at 6:26 PM, Valdis Klētnieks  wrote:
> 
> On Sat, 08 Jun 2019 17:17:40 -0700, Bakul Shah said:
>> 
>> So pick runs -search on header lines as well as the body, whereas a
>> header-specific option is only run against headers.
> 
> In a world of Microsoft Office attachments, is having -search go through the
> body by default as well still a good idea? Maybe having a separate -searchbody
> would be better?

The bigger issue is that -search seems to blindly match the
msg body. It doesn't even do mime decoding. So for example
the body of my prev msg to nmh is base64 encoded (even though
it is plain text) and -search fails. Unless I specify a search
pattern of encoded text such as 9uIEJpZyBlbmRpYW!

Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Valdis Klētnieks
On Sat, 08 Jun 2019 17:17:40 -0700, Bakul Shah said:
>
> So pick runs -search on header lines as well as the body, whereas a
> header-specific option is only run against headers.

In a world of Microsoft Office attachments, is having -search go through the
body by default as well still a good idea? Maybe having a separate -searchbody
would be better?

(Due to events not under my control, I'm currently stuck on a laptop with a
Celeron CPU, and disk I/O is painful: it's managing only 3-5 Mbytes/sec even
with an SSD in it... Ouch.)



Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Bakul Shah
On Jun 8, 2019, at 11:03 AM, Valdis Klētnieks  wrote:
> 
> On Fri, 07 Jun 2019 16:19:15 -0700, Bakul Shah said:
>> You can directly use search as follows:
>> 
>>  -search 'Subject[ \t]:[ \t]*\[PATCH [45]\.[0-9]'
> 
> [~] grep ^Subject Mail/linux-kernel/321805
> Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' 
> -and -from gre...@linuxfoundation.org -list`
> 321805  *  Thu 21Feb  7k Greg Kroah-Hartma  Re: [PATCH 4.9 04/20] 
> net: Fix for_each_netdev_feature on Big endian <
> [~] scan `pick +linux-kernel 321805 -search 'Subject: \[WOMBAT [45]\.[0-9]' 
> -and -from gre...@linuxfoundation.org -list`
> pick: no messages match specification
> scan: no messages match specification


Let us look at what you see (after some cleanup):

  $ grep ^Subject Mail/linux-kernel/321805
  Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
  $ pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' \
      -and -from gre...@linuxfoundation.org
  321805
  $ pick +linux-kernel 321805 -search 'Subject: \[WOMBAT [45]\.[0-9]' \
      -and -from gre...@linuxfoundation.org
  pick: no messages match specification

This makes sense. But note that pick does treat header lines specially.
Using your message as an example:

  $ grep ^Subject: `mhpath cur`
  Subject: Re: [nmh-workers] Stupid 'pick' question...
  Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian

The second Subject: line is from the message body.

  $ pick -subj PATCH cur
  pick: no messages match specification
  $ pick -search 'Subject:.*PATCH' cur
  14
  $ pick -search 'Subject:.*nmh' cur
  14

So pick runs -search on header lines as well as the body, whereas a
header-specific option is only run against headers.

And pick matches header line *after* line folding, as tested with Received:

  pick --received 'bakul' cur  -- matches, but my name is not on the same
                                  line as Received:
  pick --received '\]\)' cur   -- matches ]) on the first line
  pick --received '\]\)$' cur  -- no match, even though the first Received:
                                  line ends with ])
  pick --received '0400$' cur  -- matches, as 0400 ends the last line of a
                                  Received: field

Conclusion: the man page is not quite accurate! But that is probably ok. 



Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Robert Elz
Date:        Sat, 08 Jun 2019 14:03:12 -0400
From:        "Valdis Klētnieks"
Message-ID:  <13857.1560016992@turing-police>


  | I understand why that .* was causing me indigestion.  But I'm having a hard
  | time matching "pattern is used directly" with what I'm seeing,
  | unless -search is *also* doing a split into component and pattern

I haven't looked, but it is not impossible that (perhaps any number of) "Re:"
get deleted from Subject headers before matching (or perhaps it attempts a
match both with and without) - that shouldn't be needed with a -subject
search, but with -search it would allow users to easily find all messages
in a thread (with and without Re: stuck in there).

I also wouldn't be surprised to see mailing-list noise elided before
matching is attempted (the '[nmh-workers]' that gets installed in all of
these messages).

I have seen systems in the past which acted like that.   It tends to more
often DTRT than otherwise (but when it is otherwise, it is certainly
perplexing).

However, when I try it, I don't see that (either of them) ...

jinx$ scan $( pick -subject "Stupid 'pick"  +info/nmh )
11777   Fri  "Valdis Kl?tnieks  [nmh-workers] Stupid 'pick' question...<<--=
11778   Fri  Bakul Shah Re: [nmh-workers] Stupid 'pick' question...<

Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Robert Elz
Date:        Sat, 08 Jun 2019 11:04:27 +0100
From:        Ralph Corderoy
Message-ID:  <20190608100427.4d1c921...@orac.inputplus.co.uk>

  | Which is what happens at the moment, so it wouldn't be backwards
  | compatible.

No, it wouldn't - but does anyone really think that matters?

That is, is anyone, anywhere, going to attempt to match a ^ at the
start of a field-matching pattern without escaping it (either
as \^ or using [^]) ?

(Even if they do know that nmh prepends more pattern to the start.)

kre



Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Valdis Klētnieks
On Fri, 07 Jun 2019 16:19:15 -0700, Bakul Shah said:
> You can directly use search as follows:
>
>   -search 'Subject[ \t]:[ \t]*\[PATCH [45]\.[0-9]'

 [~] grep ^Subject Mail/linux-kernel/321805
Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature on Big endian
[~] scan `pick +linux-kernel 321805 -search 'Subject: \[PATCH [45]\.[0-9]' -and 
-from gre...@linuxfoundation.org -list`
321805  *  Thu 21Feb  7k Greg Kroah-Hartma  Re: [PATCH 4.9 04/20] net: 
Fix for_each_netdev_feature on Big endian <


Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Ralph Corderoy
Hi kre,

> Maybe it would be possible to look for a leading ^ in the user's
> pattern (for other than -search), and if found, remove it, and replace
> the ".*" that's inserted into the RE with "[ \t]*"  ?

Sounds like a good idea.

> Certainly no-one who uses a leading ^ is expecting it to attempt to
> match a literal '^'.

Which is what happens at the moment, so it wouldn't be backwards
compatible.

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Robert Elz
Date:        Sat, 08 Jun 2019 08:45:17 +0100
From:        Ralph Corderoy
Message-ID:  <20190608074517.a586020...@orac.inputplus.co.uk>

  | Bakul answered about the anchors.

Maybe it would be possible to look for a leading ^ in the user's
pattern (for other than -search), and if found, remove it, and
replace the ".*" that's inserted into the RE with "[ \t]*"  ?

That would, I suspect, be more useful, and more in accordance with
what users might expect to happen.   Certainly no-one who uses a
leading ^ is expecting it to attempt to match a literal '^'.

kre



Re: [nmh-workers] Stupid 'pick' question...

2019-06-08 Thread Ralph Corderoy
Hi Valdis,

> pick -from  -subject '\[PATCH [45]\.[0-9]'
...
> However, it *also* catches messages of the form 'Subject: Re: [PATCH
> ' which is unacceptable for the use case in question.

Bakul answered about the anchors.  Another approach is to rule out
replies.

-sub foo -and -not -sub re:
-sub foo -and -not --references '_*'

-- 
Cheers, Ralph.


Re: [nmh-workers] Stupid 'pick' question...

2019-06-07 Thread Bakul Shah
But more often I just use grep or agrep by cd-ing to the right folder:

cd ~/Mail/
pick + -from  -and -subj  ... -seq foo
pick foo | xargs agrep -li 'this;that' | xargs scan


Re: [nmh-workers] Stupid 'pick' question...

2019-06-07 Thread Bakul Shah
On Fri, 07 Jun 2019 19:00:52 -0400 "Valdis Klētnieks" wrote:
>
> So trying to work with the linux-kernel mailing list firehose (800-1500
> messages a day), and hitting a problem with 'pick'.
>
> Am trying to match all messages from a given person with a given part of
> a subject line.
>
> pick -from  -subject '\[PATCH [45]\.[0-9]'
>
> *almost* does what I want - catch all messages that have '[PATCH 4.9]'
> or '[PATCH 4.14]'  or '[PATCH 5.0]'.  However, it *also* catches messages
> of the form 'Subject: Re: [PATCH ' which is unacceptable for the use case
> in question.
>
> So I tried an anchored search using -subject '^\[PATCH [45]\.[0-9]' but that
> results in nothing matching. So much for this from the manpage:

man pick
to see why.

-subject '^foo'
is equivalent to
-search 'subject[ \t]*:.*^foo'
which won't match anything.

> Oddly enough, $ for tail-anchor seems to work:

This makes sense since
-subj 'foo$'
is
-search 'subject[ \t]*:.*foo$'

You can directly use search as follows:

-search 'Subject[ \t]:[ \t]*\[PATCH [45]\.[0-9]'
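
The effect is easy to reproduce with plain grep(1), whose basic regular expressions treat a ^ that is not at the start of the pattern as an ordinary character. This is only an illustration: pick uses its own matcher, and count_matches is a made-up helper. As written, the -search pattern above also wants one character from the bracket before the colon, so a plain "Subject:" would not satisfy it; the sketch uses [[:space:]]* on both sides of the colon instead.

```shell
count_matches() {
    # $1: text line, $2: basic regular expression; prints 0 or 1.
    printf '%s\n' "$1" | grep -i -c -e "$2" || true
}

hdr='Subject: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature'
re_hdr='Subject: Re: [PATCH 4.9 04/20] net: Fix for_each_netdev_feature'

# What -subject '^\[PATCH' turns into: the ^ sits mid-pattern and is literal.
count_matches "$hdr" 'subject[[:space:]]*:.*^\[PATCH'    # prints 0

# Anchoring the whole header line instead, in the style of -search.
pat='^subject[[:space:]]*:[[:space:]]*\[PATCH [45]\.[0-9]'
count_matches "$hdr" "$pat"       # prints 1
count_matches "$re_hdr" "$pat"    # prints 0
```

With -search the same pattern is also tried against every body line, so a quoted Subject: line in a body can still match.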



[nmh-workers] Stupid 'pick' question...

2019-06-07 Thread Valdis Klētnieks
So trying to work with the linux-kernel mailing list firehose (800-1500
messages a day), and hitting a problem with 'pick'.

Am trying to match all messages from a given person with a given part of
a subject line.

pick -from  -subject '\[PATCH [45]\.[0-9]'

*almost* does what I want - catch all messages that have '[PATCH 4.9]'
or '[PATCH 4.14]'  or '[PATCH 5.0]'.  However, it *also* catches messages
of the form 'Subject: Re: [PATCH ' which is unacceptable for the use case
in question.

So I tried an anchored search using -subject '^\[PATCH [45]\.[0-9]' but that
results in nothing matching. So much for this from the manpage:

   A modified grep(1) is used to perform the matching, so the  full  regular
   expression  (see  ed(1))  facility  is  available  within  pattern.  With
   -search, pattern is used directly, and with the others, the grep  pattern
   constructed is:

So is ^ to anchor the search in fact unsupported?  Broken? I'm using it wrong?

Oddly enough, $ for tail-anchor seems to work:

18:56:33 0 [~/Mail/kernel-patches] scan `pick +linux-kernel last:3000 -from gre...@linuxfoundation.org -and -subject 'path' -list` | more
416491  *  17:38 +02  6k Greg Kroah-Hartma  [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path < [ Upstre
416492  *  17:38 +02  6k Greg Kroah-Hartma  [PATCH 4.14 14/69] net: mvneta: Fix err code path of probe < [ Upstre
416502  *  17:39 +02  6k Greg Kroah-Hartma  [PATCH 4.14 30/69] USB: sisusbvga: fix oops in error path of sisusb_probe < comm
416537  *  17:38 +02  6k Greg Kroah-Hartma  [PATCH 4.19 10/73] USB: sisusbvga: fix oops in error path of sisusb_probe < comm
416689  *  17:38 +02  6k Greg Kroah-Hartma  [PATCH 5.1 10/85] USB: sisusbvga: fix oops in error path of sisusb_probe < commi
18:56:52 0 [~/Mail/kernel-patches] scan `pick +linux-kernel last:3000 -from gre...@linuxfoundation.org -and -subject 'path$' -list` | more
416491  *  17:38 +02  6k Greg Kroah-Hartma  [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path < [ Upstre
18:57:08 0 [~/Mail/kernel-patches]


