Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Robert Elz
Date:Thu, 02 Mar 2023 09:49:23 +0300
From:Greg Minshall 
Message-ID:  <814300.1677739763@archlinux>

  | bash archlinux (master): {49603} ls -a 74600607886815/

That would do it.

  | and, i guess, flist decided that
  | (under the ~/Mail/MHE-INDEX folder) was a message number?
  |
  | does that make sense?

yes.

  | i guess mh-e could not create such subfolders
  | with names consisting only of decimal integers (i have some
  | hexadecimal-named folders which don't seem to give a problem).

That, or have the index put in some tree outside your mh mail tree.

  | or, i could not search for such.

That sounds a bit draconian

  | or, maybe flist (or, nmh in general?) could
  | not think that a directory was a message?

On a standard system that would require a stat of every directory entry
named like a potential message number.  If that was to be done
flist would need to be renamed to be slist instead.

On filesystems that support d_type in the directory entries it would
be possible, but would need the stat() fallback whenever it sees
DT_UNKNOWN, which on many systems is likely to be always.

Better would be to fix mhe to always add a 1 char non-numeric
prefix to directories it creates (perhaps '_', even ' '), and then
there is no confusion any more.

kre



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ken Hornstein
>it seems that at some point i had done a search for 74600607886815 (your
>basic "magic number" :).  mh-e, i guess, had created a directory with
>that number as its name (it uses the search term to name subfolders
>under the normal mhe-index folder).  and, i guess, flist decided that
>(under the ~/Mail/MHE-INDEX folder) was a message number?
>
>does that make sense?  i guess mh-e could not create such subfolders
>with names consisting only of decimal integers (i have some
>hexadecimal-named folders which don't seem to give a problem).  or, i
>could not search for such.  or, maybe flist (or, nmh in general?) could
>not think that a directory was a message?

The loop in folder_read() that is scanning for messages is this:

while ((dp = readdir (dd))) {
if ((msgnum = m_atoi (dp->d_name)) && msgnum > 0) {
[...]

So if the directory entry is a positive decimal integer, nmh (and MH
before it) considers it a message.  Robert already explained the issues
involved; stat()ing every file to determine if it was a file or not
would be prohibitively slow (and this would affect every nmh program;
almost everything calls folder_read()), and using d_type isn't portable.

I think we have to push this back on the MH-E people; Robert's
suggestion to add a non-numeric prefix to directories it creates sounds
like the best answer to me.
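
The prefix rule can be sketched as follows.  MH-E itself is written in
Emacs Lisp, so this C version only illustrates the rule; the function names
looks_like_message_number and safe_folder_name are hypothetical, and '_' is
just one of the prefixes suggested in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name is non-empty and consists only of decimal digits,
 * i.e. a folder scan could mistake it for a message number. */
static int looks_like_message_number(const char *name)
{
    assert(name != NULL);
    return *name != '\0' && name[strspn(name, "0123456789")] == '\0';
}

/* Write a folder name for a search term into out.  All-digit terms
 * get a '_' prefix; everything else is copied unchanged.
 * Returns 0 on success, -1 if out is too small. */
static int safe_folder_name(const char *term, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s",
                     looks_like_message_number(term) ? "_" : "", term);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this rule a search for 74600607886815 would land in a subfolder named
_74600607886815, while hexadecimal or textual search terms keep their
names.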

--Ken



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ralph Corderoy
Hi,

Ken wrote:
> I think we have to push this back on the MH-E people; Robert's
> suggestion to add a non-numeric prefix to directories it creates sounds
> like the best answer to me.

$ refile +31415 

Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ken Hornstein
>> I think we have to push this back on the MH-E people; Robert's
>> suggestion to add a non-numeric prefix to directories it creates sounds
>> like the best answer to me.
>
>$ refile +31415
>$ folder +31415
>31415+ has 1 message   (1-1).

I'm aware of that, but what happens if you have a subfolder that is all
numeric?  I believe all of the nmh tools will treat that subfolder as
a message (that's the real issue).

--Ken



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ralph Corderoy
Hi Ken,

> > > I think we have to push this back on the MH-E people
...
> >$ refile +31415
> >$ folder +31415
> >31415+ has 1 message   (1-1).
>
> I'm aware of that, but what happens if you have a subfolder that is
> all numeric?  I believe all of the nmh tools will treat that subfolder
> as a message

$ ref +3/1/4/1/5
Create folder "/home/ralph/mail/3/1/4/1/5"? yes
$ folder +3/1/4/1/5
3/1/4/1/5+ has 1 message   (1-1).
$ scan -forma %{from} +3/1/4/1/5 1
Ken Hornstein 
$ scan -forma %{from} +3/1/4/1
scan: unable to read: Is a directory
scan: scan() botch (-3)
$

> (that's the real issue).

The real issue is nmh doesn't forbid folders named with just decimal
digits and even creates them when requested.  MH-E is set a bad example.

-- 
Cheers, Ralph.



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Simon Burge
Ken Hornstein wrote:

> Exactly HOW many messages are in mhe-index?
>
> Ah, I think I see what's happening.  That line is this:
>
>   mp->msgstats = mh_xmalloc (MSGSTATSIZE(mp));
>
> MSGSTATSIZE is defined as:
>
> #define MSGSTATSIZE(mp) ((mp)->num_msgstats * sizeof *(mp)->msgstats)
>
> num_msgstats is set by the previous line:
>
> mp->num_msgstats = MSGSTATNUM (mp->lowoff, mp->hghoff);
>
> Which is defined as:
>
> #define MSGSTATNUM(lo, hi) ((size_t) ((hi) - (lo) + 1))
>
> So ... the summary here is that nmh (and MH before it) allocates a
> "message status" element for every possible message.  The possible
> number of messages is the range between the LOWEST message number and
> the HIGHEST message number.  So if you just had 1000 and 1002 in
> a folder, it would allocate 3 elements.  But if you had 1 and 1000000,
> it would allocate a million elements.  A msgstat structure is an array
> of "struct bvector" which might be ... 8 + 8 + 16 bytes per message on
> a 64 bit platform.  That suggests there are either 1320920404 messages
> in that folder (1.3 billion) or there's a huge message number gap (that
> has come up before when someone had a huge gap; my memory is the
> consensus was you just had to deal).

Possibly somewhat related, Greg mentioned he uses mairix for search.
mairix produces very "sparse" results folders.  For example:

thoreau 52115> mairix caffeine
Matched 111 messages
thoreau 52116> f +vfolder
vfolder+ has 111 messages  (47-782143); cur=650783.

From Ken's description above, these 111 messages would allocate almost
800,000 msgstat structures.  I don't know how huge the message numbers
get in the results folder, but six digits is common.  I don't recall if
I've seen seven digit or larger message numbers.
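
To make the arithmetic concrete, here is a small sketch that mirrors the
MSGSTATNUM macro quoted above.  The 32 bytes per element is an assumption
taken from Ken's "8 + 8 + 16" estimate, not a measured value, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 32 bytes per element, from the "8 + 8 + 16" estimate. */
#define ASSUMED_ELEMENT_BYTES 32u

/* Same arithmetic as the MSGSTATNUM macro: one element for every
 * number in the range lo..hi, whether or not a message exists. */
static size_t msgstat_count(long lo, long hi)
{
    assert(hi >= lo);
    return (size_t)(hi - lo + 1);
}

/* Estimated size of the allocation for that range, in bytes. */
static size_t msgstat_bytes(long lo, long hi)
{
    return msgstat_count(lo, hi) * ASSUMED_ELEMENT_BYTES;
}
```

For the vfolder above, msgstat_count(47, 782143) gives 782097 elements for
111 messages, which under the 32-byte assumption is about 25 MB.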

Cheers,
Simon.



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Conrad Hughes
Simon> Possibly somewhat related, Greg mentioned he uses mairix for
Simon> search.  mairix produces very "sparse" results folders.

I use mairix and have never witnessed this.  A quick experiment shows
that it's because I use

  sort=date+

in my .mairixrc.  At a guess, the default unsorted numbering system must
use the emails' positions in mairix's own index, which could obviously
get quite high, given a big archive.

An odd choice.  Try using "sort=date+" if that's acceptable.

Conrad



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Robert Elz
Date:Thu, 02 Mar 2023 13:33:01 +
From:Ralph Corderoy 
Message-ID:  <20230302133301.8900121...@orac.inputplus.co.uk>

  | The real issue is nmh doesn't forbid folders named with just decimal
  | digits and even creates them when requested.  MH-E is set a bad example.

That's true, but nmh doesn't just create folders on a whim, only when
the user requests it, and if the user requests a folder name that looks
like a message number, well, it is their problem to cause...   mh-e is
(apparently, I don't use it, or anything emacsish) doing it behind the
user's back, in a sense - and has no need to really.

kre




Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Simon Burge
Hi Conrad,

Conrad Hughes wrote:

> Simon> Possibly somewhat related, Greg mentioned he uses mairix for
> Simon> search.  mairix produces very "sparse" results folders.
>
> I use mairix and have never witnessed this.  A quick experiment shows
> that it's because I use
>
>   sort=date+
>
> in my .mairixrc.  At a guess, the default unsorted numbering system must
> use the emails' positions in mairix's own index, which could obviously
> get quite high, given a big archive.
>
> An odd choice.  Try using "sort=date+" if that's acceptable.

Ahh, I see that there have been two "recent" commits to mairix:

  17 Jan 2020 - MH search results are sequentially numbered from 1. 
  17 Jan 2020 - Add "sort=date+" option, and renumber MH results.

Unfortunately(?) I'm still using the latest release version which
is 0.24 from 14 Aug 2017 :/

Cheers,
Simon.



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ken Hornstein
>From Ken's description above, these 111 messages would allocate almost
>800,000 msgstat structures.  I don't know how huge the message numbers
>get in the results folder, but six digits is common.  I don't recall if
>I've seen seven digit or larger message numbers.

I see Conrad pointed out that if you set "sort=date+" in your .mairixrc
then this resolves this issue (but I do not know if that has negative
side effects or if that interacts badly with MH-E).

This does suggest to me we should probably change the internal API
so sparse message ranges are handled better; right now all of the
programs access the folder structure members directly and assume that
there will be a msgstat structure in every location in the array.
Sigh.  One more thing to add to the list.

--Ken



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Howard Bampton
On Thu, Mar 2, 2023 at 11:27 AM Ken Hornstein  wrote:

>
>
> This does suggest to me we should probably change the internal API
> so sparse message ranges are handled better; right now all of the
> programs access the folder structure members directly and assume that
> there will be a msgstat structure in every location in the array.
> Sigh.  One more thing to add to the list.
>
>
If I understand the problem correctly:
- a folder with the highest message number of "N" will cause the array to be
  configured to support N messages, even if there are many fewer (perhaps
  even one) messages
- stat()ing every file in a folder to make sure it is a file (message)
  instead of a directory (folder) is very expensive (and harms the
  performance of other programs where this isn't important and is thus a
  no-go)

I assume we want "close enough" scaling, not perfect. Would not the
following work well enough?

Scale the array based upon the number of directory entries in the folder.
This will over commit due to subfolders being counted, scratch files, and
deleted messages. It seems this would only over commit in interesting cases
by 3x (baseline of 1 covers the messages, the 2nd set is scratch files and
deleted messages, and 3 is subfolders). Short of malicious actions, you'd
end up with, maybe 5x (message, extracted parts of the message, deleted
message, folders that look like message numbers). If you want more
compactness, you take pains to dump the stuff that isn't a message number
(the aforementioned extracted parts and deleted messages).

Or am I missing something about filesystem internals?


Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Ken Hornstein
>a folder with the highest message number of "N" will cause the array to be
>configured to support N messages, even if there are many fewer (perhaps
>even one) messages

No, that's not correct.  If you have a single message in a folder and its
number is 100, you only get one entry allocated.  The number of entries
allocated is based on the difference between the lowest and highest
message number.

>Scale the array based upon the number of directory entries in the folder.
>This will over commit due to subfolders being counted, scratch files, and
>deleted messages. It seems this would only over commit in interesting cases
>by 3x (baseline of 1 covers the messages, the 2nd set is scratch files and
>deleted messages, and 3 is subfolders). Short of malicious actions, you'd
>end up with, maybe 5x (message, extracted parts of the message, deleted
>message, folders that look like message numbers). If you want more
>compactness, you take pains to dump the stuff that isn't a message number
>(the aforementioned extracted parts and deleted messages).

It's not filesystem internals that is the issue, it's (n)mh internals.

Right now the msgstats array is indexed by taking the message number and
subtracting the value of the lowest message number.  Obviously there are
much better ways to deal with this, but all of the nmh code directly
accesses the msgstats array.  And of course time is not infinite so
someone who HAS time would have to roll up their sleeves to fix it.

(A general assumption is that there are few holes in nmh message
numbers and this is reflected in more locations than just this).
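
One of the "better ways" could look roughly like the sketch below: keep one
entry per message that actually exists, sorted by message number, and find
entries with a binary search instead of indexing by (msgnum - lowest).
This is not nmh code; every type and function name here is hypothetical,
and the single status field stands in for the real per-message data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sparse_entry {
    int msgnum;        /* message number found in the directory */
    unsigned status;   /* stand-in for the per-message status bits */
};

struct sparse_folder {
    struct sparse_entry *entries;  /* sorted by msgnum */
    size_t count;                  /* messages present, not hi - lo + 1 */
};

static int cmp_entry(const void *a, const void *b)
{
    int x = ((const struct sparse_entry *)a)->msgnum;
    int y = ((const struct sparse_entry *)b)->msgnum;
    return (x > y) - (x < y);
}

/* Build the folder from an unsorted list of message numbers.
 * Returns 0 on success, -1 if memory cannot be allocated. */
static int sparse_folder_init(struct sparse_folder *f,
                              const int *msgnums, size_t n)
{
    assert(f != NULL && (msgnums != NULL || n == 0));
    f->entries = NULL;
    f->count = 0;
    if (n == 0)
        return 0;
    f->entries = calloc(n, sizeof *f->entries);
    if (f->entries == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        f->entries[i].msgnum = msgnums[i];
    qsort(f->entries, n, sizeof *f->entries, cmp_entry);
    f->count = n;
    return 0;
}

/* Return the entry for msgnum, or NULL if there is no such message. */
static struct sparse_entry *sparse_folder_find(const struct sparse_folder *f,
                                               int msgnum)
{
    struct sparse_entry key = { msgnum, 0 };

    if (f->count == 0)
        return NULL;
    return bsearch(&key, f->entries, f->count, sizeof key, cmp_entry);
}

static void sparse_folder_free(struct sparse_folder *f)
{
    free(f->entries);
    f->entries = NULL;
    f->count = 0;
}
```

Memory then scales with the number of messages rather than with the gap
between the lowest and highest numbers, at the cost of an O(log n) lookup.
The hard part described above remains: every caller that touches the
msgstats array directly would have to go through an accessor like this.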

--Ken



Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread George Michaelson
I've always done sortm -verbose after big delete fests. verbose because I
love watching the towers of hanoi shuffle along.

lots of GUI mail systems have 'compact mailbox' command options. I assumed
that everyone did periodic tidyup anyway.

I'm not saying this isn't a problem. But, I seriously wonder how BIG a
problem this is. If you can renumber out of it, then isn't that a viable
work-around?

-G


Re: flist -- "Killed" -- oom (*not* 1.8 related)

2023-03-02 Thread Greg Minshall
Robert, et al., thanks very much.

possibly mh-e could add something like a comma before integers.  i'll
ask and look.

on the more general issue, you all know a lot more about all of this
than me.  but ... :)

while actual bytes of memory on my laptop are semi-precious, addresses
in the address space are much less so.  here's somebody who uses mmap(2)
to allocate a huge chunk of address space, and then madvise(2) (a call i
think i've never used) to have that chunk backed by (lots and lots of)
zeroes.

https://robert.ocallahan.org/2016/06/managing-vast-sparse-memory-on-linux.html


i get the sense that nmh will only (after maybe zeroing the array, which
would be eliminated in this scenario!) access locations in the array
corresponding to actual "messages" found in the directory.  so, this
sparse array should stay sparse, right?

this wouldn't solve all problems -- places where the difference between
the largest and smallest "message" numbers is greater than the size of
the address space (+/-).

but, i wonder if it might help.  maybe combined with those places where
d_type is supported (and not "unknown") in directory entries.  at least
as a temporary fix?

and, if the mmap(2) call fails, that's maybe a way to provide a more
graceful termination than lots of sluggishness, then OOM.
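
a rough sketch of what i mean, assuming linux (MAP_ANONYMOUS and
MAP_NORESERVE are not in POSIX).  the function names are made up, and this
version relies on anonymous mappings being zero-filled rather than calling
madvise(2).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Reserve address space for count zero-filled elements of size bytes.
 * Physical pages are only allocated for the parts that get written.
 * Returns NULL on failure so the caller can exit with a clear error
 * instead of grinding toward the OOM killer. */
static void *reserve_sparse_array(size_t count, size_t size)
{
    void *p;

    assert(size > 0);
    if (count == 0 || count > SIZE_MAX / size)
        return NULL;   /* empty request, or count * size overflows */
    p = mmap(NULL, count * size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the whole range back to the kernel. */
static void release_sparse_array(void *p, size_t count, size_t size)
{
    if (p != NULL)
        munmap(p, count * size);
}
```

this only helps if nothing walks the whole array: any loop that writes
every element brings all the pages back into real memory.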

cheers, Greg