Re: Very large folder

2021-06-05 Thread Ken Hornstein
>Starting in late 2014 I have stopped deleting messages, putting them in a
>directory, +gone, which now contains 465,147 messages and uses about 17
>gigabytes. The bulk of these messages were of transitory or of less interest
>to me. But they include 1,702 messages from my daughter. They were almost all
>of no interest or use to me within a day or two of when she sent them. But she
>recently died (the worst thing by far that's ever happened to me). Now every
>byte she ever wrote is precious to me. So I am glad that I stopped deleting
>messages that I no longer care about.

First off, please accept my sympathies for this unimaginable tragedy.

>So, what is the likelihood of such a bug? Does anybody have any experience
>dealing with such large folders?

I can't think of any _buffer overflows_ that might happen; this isn't
anything out of the ordinary, except that it's a very large number of
messages.  What I think you might bump up against are virtual memory
limits, but even then I suspect you're fine.

There are a number of things that are allocated when a folder is read
(in the function folder_read()).  From what I see, the ones that are
affected by the number of messages in the folder are:

- The "message number" array, which holds the message number for each
  message.  That's an int, so 4 bytes per message on most platforms.
  But it is free()d after folder_read() is done, which seems 
  sub-optimal?  Doing better here might be hard, though.  It would certainly
  be more complex.  We could do something smarter about message numbers
  that are contiguous that would cut down on this memory usage a lot.

- The msgstats array, which is an array of struct bvector.  A struct
  bvector looks like a pointer, a size_t, and two unsigned longs.  Call
  it 32 bytes on a 64-bit platform, maybe?  It looks like there are only
  4 bits we can set for each message, so we never use more than that
  size, with the exception of sequence membership flags.  If you have
  a lot of sequences in that folder, it's possible you could need
  something more than that (you'd need more than 60 sequences in a
  single folder before it affected anything).  It's possible my quick
  math is wrong, but I think that it's probably close.

So by my count, that's 1.9 MB of memory that gets free()d and 
14.9 MB of memory for that folder's structure.  Which, in 2021, does not
seem like a lot!  MH and nmh were always a bit casual with memory
management since all of the programs are short-lived, but I think you
should be fine.  All of the calls to malloc() are wrapped using
mh_xmalloc() and friends which call die() if a call to malloc() fails.
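
As a rough check of those figures, here is a minimal Python sketch (not
nmh code) that redoes the arithmetic.  The BVectorLike class is a
hypothetical stand-in built only from the description above (a pointer,
a size_t, two unsigned longs); its field names are illustrative, and the
sizes it reports depend on the platform's C ABI.  The message count is
the 465,147 quoted from the original post.

```python
import ctypes

MESSAGES = 465_147  # message count quoted in the original post


class BVectorLike(ctypes.Structure):
    """Hypothetical layout: a pointer, a size_t, and two unsigned longs.

    This only mirrors the description in the message; it is an assumption,
    not a copy of nmh's real definition.
    """
    _fields_ = [
        ("bits", ctypes.POINTER(ctypes.c_ulong)),
        ("maxsize", ctypes.c_size_t),
        ("tiny", ctypes.c_ulong * 2),
    ]


def folder_bytes(messages, per_message):
    """Total bytes for an array holding one fixed-size entry per message."""
    return messages * per_message


def as_mb(nbytes):
    """Decimal megabytes, rounded to one place."""
    return round(nbytes / 1_000_000, 1)


if __name__ == "__main__":
    int_size = ctypes.sizeof(ctypes.c_int)   # message number array entry
    vec_size = ctypes.sizeof(BVectorLike)    # msgstats array entry
    print("int:", int_size, "bytes; bvector-like struct:", vec_size, "bytes")
    print("message numbers:", as_mb(folder_bytes(MESSAGES, int_size)), "MB")
    print("msgstats:", as_mb(folder_bytes(MESSAGES, vec_size)), "MB")
```

On a typical 64-bit Linux or macOS system this should report entry sizes
of 4 and 32 bytes, which is where the roughly 1.9 MB and 14.9 MB figures
come from; on a platform with a 4-byte unsigned long the second number
will be smaller.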

--Ken



Re: Message header formatting

2021-06-05 Thread Jon Steinhart
Ken Hornstein writes:
> >> Brilliant idea! I too would use an inverse match logic. Shorter rules,
> >> easier to apply, probably faster.
> >>
> >> G
> >
> >I guess that I should interpret that as no, there isn't such an incantation
> >but since I brought it up then it's my job to write the code so I will when
> >I get a chance.
>
> You don't need to write anything.  From mhl(1):
>
>The component "Extras" will output all of the components of the message
>which were not matched by  explicit  components,  or  included  in  the
>ignore list.  If this component is not specified, an ignore list is not
>needed since all non-specified components will be ignored.
>
> So just remove "extras".
>
> --Ken

Ah, thanks.



Re: Message header formatting

2021-06-05 Thread Philipp Takacs
Hi

[2021-06-05 15:16] Jon Steinhart 
> Is there any incantation for "show only the headers explicitly listed
> in mhl.format" so that new and uninteresting headers from everybody's
> latest spam filter, mailing list manager, and internal tracking don't
> fill the screen.

You can remove the ``extras'' component from your format file.

See mhl(1):

> The component "Extras" will output all of the components of the
> message which were not matched by explicit components, or included in
> the ignore list.

Philipp



Re: Message header formatting

2021-06-05 Thread Ken Hornstein
>> Brilliant idea! I too would use an inverse match logic. Shorter rules,
>> easier to apply, probably faster.
>>
>> G
>
>I guess that I should interpret that as no, there isn't such an incantation
>but since I brought it up then it's my job to write the code so I will when
>I get a chance.

You don't need to write anything.  From mhl(1):

   The component "Extras" will output all of the components of the message
   which were not matched by  explicit  components,  or  included  in  the
   ignore list.  If this component is not specified, an ignore list is not
   needed since all non-specified components will be ignored.

So just remove "extras".

--Ken



Re: Message header formatting

2021-06-05 Thread Jon Steinhart
George Michaelson writes:
>
> Brilliant idea! I too would use an inverse match logic. Shorter rules,
> easier to apply, probably faster.
>
> G

I guess that I should interpret that as no, there isn't such an incantation
but since I brought it up then it's my job to write the code so I will when
I get a chance.

Jon



Re: Message header formatting

2021-06-05 Thread George Michaelson
Brilliant idea! I too would use an inverse match logic. Shorter rules,
easier to apply, probably faster.

G

On Sun, 6 Jun 2021, 8:17 am Jon Steinhart wrote:

> I've been getting increasingly annoyed at the number of header lines
> that fill a screen or two before I can even see message contents.  I
> keep adding to an already huge ignores line in my mhl.format but new
> headers seem to be created daily.
>
> Is there any incantation for "show only the headers explicitly listed
> in mhl.format" so that new and uninteresting headers from everybody's
> latest spam filter, mailing list manager, and internal tracking don't
> fill the screen.
>
> BTW, when looking at this I noticed that while the mhl man page has
> examples for the ignores variable that it's missing from the list of
> variables on that page.
>
> Jon
>
>


Message header formatting

2021-06-05 Thread Jon Steinhart
I've been getting increasingly annoyed at the number of header lines
that fill a screen or two before I can even see message contents.  I
keep adding to an already huge ignores line in my mhl.format but new
headers seem to be created daily.

Is there any incantation for "show only the headers explicitly listed
in mhl.format" so that new and uninteresting headers from everybody's
latest spam filter, mailing list manager, and internal tracking don't
fill the screen?

BTW, when looking at this I noticed that while the mhl man page has
examples for the ignores variable that it's missing from the list of
variables on that page.

Jon



Re: Very large folder

2021-06-05 Thread George Michaelson
It's always been my belief that large folders cause multi-level directory
block chaining in traditional UNIX fs. This itself incurs costs and
consequences for how the cross-system file buffer cache works. Basically,
any operation which requires all the directory blocks to be walked in
sequence floods the kernel file buffers. It has impacts on other uses of the OS.

It is likely more modern FS like ZFS handle this differently but I don't
know, I've never seen an analysis.

Your system has cronjobs doing things like find . -type f -mtime  which
may run slower; you may be causing general system slowdowns.

I think it would make sense to filter out the things you want.

I share your problem: mail from now-dead relatives that is exquisitely
painful for me to read but that I am unwilling to delete, and the thought
of having to write filters to find and file it doesn't fill me with joy. On
the other hand, I have replicated the data because you have other risks:
disk media is fragile.

Don't have only one copy of these mails. A cloud mail provider like Google
might be a good backup, and has filter, search and tag options.

Cheers

G

On Sun, 6 Jun 2021, 7:10 am ,  wrote:

> Starting in late 2014 I have stopped deleting messages, putting them in a
> directory, +gone, which now contains 465,147 messages and uses about 17
> gigabytes. The bulk of these messages were of transitory or of less
> interest
> to me. But they include 1,702 messages from my daughter. They were almost
> all
> of no interest or use to me within a day or two of when she sent them. But
> she
> recently died (the worst thing by far that's ever happened to me). Now
> every
> byte she ever wrote is precious to me. So I am glad that I stopped deleting
> messages that I no longer care about.
>
> In practice this large folder has little impact on performance. For
> example,
> whenever I do a pick which is, or in a script which might be, +gone, I
> give it
> an argument like last:10. I could, if necessary split +gone into
> several
> smaller folders, but I would rather not. But I'm concerned that a bug in
> nmh
> might cause a problem. For example, some kind of a buffer overflow.
>
> So, what is the likelihood of such a bug? Does anybody have any experience
> dealing with such large folders?
>
>
>
>
>
>
> Norman Shapiro
>


Re: Very large folder

2021-06-05 Thread Ralph Corderoy
Hi Norm,

> Starting in late 2014 I have stopped deleting messages, putting them
> in a directory, +gone, which now contains 465,147 messages and uses
> about 17 gigabytes.

That's far larger than my +cor (for correspondence) which is only 42,229
emails consuming just over 1 GiB.

> But they include 1,702 messages from my daughter. They were almost all
> of no interest or use to me within a day or two of when she sent them.
> But she recently died

I'm so very sorry to hear that.

> But I'm concerned that a bug in nmh might cause a problem. For
> example, some kind of a buffer overflow.

From the kind of severe bugs I see, like segmentation-violation causing
ones, they typically affect the display of data about a message, or
perhaps sometimes the inc-orporation of a message.  I can't think of any
which have affected an existing folder of many messages.  Others will
pipe up with their views.

Given those 1,702 messages are precious, you may want to pick them
and refile them to a dedicated folder.  You could use ‘refile -link
+newfolder ...’ to keep them in +gone and just create hard links to the
new folder's version, or without the -link to have them move from +gone.

-- 
Ralph.



Very large folder

2021-06-05 Thread norm
Starting in late 2014 I have stopped deleting messages, putting them in a
directory, +gone, which now contains 465,147 messages and uses about 17
gigabytes. The bulk of these messages were of transitory or of less interest
to me. But they include 1,702 messages from my daughter. They were almost all
of no interest or use to me within a day or two of when she sent them. But she
recently died (the worst thing by far that's ever happened to me). Now every
byte she ever wrote is precious to me. So I am glad that I stopped deleting
messages that I no longer care about.

In practice this large folder has little impact on performance. For example,
whenever I do a pick which is, or in a script which might be, +gone, I give it
an argument like last:10. I could, if necessary split +gone into several
smaller folders, but I would rather not. But I'm concerned that a bug in nmh
might cause a problem. For example, some kind of a buffer overflow.

So, what is the likelihood of such a bug? Does anybody have any experience
dealing with such large folders?






Norman Shapiro
