>Starting in late 2014 I have stopped deleting messages, putting them in a
>directory, +gone, which now contains 465,147 messages and uses about 17
>gigabytes. The bulk of these messages were of transitory or of less interest
>to me. But they include 1,702 messages from my daughter. They were almost all
>of no interest or use to me within a day or two of when she sent them. But she
>recently died (the worst thing by far that's ever happened to me). Now every
>byte she ever wrote is precious to me. So I am glad that I stopped deleting
>messages that I no longer care about.

First off, please accept my sympathies for this unimaginable tragedy.

>So, what is the likelihood of such a bug? Does anybody have any experience
>dealing with such large folders?

I can't think of any _buffer overflows_ that might happen; this isn't
anything out of the ordinary, except that it's a very large number of
messages. What I think you might bump up against are virtual memory
limits, but even then I suspect you're fine.

There are a number of things that are allocated when a folder is read
(in the function folder_read()). From what I see, the ones that are
affected by the number of messages in the folder are:

- The "message number" array, which holds the message number for each
  message. That's an int, so 4 bytes per message on most platforms. But
  it is free()d after folder_read() is done, which seems ... sub-optimal?
  Doing better here might be hard, though. It would certainly be more
  complex. We could do something smarter about message numbers that are
  contiguous, which would cut down on this memory usage a lot.

- The msgstats array, which is ... an array of struct bvector. A struct
  bvector looks like ... a pointer, a size_t, and two unsigned longs.
  Call it 32 bytes on a 64-bit platform, maybe? It looks like there are
  only 4 bits we can set for each message, so we don't use anything more
  than that size, with the exception of sequence membership flags. If
  you have a lot of sequences in that folder, it's possible you could
  get something more than that (you'd need ... more than 60 sequences in
  a single folder before it affected anything).

It's possible my quick math is wrong, but I think that it's probably
close. So by my count, for 465,147 messages that's 1.9 MB of memory that
gets free()d and 14.9 MB of memory for that folder's structure. Which,
in 2021, does not seem like a lot! MH and nmh were always a bit casual
with memory management since all of the programs are short-lived, but I
think you should be fine.
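
To make the arithmetic concrete, here is a small C sketch of the estimate. The 4-byte and 32-byte per-message sizes are just my rough guesses from above, and the function and constant names are made up for illustration; none of this is taken from the nmh source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-message costs from the estimate above; not measured from nmh. */
enum {
    MSGNUM_BYTES  = 4,   /* one int per message number */
    BVECTOR_BYTES = 32   /* pointer + size_t + two unsigned long, 64-bit guess */
};

/* Total bytes for an array of nmsgs elements of elem_bytes each. */
static size_t array_bytes(size_t nmsgs, size_t elem_bytes)
{
    /* Guard against the multiplication overflowing size_t. */
    assert(elem_bytes != 0 && nmsgs <= SIZE_MAX / elem_bytes);
    return nmsgs * elem_bytes;
}

/* Temporary message-number array, free()d once the folder has been read. */
static size_t msgnum_array_bytes(size_t nmsgs)
{
    return array_bytes(nmsgs, MSGNUM_BYTES);
}

/* Per-message status array that stays allocated with the folder structure. */
static size_t msgstats_array_bytes(size_t nmsgs)
{
    return array_bytes(nmsgs, BVECTOR_BYTES);
}
```

For 465,147 messages these come out to 1,860,588 and 14,884,704 bytes, which is where the roughly 1.9 MB and 14.9 MB figures come from.
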
All of the calls to malloc() are wrapped using mh_xmalloc() and friends,
which call die() if a call to malloc() fails.

--Ken
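
P.S. In case the wrapper idea is unfamiliar, the general pattern is sketched below. This is only an illustration of an abort-on-failure allocator: xmalloc_or_die() is a hypothetical name, and the body is my assumption about how such a wrapper typically looks, not a copy of mh_xmalloc() or die() from nmh.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical abort-on-failure allocator; a stand-in, not nmh's code. */
static void *xmalloc_or_die(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;   /* malloc(0) may return NULL, so ask for one byte */

    p = malloc(size);
    if (p == NULL) {
        /* Report the failure and exit rather than return NULL. */
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The point of the pattern is that callers never see a NULL return, so a failed allocation ends the program with a message instead of turning into a crash somewhere later.
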