Re: [PATCH] lei: support reading MH for convert+import+index

2023-12-17 Thread Eric Wong
Eric Wong  wrote:
> Konstantin Ryabitsev  wrote:
> > Nice, so eventually we should be able to specify the following instead of
> > faking out a maildir?
> > 
> > watch=mh:/var/spool/mlmmj/list.name/archive
> 
> Yes, that's the plan.

Well, reading /usr/lib/python*/mailbox.py on my system makes me cry:

def pack(self):
"""Re-name messages to eliminate numbering gaps. Invalidates keys."""

That's for the Python stdlib MH class where I was looking for a
non-racy write implementation.

And checking the nmh source[1] reveals packing happens there, too...

Packing makes sense for a memory-efficient representation of
.mh_sequences without resorting to a tree or hash table; but it
invalidates `lei index' and forces -watch to do a full rescan if
anybody uses pack.  Ugh...

Fortunately, this doesn't seem to be the default behavior of nmh
(`nopack' appears to be the default).

[1] https://git.savannah.gnu.org/git/nmh.git sbr/folder_pack.c

> > > inotify|EVFILT_VNODE watches aren't supported, yet, either.

At least lei should have a reasonably fast way to handle this
using mail_sync.sqlite3 to compare SHA-(1|256) without having
to decode MIME/QP/Base-64 to get comparisons... I suppose
-watch needs to start using that, too...

> > In the case of mlmmj it's sufficient to watch the
> > /var/spool/mlmmj/list.name/index file for updates, but I don't know how well
> > this lends itself to other implementations (I am not at all familiar with 
> > MH).
> 
> Just watching the directory itself is sufficient (like Maildir)
> and will report new files.  We just have to check /\A[0-9]+\z/

At least mlmmj won't pack because it's an archive (or at
least it shouldn't)



Re: [PATCH] lei: support reading MH for convert+import+index

2023-12-16 Thread Eric Wong
Konstantin Ryabitsev  wrote:
> Nice, so eventually we should be able to specify the following instead of
> faking out a maildir?
> 
> watch=mh:/var/spool/mlmmj/list.name/archive

Yes, that's the plan.

> > inotify|EVFILT_VNODE watches aren't supported, yet, either.
> 
> In the case of mlmmj it's sufficient to watch the
> /var/spool/mlmmj/list.name/index file for updates, but I don't know how well
> this lends itself to other implementations (I am not at all familiar with MH).

Just watching the directory itself is sufficient (like Maildir)
and will report new files.  We just have to check /\A[0-9]+\z/



Re: [PATCH] lei: support reading MH for convert+import+index

2023-12-16 Thread Konstantin Ryabitsev
On Sat, Dec 16, 2023 at 01:09:32PM +, Eric Wong wrote:
> The MH format is widely-supported and used by various MUAs such
> as mutt and sylpheed, and a MH-like format is used by mlmmj for
> archives, as well.  Locking implementations for writes are
> inconsistent, so this commit doesn't support writes, yet.

Nice, so eventually we should be able to specify the following instead of
faking out a maildir?

watch=mh:/var/spool/mlmmj/list.name/archive

> inotify|EVFILT_VNODE watches aren't supported, yet, either.

In the case of mlmmj it's sufficient to watch the
/var/spool/mlmmj/list.name/index file for updates, but I don't know how well
this lends itself to other implementations (I am not at all familiar with MH).

-K



[PATCH] lei: support reading MH for convert+import+index

2023-12-16 Thread Eric Wong
The MH format is widely-supported and used by various MUAs such
as mutt and sylpheed, and a MH-like format is used by mlmmj for
archives, as well.  Locking implementations for writes are
inconsistent, so this commit doesn't support writes, yet.

inotify|EVFILT_VNODE watches aren't supported, yet, either.
---
 MANIFEST   |   3 +
 lib/PublicInbox/LEI.pm |  13 ++--
 lib/PublicInbox/LeiConvert.pm  |   5 ++
 lib/PublicInbox/LeiImport.pm   |  23 +++
 lib/PublicInbox/LeiImportKw.pm |   2 +-
 lib/PublicInbox/LeiIndex.pm|   2 +-
 lib/PublicInbox/LeiInput.pm|  52 +---
 lib/PublicInbox/LeiMailSync.pm |  39 
 lib/PublicInbox/LeiToMail.pm   |   5 ++
 lib/PublicInbox/MHreader.pm| 103 +++
 lib/PublicInbox/MdirReader.pm  |   2 +-
 lib/PublicInbox/MdirSort.pm|  46 ++
 lib/PublicInbox/TestCommon.pm  |  22 ---
 t/mh_reader.t  | 108 +
 14 files changed, 392 insertions(+), 33 deletions(-)
 create mode 100644 lib/PublicInbox/MHreader.pm
 create mode 100644 lib/PublicInbox/MdirSort.pm
 create mode 100644 t/mh_reader.t

diff --git a/MANIFEST b/MANIFEST
index e22674b7..8bcc3179 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -293,6 +293,7 @@ lib/PublicInbox/Linkify.pm
 lib/PublicInbox/Listener.pm
 lib/PublicInbox/Lock.pm
 lib/PublicInbox/MDA.pm
+lib/PublicInbox/MHreader.pm
 lib/PublicInbox/MID.pm
 lib/PublicInbox/MIME.pm
 lib/PublicInbox/MailDiff.pm
@@ -302,6 +303,7 @@ lib/PublicInbox/MboxGz.pm
 lib/PublicInbox/MboxLock.pm
 lib/PublicInbox/MboxReader.pm
 lib/PublicInbox/MdirReader.pm
+lib/PublicInbox/MdirSort.pm
 lib/PublicInbox/MiscIdx.pm
 lib/PublicInbox/MiscSearch.pm
 lib/PublicInbox/MsgIter.pm
@@ -543,6 +545,7 @@ t/mda-mime.eml
 t/mda.t
 t/mda_filter_rubylang.t
 t/mdir_reader.t
+t/mh_reader.t
 t/mid.t
 t/mime.t
 t/miscsearch.t
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 17431518..e0cfd55a 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -267,7 +267,7 @@ import => [ 'LOCATION...|--stdin [LABELS...]',
'one-time import/update from URL or filesystem',
qw(stdin| offset=i recursive|r exclude=s include|I=s new-only
lock=s@ in-format|F=s kw! verbose|v+ incremental! mail-sync!
-   commit-delay=i),
+   commit-delay=i sort|s:s@),
@net_opt, @c_opt ],
 'forget-mail-sync' => [ 'LOCATION...',
'forget sync information for a mail folder', @c_opt ],
@@ -280,7 +280,7 @@ import => [ 'LOCATION...|--stdin [LABELS...]',
 'convert' => [ 'LOCATION...|--stdin',
'one-time conversion from URL or filesystem to another format',
qw(stdin| in-format|F=s out-format|f=s output|mfolder|o=s lock=s@ kw!
-   rsyncable),
+   rsyncable sort|s:s@),
@net_opt, @c_opt ],
 'p2q' => [ 'LOCATION_OR_COMMIT...|--stdin',
"use a patch to generate a query for `lei q --stdin'",
@@ -321,6 +321,9 @@ import => [ 'LOCATION...|--stdin [LABELS...]',
 my $stdin_formats = [ 'MAIL_FORMAT|eml|mboxrd|mboxcl2|mboxcl|mboxo',
'specify message input format' ];
 my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
+my $sort_out = [ 'VAL|received|relevance|docid',
+   "order of results is `--output'-dependent"];
+my $sort_in = [ 'sequence|mtime|size', 'sort input (format-dependent)' ];
 
 # we use \x{a0} (non-breaking SP) to avoid wrapping in PublicInbox::LeiHelp
 my %OPTDESC = (
@@ -428,8 +431,10 @@ my %OPTDESC = (
 'limit|n=i@' => ['NUM', 'limit on number of matches (default: 1)' ],
 'offset=i' => ['OFF', 'search result offset (default: 0)'],
 
-'sort|s=s' => [ 'VAL|received|relevance|docid',
-   "order of results is `--output'-dependent"],
+'sort|s=s  q' => $sort_out,
+'sort|s=s  lcat' => $sort_out,
+'sort|s:s@ convert' => $sort_in,
+'sort|s:s@ import' => $sort_in,
 'reverse|r' => 'reverse search results', # like sort(1)
 
 'boost=i' => 'increase/decrease priority of results (default: 0)',
diff --git a/lib/PublicInbox/LeiConvert.pm b/lib/PublicInbox/LeiConvert.pm
index 8f628562..17a952f2 100644
--- a/lib/PublicInbox/LeiConvert.pm
+++ b/lib/PublicInbox/LeiConvert.pm
@@ -28,6 +28,11 @@ sub input_maildir_cb {
$self->{wcb}->(undef, { kw => $kw }, $eml);
 }
 
+sub input_mh_cb {
+   my ($dn, $bn, $kw, $eml, $self) = @_;
+   $self->{wcb}->(undef, { kw => $kw }, $eml);
+}
+
 sub process_inputs { # via wq_do
my ($self) = @_;
local $PublicInbox::DS::in_loop = 0; # force synchronous awaitpid
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
index c2552bf0..5521188c 100644
--- a/lib/PublicInbox/LeiImport.pm
+++ b/lib/PublicInbox/LeiImport.pm
@@ -53,6 +53,29 @@ sub pmdir_cb { # called via wq_io_do from 
LeiPmdir->each_mdir_fn
}
 }
 
+sub input_mh_cb {
+   my ($mhdir, $n, $kw, $eml, $self) = @_;
+   substr($mhdir, 0, 0) = 'mh:'; # add prefix
+   my $lse