Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-15 Thread Eric Wong
Konstantin Ryabitsev  wrote:
> I'm quite happy to not require libgit2 -- I've always found it easier to just
> use git plumbing commands even if this requires exec'ing an external
> executable.

Yeah, I don't have libgit2 installed on most of my systems, either.

Hoping git itself eventually makes it easier to deal with
multiple repos...  -cindex has to run show-ref on every single
coderepo (so over 1k on my gko mirror) even for the incremental
no-op case and that gets expensive...



Re: [PATCH] cindex: fix test when missing time(1) executable

2023-11-15 Thread Konstantin Ryabitsev
On Wed, Nov 15, 2023 at 05:55:49AM +, Eric Wong wrote:
> Eric Wong  wrote:
> > +++ b/t/cindex.t
> > @@ -210,7 +210,7 @@ EOM
> > my $cmd = [ qw(-cindex -u --all --associate -d), "$tmp/ext",
> > '-I', $basic->{inboxdir} ];
> > $cidx_out = $cidx_err = '';
> > -   ok(run_script($cmd, $env, $opt), 'associate w/o search');
> > +   ok(run_script($cmd, $env, undef), 'associate w/o search');
> > like($cidx_err, qr/W: \Q$basic->{inboxdir}\E not indexed for search/,
> > 'non-Xapian-enabled inbox noted');
> >  }
> 
> Yeah, using this on your new VM showed the problem right away:

Yes, I can confirm that this hang is now gone. \o/
All the tests succeed now with the latest master.

Thanks, as always!

-K



Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-15 Thread Konstantin Ryabitsev
On Wed, Nov 15, 2023 at 03:09:28AM +, Eric Wong wrote:
> > t/imapd.t  2/? Bailout called.  Further testing 
> > stopped:  FETCH socket closed while reading data from server
> > FAILED--Further testing stopped: FETCH socket closed while reading data 
> > from server
> > make: *** [test_dynamic] Error 255
> > 
> > This one looks odd but I do see it happen every time I run the test.
> 
> Both are actually related to our libgit2 support.  Working on a
> fix now since I didn't have libgit2 installed.
> 
> But I'm hesitant to do much with libgit2 since -extindex has
> basically made it obsolete scalability-wise.  Fwiw, both major
> commercial git hosts are ditching libgit2:
> 
>   https://public-inbox.org/git/ZRrfN2lbg14IOLiK@nand.local/

I'm quite happy to not require libgit2 -- I've always found it easier to just
use git plumbing commands even if this requires exec'ing an external
executable.

-K



RE: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-15 Thread Salil Mehta
Hi Konstantin,

> From: Salil Mehta
> Sent: Tuesday, November 14, 2023 5:25 PM
> To: 'Konstantin Ryabitsev' ; Eric Wong 
> 
> 
> Hi Konstantin,
> 
> > From: Konstantin Ryabitsev 
> > Sent: Tuesday, November 14, 2023 4:59 PM
> > To: Eric Wong 
> >
> > On Tue, Nov 14, 2023 at 04:36:29PM +, Eric Wong wrote:
> > > In any case, kernel.org folks should be able to import missing
> > > messages from GNU.org idempotently into lore/qemu-devel without
> > > having to resend:
> > >
> > > https://lists.gnu.org/archive/mbox/qemu-devel/2023-10
> > > https://lists.gnu.org/archive/mbox/qemu-devel/2023-11
> >
> > Just so this conversation is over, I've fed 2023-10 to the archive and the
> > links should be working now.
> >
> > But to make it clear, these messages didn't "disappear" from the archives.
> > They were never there because, for whatever reason, we never received them.
> 
> Many thanks for this help. I appreciate this.
> 
> Some has changed within the links but they are still not opening. I'll wait
> till tomorrow before checking it again.

I can confirm that all links have been restored and are visible again.

Many thanks for taking pains and helping to resolve this. Really appreciate 
this.


Best regards
Salil.



[PATCH 4/4] lei q|up|convert: common finish_output to detect errors

2023-11-15 Thread Eric Wong
We need to consistently check the exit code of pigz|gzip|xz|bzip2
when writing to compressed mboxes (or bad storage).
---
 lib/PublicInbox/LeiConvert.pm |  4 ++--
 lib/PublicInbox/LeiToMail.pm  | 11 +++
 lib/PublicInbox/LeiXSearch.pm |  9 +
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/LeiConvert.pm b/lib/PublicInbox/LeiConvert.pm
index 9d2479b0..8f628562 100644
--- a/lib/PublicInbox/LeiConvert.pm
+++ b/lib/PublicInbox/LeiConvert.pm
@@ -33,9 +33,9 @@ sub process_inputs { # via wq_do
local $PublicInbox::DS::in_loop = 0; # force synchronous awaitpid
$self->SUPER::process_inputs;
my $lei = $self->{lei};
-   delete $lei->{1};
my $l2m = delete $lei->{l2m};
-   delete $self->{wcb}; # commit
+   delete $self->{wcb}; # may close connections
+   $l2m->finish_output($lei) if $l2m;
if (my $v2w = delete $lei->{v2w}) { $v2w->done } # may die
my $nr_w = delete($l2m->{-nr_write}) // 0;
my $d = (delete($l2m->{-nr_seen}) // 0) - $nr_w;
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 0d62888d..007191bb 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -609,6 +609,17 @@ sub _pre_augment_mbox {
undef;
 }
 
+sub finish_output {
+   my ($self, $lei) = @_;
+   my $out = delete $lei->{1} // die 'BUG: no lei->{1}';
+   my $old = delete $lei->{old_1};
+   $lei->{1} = $old if $old;
+   return if $out->close; # reaps gzip|pigz|xz|bzip2
+   my $msg = "E: Error closing $lei->{ovv}->{dst}";
+   $? ? $lei->child_error($?) : ($msg .= " ($!)");
+   die $msg;
+}
+
 sub _do_augment_mbox {
my ($self, $lei) = @_;
return unless $self->{seekable};
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 5e36c11a..cee3ad07 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -393,14 +393,7 @@ sub query_done { # EOF callback for main daemon
$lei->sto_done_request;
$lei->{ovv}->ovv_end($lei);
if ($l2m) { # close() calls LeiToMail reap_compress
-   if (my $out = delete $lei->{old_1}) {
-   if (my $mbout = $lei->{1}) { # compressor pipe process
-   $mbout->close or die <<"";
-Error closing $lei->{ovv}->{dst}: \$!=$! \$?=$?
-
-   }
-   $lei->{1} = $out;
-   }
+   $l2m->finish_output($lei);
if ($l2m->lock_free) {
$l2m->poke_dst;
$lei->poke_mua;



[PATCH 3/4] lei: avoid extra fork for v2 outputs

2023-11-15 Thread Eric Wong
We've always forced LeiToMail to only have one process for v2
outputs anyways since v2 has its own sharding and IPC.  Thus we
can use the single LeiToMail process directly to avoid extra IPC
overhead.
---
 lib/PublicInbox/LeiConvert.pm |  7 ++-
 lib/PublicInbox/LeiToMail.pm  | 19 +--
 lib/PublicInbox/LeiXSearch.pm |  6 +-
 lib/PublicInbox/V2Writable.pm |  2 --
 4 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/lib/PublicInbox/LeiConvert.pm b/lib/PublicInbox/LeiConvert.pm
index 4a1f8323..9d2479b0 100644
--- a/lib/PublicInbox/LeiConvert.pm
+++ b/lib/PublicInbox/LeiConvert.pm
@@ -35,12 +35,9 @@ sub process_inputs { # via wq_do
my $lei = $self->{lei};
delete $lei->{1};
my $l2m = delete $lei->{l2m};
-   my $nr_w = delete($l2m->{-nr_write}) // 0;
delete $self->{wcb}; # commit
-   if (my $v2w = delete $lei->{v2w}) {
-   $nr_w = $v2w->wq_do('done'); # may die
-   $v2w->wq_close;
-   }
+   if (my $v2w = delete $lei->{v2w}) { $v2w->done } # may die
+   my $nr_w = delete($l2m->{-nr_write}) // 0;
my $d = (delete($l2m->{-nr_seen}) // 0) - $nr_w;
$d = $d ? " ($d duplicates)" : '';
$lei->qerr("# converted $nr_w messages$d");
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 2d9b7061..0d62888d 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -369,12 +369,14 @@ sub _v2_write_cb ($$) {
my ($self, $lei) = @_;
my $dedupe = $lei->{dedupe};
$dedupe->prepare_dedupe if $dedupe;
+   # only call in worker
+   $PublicInbox::Import::DROP_UNIQUE_UNSUB = $lei->{-drop_unique_unsub};
sub { # for git_to_mail
my ($bref, $smsg, $eml) = @_;
$eml //= PublicInbox::Eml->new($bref);
++$self->{-nr_seen};
return if $dedupe && $dedupe->is_dup($eml, $smsg);
-   $lei->{v2w}->wq_do('add', $eml); # V2Writable->add
+   $lei->{v2w}->add($eml) and ++$self->{-nr_write};
}
 }
 
@@ -647,11 +649,6 @@ sub _do_augment_mbox {
$dedupe->pause_dedupe if $dedupe;
 }
 
-sub v2w_done_wait { # awaitpid cb
-   my ($pid, $v2w, $lei) = @_;
-   $lei->child_error($?, "error for $v2w->{ibx}->{inboxdir}") if $?;
-}
-
 sub _pre_augment_v2 {
my ($self, $lei) = @_;
my $dir = $self->{dst};
@@ -677,11 +674,9 @@ sub _pre_augment_v2 {
$lei->x_it(shift);
die "E: can't write v2 inbox with broken config\n";
});
+   $lei->{-drop_unique_unsub} = $PublicInbox::Import::DROP_UNIQUE_UNSUB;
$ibx->init_inbox if @creat;
-   my $v2w = $ibx->importer;
-   $v2w->wq_workers_start("lei/v2w $dir", 1, $lei->oldset, {lei => $lei},
-   \&v2w_done_wait, $lei);
-   $lei->{v2w} = $v2w;
+   $lei->{v2w} = $ibx->importer;
return if !$lei->{opt}->{shared};
my $d = "$lei->{ale}->{git}->{git_dir}/objects";
open my $fh, '+>>', my $f = "$dir/git/0.git/objects/info/alternates";
@@ -806,6 +801,10 @@ sub wq_atexit_child {
my $lei = $self->{lei};
$lei->{ale}->git->async_wait_all;
my ($nr_w, $nr_s) = delete(@$self{qw(-nr_write -nr_seen)});
+   if (my $v2w = delete $lei->{v2w}) {
+   eval { $v2w->done };
+   $lei->child_error($?, "E: $@ ($v2w->{ibx}->{inboxdir})") if $@;
+   }
delete $self->{wcb};
(($nr_w //= 0) + ($nr_s //= 0)) or return;
return if $lei->{early_mua} || !$lei->{-progress} || !$lei->{pkt_op_p};
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 7eda6f9e..5e36c11a 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -391,11 +391,6 @@ sub query_done { # EOF callback for main daemon
($lei->{opt}->{'mail-sync'} && !$lei->{sto}) and
warn "BUG: {sto} missing with --mail-sync";
$lei->sto_done_request;
-   my $nr_w = delete($lei->{-nr_write}) // 0;
-   if (my $v2w = delete $lei->{v2w}) {
-   $nr_w = $v2w->wq_do('done'); # may die
-   $v2w->wq_close;
-   }
$lei->{ovv}->ovv_end($lei);
if ($l2m) { # close() calls LeiToMail reap_compress
if (my $out = delete $lei->{old_1}) {
@@ -413,6 +408,7 @@ Error closing $lei->{ovv}->{dst}: \$!=$! \$?=$?
delete $l2m->{mbl}; # drop dotlock
}
}
+   my $nr_w = delete($lei->{-nr_write}) // 0;
my $nr_dup = (delete($lei->{-nr_seen}) // 0) - $nr_w;
if ($lei->{-progress}) {
my $tot = $lei->{-mset_total} // 0;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 231ed516..fb259396 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -135,7 +135,6 @@ sub add {
if (do_idx($self, $mime, $smsg)) {
$self->checkpoint;
 

[PATCH 0/4] lei convert: support idempotent v2 outputs

2023-11-15 Thread Eric Wong
This may make it easier for public-inbox admins to forcibly
inject missing messages from existing mbox*/maildir/IMAP/NNTP
archives.

1/4 was only needed to get 2/4 working, but 3/4 makes
it unnecessary with our current codebase (though we may
still need 1/4 in the future).

4/4 was noticed while working on 3/4.

Eric Wong (4):
  lei: fix idempotent STDERR redirect in workers
  lei convert: fix repeat and idempotent v2 output
  lei: avoid extra fork for v2 outputs
  lei q|up|convert: common finish_output to detect errors

 lib/PublicInbox/LEI.pm |  2 +-
 lib/PublicInbox/LeiConvert.pm  |  9 ++---
 lib/PublicInbox/LeiOverview.pm |  4 ++--
 lib/PublicInbox/LeiToMail.pm   | 33 +
 lib/PublicInbox/LeiXSearch.pm  | 13 +
 lib/PublicInbox/V2Writable.pm  |  1 -
 t/lei-convert.t| 31 ++-
 7 files changed, 61 insertions(+), 32 deletions(-)



[PATCH 1/4] lei: fix idempotent STDERR redirect in workers

2023-11-15 Thread Eric Wong
This is needed to support forking from already-forked lei workers
and $lei->{2} is already STDERR.

Fixes: e015c3742f91 (lei: use autodie where appropriate, 2023-10-17)
---
 lib/PublicInbox/LEI.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 460aed40..8d235b37 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -581,7 +581,7 @@ sub _lei_atfork_child {
close($_) for (grep(defined, delete @$self{qw(0 1 2 sock)}));
delete $cfg->{-lei_store};
} else { # worker, Net::NNTP (Net::Cmd) uses STDERR directly
-   open STDERR, '+>&', $self->{2};
+   open STDERR, '+>&='.fileno($self->{2}); # idempotent w/ fileno
STDERR->autoflush(1);
$self->{2} = \*STDERR;
POSIX::setpgid(0, $$) // die "setpgid(0, $$): $!";



[PATCH 2/4] lei convert: fix repeat and idempotent v2 output

2023-11-15 Thread Eric Wong
We should be able to treat v2 outputs just like any other mail
format, with the exception that content dedupe is always
enforced by the v2 format.

This allows users hosting v2 public-inboxes to catch up broken
synchronization from alternate archives such as the mbox
archives hosted by https://lists.gnu.org/

Link: 
https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
---
 lib/PublicInbox/LeiConvert.pm  |  8 ++--
 lib/PublicInbox/LeiOverview.pm |  4 ++--
 lib/PublicInbox/LeiToMail.pm   |  3 +--
 lib/PublicInbox/LeiXSearch.pm  |  4 ++--
 lib/PublicInbox/V2Writable.pm  |  3 ++-
 t/lei-convert.t| 31 ++-
 6 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/LeiConvert.pm b/lib/PublicInbox/LeiConvert.pm
index 22aba81a..4a1f8323 100644
--- a/lib/PublicInbox/LeiConvert.pm
+++ b/lib/PublicInbox/LeiConvert.pm
@@ -34,9 +34,13 @@ sub process_inputs { # via wq_do
$self->SUPER::process_inputs;
my $lei = $self->{lei};
delete $lei->{1};
-   my $l2m = delete $self->{l2m};
-   delete $self->{wcb}; # commit
+   my $l2m = delete $lei->{l2m};
my $nr_w = delete($l2m->{-nr_write}) // 0;
+   delete $self->{wcb}; # commit
+   if (my $v2w = delete $lei->{v2w}) {
+   $nr_w = $v2w->wq_do('done'); # may die
+   $v2w->wq_close;
+   }
my $d = (delete($l2m->{-nr_seen}) // 0) - $nr_w;
$d = $d ? " ($d duplicates)" : '';
$lei->qerr("# converted $nr_w messages$d");
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index 129dabf8..0529bbe4 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -41,8 +41,8 @@ sub detect_fmt ($) {
my ($dst) = @_;
if ($dst =~ m!\A([:/]+://)!) {
die "$1 support not implemented, yet\n";
-   } elsif (!-e $dst || -d _) {
-   'maildir'; # the default TODO: MH?
+   } elsif (!-e $dst || -d _) { # maildir is the default TODO: MH
+   -e "$dst/inbox.lock" ? 'v2' : 'maildir';
} elsif (-f _ || -p _) {
die "unable to determine mbox family of $dst\n";
} else {
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 2928be45..2d9b7061 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -375,7 +375,6 @@ sub _v2_write_cb ($$) {
++$self->{-nr_seen};
return if $dedupe && $dedupe->is_dup($eml, $smsg);
$lei->{v2w}->wq_do('add', $eml); # V2Writable->add
-   ++$self->{-nr_write};
}
 }
 
@@ -435,7 +434,7 @@ sub new {
($lei->{opt}->{dedupe}//'') eq 'oid';
$self->{base_type} = 'v2';
$self->{-wq_nr_workers} = 1; # v2 has shards
-   $lei->{opt}->{save} = \1;
+   $lei->{opt}->{save} //= \1 if $lei->{cmd} eq 'q';
$dst = $lei->{ovv}->{dst} = $lei->abs_path($dst);
@conflict = qw(mua sort);
} else {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e85fd3c4..7eda6f9e 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -391,8 +391,9 @@ sub query_done { # EOF callback for main daemon
($lei->{opt}->{'mail-sync'} && !$lei->{sto}) and
warn "BUG: {sto} missing with --mail-sync";
$lei->sto_done_request;
+   my $nr_w = delete($lei->{-nr_write}) // 0;
if (my $v2w = delete $lei->{v2w}) {
-   my $wait = $v2w->wq_do('done'); # may die
+   $nr_w = $v2w->wq_do('done'); # may die
$v2w->wq_close;
}
$lei->{ovv}->ovv_end($lei);
@@ -412,7 +413,6 @@ Error closing $lei->{ovv}->{dst}: \$!=$! \$?=$?
delete $l2m->{mbl}; # drop dotlock
}
}
-   my $nr_w = delete($lei->{-nr_write}) // 0;
my $nr_dup = (delete($lei->{-nr_seen}) // 0) - $nr_w;
if ($lei->{-progress}) {
my $tot = $lei->{-mset_total} // 0;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 4d606dfe..231ed516 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -135,7 +135,7 @@ sub add {
if (do_idx($self, $mime, $smsg)) {
$self->checkpoint;
}
-
+   ++$self->{-nr_add}; # for lei convert
$cmt;
 }
 
@@ -611,6 +611,7 @@ sub done {
$self->lock_release(!!$nbytes) if $shards;
$self->git->cleanup;
die $err if $err;
+   delete $self->{-nr_add}; # for lei-convert
 }
 
 sub importer {
diff --git a/t/lei-convert.t b/t/lei-convert.t
index 84b57f81..6aff80bb 100644
--- a/t/lei-convert.t
+++ b/t/lei-convert.t
@@ -8,7 +8,8 @@ use PublicInbox::NetReader;
 use PublicInbox::Eml;
 use IO::Uncompress::Gunzip;
 use File::Path qw(remove_tree);
-use PublicInbox::Spawn qw(which);

[PATCH 4/3] xap_helper_cxx: accept leading spaces from pkg-config

2023-11-15 Thread Eric Wong
Eric Wong  wrote:
> Avoid mixing autodie use in different scopes since it's likely
> to cause problems like it did in Gcf2.  While none of these
> fix known problems with test cases, it's likely worthwhile to
> avoid it anyways to avoid future surprises.

>  lib/PublicInbox/XapHelperCxx.pm | 18 --

That XapHelperCxx change was totally necessary for running the
C++ build on CentOS 7.x (but the test is auto-skipped on any
build failure), as is this one:

8<
Subject: [PATCH] xap_helper_cxx: accept leading spaces from pkg-config

pkg-config 0.27.1 and xapian14-core-devel (1.4.24-1.el7) on
CentOS 7.x will print a leading space when running
`pkg-config --libs --cflags xapian-core'.  This leading
space creates an empty string when `split' with /\s+/ as
a pattern.  Instead, use the documented ' ' (SP) character
to put split into "awk mode" which eats leading (and
redundant) spaces and tabs.
---
 lib/PublicInbox/XapHelperCxx.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/XapHelperCxx.pm b/lib/PublicInbox/XapHelperCxx.pm
index 1250c964..9e819546 100644
--- a/lib/PublicInbox/XapHelperCxx.pm
+++ b/lib/PublicInbox/XapHelperCxx.pm
@@ -86,7 +86,7 @@ sub build () {
# distributed packages.
$^O eq 'netbsd' and $fl =~ s/(\A|[ \t])\-L([^ \t]+)([ \t]|\z)/
"$1-L$2 -Wl,-rpath=$2$3"/egsx;
-   my @xflags = split(/\s+/, "$fl $xflags");
+   my @xflags = split(' ', "$fl $xflags"); # ' ' awk-mode eats leading WS
my @cflags = grep(!/\A-(?:Wl|l|L)/, @xflags);
run_die([$cxx, '-c', "$prog.cpp", @cflags]);
run_die([$cxx, '-o', "$prog.tmp", "$prog.o", @xflags]);