from:"Konstantin Ryabitsev"

Re: filtering stable patches in lore queries

2024-05-08 Thread Konstantin Ryabitsev

On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote:
> > I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
> > like to be able to identify code-review trailers sent for a specific 
> > patch, even if that patch itself is not on lore. For example, this could 
> > be a patch that is part of a pull-request on a git forge, but we'd still 
> > like to be able to collect and find code-review trailers for it when a 
> > maintainer applies it.
> 
> OK, a more configurable version is available on a per-inbox basis:
> 
> https://public-inbox.org/meta/20240508110957.3108196-...@80x24.org/
> 
> But that's a PITA to configure with hundreds of inboxes and
> doesn't have extindex support, yet.
> 
> I made it share logic with the old altid code; so I'll also be
> getting altid into extindex since ISTR users wanting to be able
> to lookup gmane stuff via extindex.

Great, thanks for doing this. I'll wait until this has extindex support,
because I really need to be able to look across all inboxes.

> Yeah, though there's 3 ways of indexing strings, currently :x
> I've decided to keep some options open and support boolean_term,
> text, and phrase for now.

What's the difference between "text" and "phrase"?

> boolean_term is the cheapest and probably best for exactly
> matching labels/enums and such.

So, this is for "X-Ignore-Me: Yes" type of headers?

-K

Re: filtering stable patches in lore queries

2024-04-29 Thread Konstantin Ryabitsev

On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> Correct, public-inbox currently won't index every header due to
> cost, false positives, and otherwise lack of usefulness (general
> gibberish from DKIM sigs, various UUIDs, etc).
> 
> So it doesn't currently know about "X-stable:"
> 
> I started working on making headers indexing configurable last
> year, but didn't hear a response from the person that
> potentially was interested:
> 
> https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> 
> Right now, indexing new headers + validations can be maintained
> as a Perl module in the public-inbox codebase.
> 
> For lore, it'd make sense to be able to configure a bunch (or
> all) inboxes at once instead of the per-inbox configuration in
> my proposed RFC.
> 
> At minimum, one would have to know:
> 
> 1) the mail header name (e.g. `X-stable')
> 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> 3) the type of header value (phrase, string, sortable numeric, etc...)
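The three pieces of information listed above could be modelled as a small rule object. What follows is a minimal Python sketch under stated assumptions: HeaderIndexRule, derive_prefix and the value-type names (borrowed from the boolean_term/text/phrase options mentioned elsewhere in this thread) are hypothetical, not public-inbox's configuration format. It only shows how a dash-free prefix such as "xstable" might be derived from a header name and validated.

```python
from dataclasses import dataclass

# Hypothetical value types, named after the options mentioned in this
# thread; public-inbox's real configuration keys may differ.
VALUE_TYPES = {"boolean_term", "text", "phrase"}


def derive_prefix(header_name: str) -> str:
    """Lowercase the header name and drop dashes, since (per the thread)
    a dash likely cannot be used in a search prefix: X-stable -> xstable."""
    return header_name.replace("-", "").lower()


@dataclass(frozen=True)
class HeaderIndexRule:
    """Hypothetical rule describing which header to index, and how."""
    header: str      # 1) mail header name, e.g. "X-stable"
    prefix: str      # 2) search prefix, queried as "xstable:..."
    value_type: str  # 3) how the header value is indexed

    def __post_init__(self) -> None:
        if "-" in self.prefix:
            raise ValueError("search prefix must not contain '-'")
        if self.value_type not in VALUE_TYPES:
            raise ValueError(f"unknown value type: {self.value_type}")


# Illustrative rules for the two headers discussed in this thread; the
# value-type choices are examples only.
RULES = [
    HeaderIndexRule("X-stable", derive_prefix("X-stable"), "phrase"),
    HeaderIndexRule("X-For-Patch-ID", derive_prefix("X-For-Patch-ID"),
                    "boolean_term"),
]
```

Keeping the prefix as an explicit field, rather than always deriving it, lets an admin pick a shorter prefix while the validation still rejects dashes.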

I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
like to be able to identify code-review trailers sent for a specific 
patch, even if that patch itself is not on lore. For example, this could 
be a patch that is part of a pull-request on a git forge, but we'd still 
like to be able to collect and find code-review trailers for it when a 
maintainer applies it.

Currently, I am using the following approach:

| Reviewed-by: Some Developer 
| ---
| for-patch-id: abcd...1234

Then I can query 'nq:"for-patch-id: abcd...1234"', but this is probably
much heavier than if I could provide this in a custom header:

| X-For-Patch-ID: abcd...1234

and query for "xforpatchid:abcd...1234"
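A client such as b4 would need to recognise either form. The Python sketch below assumes a message may carry the proposed X-For-Patch-ID header or the "for-patch-id:" line after the "---" separator shown above. find_patch_id and lore_query are hypothetical helpers, and the xforpatchid: prefix would only work once such a header is actually indexed.

```python
import email
import re
from email import policy
from typing import Optional

# Matches the "for-patch-id: <id>" pseudo-trailer placed after "---".
TRAILER_RE = re.compile(r"^for-patch-id:\s*(\S+)\s*$", re.MULTILINE)


def find_patch_id(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return the patch-id a review message refers to,
    preferring the proposed X-For-Patch-ID header over the body line."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    header = msg.get("X-For-Patch-ID")
    if header:
        return str(header).strip()
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return None
    m = TRAILER_RE.search(body.get_content())
    return m.group(1) if m else None


def lore_query(patch_id: str, use_header: bool) -> str:
    """Build either of the two query forms discussed above."""
    if use_header:
        return f"xforpatchid:{patch_id}"  # needs a header-indexing rule
    return f'nq:"for-patch-id: {patch_id}"'
```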

> I'm trying to avoid supporting sortable numeric values for this,
> since supporting them will cause problems if columns get repurposed
> with admins changing their minds.   A full reindex would fix it,
> but those are crazy expensive.

I'm perfectly fine with it only being a string, honestly.

> 
> So probably just supporting strings and/or phrases to start...
> 
> Validation to prevent poisoning by malicious/broken senders can
> be useful in some cases (and the reason the RFC was a per use
> case Perl module).  That said, I'm not sure if much validation
> is necessary for X-stable: headers or if just any text is fine.

I'd let the consumer clients worry about it.

-K

Re: downloading t.mbox.gz messages are not sorted in expected order

2024-04-11 Thread Konstantin Ryabitsev

On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote:
> I sometimes download patch series off of public inbox hosted servers to
> apply with git-am. Occasionally I have found that these do not apply
> cleanly because the thread is not sorted in patch order.

It's more than just the order -- if there are replies in the thread, the mbox
file won't apply either.

This is the reason why the b4 tool exists:
https://b4.docs.kernel.org/
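To illustrate why a raw t.mbox.gz cannot simply be fed to git-am, here is a deliberately simplified Python sketch that keeps only numbered "[PATCH ... n/m]" messages, drops replies and the 0/m cover letter, and sorts by n. order_series is a hypothetical helper; it ignores mixed series versions, trailer collection and attestation, all of which b4 handles.

```python
import re
from typing import List, Tuple

# "[PATCH v2 3/7] subject" -> position 3 of 7
PATCH_RE = re.compile(r"^\[PATCH[^\]]*?\s(\d+)/(\d+)\]")


def order_series(subjects: List[str]) -> List[str]:
    """Hypothetical helper: keep only numbered patches, dropping replies
    and the 0/N cover letter, then sort numerically by series position."""
    numbered: List[Tuple[int, str]] = []
    for subj in subjects:
        if subj.lower().startswith("re:"):
            continue  # a reply, not a patch
        m = PATCH_RE.match(subj)
        if m and int(m.group(1)) > 0:
            numbered.append((int(m.group(1)), subj))
    return [subj for _, subj in sorted(numbered)]
```

Sorting on the integer rather than the subject string keeps 10/12 after 2/12.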

> For an example, see
> https://lore.kernel.org/lkml/20240308230557.805580-1-alex.william...@redhat.com/

$ b4 am -o/tmp https://lore.kernel.org/lkml/20240308230557.805580-1-alex.william...@redhat.com/
Grabbing thread from lore.kernel.org/all/20240308230557.805580-1-alex.william...@redhat.com/t.mbox.gz
Analyzing 20 messages in the thread
Looking for additional code-review trailers on lore.kernel.org
Checking attestation on all messages, may take a moment...
---
  ✓ [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
  ✓ [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
  ✓ [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
  ✓ [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
  ✓ [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
+ Reviewed-by: Kevin Tian  (✓ DKIM/intel.com)
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
  ✓ [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
+ Reviewed-by: Kevin Tian  (✓ DKIM/intel.com)
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
+ Tested-by: Eric Auger  (✓ DKIM/redhat.com)
  ✓ [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
+ Reviewed-by: Kevin Tian  (✓ DKIM/intel.com)
+ Reviewed-by: Eric Auger  (✓ DKIM/redhat.com)
  ---
  ✓ Signed: DKIM/redhat.com
---
Total patches: 7
---
Cover: /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.cover
 Link: https://lore.kernel.org/r/20240308230557.805580-1-alex.william...@redhat.com
 Base: not specified
   git am /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.mbx

-K

Re: sample robots.txt to reduce WWW load

2024-04-03 Thread Konstantin Ryabitsev

On Mon, Apr 01, 2024 at 01:21:45PM +, Eric Wong wrote:
> Performance is still slow, and crawler traffic patterns tend to
> do bad things with caches at all levels, so I've regretfully had
> to experiment with robots.txt to mitigate performance problems.

This has been the source of grief for us, because aggressive bots don't appear
to be paying any attention to robots.txt, and they are fudging their
user-agent string to pretend to be a regular browser. I am dealing with one
that is hammering us from China Mobile IP ranges and is currently trying to
download every possible snapshot of torvalds/linux, while pretending to be
various versions of Chrome.

So, while I welcome having a robots.txt recommendation, it kinda assumes that
robots will actually play nice and won't try to suck down as much as possible
as quickly as possible for training some LLM-du-jour.

/end rant

-K

Re: [PATCH] lei: support reading MH for convert+import+index

2023-12-16 Thread Konstantin Ryabitsev

On Sat, Dec 16, 2023 at 01:09:32PM +, Eric Wong wrote:
> The MH format is widely-supported and used by various MUAs such
> as mutt and sylpheed, and a MH-like format is used by mlmmj for
> archives, as well.  Locking implementations for writes are
> inconsistent, so this commit doesn't support writes, yet.

Nice, so eventually we should be able to specify the following instead of
faking out a maildir?

watch=mh:/var/spool/mlmmj/list.name/archive

> inotify|EVFILT_VNODE watches aren't supported, yet, either.

In the case of mlmmj it's sufficient to watch the
/var/spool/mlmmj/list.name/index file for updates, but I don't know how well
this lends itself to other implementations (I am not at all familiar with MH).
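Until inotify/EVFILT_VNODE support lands, polling the layout described above is one option. This Python sketch assumes an mlmmj list directory containing an "index" file and an MH-style "archive" directory of numbered message files; poll_once and mh_message_numbers are hypothetical names, and the sketch reads without any locking.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def mh_message_numbers(archive: Path) -> List[int]:
    """MH-style folders keep one message per file, named by an integer;
    anything else in the directory is ignored."""
    return sorted(int(p.name) for p in archive.iterdir()
                  if p.is_file() and p.name.isdigit())


def poll_once(listdir: Path, last_mtime: Optional[int],
              last_seen: int) -> Tuple[int, List[Path]]:
    """Hypothetical poller for an mlmmj list directory (assumed layout:
    listdir/index and listdir/archive/<N>). If the index file changed
    since the last poll, return archive files numbered above last_seen."""
    mtime = os.stat(listdir / "index").st_mtime_ns
    if mtime == last_mtime:
        return mtime, []  # nothing new since the previous poll
    archive = listdir / "archive"
    new = [archive / str(n) for n in mh_message_numbers(archive)
           if n > last_seen]
    return mtime, new
```

For proper MH folders (including .mh_sequences), Python's mailbox.MH class is an alternative to listing files by hand.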

-K

Re: extra search flags and params? (ispatch, replycount, ...)

2023-11-28 Thread Konstantin Ryabitsev

On Tue, Nov 28, 2023 at 06:20:03PM +, Eric Wong wrote:
> > Ah. I think it's enough here to just say "s:* AND NOT s:PATCH" without
> > introducing additional xapian indexing parameters. Though, perhaps the web
> > interface can also gain a "collapse threads" view?
> 
> topics_new.html / topics_active.html endpoints?
> Also, '' is a weird accident that happens to work:
> 
> https://yhbt.net/lore/git/?q=s:*+AND+NOT+s:PATCH
> 
> I suppose that's OK for the majority of cases.

Nice!

> Though being able to find unanswered threads could be helpful.

Note, I'm not saying it's not a cool feature. :) However, I imagine people
would be more interested in searching for something like "show me all threads
mentioning $foo to which *I* haven't replied yet". It's not quite the same
thing as "nobody has replied yet."

I have no idea how hard this would be.
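To make the "not replied to by me" idea concrete, here is an in-memory Python sketch over already-parsed messages. The dictionary layout and unanswered_by_me are assumptions for illustration; threads are rebuilt from In-Reply-To only (References is ignored), and "mentions" is a plain substring test rather than a Xapian query.

```python
from typing import Dict, List, Optional, Set


def thread_root(msgid: str, parent: Dict[str, Optional[str]]) -> str:
    """Walk In-Reply-To links up to the oldest message we know about,
    guarding against reply loops."""
    seen = {msgid}
    up = parent.get(msgid)
    while up is not None and up in parent and up not in seen:
        seen.add(up)
        msgid = up
        up = parent.get(msgid)
    return msgid


def unanswered_by_me(messages: List[dict], me: str, term: str) -> Set[str]:
    """Hypothetical helper: root Message-IDs of threads that mention
    `term` but contain no message sent by `me`."""
    parent = {m["msgid"]: m.get("in_reply_to") for m in messages}
    mentions: Set[str] = set()
    mine: Set[str] = set()
    for m in messages:
        root = thread_root(m["msgid"], parent)
        if term in m["body"]:
            mentions.add(root)
        if m["from"] == me:
            mine.add(root)
    return mentions - mine
```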

-K

Re: extra search flags and params? (ispatch, replycount, ...)

2023-11-28 Thread Konstantin Ryabitsev

On Tue, Nov 28, 2023 at 05:35:09PM +, Eric Wong wrote:
> > I understand the reasoning, but I'm not sure we should be trying too hard to
> > make public-inbox a patch tracking platform. What makes lei great is the ability
> > to automatically find and retrieve entire threads -- I feel like we should
> > leave series tracking to other platforms that already exist (patchwork,
> > patchew, etc).
> 
> I was thinking more along the lines of readers just trying to
> find non-patch discussions.

Ah. I think it's enough here to just say "s:* AND NOT s:PATCH" without
introducing additional xapian indexing parameters. Though, perhaps the web
interface can also gain a "collapse threads" view?
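Such a query can also be passed to an inbox's search URL through the q parameter, as in the yhbt.net example. A small Python helper (search_url is a hypothetical name) that encodes spaces as '+' while keeping ':' and '*' readable:

```python
from urllib.parse import urlencode


def search_url(inbox_url: str, query: str) -> str:
    """Hypothetical helper: append a search query as ?q=..., encoding
    spaces as '+' and leaving ':' and '*' unescaped for readability."""
    return f"{inbox_url.rstrip('/')}/?{urlencode({'q': query}, safe=':*')}"
```

For "s:* AND NOT s:PATCH" against https://yhbt.net/lore/git/ this produces the same URL as the example quoted earlier.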

> > This made me realize that there's actually a multitude of ways the same patch
> > can be represented (diff-algorithm, number of context lines, etc) that would
> > cause git-patch-id to return a different value for the exact same commit.
> 
> Yeah, post-image blob abbreviations are probably the way to go.
> 
> Fwiw, solver only uses post-image blob abbreviations and the
> filename as a hint.  I rolled it out a few hours ago on yhbt.net/lore
> and it seems to be solving kernel blobs just fine, but the
> debug log is choosing random git URLs.

Ah, neat! That said, what happens if a series was applied with "git am -3" and
the post-image blob abbreviations are necessarily different? (I may be
misunderstanding the approach, please correct me if I do.)
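For reference, the post-image blob abbreviation is the second object id on a diff's "index" line. This Python sketch (postimage_blobs is a hypothetical name) extracts one per file. If "git am -3" had to fall back to a three-way merge, the blobs in the resulting tree may differ from these ids, which is the case raised above.

```python
import re
from typing import Dict

# "diff --git a/PATH b/PATH", later followed by "index PRE..POST [MODE]"
DIFF_RE = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: \d+)?$")


def postimage_blobs(patch: str) -> Dict[str, str]:
    """Hypothetical helper: map each post-image path to the abbreviated
    blob id the patch claims it will produce."""
    result: Dict[str, str] = {}
    path = None
    for line in patch.splitlines():
        m = DIFF_RE.match(line)
        if m:
            path = m.group(2)  # post-image ("b/") side
            continue
        m = INDEX_RE.match(line)
        if m and path is not None:
            result[path] = m.group(2)
            path = None
    return result
```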

-K

Re: extra search flags and params? (ispatch, replycount, ...)

2023-11-28 Thread Konstantin Ryabitsev

On Tue, Nov 28, 2023 at 12:10:28AM +, Eric Wong wrote:
> Would they be useful?
> 
> It's not currently possible to quickly search for whether or not
> a term (e.g. patchid:) is present in a Xapian document.  Having
> the ability to do so would make it easier to find non-patch messages,
> or easily filter down to cover letters, bot replies, etc...

I understand the reasoning, but I'm not sure we should be trying too hard to
make public-inbox a patch tracking platform. What makes lei great is the ability
to automatically find and retrieve entire threads -- I feel like we should
leave series tracking to other platforms that already exist (patchwork,
patchew, etc).

> I don't think any of these would be required to get "lei rediff"
> working on entire patchsets, though (it only does individual
> messages, currently).

Incidentally, I've recently discovered that relying on git-patch-id to match
commits to message archives has some important flaws. Linus was actually the
one who caused this when he recommended that maintainers switch to using the
"histogram" diff algorithm instead of the default ("myers").

This made me realize that there's actually a multitude of ways the same patch
can be represented (diff-algorithm, number of context lines, etc) that would
cause git-patch-id to return a different value for the exact same commit.
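The effect of context lines can be shown with a toy stand-in for git-patch-id. This Python sketch is a simplification, not git's algorithm: it hashes difflib's unified diff with file headers, hunk headers and whitespace removed. difflib offers only one diff algorithm, so it demonstrates the context-line case (git's -U<n>), not myers versus histogram.

```python
import difflib
import hashlib
from typing import List


def toy_patch_id(old: List[str], new: List[str], context: int) -> str:
    """Hypothetical, simplified stand-in for `git patch-id`: SHA-1 of the
    diff body with headers, line numbers and whitespace stripped.
    NOT git's exact algorithm."""
    h = hashlib.sha1()
    for line in difflib.unified_diff(old, new, n=context, lineterm=""):
        if line.startswith(("---", "+++", "@@")):
            continue  # drop file headers and hunk line numbers
        h.update("".join(line.split()).encode())
    return h.hexdigest()
```

The same one-line change hashes differently with 3 versus 1 lines of context, because the context lines themselves are part of what gets hashed.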

So, while I know that Linus doesn't want Link: entries in commits that just go
to the series, using the message-id remains the only mechanism to reliably
link commits to the series discussion.

-K

Re: [PATCH] cindex: fix test when missing time(1) executable

2023-11-15 Thread Konstantin Ryabitsev

On Wed, Nov 15, 2023 at 05:55:49AM +, Eric Wong wrote:
> Eric Wong  wrote:
> > +++ b/t/cindex.t
> > @@ -210,7 +210,7 @@ EOM
> > my $cmd = [ qw(-cindex -u --all --associate -d), "$tmp/ext",
> > '-I', $basic->{inboxdir} ];
> > $cidx_out = $cidx_err = '';
> > -   ok(run_script($cmd, $env, $opt), 'associate w/o search');
> > +   ok(run_script($cmd, $env, undef), 'associate w/o search');
> > like($cidx_err, qr/W: \Q$basic->{inboxdir}\E not indexed for search/,
> > 'non-Xapian-enabled inbox noted');
> >  }
> 
> Yeah, using this on your new VM showed the problem right away:

Yes, I can confirm that this hang is now gone. \o/
All the tests succeed now with the latest master.

Thanks, as always!

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-15 Thread Konstantin Ryabitsev

On Wed, Nov 15, 2023 at 03:09:28AM +, Eric Wong wrote:
> > t/imapd.t  2/? Bailout called.  Further testing stopped:  FETCH socket closed while reading data from server
> > FAILED--Further testing stopped: FETCH socket closed while reading data from server
> > make: *** [test_dynamic] Error 255
> > 
> > This one looks odd but I do see it happen every time I run the test.
> 
> Both are actually related to our libgit2 support.  Working on a
> fix now since I didn't have libgit2 installed.
> 
> But I'm hesitant to do much with libgit2 since -extindex has
> basically made it obsolete scalability-wise.  Fwiw, both major
> commercial git hosts are ditching libgit2:
> 
>   https://public-inbox.org/git/ZRrfN2lbg14IOLiK@nand.local/

I'm quite happy to not require libgit2 -- I've always found it easier to just
use git plumbing commands even if this requires exec'ing an external
executable.

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

On Wed, Nov 15, 2023 at 01:06:42AM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > Looks like the last time I am able to successfully run "make test" is before
> > this commit:
> > 
> > b231d91f42d791becf7b6861e723833d71e73237 is the first bad commit
> > 
> > The error I start getting after this commit is:
> > 
> > t/extsearch.t  160/?
> > #   Failed test 'lei_err=unindexed extindex /tmp/pi-extsearch-12765-FmsY/extindex not supported
> > #
> > # Argument "-TERM" isn't numeric in kill at /home/mricon/public-inbox-test/blib/lib/PublicInbox/IPC.pm line 442.
> 
> Thanks, that (and another bug) should be fixed with:
> https://public-inbox.org/meta/20231115010457.1047199-...@80x24.org/

Thanks, that is indeed much better! But I still get a few errors.

First a non-critical:

t/gcf2_client.t .. 1/? Can't locate object method "fail" via package "PublicInbox::Gcf2Client" at /home/mricon/public-inbox-test/blib/lib/PublicInbox/Git.pm line 269.
(in cleanup) Can't locate object method "fail" via package "PublicInbox::Gcf2Client" at /home/mricon/public-inbox-test/blib/lib/PublicInbox/Git.pm line 269.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 65280 just after 2.
t/gcf2_client.t .. Dubious, test returned 255 (wstat 65280, 0xff00)

Then just a bit later this one:

t/imapd.t  2/? Bailout called.  Further testing stopped:  FETCH socket closed while reading data from server
FAILED--Further testing stopped: FETCH socket closed while reading data from server
make: *** [test_dynamic] Error 255

This one looks odd but I do see it happen every time I run the test.

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 06:51:00PM -0500, Konstantin Ryabitsev wrote:
> On Tue, Nov 14, 2023 at 11:46:20PM +, Eric Wong wrote:
> > My, that's a lot of pipes...
> > 
> > I should've told you to try this debug patch earlier, but this
> > might help...  (and our test suite should really be able to
> > watch messages like this while capturing)
> 
> Will try this shortly -- currently running a git bisect to find the last time
> when tests passed on this system.

Looks like the last time I am able to successfully run "make test" is before
this commit:

b231d91f42d791becf7b6861e723833d71e73237 is the first bad commit

The error I start getting after this commit is:

t/extsearch.t  160/?
#   Failed test 'lei_err=unindexed extindex /tmp/pi-extsearch-12765-FmsY/extindex not supported
#
# Argument "-TERM" isn't numeric in kill at /home/mricon/public-inbox-test/blib/lib/PublicInbox/IPC.pm line 442.
# Argument "-TERM" isn't numeric in kill at /home/mricon/public-inbox-test/blib/lib/PublicInbox/IPC.pm line 442.
# # converted 0 messages
# '
#   at /home/mricon/public-inbox-test/blib/lib/PublicInbox/TestCommon.pm line 579.
# Looks like you failed 1 test of 172.
t/extsearch.t  Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/172 subtests

It is repeated multiple times afterwards.

HTH.

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 11:46:20PM +, Eric Wong wrote:
> My, that's a lot of pipes...
> 
> I should've told you to try this debug patch earlier, but this
> might help...  (and our test suite should really be able to
> watch messages like this while capturing)

Will try this shortly -- currently running a git bisect to find the last time
when tests passed on this system.

> The like() test will fail with the above change, of course; but
> maybe something else is amiss on your system and showing stderr
> will help.  FWIW, I can't reproduce the problem on my CentOS7 VM.

There are two sources of potential discrepancy:

- differences in CPAN module versions I have installed
- different git or xapian14 versions

If you want to try with the newest xapian14 release, you can enable the LFIT
copr on your test system:

yum copr enable icon/lfit

That should let you install xapian14-core and xapian14-bindings-perl, plus
git241. This will, at least, get you to the same version of those two things
that we have.

Thanks for your help!

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 10:46:57PM +, Eric Wong wrote:
> > I can't do +E because that's not available to me under CentOS7 (I can't wait
> > until we move on, but just when we think the yak is fully shaved, we find more
> > clumps of thick fur we hadn't considered). Is the output of the regular
> > "lsof -p" helpful at all?
> 
> Sure.

Sent privately.

> > Strace for all three processes (-cindex, cidx shard[0], cidx shard[1]) just
> > sits at:
> > 
> > select(24, [13 16], NULL, NULL, NULL
> 
> OK, that's still useful.  One FD is signalfd, the others would
> be a SOCK_SEQPACKET socket, I think...
> 
> > As far as I can see, there are no other processes other than cidx.
> 
> OK.  Hmm.. Perhaps `kill -CHLD' on the top-level cindex process
> can move it along? 

Didn't seem to do anything. The cidx shards were still there, and even running
"kill" on those processes directly didn't make them go away. Killing the
-cindex process itself does move things along without a failure.

-K

Re: t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 10:16:53PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> >  └─-cindex -u --al,4432
> >  ├─cidx shard[0],4646
> >  └─cidx shard[1],4647
> > 
> > Anything I can do to figure out why this is happening?
> 
> You can show me strace and lsof +E of the processes (any other
> processes (join|sort|awk|perl)?).  This code is highly in flux,
> so it's also fine to rm the test for now since nothing
> public-facing is using -cindex, yet...

Yeah, but I figured I'd poke a bit in case it's helpful.

I can't do +E because that's not available to me under CentOS7 (I can't wait
until we move on, but just when we think the yak is fully shaved, we find more
clumps of thick fur we hadn't considered). Is the output of the regular
"lsof -p" helpful at all?

Strace for all three processes (-cindex, cidx shard[0], cidx shard[1]) just
sits at:

select(24, [13 16], NULL, NULL, NULL

As far as I can see, there are no other processes other than cidx.

Hope that helps,
-K

t/cindex.t "associate w/o search" test hangs for me

2023-11-14 Thread Konstantin Ryabitsev

Eric:

I'm trying to have tests pass on CentOS7 with the current master and I'm
apparently not able to get past the "associate w/o search" test.

When I run `prove -bvw t/cindex.t` I get to:

ok 76 - xcpdb compact

and then it just sits there. If I look at the process table, I can see that
the next test is attempting to run, but it just hangs forever. This is what
pstree shows:

 └─prove,4431 -w /usr/local/bin/prove -bvw t/cindex.t
 └─-cindex -u --al,4432
 ├─cidx shard[0],4646
 └─cidx shard[1],4647

Anything I can do to figure out why this is happening?

-K

[PATCH] TestCommon: older strace does not have --version

2023-11-14 Thread Konstantin Ryabitsev

The tests check for strace >= 4.16, but the version I have (4.24) does
not accept --version, only -V. Since -V works with both older and newer
strace, switch to using "strace -V" for the check.

Signed-off-by: Konstantin Ryabitsev 
---
 lib/PublicInbox/TestCommon.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index caf709c2..a5546905 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -956,9 +956,9 @@ sub strace_inject (;$) {
my $cmd = strace(@_);
state $ver = do {
require PublicInbox::Spawn;
-   my $v = PublicInbox::Spawn::run_qx([$cmd, '--version']);
+   my $v = PublicInbox::Spawn::run_qx([$cmd, '-V']);
$v =~ m!version\s+([1-9]+\.[0-9]+)! or
-   xbail "no strace --version: $v";
+   xbail "no strace -V: $v";
eval("v$1");
};
$ver ge v4.16 or skip "$cmd too old for syscall injection (".

---
base-commit: 1f3fdeee8919d06b9293d34a2446a61cba730a0c
change-id: 20231114-strace-no-version-7073fd02aa16

Best regards,
-- 
Konstantin Ryabitsev
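The same check can be expressed in Python for illustration; parse_strace_version and strace_supports_injection are hypothetical helpers, not part of public-inbox. The version is compared as a tuple of integers, for the same reason the Perl code uses a v-string: a plain string comparison would rank "4.9" above "4.16".

```python
import re
import subprocess
from typing import Optional, Tuple

# Same pattern as the Perl check: "strace -- version 4.24" -> (4, 24)
VERSION_RE = re.compile(r"version\s+([1-9]+)\.([0-9]+)")


def parse_strace_version(output: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: extract (major, minor) from `strace -V`."""
    m = VERSION_RE.search(output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def strace_supports_injection(cmd: str = "strace") -> bool:
    """Run `strace -V` (accepted by old and new releases) and compare
    the version numerically against 4.16."""
    try:
        out = subprocess.run([cmd, "-V"], capture_output=True,
                             text=True, check=False).stdout
    except OSError:
        return False  # strace is not installed or not executable
    ver = parse_strace_version(out)
    return ver is not None and ver >= (4, 16)
```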

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 04:36:29PM +, Eric Wong wrote:
> In any case, kernel.org folks should be able to import missing
> messages from GNU.org idempotently into lore/qemu-devel without
> having to resend:
> 
> https://lists.gnu.org/archive/mbox/qemu-devel/2023-10
> https://lists.gnu.org/archive/mbox/qemu-devel/2023-11

Just so this conversation is over, I've fed 2023-10 to the archive and the
links should be working now.

But to make it clear, these messages didn't "disappear" from the archives.
They were never there because, for whatever reason, we never received them.
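Idempotent re-import, as suggested in the quoted text, boils down to skipping what is already archived. A minimal Python sketch, assuming the Message-ID header is a sufficient deduplication key (a real archiver may also compare content) and that known_ids holds the IDs already present; new_messages is a hypothetical name.

```python
import mailbox
from email.message import Message
from typing import Iterator, Set


def new_messages(mbox_path: str, known_ids: Set[str]) -> Iterator[Message]:
    """Hypothetical helper: yield only messages whose Message-ID is not in
    known_ids, recording each one, so replaying the same month's mbox a
    second time yields nothing."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            mid = (msg.get("Message-ID") or "").strip()
            if mid and mid not in known_ids:
                known_ids.add(mid)
                yield msg
    finally:
        box.close()
```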

-K

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 04:11:41PM +, Salil Mehta wrote:
> It is not just me there other people (from other organizations CC'ed in this mail)
> I requested to check below set of links and they experienced the same behavior
> i.e. except first link no other link opens (ends up in Not Found)
> 
> https://lore.kernel.org/qemu-devel/20231027150536.3c481...@imammedo.users.ipa.redhat.com/
> https://lore.kernel.org/qemu-devel/20231027160814.3f47f...@imammedo.users.ipa.redhat.com/
> https://lore.kernel.org/qemu-devel/20231027154648.2ce47...@imammedo.users.ipa.redhat.com/
> https://lore.kernel.org/qemu-devel/20231027151828.5c9d4...@imammedo.users.ipa.redhat.com/

These messages do not exist in our archives because we've never received them.
If someone wants to bounce them to the address
"qemu-de...@archiver.kernel.org", then they will be fed to the archive.

-K
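A bounce re-sends the original message unmodified and changes only the SMTP envelope recipient. A Python sketch using smtplib; the archiver address is deliberately left as a parameter, and bounce is a hypothetical helper:

```python
import smtplib
from email.message import Message


def bounce(msg: Message, archiver_addr: str, envelope_from: str,
           smtp: smtplib.SMTP) -> None:
    """Hypothetical helper: re-deliver `msg` to the archiver. Only the
    SMTP envelope recipient differs; this helper does not rewrite
    Message-ID, Date or other headers (send_message does drop Bcc)."""
    smtp.send_message(msg, from_addr=envelope_from, to_addrs=[archiver_addr])
```

Typical use would be inside "with smtplib.SMTP('localhost') as s:", passing each missing message parsed from a saved copy.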

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-14 Thread Konstantin Ryabitsev

On Tue, Nov 14, 2023 at 11:03:38AM +, Salil Mehta wrote:
> I have cross confirmed the behavior with other people across companies and
> all of them are having issues in viewing above links. Surprising part is
> these were present at the first instance when the review comments were floated
> by Igor - I can assure you that.

Well, I don't know what to tell you. ¯\_(ツ)_/¯

I have no record of this happening. Every log entry for the message-id
20231027145652.44cc8...@imammedo.users.ipa.redhat.com in lore.kernel.org logs
is a "HTTP 200" since November 7, when that message was resent to the list.
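A check like this can be approximated by tallying status codes for one message-id. The Python sketch below assumes the nginx/Apache "combined" log format; kernel.org's actual log format is not stated here, and status_counts is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Request path and status code from a "combined" format log line.
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) ')


def status_counts(lines: Iterable[str], msgid: str) -> Counter:
    """Hypothetical helper: count HTTP status codes for requests whose
    path contains msgid."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and msgid in m.group(1):
            counts[m.group(2)] += 1
    return counts
```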

The only guess I can offer you is that you have the error page cached in your
browser or your corporate environment has it cached somewhere.

-K

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-13 Thread Konstantin Ryabitsev

On Mon, Nov 13, 2023 at 07:39:21PM +, Salil Mehta wrote:
> > When you go to "git.kernel.org" on your company network, what city do you
> > have in the header ("dallas", "amsterdam", etc.)?
> 
> I am seeing below in the header:
> 
> Git repositories hosted at kernel.org (amsterdam)

This is really strange, I am able to retrieve that URL without any problems on
all lore.kernel.org nodes. E.g.:

$ curl -sL -H 'Host: lore.kernel.org' -I https://dfw.source.kernel.org/qemu-devel/20231027145652.44cc8...@imammedo.users.ipa.redhat.com/ | head -n1
HTTP/1.1 200 OK

$ curl -sL -H 'Host: lore.kernel.org' -I https://ams.source.kernel.org/qemu-devel/20231027145652.44cc8...@imammedo.users.ipa.redhat.com/ | head -n1
HTTP/1.1 200 OK

$ curl -sL -H 'Host: lore.kernel.org' -I https://sin.source.kernel.org/qemu-devel/20231027145652.44cc8...@imammedo.users.ipa.redhat.com/ | head -n1
HTTP/1.1 200 OK
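The same per-frontend probe can be scripted. This Python sketch mirrors the curl commands above (HTTPS to each *.source.kernel.org node with the Host header forced to lore.kernel.org); node_request and check_nodes are hypothetical helpers, and it does not try to reproduce every curl option.

```python
import urllib.request
from typing import Dict
from urllib.error import HTTPError

# Frontends probed in the curl commands above.
NODES = ("dfw", "ams", "sin")


def node_request(node: str, path: str) -> urllib.request.Request:
    """Hypothetical helper: HEAD request to one frontend with Host forced
    to lore.kernel.org, like `curl -H 'Host: lore.kernel.org' -I`."""
    url = f"https://{node}.source.kernel.org/{path.lstrip('/')}"
    return urllib.request.Request(url, method="HEAD",
                                  headers={"Host": "lore.kernel.org"})


def check_nodes(path: str, timeout: float = 30.0) -> Dict[str, int]:
    """Return {node: HTTP status}; requires network access."""
    result: Dict[str, int] = {}
    for node in NODES:
        try:
            with urllib.request.urlopen(node_request(node, path),
                                        timeout=timeout) as resp:
                result[node] = resp.status
        except HTTPError as err:
            result[node] = err.code  # 404 and friends arrive as exceptions
    return result
```

For the address question below, socket.getaddrinfo("lore.kernel.org", 443) shows which addresses the local resolver returns.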

I am very curious why it's not working for you. When you run "ping
lore.kernel.org", which IP address do you get?

-K

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-13 Thread Konstantin Ryabitsev

On Mon, Nov 13, 2023 at 07:27:34PM +, Salil Mehta wrote:
> I am able to open [Patch V6 1/9] from outside the company network (as you
> can see the by [2] [3]) but strangely not from inside the company network.
> It purges below error from company network. I am totally stumped.

When you go to "git.kernel.org" on your company network, what city do you have
in the header ("dallas", "amsterdam", etc.)?

-K

Re: [RFC v2] www: add topics_(new|active).(html|atom) endpoints

2023-11-10 Thread Konstantin Ryabitsev

On Fri, Nov 10, 2023 at 03:09:59AM +, Eric Wong wrote:
> > Yes, actually thinking about this some more, perhaps it makes sense to expose
> > this as an RSS feed feature (maybe even exclusively as an RSS feed feature?).
> 
> I assume Atom is OK?  I don't know of any widely-used feed readers
> which only do RSS without Atom support.  IIRC Atom is less ambiguous
> and supports the in-reply-to extension.

Yes, sorry, I know they aren't the same thing, but in my head Atom is just a
form of RSS (perhaps for the same reason why everyone says "rss reader" but
nobody says "atom reader").

> That said, the Atom feeds generated by this RFC includes full
> messages because that's the easiest way to tie into our existing
> Atom generation code, so it's currently slower than the HTML
> version which never retrieves git blobs.

That's fine, actually, because this lets people read the full message to
figure out if they are interested in the rest of the thread or not.

> > Have two different feeds:
> > 
> > - new topics: just all the new threads
> > - hot topics: NN most active threads (kinda lkml.org's "hottest messages")
> 
> I'm not sure if `hot' means it's the most read (not just replied-to);
> but tracking read counts isn't something that scales on decentralized
> systems.  So I'm naming it "active" instead...

Sounds good to me.

> > Have this available per-list and for the extindex -- I think this would be
> > a great feature that we can point people at as a mechanism to keep an eye on
> > overall activity.
> 
> Yeah, lots of the WWW and lei code works transparently between extindex
> and regular inboxes:
> 
> extindex:
> https://yhbt.net/lore/all/topics_new.atom
> https://yhbt.net/lore/all/topics_active.atom
> https://yhbt.net/lore/all/topics_new.html
> https://yhbt.net/lore/all/topics_active.html
> 
> v2:
> https://yhbt.net/lore/lkml/topics_new.atom
> https://yhbt.net/lore/lkml/topics_active.atom
> https://yhbt.net/lore/lkml/topics_new.html
> https://yhbt.net/lore/lkml/topics_active.html
> 
> v1:
> https://public-inbox.org/git/topics_new.atom
> https://public-inbox.org/git/topics_active.atom
> https://public-inbox.org/git/topics_new.html
> https://public-inbox.org/git/topics_active.html

This is great, thank you!

-K

Re: [RFC] www: add topics.html endpoint [was: Query to see all new "topics"]

2023-11-09 Thread Konstantin Ryabitsev

On Thu, Nov 09, 2023 at 02:45:08AM +, Eric Wong wrote:
> This seems like an easy (but WWW-specific) way to get recent
> topics as suggested by Konstantin.  Perhaps an Atom endpoint
> will also be useful.

Yes, actually thinking about this some more, perhaps it makes sense to expose
this as an RSS feed feature (maybe even exclusively as an RSS feed feature?).
Have two different feeds:

- new topics: just all the new threads
- hot topics: NN most active threads (kinda lkml.org's "hottest messages")

Have this available per-list and for the extindex -- I think this would be
a great feature that we can point people at as a mechanism to keep an eye on
overall activity.

I haven't tried your patch yet -- I doubt I will be able to before coming back
from Plumbers next week.

-K

Query to see all new "topics"

2023-11-07 Thread Konstantin Ryabitsev

Hello:

Following the discussion on the ksummit list [1], I wanted to give someone a query
they could use to keep an eye on any new threads. Is there a xapian query that
can be used to effectively say "return just top-level messages and exclude any
follow-ups"? It's not quite as simple as "s:* AND NOT s:Re:" because we also
want to exclude threaded patches. Some kind of equivalent of "any messages
without an in-reply-to/references header"?
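For reference, the header test described above is easy to express on a single message. A Python sketch (is_new_topic is a hypothetical name) that treats a message as a thread starter when it has neither In-Reply-To nor References; patches sent as a threaded series, which git send-email does by default, carry those headers and drop out. It does not show how to express this as a Xapian query.

```python
from email.message import Message


def is_new_topic(msg: Message) -> bool:
    """Hypothetical filter: a thread starter has neither In-Reply-To nor
    References. Replies and threaded patches carry at least one of them,
    so both are excluded; mailers that break threading are not."""
    return msg.get("In-Reply-To") is None and msg.get("References") is None
```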

-K

[1] https://lore.kernel.org/ksummit/20231106-venomous-raccoon-of-wealth-acc57c@nitro/T/

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-07 Thread Konstantin Ryabitsev

On Tue, Nov 07, 2023 at 09:57:55AM +, Salil Mehta wrote:
> > On Mon, Nov 06, 2023 at 11:49:02AM -0500, Michael S. Tsirkin wrote:
> > > > 2023-10-13 10:51 ` [PATCH V6 3/9] hw/acpi: Add ACPI CPU hotplug init stub Salil Mehta via
> > > >  [not found]   ` <20231027150536.3c481...@imammedo.users.ipa.redhat.com>---> why is this?
> > 
> > Unhelpfully, because the archiver address never received that, at least not
> > according to the logs.
> > 
> > I see that message-id being delivered to other subscribers with kernel.org
> > addresses, just never to the archiver.
> 
> I can assure you that I saw these links present and working last Tuesday
> and as mentioned earlier I did had an internal discussion using these links
> as well. They have disappeared in between.

Sorry, but this is not possible short of someone running a very specific
command on lore (as administrator) to actually rewrite git history in the
underlying repo. We only do this for GDPR requests or to remove
illegal/abusive messages.

E.g. these are all the messages we have in that repo from Igor:
https://erol.kernel.org/qemu-devel/git/2/log/?qt=author=mammedov

-K

Re: [Question] review links are disappearing from the qemu-devel mailing-list

2023-11-06 Thread Konstantin Ryabitsev

On Mon, Nov 06, 2023 at 11:49:02AM -0500, Michael S. Tsirkin wrote:
> > 2023-10-13 10:51 ` [PATCH V6 3/9] hw/acpi: Add ACPI CPU hotplug init stub Salil Mehta via
> >  [not found]   ` <20231027150536.3c481...@imammedo.users.ipa.redhat.com>---> why is this?

Unhelpfully, because the archiver address never received that, at least not
according to the logs.

I see that message-id being delivered to other subscribers with kernel.org
addresses, just never to the archiver.

Sorry I can't be more helpful.

-K

pop3: does uuid need to be unique for all mailboxes?

2023-09-25 Thread Konstantin Ryabitsev

Hopefully an easy question:

If I'm subscribing to two different mailboxes via pop3, does the uuid part
need to be unique for each mailbox? For example, will the following cause any
problems because the uuid is the same?

- 2db99975-4977-4866-8819-d7fbbf0f9...@org.kernel.vger.netdev?initial_limit=10
- 2db99975-4977-4866-8819-d7fbbf0f9...@org.kernel.vger.bpf?initial_limit=10
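A quick Python sketch of how such usernames break down: a UUID cookie (what `uuidgen` prints; `uuid.uuid4()` stands in for it here), the newsgroup, and an optional query string. Whether the server keys its per-user state on the UUID alone or on the UUID plus the newsgroup is exactly the open question, so this only illustrates the structure. `make_username` and `split_username` are hypothetical helpers.

```python
import uuid
from urllib.parse import parse_qs


def make_username(newsgroup: str, query: str = "") -> str:
    """Hypothetical helper: build "<uuid>@<newsgroup>[?<query>]"."""
    # uuid.uuid4() plays the role of $(uuidgen) here.
    name = f"{uuid.uuid4()}@{newsgroup}"
    return f"{name}?{query}" if query else name


def split_username(username: str):
    """Hypothetical helper: split into (uuid, newsgroup, params dict)."""
    cookie, _, rest = username.partition("@")
    group, _, query = rest.partition("?")
    params = {k: v[0] for k, v in parse_qs(query).items()}
    return cookie, group, params
```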

-K

Re: [PATCH] pop3: support initial_limit parameter in mailbox name

2023-09-22 Thread Konstantin Ryabitsev

On Fri, Sep 22, 2023 at 02:18:17AM +0000, Eric Wong wrote:
> Subject: [PATCH] pop3: support initial_limit parameter in mailbox name

That looks good in my tests. Thanks!

Tested-by: Konstantin Ryabitsev 

-K

Re: [RFC] pop3: support `?limit=$NUM' parameter in mailbox name

2023-09-19 Thread Konstantin Ryabitsev

On Mon, Sep 18, 2023 at 09:14:22PM +0000, Eric Wong wrote:
> > Oh, I did notice what is probably unintentional behaviour -- passing
> > ?limit=XXX affects all mailbox access, not just the initial retrieval.
> > 
> > E.g. if I configured pop3 with ?limit=128, then leave for the weekend and
> > return on Monday, I will only be able to retrieve 128 new messages, regardless
> > of how many arrived over the weekend.
> > 
> > I'm not sure if this is what was intended -- I think it makes more sense to
> > have ?limit=XXX only affect the initial retrieval. In all other cases, when a
> > tracking uuid cookie is present, it should return all messages regardless of
> > ?limit=.
> > 
> > Does that make sense?
> 
> I think there should be an initial_limit parameter in addition to the
> current limit.   initial_limit would be more suited for cronjobs and
> such running on 24/7 systems.  The regular limit would be better
> for systems with intermittent access and could go weeks w/o being
> online (including situations where somebody restored a system from
> a months/years-old backup).

I'm game with that. Maybe even shorten that to l= and il=? I'm still worried
about the field size limit a bit.
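A minimal Python sketch of the semantics being proposed: `initial_limit` caps only the first retrieval (when the server has no state for the UUID yet), while `limit` caps every session. The function names, the parameters, and the choice to keep the *newest* messages when trimming are assumptions for illustration, not public-inbox's implementation.

```python
from typing import List, Optional, Sequence


def newest(items: Sequence[int], n: Optional[int]) -> List[int]:
    """Return the last n items, or all of them when n is None."""
    if n is None:
        return list(items)
    return list(items[max(0, len(items) - n):])


def messages_to_offer(available: Sequence[int], last_seen: Optional[int],
                      limit: Optional[int] = None,
                      initial_limit: Optional[int] = None) -> List[int]:
    """Hypothetical: pick the article numbers exposed in one POP3 session.

    available: ascending article numbers in the mailbox
    last_seen: highest number already delivered to this UUID, or None
               when the server has no state for it (first login)
    """
    if last_seen is None:
        # First retrieval: initial_limit applies as an extra cap.
        pending = newest(available, initial_limit)
    else:
        # Later sessions: everything newer than what was already delivered.
        pending = [n for n in available if n > last_seen]
    # The regular limit caps every session.
    return newest(pending, limit)
```

With this split, a machine that comes back after a weekend gets every message since its last session unless it also set `limit`.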

> Not feeling well, will try to work on it once (or if) I feel better.

Please take care!

-K

Re: [RFC] pop3: support `?limit=$NUM' parameter in mailbox name

2023-09-18 Thread Konstantin Ryabitsev

On Fri, Sep 15, 2023 at 08:41:10PM +0000, Eric Wong wrote:
> Thanks, pushed the series as
> a37e3ab3740c24c3 (pop3: limit default mailbox to 1K messages, 2023-09-14)
> 392d251f97d46579 (pop3: support `?limit=$NUM' parameter in mailbox name, 2023-09-12)

Oh, I did notice what is probably unintentional behaviour -- passing
?limit=XXX affects all mailbox access, not just the initial retrieval.

E.g. if I configured pop3 with ?limit=128, then leave for the weekend and
return on Monday, I will only be able to retrieve 128 new messages, regardless
of how many arrived over the weekend.

I'm not sure if this is what was intended -- I think it makes more sense to
have ?limit=XXX only affect the initial retrieval. In all other cases, when a
tracking uuid cookie is present, it should return all messages regardless of
?limit=.

Does that make sense?

-K

RFC: lei searches managed by users in git

2023-09-15 Thread Konstantin Ryabitsev

Hello:

I am curious what the best approach would be for a centrally managed set of lei
searches, e.g. via config files tracked in git. The file could look like this:

mricon.toml:

[search.torvalds]
# All mail sent by torvalds
q = 'f:torva...@linux-foundation.org'
[search.floppy]
# Any messages talking about floppies or touching floppy code
q = 'dfhh:floppy_* OR dfn:drivers/block/floppy.c OR s:floppy OR ((nq:bug OR nq:regression) AND nq:floppy)'

I could then have a small wrapper maintaining saved searches and making the
mailboxes available via special newsgroups like:

org.kernel.lei.mricon.torvalds
org.kernel.lei.mricon.floppy

The goal is to make it possible for maintainers to define their own set of
saved searches and have access to them at kernel.org via imap/pop3/nntp.

It's easy to write a simple wrapper that would invoke lei-edit-search and
replace the search string when there are updates to the config files, but I'm
curious if you already have thoughts on how to best implement something like
this.

My biggest concern is someone committing an invalid query and not receiving
any more email as a result -- so having a sane way to validate the query
before sticking it into the saved search would be handy.
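A minimal Python sketch of such a wrapper's planning step. It assumes the TOML file has already been parsed into a dict (for example with Python 3.11's `tomllib.load`), maps each `[search.NAME]` table to an `org.kernel.lei.USER.NAME` newsgroup, and refuses to replace a saved search when the new query fails a cheap sanity check. The helpers are hypothetical, and balanced parentheses/quotes is only a stand-in for real validation, such as running the query once with `lei q` before handing it to lei-edit-search.

```python
from typing import Dict, List, Tuple


def balanced(query: str) -> bool:
    """Cheap sanity check: parentheses and double quotes are balanced."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and query.count('"') % 2 == 0


def plan_searches(user: str, config: Dict[str, Dict[str, Dict[str, str]]],
                  prefix: str = "org.kernel.lei"
                  ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Hypothetical: map [search.NAME] tables to (newsgroup, query) pairs.

    Returns (accepted, errors); a rejected entry should leave the
    existing saved search untouched instead of silently breaking it.
    """
    accepted, errors = [], []
    for name, table in sorted(config.get("search", {}).items()):
        q = (table.get("q") or "").strip()
        if not q or not balanced(q):
            errors.append(f"{name}: rejected query {q!r}")
            continue
        accepted.append((f"{prefix}.{user}.{name}", q))
    return accepted, errors
```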

-K

Re: [RFC] pop3: support `?limit=$NUM' parameter in mailbox name

2023-09-15 Thread Konstantin Ryabitsev

On Thu, Sep 14, 2023 at 12:38:28AM +0000, Eric Wong wrote:
> > My initial target for deploying POP3 support is to allow Gmail users to
> > pull-subscribe to mailing lists, since Gmail is the #1 provider that we have
> > trouble with message delivery due to their draconian threshold limits.
> > However, I think if the default behaviour results in dumping 50,000 messages
> > into people's inboxes, they wouldn't use it, which is why I think we should
> > have a default that is lighter both on the server side and on the users.
> 
> OK, I think this could work (goes on top of my previous limit patch):

Yes, it looks good in my tests:

- specifying the username as `[uuid]@org.kernel.vger.linux-kernel` downloads
  1000 messages
- specifying the username as `[uuid]@org.kernel.vger.linux-kernel?limit=128`
  properly downloads only 128
- tested in both Claws-mail and Thunderbird

Tested-by: Konstantin Ryabitsev 

Thanks!

-K

Re: [RFC] pop3: support `?limit=$NUM' parameter in mailbox name

2023-09-15 Thread Konstantin Ryabitsev

On Wed, Sep 13, 2023 at 10:03:26PM +0000, Eric Wong wrote:
> > What if we move the uuid into the password field -- it seems it belongs there
> > anyway, as it's tied to the user cookie.
> 
> I've thought about that, too; but it can get tricky since passwords
> aren't visible in most UIs.  I've also seen some UIs (not POP3) which
> forbid copy+paste in password fields.
> 
> Furthermore, if a user wants to migrate to a different POP3 client;
> carrying their UUID with them is easier when it's readable in the
> username.  (I'm assuming users won't be bothered backup their UUID
> anywhere)

That makes sense.

> I'm open to supporting both ways; but I'm also not inclined to
> do so unless there's evidence of real-world POP3 clients being
> unable to handle the user names.
> 
> Documenting both ways can be overwhelming to users.

Yes, let's keep it as-is -- I'll test the patches shortly and follow up with
details.

-K

Re: [RFC] pop3: support `?limit=$NUM' parameter in mailbox name

2023-09-13 Thread Konstantin Ryabitsev

On Tue, Sep 12, 2023 at 10:40:34PM +0000, Eric Wong wrote:
> Perhaps 50K is too much?  I figured clients would have a way to
> limit that, but I don't really pay attention to POP3 clients...

The few clients I looked at didn't give any option to specify how many remote
messages I want to retrieve, so I think defaulting to 50,000 is not the right
approach. Maybe the default limit should be something "last 7 days or 1000
messages, whichever is larger"?
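One reading of "last 7 days or 1000 messages, whichever is larger", sketched in Python. It assumes messages are available as (article number, received time) pairs in ascending order; the function name and defaults are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def default_window(msgs: List[Tuple[int, datetime]], now: datetime,
                   days: int = 7, min_count: int = 1000) -> List[int]:
    """Hypothetical: article numbers in the larger of the two windows.

    msgs: (article_number, received_time) pairs, oldest first.
    """
    cutoff = now - timedelta(days=days)
    recent = [num for num, when in msgs if when >= cutoff]
    if len(recent) >= min_count:
        return recent  # the 7-day window is already the larger one
    # Otherwise fall back to the newest min_count messages (min_count >= 1 here).
    return [num for num, _ in msgs[-min_count:]]
```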

My initial target for deploying POP3 support is to allow Gmail users to
pull-subscribe to mailing lists, since Gmail is the #1 provider we have
message-delivery trouble with, due to their draconian threshold limits.
However, if the default behaviour dumps 50,000 messages into people's
inboxes, they won't use it, which is why I think we should have a default
that is lighter both on the server and on the users.

-K

pop3 usability thoughts

2023-09-12 Thread Konstantin Ryabitsev

Hello:

I've been playing around with pop3, and I'm wondering if we can improve its
usability by adding a "last NNN messages" pseudo-folder. Currently, if someone
wants to access the git mailing list archive via pop3, they have to do the
following:

- know that the username should be $(uuidgen)@org.kernel.vger.git.1 (the
  default username would access slice 0, right? Or is it the last 50,000
  messages?)
- wait for their client to retrieve tens of thousands of unread messages on
  first access
- if the remote archive rolls over to the next slice, they have to edit their
  account info to get new messages (unless I'm wrong about #1)

Perhaps the default could be slightly different:

- $(uuidgen)@org.kernel.vger.git would start with an empty view (or something
  like the last 10 messages)
- it would only get any new messages added to the archive

I think this would be a friendlier experience, but not sure how difficult it
would be to implement. I'm also not 100% sure all my assumptions are correct,
so please feel free to correct me.

Best wishes,
-K

Re: [PATCH] www: use correct threadid for per-thread search

2023-06-21 Thread Konstantin Ryabitsev

On Fri, Jun 16, 2023 at 11:13:01PM +0000, Eric Wong wrote:
> > Reviving this old thread for some clarification. I noticed that this only
> > works for /all/, but not for individual inboxes. E.g.:
> > 
> > $ curl -d '' -sSf \
> >   https://lore.kernel.org/all/"$MSGID/?x=m&q=rt:2023-03-29.." \
> >   | zgrep -i ^Message-ID:
> > Message-ID: 
> > 
> > but with /lkml/ I get a 404:
> > 
> > $ curl -d '' -sSf \
> >   https://lore.kernel.org/lkml/"$MSGID/?x=m&q=rt:2023-03-29.." \
> >   | zgrep -i ^Message-ID:
> > curl: (22) The requested URL returned error: 404
> > 
> > Is that intentionally restricted to just extindex?
> 
> It's a bug, fix below and deployed to https://80x24.org/lore/

Indeed, looks good now. Thank you!

-K

Re: Cheap way to check for new messages in a thread

2023-06-16 Thread Konstantin Ryabitsev

On Thu, Mar 30, 2023 at 11:29:51AM +0000, Eric Wong wrote:
> This implements the mbox.gz retrieval.  I didn't want to deal
> with HTML nor figuring out how to expose more  elements,
> yet; but I figure mbox.gz is the most important.
> 
> Now deployed on 80x24.org/lore:
> 
> MSGID=20230327080502.GA570847@ziqianlu-desk2
> curl -d '' -sSf \
>   https://80x24.org/lore/all/"$MSGID/?x=m&q=rt:2023-03-29.." | \
>   zcat | grep -i ^Message-ID:

Eric:

Reviving this old thread for some clarification. I noticed that this only
works for /all/, but not for individual inboxes. E.g.:

$ curl -d '' -sSf \
  https://lore.kernel.org/all/"$MSGID/?x=m&q=rt:2023-03-29.." \
  | zgrep -i ^Message-ID:
Message-ID: 

but with /lkml/ I get a 404:

$ curl -d '' -sSf \
  https://lore.kernel.org/lkml/"$MSGID/?x=m&q=rt:2023-03-29.." \
  | zgrep -i ^Message-ID:
curl: (22) The requested URL returned error: 404

Is that intentionally restricted to just extindex?
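For scripting the same request without hand-building the URL, here is a small Python sketch that uses only the standard library. It assumes the per-thread mbox endpoint has the form `<base>/<inbox>/<Message-ID>/?x=m&q=<query>` and must be POSTed with an empty body (the `curl -d ''` in the commands above); the helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def thread_search_url(base: str, inbox: str, msgid: str, query: str) -> str:
    """Hypothetical: build <base>/<inbox>/<msgid>/?x=m&q=<query>, escaped."""
    mid = urllib.parse.quote(msgid.strip("<>"), safe="@")
    qs = urllib.parse.urlencode({"x": "m", "q": query})
    return f"{base.rstrip('/')}/{inbox}/{mid}/?{qs}"


def fetch_thread_mbox_gz(url: str) -> bytes:
    """POST with an empty body, like `curl -d ''`; returns the raw mbox.gz."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Swapping `inbox` between `all` and `lkml` is then a one-argument change, which makes per-inbox vs. extindex behaviour easy to compare.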

-K

Re: Indicating the mirror's origin

2023-06-15 Thread Konstantin Ryabitsev

On Wed, Jun 14, 2023 at 11:50:15PM +0000, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > Good day:
> > 
> > We've had a few requests to mirror public-inbox archives that originate on
> > other systems so they can also be searchable and viewable via lore.kernel.org.
> > I've been dragging my feet on these requests, because they are a potential
> > liability in terms of GDPR compliance.
> 
> I just tried using `git replace' for the first time:

I think I didn't quite convey my idea -- let me try to step back a bit.

What I have is lore.kernel.org, which is actually 3 different frontends all
pulling git repositories from some other source of origin. Currently, I have
two:

- lkml.kernel.org, which subscribes to external lists via regular SMTP
- subspace.kernel.org, which is our own mlmmj server and where public-inbox
  repositories are created via public-inbox-watch

Since we control both lkml and subspace, we are the origin of the data, so if
anyone requests archive removal, we can easily comply.

Now, I want to be able to add other external public-inbox repositories to be
mirrored on lore.kernel.org, but with some clear indication that we're not the
origin of that data, we're merely mirroring it. Any GDPR removal requests need
to be sent to $ORIGIN and we'll just propagate any changes.

>   git replace --edit $BLOB_OID

I don't want to go down that route, because while we can do such surgery on a
node, it would need to be rerun again if we bring up a new mirror node, and
it's almost guaranteed to be forgotten.

> I sometimes use the $INBOX_DIR/description file for that and it
> affects WWW and NNTP, but not IMAP/POP3.  I'm not sure if I want
> to reintroduce header injection in case there's some conflict
> with DKIM or other signature mechanisms[1]

I don't think we need to worry about it if we pick a header that's almost
certain to not be included in the default DKIM signature set.
X-Originally-Archived-At: or some other header is guaranteed to never be
signed.
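Whether a given header is covered can be checked mechanically: DKIM lists the signed header names in the `h=` tag of each DKIM-Signature. A rough Python sketch (hypothetical helper names, simplified tag parsing loosely following RFC 6376) that tells whether adding a header such as X-Originally-Archived-At: would be safe for a message:

```python
def dkim_signed_headers(dkim_signature: str) -> set:
    """Return the lowercase header names listed in a DKIM-Signature h= tag."""
    for tag in dkim_signature.split(";"):
        name, sep, value = tag.partition("=")
        # DKIM tag names are case-sensitive; header names in h= are not.
        if sep and name.strip() == "h":
            # h= is a colon-separated list that may contain folding whitespace.
            return {h.strip().lower() for h in value.split(":") if h.strip()}
    return set()


def safe_to_add(header: str, dkim_signatures) -> bool:
    """Hypothetical: True if no signature covers `header`.

    A name listed in h= must not be added later even if it was absent
    when signed ("oversigning"), since that would break verification.
    """
    name = header.lower()
    return all(name not in dkim_signed_headers(sig) for sig in dkim_signatures)
```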

-K

Re: Indicating the mirror's origin

2023-06-14 Thread Konstantin Ryabitsev

On Wed, Jun 14, 2023 at 10:18:57PM +0200, Uwe Kleine-König wrote:
> Hello Konstantin,
> 
> On Wed, Jun 14, 2023 at 02:42:15PM -0400, Konstantin Ryabitsev wrote:
> > We've had a few requests to mirror public-inbox archives that originate on
> > other systems so they can also be searchable and viewable via lore.kernel.org.
> > I've been dragging my feet on these requests, because they are a potential
> > liability in terms of GDPR compliance.
> 
> What is the relevant GDPR liability here? I assume someone who sent a
> mail to (say) bare...@lists.infradead.org can request that you remove
> your copy of that mail?!

It feels stupid to subscribe an archiver agent to the list when a public-inbox
repository is conveniently available for cloning. However, if we just clone
the repository over and integrate it into lore, we need to indicate that we're
just copying bits over from some other location -- I cannot delete things from
the archive on my own any more.

> (I wonder what should be the effect on lore.kernel.org if
> lore.barebox.org removes a mail. Should it disappear from the former,
> too?)

Yes, if we are mirroring the underlying archive git repositories, any deletes
you make on your end will propagate to us.

-K

Indicating the mirror's origin

2023-06-14 Thread Konstantin Ryabitsev

Good day:

We've had a few requests to mirror public-inbox archives that originate on
other systems so they can also be searchable and viewable via lore.kernel.org.
I've been dragging my feet on these requests, because they are a potential
liability in terms of GDPR compliance.

If we are merely mirroring the archive from some other location, then there
should be a clear indication of the origin of the data and contact information
of the maintainer of the remote archive where someone could send requests for
any data removal. It's best if this is visible both via the web view and in
raw messages retrieved via our service, e.g. via an "X-Archive-Origin:" header
or something similar.

Any thoughts on this issue?

CC'ing the folks who have been dutifully asking me to mirror their lists on
lore, and who I'm sure are sick and tired of me not getting any movement on
this issue.

-K

Re: https://lore.kernel.org/linux-mm/

2023-06-08 Thread Konstantin Ryabitsev

On Thu, Jun 08, 2023 at 12:40:52AM -0600, Jonathan Corbet wrote:
> > This is the reason why there's a gap from May 31 to June 6. If you would like
> > to contribute the missing messages, I'll be happy to feed them into the
> > archive.
> 
> The LWN archive - http://archive.lwn.net:8080/linux-mm/ - is intact as
> far as I know; feel free, as always, to pillage from it if that helps.

Oh, nice, I will dip into that, thank you!

-K

Re: https://lore.kernel.org/linux-mm/

2023-06-07 Thread Konstantin Ryabitsev

On Wed, Jun 07, 2023 at 10:41:18AM +0800, Yin, Fengwei wrote:
> > Messages on lore.kernel.org show up because an email address
> > they control receives messages from the linux-mm list.  Since
> > (I assume) kvack.org is controlled by someone else, kernel.org
> > needs to subscribe to the linux-mm list just like you or anyone
> > else.
> Oh. So the problem is kernel.org does not subscribe to linux-mm
> after May 31st? Thanks.

We had a mail server configuration problem that was a result of thousands of
zombie hosts all trying to relay mail via mail.kernel.org. This caused our
public RBL lookups to fail with "you're using us too much!" error, which
unfortunately fails in the worst possible ways -- by marking all RBL lookups
as spam.

Our monitoring quickly alerted us to this, but unfortunately we did bounce
pretty much all incoming mail for about 10-15 minutes. If there was a patch
series coming in during that time, that would have generated enough bounces to
cause the archiver address to be unsubscribed.

This is the reason why there's a gap from May 31 to June 6. If you would like
to contribute the missing messages, I'll be happy to feed them into the
archive.

-K

Re: Threaded responses to queries (x86)

2023-05-31 Thread Konstantin Ryabitsev

On Wed, May 31, 2023 at 07:50:53AM +0000, Eric Wong wrote:
> I wonder if lore can expose x...@kernel.org as its own inbox even
> if it's technically not a subscribable mailing list.  That would
> be much faster (but less space-efficient) than issuing a Xapian
> query to get x...@kernel.org mails.

I'm not even sure why it's not a real list. Let me follow up with them to see
if we should just convert it.

-K

Re: search by whole thread?

2023-04-12 Thread Konstantin Ryabitsev

On Wed, Apr 12, 2023 at 12:06:53AM +0000, Eric Wong wrote:
> I think the reason it's rare in MUAs is that it's potentially
> very expensive.  But I think the `thread:{subquery}' feature
> from notmuch I discussed with Konstantin the other week[1] can
> do what you want it to do.
> 
> Keep in mind, notmuch-search-terms(7) states:
> 
>   The performance of such queries can vary wildly.
> 
> And that's for a private client tool for a single user.

Yes, when I was wondering about that, it was really for the lei side of
things. I don't really want to run expensive queries on lore (though I'm okay
if we can turn it off for /all/ or other very large lists).

-K

Re: Cheap way to check for new messages in a thread

2023-03-30 Thread Konstantin Ryabitsev

On Thu, Mar 30, 2023 at 11:29:51AM +0000, Eric Wong wrote:
> > Per-thread search is something I've wanted for a while, anyways,
> > so I think I'll do /$MSGID/?q= in between ongoing work for
> 
> This implements the mbox.gz retrieval.  I didn't want to deal
> with HTML nor figuring out how to expose more  elements,
> yet; but I figure mbox.gz is the most important.

Nice, thanks!

I can't easily test this, because lore is currently mostly on 1.9 and the
patch doesn't cleanly apply to that tree. However, I will be happy to test it
out once 2.0 is out and we've updated to it on our systems.

Cheers,
-K

Re: Cheap way to check for new messages in a thread

2023-03-28 Thread Konstantin Ryabitsev

On Tue, Mar 28, 2023 at 10:08:30PM +0000, Eric Wong wrote:
> > I think this is a workable approach, but would require a reindex, right?
> 
> Yes, it requires a reindex to take effect, which takes ~2 days
> on my lore mirror.  The biggest problem is MUAs are likely to
> cull References: when threads get too long; so accuracy gets
> lost.
> 
> Supporting /$MSGID/?q=... doesn't seem like the worst idea,
> actually; since I've seen some web forums (phpBB maybe?) have a
> "search in thread" function.
> 
> thread:{sub-query} is ideal; and I wouldn't rule out doing any
> combination of the three (I don't like separating before/after).

I'm fine with either of these, and just to stress, it's not really blocking
anything I'm working on -- bugbot is in initial rollout stages, so while the
number of tracked bugs/threads remains low, even if we re-download a hundred
threads every 10 minutes, it's just internal churn between two adjacent VMs.
If it becomes heavy, I can always look into switching to lei and performing
local queries instead of doing external polling.

However, if you do want to add the ability to cheaply run a "give me just the
newest messages in this thread since this datetime" query, that would be great
for my needs. :)

-K

Re: Cheap way to check for new messages in a thread

2023-03-28 Thread Konstantin Ryabitsev

On Tue, Mar 28, 2023 at 07:45:49PM +0000, Eric Wong wrote:
> C) index References:/In-Reply-To: so searching `ref:$MSGID'
>can work.  This doesn't work for some MUAs and deep
>threads, though.

I think this is a workable approach, but would require a reindex, right?

-K

Re: Cheap way to check for new messages in a thread

2023-03-28 Thread Konstantin Ryabitsev

On Mon, Mar 27, 2023 at 09:38:49PM +0000, Eric Wong wrote:
> I thought about that, too; but I'm worried about having one-off
> stuff that ends up needing to be supported indefinitely.
> 
> JMAP for this would take more time, but I'd be more comfortable
> carrying it long-term.
> 
> I don't expect trimming after the first paragraph to be a huge
> improvement.  Retrieving any part of the message from git and
> dealing with MIME is expensive, anyways.  I wouldn't expect it
> to be a big (if any) improvement compared to POST-ing for the
> mbox.gz (&x=m&t=1) endpoint with rt:$SINCE..

Hmm... This didn't seem to do the right thing for me. For example, this
thread:

https://lore.kernel.org/lkml/20230327080502.GA570847@ziqianlu-desk2

If I ask for any new messages in that thread since 2023032712, I get
nothing:

curl -Sf -d '' \
  'https://lore.kernel.org/all/?x=m&t=1&q=mid%3A20230327080502.GA570847@ziqianlu-desk2+AND+dt%3A2023032812..'

> The mbox.gz endpoints should be a bit more efficient for the
> server than Atom feeds; decoding MIME and HTML escaping takes up
> considerable CPU time.

Good to know. I'm really looking for a way to ask the remote system "hey, is
there anything new in this thread?" so that I can quickly ignore threads
without any updates.

-K

Re: Cheap way to check for new messages in a thread

2023-03-27 Thread Konstantin Ryabitsev

On Mon, Mar 27, 2023 at 07:10:49PM +0000, Eric Wong wrote:
> > For the bugzilla integration work I'm doing, I need a way to check if there
> > were any updates to a thread since the last check. Right now, I'm just
> > grabbing the full thread, parsing it and seeing if there are any new
> > message-IDs that we don't know about, but it's very wasteful. Any way to just
> > issue something like "how many messages are in a thread with this message-id"
> > or "are there any updates to a thread with this message-id since
> > MMDDHHMMSS?
> 
>   lei q -t --only /path/to/(inbox|extindex) mid:$MSGID rt:APPROXIDATE..
> 
> Returns JSON and won't retrieve message bodies from git.

Ah, I was hoping to have a fully remote way of doing this.

> I wouldn't query down to the second due to propagation delays,
> clock skew, etc, though.
> 
> There might be a JMAP endpoint I can implement for WWW which
> only retrieves that info, but getting backreferences (required
> by the JMAP spec) to work properly seemed painful.

What about a "bodiless" atom feed? It's already available per thread, so
perhaps there could be a mode that skips the bodies or trims them after the
first paragraph?
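Until such a mode exists, a client could already cut down its own work by reading only the `<updated>` timestamps from the existing per-thread Atom feed. A minimal Python sketch, assuming each entry's `<updated>` reflects the message date; fetching the feed is left out, and `entries_since` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"


def entries_since(atom_xml: str, since: datetime):
    """Hypothetical: (id, updated) pairs for entries updated at or after since."""
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive times as UTC
    hits = []
    for entry in ET.fromstring(atom_xml).iter(ATOM + "entry"):
        stamp = entry.findtext(ATOM + "updated")
        if not stamp:
            continue
        # Atom uses RFC 3339 timestamps; fromisoformat() wants "+00:00", not "Z".
        when = datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
        if when >= since:
            hits.append((entry.findtext(ATOM + "id"), when))
    return hits
```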

-K

Cheap way to check for new messages in a thread

2023-03-27 Thread Konstantin Ryabitsev

Hello:

For the bugzilla integration work I'm doing, I need a way to check if there
were any updates to a thread since the last check. Right now, I'm just
grabbing the full thread, parsing it and seeing if there are any new
message-IDs that we don't know about, but it's very wasteful. Any way to just
issue something like "how many messages are in a thread with this message-id"
or "are there any updates to a thread with this message-id since
MMDDHHMMSS?
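Even while the download stays wasteful, the Message-ID comparison itself is cheap. A minimal Python sketch, assuming the thread arrives as a gzipped mboxrd (as from the mbox.gz endpoints) and that the caller keeps the set of Message-IDs it already knows; the function names are hypothetical:

```python
import gzip
import re
from email.parser import BytesHeaderParser
from typing import Iterable, List

# mboxrd separator lines; body lines starting with "From " are escaped as
# ">From ", so splitting on this pattern does not cut messages apart.
_FROM_LINE = re.compile(rb"^From .*\n", re.MULTILINE)


def message_ids(mbox_gz: bytes) -> List[str]:
    """Hypothetical: Message-IDs (without <>) from a gzipped mboxrd."""
    raw = gzip.decompress(mbox_gz)
    parser = BytesHeaderParser()  # headers only; bodies are never parsed
    ids = []
    for chunk in _FROM_LINE.split(raw):
        if not chunk.strip():
            continue
        mid = parser.parsebytes(chunk).get("Message-ID")
        if mid:
            ids.append(mid.strip().strip("<>"))
    return ids


def unseen(mbox_gz: bytes, known: Iterable[str]) -> List[str]:
    """Hypothetical: Message-IDs in the download that are not in `known`."""
    seen = set(known)
    return [m for m in message_ids(mbox_gz) if m not in seen]
```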

-K

Re: IMAP users: how useful is server-side search to you?

2023-03-15 Thread Konstantin Ryabitsev

On Tue, Mar 14, 2023 at 09:55:02PM +0000, Eric Wong wrote:
> I've always used local search (lei nowadays, mairix in the past).
> 
> I'm considering an option to disable it, or make it available to
> AUTH=ANONYMOUS users only, since there seems to be a lot of
> scrapers trying to look for private emails and such...

I think at this point there are few enough users of IMAP that you can just
make unilateral decisions and everyone will be fine with that. :)

-K

Re: [PATCH] lei_mirror: unlink FETCH_HEAD when fetching forkgroups

2023-03-08 Thread Konstantin Ryabitsev

On Wed, Mar 08, 2023 at 11:02:58AM +0000, Eric Wong wrote:
> Apparently, --no-write-fetch-head is broken in current git[1].
> It also wasn't in older git, at all.  So just unlink FETCH_HEAD
> as we see it, but keep using --no-write-fetch-head to avoid the
> syscall and I/O overhead when we can.

I'm pretty sure --no-write-fetch-head was added in response to me asking why
FETCH_HEAD is needed for bare repos in the first place. In grokmirror, we
symlink FETCH_HEAD to /dev/null, but you probably already know this.
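The grokmirror trick amounts to roughly the following. This is a Python sketch of the idea, not grokmirror's actual code, and the function name is made up; it assumes git keeps following the symlink when it writes FETCH_HEAD, so the data ends up in /dev/null.

```python
import os
import tempfile


def point_fetch_head_at_devnull(git_dir: str) -> None:
    """Hypothetical: replace <git_dir>/FETCH_HEAD with a symlink to /dev/null."""
    path = os.path.join(git_dir, "FETCH_HEAD")
    if os.path.islink(path) and os.readlink(path) == os.devnull:
        return  # already in place; nothing to do
    if os.path.lexists(path):
        os.unlink(path)  # drop a regular FETCH_HEAD (or a stale symlink)
    os.symlink(os.devnull, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        open(os.path.join(repo, "FETCH_HEAD"), "w").close()
        point_fetch_head_at_devnull(repo)
        print(os.readlink(os.path.join(repo, "FETCH_HEAD")))
```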

-K

Re: ActivityPub <=> email bridge?

2023-03-07 Thread Konstantin Ryabitsev

On Tue, Mar 07, 2023 at 10:12:10PM +0000, Eric Wong wrote:
> > Something tells me that if ActivityPub reaches high-enough
> > adoption levels; it'll have to deal with a spam problem that
> > email folks have been dealing with for decades, too.
> > 
> > So ActivityPub seems like a duplicated effort as far as it's use
> > for messaging for software development goes...
> 
> Still true, but it seems to have caught on, lately...
> 
> If we manage to try this, it'll be using AP as a transport layer
> and still requiring plain-text and RFC5322 (or 822/2822) plain-text
> messages compatible with git-am.

I'm not sure about the bridge, but I would very much welcome the ability to
archive activitypub messages in a public-inbox archive, with full threading.

> That would allow SpamAssassin or similar to perform spam
> filtering w/o modification.
> 
> Allowing markup or images from arbitrary posters is a nightmare
> in terms of spam, phishing and illegal content, though; so
> normal mailing list etiquette still applies.

Perhaps it's possible to allow attachments from specific instances, but the
default for any federated content is just plaintext content?

-K

Re: POP3 adoption (was: [PATCH 1/2] doc|www: flesh out POP3 documentation for servers) and users

2023-02-06 Thread Konstantin Ryabitsev

On Mon, Feb 06, 2023 at 06:03:38AM +0000, Eric Wong wrote:
> > +The password is: anonymous
> > +The username is: \$(uuidgen)\@$ctx->{ibx}->{newsgroup}
> > +where \$(uuidgen) in the output of the `uuidgen' command on your system.
> > +The UUID in the username functions as a private cookie (don't share it).
> > +Idle accounts will expire periodically.
> 
> I just checked my POP3 instance on public-inbox.org; and not a
> single user has used it.  I guess POP3 really is unpopular :<

I think it's because it's not a widely known feature. I intend to make it
available soon via lore.kernel.org, which should allow people to subscribe to
lists via gmail's POP3 integration.

I just need to implement the unified daemon features so we don't have to manage 5
different systemd services.

-K

Re: [PATCH] eml: header_raw converts octets to Perl UTF-8

2022-11-25 Thread Konstantin Ryabitsev

On Thu, Nov 24, 2022 at 09:31:55PM +0000, Eric Wong wrote:
> The below case generalizes it to all HTML displays and removes
> the special case.

It looks good to me in some cursory tests, thank you!

-K

handling unquoted utf8 in the headers

2022-11-24 Thread Konstantin Ryabitsev

Hello:

There's a bit of inconsistency in handling messages with UTF-8 content in the
headers:

https://lore.kernel.org/b4-sent/20221122-gud-shadow-plane-v1-0-9de3afa33...@tronnes.org/

You can see that the name in the From: line is mangled, but in the thread
overview it is displayed correctly.

I know older SMTP standards still require 7bit escaping in the headers, but
with SMTPUTF8 being very widely available, it should be possible to store and
properly display messages with 8bit unicode in the headers.
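Displaying both header forms consistently mostly comes down to decoding raw 8-bit bytes as UTF-8 before handling RFC 2047 encoded-words. A Python sketch using the standard `email.header` helpers (public-inbox itself is Perl, so this only illustrates the idea; the Latin-1 fallback and the function name are assumptions):

```python
from email.header import decode_header, make_header


def display_header(raw: bytes) -> str:
    """Hypothetical: decode a header value that is raw UTF-8 or RFC 2047."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: fall back to Latin-1 so legacy 8-bit headers still show.
        text = raw.decode("latin-1")
    # Decodes =?charset?q?...?= / =?charset?b?...?= encoded-words and passes
    # plain (including already-decoded UTF-8) text through unchanged.
    return str(make_header(decode_header(text)))
```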

-K

Re: [PATCH 0/4] CentOS 7 fixes + fix Gcf2 everywhere

2022-09-29 Thread Konstantin Ryabitsev

On Thu, Sep 29, 2022 at 05:48:27PM +0000, Eric Wong wrote:
> A bunch of CentOS 7.x fixes noted by Konstantin
> I also just noticed I broke Gcf2 everywhere :x

The tests pass now, thanks!

I still noticed this error (master + this series):

1 at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 445.
(repeated a bunch of times)

Cheers,
-K

Re: "make test" for 1.9.0 on centos-7

2022-09-28 Thread Konstantin Ryabitsev

On Wed, Sep 28, 2022 at 07:59:39PM +0000, Eric Wong wrote:
> > I'm starting to work on upgrading lore.kernel.org to 1.9.0. For a number of
> > yak-shavey reasons we are still on centos-7, though hopefully we'll be able to
> > move on to something newer soon. Right now, I'm having difficulty running
> > "make test":
> 
> Eep, ok there's a lot of breakages on CentOS 7, likely going
> back to our 1.7/1.8 releases and my C7 VM was broken/forgotten :x
> Working on fixes now...

Ok, no worries -- I know that CentOS 7 is getting pretty old. :) Note that in
our particular case most Perl dependencies are installed with cpanm, not from
RPMs, and we're using newer git and xapian-1.4.

Best regards,
Konstantin

"make test" for 1.9.0 on centos-7

2022-09-28 Thread Konstantin Ryabitsev

Hello:

I'm starting to work on upgrading lore.kernel.org to 1.9.0. For a number of
yak-shavey reasons we are still on centos-7, though hopefully we'll be able to
move on to something newer soon. Right now, I'm having difficulty running
"make test":

... [skipping OK tests] ...
t/cmd_ipc.t .. 1/? sleeping on sendmsg: Too many references: cannot splice (#1)
sleeping on sendmsg: Too many references: cannot splice (#2)
... (repeats 49 times) ...

t/cmd_ipc.t .. 24/? # sent 2097152, retrying with more
sleeping on sendmsg: No buffer space available (#1)
... (repeats 49 times, but much slower) ...

t/cmd_ipc.t .. 29/? Use of uninitialized value $_[1] in vec at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 457.
Use of uninitialized value $_[1] in scalar assignment at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 457.
Use of uninitialized value $_[1] in vec at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 457.
Use of uninitialized value $_[1] in scalar assignment at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 457.
1 at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 445.
... (repeats a lot of times) ...

t/cmd_ipc.t .. 66/? # sent 2097152, retrying with more
1 at /home/mricon/public-inbox/blib/lib/PublicInbox/Syscall.pm line 445.
... (repeats a lot of times) ...

... [skipped some OK tests] ...

t/extsearch.t  5/? # inherited [::1]:45080 fd=3
Can't call method "xpath" on an undefined value at t/extsearch.t line 132.
# Tests were run but no plan was declared and done_testing() was not seen.
t/extsearch.t  Dubious, test returned 254 (wstat 65024, 0xfe00)
... [skipped some OK tests] ...

t/httpd-corner.t . 1/? # inherited [::1]:46301 fd=3
# inherited /tmp/pi-httpd-corner-12549-8Xh4/s fd=4
# inherited [::1]:45630 fd=5
# http://[::1]:45630 psgi=t/alt.psgi
# http://[::1]:45630 err=/tmp/pi-httpd-corner-12549-8Xh4/alt.err
# inherited [::1]:46301 fd=3
# inherited /tmp/pi-httpd-corner-12549-8Xh4/s fd=4
# inherited [::1]:45630 fd=5
# http://[::1]:45630 psgi=t/alt.psgi
# http://[::1]:45630 err=/tmp/pi-httpd-corner-12549-8Xh4/alt.err
# inherited [::1]:46301 fd=3
# inherited /tmp/pi-httpd-corner-12549-8Xh4/s fd=4
# inherited [::1]:45630 fd=5
# http://[::1]:45630 psgi=t/alt.psgi
# http://[::1]:45630 err=/tmp/pi-httpd-corner-12549-8Xh4/alt.err
... [seems to hang here forever] ...

Any pointers where I should start looking?

Best wishes,
-Konstantin

Re: extindex for git? [was: an even bigger git show than before...]

2022-08-26 Thread Konstantin Ryabitsev

On Thu, Aug 25, 2022 at 09:34:42PM +0000, Eric Wong wrote:
> > I wanted to add search to git repos ages ago, but it was silly
> > expensive in terms of space.  That was before extindex...
> > 
> > extindex ought to be able to offer space savings across forks
> > and similar documents (commits vs patch mails).
> > 
> > At least dfpre/dfpost/dfn/subject may be enough, even...
> 
> And I'm also thinking extindexing coderepos can make
> auto-association with inboxes possible.
> 
> Right now, configuring coderepos on a large scale is a huge PITA
> given the M:N associations between inboxes and coderepos.
> 
> Being able to do fuzzy JOIN-ish operations based on
> blobs/filenames/subjects would allow extindex to automatically
> associate coderepos with inboxes and vice-versa.

I wonder how well this would work in the presence of many forks. For example,
most of the content on git.kernel.org consists of thin forks of linux.git, so
matching by blobs/filenames/subjects across all of them would return too many
hits, and some kind of priority ordering would be required, I think.

Overall, though, I do agree that this would be really handy.

-K

Re: APOP-only POP3 clients?

2022-05-27 Thread Konstantin Ryabitsev

On Fri, May 27, 2022 at 10:53:02AM +, Eric Wong wrote:
> Which may be an extremely long username...  Now I'm thinking
> it's safe for UUID_1 and UUID_2 to be the same, to save storage
> space on the server and to save users from dealing with
> excessively long, compression-unfriendly field entries. So,
> this:
>   username: $UUID@$NEWSGROUP.$SLICE
>   password: $UUID

1. I'd even say that we don't need the full entropy of a UUID here; it's
   probably sufficient to grab 8-12 bytes from /dev/random and z-base-32
   encode them (z-base-32 produces case-insensitive strings).
2. You could even make the password just "p". :)

-K

Re: RFC: hooks in public-inbox-watch

2022-05-12 Thread Konstantin Ryabitsev

On Thu, May 12, 2022 at 07:38:59PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > Hi, all:
> > 
> > What do you think about a mechanism to run hooks at the stage right before
> > public-inbox-watch adds a new message to the archive? One feature that would
> > be neat is to search archives for all instances of the same patch using its
> > $(git-patch-id --stable) and adding a header, e.g.:
> > 
> > X-Git-Patch-ID: 19c05284cea20b72b44c2b7e6cfd782a6a860cf1 
> > 
> 
> There shouldn't be any need for a header.  I've been meaning to
> teach -index to use git-patch-id and index its output, anyways.
> That would work for old messages, too.

Sure, I'll take that, too. :)

Thanks,
-K

RFC: hooks in public-inbox-watch

2022-05-12 Thread Konstantin Ryabitsev

Hi, all:

What do you think about a mechanism to run hooks right before
public-inbox-watch adds a new message to the archive? One neat feature would
be to search the archives for all instances of the same patch using its
$(git-patch-id --stable) and add a header, e.g.:

X-Git-Patch-ID: 19c05284cea20b72b44c2b7e6cfd782a6a860cf1 


I know this can be done at the postfix stage, but it seems like it would be
more efficient at the ingestion stage.

Or maybe, instead of a hook, this could be a native public-inbox feature, with
this header indexed by default?
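
To illustrate why a patch-id lets copies of the same patch be matched, here is
a small Python sketch that pipes a unified diff through `git patch-id
--stable` (it assumes git is on PATH; the helper name is hypothetical). git
prints "<patch-id> <commit-id>", and because hunk line numbers are ignored,
the same change applied at a different offset yields the same id.

```python
import subprocess


def stable_patch_id(diff_text: str) -> str:
    """Return the `git patch-id --stable` value for a unified diff.

    git prints "<patch-id> <commit-id>"; only the first field is kept.
    """
    result = subprocess.run(
        ["git", "patch-id", "--stable"],
        input=diff_text,
        capture_output=True,
        text=True,
        check=True,
    )
    fields = result.stdout.split()
    if not fields:
        raise ValueError("git patch-id produced no output (no diff found?)")
    return fields[0]
```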

-K

Re: Trouble running lei

2022-05-03 Thread Konstantin Ryabitsev

On Tue, May 03, 2022 at 01:50:52PM +0100, Filipe Manana wrote:
> > Perhaps it's already running lei-daemon as an older version?
> > "lei daemon-kill" should kill it and it'll restart on the next
> > command, unless something else got wedged.
> 
> Ah, running "lei daemon-kill" fixed it.
> I don't know if I did something wrong before, but after running that,
> lei is now working fine.

I think this is actually a common occurrence. Is there any way lei-daemon could
recognize when there's a version mismatch between itself and the binary talking
to it?

Regards,
-K

Re: Issue with lore.kernel.org links

2022-03-07 Thread Konstantin Ryabitsev

On Mon, Mar 07, 2022 at 09:44:06AM -0600, Tom Lendacky wrote:
> Hi Konstantin,
> 
> Boris Petkov suggested I email you about an issue I'm having with some of
> the links on lore.kernel.org.  For example, I performed a search and was
> presented with a result page:
> 
> https://lore.kernel.org/all/Yh3r1PSx/fjqo...@nazgul.tnic/?q=Singh
> 
> Paging down slightly, you'll see the patch series links. If I click on any
> of them, for example, the link associated with the line:
> 
>   2022-02-24 16:55 ` [PATCH v11 08/45] x86/sev: " Brijesh Singh
>   
> https://lore.kernel.org/all/Yh3r1PSx/20220224165625.2175020-9-brijesh.si...@amd.com/
> 
>   (Note, that line should really read:
>2022-02-24 16:55 ` [PATCH v11 08/45] x86/sev: Detect/setup SEV/SME 
> features earlier in boot Brijesh Singh)
> 
> I'm taken to a page that reads:
> Message-ID 
> not found
> 
> Perhaps try an external site:
> ...
> 
> I'm not sure what the issue is, but wanted to let you know.

The problem is that the Message-ID contains a '/', which breaks the links. If
you escape the slash (as %2F), all links work properly:
https://lore.kernel.org/all/yh3r1psx%2ffjqo...@nazgul.tnic/?q=Singh
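
A minimal Python sketch of the escaping, assuming a hypothetical Message-ID
(the real one is elided above). Passing safe="@" to urllib.parse.quote keeps
'@' literal, as in lore's URLs, while '/' becomes %2F instead of acting as a
path separator. Percent-encoding hex digits are case-insensitive, so %2F and
%2f are equivalent.

```python
from urllib.parse import quote


def mid_url(base: str, message_id: str) -> str:
    """Build a per-message URL with every '/' in the Message-ID escaped."""
    mid = message_id.strip("<>")  # drop angle brackets, if present
    # safe="@": leave '@' as-is; everything else reserved (incl. '/') is encoded
    return f"{base.rstrip('/')}/{quote(mid, safe='@')}/"
```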

I'm cc'ing the public-inbox list in the hope that we can figure out the best
way to handle this situation.

Thanks,
-K

RFC: should lei inject its own "Received:" header?

2021-11-19 Thread Konstantin Ryabitsev

Hello:

I wonder if lei should inject its own "Received"-like header when writing to a
Maildir/IMAP target, to indicate where that copy of the email came from. I
don't think it should use the actual Received: header, as this may cause some
weird SPF/DMARC issues, but perhaps something like:

X-Lei-Received: from https://lore.kernel.org/all/; Fri, 19 Nov 2021 14:32:44 -0500
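
For illustration, a Python sketch that produces the proposed header with the
standard email package. X-Lei-Received is only the header suggested here, not
something lei emits, and add_lei_received is a hypothetical helper. Note that
msg[name] = value appends a header; it does not replace existing ones.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def add_lei_received(msg: EmailMessage, source_url: str, when: datetime) -> None:
    """Append the proposed provenance header (hypothetical) to msg."""
    msg["X-Lei-Received"] = f"from {source_url}; {format_datetime(when)}"


if __name__ == "__main__":
    m = EmailMessage()
    ts = datetime(2021, 11, 19, 14, 32, 44, tzinfo=timezone(timedelta(hours=-5)))
    add_lei_received(m, "https://lore.kernel.org/all/", ts)
    print(m["X-Lei-Received"])
```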

Just a random thought.

-K

Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)

2021-11-08 Thread Konstantin Ryabitsev

On Mon, Nov 08, 2021 at 09:48:36PM +, Eric Wong wrote:
> > Hmm... I noticed that when I `lei edit-search` the initial query that was
> > causing quoting issues, I get the following:
> > 
> > [lei]
> > q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR 
> > dfn:include OR dfn:scripts) AND f:r...@kernel.org
> > 
> > So, the extra quotes didn't get added to the config file. Running `lei up` 
> > on
> > that saved search seems to do the right thing, so the erroneous quotes are
> > only added during the initial `lei q` call.
> 
> Right, each entry in lei.q is actually an entry in argv[].
> So the correct query should look something like:

So, to be clear: the following doesn't work because, instead of passing
multiple query arguments to 'lei q', the single-quoted string becomes a single
argument?

lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
--threads --dedupe=mid \
'(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:r...@kernel.org'

Is there any way to make this work? I find it more readable than the
"echo | lei q" version.

For bash users, the following should work as well:

lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches \
--threads --dedupe=mid <<< \
'(dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:r...@kernel.org'

A suggestion: could -I accept a URL containing the query, so that the command
becomes:

lei q -o ~/mail/foo --threads --dedupe=mid -I \
https://lore.kernel.org/all/?q=f%3Atorvalds+AND+nq%3Agarbage

That way we would pass both the location of the extindex to query AND the
query parameters to use, avoiding shell quoting problems.
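
A Python sketch of how such a URL round-trips, assuming the ?q= convention
shown in the example. This only illustrates the proposal; it is not something
lei q currently accepts, and both helper names are hypothetical. urlencode
uses quote_plus, so ':' becomes %3A and spaces become '+'.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_url(extindex_url: str, query: str) -> str:
    """Embed a search query in the ?q= parameter of an extindex URL."""
    return f"{extindex_url}?{urlencode({'q': query})}"


def split_query_url(url: str):
    """Inverse of query_url: return (base URL, decoded query)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, parse_qs(parts.query)["q"][0]
```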

-K

Re: lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)

2021-11-08 Thread Konstantin Ryabitsev

On Mon, Nov 08, 2021 at 08:49:23PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:
> > 
> > Moving this to meta.
> 
> I don't think workflows should've been dropped, though.
> 
> > > > lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> > > >   --threads --dedupe=mid \
> > > >   '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> > > >   OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> > > >   AND rt:1.month.ago..'
> > > 
> > > I tried a similar one which I had working as a bookmark:
> 
> That's actually treating the entire single-quoted section as
> a phrase search for Xapian.

Hmm... I noticed that when I `lei edit-search` the initial query that was
causing quoting issues, I get the following:

[lei]
q = (dfn:drivers OR dfn:arch OR dfn:Documentation OR dfn:include OR dfn:scripts) AND f:r...@kernel.org

So, the extra quotes didn't get added to the config file. Running `lei up` on
that saved search seems to do the right thing, so the erroneous quotes are
only added during the initial `lei q` call.

> The correct way to use '(', ')', and '*' on the command-line for
> Xapian is to shell escape them:

But putting them into single quotes should accomplish the same result, no? At
least, that's how I've always understood shell escaping.
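
Python's shlex.split follows POSIX shell word-splitting rules, so it shows
what argv each form actually produces. Both forms hide the parentheses from
the shell, but only single quotes also suppress word splitting: 'lei q'
receives one argument, which (per the reply above) Xapian then treats as a
phrase. The backslash form yields separate words.

```python
import shlex

# Single quotes: the whole query reaches the program as ONE argv element.
quoted = shlex.split("lei q '(dfn:drivers OR dfn:arch)'")

# Backslash-escaped parentheses: the shell still splits on whitespace,
# so the program receives several argv elements.
escaped = shlex.split(r"lei q \(dfn:drivers OR dfn:arch\)")
```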

-K

lei: incorrect quoting on saved searches (was Re: lore+lei: getting started)

2021-11-08 Thread Konstantin Ryabitsev

On Mon, Nov 08, 2021 at 01:49:07PM -0600, Rob Herring wrote:

Moving this to meta.

> > lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
> >   --threads --dedupe=mid \
> >   '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
> >   OR ((nq:bug OR nq:regression) AND nq:floppy)) \
> >   AND rt:1.month.ago..'
> 
> I tried a similar one which I had working as a bookmark:
> 
> $ lei q -I https://lore.kernel.org/all/ -o ~/Mail/my-patches
> --threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation
> OR dfn:include OR dfn:scripts) AND f:r...@kernel.org'
> # /home/rob/.local/share/lei/store 0/0
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m=1=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
> # 0 written to /home/rob/Mail/my-patches/ (0 matches)

It's true: I get the same result if I omit "AND rt:" at the end.

$ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches 
--threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR 
dfn:include OR dfn:scripts) AND f:r...@kernel.org'
# /home/user/.local/share/lei/store 0/0
# /usr/bin/curl -Sf -s -d '' 
https://lore.kernel.org/all/?x=m=1=(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22
# 0 written to /home/user/work/temp/lei/robh-patches/ (0 matches)
$ lei forget-search ~/work/temp/lei/robh-patches
$ lei q -I https://lore.kernel.org/all/ -o ~/work/temp/lei/robh-patches 
--threads --dedupe=mid '(dfn:drivers OR dfn:arch OR dfn:Documentation OR 
dfn:include OR dfn:scripts) AND f:r...@kernel.org AND rt:1.month.ago..'
# /usr/bin/curl -Sf -s -d '' 
https://lore.kernel.org/all/?x=m=1=(dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1633724105..
# /home/user/.local/share/lei/store 13/13
# https://lore.kernel.org/all/ 65/?
# https://lore.kernel.org/all/ 75/75
# 45 written to /home/user/work/temp/lei/robh-patches/ (88 matches)
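
In the working query above, lei has already expanded rt:1.month.ago.. into an
absolute Unix timestamp (rt:1633724105..) before sending it. A small Python
sketch (the helper name is hypothetical) decodes it to 2021-10-08 20:15:05
UTC, about one month before this message.

```python
from datetime import datetime, timezone


def rt_range_start(term: str) -> datetime:
    """Parse the start of an absolute 'rt:START..' term as a UTC datetime."""
    if not term.startswith("rt:"):
        raise ValueError(f"not an rt: term: {term!r}")
    start = term[len("rt:"):].split("..", 1)[0]
    return datetime.fromtimestamp(int(start), tz=timezone.utc)
```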

> It seems there is some problem in quoting. Notice the '%22' that's
> inserted in the url.

Deferring to Eric here.
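
Decoding the query portion of the failing curl URL (copied starting at the
"(") shows where the stray %22 characters land: everything after the first
"dfn:" prefix ends up inside one pair of double quotes, i.e. a single phrase.

```python
from urllib.parse import unquote_plus

# Query portion of the failing curl URL above, copied starting at "(".
encoded = (
    "(dfn%3A%22drivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation"
    "+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org%22"
)
decoded = unquote_plus(encoded)  # '+' -> space, %XX -> character
```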

> Also, the above query is a bit of a work-around as what I really want
> is just all patches from me. I haven't been able to get something to
> work. I've tried things like 'dfn:*' or 'dfn:/' or 'dfn:b/'.

I think 's:patch AND nq:diff' is a good option here.

-K

test: 'seen set from rename' potentially racy

2021-11-04 Thread Konstantin Ryabitsev

Hello:

I was having some trouble using "make check" on the Fedora build system:

#   Failed test 'seen set from rename'
#   at t/lei-watch.t line 61.
#   
'/tmp/pi-lei-watch-11080-FYz2/lei-daemon/md2/cur/9bf1002c49eb075df47247b74d69bcd555e23422=99:2,'
# doesn't match '(?^:S\z)'
# [
#   
'/tmp/pi-lei-watch-11080-FYz2/lei-daemon/md2/cur/9bf1002c49eb075df47247b74d69bcd555e23422=99:2,'
# ]

This wasn't happening in my own chroot builds, so I suspect that the faster
systems on the Fedora build infrastructure are exposing some kind of race
condition.
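
For context, the assertion expects the renamed file to end in "S", the
Maildir "seen" flag, but the name still ends in ":2," with no flags. A small
Python sketch of reading Maildir info flags (helper names are hypothetical):

```python
def maildir_flags(filename: str) -> str:
    """Return the flag letters after the ':2,' info suffix, or ''."""
    base = filename.rsplit("/", 1)[-1]
    _, sep, flags = base.rpartition(":2,")
    return flags if sep else ""


def is_seen(filename: str) -> bool:
    """True if the Maildir 'S' (seen) flag is set."""
    return "S" in maildir_flags(filename)
```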

You can see the entire log here:

https://download.copr.fedorainfracloud.org/results/icon/b4/fedora-35-x86_64/02928611-public-inbox/build.log.gz

Switching to "make test" seems to help, which is what I've done for now.

Cheers,
-K

Re: [ANNOUNCE] public-inbox 1.7.0

2021-11-04 Thread Konstantin Ryabitsev

On Thu, Nov 04, 2021 at 07:52:00AM +, Eric Wong wrote:
> Another big release focused on multi-inbox search and scalability.

Congratulations on the release, Eric! Happy to be one of the frontline users,
and I'm sure a lot more kernel devs will jump on it now that lei is becoming
more available to them.

-K

Initial Fedora packaging for lei

2021-11-02 Thread Konstantin Ryabitsev

Hi, all:

I did some initial work to package lei for Fedora 34 and 35 (out today). The
lei parts should be ready to use, though I'll continue to work on the server
parts (only needed if you're running httpd/nntpd/imapd daemons).

For now, you'll need to enable my copr repository to use it:

sudo dnf copr enable icon/b4
sudo dnf install lei

(you can also install python3-b4 from there if you're using b4)

I'm still working on some introductory docs, but you can use the following
resource to get yourself going with search-based lei goodness:

https://josefbacik.github.io/kernel/2021/10/18/lei-and-b4.html

-K



Re: Troubleshooting threads missing from /all/

2021-10-26 Thread Konstantin Ryabitsev

On Sun, Oct 24, 2021 at 12:03:17AM +, Eric Wong wrote:
> > I used the default flags for --reindex --all --fast, so it can perhaps be 
> > sped
> > up with larger memory use, but this is good enough for daily runs already.
> 
> Cool.  Just wondering if all is well on your end with daily runs.

Yes, so far nothing else has come up, and the runs are quietly succeeding.

> 
> --reindex --all --fast hasn't found any work to do the past week
> or so on https://yhbt.net/lore/
> 
> I'm thinking about cutting a new release soonish and just
> putting giant warnings around the not-yet-ready parts of
> lei for now...

I think this may be a good plan, considering that lei is now getting more and
more attention from kernel devs and it would be convenient to be able to
provide a packaged version of it to install on popular distros.

Thanks,
-K

Re: Troubleshooting threads missing from /all/

2021-10-18 Thread Konstantin Ryabitsev

On Mon, Oct 18, 2021 at 05:25:26AM +, Eric Wong wrote:
> > Btw, I'm chasing a separate bug in v2 which causes recycled
> > Message-IDs to go missing sometimes from a v2 over.sqlite3;
> > which then causes -extindex to lose a message...
> 
> I just pushed out commit 325fbe26c3e7731e
> (v2: mirrors don't clobber msgs w/ reused Message-IDs, 2021-10-18)
> 
> Now I'm reindexing all my v2 inboxes before running
> "-extindex --all --reindex --fast".  Fortunately, v2 inboxes
> are all "-L basic" so they're not too expensive to reindex.

Okay, I guess I should plan to do the same, then. I'll see if I can pair this
with switching over to "basic" indexing for individual inboxes.

-K

Re: Troubleshooting threads missing from /all/

2021-10-18 Thread Konstantin Ryabitsev

On Sat, Oct 16, 2021 at 09:43:24AM +, Eric Wong wrote:
> With --fast, --reindex takes around 20 minutes for me with
> "--batch-size=20m --no-fsync".  The first run may take longer
> if it has stuff to do.  But running it repeatedly should not
> cause it to complain about unseen/stale/mismatched messages
> (likely the first run will).

I just ran it across the 3 lore nodes and the results have been fairly
consistent:

- 16m for the initial run where it finds a few hundred things to fix
- 14m for the subsequent runs

I used the default flags for --reindex --all --fast, so it can perhaps be sped
up with larger memory use, but this is good enough for daily runs already.

-K

Re: Troubleshooting threads missing from /all/

2021-10-17 Thread Konstantin Ryabitsev

On Sat, Oct 16, 2021 at 09:43:24AM +, Eric Wong wrote:
> Eric Wong  wrote:
> > Yes.  Though given the current situation with missing messages
> > from /all/, I'd wait until a reindex recovers the missing
> > messages (and probably a fast fsck checker).
> 
> I think "public-inbox-extindex --reindex --all --fast" is
> reasonably ready as an fsck checker.  I've been running it a
> bunch in recent days/weeks and also found+fixed some other bugs
> along the way.

Thanks, Eric! I've been out this week for some family time (it was
Thanksgiving in Canada), which is why I've been conspicuously silent. :)
I'll give --reindex --fast a whirl in the next few days.

> With --fast, --reindex takes around 20 minutes for me with
> "--batch-size=20m --no-fsync".  The first run may take longer
> if it has stuff to do.  But running it repeatedly should not
> cause it to complain about unseen/stale/mismatched messages
> (likely the first run will).
> 
> So it's not /really/ fast, but compared to ~35 hours w/o --fast,
> then it's alright.   Either way, --reindex should be safe
> with parallel -index and -extindex being run by cronjobs.

Absolutely, thanks for working on this.

-K

Re: Troubleshooting threads missing from /all/

2021-10-08 Thread Konstantin Ryabitsev

On Thu, Oct 07, 2021 at 09:33:07PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > [publicinbox "regressions"]
> >   address = regressi...@lists.linux.dev
> >   url = regressions
> >   inboxdir = /srv/public-inbox/lore.kernel.org/regressions
> >   indexlevel = full
> 
> Btw, "indexlevel = basic" ought to be sufficient if an inbox
> is in extindex once bugs are ironed out.  full/medium is
> of course helpful if messages are missing from extindex,
> though...

That would certainly save tons of space. How does that work: would the search
box still be available on a per-list basis? Does it just use extindex and
additionally filter by list-id or some similar parameter?

To switch from the current setup, where every list has its own full index, is
it sufficient to just set indexlevel = basic and delete the Xapian DB?

Thanks,
-K

Re: Troubleshooting threads missing from /all/

2021-10-07 Thread Konstantin Ryabitsev

On Thu, Oct 07, 2021 at 08:36:52AM +, Eric Wong wrote:
> Also, did you capture any error messages to stderr?
> I suppose you would've told us if you did.

Yeah, I looked through every place that would have logged an error and didn't
really see anything. I expect this would have happened during an extindex run,
but I didn't see any non-zero exits when I looked through the logs.

Regarding reindex: is that something that would make sense to do occasionally,
simply for potential improvements, similar to how we periodically repack repos
with -f for better packs? Or would that be pointless churn in the context of
Xapian?

-K

Re: Troubleshooting threads missing from /all/

2021-10-05 Thread Konstantin Ryabitsev

On Tue, Oct 05, 2021 at 04:39:54AM +, Eric Wong wrote:
> Eric Wong  wrote:
> > b) just reindex in place (it /should/ work...)
> 
> I reindexing live on yhbt/lore and it didn't break...
> 
> Btw, did you see my other questions about whether or not boost
> was in use?

Yes, but I was attending the Linux Security Summit last week, so my attention
was all over the place. We do use boost values there. Looking at that
particular message, it was sent to regressions and linux-wireless, which have
different boost values:

[publicinbox "regressions"]
  address = regressi...@lists.linux.dev
  url = regressions
  inboxdir = /srv/public-inbox/lore.kernel.org/regressions
  indexlevel = full
  newsgroup = dev.linux.lists.regressions
  boost = 11
  listid = regressions.lists.linux.dev

[publicinbox "linux-wireless"]
  address = linux-wirel...@vger.kernel.org
  url = linux-wireless
  inboxdir = /srv/public-inbox/lore.kernel.org/linux-wireless
  indexlevel = full
  newsgroup = org.kernel.vger.linux-wireless
  boost = 10
  listid = linux-wireless.vger.kernel.org

We give lists.linux.dev a higher boost value because we populate it straight
from watched mlmmj archive dirs, so it's more likely to have the "truest" headers.
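
A toy Python sketch of the preference described here, assuming (as this
message implies) that a higher boost wins when a message exists in several
inboxes, and assuming ties fall back to config-file order. This is an
illustration of the idea, not public-inbox's actual selection code.

```python
def preferred_inbox(candidates):
    """candidates: list of (inbox_name, boost) pairs in config-file order.

    max() returns the first maximal element, so equal boosts keep
    config-file order (an assumption, see above).
    """
    return max(candidates, key=lambda c: c[1])[0]
```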

-K

Re: Troubleshooting threads missing from /all/

2021-10-01 Thread Konstantin Ryabitsev

On Fri, Oct 01, 2021 at 10:25:09PM +, Eric Wong wrote:
>   export HOME=/tmp/trash # fresh lei/store instance
>   M=87czop5j33@tynnyri.adurom.net
>   lei import https://yhbt.net/lore/all/$M/t.mbox.gz
>   lei q z:0.. | wc -l # should have all (11) msgs
>   lei q m:$M -t | wc -l # should have the same msgs (11)

Yes, both are reporting 11.

> Can you confirm the above gives all 11 msgs for you?
> 
> I am running an -extindex --reindex on lore/all, though;
> hopefully it doesn't break anything.

I'll also be happy to provide the extindex, though that's a bit on the largish
side at almost 225G. :) Just let me know and I'll see if I can set up a
temporary VM with access for you.

-K

Re: Troubleshooting threads missing from /all/

2021-10-01 Thread Konstantin Ryabitsev

On Fri, Oct 01, 2021 at 08:54:42PM +, Eric Wong wrote:
> Oops, inspect doesn't work well w/o initialization (it should).
> Running "lei init" first should workaround it, for now.

Yes, that fixes the problem, but it still doesn't return much:

{
   "mid" : "87czop5j33@tynnyri.adurom.net"
}

-K

Re: Troubleshooting threads missing from /all/

2021-10-01 Thread Konstantin Ryabitsev

On Fri, Oct 01, 2021 at 08:41:31PM +, Eric Wong wrote:
> Curious, what is the output of:
> 
>lei inspect --dir /path/to/all mid:87czop5j33@tynnyri.adurom.net
> 
> for you?

Not much. :)

lei inspect --dir /srv/public-inbox/extindex mid:87czop5j33@tynnyri.adurom.net
48242 lei-inspect worker wq_worker: Can't call method "can" on an undefined value at /usr/local/share/perl5/PublicInbox/LeiInspect.pm line 156.

I seem to get the same error for a Message-ID that *is* present in /all/.

The tree is at af774d3bb0d728f2f37c418b8c3e215f1d4d860f

-K

Re: Troubleshooting threads missing from /all/

2021-10-01 Thread Konstantin Ryabitsev

On Fri, Oct 01, 2021 at 09:05:27AM -0400, Konstantin Ryabitsev wrote:
> I was told about the following problem today:
> 
> The following thread:
> https://lore.kernel.org/regressions/87czop5j33@tynnyri.adurom.net/
> 
> Doesn't appear to show up in /all/:
> https://lore.kernel.org/all/87czop5j33@tynnyri.adurom.net/

One thing I noticed is that this returns a 300, not a 404:

  HTTP/1.1 300 Multiple Choices

However, this seems to happen for all unknown Message-IDs (e.g.
lore.kernel.org/all/bogus@bogus), so it's probably entirely unrelated.

-K

Re: Troubleshooting threads missing from /all/

2021-10-01 Thread Konstantin Ryabitsev

On Fri, Oct 01, 2021 at 07:58:11PM +, Eric Wong wrote:
> Are you running "public-inbox-extindex --all"?
> Or relying on "public-inbox-index -E ..."? (which is automatic
> for /all/?).

After each repository update, we run:

public-inbox-index --no-update-extindex

And at the end of each grokmirror run, we perform a single:

public-inbox-extindex --all

I believe this is what you suggested a while back to minimize disk thrashing.
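
The ordering above (one cheap per-inbox index after each update, then a single
extindex pass per mirror run) can be sketched in Python as a dry-run command
builder. The inbox paths are hypothetical, and passing the inbox directory as
an argument to public-inbox-index is an assumption about how each update is
invoked.

```python
import subprocess


def index_commands(updated_inboxes):
    """Build argv lists for one mirror run without executing them."""
    cmds = [
        ["public-inbox-index", "--no-update-extindex", inbox_dir]
        for inbox_dir in updated_inboxes
    ]
    # A single extindex pass at the end instead of one per inbox.
    cmds.append(["public-inbox-extindex", "--all"])
    return cmds


def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for argv in cmds:
        subprocess.run(argv, check=True)
```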

-K

Re: [PATCH 0/3] fixes for odd/old/missing dependencies

2021-09-27 Thread Konstantin Ryabitsev

On Mon, Sep 27, 2021 at 04:05:42PM -0500, Eric Wong wrote:
> On Mon, 27 Sep 2021 15:45:20 -0400, Konstantin Ryabitsev wrote:
> > On Mon, Sep 27, 2021 at 07:33:46PM +, Eric Wong wrote:
> > > The partial fetching would need some work to support working as
> > > root.
> > 
> > Ah. Just catch this with check if if id=0 and skipping the test as "known 
> > not
> > to work for this use-case."
> 
> It wasn't much to add support for root, actually.  The rest of
> the stuff should be fixed, too.

Yep, I can confirm that after applying these in addition to the one I applied
earlier, all tests pass.

Thanks, Eric!

-K

Re: -fetch failures [was: latest make test failures on CentOS-7]

2021-09-27 Thread Konstantin Ryabitsev

On Mon, Sep 27, 2021 at 07:33:46PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > t/v2mirror.t . 71/? W: 
> > /tmp/pi-v2mirror-39373-Dl1N/m/git/3.git missing remote.origin.url
> > fatal: not a git repository: '/tmp/pi-v2mirror-39373-Dl1N/m/git/3.git'
> > git --git-dir=/tmp/pi-v2mirror-39373-Dl1N/m/git/3.git fetch -q failed
> > Bailout called.  Further testing stopped:  -fetch failed
> > FAILED--Further testing stopped: -fetch failed
> > make: *** [test_dynamic] Error 255
> > 
> > FYI, this is git 2.31.1.
> 
> I'm not seeing this at all with 2.31.1, 2.20, 2.33...
> I wonder if there's a permissions problem or some latent GIT_*
> var in env...
> 
> Are you testing as root?  I think that would be broken, yes.

Yes, this is testing as root, largely because automatic deployment makes it
hard to do the checkout/make/make test as an unprivileged user. I appreciate
that in most other scenarios the final step would be "sudo make install", but
when things are installed via configuration management, the process usually
already runs as root, and su-ing to a user for "make test" adds more
complication to the process.

> The partial fetching would need some work to support working as
> root.

Ah. Then just check whether id=0 and skip the test as "known not to work for
this use-case."
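
The public-inbox test suite is Perl; as a rough Python/unittest analogue of
the suggestion, a test can be skipped when the effective UID is 0. The test
class and method names are hypothetical placeholders.

```python
import os
import unittest

RUNNING_AS_ROOT = hasattr(os, "geteuid") and os.geteuid() == 0


class PartialFetchTest(unittest.TestCase):
    @unittest.skipIf(RUNNING_AS_ROOT, "known not to work for this use-case (root)")
    def test_partial_fetch(self):
        # Placeholder for the real partial-fetch checks.
        self.assertTrue(True)
```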

-K

Re: [PATCH] t/cmd_ipc: allow extra errors and add diagnostics

2021-09-27 Thread Konstantin Ryabitsev

On Mon, Sep 27, 2021 at 01:35:36PM -0500, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > t/cmd_ipc.t .. 29/?
> > #   Failed test 'got EMSGSIZE'
> > #   at t/cmd_ipc.t line 108.
> > # Looks like you failed 1 test of 42.
> > t/cmd_ipc.t .. Dubious, test returned 1 (wstat 256, 
> > 0x100)
> > Failed 1/42 subtests
> > (less 13 skipped subtests: 28 okay)
> 
> I think this is either caused by too much RAM and/or
> /proc/sys/net/core/wmem_* being larger-than-expected
> (both default to 212992 for me).
> 
> This fixes failures when I set both wmem_max and wmem_default
> to 2129920 (10x its default value).

Confirmed:

t/cmd_ipc.t .. ok

Thanks,
-K

latest make test failures on CentOS-7

2021-09-27 Thread Konstantin Ryabitsev

Hello:

I wanted to try the searchable /all/ from www_index, but it looks like I'm
unable to get a clean make test. Below are a few failures that I can see:

t/cmd_ipc.t .. 29/?
#   Failed test 'got EMSGSIZE'
#   at t/cmd_ipc.t line 108.
# Looks like you failed 1 test of 42.
t/cmd_ipc.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/42 subtests
(less 13 skipped subtests: 28 okay)

[...]

t/lei-index.t  11/?
#   Failed test 'lei index imap://$HOST_PORT/t.v2.0'
#   at /usr/local/share/public-inbox/blib/lib/PublicInbox/TestCommon.pm 
line 519.
# $?=6400 err=E: eval-ed lei: Mail::IMAPClient is required for IMAP:
# Can't locate Mail/IMAPClient.pm in @INC (@INC contains: 
/root/.cache/public-inbox/inline-c/lib /root/.cache/public-inbox/inline-c/lib 
/usr/local/share/public-inbox/blib/lib /usr/local/share/public-inbox/blib/arch 
/usr/local/lib64/perl5 /usr/local/share/perl5/x86_64-linux-thread-multi 
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . 
/usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
/usr/share/perl5/vendor_perl/parent.pm line 20.
# BEGIN failed--compilation aborted at 
/usr/local/share/public-inbox/blib/lib/PublicInbox/IMAPClient.pm line 13.
# Compilation failed in require at 
/usr/local/share/public-inbox/blib/lib/PublicInbox/NetReader.pm line 433.
#
#
# Looks like you failed 1 test of 36.
t/lei-index.t  Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/36 subtests

(Expected, since I don't have Mail::IMAPClient, but it should probably fail differently.)

t/lei-tag.t .. 23/?
#   Failed test 'lei _complete lei tag'
#   at /usr/local/share/public-inbox/blib/lib/PublicInbox/TestCommon.pm 
line 519.
# $?=6400 err=E: eval-ed lei: Modification of non-creatable array value 
attempted, subscript -1 at 
/usr/local/share/public-inbox/blib/lib/PublicInbox/LeiImport.pm line 125.
#

#   Failed test 'completed with labels'
#   at t/lei-tag.t line 73.
# Looks like you failed 2 tests of 65.
t/lei-tag.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/65 subtests
t/lei-up.t ... ok
t/lei-watch.t  ok
t/lei.t .. 43/?
#   Failed test 'lei _complete lei import'
#   at /usr/local/share/public-inbox/blib/lib/PublicInbox/TestCommon.pm 
line 519.
# $?=6400 err=E: eval-ed lei: Modification of non-creatable array value 
attempted, subscript -1 at 
/usr/local/share/public-inbox/blib/lib/PublicInbox/LeiImport.pm line 125.
#
# Looks like you failed 1 test of 151.
t/lei.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/151 subtests

[...]

t/v2mirror.t . 71/? W: 
/tmp/pi-v2mirror-39373-Dl1N/m/git/3.git missing remote.origin.url
fatal: not a git repository: '/tmp/pi-v2mirror-39373-Dl1N/m/git/3.git'
git --git-dir=/tmp/pi-v2mirror-39373-Dl1N/m/git/3.git fetch -q failed
Bailout called.  Further testing stopped:  -fetch failed
FAILED--Further testing stopped: -fetch failed
make: *** [test_dynamic] Error 255

FYI, this is git 2.31.1.

I'll be happy to help troubleshoot things as necessary.

-K

Re: Holding on to deleted packfiles

2021-09-21 Thread Konstantin Ryabitsev

On Tue, Sep 21, 2021 at 07:06:53PM +, Eric Wong wrote:
> Was this from /all/ (ALL.git using batch-file) or Gcf2?

I believe this was from Gcf2, though I can't go back and check,
unfortunately.

> The old stuff has timers to do periodic cleanup, but the new
> stuff is trickier as the cost of a restart is higher...
> 
> It should be alright to wire up the old timers to ALL.git with
> (hundreds) of inboxes lore currently has.  git 2.33+ should be
> better when we get into the thousands; but it's still not
> great.

Well, this may not be public-inbox's responsibility either; other long-running
daemons don't perform such checks, for example. We can just issue a reload
after we've finished repacking.

I was just wondering whether you had already done something to recognize that
old pack files have gone away.

-K

Holding on to deleted packfiles

2021-09-21 Thread Konstantin Ryabitsev

Hello:

A large git repack job that ran over the weekend revealed a minor problem:
public-inbox daemon processes hold on to deleted pack files until they are
restarted. Is there any way to gracefully recognize and handle this condition?
It's not entirely benign, as it ended up keeping 40GB+ worth of inodes from
being released.
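
One way to spot this condition from outside the daemon is sketched below in
Python (Linux-only; the helper name is hypothetical). On Linux, a
/proc/<pid>/fd entry that points at an unlinked file reads as
"<path> (deleted)"; `lsof +L1` reports similar information.

```python
import os


def deleted_open_files(pid: int, suffix: str = ".pack") -> list:
    """List unlinked files (ending in suffix) that pid still holds open."""
    fd_dir = f"/proc/{pid}/fd"
    marker = " (deleted)"
    found = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were scanning
        if target.endswith(marker):
            path = target[: -len(marker)]
            if path.endswith(suffix):
                found.append(path)
    return found
```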

-K

Re: make menuconfig interface for lei / grok-pull

2021-09-16 Thread Konstantin Ryabitsev

On Wed, Sep 15, 2021 at 11:06:05PM +, Eric Wong wrote:
> Does lore.kernel.org run public-inbox-imapd?

I'm still not convinced it's useful for huge collections, especially
considering how chatty IMAP is. Is there any benefit to enabling it for lei use?

-K

Re: make menuconfig interface for lei / grok-pull

2021-09-16 Thread Konstantin Ryabitsev

On Wed, Sep 15, 2021 at 02:34:40PM -0700, Luis Chamberlain wrote:
> My use case is I'm subscribed to a few kernel mailing lists and I use
> mutt with Maildir. I had configured recently pi-piper and grokmirror
> so that I get only the last 1 year of email from a few set of mailing
> lists. For this I needed to know the commit IDs for those emails on
> the public-inbox git mirror for each mailing list.

The pi-piper bit was really only useful for this until lei showed up. I wrote
it mostly so we could pipe things to patchwork straight from public-inbox git
archives.

> I was hinted using lei would be better though. But I'm stuck:

FYI, I'm giving a talk about that on Monday.
https://linuxplumbersconf.org/event/11/contributions/983/

Assuming I finish the prep work by then.

Hopefully, you don't live on the US West Coast and don't have to wake up at
7AM to attend.

-K

Re: mbox support in other software

2021-09-16 Thread Konstantin Ryabitsev

On Thu, Sep 16, 2021 at 07:34:37AM +, Eric Wong wrote:
> Since I've written the lei-mail-formats manpage, I've been
> curious what other software differentiates between the various
> mbox formats and supports several/all of them?
> 
> AFAIK, none of the Perl Mail::* stuff does, nor does Email::Folder
> (AFAIK abandoned).  I haven't looked at libraries for Python nor
> other languages...

Python only does mboxo. I've switched to using git-mailsplit in b4 because of
various quirks of interaction between mboxrd and mboxo.

> git supports mboxo and mboxrd, nowadays; but it seems like most
> other software only know how to deal with one of the mbox family
> (and mixing software on the same mbox leads to bad things).

I'm not 100% sure git does the right thing with mboxo; see my tirade here:
https://git.kernel.org/pub/scm/utils/b4/b4.git/commit/?id=4950093c0c3ee71e7045b545626d2b232271cbc8=2

-K
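The mboxo/mboxrd difference comes down to how body lines beginning with "From " are quoted. Below is a minimal sketch of the two rules as they are commonly described. The helpers operate on a single message body, not on the "From " separator lines between messages, and they are illustrative only, not b4's or git's code. The sketch shows why mboxo is lossy: a body line that already read ">From " comes back as "From " after a round trip, while mboxrd restores it exactly.

```python
import re


# mboxo: on write, only lines starting exactly with "From " gain a ">";
# on read, one ">" is stripped from lines starting with ">From ".
def mboxo_escape(body):
    return re.sub(r"(?m)^From ", ">From ", body)


def mboxo_unescape(body):
    return re.sub(r"(?m)^>From ", "From ", body)


# mboxrd: on write, any line matching ^>*From  gains one ">";
# on read, any line matching ^>+From  loses one ">".
def mboxrd_escape(body):
    return re.sub(r"(?m)^(>*From )", r">\1", body)


def mboxrd_unescape(body):
    return re.sub(r"(?m)^>(>*From )", r"\1", body)
```

With `body = "From here\n>From there\n"`, mboxo turns both lines into ">From ..." on write, so a reader cannot tell which one was quoted originally; mboxrd writes ">From here" and ">>From there", which unescape back unambiguously.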

Re: [PATCH] uri_imap: fix ->uidvalidity and ->uid w/ `/' separator

2021-09-15 Thread Konstantin Ryabitsev

On Wed, Sep 15, 2021 at 05:43:41PM +, Eric Wong wrote:
> > The error is gone, but the saved search still doesn't show up when I run
> > "lei ls-search". Looking in .local/share/lei/saved-searches, I see that they
> > still get created as lore/foldername-${checksum}, which is probably why
> > ls-search doesn't find them?
> 
> Ah, it's because the "foldername" part was incorrectly parsed
> and it should be .local/share/lei/saved-searches/foldername-$checksum
> without the "lore/" parent.
> 
> Once patched with the below, you should be able to move the
> folder up a level and remove the now-empty "lore" dir inside
> saved-searches.
> 
> Anyways, I think this fixes it:

Yep, confirmed. And lei up --all now picks up remote IMAP folders.

Thanks!

-K

Re: [PATCH] uri_imap: fix ->uidvalidity and ->uid w/ `/' separator

2021-09-15 Thread Konstantin Ryabitsev

On Tue, Sep 14, 2021 at 10:10:36PM +, Eric Wong wrote:
> Konstantin Ryabitsev  wrote:
> > 2021-09-14T20:59:12Z 20428 20428 die: BUG: imaps://imap.migadu.com/lore/b4;UIDVALIDITY=1621977334 has no UIDVALIDITY at /usr/local/share/perl/5.32.1/PublicInbox/LeiStore.pm line 313.
> >  (from nowait set_sync_info)
> 
> Same bug, different place :x
> 8<--
> Subject: [PATCH] uri_imap: fix ->uidvalidity and ->uid w/ `/' separator
> 
> Again, we were failing to account for '/' use in mailbox names :x

The error is gone, but the saved search still doesn't show up when I run
"lei ls-search". Looking in .local/share/lei/saved-searches, I see that they
still get created as lore/foldername-${checksum}, which is probably why
ls-search doesn't find them?

-K
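The underlying bug class is treating only the last '/'-separated path component as the mailbox name. Below is a minimal sketch of a parser that keeps the whole hierarchy. It assumes the RFC 5092 layout `imap://host/mailbox;UIDVALIDITY=n/;UID=m`. `parse_imap_url` is a hypothetical helper, not lei's URI parsing code, and it ignores parts of the RFC grammar such as search queries and modified UTF-7 mailbox names.

```python
import re
from urllib.parse import unquote, urlsplit

# The mailbox is everything up to the first ";" and may itself contain "/".
_PATH_RE = re.compile(
    r"^/(?P<mailbox>[^;]+?)"
    r"(?:;UIDVALIDITY=(?P<uidvalidity>\d+))?"
    r"(?:/;UID=(?P<uid>\d+))?$",
    re.IGNORECASE,
)


def parse_imap_url(url):
    """Split an imap(s):// URL into host, mailbox, UIDVALIDITY and UID."""
    parts = urlsplit(url)  # urlsplit leaves ";..." inside parts.path
    if parts.scheme not in ("imap", "imaps"):
        raise ValueError(f"not an IMAP URL: {url}")
    m = _PATH_RE.match(parts.path)
    if m is None:
        raise ValueError(f"unrecognized IMAP path: {parts.path}")
    uidvalidity = m.group("uidvalidity")
    uid = m.group("uid")
    return {
        "host": parts.hostname,
        "mailbox": unquote(m.group("mailbox")),
        "uidvalidity": int(uidvalidity) if uidvalidity else None,
        "uid": int(uid) if uid else None,
    }
```

With this approach, `imaps://imap.migadu.com/lore/b4;UIDVALIDITY=1621977334` yields the mailbox `lore/b4` rather than `lore`, and the UIDVALIDITY is still found.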

Re: [PATCH] uri_imap: handle '/' as an IMAP hierarchy separator

2021-09-14 Thread Konstantin Ryabitsev

On Tue, Sep 14, 2021 at 03:55:10PM -0400, Konstantin Ryabitsev wrote:
> On Tue, Sep 14, 2021 at 07:35:28PM +, Eric Wong wrote:
> > > I found an interesting problem using lei with imaps:// folders. I'm trying
> > > things out with migadu, and the folder paths use '/' separators, so a full
> > > IMAPS folder path for a folder "lore/mentions" is
> > > imaps://imap.migadu.com/lore/mentions. However, if I configure lei-q to use
> > > that remote path, everything actually ends up in the folder
> > > imap.migadu.com/lore (not the "mentions" subfolder).
> > 
> > Oops, I think the patch below should fix it.
> 
> Yep, that worked. Thanks!

I think I found a couple of other bugs while testing this with migadu. E.g.:

$ export MFOLDER=imaps://imap.migadu.com/lore/b4
$ lei q -o $MFOLDER -I https://lore.kernel.org/all/ '(s:b4 OR nq:b4 OR dfn:b4) AND rt:1.week.ago..'
# /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?q=(s%3Ab4+OR+nq%3Ab4+OR+dfn%3Ab4)+AND+rt%3A1631066349..=m
# /home/user/.local/share/lei/store 54/54

So far so good, but then:

2021-09-14T20:59:12Z 20428 20428 die: BUG: imaps://imap.migadu.com/lore/b4;UIDVALIDITY=1621977334 has no UIDVALIDITY at /usr/local/share/perl/5.32.1/PublicInbox/LeiStore.pm line 313.
 (from nowait set_sync_info)
# https://lore.kernel.org/all/ 19/?
# https://lore.kernel.org/all/ 25/?
# https://lore.kernel.org/all/ 51/?
# https://lore.kernel.org/all/ 54/54
# 54 written to imaps://imap.migadu.com/lore/b4 (108 matches)

However, it doesn't show up in ls-search:

$ lei ls-search
/home/user/work/temp/lei/lockdown
/home/user/work/temp/lei/mentions

That would appear to be due to them being saved in the lore/ subdir:

$ find .local/share/lei/saved-searches/ -type d
.local/share/lei/saved-searches/
.local/share/lei/saved-searches/lockdown-1804cfad691a409f55598a8528566d5f1539b2632e1db7e206cb147396582631
.local/share/lei/saved-searches/mentions-f467d0a01dfdc3e42523b5d0d090773269e199a6a109b0713dc48142f0e30526
.local/share/lei/saved-searches/lore
.local/share/lei/saved-searches/lore/mentions-e9ca065affe84b4e4637620c72b64b09970a02b83171ba75c86afff95489d392
.local/share/lei/saved-searches/lore/b4-4811ca1722c2c2817e8cdc6a8d390f63a3b723c3c991f0267425d380aa1c8add

Cheers,
-K

RFC: lei-daemon and auto-up

2021-09-14 Thread Konstantin Ryabitsev

Hello:

Since lei-daemon is already up and running, would it be possible to tell it to
automatically "lei up" things at certain intervals?

Maybe something like:

[lei]
  q = [...]
[lei "q"]
  output = [...]
  include = https://lore.kernel.org/all/
  external = 1
  local = 1
  remote = 1
  refresh = 300

That would allow folks to automatically get updated info without needing to
set up cronjobs or systemd timers.

-K
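The `refresh` key above is a proposal in this message, not an existing lei option. Until something like it exists, a single long-running loop can stand in for a cron job or timer. Below is a minimal sketch, assuming `lei up --all` is the desired refresh command and reusing the 300-second value from the example config. `refresh_loop` and `run_lei_up` are hypothetical helpers.

```python
import subprocess
import time


def run_lei_up(cmd=("lei", "up", "--all")):
    """Run one refresh and return its exit status instead of raising."""
    return subprocess.run(list(cmd), check=False).returncode


def refresh_loop(interval=300, iterations=None, run=run_lei_up, sleep=time.sleep):
    """Call run() every `interval` seconds.

    With iterations=None this loops forever, like a timer would; a finite
    count is handy for testing. Returns the number of non-zero exit statuses.
    """
    failures = 0
    done = 0
    while iterations is None or done < iterations:
        if run() != 0:
            failures += 1
        done += 1
        # Skip the final sleep once the requested number of runs is done.
        if iterations is None or done < iterations:
            sleep(interval)
    return failures
```

Injecting `run` and `sleep` keeps the loop testable without spawning lei or actually waiting.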

Re: [PATCH] uri_imap: handle '/' as an IMAP hierarchy separator

2021-09-14 Thread Konstantin Ryabitsev

On Tue, Sep 14, 2021 at 08:12:16PM +, Eric Wong wrote:
> Ah, I forgot to update the docs again :x
> 
> My main concern with .netrc was actually inadvertently sending
> FTP auth info to an IMAP server just because they share the same
> host.

No big deal -- folks can always just use the "store" credential helper to
pretty much the same effect.

> Not sure if plaintext is a real problem on encrypted block
> devices/filesystems.  Ordinary users can't mlock(2) to prevent
> in-memory passwords from hitting swap (thus I always use
> encrypted swap).

Right, plus most of them probably have their .gitconfig with
sendemail.smtppass configured anyway. :)

-K

Re: [PATCH] uri_imap: handle '/' as an IMAP hierarchy separator

2021-09-14 Thread Konstantin Ryabitsev

On Tue, Sep 14, 2021 at 07:35:28PM +, Eric Wong wrote:
> > I found an interesting problem using lei with imaps:// folders. I'm trying
> > things out with migadu, and the folder paths use '/' separators, so a full
> > IMAPS folder path for a folder "lore/mentions" is
> > imaps://imap.migadu.com/lore/mentions. However, if I configure lei-q to use
> > that remote path, everything actually ends up in the folder
> > imap.migadu.com/lore (not the "mentions" subfolder).
> 
> Oops, I think the patch below should fix it.

Yep, that worked. Thanks!

> Btw, if you encounter more IMAP problems, I've found adding
> "-c imap.debug -c imap.compress=0" to the command-line useful
> (Mail::IMAPClient dumps the raw compressed traffic, so I need to
> disable compression).

Good to know, thanks. Quick follow-up -- documentation says that .netrc should
work, but I've found that even though I have the following entries in
~/.netrc, I still get prompted for credentials:

machine imap.migadu.com
  login konstantin.ryabit...@linux.dev
  password [...]

The credential helper works after the initial "lei up", but I'm curious why
.netrc isn't happy. Not a huge deal, seeing as that requires storing passwords
in plaintext.

-K
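For reference, a .netrc lookup is keyed purely on the host name, which is the root of the concern quoted above about sending FTP credentials to an IMAP server on the same host. Below is a minimal sketch using Python's standard `netrc` module. `imap_credentials` is a hypothetical helper and says nothing about how lei itself resolves credentials; the login and password in the usage note are placeholders.

```python
import netrc


def imap_credentials(host, netrc_path=None):
    """Return (login, password) for `host` from a .netrc file, or None.

    netrc_path=None means ~/.netrc. Entries are keyed only by machine name,
    so the same entry is returned whether the caller speaks IMAP, FTP, or
    anything else to that host.
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

For a file containing `machine imap.example.org login user@example.org password secret`, `imap_credentials("imap.example.org", path)` returns `("user@example.org", "secret")`.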
