Konstantin Ryabitsev <konstan...@linuxfoundation.org> wrote:
> Hi, all:
> 
> Here's something I've been mulling over for a while, and sorta goes
> hand-in-hand with what Eric has been doing with diff highlights work.
> We have significant duplication of functionality on lore.kernel.org
> between the public-inbox repository for LKML and the patchwork
> instance for the same list (https://lore.kernel.org/patchwork). I am
> currently working on a tool that would move a lot of patchwork's
> functionality to the developers' workstation by using the public-inbox
> archive of the mailing list that receives patches. After it is done,
> the tool should be able to do all of the following:
> 
> - Keep track of patches and patch series, including series revisions
> - Allow developers to easily apply patch series directly from the
> public-inbox repository (by creating a mbox file of the series behind
> the scenes with adjusted git by-lines -- e.g. if someone replies to a
> patch with an "Acked-by: Foo Dev" or "Tested-by: Foo Bot", the original
> patch trailers are adjusted to reflect that new data)
> - Automatically recognise when patches have been applied to a repo
> and auto-"accept" them
> - Allow developers to see changes between series using interdiff

Cool!  I haven't looked into patchwork much, but it looks like
a centralized system that can't easily be replicated on every
developer's machine?

And your tool could be seen as a local/replicable version of
patchwork?

> Currently, the tool sticks various patch tracking information into a
> custom sqlite3 db, but I'm increasingly wondering if it makes more
> sense to have this machine-parseable data available as part of the
> repository itself -- say, as a git note on the commit-id of the
> message. In other words, if the patch is in a message in abcd1234:m, a
> refs/notes/patches entry for abcd1234 would have json-formatted data
> with patch tracking information similar to what patchwork has for it
> (see, for example,
> https://lore.kernel.org/patchwork/api/1.1/patches/1035986/), but
> without patchwork-specific bits and duplicate info like headers and
> patch contents. If we add these notes on lore.kernel.org itself, then
> we save a lot of redundant data processing on the client end. A
> developer who has mirror-cloned a public-inbox archive of a
> development mailing list would be able to start applying patches and
> series without having to parse tens of thousands of messages.

How much of this data is reproducible/recreatable using existing
mails + git repos?  If all of it is reproducible, I don't see
storing extra/redundant data in git as necessary...
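
For reference, the notes-based layout proposed above could look roughly
like the sketch below.  It only builds the JSON body and the "git notes"
command line; it does not run git.  The field names (state,
series_revision, trailers) and the helper names are hypothetical, and
only the refs/notes/patches ref comes from the proposal itself.

```python
import json

# Ref name taken from the proposal quoted above.
NOTES_REF = "refs/notes/patches"


def build_note(state, series_revision, trailers):
    """Serialize patch tracking data as JSON.

    The keys are illustrative assumptions, not an agreed-upon schema.
    """
    return json.dumps(
        {
            "state": state,
            "series_revision": series_revision,
            "trailers": list(trailers),
        },
        sort_keys=True,
    )


def notes_add_argv(commit, note_body):
    """Return the argv that would attach note_body to commit.

    -f overwrites an existing note, so re-running with newer tracking
    data replaces the old entry.
    """
    return [
        "git", "notes", "--ref=" + NOTES_REF,
        "add", "-f", "-m", note_body, commit,
    ]
```

Passing the result of notes_add_argv("abcd1234", build_note(...)) to
subprocess.run() inside a clone of the inbox would attach the note.
Everything stored this way is derived from the mails themselves, which
is the redundancy in question.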

Especially since SQLite supports online backup.  The overhead for
parsing a lot of mails could be alleviated by offering DB snapshots
for download and seeding of developer machines.  Incremental updates
would be similar to public-inbox: git fetch && $FOO-index

(To that end, public-inbox could offer a way to download the msgmap and
 overview DBs, too; but offering Xapian DB downloads would be tough,
 as it doesn't offer online backups/snapshots).
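
A minimal sketch of the snapshot idea, using the online backup API that
Python's sqlite3 module exposes (Connection.backup, Python 3.7+).  The
"patches" table and the file name are made up for the demo; the real
tool's schema isn't known here.

```python
import os
import sqlite3
import tempfile


def snapshot(live, dest_path):
    """Copy a live SQLite database to dest_path via the online backup API.

    The source connection stays usable while the copy is made, so a
    server could publish snapshots without stopping its indexer.
    """
    dest = sqlite3.connect(dest_path)
    try:
        live.backup(dest)
    finally:
        dest.close()


if __name__ == "__main__":
    # Hypothetical tracking table, for illustration only.
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE patches (msgid TEXT PRIMARY KEY, state TEXT)")
    live.execute("INSERT INTO patches VALUES ('a@example', 'new')")
    live.commit()

    path = os.path.join(tempfile.mkdtemp(), "snapshot.sqlite3")
    snapshot(live, path)

    seeded = sqlite3.connect(path)
    print(seeded.execute("SELECT msgid, state FROM patches").fetchall())
```

A developer would download such a snapshot once and then catch up
incrementally with git fetch plus an index run, as described above.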

> I have limited experience with notes, however, and I'm curious if they
> are a good candidate for such task. A public-inbox repo of LKML, even
> after sharding, contains hundreds of thousands of messages. If many of
> them carry such notes, would that significantly increase the
> repository size and reduce its performance?

I'm not familiar with notes, either (or I forgot what I knew).
However, I know pain points w.r.t. git performance are:

1) too many refs
2) big trees

At a quick glance, git notes seems to hit #2 by having a giant
top-level tree in "refs/notes/commits".  There's been some
work in git.git over the years to fix #1 with reftable, reftrees
or lmdb, but none of them is an accepted solution in git.git yet.

> I have similar thoughts about publishing CI-related information as
> notes to public-inbox commits, as well. This would allow centralised
> archival of distributed CI efforts, which is a common problem in
> distributed projects. Currently, bots flood lists with automated email
> as patch follow-ups, but if they could publish their CI information as
> refs/notes/ci/projectname in a public repository, we could mirror that
> back to lore.kernel.org via regular pulls of that ref for each defined
> project, and developers would be able to view CI reports from multiple
> distributed projects inside the same interface.

I'm thinking this can be done with the current public-inbox using
custom search queries (or perhaps enhancements/customizations to search).

Bots would still be flooding lists with automated mails, but their
emails could be searched by patch URL/Message-ID/Subject and made into
a CI report.

> Again, I'm not sure if git notes is the right tool for this. Another
> way to go about it would be to use something like a custom blockchain
> ledger, but I'm afraid that picking that would result in lower
> adoption rate due to general unease people have about blockchain (it's
> new, it's overhyped, and it's tainted by association with cryptocoins).

Just call it "git" instead of "blockchain" :)

Philosophically, I am heavily influenced by the choices git made around
rename-detection and delta-compression.  Doing these things
after-the-fact allows freedom for improvements later on as algorithms
improve.

To that end, public-inbox stores minimal information up-front and relies
heavily on search indices for after-the-fact querying.

Right now, one could search by a patch subject + "Acked-by" in
non-quoted text:

        s:"$SUBJECT" AND ( nq:Acked-by OR nq:Tested-by )

More advanced queries are possible, of course, and the way data is
indexed is not set in stone at all (we're up to "xap15" already).
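
As a rough sketch, a client-side tool could build that query and turn it
into a search URL like this.  The helper names are hypothetical, and it
assumes the inbox's web interface takes the query in a "q" URL
parameter.

```python
from urllib.parse import quote


def trailer_query(subject, trailers=("Acked-by", "Tested-by")):
    """Build the search shown above: subject match plus trailers in
    non-quoted text."""
    # Double quotes inside the subject would end the phrase early.
    phrase = subject.replace('"', " ")
    alternatives = " OR ".join("nq:" + t for t in trailers)
    return 's:"%s" AND ( %s )' % (phrase, alternatives)


def search_url(inbox_url, query):
    """Percent-encode the query and append it to the inbox URL."""
    return inbox_url.rstrip("/") + "/?q=" + quote(query, safe="")
```

For example, search_url("https://public-inbox.org/meta/",
trailer_query("some patch subject")) gives a link that a script or a
browser could fetch to find replies carrying those trailers.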

> I'd love to hear your thoughts. One of my goals is to find a way to
> keep the distributed nature of Linux Kernel development without
> locking it into a single vendor (like GitHub) or a suite of tools
> (like GitLab, CircleCI, whatnot). We need to find a way to preserve
> and archive the data generated by such tools in a way that is easy to
> replicate and verify.

Good to know our goals are aligned.  Beyond the Linux kernel, my goal
since 2004(*) was to make ALL development distributed and free from
any lock-in.  Sadly, the world's gone the opposite direction :<


(*) with svn-arch-mirror and git-svn
--
unsubscribe: meta+unsubscr...@public-inbox.org
archive: https://public-inbox.org/meta/
