On Wed, May 20, 2026 at 11:40 AM Daniel P. Berrangé <[email protected]>
wrote:

> On Wed, May 20, 2026 at 12:33:16PM -0500, Pierrick Bouvier wrote:
> > On 5/20/2026 10:09 AM, Daniel P. Berrangé wrote:
> > > On Wed, May 20, 2026 at 10:01:14AM -0500, Pierrick Bouvier wrote:
> > >> Hi Daniel,
> > >>
> > >> On 5/19/2026 9:26 AM, Daniel P. Berrangé wrote:
> > >>> The qemu-security mailing list was created several years back now and
> > >>> traditionally saw 1-2 disclosures a month at worst. This was
> > >>> manageable.
> > >>>
> > >>> Since approx March 1st, the new normal is to see as many as 20
> > >>> disclosures in one single day, more than 200 in total now. This is
> > >>> unsustainable. I was thinking we needed more people on qemu-security
> > >>> to triage, but IMHO this won't really fix the problem.
> > >>>
> > >>
> > >> Considering the increase in number of issues, would that be possible
> > >> to make stricter rules about what is expected?
> > >>
> > >> For instance, asking for a working exploit and optionally a VM image +
> > >> instructions to reproduce it. I am not expert on the topic, but what I
> > >> see is that if we have this, all duplicates would be eliminated at
> > >> once.
> > >
> > > With the new crop of AI assisted disclosures there is absolutely no
> > > lack of data provided.
> > >
> > > Most come with reproducible exploits, detailed descriptions and
> > > analysis, and more - everything you could conceivably need to triage
> > > the disclosure. Reading and interpreting this takes significant mental
> > > effort and there's too much data to quickly/easily eliminate dupes.
> > >
> >
> > Maybe we need to "standardize" this part then.
> > Or do something like asking a (GitLab) CI pipeline to be written to
> > expose the issue. If we can just run this with a specific qemu
> > remote/branch, it becomes trivial to rerun it when fixes are pushed.
> >
> > It definitely does not solve the original scaling issue, but maybe can
> > help to absorb it, and spend time where it's useful: writing and
> > upstreaming a fix, and check it "broke" the exploit.
> >
> > >>> This needs an issue tracker to cope with & email is not an issue
> > >>> tracker. We faked an issue tracker with a shared spreadsheet to
> > >>> prevent us drowning these past few months, but this is still not
> > >>> sustainable & probably won't ever be.
> > >>>
> > >>
> > >> Overall, you're right.
> > >> However, changing the tool won't solve the number of issues sent, and
> > >> for that, something additional is needed.
> > >
> > > I don't expect there to be any change in submission rate. The proposal
> > > is based on the expectation that the submission rate will continue at
> > > a high level for a long time. Primarily the goal is to reduce the
> > > tracking and triage work overhead and to eliminate/reduce single person
> > > bottlenecks in the process
> > >
> > >> I wonder also what is the percentage of duplicates there is from what
> > >> you observed in the last 2 months. Any rough idea of the number?
> > >
> > > Definitely at least 10%, probably closer to 15%.
> > >
> >
> > Ok, interesting number, thanks. I was expecting much more, but I'm
> > biased having heard Linus this morning talk about this for Linux kernel.
>
> I expect the dupes to increase over time as more people run the
> same analysis across QEMU, especially given that most of the bugs
> are not yet fixed.
>

So this isn't security related, but I think it's informative. I started
upstreaming a large amount (8k lines) of bsd-user code in March. Claude did
the moving and author chasing. Then I asked Claude to review that code, and
refined the reviews into a checklist. I iterated for 6 weeks or so, on and
off, and I wound up fixing 100 "bugs," 10 of which were 'real'; the rest
were logical or only mattered if certain options were enabled. I went ahead
and fixed them all, since CONFIG_REMAP_DEBUG should have a chance of working
and it's useful to be careful about the difference between host arch values
and target arch values, even if we know they are the same. But Claude takes
a while to come up with these, has an annoying habit of reporting no more
than 8-12 on any run, and always seems to find something new (sometimes a
new critical thing).

But I just stopped running the reviews after a while. The details likely
don't matter... I just did another review, and Claude found more. That
suggests that even with this flood, we're still in the sampling phase:
we'll see dupes, but we won't hit bottom for quite some time. I think my
experience is at least suggestive of what to expect.

But all this suggests that we're going to have a huge volume of reports,
most of which will be mediocre in terms of effect, even if they come
with big mountains of data. AI produces this kind of data very well,
but is crap at knowing what's a real problem and what isn't.

Warner

P.S. For "yucks" I asked Claude to review my code again, and it
flagged 20 more things, most of which look at least semi-legit,
though Claude has a fairly high false positive rate: when it
found the 100 bugs, it flagged about 125 or so.
