On Fri, 2026-04-24 at 06:33 +0200, Carsten Ziegeler wrote:
> Definitely interesting.
> 
> As you might have noticed, I wrote a tool which can perform actions
> such as updating the parent pom across a large selection of
> repositories using a coding agent.

Yes, I noticed the results :-)

> 
> I also have some skills flying around for the SCR annotation
> migration and parent pom updates :)
> 
> Maybe we can combine these into one. I'll have a look in the next
> few days.

I think that would be very useful. I definitely did not cover a lot of
repos during testing with the current skills and would be happy for any
enhancements.

Thanks,
Robert

> 
> Regards
> Carsten
> 
> On 4/23/2026 5:35 PM, Robert Munteanu wrote:
> > On Fri, 2026-04-17 at 18:32 +0200, Robert Munteanu wrote:
> > > Hi,
> > > 
> > > Updating the parent pom version in Sling modules is one task that
> > > usually gets left behind. We have many modules, the work is not
> > > that rewarding and sometimes very tedious - for instance migrating
> > > from the Felix SCR annotations to the official OSGi ones.
> > > 
> > > To make things simpler I have started an experiment in the
> > > whiteboard - using agent skills [1] to upgrade the parent pom
> > > version.
> > 
> > I extended the experiment and created a tiny evaluation harness for
> > agent skills at [3] based on the Inspect framework [4].
> > 
> > I did some measurements of the skill and tried to answer some
> > questions around efficiency and cost. The raw data is at [5]:
> > 
> > 1. Is the free variant gpt-oss-120b from openrouter good enough?
> > 
> > With skills it is good enough - sometimes better than haiku-4.5 from
> > Amazon Bedrock.
> > 
> > 2. How big is the difference between haiku-4.5 and sonnet-4.5?
> > 
> > With skills the success rate is almost the same - haiku missed 1 of
> > the 15 evals. But Sonnet ends up being almost 3x more expensive.
> > 
> > 3. How good is Claude Sonnet with or without skills?
> > 
> > The skills make all the difference.
> > 
> > Without skills Sonnet can only perform basic upgrades (100% success
> > rate) but it fails in more complex cases:
> > - 20% success rate if the RAT checks fail after upgrade
> > - 0% success rate if the build fails because of relocated dependencies
> >   (OSGi R6)
> > 
> > With skills Sonnet passes all 15 tests.
> > 
> > [1]: https://agentskills.io/
> > [2]: https://github.com/apache/sling-whiteboard/tree/master/skills/
> > [3]:
> > https://github.com/apache/sling-whiteboard/tree/master/skill-evals
> > [4]: https://inspect.aisi.org.uk/
> > [5]:
> > https://gist.github.com/rombert/c099c13013fbdf27445816c976005aba
