On Fri, 2026-04-24 at 06:33 +0200, Carsten Ziegeler wrote:
> Definitely interesting.
>
> As you might have noticed, I wrote a tool which can perform such actions
> like updating the parent pom across a large selection of repositories
> using a coding agent.

Yes, I noticed the results :-)

> I have also some skills flying around for the SCR annotation migration
> and parent pom updates :)
>
> Maybe we can combine these into one. I'll have a look in the next days.

I think that would be very useful. I definitely did not cover a lot of
repos during testing with the current skills and would be happy for any
enhancements.

Thanks,
Robert

> Regards
> Carsten
>
> On 4/23/2026 5:35 PM, Robert Munteanu wrote:
> > On Fri, 2026-04-17 at 18:32 +0200, Robert Munteanu wrote:
> > > Hi,
> > >
> > > Updating the parent pom version in Sling modules is one task that
> > > usually gets left behind. We have many modules, the work is not that
> > > rewarding and sometimes very tedious - for instance migrating from the
> > > Felix SCR annotations to the official OSGi ones.
> > >
> > > To make things simpler I have started an experiment in the whiteboard -
> > > using agent skills [1] to upgrade the parent pom version.
> >
> > I extended the experiment and created a tiny evaluation harness for
> > agent skills at [3] based on the Inspect framework [4].
> >
> > I did some measurements of the skill and tried to answer some questions
> > around efficiency and cost; captured the raw data at [5]:
> >
> > 1. Is the free variant gpt-oss-120b from openrouter good enough?
> >
> > With skills it is good enough - sometimes better than haiku-4.5 from
> > Amazon Bedrock.
> >
> > 2. How big is the difference between haiku-4.5 and sonnet-4.5?
> >
> > With skills the success rate is almost the same - haiku missed 1/15 of
> > the evals. But Sonnet ends up being almost 3.x more expensive.
> >
> > 3. How good is Claude Sonnet with or without skills?
> >
> > The skills make all the difference.
> >
> > Without skills Sonnet can only perform basic upgrades (100%) but it
> > fails in more complex cases:
> > - 20% success rate if the rat checks fail after upgrade
> > - 0% success rate if the build fails because of relocated dependencies
> >   (OSGi R6)
> >
> > With skills Sonnet passes all 15 tests.
> >
> > [1]: https://agentskills.io/
> > [2]: https://github.com/apache/sling-whiteboard/tree/master/skills/
> > [3]: https://github.com/apache/sling-whiteboard/tree/master/skill-evals
> > [4]: https://inspect.aisi.org.uk/
> > [5]: https://gist.github.com/rombert/c099c13013fbdf27445816c976005aba
