On Fri, 2026-04-17 at 18:32 +0200, Robert Munteanu wrote:
> Hi,
> 
> Updating the parent pom version in Sling modules is one task that
> usually gets left behind. We have many modules, the work is not that
> rewarding and sometimes very tedious - for instance migrating from
> the
> Felix SCR annotations to the official OSGi ones.
> 
> To make things simpler I have started an experiment in the whiteboard
> -
> using agent skills [1] to upgrade the parent pom version.

I extended the experiment and created a tiny evaluation harness for
agent skills at [3] based on the Inspect framework [4].

I measured the skill and tried to answer some questions about
efficiency and cost; the raw data is captured at [5]:

1. Is the free gpt-oss-120b variant from OpenRouter good enough?

With skills it is good enough - sometimes better than haiku-4.5 from
Amazon Bedrock.

2. How big is the difference between haiku-4.5 and sonnet-4.5?

With skills the success rate is almost the same - haiku missed 1 of
the 15 evals. But Sonnet ends up being around 3x more expensive.

3. How good is Claude Sonnet with or without skills?

The skills make all the difference.

Without skills Sonnet handles basic upgrades (100% success rate) but
mostly fails in more complex cases:
- 20% success rate if the RAT checks fail after upgrade
- 0% success rate if the build fails because of relocated dependencies
(OSGi R6)

With skills Sonnet passes all 15 tests.

[1]: https://agentskills.io/
[2]: https://github.com/apache/sling-whiteboard/tree/master/skills/
[3]: https://github.com/apache/sling-whiteboard/tree/master/skill-evals
[4]: https://inspect.aisi.org.uk/
[5]: https://gist.github.com/rombert/c099c13013fbdf27445816c976005aba
