Hello, this is a summary of a video conference call that happened yesterday (April 24).

Topic:
Discussion about performance improvements that have been proposed by Stefan Oehme, namely:

- [MNG-6638] - Prevent reparsing POMs in MavenMetadataSource (https://github.com/apache/maven/pull/244)
- [MNG-6633] - Reduce memory usage of excludes (https://github.com/apache/maven/pull/243)
- Speed up project discovery (https://github.com/apache/maven/pull/242)
- Make location handling more memory efficient (https://github.com/codehaus-plexus/modello/pull/31)

The goal of this call was to give more insight into how Stefan found the improvements and to better understand what is missing before these changes can be merged.

Attendees of the call:

- Benedikt Ritter (Gradle Inc.)
- Stefan Oehme (Gradle Inc.)
- Robert Scholte (Apache Maven Team)
- Hervé Boutemy (Apache Maven Team; joined about half an hour after the call started)

Summary:

Stefan gave some insights into how he discovered bottlenecks in Maven:

- One of our customers has a huge Maven build:
  - Lots of subprojects (2000)
  - Lots of entries in dependency management (4000)
  - Results in a lot of garbage collection
- Problems discovered in that build:
  - Re-parsing project POMs during dependency resolution
  - Model objects are too large because of location tracking
  - Low-level bottlenecks in project discovery (especially version parsing)
- Customer now has a Maven fork with the proposed changes included:
  - 1h 50min, 12GB RAM without changes
  - 45min, 8GB RAM with changes

Robert:

- How can we ensure that the improvements are not broken again later?
- There is no answer yet on how to test this

Stefan gave some insights into how performance testing works in the Gradle project:

- Build has a project generator
- Create different projects in different shapes (e.g. lots of subprojects, deeply nested projects) during the build
- Download old Gradle version and run the build on generated projects
- Run build again with current Gradle version
- Compare results - use statistical methods to filter out variance
- Downside to this approach is that it requires a lot of computing resources

More information can be found on GitHub:
https://github.com/gradle/gradle/tree/master/subprojects/performance

The corresponding TeamCity build can be found here:
https://builds.gradle.org/viewLog.html?buildId=22179604&buildTypeId=Gradle_Check_PerformanceExperimentCoordinator&tab=report_project941_Performance&branch_Gradle_Check_Stage_ReadyforRelease=master
(use "Login as guest" to view)

Robert:

- What about measuring performance using instruction calls?

Stefan:

- The performance improvements we found were mostly about garbage being created
- Measuring using instruction calls is interesting
- ... but it is also very machine dependent

Robert:

- We need to find out who is interested in these kinds of improvements inside the Maven community.
- Build a community of people who would like to work on these kinds of things.

Stefan:

- It's easy to get started. We just used open source tools:
  - We used async-profiler for measuring things (https://github.com/jvm-profiling-tools/async-profiler)
  - Heap dumps for analyzing memory usage

To get started with performance tests in the Maven project:

- Start with only a few test projects
- The Gradle generator is licensed under the Apache License v2 and can be used as a starting point to generate a big Maven project

Hervé:

- PRs should be merged soon
- Discussions need to be resolved
- Why was the PR not merged after the discussion and after all issues with the code had been resolved?
- Hervé will take care that the changes are merged soon

Thank you!
Benedikt
