I have added the configurability option. I have left the default as md5 for now. I hope to do some more testing and will consider a swap of the default to sha256 if/when I get time to do that.
On Thu, Sep 12, 2024 at 10:14 AM Daniel Sun <sun...@apache.org> wrote: > > > An option for Groovy 5 is to make sha256 the default with the ability to > > swap back to md5 with a system property. > > +1 > > Cheers, > Daniel Sun > > On 2024/09/11 15:22:18 Paul King wrote: > > Hi folks, > > > > We recently had a Jira and PR created around allowing sha256 to be > > used instead of (or as well as) md5 for script caching/naming > > purposes. The relevant issue is: > > > > https://issues.apache.org/jira/browse/GROOVY-11459 > > > > By way of background, often md5 is not a good choice when a hack > > threat is possible. That's not really the scenario for our usage, we > > only use hashing to generate a unique name for scripts without a > > filename or as a caching index. The sha256 algorithm does offer > > slightly better collision avoidance which would be a good thing for > > the caching scenario. Having said that, we'd not want to have a big > > performance hit to gain a little collision avoidance. > > > > I'll also point out that this caching isn't applicable to normally > > compiled sources, just when using GroovyClassLoader directly (or > > indirectly via GroovyScriptEngine), so most users wouldn't be impacted > > anyway. > > > > My initial thinking was that we'd make the algorithm configurable and > > allow sha256 to be swapped in for folks that needed it. But in my > > tests, sha256 is very close to md5, so having the default be md5 > > possibly comes into play. I also tested murmur3 and xxHash algorithms > > but at least for the all-Java implementations that I tried, they > > didn't offer any useful improvement for us. > > > > An option for Groovy 5 is to make sha256 the default with the ability > > to swap back to md5 with a system property. And we could back port the > > configurability part to Groovy 4 but keep the default as is (md5). But > > for us to consider this, I think we'd need to do some more testing. My > > current testing is based around just the 12 sample files that the > > performance testing uses. > > > > Does anyone else have thoughts or have the ability to help out with > > further testing? > > (See more information in the Jira). > > > > Thanks, Paul. > >