> I got pointed to https://nightlies.apache.org/ (it is SVN so no git file size issues) which looks like a possibility for making data files for benchmarking for collaborators. We could even put up some larger files --
We use nightlies for largeish (~3 GB) test data at OpenNLP (there was a discussion about hosting the NLP models there too, but in the end we deployed them to dist). It was fairly easy to set up (infra created a Jenkins job for us) and we never had issues with it.

https://nightlies.apache.org/opennlp/

On Mon, 14 Oct 2024, 19:52 Andy Seaborne, <a...@apache.org> wrote:

> On 14/10/2024 12:09, Arne Bernhardt wrote:
> > Since I extended the regression tests in jena-benchmarks-jmh to
> > include Jena 5.1 additionally to Jena 4.8, I may be partly
> > responsible for some growth here....
> > Should I open an issue to remove Jena 4.8 from jena-benchmarks-jmh?
>
> It does not add much AFAICS.
>
> What is noticeable is having bsbm-1m.nt.gz in the release source. It's
> not even a very large file by data standards (only 27M) and it's 50% of
> the release.
>
> It's not necessary to have access to it to be able to build the source
> code so I don't think we need to make it part of the release.
>
> I just asked infra on Slack about how to handle such files (and we're
> not exactly "big data" here!). I got pointed to
> https://nightlies.apache.org/ (it is SVN so no git file size issues)
> which looks like a possibility for making data files for benchmarking
> for collaborators. We could even put up some larger files --
>
>  27M bsbm-1m.nt.gz
> 660M bsbm-25m.nt.gz
>
> Andy