> I got pointed to
> https://nightlies.apache.org/ (it is SVN so no git file size issues)
> which looks like a possibility for making data files for benchmarking
> for collaborators. We could even put up some larger files --

We use nightlies for largish (~3 GB) test data at OpenNLP (there was a
discussion about hosting the NLP models there too, but in the end we
deployed them to dist). It was fairly easy to set up (Infra created a
Jenkins job for us) and we have never had issues with it.

https://nightlies.apache.org/opennlp/
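
For the Jena case, here is a minimal sketch of how a benchmark could fetch a data file on demand instead of shipping it in the release source. It is written in Python only for brevity. The base URL and the function name `fetch_data_file` are assumptions for illustration; the thread does not say where the Jena files would actually live.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical location: nothing in this thread fixes the real path.
BASE_URL = "https://nightlies.apache.org/jena/benchmark-data/"


def fetch_data_file(name: str, cache_dir: Path, base_url: str = BASE_URL) -> Path:
    """Return a local path to `name`, downloading it only if it is not cached."""
    target = cache_dir / name
    if target.exists():
        return target  # already cached, no network access needed

    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    # Stream to a temporary file first so an interrupted download is never
    # mistaken for a complete one.
    with urllib.request.urlopen(base_url + name) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

A benchmark would then call something like `fetch_data_file("bsbm-1m.nt.gz", Path.home() / ".cache" / "jena-bench")`, so building the source never depends on the data being present.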



On Mon, 14 Oct 2024, 19:52 Andy Seaborne, <a...@apache.org> wrote:

> On 14/10/2024 12:09, Arne Bernhardt wrote:
>  > Since I extended the regression tests in jena-benchmarks-jmh to
>  > include Jena 5.1 in addition to Jena 4.8, I may be partly
>  > responsible for some growth here...
>  > Should I open an issue to remove Jena 4.8 from jena-benchmarks-jmh?
>
> It does not add much AFAICS.
>
> What is noticeable is having bsbm-1m.nt.gz in the release source. It's
> not even a very large file by data standards (only 27M) and it's 50% of
> the release.
>
> It's not necessary to have access to it to be able to build the source
> code so I don't think we need to make it part of the release.
>
> I just asked Infra on Slack about how to handle such files (and we're
> not exactly "big data" here!). I got pointed to
> https://nightlies.apache.org/ (it is SVN so no git file size issues)
> which looks like a possibility for making data files for benchmarking
> for collaborators. We could even put up some larger files --
>
> 27M     bsbm-1m.nt.gz
> 660M    bsbm-25m.nt.gz
>
>      Andy
>
>
  • File space Andy Seaborne
    • Re: File space Bruno Kinoshita
