>>> Copyright is a problem: I'm testing mostly with JIRA attachments that I've 
>>> downloaded over the years. While uploading such files to JIRA might count 
>>> as fair use, I doubt that this would still be true if they are included in 
>>> a distribution. Instead, they should be stored somewhere on Apache servers 
>>> where only committers and build software ("Travis", "Jenkins", ...) can 
>>> access them. The public PDFs that Maruan mentions probably don't have all 
>>> the problem cases that we solved before. However, I have started working 
>>> with these files, and there are at least 5 recent issues that deal with 
>>> them.
>> The PDFs won’t be in a distribution. They will simply be stored in an SVN 
>> repo other than our source code repo, in the same way that the website is 
>> stored in the “cmssite” branch of SVN, or indeed the way they are already 
>> stored on JIRA. The law doesn’t distinguish between JIRA and SVN: both are 
>> publicly available via HTTP, so using SVN will simply be a continuation of 
>> what we’re already doing with JIRA.
>> 
>> The crucial factor is that we’re only storing publicly available PDFs, 
>> which we have the right to do, just like Google’s cache, and just as we 
>> currently do with JIRA.
> 
> Yes, but many of the PDFs we got aren't really "public". If this SVN repo is 
> only accessible to committers, and if the publicly available build scripts 
> won't break because of this, then it is OK.

Any non-public PDFs will not be permitted in our test suite, just as they 
shouldn't be on JIRA.

> Note that even if something is "publicly available", it may still be 
> copyrighted. Another risk is that some people upload PDFs that include 
> personal data. One really good test PDF was apparently a loan application. I 
> remember that the user insisted 1. that it was test data, and 2. that it be 
> removed.

All Apache development should happen in the open; this is a key ASF principle, 
so a committers-only test suite is basically a no-no. It's important to 
understand that "fair use" allows us to use copyrighted works. This is 
expressly permitted, and it's the same legal principle as Google’s cache. There 
is no need to seek permission. We’ve been doing exactly this with JIRA for 
years, so it’s fine.

Naturally, if anybody objects to their PDF being in our test suite, we can 
always remove it, but the test suite shouldn’t include anything that isn’t 
already on the public web.

-- John
