On 05.07.2014 22:12, John Hewson wrote:
Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the 
years. While uploading such files to JIRA might count as fair use, I doubt that this would still be 
true if they are included in a distribution. Instead, they should be stored somewhere on Apache 
servers where only committers and build software ("Travis", "Jenkins", ...) can 
access them. The public PDFs that Maruan mentions may not cover all the problem cases that we 
solved before. However, I have started working with these files and there are at least 5 recent 
issues that deal with them.
The PDFs won’t be in a distribution. They will just happen to be stored in an 
SVN repo but not our source code repo, in the same way that the website is 
stored in the “cmssite” branch of SVN, or indeed the way PDFs are stored on JIRA. 
The law doesn’t distinguish between JIRA and SVN: both are publicly available 
via HTTP, so using SVN will simply be a continuation of what we’re already 
doing with JIRA.

The crucial factor is that we’re only storing publicly available PDFs, because 
we have the right to do so, just like Google’s cache, and like we currently do 
with JIRA.
Yes, but many of the PDFs we got aren't really "public". If this SVN repo is only 
accessible to committers, and if the publicly available build scripts won't break because 
of this, then it is OK.
Any non-public PDFs will not be permitted in our test suite, just as they 
shouldn't be on JIRA.

Note that even if something is "publicly available", it may still be 
copyrighted. Another risk is that some people upload PDFs that include personal data. 
One really good test PDF was apparently a loan application. I remember that the user 
insisted 1. that it was test data, and 2. that it be removed.
All Apache development should be in the open. This is a key ASF principle, so having a 
committers-only test suite is basically a no-no. It's important to understand that 
"fair use" allows us to use copyrighted works - this is expressly permitted, and 
it's the same legal principle as Google’s cache. There is no need to seek permission. 
This is what we’ve been doing with JIRA for years already - it’s fine.

The problem is that this has all happened before. A few years ago, many files were deleted; see PDFBOX-391.

Tilman


Naturally, if anybody objects to their PDF being in our test suite, we can 
always remove it, but it shouldn’t include anything which isn’t already on the 
public web.

-- John
