On 5 Jul 2014, at 13:47, Tilman Hausherr <[email protected]> wrote:
> On 05.07.2014 22:12, John Hewson wrote:
>>>>> Copyright is a problem: I'm testing mostly with JIRA attachments that
>>>>> I've downloaded over the years. While uploading such files to JIRA might
>>>>> count as fair use, I doubt that this would still be true if they are
>>>>> included in a distribution. Instead, they should be stored somewhere on
>>>>> Apache servers where only committers and build software ("Travis",
>>>>> "Jenkins", ...) can access them. The public PDFs that Maruan mentions
>>>>> possibly don't have all the problem cases that we solved before. However,
>>>>> I have started working with these files and there are at least 5 recent
>>>>> issues that deal with them.
>>>> The PDFs won’t be in a distribution. They will just happen to be stored in
>>>> an SVN repo but not our source code repo, in the same way that the website
>>>> is stored in the “cmssite” branch of SVN or, indeed, that PDFs are on JIRA. The law
>>>> doesn’t distinguish between JIRA and SVN; both are publicly available via
>>>> HTTP, so using SVN will simply be a continuation of what we’re already
>>>> doing with JIRA.
>>>>
>>>> The crucial factor is that we’re only storing publicly available PDFs,
>>>> because we have the right to do so, just like Google’s cache, and like we
>>>> currently do with JIRA.
>>> Yes but many PDFs we got aren't really "public". If this svn repo is only
>>> accessible to committers, and if the publicly available build scripts won't
>>> break because of this, then it is OK.
>> Any non-public PDFs will not be permitted in our test suite, just as they
>> shouldn't be on JIRA.
>>
>>> Note that even if something is "publicly available", it may still be
>>> copyrighted. Other risks can be that some people upload PDFs that include
>>> personal data. One really good test PDF was apparently a loan application.
>>> I remember that the user insisted that 1. it was test data, and 2. that it
>>> be removed.
>> All Apache development should be in the open; this is a key ASF principle, and
>> having a committers-only test suite is basically a no-no. It's important to
>> understand that "fair use" allows us to use copyrighted works - this is
>> expressly permitted, it's the same legal principle as Google’s cache. There
>> is no need to seek permission. This is what we’ve been doing with JIRA
>> already for years, so we are already doing this - it’s fine.
>
> The problem is that this has all happened before. A few years ago, many files
> were deleted, see PDFBOX-391.
That issue is about including files in the source code repo as part of the
PDFBox distribution, where there is a need to put files under an Apache 2.0
compatible license. What I’m advocating is keeping a separate public repository
of test files which are not a part of the PDFBox source, like we currently have
on JIRA.
-- John