>> To keep the tests we could repackage the test suite as an independent
>> component and release it under the PDFBox SourceForge project with
>> proper disclaimers about the copyright status of the included files.
>> Then in our developer documentation at Apache we could have a pointer
>> to that test suite and instructions on how to integrate it with a
>> normal PDFBox checkout.
>
>This sounds good. For the text extraction tests, and possibly the
>others, we could do it such that the SF files are placed in their own
>directory and the test suite will test for the existence of the
>directory (and not give an error if it does not exist).
That's a good idea.
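
A rough sketch of how that check could look, just to illustrate the idea. The
class and method names (OptionalTestInputs, pdfFilesIn) are made up for this
example and are not existing PDFBox code; the point is only that a missing
directory yields an empty list of inputs instead of a failure.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class OptionalTestInputs {

    // Returns the PDF files found in dir, sorted by path.
    // If dir does not exist (e.g. the SF files were not installed),
    // an empty list is returned so the caller simply has nothing to test.
    static List<File> pdfFilesIn(File dir) {
        List<File> result = new ArrayList<>();
        if (!dir.isDirectory()) {
            return result; // optional directory is absent: no error
        }
        File[] files = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
        if (files != null) {
            Arrays.sort(files);
            result.addAll(Arrays.asList(files));
        }
        return result;
    }
}
```

The test suite would then loop over whatever pdfFilesIn returns for each input
directory, so a checkout without the SF files still passes.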

>...
>It seems like we could add tests that use the sorting feature by either:
>a) Store the PDF files in one directory and a separate directory
>exists for each test (i.e. a directory for non-sorted text files and
>a directory for sorted text files).  The text files for each test are
>stored in the directory and renamed to have a .txt extension.
>b) Store the PDF files and text files in the same directory, but
>rename the 'sorted' text files to have "-sort.txt" at the end.  For
>example, "test1.pdf" would have "test1.txt" for its non-sorted gold
>standard and "test1-sort.txt" for its sorted gold standard.
>
>If we do approach b, then we do not need to change the current
>directory structure.  If we do approach a, then we do.  'b' seems a
>little more clumsy, but it could be easier if we are going to have
>multiple directories of test files.   For example, we could have an
>'input' directory of the files in Apache and a 'input-sf' directory
>of the files in SourceForge.
I agree with Brian. We should extend the text extraction tests with an
additional test that runs with sorting enabled. I prefer approach b).
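
For approach b) the gold standard file name can be derived from the PDF name,
roughly like this. GoldFiles and goldFileName are names I made up for the
sketch; only the "-sort.txt" convention comes from the proposal above.

```java
// Hypothetical helper, not part of PDFBox.
final class GoldFiles {

    // Maps a PDF file name to the name of its gold standard text file:
    // "<base>.txt" for non-sorted output, "<base>-sort.txt" for sorted output.
    static String goldFileName(String pdfName, boolean sorted) {
        int dot = pdfName.lastIndexOf('.');
        String base = dot >= 0 ? pdfName.substring(0, dot) : pdfName;
        return base + (sorted ? "-sort.txt" : ".txt");
    }
}
```

With this, "test1.pdf" maps to "test1.txt" for the non-sorted run and to
"test1-sort.txt" for the sorted run, and both files stay next to the PDF.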

I also like the idea of two directories, one for each source. That way we can
replace the documents in 'input-sf' with other suitable documents step by
step, as needed.

Andreas


