Hi Andrew,

Are you using DSpace version 1.5.x or 1.6?

Both versions use Apache PDFBox 0.7.3 for filtering PDF files, which is an
older version; the current release is 1.0.0. I believe this newer version
can take care of your PDF problems. Below is a snippet from an earlier
posting and mail exchange:

>> E.g. the pdfbox used is version 0.7.3 and a couple of years old.
>> We noticed in our production instance that more and more new PDFs are not
>> processed. I'm just trying the new 1.1.0 version, and it seems to process
>> these PDFs without difficulty.

>> You need the up-to-date versions of jempbox and fontbox too;
>> updating only pdfbox is not enough. Depending on the input, you may also
>> need the Bouncy Castle provider for the Java version you are using.
>> See http://pdfbox.apache.org/

>> Claudia
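Claudia's point that pdfbox, fontbox and jempbox must be upgraded together is easy to miss when copying jars by hand. The sketch below is a hypothetical helper (not part of DSpace) that takes the file names in a lib directory, for example the `lib` folder of a DSpace installation (an assumed location), and reports the version of each PDFBox-family jar, which ones are missing, and which differ from the pdfbox version. It assumes the usual `name-version.jar` file naming and, as with the PDFBox 1.x releases, that the three jars are expected to share one version number.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DSpace: inspects jar file names to check
// that pdfbox, fontbox and jempbox were upgraded together.
class PdfBoxJarCheck {

    static final List<String> ARTIFACTS = Arrays.asList("pdfbox", "fontbox", "jempbox");

    // Assumes "name-version.jar" naming, e.g. "pdfbox-1.1.0.jar".
    private static final Pattern JAR =
            Pattern.compile("^(pdfbox|fontbox|jempbox)-([0-9][0-9A-Za-z.\\-]*)\\.jar$");

    /** Maps artifact name to version for each PDFBox-family jar in the list. */
    static Map<String, String> findVersions(List<String> fileNames) {
        Map<String, String> versions = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                // If two jars of the same artifact exist, the last one seen wins.
                versions.put(m.group(1), m.group(2));
            }
        }
        return versions;
    }

    /** Artifacts with no jar at all. */
    static List<String> missing(Map<String, String> versions) {
        List<String> result = new ArrayList<>();
        for (String artifact : ARTIFACTS) {
            if (!versions.containsKey(artifact)) {
                result.add(artifact);
            }
        }
        return result;
    }

    /** Artifacts whose version differs from the pdfbox jar's version. */
    static List<String> mismatched(Map<String, String> versions) {
        List<String> result = new ArrayList<>();
        String pdfbox = versions.get("pdfbox");
        if (pdfbox == null) {
            return result; // nothing to compare against
        }
        for (String artifact : ARTIFACTS) {
            String v = versions.get(artifact);
            if (v != null && !v.equals(pdfbox)) {
                result.add(artifact);
            }
        }
        return result;
    }

    /** Lists the file names in a directory (empty if it cannot be read). */
    static List<String> listJars(File dir) {
        String[] names = dir.list();
        return names == null ? Collections.<String>emptyList() : Arrays.asList(names);
    }
}
```

For instance, `findVersions(listJars(new File("/dspace/lib")))` (path hypothetical) on an installation where only pdfbox was replaced would show fontbox and jempbox either missing or mismatched.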

Alternatively, DSpace 1.6 also supports XPDF, as outlined in the media
filtering section of the manual. I have tried this, and it solved issues
with large PDF files as well as PDFs created with newer versions of the
Acrobat engine.
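Before reconfiguring DSpace, it can help to check whether XPDF copes with one of the failing files on its own. The sketch below assumes XPDF's `pdftotext` command-line tool is installed; the class is hypothetical and is not DSpace's own XPDF filter, and the executable path and timeout are illustrative assumptions. It runs `pdftotext -enc UTF-8 <pdf> <output>` and returns the extracted text, killing the process if it runs past the timeout so a stuck file does not hang the check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical test harness for XPDF's pdftotext; not part of DSpace.
class PdfToTextCheck {

    /** Builds: <pdftotext> -enc UTF-8 <pdf> <textFile> */
    static List<String> buildCommand(String pdftotext, Path pdf, Path textFile) {
        return Arrays.asList(pdftotext, "-enc", "UTF-8", pdf.toString(), textFile.toString());
    }

    /**
     * Extracts text from one PDF, failing if pdftotext cannot be started,
     * exits with an error, or does not finish within timeoutSeconds.
     */
    static String extract(String pdftotext, Path pdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path textFile = Files.createTempFile("pdftotext-", ".txt");
        try {
            ProcessBuilder pb = new ProcessBuilder(buildCommand(pdftotext, pdf, textFile));
            pb.inheritIO(); // show pdftotext's own messages on the console
            Process process = pb.start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("pdftotext did not finish within " + timeoutSeconds + " s: " + pdf);
            }
            int exit = process.exitValue();
            if (exit != 0) {
                throw new IOException("pdftotext exited with status " + exit + ": " + pdf);
            }
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(textFile);
        }
    }
}
```

For example, `PdfToTextCheck.extract("/usr/local/bin/pdftotext", Paths.get("thesis.pdf"), 600)` (path and file name hypothetical) either returns the text or throws, which makes it quick to compare with the PDFBox-based filter on the same file.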

Hope this will help,
Debashree



> We are experiencing problems with media filtering of PDF files added in
> our thesis digitisation project.
>
> A number of the files (perhaps 10%) will not filter; the command window
> just pauses for up to 15 minutes or so, then displays:
>
> "SKIPPED: bitstream 5698 (item: 10182/1780) because filtering was unsuccessful"
>
> No other error message or clue is given.
>
> I can see no common feature of the PDFs that won't filter: they can be
> b&w only or partly colour, and of different PDF versions. Yes, they are all
> quite large files (10MB or larger), but not all files of this size fail
> in this way.
>
> I find that if I split the file into parts and re-upload, they will then
> filter OK.
>
> Has anyone else experienced this and do you have a solution?
>
> Andrew White
> Information Technology Librarian
>
> George Forbes Memorial Library
> PO Box 64
> Lincoln University
> Lincoln 7647
> Christchurch, New Zealand
>
> p +64 3 321 8542 | f +64 3 325 2944
> e andrew.wh...@lincoln.ac.nz | w library.lincoln.ac.nz
>
> Lincoln University, Te Whare Wanaka o Aoraki
> New Zealand's Specialist Land Based University
>
> ------------------------------------------------------------------------------
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>

  • [Dspac... White, Andrew
    • R... Debashree Pati
    • R... Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]