Hi Christian,

This error occurred because ImageIO could not read the file generated
by the pdftoppm command. I suggest checking whether pdftoppm generates
a correct file. To do this, I recommend the following two steps.

1) Set the logging level to DEBUG and rerun.
2) Temporarily comment out lines 253 to 256 in XPDF2Thumbnail.java,
     then rebuild and run.

With step 1, you can see the actual command executed by DSpace
and the path of the generated file, and check that both are correct.

With step 2, the generated file is retained, so you can check its
content and its file mode.
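
If it helps, the retained file can also be checked from Java. As far as
I know, ImageIO.read(File) throws "Can't read input file!" when
File.canRead() returns false, that is, when the file does not exist or
the current user may not read it. The following is only a minimal
sketch and not part of DSpace: the class name ThumbnailFileCheck and
the method diagnose are made up for this example.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not DSpace code: reports why ImageIO may fail on a file.
class ThumbnailFileCheck {

    // Returns a short description of the first problem found, or "ok: WxH".
    static String diagnose(File f) {
        if (!f.exists()) {
            return "missing: " + f.getAbsolutePath();
        }
        if (!f.canRead()) {
            // This is the condition that makes ImageIO.read(File) give up early.
            return "not readable by current user: " + f.getAbsolutePath();
        }
        if (f.length() == 0) {
            return "empty file: " + f.getAbsolutePath();
        }
        try {
            BufferedImage img = ImageIO.read(f);
            if (img == null) {
                // ImageIO.read returns null when no registered reader accepts the data.
                return "no ImageIO reader for this format: " + f.getAbsolutePath();
            }
            return "ok: " + img.getWidth() + "x" + img.getHeight();
        } catch (IOException e) {
            return "read error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Pass the path(s) of the generated file(s) shown in the DEBUG log.
        for (String path : args) {
            System.out.println(diagnose(new File(path)));
        }
    }
}
```

Compile it with javac and run it as the same user that runs
bin/dspace filter-media, passing the path printed in the DEBUG log.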

Regards,
Keiji Suzuki



2014-06-13 8:44 GMT+09:00 Christian Völker <c.voel...@gmx.net>:

> Hello,
>
> I am trying to get the XPDF-based PDF thumbnail creation working. It works
> fine in my DSpace 4.1 test instance.
>
> The feature was already available in DSpace 1.8.2, which is still our
> production release. Instead of waiting until the new version is production
> ready, I am installing the features step by step in the production environment.
>
>
> On the production machine, I get this error:
>
> esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
> The following MediaFilters are enabled:
> Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
> org.dspace.app.mediafilter.HTMLFilter
> Full Filter Name: org.dspace.app.mediafilter.WordFilter
> org.dspace.app.mediafilter.WordFilter
> Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
> org.dspace.app.mediafilter.JPEGFilter
> Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
> org.dspace.app.mediafilter.XPDF2Text
> Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
> org.dspace.app.mediafilter.XPDF2Thumbnail
> Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
> org.dspace.app.mediafilter.PowerPointFilter
> SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' already exists
> ERROR filtering, skipping bitstream:
>
>         Item Handle: 2339/4318
>         Bundle Name: ORIGINAL
>         File Size: 2667225
>         Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
>         Asset Store: 0
> javax.imageio.IIOException: Can't read input file!
> javax.imageio.IIOException: Can't read input file!
>         at javax.imageio.ImageIO.read(ImageIO.java:1291)
>         at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
>         at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
>         at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
>         at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
>         at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
>         at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
> Updating search index:
>
>
> Note that the text extraction took place in an earlier run of
> filter-media, so the message "Can't read input file!" is not very credible.
> Also, the method called when the exception occurred was
> XPDF2Thumbnail.getDestinationStream, which means that the issue might not
> be with the input file but with creating the output file.
>
>
> In 2012, Osama Alkadi reported a similar issue and solved it by updating
> the pdftoppm tool. On Debian and Ubuntu, the required tools are contained
> in the package poppler-utils. I have installed Version 0.18.4 on both test
> and production machine. Here is the output:
>
> esxh-15:/srv/dspace# pdftoppm -v
> pdftoppm version 0.18.4
> Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1996-2004 Glyph & Cog, LLC
>
> The version numbering seems to have changed in unexpected ways, as Osama
> Alkadi said that he updated from 3.0 to 3.0.2. For the moment, this does
> not help much.
>
> All other components involved are also the same on both machines.
> jai_imageio is version 1.1 and jai_core is 1.1.3.
>
>
> As the file is hard to find in the assetstore, I downloaded it using the
> browser, copied it back to the server with scp and converted it manually
> using pdftoppm -jpeg inputfile.pdf outputname. It worked.
>
> I exported the item containing the file using the AIP packager,
> transferred it to the test server running DSpace 4.1, imported it and ran
> filter-media there. It worked fine.
>
> I compared the installation instructions of DSpace 4.1 and 1.8.2 and could
> not find a significant difference regarding the XPDF feature. The mvn
> package and ant update commands did not show any irregularities.
>
> File permissions in the assetstore did not show any problems. On both
> machines, DSpace is run as the daemon user tomcat7. In both cases, I run
> Tomcat 7, albeit in slightly different versions. But Tomcat is not involved
> in running a command-line tool like bin/dspace filter-media anyway.
>
> So far, I have not found a clue where to look for the cause. If
> anybody has an idea, I'd be grateful.
>
> Bye, Christian
>
>
>
> ------------------------------------------------------------------------------
> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
> Find What Matters Most in Your Big Data with HPCC Systems
> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
> http://p.sf.net/sfu/hpccsystems
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>



-- 
鈴木敬二@江別市