Re: [Dspace-tech] XPDF Thumbnail Preview issue
Hi Christian,

I suggested the option quoted below in my last post, but Poppler's version of pdftoppm seems to vary the number of digits in the sequence suffix depending on the page count of the original PDF, so a fixed suffix is not reliable.

This problem has already been fixed as of DSpace 3.0. You can see the fix at the following URL:

https://github.com/DSpace/DSpace/commit/0d01078eb165f7431bee64dfde2271e9b149e862#diff-e8b609cb2ef5cb3d5d1945f38ffbe2ec

Sorry, I had checked only DSpace 1.8.2 and not the current version.

Regards,
Keiji Suzuki

2014-06-15 22:54 GMT+09:00 SUZUKI Keiji z...@mbc.ocn.ne.jp:

> 2) Edit line 237 of XPDF2Thumbnail.java and rebuild DSpace, changing
>
>     File outf = new File(outPrefix+"-01.ppm");
>
> to
>
>     File outf = new File(outPrefix+"-1.ppm");

--
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
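Because the width of the page-number suffix can differ between pdftoppm builds, one alternative to hard-coding a suffix is to look for the first-page file by pattern. The following is only a minimal sketch of that idea and is not claimed to be what the DSpace commit linked above does; the class and method names (PpmOutputLocator, firstPagePattern, findFirstPage) are hypothetical.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DSpace: finds the page-1 output of
// pdftoppm regardless of how many leading zeros the tool used.
class PpmOutputLocator {

    // Matches "<prefix>-1.ppm", "<prefix>-01.ppm", "<prefix>-000001.ppm", ...
    static Pattern firstPagePattern(String prefixName) {
        return Pattern.compile(Pattern.quote(prefixName) + "-0*1\\.ppm");
    }

    // Returns a matching file in the directory of outPrefix, or null if none exists.
    static File findFirstPage(File outPrefix) {
        File dir = outPrefix.getAbsoluteFile().getParentFile();
        if (dir == null) {
            return null;
        }
        Pattern p = firstPagePattern(outPrefix.getName());
        File[] hits = dir.listFiles((d, name) -> p.matcher(name).matches());
        return (hits == null || hits.length == 0) ? null : hits[0];
    }

    public static void main(String[] args) {
        // Prefix taken from the log in this thread; any prefix works.
        File found = findFirstPage(new File("/tmp/prevu8591868713129272046out"));
        System.out.println(found == null ? "no page-1 file found" : found.getPath());
    }
}
```

The pattern deliberately accepts only page 1, so a file for page 10 or 11 of a longer document would not be picked up by mistake.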
Re: [Dspace-tech] XPDF Thumbnail Preview issue
Hello,

On 13.06.2014 at 04:47, SUZUKI Keiji z...@mbc.ocn.ne.jp wrote:

> 1) Set the logging level to DEBUG and rerun.

I should have done so before. Thank you for the heads-up. You were perfectly right. But the result leaves me a bit clueless for now:

    esxh-15:/srv/dspace# tail -n 10 log/dspace.log.2014-06-15
    2014-06-15 12:45:17,812 DEBUG org.dspace.content.BitstreamFormat @ anonymous::find_bitstream_format:bitstream_format_id=2
    2014-06-15 12:45:17,812 DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query SELECT * FROM fileextension WHERE bitstream_format_id= ? with parameters: 2
    2014-06-15 12:45:17,851 DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query select * from bitstream where bitstream_id = ? with parameters: 27442
    2014-06-15 12:45:17,852 DEBUG org.dspace.storage.bitstore.BitstreamStorageManager @ Local filename for 8706628839618174761158592395102959 is /srv/dspace/assetstore/87/06/62/8706628839618174761158592395102959
    2014-06-15 12:45:17,865 INFO net.sf.ehcache.util.UpdateChecker @ New update(s) found: 2.4.7 [http://www.terracotta.org/confluence/display/release/Release+Notes+Ehcache+Core+2.4]
    2014-06-15 12:45:17,919 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ DPI: pdfinfo method got dpi=75 for max dim=759 (points, 1/72)
    2014-06-15 12:45:17,920 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ Running xpdf command: [/usr/bin/pdftoppm, -q, -f, 1, -l, 1, -r, 75, /tmp/DSfilt2327548125683453130.pdf, /tmp/prevu8591868713129272046out]
    2014-06-15 12:45:18,357 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ PDFTOPPM output is: /tmp/prevu8591868713129272046out-01.ppm, exists=false
    2014-06-15 12:45:18,420 ERROR org.dspace.app.mediafilter.XPDF2Thumbnail @ Unable to delete file
    2014-06-15 12:45:18,421 DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query SELECT bundle.* FROM bundle, bundle2bitstream WHERE bundle.bundle_id=bundle2bitstream.bundle_id AND bundle2bitstream.bitstream_id= ? with parameters: 27442

    esxh-15:/srv/dspace# ls -l /tmp
    total 1272
    drwx-- 2 amanda backup 4096 Jun 15 11:27 amanda
    drwxr-xr-x 2 root root 4096 Jun 15 12:17 hsperfdata_root
    drwxr-xr-x 2 tomcat7 tomcat7 4096 Jun 15 12:45 hsperfdata_tomcat7
    -rw-r--r-- 1 tomcat7 tomcat7 1281435 Jun 15 12:45 prevu8591868713129272046out-1.ppm
    drwxr-xr-x 2 tomcat7 root 4096 Jun 15 12:12 tomcat7-tomcat7-tmp
    drwx-- 2 root root 4096 Jun 15 11:26 vmware-root
    esxh-15:/srv/dspace#

This means the enumeration scheme pdftoppm uses when writing image files for several pages differs from what the XPDF plugin expects. If I got it right, the plugin tells pdftoppm to do this:

    /usr/bin/pdftoppm -q -f 1 -l 1 -r 75 /tmp/DSfilt2327548125683453130.pdf /tmp/prevu8591868713129272046out

It expects to find the resulting file here:

    /tmp/prevu8591868713129272046out-01.ppm

However, the file gets written here:

    /tmp/prevu8591868713129272046out-1.ppm

Everything is fine regarding file permissions, and the file is in the expected directory /tmp; only the six digits instead of a single digit make the difference.

There are several questions here. Why does the filter write a .ppm file rather than a .jpg file using the -jpeg option of pdftoppm, and when does the actual conversion happen? The task of the filter is always to produce a thumbnail image of the first page. So it would seem much more logical and robust to me to use the -singlefile option of pdftoppm, which does not add anything to the output name besides the file extension. Instead, first page -f and last page -l are both set to 1. But well, I would not need to bother if everything worked fine. Where does this six-digit rule get set?

During my tests I produced thousands of files starting with /tmp/prevu*. Most of them ended in -1.ppm, but some of them in -01.ppm. Mysterious. I will try to reproduce the same fault on my test system, which works fine for now, just to understand where the differences are.

For now, I won't try the second suggestion, recompiling with the source code commented out, because I guess I have already found the issue; I just don't understand it yet.

Thanks for your support. Further suggestions welcome.

Bye,
Christian
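The idea of asking pdftoppm for a single JPEG without any page suffix can be sketched as follows. This is an illustration under assumptions only: it presumes a Poppler build of pdftoppm that offers the -singlefile and -jpeg options (older builds, or the Xpdf flavour, may not), and the class and method names are hypothetical rather than DSpace code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not DSpace code: builds a pdftoppm call that writes
// only the first page to "<outPrefix>.jpg" with no page-number suffix.
class SingleFileThumbnailCommand {

    // Builds: <pdftoppm> -q -singlefile -jpeg -r <dpi> <pdfPath> <outPrefix>
    static List<String> build(String pdftoppm, int dpi, String pdfPath, String outPrefix) {
        List<String> cmd = new ArrayList<>();
        cmd.add(pdftoppm);
        cmd.add("-q");           // suppress messages, as in the logged command
        cmd.add("-singlefile");  // first page only, no digits appended
        cmd.add("-jpeg");        // write JPEG instead of PPM
        cmd.add("-r");
        cmd.add(Integer.toString(dpi));
        cmd.add(pdfPath);
        cmd.add(outPrefix);
        return cmd;
    }

    // File name the caller would then look for (assumption: Poppler's ".jpg" extension).
    static String expectedOutput(String outPrefix) {
        return outPrefix + ".jpg";
    }

    public static void main(String[] args) {
        List<String> cmd = build("/usr/bin/pdftoppm", 75, "/tmp/in.pdf", "/tmp/out");
        System.out.println(String.join(" ", cmd));
        System.out.println("expecting " + expectedOutput("/tmp/out"));
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

With this approach the caller no longer has to guess the suffix width, at the cost of depending on options that not every pdftoppm provides.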
Re: [Dspace-tech] XPDF Thumbnail Preview issue
Hi Christian,

2014-06-15 20:18 GMT+09:00 Christian Völker c.voel...@gmx.net:

> This means the enumeration scheme pdftoppm uses when writing image files for several pages differs from what the XPDF plugin expects. If I got it right, the plugin tells pdftoppm to do this:
>
>     /usr/bin/pdftoppm -q -f 1 -l 1 -r 75 /tmp/DSfilt2327548125683453130.pdf /tmp/prevu8591868713129272046out
>
> It expects to find the resulting file here:
>
>     /tmp/prevu8591868713129272046out-01.ppm
>
> However, the file gets written here:
>
>     /tmp/prevu8591868713129272046out-1.ppm
>
> Everything is fine regarding file permissions, and the file is in the expected directory /tmp; only the six digits instead of a single digit make the difference.
>
> There are several questions here. Why does the filter write a .ppm file rather than a .jpg file using the -jpeg option of pdftoppm, and when does the actual conversion happen? The task of the filter is always to produce a thumbnail image of the first page. So it would seem much more logical and robust to me to use the -singlefile option of pdftoppm, which does not add anything to the output name besides the file extension. Instead, first page -f and last page -l are both set to 1. But well, I would not need to bother if everything worked fine. Where does this six-digit rule get set?

My version of pdftoppm is different from yours, and my version writes the output ppm with a six-digit sequence number.

    dspace@www:~$ pdftoppm -v
    pdftoppm version 3.02
    Copyright 1996-2007 Glyph Cog, LLC

I can also confirm that the pdftoppm in the poppler-utils package of Ubuntu 12.04 LTS server (64-bit) is the same version as yours, and that this version writes the output file with a single digit.

I use Ubuntu 12.04 LTS server (32-bit). I can't remember how I installed my version, but there is an xpdf-utils package that is not available for the 64-bit OS. I may have installed my version from that package.

In any case, I think there are two options.

1) Install version 3.02 of pdftoppm in some way.

2) Edit line 237 of XPDF2Thumbnail.java and rebuild DSpace, changing

    File outf = new File(outPrefix+"-01.ppm");

to

    File outf = new File(outPrefix+"-1.ppm");

Hope this helps you.

Regards,
Keiji Suzuki

--
鈴木敬二@江別市
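Since the two pdftoppm -v banners shown in this thread differ in a recognizable way (only the Poppler build mentions "The Poppler Developers"), a caller could in principle tell the two flavours apart before deciding which file name to expect. The sketch below assumes that this banner text is stable across versions, which is not guaranteed; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: classifies the text printed by "pdftoppm -v".
// It only inspects a string; capturing the tool's output is left to the caller.
class PdftoppmFlavour {

    private static final Pattern VERSION = Pattern.compile("pdftoppm version (\\S+)");

    // True if the banner names the Poppler developers (assumption: Xpdf builds do not).
    static boolean isPoppler(String banner) {
        return banner != null && banner.contains("Poppler");
    }

    // Extracts the version token after "pdftoppm version", or null if absent.
    static String versionNumber(String banner) {
        if (banner == null) {
            return null;
        }
        Matcher m = VERSION.matcher(banner);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String banner = "pdftoppm version 3.02\nCopyright 1996-2007 Glyph Cog, LLC";
        System.out.println(versionNumber(banner) + " poppler=" + isPoppler(banner));
    }
}
```

This also shows why the version numbers are not comparable across the two tools: 3.02 and 0.18.4 belong to different code bases.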
[Dspace-tech] XPDF Thumbnail Preview issue
Hello,

I am trying to get the XPDF-based PDF thumbnail creation working. It works fine in my DSpace 4.1 test instance. The feature was already available in DSpace 1.8.2, which is still our production release. Instead of waiting until the new version is production ready, I am installing the features step by step in the production environment.

On the production machine, I get this error:

    esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
    The following MediaFilters are enabled:
    Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
    org.dspace.app.mediafilter.HTMLFilter
    Full Filter Name: org.dspace.app.mediafilter.WordFilter
    org.dspace.app.mediafilter.WordFilter
    Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
    org.dspace.app.mediafilter.JPEGFilter
    Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
    org.dspace.app.mediafilter.XPDF2Text
    Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
    org.dspace.app.mediafilter.XPDF2Thumbnail
    Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
    org.dspace.app.mediafilter.PowerPointFilter
    SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' already exists
    ERROR filtering, skipping bitstream:
    Item Handle: 2339/4318
    Bundle Name: ORIGINAL
    File Size: 2667225
    Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
    Asset Store: 0
    javax.imageio.IIOException: Can't read input file!
    javax.imageio.IIOException: Can't read input file!
        at javax.imageio.ImageIO.read(ImageIO.java:1291)
        at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
    Updating search index:

Note that the text extraction took place in an earlier run of filter-media, so the message "Can't read input file!" is not very credible. Also, the method in which the exception occurred was XPDF2Thumbnail.getDestinationStream, which suggests that the issue might lie not with the input file but with creating the output file.

In 2012, Osama Alkadi reported a similar issue and solved it by updating the pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the package poppler-utils. I have installed version 0.18.4 on both the test and the production machine. Here is the output:

    esxh-15:/srv/dspace# pdftoppm -v
    pdftoppm version 0.18.4
    Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
    Copyright 1996-2004 Glyph Cog, LLC

The version numbering seems to have changed in unexpected ways, as Osama Alkadi reported that he updated from 3.0 to 3.0.2. For the moment, this does not help much.

All other components involved are also the same on both machines. jai_imageio is version 1.1 and jai_core is 1.1.3.

As the file is hard to find in the assetstore, I downloaded it using the browser, copied it back to the server with scp, and converted it manually using pdftoppm -jpeg inputfile.pdf outputname. It worked.

I exported the item containing the file using the AIP packager, transferred it to the test server running DSpace 4.1, imported it, and ran filter-media there. It worked fine.

I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not find a significant difference regarding the XPDF feature. The mvn package and ant update commands did not show any irregularities. File permissions in the assetstore did not show any problems. On both machines, DSpace is run as the daemon user tomcat7. In both cases I run Tomcat 7, albeit in slightly different versions. But Tomcat is not involved in running a command-line tool like bin/dspace filter-media anyway.

So far, I have not found a clue where to search for the reason. If anybody has an idea, I'd be grateful.

Bye,
Christian
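The message "Can't read input file!" is what javax.imageio.ImageIO.read(File) reports when the File it is handed cannot be read, which includes the case where the file does not exist at all. The sketch below reproduces that with a made-up path and shows an explicit existence check that would yield a more telling error; the class and method names are hypothetical and not taken from DSpace.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical sketch: contrasts ImageIO's generic error with an explicit check.
class ReadCheck {

    // Reads an image, but names the missing path if the file is not there.
    static BufferedImage readChecked(File f) throws IOException {
        if (!f.isFile()) {
            throw new FileNotFoundException("expected pdftoppm output not found: " + f);
        }
        return ImageIO.read(f);
    }

    // Returns the message of the exception ImageIO.read raises for f, or null if none.
    static String plainReadError(File f) {
        try {
            ImageIO.read(f);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        File missing = new File("/tmp/no-such-dir-for-demo/out-01.ppm"); // made-up path
        System.out.println(plainReadError(missing));
        try {
            readChecked(missing);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Seen this way, the "input file" in the exception is the intermediate image that ImageIO was asked to load, not the original PDF bitstream.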
Re: [Dspace-tech] XPDF Thumbnail Preview issue
Hi Christian,

This error occurred because ImageIO could not read the file generated by the pdftoppm command. I think you should check whether pdftoppm generates a correct file. To do this, I recommend the following two steps.

1) Set the logging level to DEBUG and rerun.
2) Temporarily comment out lines 253 to 256 in XPDF2Thumbnail.java, rebuild, and run.

With step 1, you can see the actual command executed by DSpace and the path name of the generated file, and check that these are correct. With step 2, you can retain the generated file to check its content and mode.

Regards,
Keiji Suzuki

2014-06-13 8:44 GMT+09:00 Christian Völker c.voel...@gmx.net:

> Hello,
>
> I am trying to get the XPDF-based PDF thumbnail creation working. It works fine in my DSpace 4.1 test instance. The feature was already available in DSpace 1.8.2, which is still our production release. Instead of waiting until the new version is production ready, I am installing the features step by step in the production environment.
>
> On the production machine, I get this error:
>
>     esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
>     The following MediaFilters are enabled:
>     Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
>     org.dspace.app.mediafilter.HTMLFilter
>     Full Filter Name: org.dspace.app.mediafilter.WordFilter
>     org.dspace.app.mediafilter.WordFilter
>     Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
>     org.dspace.app.mediafilter.JPEGFilter
>     Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
>     org.dspace.app.mediafilter.XPDF2Text
>     Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
>     org.dspace.app.mediafilter.XPDF2Thumbnail
>     Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
>     org.dspace.app.mediafilter.PowerPointFilter
>     SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' already exists
>     ERROR filtering, skipping bitstream:
>     Item Handle: 2339/4318
>     Bundle Name: ORIGINAL
>     File Size: 2667225
>     Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
>     Asset Store: 0
>     javax.imageio.IIOException: Can't read input file!
>     javax.imageio.IIOException: Can't read input file!
>         at javax.imageio.ImageIO.read(ImageIO.java:1291)
>         at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
>         at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
>         at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
>         at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
>         at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
>         at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
>     Updating search index:
>
> Note that the text extraction took place in an earlier run of filter-media, so the message "Can't read input file!" is not very credible. Also, the method in which the exception occurred was XPDF2Thumbnail.getDestinationStream, which suggests that the issue might lie not with the input file but with creating the output file.
>
> In 2012, Osama Alkadi reported a similar issue and solved it by updating the pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the package poppler-utils. I have installed version 0.18.4 on both the test and the production machine. Here is the output:
>
>     esxh-15:/srv/dspace# pdftoppm -v
>     pdftoppm version 0.18.4
>     Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
>     Copyright 1996-2004 Glyph Cog, LLC
>
> The version numbering seems to have changed in unexpected ways, as Osama Alkadi reported that he updated from 3.0 to 3.0.2. For the moment, this does not help much.
>
> All other components involved are also the same on both machines. jai_imageio is version 1.1 and jai_core is 1.1.3.
>
> As the file is hard to find in the assetstore, I downloaded it using the browser, copied it back to the server with scp, and converted it manually using pdftoppm -jpeg inputfile.pdf outputname. It worked.
>
> I exported the item containing the file using the AIP packager, transferred it to the test server running DSpace 4.1, imported it, and ran filter-media there. It worked fine.
>
> I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not find a significant difference regarding the XPDF feature. The mvn package and ant update commands did not show any irregularities. File permissions in the assetstore did not show any problems. On both