Re: [Dspace-tech] XPDF Thumbnail Preview issue

2014-06-16 Thread SUZUKI Keiji
Hi Christian,

In my last post I suggested the option below, but Poppler's version of
pdftoppm seems to vary the length of the sequence number with the page count
of the original PDF. This problem has been fixed since DSpace 3.0. You can
see the fix at the following URL.

https://github.com/DSpace/DSpace/commit/0d01078eb165f7431bee64dfde2271e9b149e862#diff-e8b609cb2ef5cb3d5d1945f38ffbe2ec

Sorry, I had only checked DSpace 1.8.2 and not the current version.
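
A minimal sketch, assuming Java 8 or later, of a lookup that accepts any zero-padded page-1 suffix instead of one hard-coded file name. It is not necessarily what the linked commit does; the class and method names are hypothetical and not part of DSpace.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper: finds the page-1 output of pdftoppm whatever the padding.
final class PpmOutputLocator {
    /**
     * Returns a file in the prefix's directory named <prefix>-<zeros>1.ppm
     * (for example <prefix>-1.ppm or <prefix>-01.ppm), or null if none exists.
     */
    static File findFirstPage(File outPrefix) {
        File dir = outPrefix.getAbsoluteFile().getParentFile();
        // Quote the prefix so that characters in it are matched literally.
        Pattern wanted = Pattern.compile(
                Pattern.quote(outPrefix.getName()) + "-0*1\\.ppm");
        File[] matches = dir.listFiles((d, name) -> wanted.matcher(name).matches());
        return (matches == null || matches.length == 0) ? null : matches[0];
    }
}
```

The caller would pass the same prefix it gave to pdftoppm and treat a null result as "no output written".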

Regards,
Keiji Suzuki

2014-06-15 22:54 GMT+09:00 SUZUKI Keiji z...@mbc.ocn.ne.jp:

 2) Edit line 237 of XPDF2Thumbnail.java and rebuild DSpace

 from

 File outf = new File(outPrefix + "-01.ppm");

 to

 File outf = new File(outPrefix + "-1.ppm");


--
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Re: [Dspace-tech] XPDF Thumbnail Preview issue

2014-06-15 Thread Christian Völker
Hello,

Am 13.06.2014 um 04:47 schrieb SUZUKI Keiji z...@mbc.ocn.ne.jp:

 1) Set the logging level to DEBUG and rerun.

Should have done so before. Thank you for the heads-up.

You were perfectly right. But then, the result leaves me a bit clueless for now:

 esxh-15:/srv/dspace tail -n 10 log/dspace.log.2014-06-15
 2014-06-15 12:45:17,812 DEBUG org.dspace.content.BitstreamFormat @ 
 anonymous::find_bitstream_format:bitstream_format_id=2
 2014-06-15 12:45:17,812 DEBUG org.dspace.storage.rdbms.DatabaseManager @ 
 Running query SELECT * FROM fileextension WHERE bitstream_format_id= ?   
 with parameters: 2
 2014-06-15 12:45:17,851 DEBUG org.dspace.storage.rdbms.DatabaseManager @ 
 Running query select * from bitstream where bitstream_id = ?   with 
 parameters: 27442
 2014-06-15 12:45:17,852 DEBUG 
 org.dspace.storage.bitstore.BitstreamStorageManager @ Local filename for 
 8706628839618174761158592395102959 is 
 /srv/dspace/assetstore/87/06/62/8706628839618174761158592395102959
 2014-06-15 12:45:17,865 INFO  net.sf.ehcache.util.UpdateChecker @ New 
 update(s) found: 2.4.7 
 [http://www.terracotta.org/confluence/display/release/Release+Notes+Ehcache+Core+2.4]
 2014-06-15 12:45:17,919 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ 
 DPI: pdfinfo method got dpi=75 for max dim=759 (points, 1/72)
 2014-06-15 12:45:17,920 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ 
 Running xpdf command: [/usr/bin/pdftoppm, -q, -f, 1, -l, 1, -r, 75, 
 /tmp/DSfilt2327548125683453130.pdf, /tmp/prevu8591868713129272046out]
 2014-06-15 12:45:18,357 DEBUG org.dspace.app.mediafilter.XPDF2Thumbnail @ 
 PDFTOPPM output is: /tmp/prevu8591868713129272046out-01.ppm, exists=false
 2014-06-15 12:45:18,420 ERROR org.dspace.app.mediafilter.XPDF2Thumbnail @ 
 Unable to delete file
 2014-06-15 12:45:18,421 DEBUG org.dspace.storage.rdbms.DatabaseManager @ 
 Running query SELECT bundle.* FROM bundle, bundle2bitstream WHERE 
 bundle.bundle_id=bundle2bitstream.bundle_id AND 
 bundle2bitstream.bitstream_id= ?   with parameters: 27442
 esxh-15:/srv/dspace ls -l /tmp
 insgesamt 1272
 drwx------ 2 amanda  backup     4096 Jun 15 11:27 amanda
 drwxr-xr-x 2 root    root       4096 Jun 15 12:17 hsperfdata_root
 drwxr-xr-x 2 tomcat7 tomcat7    4096 Jun 15 12:45 hsperfdata_tomcat7
 -rw-r--r-- 1 tomcat7 tomcat7 1281435 Jun 15 12:45 prevu8591868713129272046out-1.ppm
 drwxr-xr-x 2 tomcat7 root       4096 Jun 15 12:12 tomcat7-tomcat7-tmp
 drwx------ 2 root    root       4096 Jun 15 11:26 vmware-root
 esxh-15:/srv/dspace 

This means that the numbering scheme pdftoppm uses when writing image files 
for several pages differs from what the XPDF plugin expects. If I got it 
right, the plugin tells pdftoppm to do this:

/usr/bin/pdftoppm -q -f 1 -l 1 -r 75 /tmp/DSfilt2327548125683453130.pdf 
/tmp/prevu8591868713129272046out

It expects to find the resulting file here:

/tmp/prevu8591868713129272046out-01.ppm

However, the file gets written here:

/tmp/prevu8591868713129272046out-1.ppm

File permissions are fine and the file is in the expected directory /tmp; 
only the six digits instead of a single digit make the difference. This 
raises several questions. Why does the filter write a .ppm file rather than a 
.jpg file using the -jpeg option of pdftoppm, and when does the actual 
conversion happen? The filter's task is always to produce a thumbnail image of 
the first page, so it would seem more logical and robust to me to use the 
-singlefile option of pdftoppm, which adds nothing to the output name besides 
the file extension. Instead, first page -f and last page -l are both set to 1. 
But I would not need to bother if everything worked fine.
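
A minimal sketch of the single-page idea above, assuming Java 7 or later. Poppler's pdftoppm spells the option -singlefile; whether a given build (such as 0.18.4) supports it should be checked with pdftoppm -h. The class and method names are hypothetical and not part of DSpace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a pdftoppm call that renders only the first page.
final class SinglePageCommand {
    /** Returns the argument list; nothing is executed here. */
    static List<String> build(String pdftoppm, int dpi, String inputPdf, String outPrefix) {
        List<String> cmd = new ArrayList<>();
        cmd.add(pdftoppm);
        cmd.add("-q");
        cmd.add("-singlefile"); // assumption: supported by the installed Poppler build
        cmd.add("-r");
        cmd.add(Integer.toString(dpi));
        cmd.add(inputPdf);
        cmd.add(outPrefix);
        return cmd;
    }

    /** With -singlefile no page number is appended, only the extension. */
    static String expectedOutput(String outPrefix) {
        return outPrefix + ".ppm";
    }
}
```

The list could be handed to a ProcessBuilder; the output would then be expected at the prefix plus ".ppm", so no page-number suffix has to be guessed.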

Where does this six digit rule get set? 

During my tests I produced thousands of files starting with /tmp/prevu*. 
Most of them ended in -1.ppm, but some of them in -01.ppm. Mysterious.

I will try to reproduce the fault on my test system, which works fine for 
now, just to understand where the differences are.

For now, I won't try the second suggestion to recompile with source code 
commented out, because I think I have already found the issue; I just don't 
understand it yet.

Thanks for your support. Further suggestions welcome.

Bye, Christian




Re: [Dspace-tech] XPDF Thumbnail Preview issue

2014-06-15 Thread SUZUKI Keiji
Hi Christian,

2014-06-15 20:18 GMT+09:00 Christian Völker c.voel...@gmx.net:



 This means that the numbering scheme pdftoppm uses when writing image
 files for several pages differs from what the XPDF plugin expects. If
 I got it right, the plugin tells pdftoppm to do this:

 /usr/bin/pdftoppm -q -f 1 -l 1 -r 75 /tmp/DSfilt2327548125683453130.pdf
 /tmp/prevu8591868713129272046out

 It expects to find the resulting file here:

 /tmp/prevu8591868713129272046out-01.ppm

 However, the file gets written here:

 /tmp/prevu8591868713129272046out-1.ppm

 File permissions are fine and the file is in the expected directory /tmp;
 only the six digits instead of a single digit make the difference. This
 raises several questions. Why does the filter write a .ppm file rather than
 a .jpg file using the -jpeg option of pdftoppm, and when does the actual
 conversion happen? The filter's task is always to produce a thumbnail image
 of the first page, so it would seem more logical and robust to me to use the
 -singlefile option of pdftoppm, which adds nothing to the output name
 besides the file extension. Instead, first page -f and last page -l are both
 set to 1. But I would not need to bother if everything worked fine.

 Where does this six digit rule get set?


My version of pdftoppm is different from yours; mine writes the output ppm
file with a six-digit sequence number.

  dspace@www:~$ pdftoppm -v
  pdftoppm version 3.02
  Copyright 1996-2007 Glyph & Cog, LLC

I have confirmed that the pdftoppm in the poppler-utils package of
Ubuntu 12.04 LTS server (64-bit) is the same version as yours, and that this
version writes the output file with a single digit.

I use Ubuntu 12.04 LTS server (32-bit). I can't remember how I installed
my version, but there is an xpdf-utils package that is not available on the
64-bit OS. I may have installed my version from that package.

In any case, I think there are two options.

1) Install version 3.02 of pdftoppm in some way, or
2) Edit line 237 of XPDF2Thumbnail.java and rebuild DSpace

from

File outf = new File(outPrefix + "-01.ppm");

to

File outf = new File(outPrefix + "-1.ppm");

Hope this helps you.

Regards,
Keiji Suzuki







-- 
鈴木敬二@江別市

[Dspace-tech] XPDF Thumbnail Preview issue

2014-06-12 Thread Christian Völker
Hello,

I am trying to get the XPDF-based PDF thumbnail creation working. It works fine 
in my DSpace 4.1 test instance. 

The feature was already available in DSpace 1.8.2, which is still our production 
release. Instead of waiting until the new version is production ready, I am 
installing the features step by step in the production environment.


On the production machine, I get this error:

esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
The following MediaFilters are enabled: 
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
org.dspace.app.mediafilter.XPDF2Text
Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
org.dspace.app.mediafilter.XPDF2Thumbnail
Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
org.dspace.app.mediafilter.PowerPointFilter
SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' 
already exists
ERROR filtering, skipping bitstream:

Item Handle: 2339/4318
Bundle Name: ORIGINAL
File Size: 2667225
Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
Asset Store: 0
javax.imageio.IIOException: Can't read input file!
javax.imageio.IIOException: Can't read input file!
at javax.imageio.ImageIO.read(ImageIO.java:1291)
at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
Updating search index:


Note that the text extraction took place in an earlier run of filter-media, so 
the message "Can't read input file!" is not very credible. Also, the method 
called when the exception occurred was XPDF2Thumbnail.getDestinationStream, 
which suggests that the issue might lie not with the input file but with 
creating the output file.


In 2012, Osama Alkadi reported a similar issue and solved it by updating the 
pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the 
package poppler-utils. I have installed version 0.18.4 on both the test and 
the production machine. Here is the output:

esxh-15:/srv/dspace# pdftoppm -v
pdftoppm version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

The version numbering seems to have changed in unexpected ways, as Osama Alkadi 
wrote that he updated from 3.0 to 3.0.2. For the moment, this does not help 
much.

All other components involved are also the same on both machines. jai_imageio 
is version 1.1 and jai_core is 1.1.3.


As the file is hard to find in the assetstore, I downloaded it using the 
browser, copied it back to the server with scp and converted it manually using 
"pdftoppm -jpeg inputfile.pdf outputname". It worked.

I exported the item containing the file using the AIP packager, transferred it 
to the test server running DSpace 4.1, imported it and ran filter-media there. 
It worked fine.

I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not 
find a significant difference regarding the XPDF feature. The mvn package and 
ant update commands did not show any irregularities.

File permissions in the assetstore did not show any problems. On both machines, 
DSpace is run as the daemon user tomcat7. In both cases, I run Tomcat 7, albeit 
in slightly different versions. But Tomcat is not involved in running a 
command-line tool such as bin/dspace filter-media anyway.

So far, I have not found a clue as to where to look for the cause. If anybody 
has an idea, I'd be grateful.

Bye, Christian



Re: [Dspace-tech] XPDF Thumbnail Preview issue

2014-06-12 Thread SUZUKI Keiji
Hi Christian,

This error occurred because ImageIO could not read the file generated
by the pdftoppm command. I think you should check whether
pdftoppm generates a correct file. To do this, I recommend the following
two steps.

1) Set the logging level to DEBUG and rerun.
2) Temporarily comment out lines 253 to 256 in XPDF2Thumbnail.java,
 then rebuild and run.

With step 1, you can see the actual command executed by DSpace
and the path name of the generated file, and check that they are correct.

With step 2, you can retain the generated file to check its content
and its mode.
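
A minimal sketch of why a wrong output path produces this particular message: ImageIO.read(File) throws an IIOException with the text "Can't read input file!" when the file it is given cannot be read, for example because it does not exist. The class name and the path used in the usage note are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical demo class, not part of DSpace.
final class MissingInputDemo {
    /** Returns the exception message from ImageIO.read, or null if no exception was thrown. */
    static String tryRead(File f) {
        try {
            ImageIO.read(f);
            return null;
        } catch (IOException e) {
            // IIOException is a subclass of IOException.
            return e.getMessage();
        }
    }
}
```

Calling MissingInputDemo.tryRead(new File("/tmp/does-not-exist.ppm")) on a path that does not exist returns that message, so "input file" here means the file handed to ImageIO, not the original PDF.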

Regards,
Keiji Suzuki



2014-06-13 8:44 GMT+09:00 Christian Völker c.voel...@gmx.net:

 Hello,

 I am trying to get the XPDF-based PDF thumbnail creation working. It works
 fine in my DSpace 4.1 test instance.

 The feature was already available in DSpace 1.8.2, which is still our
 production release. Instead of waiting until the new version is production
 ready, I am installing the features step by step in the production environment.


 On the production machine, I get this error:

 esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
 The following MediaFilters are enabled:
 Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
 org.dspace.app.mediafilter.HTMLFilter
 Full Filter Name: org.dspace.app.mediafilter.WordFilter
 org.dspace.app.mediafilter.WordFilter
 Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
 org.dspace.app.mediafilter.JPEGFilter
 Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
 org.dspace.app.mediafilter.XPDF2Text
 Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
 org.dspace.app.mediafilter.XPDF2Thumbnail
 Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
 org.dspace.app.mediafilter.PowerPointFilter
 SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt'
 already exists
 ERROR filtering, skipping bitstream:

 Item Handle: 2339/4318
 Bundle Name: ORIGINAL
 File Size: 2667225
 Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
 Asset Store: 0
 javax.imageio.IIOException: Can't read input file!
 javax.imageio.IIOException: Can't read input file!
 at javax.imageio.ImageIO.read(ImageIO.java:1291)
 at
 org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
 at
 org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
 at
 org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
 at
 org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
 at
 org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
 at
 org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:622)
 at
 org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
 Updating search index:


 Note that the text extraction took place in an earlier run of
 filter-media, so the message "Can't read input file!" is not very credible.
 Also, the method called when the exception occurred was
 XPDF2Thumbnail.getDestinationStream, which suggests that the issue might
 lie not with the input file but with creating the output file.


 In 2012, Osama Alkadi reported a similar issue and solved it by updating
 the pdftoppm tool. On Debian and Ubuntu, the required tools are contained
 in the package poppler-utils. I have installed version 0.18.4 on both the
 test and the production machine. Here is the output:

 esxh-15:/srv/dspace# pdftoppm -v
 pdftoppm version 0.18.4
 Copyright 2005-2011 The Poppler Developers -
 http://poppler.freedesktop.org
 Copyright 1996-2004 Glyph & Cog, LLC

 The version numbering seems to have changed in unexpected ways, as Osama
 Alkadi wrote that he updated from 3.0 to 3.0.2. For the moment, this does
 not help much.

 All other components involved are also the same on both machines.
 jai_imageio is version 1.1 and jai_core is 1.1.3.


 As the file is hard to find in the assetstore, I downloaded it using the
 browser, copied it back to the server with scp and converted it manually
 using "pdftoppm -jpeg inputfile.pdf outputname". It worked.

 I exported the item containing the file using the AIP packager,
 transferred it to the test server running DSpace 4.1, imported it and ran
 filter-media there. It worked fine.

 I compared the installation instructions of DSpace 4.1 and 1.8.2 and could
 not find a significant difference regarding the XPDF feature. The mvn
 package and ant update commands did not show any irregularities.

 File permissions in assetstore did not show any problems. On both