Re: [Dspace-tech] filter-media hanging
On 17 October 2010 15:14, Andrea Bollini boll...@cilea.it wrote:
> I have created a JIRA issue for the pdfbox library update:
> https://jira.duraspace.org/browse/DS-704
> A patch against current trunk is attached.
> Please let me know if this solves your issues too.
> Best,
> Andrea

Thank you Andrea, that resolved the issue for me.

Sean

--
Sean Carte
esAL Library Systems Manager
+27 72 898 8775
+27 31 373 2490
fax: 0866741254
http://esal.dut.ac.za/

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] filter-media hanging
I have created a JIRA issue for the pdfbox library update:
https://jira.duraspace.org/browse/DS-704
A patch against current trunk is attached.
Please let me know if this solves your issues too.

Best,
Andrea

On 15/10/2010 16:46, Blanco, Jose wrote:
> Andrea: Did you get this message? I'm now thinking I should try this patch:
> https://jira.duraspace.org/browse/DS-183
> But before I do, I'm wondering if your patch might be easier to install.
> Thank you!
> Jose

--
Dott. Andrea Bollini
Project Manager, IT Architect Systems Integrator
Sezione Servizi per le Biblioteche e l'Editoria Elettronica
CILEA, http://www.cilea.it
tel. +39 06-59292853
cel. +39 348-8277525
---
Disclaimer: the content of this email is confidential and may be privileged, and it must not be disclosed or copied without the sender's consent. If you have received this message in error, please notify the sender and remove it from your system. The content of this email does not constitute legal advice, nor any responsibility is accepted for loss or damage incurred as a result of acting upon its contents or attachments. The statements and opinions expressed in this email are those of the author and do not necessarily reflect those of the employer.
Re: [Dspace-tech] filter-media hanging
Andrea:

Did you get this message? I'm now thinking I should try this patch:
https://jira.duraspace.org/browse/DS-183
But before I do, I'm wondering if your patch might be easier to install.

Thank you!
Jose

From: Blanco, Jose [mailto:blan...@umich.edu]
Sent: Thursday, October 14, 2010 10:38 AM
To: Andrea Bollini; Sean Carte
Cc: dspace-tech; Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
Subject: Re: [Dspace-tech] filter-media hanging

Andrea: I was looking for your JIRA patch for this, and could not find it. Could you direct me to it?

Thank you!
Jose
Re: [Dspace-tech] filter-media hanging
Andrea:

I was looking for your JIRA patch for this, and could not find it. Could you direct me to it?

Thank you!
Jose

From: Andrea Bollini [mailto:boll...@cilea.it]
Sent: Friday, July 16, 2010 2:07 AM
To: Sean Carte
Cc: dspace-tech; Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
Subject: Re: [Dspace-tech] filter-media hanging

Another solution could be to update the version of pdfbox. Using the latest version of pdfbox we have solved a lot of issues; it could also work in your case.
I will post a patch to JIRA as soon as possible. In the meantime, you just need to update the following jars and remove some unused imports in the DSpace code:

bcprov-jdk15-145
bcmail-jdk15-145
icu4j-3_8_1
fontbox-1.1.0
jempbox-1.1.0
pdfbox-1.1.0

Andrea
Re: [Dspace-tech] filter-media hanging
Another solution could be to update the version of pdfbox. Using the latest version of pdfbox we have solved a lot of issues; it could also work in your case.
I will post a patch to JIRA as soon as possible. In the meantime, you just need to update the following jars and remove some unused imports in the DSpace code:

bcprov-jdk15-145
bcmail-jdk15-145
icu4j-3_8_1
fontbox-1.1.0
jempbox-1.1.0
pdfbox-1.1.0

Andrea

Sean Carte wrote:
> On 16 July 2010 00:25, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
>> We had tons of problems with filter-media until we switched from using PDFBox to XPDF. With PDFBox ours used to hang too and take 4-EVER to run. Since we've switched over, our filter-media takes a fraction of the time to complete and 100% of our documents filter, except for those that truly are corrupt.
>>
>> Take a look at http://www.foolabs.com/xpdf/index.html. Also Google xpdf AND dspace and you'll find detailed instructions on how to implement it.
>>
>> Btw, we are currently running DSpace 1.5.1.
>>
>> Good luck,
>> Sue
>
> Thanks Sue; I was beginning to think it was just me.
>
> Sean
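Before and after swapping the jars, it may help to check which of the six jars Andrea lists are actually on the classpath that filter-media runs with (the classpath is visible in `ps -ef` output for the java process). The following is only a minimal Python sketch under stated assumptions: the helper names `jars_on_classpath` and `missing_jars` are hypothetical, the classpath is assumed to be Unix-style and `:`-separated, and each jar file is assumed to be named exactly as in the list above plus a `.jar` suffix.

```python
import posixpath

# Jar names copied from the list in the message above (without the .jar suffix).
REQUIRED_JARS = [
    "bcprov-jdk15-145",
    "bcmail-jdk15-145",
    "icu4j-3_8_1",
    "fontbox-1.1.0",
    "jempbox-1.1.0",
    "pdfbox-1.1.0",
]


def jars_on_classpath(classpath):
    """Return the set of jar base names (no directory, no .jar) on a classpath."""
    names = set()
    for entry in classpath.split(":"):
        base = posixpath.basename(entry)
        if base.endswith(".jar"):
            names.add(base[: -len(".jar")])
    return names


def missing_jars(classpath, required=REQUIRED_JARS):
    """List the required jars that do not appear on the classpath, in list order."""
    present = jars_on_classpath(classpath)
    return [jar for jar in required if jar not in present]


if __name__ == "__main__":
    # Short excerpt in the style of a DSpace classpath; paste the real one here.
    example = ":/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/config"
    for jar in missing_jars(example):
        print("not on classpath:", jar)
```

An empty result from `missing_jars` only says the file names are present; it does not confirm that the DSpace code was rebuilt against them.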
Re: [Dspace-tech] filter-media hanging
On 16 July 2010 00:25, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
> We had tons of problems with filter-media until we switched from using PDFBox to XPDF. With PDFBox ours used to hang too and take 4-EVER to run. Since we've switched over, our filter-media takes a fraction of the time to complete and 100% of our documents filter, except for those that truly are corrupt.
>
> Take a look at http://www.foolabs.com/xpdf/index.html. Also Google xpdf AND dspace and you'll find detailed instructions on how to implement it.
>
> Btw, we are currently running DSpace 1.5.1.
>
> Good luck,
> Sue

Thanks Sue; I was beginning to think it was just me.

Sean
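One practical advantage of an external extractor such as xpdf's `pdftotext` is that it runs as a separate process, so a file that makes it hang can be abandoned after a time limit instead of pegging the CPU for days. The Python sketch below illustrates that idea only; it is not DSpace's XPDF media filter. The helper names `run_with_timeout` and `extract_text` are hypothetical, and `extract_text` assumes `pdftotext` is installed and on the PATH.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run an external command and return (status, output).

    status is "ok", "failed" (non-zero exit or command not found)
    or "timeout" (the command was killed after the time limit).
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return "timeout", ""
    except FileNotFoundError:
        return "failed", "command not found: " + str(command[0])
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout.decode("utf-8", errors="replace")


def extract_text(pdf_path, timeout_seconds=60):
    """Hypothetical helper: extract text with pdftotext, sending the text to stdout ('-')."""
    return run_with_timeout(["pdftotext", pdf_path, "-"], timeout_seconds)


if __name__ == "__main__":
    # Demonstrate the timeout with a stand-in command that sleeps too long.
    status, _ = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1)
    print("stand-in command finished with status:", status)
```

Calling `extract_text` on each PDF in turn and logging the paths that return `"timeout"` would be one way to identify the documents that cause a hang; the 60-second default is an arbitrary choice.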
[Dspace-tech] filter-media hanging
I have a problem with filter-media apparently getting stuck processing a file. It ends up pegging the CPU at 100% until I kill the process. I've tried leaving it for a few days to complete, but it never does.

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
21853 dspace   20   0  411m 292m 8364 S  100  7.2   1782:59 java
27008 dspace   20   0  418m 299m 8368 S  100  7.4 343:01.51 java

r...@ir:~# ps -ef | grep 21853
dspace 21853 21847 99 Jul13 ? 1-05:43:53 java -Xmx256m -classpath :/dspace/lib/activation-1.1.jar:/dspace/lib/bcmail-jdk14-136.jar:/dspace/lib/bcprov-jdk14-136.jar:/dspace/lib/commons-cli-1.0.jar:/dspace/lib/commons-codec-1.3.jar:/dspace/lib/commons-collections-3.2.jar:/dspace/lib/commons-dbcp-1.2.2.jar:/dspace/lib/commons-fileupload-1.2.1.jar:/dspace/lib/commons-io-1.4.jar:/dspace/lib/commons-lang-2.2.jar:/dspace/lib/commons-logging-1.0.4.jar:/dspace/lib/commons-logging-1.0.jar:/dspace/lib/commons-pool-1.4.jar:/dspace/lib/dom4j-1.6.1.jar:/dspace/lib/dspace-api-1.5.3-20090716.011317-5.jar:/dspace/lib/dspace-api-1.5.3-SNAPSHOT.jar:/dspace/lib/dspace-api-lang-1.5.2.1.jar:/dspace/lib/embargo-api-1.0.3.jar:/dspace/lib/embargo-dspace-1.0.3.jar:/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/handle-5.3.4.jar:/dspace/lib/handle-6.2.5.02.jar:/dspace/lib/icu4j-3.4.4.jar:/dspace/lib/jargon-1.4.25.jar:/dspace/lib/jaxen-1.1.jar:/dspace/lib/jdom-1.0.jar:/dspace/lib/jempbox-0.2.0.jar:/dspace/lib/log4j-1.2.14.jar:/dspace/lib/lucene-analyzers-2.3.0.jar:/dspace/lib/lucene-core-2.3.0.jar:/dspace/lib/mail-1.4.jar:/dspace/lib/mets-1.5.2.jar:/dspace/lib/oro-2.0.8.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/lib/poi-2.5.1-final-20040804.jar:/dspace/lib/postgresql-8.1-408.jdbc3.jar:/dspace/lib/rome-0.8.jar:/dspace/lib/tm-extractors-0.4.jar:/dspace/lib/xalan-2.7.0.jar:/dspace/lib/xercesImpl-2.8.1.jar:/dspace/lib/xml-apis-1.3.02.jar:/dspace/lib/xmlParserAPIs-2.0.2.jar:/dspace/config org.dspace.app.mediafilter.MediaFilterManager
root 28484 18209 0 07:43 pts/1 00:00:00 grep 21853

r...@ir:~# ps -ef | grep 27008
dspace 27008 27002 99 02:00 ? 05:44:04 java -Xmx256m -classpath :/dspace/lib/activation-1.1.jar:/dspace/lib/bcmail-jdk14-136.jar:/dspace/lib/bcprov-jdk14-136.jar:/dspace/lib/commons-cli-1.0.jar:/dspace/lib/commons-codec-1.3.jar:/dspace/lib/commons-collections-3.2.jar:/dspace/lib/commons-dbcp-1.2.2.jar:/dspace/lib/commons-fileupload-1.2.1.jar:/dspace/lib/commons-io-1.4.jar:/dspace/lib/commons-lang-2.2.jar:/dspace/lib/commons-logging-1.0.4.jar:/dspace/lib/commons-logging-1.0.jar:/dspace/lib/commons-pool-1.4.jar:/dspace/lib/dom4j-1.6.1.jar:/dspace/lib/dspace-api-1.5.3-20090716.011317-5.jar:/dspace/lib/dspace-api-1.5.3-SNAPSHOT.jar:/dspace/lib/dspace-api-lang-1.5.2.1.jar:/dspace/lib/embargo-api-1.0.3.jar:/dspace/lib/embargo-dspace-1.0.3.jar:/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/handle-5.3.4.jar:/dspace/lib/handle-6.2.5.02.jar:/dspace/lib/icu4j-3.4.4.jar:/dspace/lib/jargon-1.4.25.jar:/dspace/lib/jaxen-1.1.jar:/dspace/lib/jdom-1.0.jar:/dspace/lib/jempbox-0.2.0.jar:/dspace/lib/log4j-1.2.14.jar:/dspace/lib/lucene-analyzers-2.3.0.jar:/dspace/lib/lucene-core-2.3.0.jar:/dspace/lib/mail-1.4.jar:/dspace/lib/mets-1.5.2.jar:/dspace/lib/oro-2.0.8.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/lib/poi-2.5.1-final-20040804.jar:/dspace/lib/postgresql-8.1-408.jdbc3.jar:/dspace/lib/rome-0.8.jar:/dspace/lib/tm-extractors-0.4.jar:/dspace/lib/xalan-2.7.0.jar:/dspace/lib/xercesImpl-2.8.1.jar:/dspace/lib/xml-apis-1.3.02.jar:/dspace/lib/xmlParserAPIs-2.0.2.jar:/dspace/config org.dspace.app.mediafilter.MediaFilterManager
root 28486 18209 0 07:43 pts/1 00:00:00 grep 27008

I've tried running it manually with the -v switch, but that doesn't offer me any clues as to the problem bitstream:

dsp...@ir:~$ /dspace/bin/filter-media -v
Applying Media Filters
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.PDFFilter
org.dspace.app.mediafilter.PDFFilter
SKIPPED: bitstream 3640 (item: 10321/287) because 'Matkovich_2004.pdf.txt' already exists
...
SKIPPED: bitstream 2164 (item: 10321/180) because 'TITLE PAGE.pdf.txt' already exists
ERROR filtering, skipping bitstream:

Item Handle: 10321/460
Bundle Name: ORIGINAL
File Size: 567170
Checksum: 9e17b9fd124ac43b34390203fb164f9c (MD5)
Asset Store: 0
java.io.EOFException: Unexpected end of ZLIB input stream
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
    at