Re: [Dspace-tech] filter-media hanging

2010-10-25 Thread Sean Carte
On 17 October 2010 15:14, Andrea Bollini boll...@cilea.it wrote:
 I have created a JIRA issue for the pdfbox library update.
 https://jira.duraspace.org/browse/DS-704
 Patch against current trunk is attached.
 Please let me know if this solves your issues too.
 Best,
 Andrea

Thank you Andrea, that resolved the issue for me.

Sean
-- 
Sean Carte
esAL Library Systems Manager
+27 72 898 8775
+27 31 373 2490
fax: 0866741254
http://esal.dut.ac.za/

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] filter-media hanging

2010-10-17 Thread Andrea Bollini

I have created a JIRA issue for the pdfbox library update:
https://jira.duraspace.org/browse/DS-704
A patch against the current trunk is attached.
Please let me know if this solves your issues too.
Best,
Andrea



On 15/10/2010 16:46, Blanco, Jose wrote:


Andrea:

Did you get this message? I'm now thinking I should try this patch:

https://jira.duraspace.org/browse/DS-183

But before I do, I'm wondering if your patch might be easier to install.

Thank you!

Jose


--
Dott. Andrea Bollini
Project Manager, IT Architect & Systems Integrator
Sezione Servizi per le Biblioteche e l'Editoria Elettronica
CILEA, http://www.cilea.it
tel. +39 06-59292853
cel. +39 348-8277525

---

Disclaimer: the content of this email is confidential and may be privileged, 
and it must not be disclosed or copied without the sender's consent. If you 
have received this message in error, please notify the sender and remove it 
from your system. The content of this email does not constitute legal advice, 
nor any responsibility is accepted for loss or damage incurred as a result of 
acting upon its contents or attachments.
The statements and opinions expressed in this email are those of the author and 
do not necessarily reflect those of the employer.



Re: [Dspace-tech] filter-media hanging

2010-10-15 Thread Blanco, Jose
Andrea:

Did you get this message? I'm now thinking I should try this patch:

https://jira.duraspace.org/browse/DS-183

But before I do, I'm wondering if your patch might be easier to install.

Thank you!
Jose

From: Blanco, Jose [mailto:blan...@umich.edu]
Sent: Thursday, October 14, 2010 10:38 AM
To: Andrea Bollini; Sean Carte
Cc: dspace-tech; Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES 
COMPANY]
Subject: Re: [Dspace-tech] filter-media hanging

Andrea:

I was looking for your JIRA Patch for this, and could not find it.  Could you 
direct me to it?

Thank you!
Jose




Re: [Dspace-tech] filter-media hanging

2010-10-14 Thread Blanco, Jose
Andrea:

I was looking for your JIRA Patch for this, and could not find it.  Could you 
direct me to it?

Thank you!
Jose


From: Andrea Bollini [mailto:boll...@cilea.it]
Sent: Friday, July 16, 2010 2:07 AM
To: Sean Carte
Cc: dspace-tech; Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES 
COMPANY]
Subject: Re: [Dspace-tech] filter-media hanging

Another solution could be to update the version of pdfbox. Using the latest version 
of pdfbox we have solved a lot of issues; it could also work in your case.
I will post a patch to JIRA as soon as possible. In the meantime, you just need to 
update the following jars and remove some unused imports in the DSpace code:
bcprov-jdk15-145
bcmail-jdk15-145
icu4j-3_8_1
fontbox-1.1.0
jempbox-1.1.0
pdfbox-1.1.0
Andrea




Re: [Dspace-tech] filter-media hanging

2010-07-16 Thread Andrea Bollini
Another solution could be to update the version of pdfbox. Using the latest
version of pdfbox we have solved a lot of issues; it could also work in
your case.
I will post a patch to JIRA as soon as possible. In the meantime, you just
need to update the following jars and remove some unused imports in the DSpace code:
bcprov-jdk15-145
bcmail-jdk15-145
icu4j-3_8_1
fontbox-1.1.0
jempbox-1.1.0
pdfbox-1.1.0
Andrea
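
A minimal sketch, assuming the classpath string from a `ps` listing is at hand, of how one might check which installed jars differ from the list above. The function names are hypothetical, the classpath excerpt is taken from the `ps` output in the original report, and matching on the text before the first dash is a simplification that happens to work for these six libraries.

```python
# Jar base names copied from the list above.
TARGET_JARS = [
    "bcprov-jdk15-145",
    "bcmail-jdk15-145",
    "icu4j-3_8_1",
    "fontbox-1.1.0",
    "jempbox-1.1.0",
    "pdfbox-1.1.0",
]


def library_name(jar_base):
    """Return the part of a jar base name before its first dash, e.g. 'pdfbox'."""
    return jar_base.split("-", 1)[0]


def jars_to_replace(classpath):
    """Map each target jar to the differently named jars of the same library.

    A target is reported when an older or differently named jar is present,
    or when the target itself is missing from the classpath.
    """
    installed = []
    for entry in classpath.split(":"):
        if entry.endswith(".jar"):
            file_name = entry.rsplit("/", 1)[-1]
            installed.append(file_name[:-len(".jar")])

    report = {}
    for target in TARGET_JARS:
        stale = [jar for jar in installed
                 if library_name(jar) == library_name(target) and jar != target]
        if stale or target not in installed:
            report[target] = stale
    return report


# Excerpt of the classpath shown in the original report (illustrative input).
EXAMPLE_CLASSPATH = (
    ":/dspace/lib/bcmail-jdk14-136.jar:/dspace/lib/bcprov-jdk14-136.jar"
    ":/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/icu4j-3.4.4.jar"
    ":/dspace/lib/jempbox-0.2.0.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/config"
)

for target, stale in jars_to_replace(EXAMPLE_CLASSPATH).items():
    print(f"{target}: replaces {stale or 'nothing (jar is missing)'}")
```

This only compares file names. It does not swap the jars or remove the unused imports mentioned above; those steps still have to be done by hand or with the patch.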


Sean Carte wrote:
 On 16 July 2010 00:25, Thornton, Susan M. (LARC-B702)[RAYTHEON
 TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
   
 We had tons of problems with filter-media until we switched from using 
 PDFBox to XPDF.  With PDFBox ours used to hang too and take 4-EVER to run.  
 Since we've switched over, our filter-media takes a fraction of the time to 
 complete and 100% of our documents filter, except for those that truly are 
 corrupt.

 Take a look at http://www.foolabs.com/xpdf/index.html.  Also Google xpdf 
 AND dspace and you'll find detailed instructions on how to implement it.

 Btw, we are currently running DSpace 1.5.1.

 Good luck,
 Sue
 

 Thanks Sue; I was beginning to think it was just me.

 Sean
   


-- 
Dott. Andrea Bollini
Project Manager, IT Architect & Systems Integrator
Sezione Servizi per le Biblioteche e l'Editoria Elettronica
CILEA, http://www.cilea.it
tel. +39 06-59292853
cel. +39 348-8277525



Re: [Dspace-tech] filter-media hanging

2010-07-15 Thread Sean Carte
On 16 July 2010 00:25, Thornton, Susan M. (LARC-B702)[RAYTHEON
TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
 We had tons of problems with filter-media until we switched from using PDFBox 
 to XPDF.  With PDFBox ours used to hang too and take 4-EVER to run.  Since 
 we've switched over, our filter-media takes a fraction of the time to 
 complete and 100% of our documents filter, except for those that truly are 
 corrupt.

 Take a look at http://www.foolabs.com/xpdf/index.html.  Also Google xpdf AND 
 dspace and you'll find detailed instructions on how to implement it.

 Btw, we are currently running DSpace 1.5.1.

 Good luck,
 Sue

Thanks Sue; I was beginning to think it was just me.

Sean
-- 
Sean Carte
esAL Library Systems Manager
+27 72 898 8775
+27 31 373 2490
fax: 0866741254
http://esal.dut.ac.za/



[Dspace-tech] filter-media hanging

2010-07-14 Thread Sean Carte
I have a problem with filter-media apparently getting stuck processing
a file. It ends up pegging the CPU at 100% until I kill the process.
I've tried leaving it for a few days to complete, but it never does.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
21853 dspace    20   0  411m 292m 8364 S  100  7.2   1782:59 java
27008 dspace    20   0  418m 299m 8368 S  100  7.4 343:01.51 java


r...@ir:~# ps -ef | grep 21853
dspace   21853 21847 99 Jul13 ?        1-05:43:53 java -Xmx256m
-classpath 
:/dspace/lib/activation-1.1.jar:/dspace/lib/bcmail-jdk14-136.jar:/dspace/lib/bcprov-jdk14-136.jar:/dspace/lib/commons-cli-1.0.jar:/dspace/lib/commons-codec-1.3.jar:/dspace/lib/commons-collections-3.2.jar:/dspace/lib/commons-dbcp-1.2.2.jar:/dspace/lib/commons-fileupload-1.2.1.jar:/dspace/lib/commons-io-1.4.jar:/dspace/lib/commons-lang-2.2.jar:/dspace/lib/commons-logging-1.0.4.jar:/dspace/lib/commons-logging-1.0.jar:/dspace/lib/commons-pool-1.4.jar:/dspace/lib/dom4j-1.6.1.jar:/dspace/lib/dspace-api-1.5.3-20090716.011317-5.jar:/dspace/lib/dspace-api-1.5.3-SNAPSHOT.jar:/dspace/lib/dspace-api-lang-1.5.2.1.jar:/dspace/lib/embargo-api-1.0.3.jar:/dspace/lib/embargo-dspace-1.0.3.jar:/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/handle-5.3.4.jar:/dspace/lib/handle-6.2.5.02.jar:/dspace/lib/icu4j-3.4.4.jar:/dspace/lib/jargon-1.4.25.jar:/dspace/lib/jaxen-1.1.jar:/dspace/lib/jdom-1.0.jar:/dspace/lib/jempbox-0.2.0.jar:/dspace/lib/log4j-1.2.14.jar:/dspace/lib/lucene-analyzers-2.3.0.jar:/dspace/lib/lucene-core-2.3.0.jar:/dspace/lib/mail-1.4.jar:/dspace/lib/mets-1.5.2.jar:/dspace/lib/oro-2.0.8.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/lib/poi-2.5.1-final-20040804.jar:/dspace/lib/postgresql-8.1-408.jdbc3.jar:/dspace/lib/rome-0.8.jar:/dspace/lib/tm-extractors-0.4.jar:/dspace/lib/xalan-2.7.0.jar:/dspace/lib/xercesImpl-2.8.1.jar:/dspace/lib/xml-apis-1.3.02.jar:/dspace/lib/xmlParserAPIs-2.0.2.jar:/dspace/config
org.dspace.app.mediafilter.MediaFilterManager
root     28484 18209  0 07:43 pts/1    00:00:00 grep 21853
r...@ir:~# ps -ef | grep 27008
dspace   27008 27002 99 02:00 ?        05:44:04 java -Xmx256m
-classpath 
:/dspace/lib/activation-1.1.jar:/dspace/lib/bcmail-jdk14-136.jar:/dspace/lib/bcprov-jdk14-136.jar:/dspace/lib/commons-cli-1.0.jar:/dspace/lib/commons-codec-1.3.jar:/dspace/lib/commons-collections-3.2.jar:/dspace/lib/commons-dbcp-1.2.2.jar:/dspace/lib/commons-fileupload-1.2.1.jar:/dspace/lib/commons-io-1.4.jar:/dspace/lib/commons-lang-2.2.jar:/dspace/lib/commons-logging-1.0.4.jar:/dspace/lib/commons-logging-1.0.jar:/dspace/lib/commons-pool-1.4.jar:/dspace/lib/dom4j-1.6.1.jar:/dspace/lib/dspace-api-1.5.3-20090716.011317-5.jar:/dspace/lib/dspace-api-1.5.3-SNAPSHOT.jar:/dspace/lib/dspace-api-lang-1.5.2.1.jar:/dspace/lib/embargo-api-1.0.3.jar:/dspace/lib/embargo-dspace-1.0.3.jar:/dspace/lib/fontbox-0.1.0.jar:/dspace/lib/handle-5.3.4.jar:/dspace/lib/handle-6.2.5.02.jar:/dspace/lib/icu4j-3.4.4.jar:/dspace/lib/jargon-1.4.25.jar:/dspace/lib/jaxen-1.1.jar:/dspace/lib/jdom-1.0.jar:/dspace/lib/jempbox-0.2.0.jar:/dspace/lib/log4j-1.2.14.jar:/dspace/lib/lucene-analyzers-2.3.0.jar:/dspace/lib/lucene-core-2.3.0.jar:/dspace/lib/mail-1.4.jar:/dspace/lib/mets-1.5.2.jar:/dspace/lib/oro-2.0.8.jar:/dspace/lib/pdfbox-0.7.3.jar:/dspace/lib/poi-2.5.1-final-20040804.jar:/dspace/lib/postgresql-8.1-408.jdbc3.jar:/dspace/lib/rome-0.8.jar:/dspace/lib/tm-extractors-0.4.jar:/dspace/lib/xalan-2.7.0.jar:/dspace/lib/xercesImpl-2.8.1.jar:/dspace/lib/xml-apis-1.3.02.jar:/dspace/lib/xmlParserAPIs-2.0.2.jar:/dspace/config
org.dspace.app.mediafilter.MediaFilterManager
root     28486 18209  0 07:43 pts/1    00:00:00 grep 27008
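
Until the underlying cause is fixed, one way to stop a stuck run like the two processes above from holding a CPU for days is to launch it under a time limit. The following is a minimal sketch, assuming a POSIX system; `run_with_timeout` is a hypothetical helper, not part of DSpace. It starts the command in its own process group so that a wrapper script and any JVM it launched are killed together.

```python
import os
import signal
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command in its own process group and kill the group on timeout.

    Returns (finished, output); finished is False when the run was killed.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # POSIX only: the child leads a new process group
    )
    try:
        output, _ = process.communicate(timeout=timeout_seconds)
        return True, output
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole group, not just the direct child.
            os.killpg(process.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        output, _ = process.communicate()
        return False, output
```

A scheduled job could call `run_with_timeout(["/dspace/bin/filter-media", "-v"], 6 * 60 * 60)` and log the captured output when `finished` is false. The six-hour limit is an arbitrary illustrative value, and this only contains the symptom; the bitstream that triggers the hang still has to be identified.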

I've tried running it manually with the -v switch, but that doesn't
offer me any clues as to the problem bitstream:

dsp...@ir:~$ /dspace/bin/filter-media -v
Applying Media Filters
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.PDFFilter
org.dspace.app.mediafilter.PDFFilter
SKIPPED: bitstream 3640 (item: 10321/287) because 'Matkovich_2004.pdf.txt' already exists
...
SKIPPED: bitstream 2164 (item: 10321/180) because 'TITLE PAGE.pdf.txt' already exists
ERROR filtering, skipping bitstream:

Item Handle: 10321/460
Bundle Name: ORIGINAL
File Size: 567170
Checksum: 9e17b9fd124ac43b34390203fb164f9c (MD5)
Asset Store: 0
java.io.EOFException: Unexpected end of ZLIB input stream
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
at