Hi Heinz,

A few months ago, my filter-media command threw out-of-memory errors from
time to time and eventually crashed the system. I did two things, and it has
not happened since.

1. Increase the command-line Java memory by editing /dspace/bin/dspace:

if [ "$JAVA_OPTS" = "" ]; then
  #Default Java to use 256MB of memory
  #JAVA_OPTS="-Xmx256m -Dfile.encoding=UTF-8"
  #increased to 1024M, August 24, 2017
  JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
fi
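
Because the script only sets JAVA_OPTS when it is empty, the same effect can
probably be had for a single run by setting the variable in the environment
instead of editing the file. A minimal sketch, assuming a POSIX shell;
resolve_java_opts is a hypothetical helper that only mirrors the check above
(it is not part of DSpace), and the 2048m value is just an illustration:

```shell
#!/bin/sh
# Hypothetical helper mirroring the check in /dspace/bin/dspace:
# keep a caller-supplied JAVA_OPTS, otherwise fall back to a default.
resolve_java_opts() {
  if [ "$JAVA_OPTS" = "" ]; then
    JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
  fi
  printf '%s\n' "$JAVA_OPTS"
}

# Nothing set in the environment: the default applies.
( unset JAVA_OPTS; resolve_java_opts )

# A value supplied by the caller is kept as-is (illustrative value).
( JAVA_OPTS="-Xmx2048m -Dfile.encoding=UTF-8"; resolve_java_opts )
```

With the real script, that would be along the lines of
JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" /dspace/bin/dspace filter-media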

2. My /var was almost full. I had to delete the following two cache files in
/var/cache/tomcat/work/Catalina/localhost/_/cache-dir:

-rw-r--r-- 1 dspace dspace 958M Aug 24 13:18 cocoon-ehcache.data
-rw-r--r-- 1 dspace dspace  27M Aug 24 13:18 cocoon-ehcache.index

Eventually, though, I increased /var from 4G to 8G.
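
To catch this before it becomes a problem, something like the following
could report how full /var is and how large the cache directory has grown.
This is only a sketch: usage_percent is a hypothetical helper, and the 90
percent threshold is an arbitrary assumption, not a DSpace or Tomcat setting.

```shell
#!/bin/sh
# Hypothetical helper: print the used-space percentage (digits only)
# of the filesystem that holds the given path.
usage_percent() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

CACHE_DIR=/var/cache/tomcat/work/Catalina/localhost/_/cache-dir
THRESHOLD=90   # assumed warning level, adjust to taste

# Report the cache size only if the directory exists on this machine.
if [ -d "$CACHE_DIR" ]; then
  du -sh "$CACHE_DIR"
fi

# Check /var if present, otherwise fall back to /.
TARGET=/var
[ -d "$TARGET" ] || TARGET=/

USED=$(usage_percent "$TARGET")
if [ "$USED" -ge "$THRESHOLD" ]; then
  echo "WARNING: $TARGET is ${USED}% full"
fi
```

Run from cron, this would give some warning before the cache files fill the
partition again.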



Hope that helps!



Yongming

On Thu, Nov 16, 2017 at 6:56 AM, Heinz Gnehm <[email protected]> wrote:

> Dear all
>
> We're using DSpace version 6.0 on a Windows Server 2012 and we currently
> store over 300'000 items in our repository (each item has two bitstreams, a
> PDF document and a digital timestamp).
>
>    DSpace version:  6.0
>      SCM revision:  0fea17436854acf9048b0e11fbf988333ea02956
>        SCM branch:  UNKNOWN
>                OS:  Windows Server 2012 R2(amd64) version 6.3
>         Discovery:  enabled.
>               JRE:  Oracle Corporation version 1.8.0_131
>       Ant version:  Apache Ant(TM) version 1.10.0 compiled on December 27
> 2016
>     Maven version:  3.3.9
>       DSpace home:  D:\dspace
>
> We also use the full text search engine of Apache Solr and therefore we
> start a batch process with a filter-media command every night. The log file
> of this process reports creating the respective text files and thumbnail
> images from the PDF documents (as seen below).
>
>    FILTERED: bitstream 13e851e4-59e1-46e1-98a6-27afad4c7382 (item:
> archivsuisse/92351) and created '4403_Y_20170721T122639.PDF.txt'
>    File: 4403_Y_20170721T122639.PDF.jpg
>    FILTERED: bitstream 13e851e4-59e1-46e1-98a6-27afad4c7382 (item:
> archivsuisse/92351) and created '4403_Y_20170721T122639.PDF.jpg'
>    File: 4406_B_20170721T121321.PDF.txt
>    FILTERED: bitstream 202d2e4e-33e6-47af-baba-41c1b58a938c (item:
> archivsuisse/92359) and created '4406_B_20170721T121321.PDF.txt'
>    File: 4406_B_20170721T121321.PDF.jpg
>    FILTERED: bitstream 202d2e4e-33e6-47af-baba-41c1b58a938c (item:
> archivsuisse/92359) and created '4406_B_20170721T121321.PDF.jpg'
>    File: 4407_Y_20170721T121334.PDF.txt
>    FILTERED: bitstream 17fdd605-3a14-49de-882c-e38c9d6b57b0 (item:
> archivsuisse/92361) and created '4407_Y_20170721T121334.PDF.txt'
>    File: 4407_Y_20170721T121334.PDF.jpg
>
> But when I check the processed items in DSpace, no text file or thumbnail
> image has been created and the full text of the PDF documents never gets
> indexed by the full text search engine Apache Solr. Only a fraction of the
> items have a generated text document and for a few months no new text
> documents have been added to the DSpace items and the numbers in the
> Discovery widget in DSpace JSPUI haven't changed in a while.
>
>    Discover
>    Has File(s)
>    333641 false
>    8970 true
>
> When I try to filter a single item with the following command
>
>    dspace filter-media -m 1 -v
>
> the process is first skipping over the first items which have already been
> filtered and at the first item to be filtered it stops and never finishes.
>
>    D:\dspace\bin>dspace filter-media -m 1 -v
>    Using DSpace installation in: D:\dspace
>    Invalid maximum value '1' - ignoring
>    The following MediaFilters are enabled:
>    Full Filter Name: org.dspace.app.mediafilter.WordFilter
>    org.dspace.app.mediafilter.WordFilter
>    Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
>    org.dspace.app.mediafilter.JPEGFilter
>    Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
>    org.dspace.app.mediafilter.PowerPointFilter
>    Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
>    org.dspace.app.mediafilter.HTMLFilter
>    Full Filter Name: org.dspace.app.mediafilter.ExcelFilter
>    org.dspace.app.mediafilter.ExcelFilter
>    Full Filter Name: org.dspace.app.mediafilter.PDFFilter
>    org.dspace.app.mediafilter.PDFFilter
>    Full Filter Name: org.dspace.app.mediafilter.PDFBoxThumbnail
>    org.dspace.app.mediafilter.PDFBoxThumbnail
>    SKIPPED: bitstream 170bc089-2aba-4f38-8f5f-44cfb91092c6 (item:
> 123456789/72) because 'py-tutorial-de.pdf.txt' already exists
>    SKIPPED: bitstream 170bc089-2aba-4f38-8f5f-44cfb91092c6 (item:
> 123456789/72) because 'py-tutorial-de.pdf.jpg' already exists
>    SKIPPED: bitstream b56999cd-ef41-4bbf-a18a-6abf0d2be4a7 (item:
> 123456789/95) because '200001_gelb.pdf.txt' already exists
>    SKIPPED: bitstream b56999cd-ef41-4bbf-a18a-6abf0d2be4a7 (item:
> 123456789/95) because '200001_gelb.pdf.jpg' already exists
>    SKIPPED: bitstream cc04eb6e-5860-4359-8374-dc751ba07ed2 (item:
> 123456789/96) because '200001_grün.pdf.txt' already exists
>    SKIPPED: bitstream cc04eb6e-5860-4359-8374-dc751ba07ed2 (item:
> 123456789/96) because '200001_grün.pdf.jpg' already exists
>    SKIPPED: bitstream 833ee548-76e9-4ef3-9f52-dbffc3a28df3 (item:
> 123456789/97) because '200001_rosa.pdf.txt' already exists
>    SKIPPED: bitstream 833ee548-76e9-4ef3-9f52-dbffc3a28df3 (item:
> 123456789/97) because '200001_rosa.pdf.jpg' already exists
>    SKIPPED: bitstream d6fc9e85-a85d-427e-9907-ba3aecff34fc (item:
> 123456789/98) because '200001_rot.pdf.txt' already exists
>    SKIPPED: bitstream d6fc9e85-a85d-427e-9907-ba3aecff34fc (item:
> 123456789/98) because '200001_rot.pdf.jpg' already exists
>    PROCESSING: bitstream e7d9a3dd-494d-461c-9320-513f8a68f773 (item:
> archivsuisse/51880)
>    File: 96020_P_20170707T142205.PDF.txt
>    FILTERED: bitstream e7d9a3dd-494d-461c-9320-513f8a68f773 (item:
> archivsuisse/51880) and created '96020_P_20170707T142205.PDF.txt'
>
> I also checked the DSpace log files but can find no warnings or errors. I
> don't know what could have caused DSpace to stop filtering our PDF
> documents and what can be done to fix this problem. Are there any other log
> files I can check?
>
> Any help is greatly appreciated.
>
> Heinz Gnehm
> archivsuisse AG
> Bernstrasse 23
> 3122 Kehrsatz
> Switzerland
>
> --
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Yongming Wang
The College of New Jersey
tel: 609-771-3337
email: [email protected]
