Please find the error stack traces below:
ERROR: agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newNode(HashMap.java:1750)
    at java.util.HashMap.putVal(HashMap.java:631)
    at java.util.HashMap.put(HashMap.java:612)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.HTMLParseState.noteTag(HTMLParseState.java:51)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState.dealWithCharacter(TagParseState.java:638)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver.dealWithCharacters(SingleCharacterReceiver.java:51)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:48)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithoutCharsetDetection(Parser.java:99)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:4918)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3852)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:747)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
    at org.apache.commons.compress.archivers.zip.ZipFile.resolveLocalFileHeaderData(ZipFile.java:1059)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:296)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:218)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:201)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:162)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:241)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:173)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:110)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
    at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
    at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3a4621bd{HTTP/1.1}{0.0.0.0:8345}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6a57ae10{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3323783172971878700.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-api-service.war}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@51c693d{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3706951886687463454.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-authority-service.war}
On Fri, Aug 16, 2019 at 3:22 PM Karl Wright <[email protected]> wrote:
> Without an out-of-memory stack trace, I cannot definitively point to Tika
> or say that it's a specific kind of file. Please send one.
>
> Karl
>
>
> On Fri, Aug 16, 2019 at 2:09 AM Priya Arora <[email protected]> wrote:
>
> > *Existing threads/connections configuration:*
> >
> > How many worker threads do you have? - 15 worker threads have been
> > allocated (in the properties.xml file).
> > And for the Tika Extractor, 10 connections are defined.
> >
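> > For reference, this is how the worker thread allocation looks in
> > properties.xml (a minimal sketch, assuming the standard
> > org.apache.manifoldcf.crawler.threads property name):
> >
> >   <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
> >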
> > Is it suggested to reduce these numbers further?
> > If not, what else could be a solution?
> >
> > Thanks
> > Priya
> >
> >
> >
> > On Wed, Aug 14, 2019 at 5:32 PM Karl Wright <[email protected]> wrote:
> >
> > > How many worker threads do you have?
> > > Even if each worker thread is constrained in memory, and they should be,
> > > you can easily cause things to run out of memory by configuring too many
> > > worker threads. Another way to keep Tika's usage constrained would be to
> > > reduce the number of Tika Extractor connections, because that effectively
> > > limits the number of extractions that can be going on at the same time.
> > >
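> > > As a rough illustration of why concurrency matters here (a sketch only;
> > > the per-document overhead below is a purely hypothetical, assumed number,
> > > not a measured one):
> > >
> > >   // Hypothetical sizing sketch, not ManifoldCF code: worst-case parse
> > >   // memory grows with the number of simultaneous extractions.
> > >   public class MemorySketch {
> > >     public static void main(String[] args) {
> > >       int workerThreads = 15;     // worker threads from properties.xml
> > >       int perDocOverheadMb = 64;  // assumed parser overhead per document
> > >       // ~960 MB of worst-case parse state vs. a 1024 MB (-Xmx1024m) heap
> > >       System.out.println((workerThreads * perDocOverheadMb) + " MB peak");
> > >     }
> > >   }
> > >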
> > > Karl
> > >
> > >
> > > On Wed, Aug 14, 2019 at 7:23 AM Priya Arora <[email protected]> wrote:
> > >
> > > > Yes, I am using the Tika Extractor, and the ManifoldCF version is 2.13.
> > > > I am also using Postgres as the database.
> > > >
> > > > I have 4 types of jobs.
> > > > One is accessing/re-crawling data from a public site; the other three
> > > > are accessing intranet sites.
> > > > Two of these are giving me correct output without any error, and the
> > > > third one, which has more data than the other two, is giving me this
> > > > error.
> > > >
> > > > Is there any possibility of a site accessibility issue? Can you please
> > > > suggest some solution?
> > > > Thanks and regards
> > > > Priya
> > > >
> > > > On Wed, Aug 14, 2019 at 3:11 PM Karl Wright <[email protected]> wrote:
> > > >
> > > > > I will need to know more. Do you have the Tika extractor in your
> > > > > pipeline? If so, what version of ManifoldCF are you using? Tika has
> > > > > had bugs related to memory consumption in the past; the out-of-memory
> > > > > exception may be coming from it, and therefore a stack trace is
> > > > > critical to have.
> > > > >
> > > > > Alternatively, you can upgrade to the latest version of MCF (2.13),
> > > > > which has a newer version of Tika without those problems. But you may
> > > > > need to give the agents process more memory.
> > > > >
> > > > > Another possible cause is that you're using HSQLDB in production.
> > > > > HSQLDB keeps all of its tables in memory. If you have a large crawl,
> > > > > you do not want to use HSQLDB.
> > > > >
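> > > > > A minimal sketch of pointing properties.xml at PostgreSQL instead of
> > > > > HSQLDB (assuming the standard ManifoldCF property names; the
> > > > > credentials are placeholders):
> > > > >
> > > > >   <property name="org.apache.manifoldcf.databaseimplementationclass"
> > > > >             value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
> > > > >   <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
> > > > >   <property name="org.apache.manifoldcf.dbsuperuserpassword" value="*****"/>
> > > > >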
> > > > > Thanks,
> > > > > Karl
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2019 at 3:41 AM Priya Arora <[email protected]> wrote:
> > > > >
> > > > > > Hi Karl,
> > > > > >
> > > > > > The ManifoldCF log shows an error like:
> > > > > > agents process ran out of memory - shutting down
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > >
> > > > > > I also have -Xms1024m, -Xmx1024m memory allocated in the
> > > > > > start-options.env.unix and start-options.env.win files.
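> > > > > > For reference, those heap settings appear in start-options.env.unix
> > > > > > as one JVM option per line:
> > > > > >
> > > > > >   -Xms1024m
> > > > > >   -Xmx1024m
> > > > > >
> > > > > > Raising these values (for example to -Xmx4096m, an illustrative
> > > > > > number only) would be one way to give the agents process more
> > > > > > headroom.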
> > > > > > The server configuration:
> > > > > > 1) Crawler server - 16 GB RAM and an 8-core Intel(R) Xeon(R) CPU
> > > > > > E5-2660 v3 @ 2.60GHz.
> > > > > >
> > > > > > 2) Elasticsearch server - 48 GB RAM and a 1-core Intel(R) Xeon(R)
> > > > > > CPU E5-2660 v3 @ 2.60GHz. I am using Postgres as the database.
> > > > > >
> > > > > > Can you please help me out with what to do in this case?
> > > > > >
> > > > > > Thanks
> > > > > > Priya
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 12:33 PM Karl Wright <[email protected]> wrote:
> > > > > >
> > > > > > > The error occurs, I believe, as the result of basic connection
> > > > > > > problems, e.g. the connection is getting rejected. You can find
> > > > > > > more information in the simple history, and in the manifoldcf log.
> > > > > > >
> > > > > > > I would like to know the underlying cause, since the connector
> > > > > > > should be resilient against errors of this kind.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019, 1:46 AM Priya Arora <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > I have a Web repository connector (seeds: an intranet site),
> > > > > > > > and the job is on the production server.
> > > > > > > >
> > > > > > > > When I ran the job on PROD, the job stopped itself 2 times with
> > > > > > > > the error "Unexpected HTTP result code: -1: null."
> > > > > > > >
> > > > > > > > Can you please give me an idea of why this happens?
> > > > > > > >
> > > > > > > > Thanks and regards
> > > > > > > > Priya Arora
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>