Hi Furkan,

This is the configuration I have: -Xms1024m and -Xmx1024m are allocated
in the start-options.env.unix and start-options.env.win files.
Server configuration:
1) Crawler server (running ManifoldCF) - 16 GB RAM, 8-core Intel(R) Xeon(R)
CPU E5-2660 v3 @ 2.60GHz.

2) Elasticsearch server - 48 GB RAM, 1-core Intel(R) Xeon(R) CPU E5-2660
v3 @ 2.60GHz. I am using Postgres as the database.

I was facing this error even when this parameter was set to -Xms512m.
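
For reference, these heap settings live in the start-options files, one JVM
option per line; the relevant lines in my setup look like this (a sketch;
exact file contents may vary by install):

    -Xms1024m
    -Xmx1024m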

Thanks and Regards
Priya

On Fri, Aug 16, 2019 at 4:21 PM Furkan KAMACI <[email protected]> wrote:

> Hi Priya,
>
> What are your GC params, and does it throw the error on a particular Zip
> file? Check its size and, if it is large, consider limiting the maximum
> file size you allow to be ingested.
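>
> For instance, GC activity can be logged by adding standard HotSpot flags
> to the start-options file (a sketch; the log path is just an example):
>
>     -verbose:gc
>     -XX:+PrintGCDetails
>     -Xloggc:/tmp/mcf-gc.log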
>
> Kind Regards,
> Furkan KAMACI
>
> On Fri, Aug 16, 2019 at 1:40 PM Karl Wright <[email protected]> wrote:
>
> > I see nothing indicating any single Tika extraction content type.  It's
> > basically just unhappy with heap fragmentation and is GC'ing too
> > frequently.  I would suggest just increasing the amount of memory you
> > give the process as an experiment.  This might allow it to succeed.
> >
> > MCF uses the principle of "bounded memory", which means that no
> > connector may pull whole documents into memory; each must stay within a
> > limit.  But there is no stipulation as to any specific limit that each
> > connector must work within.  Some connectors, therefore, use a lot more
> > memory than others, and Tika is one of the ones that can use a lot.  But
> > it is still bounded unless there's a bug, so just try increasing the
> > heap for a start.
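> >
> > (As a sketch of that experiment: raise the two heap lines in
> > start-options.env.unix, e.g.
> >
> >     -Xms2048m
> >     -Xmx2048m
> >
> > and restart the process.  2048m is only an illustration; pick whatever
> > the 16 GB crawler box can spare.)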
> >
> > Karl
> >
> >
> > On Fri, Aug 16, 2019 at 6:25 AM Priya Arora <[email protected]> wrote:
> >
> > > Please find the error stack trace below:
> > >
> > > ERROR: agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >         at java.util.HashMap.newNode(HashMap.java:1750)
> > >         at java.util.HashMap.putVal(HashMap.java:631)
> > >         at java.util.HashMap.put(HashMap.java:612)
> > >         at org.apache.manifoldcf.connectorcommon.fuzzyml.HTMLParseState.noteTag(HTMLParseState.java:51)
> > >         at org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState.dealWithCharacter(TagParseState.java:638)
> > >         at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver.dealWithCharacters(SingleCharacterReceiver.java:51)
> > >         at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:48)
> > >         at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithoutCharsetDetection(Parser.java:99)
> > >         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:4918)
> > >         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3852)
> > >         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:747)
> > >         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >         at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
> > >         at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
> > >         at org.apache.commons.compress.archivers.zip.ZipFile.resolveLocalFileHeaderData(ZipFile.java:1059)
> > >         at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:296)
> > >         at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:218)
> > >         at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:201)
> > >         at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:162)
> > >         at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:241)
> > >         at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:173)
> > >         at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:110)
> > >         at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> > >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> > >         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
> > >         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
> > >         at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
> > >         at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
> > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > >         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
> > >         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
> > >         at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:280)
> > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > >         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> > >         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> > >         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > [Thread-491] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3a4621bd{HTTP/1.1}{0.0.0.0:8345}
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > [Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6a57ae10{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3323783172971878700.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-api-service.war}
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > agents process ran out of memory - shutting down
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > [Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@51c693d{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3706951886687463454.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-authority-service.war}
> > >
> > > On Fri, Aug 16, 2019 at 3:22 PM Karl Wright <[email protected]> wrote:
> > >
> > > > Without an out-of-memory stack trace, I cannot definitively point to
> > > > Tika or say that it's a specific kind of file.  Please send one.
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Fri, Aug 16, 2019 at 2:09 AM Priya Arora <[email protected]> wrote:
> > > >
> > > > > *Existing threads/connections configuration:*
> > > > >
> > > > > How many worker threads do you have? - 15 worker threads have been
> > > > > allocated (in the properties.xml file).
> > > > > And for the Tika Extractor, 10 connections are defined.
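> > > > >
> > > > > For reference, I believe the worker thread count in properties.xml is
> > > > > set with a line like the one below (please correct me if this is not
> > > > > the right property):
> > > > >
> > > > >     <property name="org.apache.manifoldcf.crawler.threads" value="15"/>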
> > > > >
> > > > > Do you suggest reducing these numbers further?
> > > > > If not, what else could be a solution?
> > > > >
> > > > > Thanks
> > > > > Priya
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2019 at 5:32 PM Karl Wright <[email protected]> wrote:
> > > > >
> > > > > > How many worker threads do you have?
> > > > > > Even if each worker thread is constrained in memory, and they
> > > > > > should be, you can easily cause things to run out of memory by
> > > > > > configuring too many worker threads.  Another way to keep Tika's
> > > > > > usage constrained would be to reduce the number of Tika Extractor
> > > > > > connections, because that effectively limits the number of
> > > > > > extractions that can be going on at the same time.
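> > > > > >
> > > > > > (Rough arithmetic, purely illustrative: with a 1024m heap and 10
> > > > > > concurrent Tika extractions, each in-flight document has on the
> > > > > > order of 100m to work with before connector buffers, detector
> > > > > > state, and ZIP entry tables are counted, so a few large archives
> > > > > > at once can exhaust the heap.)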
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 7:23 AM Priya Arora <[email protected]> wrote:
> > > > > >
> > > > > > > Yes, I am using the Tika Extractor, and the ManifoldCF version
> > > > > > > used is 2.13.
> > > > > > > I am also using Postgres as the database.
> > > > > > >
> > > > > > > I have 4 types of jobs.
> > > > > > > One accesses/re-crawls data from a public site; the other three
> > > > > > > access an intranet site.
> > > > > > > Two of those give me correct output without any error, and the
> > > > > > > third, which has more data than the other two, gives me this
> > > > > > > error.
> > > > > > >
> > > > > > > Is there any possibility of a site accessibility issue? Can you
> > > > > > > please suggest a solution?
> > > > > > > Thanks and regards
> > > > > > > Priya
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 3:11 PM Karl Wright <[email protected]> wrote:
> > > > > > >
> > > > > > > > I will need to know more.  Do you have the Tika extractor in
> > > > > > > > your pipeline?  If so, what version of ManifoldCF are you
> > > > > > > > using?  Tika has had bugs related to memory consumption in the
> > > > > > > > past; the out-of-memory exception may be coming from it, and
> > > > > > > > therefore a stack trace is critical to have.
> > > > > > > >
> > > > > > > > Alternatively, you can upgrade to the latest version of MCF
> > > > > > > > (2.13), which has a newer version of Tika without those
> > > > > > > > problems.  But you may need to give the agents process more
> > > > > > > > memory.
> > > > > > > >
> > > > > > > > Another possible cause is that you're using HSQLDB in
> > > > > > > > production.  HSQLDB keeps all of its tables in memory.  If you
> > > > > > > > have a large crawl, you do not want to use HSQLDB.
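> > > > > > > >
> > > > > > > > As a sketch, the database implementation is chosen in
> > > > > > > > properties.xml; a PostgreSQL setup typically carries a line
> > > > > > > > like the one below, so you can verify you are not on the
> > > > > > > > default HSQLDB:
> > > > > > > >
> > > > > > > >     <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>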
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Aug 14, 2019 at 3:41 AM Priya Arora <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Karl,
> > > > > > > > >
> > > > > > > > > The ManifoldCF log shows an error like:
> > > > > > > > > agents process ran out of memory - shutting down
> > > > > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > > > > >
> > > > > > > > > I have -Xms1024m and -Xmx1024m memory allocated in the
> > > > > > > > > start-options.env.unix and start-options.env.win files.
> > > > > > > > > Also, the configuration:
> > > > > > > > > 1) Crawler server - 16 GB RAM and 8-core Intel(R) Xeon(R) CPU
> > > > > > > > > E5-2660 v3 @ 2.60GHz.
> > > > > > > > >
> > > > > > > > > 2) Elasticsearch server - 48 GB and 1-core Intel(R) Xeon(R)
> > > > > > > > > CPU E5-2660 v3 @ 2.60GHz.  I am using Postgres as the
> > > > > > > > > database.
> > > > > > > > >
> > > > > > > > > Can you please help me out with what to do in this case?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Priya
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Wed, Aug 14, 2019 at 12:33 PM Karl Wright <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > The error occurs, I believe, as the result of basic
> > > > > > > > > > connection problems, e.g. the connection is getting
> > > > > > > > > > rejected.  You can find more information in the simple
> > > > > > > > > > history, and in the manifoldcf log.
> > > > > > > > > >
> > > > > > > > > > I would like to know the underlying cause, since the
> > > > > > > > > > connector should be resilient against errors of this kind.
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 14, 2019, 1:46 AM Priya Arora <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Karl,
> > > > > > > > > > >
> > > > > > > > > > > I have a Web repository connector (seeds: an intranet
> > > > > > > > > > > site), and the job is on the production server.
> > > > > > > > > > >
> > > > > > > > > > > When I ran the job on PROD, the job stopped itself 2 times
> > > > > > > > > > > with the error:
> > > > > > > > > > > Unexpected HTTP result code: -1: null.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Can you please give me an idea of why this happens?
> > > > > > > > > > >
> > > > > > > > > > > Thanks and regards
> > > > > > > > > > > Priya Arora
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
