Hi Karl,

Thanks for your answer and the quick fix.

Best regards,

Olivier 

> Le 13 mai 2019 à 19:46, Karl Wright <daddy...@gmail.com> a écrit :
> 
> I committed a fix for this hang to trunk.
> Karl
> 
> 
> On Mon, May 13, 2019 at 11:40 AM Karl Wright <daddy...@gmail.com> wrote:
> 
>> Thanks -- this shows a classic deadlock occurring, where the System.exit()
>> call effectively blocks the thread that does it from exiting, and thus
>> blocks the system from shutting down.  This is a change of behavior in the
>> JVM; not sure when it happened, but this didn't happen before.  But it
>> easily addressed.
>> 
>> Karl
>> 
>> 
>> On Mon, May 13, 2019 at 10:58 AM Olivier Tavard <
>> olivier.tav...@francelabs.com> wrote:
>> 
>>> Hi Karl,
>>> 
>>> Thank you for your mail.
>>> I installed an ootb MCF 2.13 to have a clean installation. I launched the
>>> File-based multiprocess example and I did the previous tests I mentioned.
>>> I managed to see the error that you told me about :
>>> 
>>> agents process ran out of memory - shutting down
>>> java.lang.OutOfMemoryError: Java heap space
>>>        at com.mysql.jdbc.Buffer.ensureCapacity(Buffer.java:155)
>>>        at com.mysql.jdbc.Buffer.writeBytesNoNull(Buffer.java:513)
>>>        at
>>> com.mysql.jdbc.MysqlIO.readRemainingMultiPackets(MysqlIO.java:3116)
>>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2971)
>>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
>>>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
>>>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910)
>>>        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1405)
>>>        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
>>>        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
>>>        at
>>> com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
>>>        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
>>>        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
>>>        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
>>>        at
>>> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
>>>        at
>>> com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1885)
>>>        at
>>> org.apache.manifoldcf.jdbc.JDBCConnection$PreparedStatementQueryThread.run(JDBCConnection.java:1392)
>>> 
>>> But the mcf-agent process is still active, it was not shut down.
>>> I did a new thread dump, I obtained this :
>>> Full thread dump OpenJDK 64-Bit Server VM (25.212-b01 mixed mode):
>>> 
>>> "Shutdown thread" #8 prio=5 os_prio=0 tid=0x00007f8ccc014000 nid=0x436a
>>> in Object.wait() [0x00007f8c2a9e8000]
>>>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCF.sleep(ManifoldCF.java:1248)
>>> - locked <0x00000000e0930708> (a java.lang.Integer)
>>> at
>>> org.apache.manifoldcf.crawler.system.CrawlerAgent.stopSystem(CrawlerAgent.java:617)
>>> at
>>> org.apache.manifoldcf.crawler.system.CrawlerAgent.stopAgent(CrawlerAgent.java:249)
>>> at
>>> org.apache.manifoldcf.agents.system.AgentsDaemon.stopAgents(AgentsDaemon.java:168)
>>> - locked <0x00000000eab32808> (a java.util.HashMap)
>>> at
>>> org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsShutdownHook.doCleanup(AgentsDaemon.java:395)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCF.cleanUpEnvironment(ManifoldCF.java:1540)
>>> - locked <0x00000000eab24af8> (a java.util.ArrayList)
>>> - locked <0x00000000eab24ca0> (a java.lang.Integer)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCF$ShutdownThread.run(ManifoldCF.java:1718)
>>> 
>>> "MySQL Statement Cancellation Timer" #2567 daemon prio=5 os_prio=0
>>> tid=0x00007f8c84061800 nid=0x4090 in Object.wait() [0x00007f8c2aeed000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at java.lang.Object.wait(Object.java:502)
>>> at java.util.TimerThread.mainLoop(Timer.java:526)
>>> - locked <0x00000000ebe79d20> (a java.util.TaskQueue)
>>> at java.util.TimerThread.run(Timer.java:505)
>>> 
>>> "Connection pool reaper" #2565 daemon prio=5 os_prio=0
>>> tid=0x00007f8cf809d000 nid=0x408e waiting on condition [0x00007f8c2aceb000]
>>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> at java.lang.Thread.sleep(Native Method)
>>> at
>>> org.apache.manifoldcf.core.jdbcpool.ConnectionPoolManager$ConnectionCloserThread.run(ConnectionPoolManager.java:152)
>>> 
>>> "Worker thread '0'" #66 daemon prio=5 os_prio=0 tid=0x00007f8cfc209800
>>> nid=0x35d8 in Object.wait() [0x00007f8d0c6f4000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at java.lang.Thread.join(Thread.java:1252)
>>> - locked <0x00000000eab24b28> (a
>>> org.apache.manifoldcf.core.system.ManifoldCF$ShutdownThread)
>>> at java.lang.Thread.join(Thread.java:1326)
>>> at
>>> java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:107)
>>> at
>>> java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>>> at java.lang.Shutdown.runHooks(Shutdown.java:123)
>>> at java.lang.Shutdown.sequence(Shutdown.java:167)
>>> at java.lang.Shutdown.exit(Shutdown.java:212)
>>> - locked <0x00000000eaae4150> (a java.lang.Class for java.lang.Shutdown)
>>> at java.lang.Runtime.exit(Runtime.java:109)
>>> at java.lang.System.exit(System.java:971)
>>> at
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:851)
>>> 
>>> "Connection pool reaper" #13 daemon prio=5 os_prio=0
>>> tid=0x00007f8cfc04b000 nid=0x35a0 waiting on condition [0x00007f8d2c47b000]
>>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> at java.lang.Thread.sleep(Native Method)
>>> at
>>> org.apache.manifoldcf.core.jdbcpool.ConnectionPoolManager$ConnectionCloserThread.run(ConnectionPoolManager.java:152)
>>> 
>>> "Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f8d3c0b6000
>>> nid=0x359c runnable [0x0000000000000000]
>>>   java.lang.Thread.State: RUNNABLE
>>> 
>>> "C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f8d3c0b3000
>>> nid=0x359b waiting on condition [0x0000000000000000]
>>>   java.lang.Thread.State: RUNNABLE
>>> 
>>> "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f8d3c0b0800
>>> nid=0x359a waiting on condition [0x0000000000000000]
>>>   java.lang.Thread.State: RUNNABLE
>>> 
>>> "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f8d3c0af000
>>> nid=0x3599 waiting on condition [0x0000000000000000]
>>>   java.lang.Thread.State: RUNNABLE
>>> 
>>> "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f8d3c07b800 nid=0x3598
>>> in Object.wait() [0x00007f8d410d3000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
>>> - locked <0x00000000eaaf2038> (a java.lang.ref.ReferenceQueue$Lock)
>>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
>>> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
>>> 
>>> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f8d3c079000
>>> nid=0x3597 in Object.wait() [0x00007f8d411d4000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at java.lang.Object.wait(Object.java:502)
>>> at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
>>> - locked <0x00000000eaaf21f0> (a java.lang.ref.Reference$Lock)
>>> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
>>> 
>>> "main" #1 prio=5 os_prio=0 tid=0x00007f8d3c00b800 nid=0x3595 in
>>> Object.wait() [0x00007f8d45902000]
>>>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCF.sleep(ManifoldCF.java:1248)
>>> - locked <0x00000000e26ec910> (a java.lang.Integer)
>>> at
>>> org.apache.manifoldcf.agents.system.AgentsDaemon.runAgents(AgentsDaemon.java:115)
>>> at org.apache.manifoldcf.agents.AgentRun.doExecute(AgentRun.java:64)
>>> at
>>> org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:37)
>>> at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
>>> 
>>> "VM Thread" os_prio=0 tid=0x00007f8d3c06f800 nid=0x3596 runnable
>>> 
>>> "VM Periodic Task Thread" os_prio=0 tid=0x00007f8d3c0b9000 nid=0x359d
>>> waiting on condition
>>> 
>>> JNI global references: 13
>>> 
>>> Heap
>>> def new generation   total 157248K, used 39902K [0x00000000e0000000,
>>> 0x00000000eaaa0000, 0x00000000eaaa0000)
>>>  eden space 139776K,  28% used [0x00000000e0000000, 0x00000000e26f7a80,
>>> 0x00000000e8880000)
>>>  from space 17472K,   0% used [0x00000000e9990000, 0x00000000e9990000,
>>> 0x00000000eaaa0000)
>>>  to   space 17472K,   0% used [0x00000000e8880000, 0x00000000e8880000,
>>> 0x00000000e9990000)
>>> tenured generation   total 349568K, used 173322K [0x00000000eaaa0000,
>>> 0x0000000100000000, 0x0000000100000000)
>>>   the space 349568K,  49% used [0x00000000eaaa0000, 0x00000000f53e2ac8,
>>> 0x00000000f53e2c00, 0x0000000100000000)
>>> Metaspace       used 16548K, capacity 16750K, committed 17024K, reserved
>>> 1064960K
>>>  class space    used 1806K, capacity 1891K, committed 1920K, reserved
>>> 1048576K
>>> 
>>> Best regards,
>>> 
>>> Olivier
>>> 
>>>> Le 7 mai 2019 à 13:02, Karl Wright <daddy...@gmail.com> a écrit :
>>>> 
>>>> The thread dump is not helpful.  I see no traces for any threads that
>>>> identify code from MCF at all.  That's new and has nothing to do with
>>> MCF;
>>>> it's probably a new Java "feature".
>>>> 
>>>> The OutOfMemoryException should, I've confirmed, be rethrown.  The
>>> Worker
>>>> Thread code does in fact catch the exception and terminate the process:
>>>> 
>>>>>>>>>> 
>>>>       catch (OutOfMemoryError e)
>>>>       {
>>>>         System.err.println("agents process ran out of memory - shutting
>>>> down");
>>>>         e.printStackTrace(System.err);
>>>>         System.exit(-200);
>>>>       }
>>>> <<<<<<
>>>> 
>>>> So that is what you should see happening.
>>>> 
>>>> Are you providing the complete stack trace from your out-of-memory
>>>> exception?  That would help confirm the picture.  It had to have been
>>>> printed from somewhere; it goes to standard error, whatever that is.
>>>> 
>>>> Karl
>>>> 
>>>> 
>>>> On Tue, May 7, 2019 at 6:42 AM Karl Wright <daddy...@gmail.com> wrote:
>>>> 
>>>>> Hi Olivier,
>>>>> 
>>>>> Any out-of-memory exception that makes it to the top level should cause
>>>>> the agents process to shut itself down.
>>>>> 
>>>>> The reason for this is simple: in a multithread environment, with lots
>>> of
>>>>> third-party jars, an out-of-memory in one thread is usually indicative
>>> of
>>>>> an out-of-memory in other threads as well, and we cannot count on that
>>>>> third-party software being rigorous about letting an
>>> OutOfMemoryException
>>>>> be thrown to the top level.  So we tend to get corrupted results that
>>> are
>>>>> hard to recover from.
>>>>> 
>>>>> In this case, the exception is being thrown within the JDBCConnector,
>>> so
>>>>> it ought to rethrow the OutOfMemoryException.  Let me confirm that is
>>> what
>>>>> the code indicates.
>>>>> 
>>>>> Karl
>>>>> 
>>>>> 
>>>>> On Tue, May 7, 2019 at 6:14 AM Olivier Tavard <
>>>>> olivier.tav...@francelabs.com> wrote:
>>>>> 
>>>>>> Hi Karl,
>>>>>> 
>>>>>> Thank you for your help.
>>>>>> Indeed I found a OoM error in the logs of the MCF-agent  process :
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>       at com.mysql.jdbc.Buffer.<init>(Buffer.java:58)
>>>>>>       at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1441)
>>>>>>       at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
>>>>>>       at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
>>>>>>       at
>>>>>> com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
>>>>>>       at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
>>>>>>       at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
>>>>>>       at
>>> com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
>>>>>>       at
>>>>>> 
>>> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
>>>>>>       at
>>>>>> 
>>> com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1885)
>>>>>>       at
>>>>>> 
>>> org.apache.manifoldcf.jdbc.JDBCConnection$PreparedStatementQueryThread.run(JDBCConnection.java:1392)
>>>>>> 
>>>>>> 
>>>>>> But the MCF process agent was not killed, it is still active. Why is
>>> the
>>>>>> process still active in this case ? Since we know that we can get an
>>> OOM,
>>>>>> couldn't we find an elegant way to notify users of this problem from
>>> the
>>>>>> MCF admin UI, rather than just having the process endlessly active ?
>>>>>> 
>>>>>> As for the OOM, the work around we decided to use is to increase the
>>>>>> values of xms and xmx java memory in options.env.unix.
>>>>>> I also attached the thread dump to this mail.
>>>>>> 
>>>>>> Thanks,
>>>>>> Best regards,
>>>>>> 
>>>>>> Olivier
>>>>>> 
>>>>>> Le 6 mai 2019 à 12:55, Karl Wright <daddy...@gmail.com> a écrit :
>>>>>> 
>>>>>> It sounds like there might be an out-of-memory situation, although if
>>>>>> that is the case I would expect that the MCF agents process would
>>> just shut
>>>>>> itself down.
>>>>>> 
>>>>>> If the process is hung, it should be easy to get a thread dump.  That
>>>>>> would be the first step.
>>>>>> 
>>>>>> Some thoughts as to what might be happening: there might be an
>>>>>> out-of-memory condition that is being silently eaten somewhere.  MCF
>>> relies
>>>>>> on its connectors to use streams rather than load documents into
>>> memory,
>>>>>> BUT we're at the mercy of the JDBC drivers and Tika.  So another
>>> experiment
>>>>>> would be to try to crawl your documents without the tika extractor
>>> (just
>>>>>> send them to the null output connector) and see if that succeeds.  If
>>> it
>>>>>> does, then maybe try the external Tika Service connector instead and
>>> see
>>>>>> what happens then.
>>>>>> 
>>>>>> Karl
>>>>>> 
>>>>>> 
>>>>>> On Mon, May 6, 2019 at 5:56 AM Olivier Tavard <
>>>>>> olivier.tav...@francelabs.com> wrote:
>>>>>> 
>>>>>>> Hi MCF community,
>>>>>>> 
>>>>>>> We have some issues with the JDBC connector to index database with
>>> large
>>>>>>> LONGBLOB into it : I mean files more than 100 MB.
>>>>>>> To reproduce the issue, I created a simple database on MySQL 5.7 with
>>>>>>> only one table into it with 2 columns : id (int)  and data
>>> (longblob).
>>>>>>> There are 8 records in the table : two records with very small files
>>> :
>>>>>>> less than 5 KB and 6 records with a large file of 192 MB.
>>>>>>> 
>>>>>>> If I include in my crawl only one big BLOB, the crawl is successful.
>>>>>>> seeding query : SELECT id AS $(IDCOLUMN) FROM data WHERE id =6
>>>>>>> 
>>>>>>> But if I include more than 2 of these (with a WHERE IN condition or
>>> if
>>>>>>> select all the records of the table) the job stays in running
>>> condition but
>>>>>>> no document is processed :
>>>>>>> seeding query : SELECT id AS $(IDCOLUMN) FROM data WHERE id IN (6,7)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> There is no message in the logs (even with debug mode activated).
>>>>>>> The tests were done in a Docker cluster on dedicated servers (4
>>> cores,
>>>>>>> 32 GB RAM each).
>>>>>>> Regarding MCF, the version is 2.12. The MCF agent has 1GB of RAM (xmx
>>>>>>> and xmx) the internal database is PostgreSQL 10.1 and we use
>>> Zookeeper
>>>>>>> based synchronization.
>>>>>>> We have a dump of the database here :
>>>>>>> https://www.datafari.com/files/mcftest.tar.gz
>>>>>>> 
>>>>>>> In other cases, we also encounter this error :
>>>>>>> Error: Unexpected jobqueue status - record id 1555190422690,
>>> expecting
>>>>>>> active status, saw 0
>>>>>>> 
>>>>>>> Any ideas of what the problem could be ? or at least why there is
>>>>>>> nothing in the debug log you think ?
>>>>>>> 
>>>>>>> Thank you,
>>>>>>> Best regards,
>>>>>>> 
>>>>>>> Olivier
>>>>>>> 
>>>>>> 
>>>>>> 
>>> 
>>> 

Reply via email to