I will look into this when I have a chance.

Karl
On Wed, Nov 5, 2014 at 11:48 AM, Aeham Abushwashi <[email protected]> wrote:

> Hi Karl,
>
> With the latest revisions, documents for all jobs (legacy and new) do get
> picked up and processed, which is great! This was verified on a small
> 1-node test system.
>
> I have since applied the fix to a much larger environment (29M docs across
> 4 MCF agents using a 3-node Zookeeper cluster) which has a bunch of
> mid-sized (100,000s docs) jobs in a Running state. The update of the
> priorityset field for ~36M jobqueue records took just over an hour. More
> problematic for me is the rate of reprioritization on startup, which was
> very slow: nearly 2 hours to update ~600,000 records.
>
> A couple of SQL queries
> (JobManager#getNextNotYetProcessedRepriotizationDocuments and
> ManifoldCF#writeDocumentPriorities) come up frequently, but a VisualVM
> profile of the MCF agent shows that the majority of the Agents thread's
> time is spent talking to ZK, for locking + reading some config data very
> frequently - see the snapshots below.
>
> Is it possible to avoid the per-document locking pattern seen in this case?
>
> Cheers,
> Aeham
>
> ++++++++++++++++++++++
>
> 2014-11-05 15:39:42
>
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <487ef1bb> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.readData(ZooKeeperConnection.java:819)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.getSharedConfiguration(ZooKeeperLockManager.java:670)
>     at org.apache.manifoldcf.core.interfaces.LockManagerFactory.getBooleanProperty(LockManagerFactory.java:110)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.setThreadContext(SharedDriveConnector.java:157)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.getConnector(ConnectorPool.java:489)
>     - locked <3f2843d4> (a org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool.grab(ConnectorPool.java:255)
>     at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.grab(RepositoryConnectorPool.java:86)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1007)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:203)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:129)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>     - locked <6c7d33b0> (a java.util.HashMap)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
>    Locked ownable synchronizers:
>     - None
>
> ++++++++++++++++++++++
>
> 2014-11-05 15:39:52
>
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <52698c72> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.createSequentialChild(ZooKeeperConnection.java:1116)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.obtainReadLock(ZooKeeperConnection.java:691)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.obtainGlobalReadLock(ZooKeeperLockObject.java:193)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.enterReadLock(LockObject.java:310)
>     - locked <151db932> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.LockGate.enterReadLock(LockGate.java:261)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterRead(BaseLockManager.java:1283)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterReadLock(BaseLockManager.java:790)
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getMinimumDepth(ReprioritizationTracker.java:251)
>     at org.apache.manifoldcf.crawler.system.PriorityCalculator.<init>(PriorityCalculator.java:89)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1021)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:203)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:129)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>     - locked <6c7d33b0> (a java.util.HashMap)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
>    Locked ownable synchronizers:
>     - None
>
> ++++++++++++++++++++++
>
> 2014-11-05 15:39:59
>
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <79c64d6d> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:871)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.releaseLock(ZooKeeperConnection.java:796)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearLock(ZooKeeperLockObject.java:218)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearGlobalReadLockNoWait(ZooKeeperLockObject.java:212)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.clearGlobalReadLock(LockObject.java:395)
>     at org.apache.manifoldcf.core.lockmanager.LockObject.leaveReadLock(LockObject.java:376)
>     - locked <126e1776> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.LockGate.leaveReadLock(LockGate.java:289)
>     - locked <126e1776> (a org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveRead(BaseLockManager.java:1369)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveReadLock(BaseLockManager.java:804)
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.getMinimumDepth(ReprioritizationTracker.java:258)
>     at org.apache.manifoldcf.crawler.system.PriorityCalculator.<init>(PriorityCalculator.java:89)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1021)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:203)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:129)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>     - locked <6c7d33b0> (a java.util.HashMap)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
>    Locked ownable synchronizers:
>     - None
>
> ++++++++++++++++++++++
>
> 2014-11-05 15:40:11
>
> "Agents thread" - Thread t@21
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <354dbdf> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.readData(ZooKeeperConnection.java:819)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.getSharedConfiguration(ZooKeeperLockManager.java:670)
>     at org.apache.manifoldcf.core.interfaces.LockManagerFactory.getBooleanProperty(LockManagerFactory.java:110)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.setThreadContext(SharedDriveConnector.java:157)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.getConnector(ConnectorPool.java:489)
>     - locked <6f2f3168> (a org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool)
>     at org.apache.manifoldcf.core.connectorpool.ConnectorPool.grab(ConnectorPool.java:255)
>     at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.grab(RepositoryConnectorPool.java:86)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1007)
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.resetAllDocumentPriorities(ManifoldCF.java:960)
>     at org.apache.manifoldcf.crawler.system.CrawlerAgent.cleanUpAllAgentData(CrawlerAgent.java:155)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$CleanupAgent.cleanUpAllServices(AgentsDaemon.java:356)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:203)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.registerServiceBeginServiceActivity(ZooKeeperLockManager.java:129)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>     - locked <6c7d33b0> (a java.util.HashMap)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
>    Locked ownable synchronizers:
>     - None
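
For illustration only, the per-document pattern visible in the traces above (a remote read each time a priority is computed) can be contrasted with reading the shared value once per batch. This is a minimal sketch and not ManifoldCF code: `HoistedLookupSketch`, `CountingSource`, `perDocument` and `perBatch` are hypothetical names, the `Supplier` merely stands in for a ZooKeeper-backed read, and the priority arithmetic is made up. It also assumes the shared value may safely be treated as constant for the duration of one batch, which may not hold in the real system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class HoistedLookupSketch {

    /** Hypothetical stand-in for a remote coordination read; counts round trips. */
    static final class CountingSource implements Supplier<Double> {
        int calls = 0;

        @Override
        public Double get() {
            calls++;        // each call represents one network round trip
            return 1.5;     // arbitrary shared value, e.g. a minimum depth
        }
    }

    /** One remote read per document, as in the profiled code path. */
    static List<Double> perDocument(List<Integer> docs, Supplier<Double> source) {
        List<Double> priorities = new ArrayList<>();
        for (int doc : docs) {
            double base = source.get();   // repeated for every document
            priorities.add(base + doc);
        }
        return priorities;
    }

    /** One remote read per batch, reused for every document in that batch. */
    static List<Double> perBatch(List<Integer> docs, Supplier<Double> source) {
        double base = source.get();       // read once, outside the loop
        List<Double> priorities = new ArrayList<>();
        for (int doc : docs) {
            priorities.add(base + doc);
        }
        return priorities;
    }

    public static void main(String[] args) {
        List<Integer> docs = Arrays.asList(1, 2, 3, 4, 5);
        CountingSource a = new CountingSource();
        CountingSource b = new CountingSource();
        List<Double> r1 = perDocument(docs, a);
        List<Double> r2 = perBatch(docs, b);
        System.out.println("per-document reads: " + a.calls);
        System.out.println("per-batch reads: " + b.calls);
        System.out.println("same result: " + r1.equals(r2));
    }
}
```

The trade-off is staleness: if another agent changes the shared value mid-batch, the batched variant will not see it until the next batch.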
