[jira] [Commented] (CONNECTORS-281) RSS connector takes nearly a second to fetch a document even with no throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13137765#comment-13137765 ]

Karl Wright commented on CONNECTORS-281:

With the fixes as checked in so far, there's no appreciable difference between early parts of the crawl and later parts. So I'm going to resolve this issue.

RSS connector takes nearly a second to fetch a document even with no throttling
---

Key: CONNECTORS-281
URL: https://issues.apache.org/jira/browse/CONNECTORS-281
Project: ManifoldCF
Issue Type: Bug
Components: RSS connector
Affects Versions: ManifoldCF 0.4
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF 0.4

The RSS connector load test shows that the RSS connector is overthrottling, for some reason.

10-24-2011 05:30:50.423 fetch http://localhost:8189/rss/gen.php?doc=4feed=782type=doc 200 46 843
...

Where 843 ms is taken to fetch a document of size 46 bytes. This is with connection parameters as follows:

Parameters:
Robots usage=none
Max fetches per minute=100
Email address=someb...@somewhere.com
KB per second=100
Max server connections=100

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
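As a rough sanity check on the connection parameters quoted in the issue description, the sketch below works out what a cap of 100 fetches per minute and 100 KB per second would imply if each limit applied to a single stream of fetches. That scoping is an assumption (the issue does not say how ManifoldCF applies these limits across connections), as is treating 1 KB as 1024 bytes. The class and method names are hypothetical and are not part of ManifoldCF.

```java
public class ThrottleArithmetic {
    /** Minimum spacing between fetches, in ms, implied by a fetches-per-minute cap. */
    public static double minIntervalMs(int maxFetchesPerMinute) {
        return 60_000.0 / maxFetchesPerMinute;
    }

    /**
     * Time in ms needed to transfer the given number of bytes at a bandwidth cap
     * expressed in KB per second (assumption: 1 KB = 1024 bytes).
     */
    public static double transferMs(long bytes, int kbPerSecond) {
        return bytes * 1000.0 / (kbPerSecond * 1024.0);
    }

    public static void main(String[] args) {
        // Values taken from the issue: 100 fetches/minute, 100 KB/s, 46-byte document.
        System.out.println("min interval between fetches (ms): " + minIntervalMs(100));
        System.out.println("transfer time for 46 bytes (ms):   " + transferMs(46, 100));
    }
}
```

Under these assumptions the bandwidth cap contributes well under a millisecond for a 46-byte document, so it cannot account for the 843 ms in the log line on its own.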
[jira] [Commented] (CONNECTORS-281) RSS connector takes nearly a second to fetch a document even with no throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133951#comment-13133951 ]

Karl Wright commented on CONNECTORS-281:

A thread dump shows all worker threads waiting on database functionality, but this is interesting. A full 11/30 threads are waiting to RETURN connections to the pool:

Worker thread '5' daemon prio=6 tid=0x055b5c00 nid=0x1c14 waiting for monitor entry [0x05def000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.bitmechanic.sql.ConnectionPool.returnConnection(ConnectionPool.java:474)
    - waiting to lock 0x292ad3a0 (a com.bitmechanic.sql.ConnectionPool)
    at com.bitmechanic.sql.PooledConnection.close(PooledConnection.java:202)
    at org.apache.manifoldcf.core.database.ConnectionFactory.releaseConnection(ConnectionFactory.java:113)
    at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:330)
    at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1112)
    at org.apache.manifoldcf.core.database.BaseTable.endTransaction(BaseTable.java:274)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1373)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:503)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRecord(IncrementalIngester.java:325)
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.recordDocument(WorkerThread.java:1556)
    at org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.processDocuments(RSSConnector.java:1281)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)

It's not clear at all why this should be. The only possible hint is that there's one thread waiting on GETTING a connection from the pool:

Worker thread '19' daemon prio=6 tid=0x045e1400 nid=0x1b94 waiting for monitor entry [0x0624f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.bitmechanic.sql.ConnectionPool.getConnection(ConnectionPool.java:375)
    - waiting to lock 0x292ad3a0 (a com.bitmechanic.sql.ConnectionPool)
    at com.bitmechanic.sql.ConnectionPoolManager.connect(ConnectionPoolManager.java:442)
    at java.sql.DriverManager.getConnection(DriverManager.java:582)
    at java.sql.DriverManager.getConnection(DriverManager.java:207)
    at org.apache.manifoldcf.core.database.ConnectionFactory.getConnectionWithRetries(ConnectionFactory.java:144)
    at org.apache.manifoldcf.core.database.ConnectionFactory.getConnection(ConnectionFactory.java:90)
    at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:502)
    at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1152)
    at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
    at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
    at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:860)
    at org.apache.manifoldcf.crawler.jobs.Carrydown.getDataValuesAsFiles(Carrydown.java:603)
    at org.apache.manifoldcf.crawler.jobs.JobManager.retrieveParentDataAsFiles(JobManager.java:4263)
    at org.apache.manifoldcf.crawler.system.WorkerThread$VersionActivity.retrieveParentDataAsFiles(WorkerThread.java:1211)
    at org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.getDocumentVersions(RSSConnector.java:818)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:318)

This has me wondering if we're seeing a bug in the pool driver. I'll try to confirm with further stack traces, since this does not explain the high CPU usage.
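Counting how many threads are blocked on the same monitor, as done by hand in the comment above, can also be done from inside the JVM. The following is a minimal sketch using the standard `java.lang.management` API. The class name `BlockedThreadSummary` and its method are hypothetical; this is not how the dumps in this issue were produced.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class BlockedThreadSummary {
    /**
     * Counts threads in the BLOCKED state, grouped by the lock they are waiting for.
     * Keys are the strings returned by ThreadInfo.getLockName(), i.e. class name
     * plus identity hash of the contended object.
     */
    public static Map<String, Integer> countBlockedByLock(ThreadInfo[] infos) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // No need for locked-monitor or locked-synchronizer details for a simple count.
        ThreadInfo[] infos = bean.dumpAllThreads(false, false);
        countBlockedByLock(infos).forEach(
            (lock, n) -> System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

A single lock with a large count, such as the `com.bitmechanic.sql.ConnectionPool` monitor in the traces above, would stand out immediately in this summary.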
[jira] [Commented] (CONNECTORS-281) RSS connector takes nearly a second to fetch a document even with no throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133957#comment-13133957 ]

Karl Wright commented on CONNECTORS-281:

A second capture shows a much more expected mix of thread activities. Most of the threads are waiting for database activities, mostly inserting documents and managing carrydown information. The non-database work seems to be fetching documents and parsing URLs, which is as it should be.
[jira] [Commented] (CONNECTORS-281) RSS connector takes nearly a second to fetch a document even with no throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133960#comment-13133960 ]

Karl Wright commented on CONNECTORS-281:

A third dump shows up the temporary file tracker as being a potential bottleneck. Many threads are waiting on the same synchronizer:

Worker thread '7' daemon prio=6 tid=0x055b6800 nid=0x165c waiting for monitor entry [0x05e8f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.manifoldcf.core.system.ManifoldCF$FileTrack.addFile(ManifoldCF.java:1177)
    - waiting to lock 0x292a2728 (a org.apache.manifoldcf.core.system.ManifoldCF$FileTrack)
    at org.apache.manifoldcf.core.system.ManifoldCF.addFile(ManifoldCF.java:701)
    at org.apache.manifoldcf.crawler.connectors.rss.DataCache.addData(DataCache.java:67)
    at org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.getDocumentVersions(RSSConnector.java:1065)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:318)

... both add and delete.
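To illustrate the kind of contention described above, here is a minimal sketch of a file tracker whose add and remove operations do not all serialize on one object monitor. It is a hypothetical stand-in, not ManifoldCF's `FileTrack` and not the fix that was checked in for this issue. It assumes the tracker only needs set semantics over `File` objects and that Java 8 or later is available.

```java
import java.io.File;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical temporary-file tracker backed by a concurrent set. */
public class ConcurrentFileTracker {
    // ConcurrentHashMap.newKeySet() lets callers add and remove entries
    // without all of them queuing on a single synchronized method.
    private final Set<File> tracked = ConcurrentHashMap.newKeySet();

    /** Starts tracking a file; returns false if it was already tracked. */
    public boolean addFile(File f) {
        return tracked.add(f);
    }

    /** Stops tracking a file; returns false if it was not tracked. */
    public boolean removeFile(File f) {
        return tracked.remove(f);
    }

    /** Number of files currently tracked. */
    public int size() {
        return tracked.size();
    }

    public static void main(String[] args) {
        ConcurrentFileTracker tracker = new ConcurrentFileTracker();
        tracker.addFile(new File("example.tmp"));
        tracker.removeFile(new File("example.tmp"));
        System.out.println("tracked files: " + tracker.size());
    }
}
```

Whether this helps in practice depends on how much work the real tracker does under its lock, which the thread dump alone does not show.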
[jira] [Commented] (CONNECTORS-281) RSS connector takes nearly a second to fetch a document even with no throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134662#comment-13134662 ]

Karl Wright commented on CONNECTORS-281:

Use of temporary files, plus carry-down data, seems to be what makes the RSS connector significantly slower than a file crawl. I'm still trying to assess whether the carrydown data is the issue later in the crawl.