Re: Does anyone use MOSS?
I don't know of any difference from a SharePoint standpoint between MOSS and WSS, except for additional Office-related plugins on MOSS. "Connection working" means you could at least get to SharePoint.

Can you look in the log and find the exception associated with the "Cannot open the requested Sharepoint Site" error? It should give a clue as to what the connector is trying to do at that time.

Thanks,
Karl

On Wed, Oct 10, 2012 at 1:50 AM, Shinichiro Abe shinichiro.ab...@gmail.com wrote:

Hi,

I think MCF supports Windows SharePoint Services (WSS), but does MCF support Microsoft Office SharePoint Server (MOSS)? I tried to crawl MOSS, but the crawl failed with an error. I'm using MOSS 2007 out of the box, with only the Administrator user. On the repository connection, the status said "Connection working", but when crawling, the log said "Cannot open the requested Sharepoint Site.". I couldn't find an entry in the server event log that specifies this error.

Any help please.

Regards,
Shinichiro Abe
Web crawling causes Socket Timeout after Database Exception
Hi,

I am having trouble crawling the web using MCF 1.0. I run MCF with MySQL 5.5 and Tomcat 6.0. It should keep crawling content, but MCF logs the following database exception and then hangs. After the database exception, a socket timeout exception occurs. Has anyone faced this problem?

--Database Exception log:
ERROR 2012-10-10 16:11:05,787 (Worker thread '42') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
    at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
    at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
    at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
    at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
    at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
    at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
    at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:1932)
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.addDocumentReference(WorkerThread.java:1487)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessActivityLinkHandler.noteDiscoveredLink(WebcrawlerConnector.java:6049)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessAcivityHTMLHandler.noteAHREF(WebcrawlerConnector.java:6159)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.LinkParseState.noteNonscriptTag(LinkParseState.java:44)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.FormParseState.noteNonscriptTag(FormParseState.java:52)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.ScriptParseState.noteTag(ScriptParseState.java:50)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.BasicParseState.dealWithCharacter(BasicParseState.java:225)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:7047)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:6011)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1282)
    at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2293)
    at org.apache.manifoldcf.core.database.Database.execute(Database.java:826)
    at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:641)
ERROR 2012-10-10 16:11:06,799 (Worker thread '9') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
    at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
    at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
    at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
    at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
    at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
    at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
    at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
    at
Re: Web crawling causes Socket Timeout after Database Exception
Hi Shigeki,

The socket timeout exception is only a warning. It means that some site you are crawling did not accept a socket connection within the allowed time (5 minutes, I think). The Web Connector will retry the connection a few times, and if it is still rejected, it will eventually give up on that page. One thing you should check, though, is that you are using proper throttling. If you aren't, one possible cause of this problem is that the webmaster of the site you are trying to crawl has blocked you from accessing it.

The database exception is more problematic. It means that MySQL thinks a specific transaction took too long to complete, so the database aborted the transaction due to a timeout. There are two ways of dealing with this issue. One way is to modify your MySQL configuration to increase the transaction timeout value to some high number. The second way is to modify ManifoldCF to recognize the timeout error specifically and retry. But in order to do the latter, I would need to know what SQL error code MySQL returns for this situation, which means we either need to look it up (if we can) or modify a ManifoldCF instance to log it when this problem occurs.

Please let me know how you would like to proceed.

Karl

On Wed, Oct 10, 2012 at 3:51 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote:

Hi,

I am having trouble crawling the web using MCF 1.0. I run MCF with MySQL 5.5 and Tomcat 6.0. It should keep crawling content, but MCF logs the following database exception and then hangs. After the database exception, a socket timeout exception occurs. Has anyone faced this problem?
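The second option in Karl's reply (recognize the lock timeout and retry) could be sketched roughly as follows. This is only an illustration, not ManifoldCF's actual code. It assumes the failure in the trace is MySQL server error 1205 (ER_LOCK_WAIT_TIMEOUT), which is the code MySQL documents for "Lock wait timeout exceeded; try restarting transaction". The names `LockTimeoutRetry`, `isLockWaitTimeout`, and `withRetry` are hypothetical. For the first option, the InnoDB setting that governs this wait is `innodb_lock_wait_timeout` (in seconds).

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of ManifoldCF.
final class LockTimeoutRetry {
    // Assumption: MySQL server error 1205 = ER_LOCK_WAIT_TIMEOUT.
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    // Walks the cause chain, because the SQLException may be wrapped
    // (in the trace it is the "Caused by" of a ManifoldCFException).
    static boolean isLockWaitTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && ((SQLException) c).getErrorCode() == ER_LOCK_WAIT_TIMEOUT) {
                return true;
            }
        }
        return false;
    }

    // Runs the work, retrying only on lock wait timeouts, with a linearly
    // growing pause between attempts. Any other failure is rethrown at once.
    static <T> T withRetry(Callable<T> work, int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                if (!isLockWaitTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

A caller would wrap the whole transaction, not a single statement, in `withRetry`, since MySQL's message asks for the transaction to be restarted.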