Re: Does anyone use MOSS?

2012-10-10 Thread Karl Wright
I don't know of any difference from a SharePoint standpoint between
MOSS and WSS, except for additional Office-related plugins on MOSS.

"Connection working" means you could get to SharePoint at least.

Can you look in the log and find the exception associated with the
"Cannot open the requested Sharepoint Site" error?  It should give a
clue as to what the connector is trying to do at that time.

Thanks,
Karl


On Wed, Oct 10, 2012 at 1:50 AM, Shinichiro Abe
shinichiro.ab...@gmail.com wrote:
 Hi,
 I believe MCF supports Windows SharePoint Services (WSS), but
 does it also support Microsoft Office SharePoint Server (MOSS)?

 I tried to crawl MOSS, but the crawl failed with an error.

 I'm using MOSS 2007 out of the box, with only the Administrator user.
 On the repository connection, the status said "Connection working",
 but when crawling, the log said
 "Cannot open the requested Sharepoint Site."
 I couldn't find an entry for this error in the server event log.

 Any help would be appreciated.
 Regards,
 Shinichiro Abe


Web crawling causes Socket Timeout after Database Exception

2012-10-10 Thread Shigeki Kobayashi
Hi

I am having trouble crawling the web using MCF 1.0.
I run MCF with MySQL 5.5 and Tomcat 6.0.
It should keep crawling content, but MCF logs the following database
exception and then hangs.
After the database exception, a socket timeout exception occurs.

Has anyone faced this problem?

--Database Exception log:

ERROR 2012-10-10 16:11:05,787 (Worker thread '42') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:1932)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.addDocumentReference(WorkerThread.java:1487)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessActivityLinkHandler.noteDiscoveredLink(WebcrawlerConnector.java:6049)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessAcivityHTMLHandler.noteAHREF(WebcrawlerConnector.java:6159)
at org.apache.manifoldcf.crawler.connectors.webcrawler.LinkParseState.noteNonscriptTag(LinkParseState.java:44)
at org.apache.manifoldcf.crawler.connectors.webcrawler.FormParseState.noteNonscriptTag(FormParseState.java:52)
at org.apache.manifoldcf.crawler.connectors.webcrawler.ScriptParseState.noteTag(ScriptParseState.java:50)
at org.apache.manifoldcf.crawler.connectors.webcrawler.BasicParseState.dealWithCharacter(BasicParseState.java:225)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:7047)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:6011)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1282)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2293)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:826)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:641)
ERROR 2012-10-10 16:11:06,799 (Worker thread '9') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
at

Re: Web crawling causes Socket Timeout after Database Exception

2012-10-10 Thread Karl Wright
Hi Shigeki,

The socket timeout exception is only a warning.  It means that some
site you are crawling did not accept a socket connection within the
allowed time (5 minutes I think).  The Web Connector will retry the
connection a few times, and if it is still rejected, it will
eventually give up on that page.  One thing you should check, though,
is that you are using proper throttling.  If you aren't, the webmaster
of the site you are trying to crawl may have blocked you, which is one
possible cause of this problem.

The database exception is more problematic.  It means that MySQL
thinks it took too long for a specific transaction to complete, and
the database aborted the transaction due to a timeout.  There are two
ways of dealing with this issue.  One way is to modify your MySQL
configuration to increase the transaction timeout value to some high
number.  The second way is to modify ManifoldCF to recognize the
timeout error specifically, and cause a retry.  But in order to do the
latter, I would need to know what SQL error code MySQL returns for
this situation, which will mean we either need to look it up (if we
can), or modify a ManifoldCF instance to log it when this problem
occurs.
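
As a rough illustration of the second approach: the MySQL reference
documents "Lock wait timeout exceeded; try restarting transaction" as
server error 1205 (ER_LOCK_WAIT_TIMEOUT, SQLSTATE HY000), and the
related server setting is innodb_lock_wait_timeout.  The sketch below
is not ManifoldCF code.  The class LockTimeoutRetry, the helper
withRetry, and the attempt and backoff values are hypothetical names
chosen for the example, and it assumes the JDBC driver reports the
server error number through SQLException.getErrorCode().

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of ManifoldCF: retries a unit of
// database work when MySQL reports a lock wait timeout.
public class LockTimeoutRetry {
    // MySQL server error number for ER_LOCK_WAIT_TIMEOUT.
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    // True when the exception carries the lock wait timeout error code.
    static boolean isLockWaitTimeout(SQLException e) {
        return e.getErrorCode() == ER_LOCK_WAIT_TIMEOUT;
    }

    // Runs the work, retrying up to maxAttempts times on a lock wait
    // timeout. Any other SQLException, or a timeout on the final
    // attempt, is rethrown to the caller.
    static <T> T withRetry(Callable<T> work, int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                if (!isLockWaitTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Linear backoff before the next attempt.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated work: fails twice with error 1205, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException(
                    "Lock wait timeout exceeded; try restarting transaction",
                    "HY000", ER_LOCK_WAIT_TIMEOUT);
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```

Because InnoDB by default rolls back only the statement that timed
out rather than the whole transaction, the unit of work passed to
withRetry would probably need to be the entire transaction, not a
single query.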

Please let me know how you would like to proceed.

Karl

On Wed, Oct 10, 2012 at 3:51 AM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:

 Hi

 I am having trouble crawling the web using MCF 1.0.
 I run MCF with MySQL 5.5 and Tomcat 6.0.
 It should keep crawling content, but MCF logs the following database
 exception and then hangs.
 After the database exception, a socket timeout exception occurs.

 Has anyone faced this problem?

 --Database Exception log:

 ERROR 2012-10-10 16:11:05,787 (Worker thread '42') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
 at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
 at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
 at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
 at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
 at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
 at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
 at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
 at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:1932)
 at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.addDocumentReference(WorkerThread.java:1487)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessActivityLinkHandler.noteDiscoveredLink(WebcrawlerConnector.java:6049)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessAcivityHTMLHandler.noteAHREF(WebcrawlerConnector.java:6159)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.LinkParseState.noteNonscriptTag(LinkParseState.java:44)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.FormParseState.noteNonscriptTag(FormParseState.java:52)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.ScriptParseState.noteTag(ScriptParseState.java:50)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.BasicParseState.dealWithCharacter(BasicParseState.java:225)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:7047)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:6011)
 at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1282)
 at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
 at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
 Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
 at