Re: Error: Repeated service interruptions - failure processing document: Read timed out
Hi,

We have reset throttling to 10 for AD and Solr (2 for the Windows repository). The job indexing all pptx files to the null output has run successfully (162733 documents). The job indexing all pptx files to Solr still fails; manifoldcf.log contains:

WARN 2013-11-07 14:34:06,502 (Worker thread '29') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
    at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
    at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
    at jcifs.smb.SmbSession.send(SmbSession.java:238)
    at jcifs.smb.SmbTree.send(SmbTree.java:119)
    at jcifs.smb.SmbFile.send(SmbFile.java:775)
    at jcifs.smb.SmbFile.open0(SmbFile.java:989)
    at jcifs.smb.SmbFile.open(SmbFile.java:1006)
    at jcifs.smb.SmbFileOutputStream.init(SmbFileOutputStream.java:142)
    at jcifs.smb.TransactNamedPipeOutputStream.init(TransactNamedPipeOutputStream.java:32)
    at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187)
    at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68)
    at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190)
    at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126)
    at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140)
    at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2943)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2393)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.describeDocumentSecurity(SharedDriveConnector.java:1045)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getDocumentVersions(SharedDriveConnector.java:554)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
WARN 2013-11-07 14:55:45,257 (Worker thread '30') - IO exception during indexing: Read timed out
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
WARN 2013-11-07 14:55:45,273 (Worker thread '30') - Service interruption reported for job 1383765534700 connection 'Filesharesrv1': IO exception during indexing: Read timed out
ERROR 2013-11-07 14:55:45,304 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Read timed out
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:586)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
Re: Error: Repeated service interruptions - failure processing document: Read timed out
Hi Ronny,

The failure occurs because, for some documents, the time spent transferring data to Solr exceeds the socket timeout you have set for the Solr connection. This is probably due to excessive load on the Solr instance. My suggestion is to increase the socket timeout on your Solr connection to at least 30 minutes to see if that resolves it.

Thanks,
Karl

On Thu, Nov 7, 2013 at 9:30 AM, Ronny Heylen securaqbere...@gmail.com wrote:
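In Ronny's trace the timeout is raised from `receiveResponseHeader`, that is, while ManifoldCF's HTTP client was blocked waiting for Solr to start answering an update request. A socket timeout (`SO_TIMEOUT`) bounds how long a single blocking read may wait, so any document that keeps Solr silent for longer than that value fails in this way. The sketch below reproduces the mechanism with plain `java.net` sockets only; it is not ManifoldCF or SolrJ code, and the class name `SocketTimeoutDemo`, the loopback "server" that stalls before replying, and the millisecond values are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeoutDemo {

    /**
     * Returns true if the client's read times out. A loopback "server" (hypothetical
     * stand-in for a busy Solr) accepts the connection and stalls for serverDelayMillis
     * before sending a single byte.
     */
    static boolean readTimesOut(int soTimeoutMillis, long serverDelayMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread slowServer = new Thread(() -> respondAfterDelay(server, serverDelayMillis));
            slowServer.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                // SO_TIMEOUT: the maximum time a single read() may block.
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                try {
                    in.read();
                    return false;
                } catch (SocketTimeoutException e) {
                    return true; // same exception type as "Read timed out" in manifoldcf.log
                } finally {
                    slowServer.join();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Accepts one connection, waits delayMillis (simulated processing), then writes one byte. */
    private static void respondAfterDelay(ServerSocket server, long delayMillis) {
        try (Socket s = server.accept()) {
            Thread.sleep(delayMillis);
            OutputStream out = s.getOutputStream();
            out.write('H');
            out.flush();
        } catch (IOException e) {
            // Ignored in this demo: the client may already have closed the connection.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout 200 ms, server stalls 1000 ms -> timed out: " + readTimesOut(200, 1000));
        System.out.println("timeout 2000 ms, server stalls 100 ms -> timed out: " + readTimesOut(2000, 100));
    }
}
```

With a timeout shorter than the stall, the read fails with `java.net.SocketTimeoutException`; raising the timeout above the stall lets the same read succeed, which is what Karl's suggestion of a larger socket timeout on the Solr connection relies on.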
Re: repohistory table on PostgreSQL too large
Hi Marcello,

The only thing in this table is the history data, so yes, feel free to delete as much of it as you want.

Karl

On Thu, Nov 7, 2013 at 11:17 AM, Marcello Lorenzi mlore...@sorint.it wrote:
Hi All, during the Manifold installation on our pre-production environment, we have noticed that the repohistory table on our PostgreSQL instance has grown to 4 GB after a few months of crawling. Is it possible to remove some historical data from this table? Thanks, Marcello
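If you prefer to prune the history from code rather than by hand, a minimal JDBC sketch follows. It assumes that `repohistory` has a `starttime` column holding epoch milliseconds, which you should verify against your own ManifoldCF schema before running anything. The class name `RepoHistoryPruner`, the 90-day retention, and the command-line arguments are hypothetical, and the PostgreSQL JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

final class RepoHistoryPruner {

    // ASSUMPTION: repohistory.starttime holds the event start time in epoch milliseconds.
    // Check your ManifoldCF schema (for example with psql's \d repohistory) before running this.
    static final String DELETE_OLDER_THAN = "DELETE FROM repohistory WHERE starttime < ?";

    /** Epoch-millisecond cutoff that keeps only the most recent keepDays days of history. */
    static long cutoffMillis(long nowMillis, int keepDays) {
        if (keepDays < 0) {
            throw new IllegalArgumentException("keepDays must be non-negative");
        }
        return nowMillis - TimeUnit.DAYS.toMillis(keepDays);
    }

    /** Deletes history rows that started before the cutoff and returns how many were removed. */
    static int prune(Connection conn, long cutoffMillis) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_OLDER_THAN)) {
            ps.setLong(1, cutoffMillis);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 3) {
            System.err.println("usage: RepoHistoryPruner <jdbc-url> <user> <password>");
            return;
        }
        long cutoff = cutoffMillis(System.currentTimeMillis(), 90); // hypothetical 90-day retention
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println("Deleted " + prune(conn, cutoff) + " repohistory rows");
        }
    }
}
```

In PostgreSQL a plain `DELETE` does not shrink the table on disk: `VACUUM` makes the freed space reusable, and only `VACUUM FULL` (which takes an exclusive lock on the table) returns it to the operating system.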
Re: Error: Repeated service interruptions - failure processing document: Read timed out
Karl,

I don't know where you live, but if you come to Belgium, stop in Brussels for a good Belgian beer ;-) In other words, setting the socket timeout to 2000 instead of 900 has solved the problem: about 160,000 documents were indexed in 2 hours.

However, the Manifold/Solr machine (everything runs in the same Windows VM) has been allocated 8 CPUs at 3.6 GHz and 32 GB of memory, and it is used only for this indexing test, with no searches running against Solr. So the fact that a timeout of 900 seconds was not enough looks strange: is it possible that some of these 160,000 documents take more than 15 minutes to be handled by Solr?

Ronny
Frédéric

On Thu, Nov 7, 2013 at 4:30 PM, Karl Wright daddy...@gmail.com wrote:
Authority Service questions
Hi there,

Having problems getting my head around the MCF Authority Service. Please just direct me to the documentation if this information is out there already.

Is this service (just) a webapp? If so, should running start-webapps.sh install/start it? (I have run start-webapps.sh, but myhost:8345/mcf-authority-service/UserACLs gives me a 404.)

If it's not just a webapp, what else do I need to install/register? That is, is something like mod-authz-annotate also required in addition to the webapp?

Assuming I'm using my own search engine output connector (not using Solr) how does the search engine call the Authority Service? Via the JSON API? Or should I just dive into the Solr plug-in source?

Thanks,
Mark
Re: Authority Service questions
Hi Mark,

Yes, the Authority Service is a web application. You are supposed to call it something like this:

curl "http://localhost:8345/mcf-authority-service/UserACLs?username=foo@domain"

...and you get back a list of tokens and authority statuses.

Karl

On Thu, Nov 7, 2013 at 1:51 PM, Mark Libucha mlibu...@gmail.com wrote:
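For a search engine other than Solr, the integration amounts to making the same HTTP request as Karl's `curl` example from your own code and using the returned tokens to filter results. The Java sketch below assumes the response is plain text with one entry per line and that access tokens appear on lines prefixed with `TOKEN:`; that format is an assumption, so compare it with what `curl` returns on your installation. The class and method names are hypothetical, and the base URL is the local default from Karl's example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AuthorityServiceClient {

    /** Builds the UserACLs request URL, URL-encoding the user name ('@' becomes %40). */
    static String userAclsUrl(String serviceBase, String username) {
        try {
            return serviceBase + "/UserACLs?username=" + URLEncoder.encode(username, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Extracts access tokens, ASSUMING they arrive one per line prefixed with "TOKEN:". */
    static List<String> parseTokens(String responseBody) {
        List<String> tokens = new ArrayList<>();
        for (String line : responseBody.split("\r?\n")) {
            if (line.startsWith("TOKEN:")) {
                tokens.add(line.substring("TOKEN:".length()));
            }
        }
        return tokens;
    }

    /** Performs the GET request and returns the response body as text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(60_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from " + url);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String user = args.length > 0 ? args[0] : "foo@domain";
        String url = userAclsUrl("http://localhost:8345/mcf-authority-service", user);
        System.out.println("Tokens for " + user + ": " + parseTokens(fetch(url)));
    }
}
```

Running `main` requires the Authority Service to be deployed and reachable; a non-200 status, such as the 404 Mark reported, surfaces as an `IOException`.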