Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Ronny Heylen
Hi,
We have reset throttling to 10 for AD and SOLR (2 for the Windows
repository).
Job indexing all pptx to null output has run successfully (162733 documents).
Job indexing all pptx to solr still fails, manifoldcf.log contains:
 WARN 2013-11-07 14:34:06,502 (Worker thread '29') - JCIFS: Possibly
transient exception detected on attempt 1 while getting share security: All
pipe instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
at jcifs.smb.SmbSession.send(SmbSession.java:238)
at jcifs.smb.SmbTree.send(SmbTree.java:119)
at jcifs.smb.SmbFile.send(SmbFile.java:775)
at jcifs.smb.SmbFile.open0(SmbFile.java:989)
at jcifs.smb.SmbFile.open(SmbFile.java:1006)
at jcifs.smb.SmbFileOutputStream.init(SmbFileOutputStream.java:142)
at
jcifs.smb.TransactNamedPipeOutputStream.init(TransactNamedPipeOutputStream.java:32)
at
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187)
at
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68)
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190)
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126)
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140)
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2943)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2393)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.describeDocumentSecurity(SharedDriveConnector.java:1045)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getDocumentVersions(SharedDriveConnector.java:554)
at
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
 WARN 2013-11-07 14:55:45,257 (Worker thread '30') - IO exception during
indexing: Read timed out
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at
org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 14:55:45,273 (Worker thread '30') - Service interruption
reported for job 1383765534700 connection 'Filesharesrv1': IO exception
during indexing: Read timed out
ERROR 2013-11-07 14:55:45,304 (Worker thread '30') - Exception tossed:
Repeated service interruptions - failure processing document: Read timed out
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service
interruptions - failure processing document: Read timed out
at
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:586)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at 

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Karl Wright
Hi Ronny,

The failure occurs because, for some documents, the time spent transferring
data to Solr exceeds the socket timeout you have set for the Solr connection.

This is probably due to excessive load on the Solr instance.  My suggestion
is to increase the socket timeout on your Solr connection to at least 30
minutes and see whether that resolves it.
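
To illustrate the mechanism described above, here is a minimal, self-contained sketch of how a read timeout arises when a server takes longer to answer than the client's socket timeout allows. It is not ManifoldCF or Solr code: the local test server, the function names, and the delay values are all assumptions made up for the demonstration.

```python
import socket
import threading
import time


def slow_server(listener: socket.socket, delay_s: float) -> None:
    """Accept one connection, wait delay_s seconds, then send a short reply."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay_s)  # stands in for a busy indexing server
        try:
            conn.sendall(b"OK")
        except OSError:
            pass  # the client has already given up and closed the socket


def request_with_timeout(delay_s: float, timeout_s: float) -> str:
    """Return the server reply, or 'Read timed out' if it arrives too late."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(
        target=slow_server, args=(listener, delay_s), daemon=True
    )
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout_s)  # the client-side read (socket) timeout
    try:
        return client.recv(16).decode()
    except socket.timeout:
        return "Read timed out"
    finally:
        client.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    # Server slower than the timeout: the read fails.
    print(request_with_timeout(delay_s=0.3, timeout_s=0.05))
    # Timeout raised above the server's response time: the read succeeds.
    print(request_with_timeout(delay_s=0.05, timeout_s=2.0))
```

The server is equally slow in both cases where the delay is fixed; only the client's patience changes. That is why raising the socket timeout can make the job pass without making the server any faster.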

Thanks,
Karl




Re: repohistory table on PostgresQL too high size

2013-11-07 Thread Karl Wright
Hi Marcello,

The only thing in this table is the history data, so yes, feel free to
delete as much of it as you want.
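
A minimal sketch of pruning old history rows by age follows. Several things here are assumptions rather than facts from this thread: that the table has a `starttime` column holding epoch milliseconds (please check your actual schema), the helper names, and the retention period. An in-memory SQLite database stands in for PostgreSQL so the example runs on its own; with psycopg2 the placeholder would be `%s` instead of `?`.

```python
import sqlite3
import time

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def cutoff_millis(keep_days, now_ms=None):
    """Epoch-millisecond timestamp before which rows count as old."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - keep_days * MILLIS_PER_DAY


def prune_history(conn, keep_days, now_ms=None):
    """Delete rows older than keep_days and return how many were removed.

    Assumes a 'starttime' column in epoch milliseconds (hypothetical here).
    """
    cur = conn.execute(
        "DELETE FROM repohistory WHERE starttime < ?",
        (cutoff_millis(keep_days, now_ms),),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in table with three rows: 0, 5 and 10 days after the epoch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repohistory (id INTEGER, starttime INTEGER)")
    conn.executemany(
        "INSERT INTO repohistory VALUES (?, ?)",
        [(1, 0), (2, 5 * MILLIS_PER_DAY), (3, 10 * MILLIS_PER_DAY)],
    )
    removed = prune_history(conn, keep_days=3, now_ms=10 * MILLIS_PER_DAY)
    print("rows removed:", removed)
```

Note that in PostgreSQL deleting rows does not by itself shrink the table file on disk. The space becomes reusable after vacuuming, and it is generally returned to the operating system only after `VACUUM FULL` or a similar table rewrite.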

Karl



On Thu, Nov 7, 2013 at 11:17 AM, Marcello Lorenzi mlore...@sorint.it wrote:

 Hi All,
 during the Manifold installation on our pre-production environment, we
 have noticed that the repohistory table on the PostgreSQL instance has a size
 of 4 GB after a few months of crawling.

 Is it possible to remove some historical data from this table?

 Thanks,
 Marcello






Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Ronny Heylen
Karl,
I don't know where you live but if you come to Belgium, stop in Brussels
for a good Belgian beer ;-)
In other words, setting the socket timeout to 2000 instead of 900 has
solved the problem.
It has indexed about 160,000 documents in 2 hours.
On the other hand, the Manifold/Solr machine (everything runs in the same
Windows VM) has been allocated eight 3.6 GHz CPUs and 32 GB of memory, and is
used only for the indexing test, with no searches on SOLR.
So the fact that a timeout of 900 seconds was not enough looks strange: is
it possible that some of these 160,000 documents take more than 15 minutes
to be handled by SOLR?
RonnyFrédéric
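
The figures quoted above can be put side by side with a small calculation. This is only a rough sketch: it computes the overall average throughput of the run, not the latency of any single request (several connections work in parallel), and the function name is made up for the illustration.

```python
def mean_seconds_per_document(documents, elapsed_hours):
    """Average wall-clock seconds per document over the whole run."""
    return elapsed_hours * 3600 / documents


if __name__ == "__main__":
    mean = mean_seconds_per_document(160_000, 2)
    print(f"average per document: {mean:.3f} s")
    print(f"900 s timeout is about {900 / mean:,.0f} times that average")
```

On these numbers the average document takes a small fraction of a second, so a typical document is nowhere near the 900 second limit. Any document that exceeded it would have to be a rare outlier, which is consistent with the question asked above but does not by itself say why those requests were slow.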



Authority Service questions

2013-11-07 Thread Mark Libucha
Hi there,

Having problems getting my head around the MCF Authority Service. Please
just direct me to the documentation if this information is out there
already.

Is this service (just) a webapp?

If so, should running start-webapps.sh install/start it? (I have run
start-webapps.sh, but myhost:8345/mcf-authority-service/UserACLs gives me a
404.)

If it's not just a webapp, what else do I need to install/register? That
is, is something like mod-authz-annotate also required in addition to the
webapp?

Assuming I'm using my own search engine output connector (not using Solr)
how does the search engine call the Authority Service? Via the JSON API?

Or should I just dive into the Solr plug-in source?

Thanks,

Mark


Re: Authority Service questions

2013-11-07 Thread Karl Wright
Hi Mark,

Yes, the Authority Service is a web application.  You are supposed to call
it something like this:

curl "http://localhost:8345/mcf-authority-service/UserACLs?username=foo@domain"

... and you get back a list of tokens and authority statuses.
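
For a custom search engine integration, the same request could be made from code. The following is a minimal sketch using only the Python standard library. The host, port, and path are taken from the curl example above; the function names and the timeout are assumptions, and the response is returned as raw text lines because its exact format is not described in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_user_acls_url(base_url, username):
    """Build the UserACLs URL, percent-encoding the username."""
    query = urlencode({"username": username})
    return f"{base_url.rstrip('/')}/mcf-authority-service/UserACLs?{query}"


def fetch_user_acls(base_url, username, timeout_s=10.0):
    """Call the Authority Service and return the response as a list of lines."""
    url = build_user_acls_url(base_url, username)
    with urlopen(url, timeout=timeout_s) as response:
        return response.read().decode("utf-8").splitlines()


if __name__ == "__main__":
    # Requires a running Authority Service on localhost:8345.
    for line in fetch_user_acls("http://localhost:8345", "foo@domain"):
        print(line)
```

Encoding the username matters because characters such as `@` or `\` in domain-qualified names are not always safe to place in a query string unescaped.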

Karl


