There are two different issues here. The first is that a connection is being closed on you. I'm not sure why, but it could be caused by a Tika exception in Solr. The second is that the refactored WorkerThread code I checked in on Sunday may have a bug in how it handles exceptions of this kind.

I'll have a look at these and get back to you shortly.

Karl

On Mon, Aug 13, 2012 at 10:28 PM, Ahmet Arslan <[email protected]> wrote:

> If I modify my Path Rules to index only *.doc and *.docx files, I can
> re-index over and over without restarting anything. Everything works fine.
> It seems that there is a problem with non text extractable files.
>
> /Documents/*.doc file include
> /Documents/*.docx file include
>
> --- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:
>
>> From: Ahmet Arslan <[email protected]>
>> Subject: Re: SharePoint: Error closing connection to file
>> To: [email protected]
>> Date: Tuesday, August 14, 2012, 5:20 AM
>>
>> Also after this, when i hit "View Repository Connection
>> Status" i get :
>>
>> Got an unknown remote exception accessing site - axis fault
>> = Server.userException, detail =
>> java.net.UnknownHostException: null
>>
>> I restart mcf, I get "Connection status: Connection working"
>> at "View Repository Connection Status" page.
>>
>> --- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:
>>
>> > From: Ahmet Arslan <[email protected]>
>> > Subject: SharePoint: Error closing connection to file
>> > To: [email protected]
>> > Date: Tuesday, August 14, 2012, 5:18 AM
>> >
>> > Hello,
>> >
>> > Using solr output connector and SP2010 Repository connector,
>> > I am indexing a document library named Documents. This
>> > library has some scanned pdf documents. Very First crawl
>> > indexes all 91 docs.
>> > When I hit "Re-ingest all associated documents" and start
>> > second crawl, I get : "Error: Unexpected jobqueue status -
>> > record id 1344907007021, expecting active status, saw 3"
>> >
>> > Here is the stack trace:
>> > When i look at
>> > http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf,
>> > it is an image (scanned) pdf.
>> >
>> > WARN 2012-08-14 05:13:22,068 (Worker thread '39') - SharePoint: Error closing connection to file 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf': Connection reset
>> > java.net.SocketException: Connection reset
>> >     at java.net.SocketInputStream.read(SocketInputStream.java:113)
>> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
>> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
>> >     at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(Unknown Source)
>> >     at org.apache.commons.httpclient.ContentLengthInputStream.close(Unknown Source)
>> >     at java.io.FilterInputStream.close(FilterInputStream.java:155)
>> >     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(Unknown Source)
>> >     at org.apache.commons.httpclient.AutoCloseInputStream.close(Unknown Source)
>> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1457)
>> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
>> > DEBUG 2012-08-14 05:13:22,072 (Worker thread '42') - SharePoint: Path attribute name is null
>> > WARN 2012-08-14 05:13:22,081 (Worker thread '39') - SharePoint: IOException thrown: Connection reset
>> > java.net.SocketException: Connection reset
>> >     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
>> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
>> >     at java.io.FilterInputStream.read(FilterInputStream.java:90)
>> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
>> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1447)
>> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
>> > WARN 2012-08-14 05:13:22,186 (Worker thread '39') - Service interruption reported for job 1344906886879 connection 'SP2010': SharePoint is down attempting to read 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf', retrying: Connection reset
>> > ERROR 2012-08-14 05:13:22,230 (Worker thread '39') - Exception tossed: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
>> >     at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:711)
>> >     at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:2435)
>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:745)
>> >
>>
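
For anyone reading the quoted log: the first trace shows the failure happening inside close() itself, because closing the HTTP body stream tries to drain the remaining content from a socket that has already been reset. The sketch below is only an illustration of that general situation in plain Java, not the ManifoldCF or SharePoint connector code. The names QuietCloseSketch, ResetOnCloseStream, Result, and readAll are all made up for the example. It assumes one reasonable way to deal with it: log a failed close as a warning and let only read failures propagate.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.SocketException;

final class QuietCloseSketch {

    /** Hypothetical stand-in for a body stream whose close() fails on a reset socket. */
    static final class ResetOnCloseStream extends FilterInputStream {
        ResetOnCloseStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            throw new SocketException("Connection reset");
        }
    }

    /** Outcome of one read: how many bytes arrived and whether close() failed. */
    static final class Result {
        final int bytesRead;
        final boolean closeFailed;

        Result(int bytesRead, boolean closeFailed) {
            this.bytesRead = bytesRead;
            this.closeFailed = closeFailed;
        }
    }

    /**
     * Reads the stream to the end. A read failure propagates (wrapped as
     * UncheckedIOException); a close failure is only logged and recorded,
     * so it cannot replace or hide the outcome of the read.
     */
    static Result readAll(InputStream in) {
        int total = 0;
        boolean closeFailed = false;
        try {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                in.close();
            } catch (IOException e) {
                closeFailed = true;
                System.err.println("WARN: error closing stream: " + e.getMessage());
            }
        }
        return new Result(total, closeFailed);
    }

    public static void main(String[] args) {
        InputStream in = new ResetOnCloseStream(new ByteArrayInputStream(new byte[10]));
        Result result = readAll(in);
        System.out.println("bytesRead=" + result.bytesRead + " closeFailed=" + result.closeFailed);
    }
}
```

Whether a failed close should be treated as harmless depends on the caller. Here it is kept separate from the read result so that the two conditions, which show up as two distinct WARN entries in the quoted log, can be handled independently.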
