Re: Error Job stop after repeatidly interruption
> 9 (Worker thread '87') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/172.16.1.135] failed: Connection refused (Connection refused)
> WARN 2018-11-26T13:18:26,668 (Worker thread '92') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/172.16.1.135] failed: Connection refused (Connection refused)
> WARN 2018-11-26T13:18:26,722 (Worker thread '99') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/172.16.1.135] failed: Connection refused (Connection refused)
> WARN 2018-11-26T13:18:26,862 (Worker thread '75') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/172.16.1.135] failed: Connection refused (Connection refused)
> WARN 2018-11-26T13:18:26,862 (Worker thread '12') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/172.16.1.135] failed: Connection refused (Connection refused)
>
> So I don't understand whether the worker tried to reconnect after 10 seconds or not.
>
> How can I check that?
>
> Thanks a lot
>
> Mario
>
>
> *From:* Karl Wright
> *Sent:* Thursday, 15 November 2018 13:00
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Error Job stop after repeatidly interruption
>
> The easiest way to do it is just to check out current trunk:
>
> svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
>
> No need to apply a patch then. Just build:
>
> ant make-core-deps
> ant make-deps
> ant build
>
> Karl
>
>
> On Thu, Nov 15, 2018 at 4:30 AM Bisonti Mario wrote:
>
> Thanks a lot Karl.
>
> To overwrite the connector with your patch, I have to download the trunk and recompile, don't I?
>
> Excuse my questions, but I am not an expert in programming, compiling, etc.
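One way to answer the "did it really wait 10 seconds?" question above is to measure the time between consecutive "Tika down, retrying" warnings in the log. Below is a minimal sketch, assuming the log lines have the shape shown in this thread (`WARN <timestamp> (Worker thread 'N') - ... Tika down, retrying: ...`). The class `TikaRetryGaps` and its method names are hypothetical and not part of ManifoldCF. It groups by worker thread only as an approximation, since a retried document may not be picked up by the same thread again.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ManifoldCF): gaps between "Tika down" warnings. */
final class TikaRetryGaps {
  // Assumed line shape, taken from the log excerpts in this thread.
  private static final Pattern WARN_LINE = Pattern.compile(
      "WARN\\s+(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2},\\d{3})\\s+"
      + "\\(Worker thread '(\\d+)'\\).*Tika down, retrying");
  private static final DateTimeFormatter STAMP =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

  /** Per worker thread, the gaps in milliseconds between consecutive matching warnings. */
  static Map<String, List<Long>> gapsByThread(List<String> logLines) {
    Map<String, LocalDateTime> lastSeen = new LinkedHashMap<>();
    Map<String, List<Long>> gaps = new LinkedHashMap<>();
    for (String line : logLines) {
      Matcher m = WARN_LINE.matcher(line);
      if (!m.find()) {
        continue; // not a "Tika down" warning
      }
      LocalDateTime when = LocalDateTime.parse(m.group(1), STAMP);
      String thread = m.group(2);
      List<Long> threadGaps = gaps.computeIfAbsent(thread, k -> new ArrayList<>());
      LocalDateTime previous = lastSeen.put(thread, when);
      if (previous != null) {
        threadGaps.add(Duration.between(previous, when).toMillis());
      }
    }
    return gaps;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java TikaRetryGaps <log file>");
      return;
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    gapsByThread(lines).forEach(
        (thread, g) -> System.out.println("Worker thread " + thread + ": " + g + " ms"));
  }
}
```

Gaps close to 10000 ms would suggest the 10-second wait is in effect; gaps of a few milliseconds would suggest it is not. Warnings from different threads that land within the same second, as in the excerpt above, say nothing about the retry delay on their own.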
Re: Error Job stop after repeatidly interruption
(1) I increased the retries so that they continue for at least 10 minutes.
(2) I handled the 503 response explicitly, with the same logic.

See: https://issues.apache.org/jira/browse/CONNECTORS-1556

Karl


On Thu, Nov 15, 2018 at 3:35 AM Bisonti Mario wrote:

> Yes, Karl.
>
> Is it possible to apply the same concept (wait 10 seconds and retry three times) to the 503 error, too?
>
> I would like to test whether, with that modification, the job ends correctly instead of failing.
>
> Thanks a lot
>
> Mario
Re: Error Job stop after repeatidly interruption
Hi Mario,

Here's the code:

>>>>>>
  try {
    //System.out.println("About to do a content PUT");
    response = this.httpClient.execute(tikaHost, httpPut);
    //System.out.println("... content PUT succeeded");
  } catch (IOException e) {
    // Retry 3 times, 1 ms between retries, and abort if doesn't work
    final long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Tika down, retrying: "+e.getMessage(),e,currentTime + 1L,
      -1L,3,true);
  }

  responseCode = response.getStatusLine().getStatusCode();
  if (response.getStatusLine().getStatusCode() == 200 || response.getStatusLine().getStatusCode() == 204) {
    tikaServerIs = response.getEntity().getContent();
    try {
      responseDs = new FileDestinationStorage();
      final OutputStream os2 = responseDs.getOutputStream();
      try {
        IOUtils.copyLarge(tikaServerIs, os2, 0L, sp.writeLimit);
      } finally {
        os2.close();
      }
      length = new Long(responseDs.getBinaryLength());
    } finally {
      tikaServerIs.close();
    }
  } else {
    activities.noDocument();
    if (responseCode == 422) {
      resultCode = "TIKASERVERREJECTS";
      description = "Tika Server rejected document with the following reason: "
          + response.getStatusLine().getReasonPhrase();
      return handleTikaServerRejects(description);
    } else {
      resultCode = "TIKASERVERERROR";
      description = "Tika Server failed to parse document with the following error: "
          + response.getStatusLine().getReasonPhrase();
      return handleTikaServerError(description);
    }
  }

} catch (IOException | ParseException e) {
  resultCode = "TIKASERVERRESPONSEISSUE";
  description = e.getMessage();
  int rval;
  if (e instanceof IOException) {
    rval = handleTikaServerException((IOException) e);
  } else {
    rval = handleTikaServerException((ParseException) e);
  }
  if (rval == DOCUMENTSTATUS_REJECTED) {
    activities.noDocument();
  }
  return rval;
}
<<<<<<

and

>>>>>>
protected static int handleTikaServerError(String description)
    throws IOException, ManifoldCFException, ServiceInterruption {
  // MHL - what does Tika throw if it gets an IOException reading the stream??
  Logging.ingest.warn("Tika Server: Tika Server error: " + description);
  return DOCUMENTSTATUS_REJECTED;
}
<<<<<<

The summary:

(1) If ManifoldCF cannot connect at all, or gets an IO error, it will wait at least 10 seconds and then retry -- up to three times.
(2) When ManifoldCF sees a 503 error, it immediately rejects the document.

So you are requesting different handling for 503 errors?

Karl


On Thu, Nov 15, 2018 at 2:42 AM Bisonti Mario wrote:

> Hello Karl.
>
> I opened an issue on Tika here:
> https://issues.apache.org/jira/browse/TIKA-2776
>
> The Tika developer suggests that I add a wait on the client side (in my case, ManifoldCF):
> https://issues.apache.org/jira/browse/TIKA-2776?focusedCommentId=16686620=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16686620
>
> I am not able to do this myself…
> Is it possible to implement it in the MCF source?
>
> Thanks a lot
>
> Mario
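To make the difference between behaviours (1) and (2) concrete, here is a minimal, self-contained sketch of a policy that treats a 503 the same way as a connection failure: wait, retry a limited number of times, then give up. Everything in it is hypothetical (the `TikaRetrySketch` class, the `TikaCall` interface, the `Outcome` enum). It is not the connector's code and not the CONNECTORS-1556 change. In particular, the connector does not loop itself: it throws a `ServiceInterruption` and leaves the rescheduling to the framework, whereas this sketch uses a plain loop so that it can run on its own.

```java
import java.io.IOException;

/** Hypothetical illustration only; not ManifoldCF code. */
final class TikaRetrySketch {
  enum Outcome { ACCEPTED, REJECTED, GAVE_UP }

  /** Stand-in for one PUT to the Tika server; returns the HTTP status code. */
  interface TikaCall {
    int execute() throws IOException;
  }

  /**
   * Runs the call, retrying on IOException and on HTTP 503.
   * Any status other than 200, 204 or 503 rejects the document at once.
   */
  static Outcome submit(TikaCall call, int maxRetries, long delayMillis) {
    for (int attempt = 0; ; attempt++) {
      try {
        int status = call.execute();
        if (status == 200 || status == 204) {
          return Outcome.ACCEPTED;
        }
        if (status != 503) {
          return Outcome.REJECTED; // e.g. 422: retrying would not help
        }
        // 503: treated as transient, handled like a failed connection below
      } catch (IOException e) {
        // connection refused or another IO problem: also treated as transient
      }
      if (attempt >= maxRetries) {
        return Outcome.GAVE_UP;
      }
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
        return Outcome.GAVE_UP;
      }
    }
  }

  public static void main(String[] args) {
    // Simulated server: answers 503 twice, then 200.
    int[] calls = {0};
    Outcome outcome = submit(() -> ++calls[0] < 3 ? 503 : 200, 3, 10L);
    System.out.println(outcome + " after " + calls[0] + " calls"); // ACCEPTED after 3 calls
  }
}
```

With `maxRetries = 3` the call is attempted at most four times. The delay is a parameter because the quoted snippet and the summary above state different waits (1 ms in the code comment, 10 seconds in the summary).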
Re: Error Job stop after repeatidly interruption
Hi Mario,

The Tika external connector retries for a while before it gives up and aborts the job. If you can get the Tika server back up within a reasonable period of time all should be well. But if one specific document *always* brings down the Tika server, it will be hard to recover from that.

Karl


On Thu, Nov 8, 2018 at 2:56 PM Bisonti Mario wrote:

> Hello.
>
> I am trying to index more than 500 documents in a Windows share.
>
> The job gets interrupted because of repeated service interruptions.
>
> This is the manifold.log:
>
> .
> .
> WARN 2018-11-07T21:53:25,296 (Worker thread '59') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
> WARN 2018-11-07T21:53:25,476 (Worker thread '89') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
> WARN 2018-11-07T21:53:33,814 (Worker thread '15') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> jcifs.smb.SmbException: All pipe instances are busy.
>         at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:669) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) ~[jcifs-1.3.18.3.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438) [mcf-jcifs-connector.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221) [mcf-jcifs-connector.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627) [mcf-jcifs-connector.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> WARN 2018-11-07T21:53:57,861 (Worker thread '12') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> jcifs.smb.SmbException: All pipe instances are busy.
>         at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbTransport.send(SmbTransport.java:669) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) ~[jcifs-1.3.18.3.jar:?]
>         at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs-1.3.18.3.jar:?]
>         at