Re: Error Job stop after repeatidly interruption

2018-11-26 Thread Karl Wright
9 (Worker thread '87') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> WARN 2018-11-26T13:18:26,668 (Worker thread '92') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> WARN 2018-11-26T13:18:26,722 (Worker thread '99') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> WARN 2018-11-26T13:18:26,862 (Worker thread '75') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> WARN 2018-11-26T13:18:26,862 (Worker thread '12') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> So I don’t understand whether the worker tried to reconnect after 10
> seconds or not.
>
> How can I check this?
>
> Thanks a lot
>
> Mario
>
> *From:* Karl Wright
> *Sent:* Thursday, November 15, 2018 13:00
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Error Job stop after repeatidly interruption
>
>
>
> The easiest way to do it is just to check out current trunk:
>
>
>
> svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
>
> No need to apply a patch then.  Just build:
>
> ant make-core-deps
> ant make-deps
> ant build
>
> Karl
>
> On Thu, Nov 15, 2018 at 4:30 AM Bisonti Mario 
> wrote:
>
> Thanks a lot Karl.
>
> To replace the connector with your patched version, do I have to download
> the trunk and recompile?
>
> Excuse my questions, but I am not an expert in programming, compiling,
> etc.

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
(1) I increased the retries to go at least 10 minutes.
(2) I handled the 503 response explicitly, with the same logic.
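
As a rough illustration of point (2), the sketch below shows one way a client could send a 503 down the same retry path as a connection failure instead of rejecting the document. It is not the CONNECTORS-1556 patch: the class, enum, and method names are hypothetical, and only the status codes 200, 204, 422, and 503 come from this thread.

```java
final class TikaStatusSketch {
  /** Hypothetical outcomes; these names are not ManifoldCF API. */
  enum Outcome { ACCEPT, REJECT, RETRY_LATER }

  /** Maps an HTTP status code from the Tika server to an outcome. */
  static Outcome classify(int statusCode) {
    switch (statusCode) {
      case 200:
      case 204:
        // Tika processed the request; the response body can be read.
        return Outcome.ACCEPT;
      case 503:
        // Server temporarily unavailable: handle like "Tika down" and retry.
        return Outcome.RETRY_LATER;
      default:
        // 422 and every other status: the document is rejected.
        return Outcome.REJECT;
    }
  }

  public static void main(String[] args) {
    for (int code : new int[] {200, 204, 422, 500, 503}) {
      System.out.println(code + " -> " + classify(code));
    }
  }
}
```

In this sketch only 503 is treated as transient; whether other 5xx codes deserve the same treatment is a separate design decision that the thread does not settle.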

See: https://issues.apache.org/jira/browse/CONNECTORS-1556

Karl


On Thu, Nov 15, 2018 at 3:35 AM Bisonti Mario 
wrote:

> Yes, Karl.
>
> Is it possible to apply the same approach, waiting 10 seconds and retrying
> three times, to the 503 error too?
>
> With that modification, I would like to check whether the job ends
> correctly instead of failing.
>
> Thanks a lot
>
> Mario

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
Hi Mario,

Here's the code:

>>>>>>
  try {
    //System.out.println("About to do a content PUT");
    response = this.httpClient.execute(tikaHost, httpPut);
    //System.out.println("... content PUT succeeded");
  } catch (IOException e) {
    // Retry 3 times, 10000 ms between retries, and abort if doesn't work
    final long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("Tika down, retrying: "+e.getMessage(),e,currentTime + 10000L,
      -1L,3,true);
  }

  responseCode = response.getStatusLine().getStatusCode();
  if (response.getStatusLine().getStatusCode() == 200 ||
      response.getStatusLine().getStatusCode() == 204) {
    tikaServerIs = response.getEntity().getContent();
    try {
      responseDs = new FileDestinationStorage();
      final OutputStream os2 = responseDs.getOutputStream();
      try {
        IOUtils.copyLarge(tikaServerIs, os2, 0L, sp.writeLimit);
      } finally {
        os2.close();
      }
      length = new Long(responseDs.getBinaryLength());
    } finally {
      tikaServerIs.close();
    }
  } else {
    activities.noDocument();
    if (responseCode == 422) {
      resultCode = "TIKASERVERREJECTS";
      description = "Tika Server rejected document with the following reason: "
        + response.getStatusLine().getReasonPhrase();
      return handleTikaServerRejects(description);
    } else {
      resultCode = "TIKASERVERERROR";
      description = "Tika Server failed to parse document with the following error: "
        + response.getStatusLine().getReasonPhrase();
      return handleTikaServerError(description);
    }
  }

} catch (IOException | ParseException e) {
  resultCode = "TIKASERVERRESPONSEISSUE";
  description = e.getMessage();
  int rval;
  if (e instanceof IOException) {
    rval = handleTikaServerException((IOException) e);
  } else {
    rval = handleTikaServerException((ParseException) e);
  }
  if (rval == DOCUMENTSTATUS_REJECTED) {
    activities.noDocument();
  }
  return rval;
}
<<<<<<
and
>>>>>>
  protected static int handleTikaServerError(String description)
    throws IOException, ManifoldCFException, ServiceInterruption {
    // MHL - what does Tika throw if it gets an IOException reading the stream??
    Logging.ingest.warn("Tika Server: Tika Server error: " + description);
    return DOCUMENTSTATUS_REJECTED;
  }
<<<<<<

The summary:

(1) If ManifoldCF cannot connect at all, or gets an IO error, it will wait
at least 10 seconds and then retry -- up to three times.
(2) When Manifold sees a 503 error it immediately just rejects the document.
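
The behaviour in (1) can be pictured with a small stand-alone loop. This is only a simplified stand-in: the connector code above does not loop itself but throws ServiceInterruption, and the retrying happens outside the snippet. The names `FixedDelayRetrySketch`, `Sleeper`, and `callWithRetries` are hypothetical, and the sleeper hook exists only so the demo does not really wait 10 seconds.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class FixedDelayRetrySketch {
  /** Hypothetical hook so the wait can be replaced in demos and tests. */
  interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  /**
   * Runs the call. On IOException it waits delayMillis and tries again,
   * at most maxRetries times after the first attempt, then rethrows.
   */
  static <T> T callWithRetries(Callable<T> call, int maxRetries, long delayMillis,
      Sleeper sleeper) throws Exception {
    int retriesUsed = 0;
    while (true) {
      try {
        return call.call();
      } catch (IOException e) {
        if (retriesUsed >= maxRetries) {
          throw e; // retries exhausted: the caller gives up
        }
        retriesUsed++;
        sleeper.sleep(delayMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] attempts = {0};
    String result = callWithRetries(() -> {
      // Simulated server that refuses the first two connections.
      if (++attempts[0] < 3) {
        throw new IOException("Connection refused");
      }
      return "ok";
    }, 3, 10_000L, millis -> { /* skip the real wait in this demo */ });
    System.out.println(result + " after " + attempts[0] + " attempts");
  }
}
```

Running `main` prints `ok after 3 attempts`. With a server that never comes back, the fourth failed attempt propagates the IOException, which corresponds to the job being aborted.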

So you are requesting different handling for 503 errors?

Karl


On Thu, Nov 15, 2018 at 2:42 AM Bisonti Mario 
wrote:

> Hello Karl.
>
> I opened an issue on Tika here:
>
> https://issues.apache.org/jira/browse/TIKA-2776
>
> The Tika developer suggests that I add a wait on the client side (in my
> case ManifoldCF):
>
> https://issues.apache.org/jira/browse/TIKA-2776?focusedCommentId=16686620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16686620
>
> I am not able to do this myself.
>
> Is it possible to implement it in the MCF source?
>
> Thanks a lot
>
> Mario

Re: Error Job stop after repeatidly interruption

2018-11-08 Thread Karl Wright
Hi Mario,

The Tika external connector retries for a while before it gives up and
aborts the job.  If you can get the Tika server back up within a reasonable
period of time all should be well.  But if one specific document *always*
brings down the Tika server, it will be hard to recover from that.

Karl


On Thu, Nov 8, 2018 at 2:56 PM Bisonti Mario 
wrote:

> Hello.
>
> I am trying to index more than 500 documents in a Windows share.
>
> The job is stopped because of repeated interruptions.
>
> This is the manifold.log:
>
> .
> .
> WARN 2018-11-07T21:53:25,296 (Worker thread '59') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to localhost:9998 [localhost/127.0.0.1,
> localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
>
> WARN 2018-11-07T21:53:25,476 (Worker thread '89') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Tika down, retrying:
> Connect to localhost:9998 [localhost/127.0.0.1,
> localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
>
> WARN 2018-11-07T21:53:33,814 (Worker thread '15') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> jcifs.smb.SmbException: All pipe instances are busy.
>   at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbTransport.send(SmbTransport.java:669) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) ~[jcifs-1.3.18.3.jar:?]
>   at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438) [mcf-jcifs-connector.jar:?]
>   at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221) [mcf-jcifs-connector.jar:?]
>   at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627) [mcf-jcifs-connector.jar:?]
>   at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
>
> WARN 2018-11-07T21:53:57,861 (Worker thread '12') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> jcifs.smb.SmbException: All pipe instances are busy.
>   at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbTransport.send(SmbTransport.java:669) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) ~[jcifs-1.3.18.3.jar:?]
>   at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs-1.3.18.3.jar:?]
> at