Re: Job stuck - WorkerThread functions return null

2018-11-12 Thread Karl Wright
Hi,
Have you been modifying the framework code?  If so, I really cannot help
you.

If you haven't -- it looks like you've got code that is injecting document
identifiers that are incorrect.  But I will need to see a full stack trace
to be sure of that.

Thanks,
Karl
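
A minimal sketch of the failure mode suggested in the reply above, assuming the lookup map holds only the identifiers that were queued for the current batch. The class name, the hash helper, and the identifiers "doc-42" and "DOC-42" are hypothetical stand-ins for illustration; this is not ManifoldCF code and does not use its API.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentifierLookupSketch {
    // Stand-in for the per-batch map of queued documents, keyed by identifier hash.
    private final Map<String, String> previousDocuments = new HashMap<>();

    // Hypothetical hash helper; a real framework would use its own hashing.
    public static String hash(String documentIdentifier) {
        return "H(" + documentIdentifier + ")";
    }

    // Framework side (assumed): record an identifier handed out for this batch.
    public void queue(String documentIdentifier) {
        previousDocuments.put(hash(documentIdentifier), documentIdentifier);
    }

    // Mirrors the null check in the method quoted in this thread.
    public String lookup(String documentIdentifier) {
        String qd = previousDocuments.get(hash(documentIdentifier));
        if (qd == null)
            throw new IllegalArgumentException(
                "Unrecognized document identifier: '" + documentIdentifier + "'");
        return qd;
    }

    public static void main(String[] args) {
        IdentifierLookupSketch batch = new IdentifierLookupSketch();
        batch.queue("doc-42");

        // Passing back the identifier exactly as queued succeeds.
        System.out.println(batch.lookup("doc-42"));

        // Passing back an altered identifier (here, a different case) fails,
        // because its hash was never placed in the map.
        try {
            batch.lookup("DOC-42");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this is what is happening, the thing to check in a custom connector would be whether every identifier it reports back is character-for-character one it was given for that batch.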


On Mon, Nov 12, 2018 at 4:06 AM Cheng Zeng wrote:

> Hi Karl,
>
> I am developing my own repository connector, for which I borrowed some
> code from the file repository connector. I use my connector to crawl
> documents from an IBM Domino system. I managed to retrieve all the files
> in Domino. However, when I restart my job to recrawl the Domino database,
> I run into a problem with the following code:
> previousDocuments.get(documentIdentifierHash) in WorkerThread.java
> (org.apache.manifoldcf.crawler.system) returns null for some of the
> document ids. As a result, the job gets stuck on those document ids.
>
> Could you please tell me how I could fix the problem?
>
> protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
>   String documentIdentifierHash,
>   String componentIdentifierHash,
>   String documentIdentifier)
> {
>   QueuedDocument qd = previousDocuments.get(documentIdentifierHash);  // return null. The problem is here.
>   if (qd == null)
>     throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>   return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
> }
>
>
> Thanks a lot.
>
> Cheng
>


Job stuck - WorkerThread functions return null

2018-11-12 Thread Cheng Zeng
Hi Karl,

I am developing my own repository connector, for which I borrowed some code
from the file repository connector. I use my connector to crawl documents
from an IBM Domino system. I managed to retrieve all the files in Domino.
However, when I restart my job to recrawl the Domino database, I run into a
problem with the following code: previousDocuments.get(documentIdentifierHash)
in WorkerThread.java (org.apache.manifoldcf.crawler.system) returns null for
some of the document ids. As a result, the job gets stuck on those document
ids.

Could you please tell me how I could fix the problem?

protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
  String documentIdentifierHash,
  String componentIdentifierHash,
  String documentIdentifier)
{
  QueuedDocument qd = previousDocuments.get(documentIdentifierHash);  // return null. The problem is here.
  if (qd == null)
    throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
  return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
}


Thanks a lot.

Cheng