[jira] [Commented] (CONNECTORS-1497) Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE

Karl Wright (JIRA) Mon, 26 Feb 2018 09:51:22 -0800

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377260#comment-16377260
 ]


Karl Wright commented on CONNECTORS-1497:
-----------------------------------------

This is the wrong place to put this in any case.

Please examine the method signature:

{code}
  /** Add an initial set of documents to the queue.
  * This method is called during job startup, when the queue is being loaded.
  * A set of document references is passed to this method, which updates the status of the document
  * in the specified job's queue, according to specific state rules.
  *@param processID is the current process ID.
  *@param jobID is the job identifier.
  *@param legalLinkTypes is the set of legal link types that this connector generates.
  *@param docIDs are the local document identifiers.
  *@param overrideSchedule is true if any existing document schedule should be overridden.
  *@param hopcountMethod is either accurate, nodelete, or neverdelete.
  *@param documentPriorities are the document priorities corresponding to the document identifiers.
  *@param prereqEventNames are the events that must be completed before each document can be processed.
  */
  @Override
  public void addDocumentsInitial(String processID, Long jobID, String[] legalLinkTypes,
    String[] docIDHashes, String[] docIDs, boolean overrideSchedule,
    int hopcountMethod, IPriorityCalculator[] documentPriorities,
    String[][] prereqEventNames)
    throws ManifoldCFException
{code}

Note the parameter called "overrideSchedule".  Set it to "true" to override
the schedule in the manner you are trying to do.

This method is called during seeding.  When it is called during the run of a
non-continuous job, overrideSchedule is already true.  So the question is
whether you want all *continuous* jobs to override the schedule every time
they reseed.  I'm still not sold that that is the right thing, but assuming it
is, you want to find where that happens (continuous job seeding is done by a
different thread than initial job seeding) and change that parameter in the
addDocumentsInitial() method call there.
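
To make the effect of the flag concrete, here is a minimal toy sketch.  It is
not ManifoldCF code: the class SeedingSketch, its in-memory map, and the
simplified addDocumentsInitial() signature are all hypothetical, and the real
state rules in JobManager are more involved.  It only assumes what the javadoc
above states, namely that overrideSchedule=true means an existing document
schedule is overridden.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only (hypothetical, not ManifoldCF code): effect of overrideSchedule on reseeding. */
public class SeedingSketch {
  /** Hypothetical stand-in for a job queue: document ID -> earliest time it may be processed. */
  private final Map<String, Long> scheduledTime = new HashMap<>();

  /** Simulates a document that is already queued with a schedule from an earlier crawl. */
  public void schedule(String docID, long time) {
    scheduledTime.put(docID, time);
  }

  /** Hypothetical simplification of addDocumentsInitial(): only the scheduling decision is modeled. */
  public void addDocumentsInitial(String[] docIDs, boolean overrideSchedule, long now) {
    for (String docID : docIDs) {
      boolean alreadyQueued = scheduledTime.containsKey(docID);
      if (!alreadyQueued || overrideSchedule) {
        // New seed, or the caller asked to discard the existing schedule:
        // make the document eligible for processing right away.
        scheduledTime.put(docID, now);
      }
      // Otherwise the existing schedule is kept, for example "never"
      // when the recrawl interval is infinity.
    }
  }

  /** Returns the scheduled time of a queued document. */
  public long getScheduledTime(String docID) {
    return scheduledTime.get(docID);
  }
}
```

In this model, a document previously scheduled at Long.MAX_VALUE (standing in
for an infinite recrawl interval) keeps that schedule when it is reseeded with
overrideSchedule=false, and becomes due immediately when it is reseeded with
overrideSchedule=true.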


> Re-index seeded modified documents when the re-crawl interval is infinity and 
>   connector model is MODEL_ADD_CHANGE
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1497
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Ahmed Mahfouz
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1497.patch
>
>
> I am trying to avoid a full scan of all documents, for better efficiency
> with a large number of documents. I tried many different settings for the
> jobs but could not accomplish that. In particular, when the repository
> connector model is MODEL_ADD_CHANGE, I expected seeded modified documents to
> be re-indexed immediately, like new seeds, but I found that the re-crawl
> time is used as the scheduled time, so they wait for the full scan to get
> re-indexed. I avoided the full scan by setting the re-crawl interval to
> infinity, but my modified document seeds were still not getting indexed.
> After digging into the code for quite some time, I made some modifications
> to the JobManager and it worked for me. I would like to share the change
> with you for review, so I opened this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
