[
https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377509#comment-16377509
]
Ahmed Mahfouz commented on CONNECTORS-1497:
-------------------------------------------
Alright, I changed the implementation as you instructed, but I left the
decision to override the schedule to the repository connector when it seeds a
document. Overriding is now optional and up to the repository connector
implementation. I overloaded addSeedDocument with a new variant that accepts
an overrideSchedule parameter:
{code:java}
public void addSeedDocument(String documentIdentifier, boolean overrideSchedule)
  throws ManifoldCFException;
{code}
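To illustrate how a connector could opt in per document, here is a minimal sketch. SeedSink, RecordingSeedSink and SeedOverloadSketch are hypothetical stand-ins, not ManifoldCF types; the checked ManifoldCFException is left out so the sketch is self-contained, and the assumption that the one-argument method behaves like overrideSchedule=false is mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the framework's seeding interface (not the real ManifoldCF type).
interface SeedSink {
  void addSeedDocument(String documentIdentifier);
  void addSeedDocument(String documentIdentifier, boolean overrideSchedule);
}

// Records each seed together with its flag so the effect of the overload is visible.
class RecordingSeedSink implements SeedSink {
  final Map<String, Boolean> seeds = new LinkedHashMap<>();

  @Override
  public void addSeedDocument(String documentIdentifier) {
    // Assumption: existing callers keep the current scheduling behavior.
    addSeedDocument(documentIdentifier, false);
  }

  @Override
  public void addSeedDocument(String documentIdentifier, boolean overrideSchedule) {
    seeds.put(documentIdentifier, overrideSchedule);
  }
}

class SeedOverloadSketch {
  // A connector that knows which documents changed requests the override only for those.
  static void seed(SeedSink sink, Map<String, Boolean> modifiedById) {
    for (Map.Entry<String, Boolean> entry : modifiedById.entrySet()) {
      if (entry.getValue()) {
        sink.addSeedDocument(entry.getKey(), true);
      } else {
        sink.addSeedDocument(entry.getKey());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Boolean> modifiedById = new LinkedHashMap<>();
    modifiedById.put("doc-1", true);   // modified since the last crawl
    modifiedById.put("doc-2", false);  // unchanged
    RecordingSeedSink sink = new RecordingSeedSink();
    seed(sink, modifiedById);
    System.out.println(sink.seeds);
  }
}
```

Keeping the one-argument method means connectors that never call the new overload should see no change in behavior.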
STATUS_PENDING already has some logic of its own, so I didn't want to disturb
it. I only added code to STATUS_PENDINGPURGATORY, as follows:
{code:java}
case STATUS_PENDINGPURGATORY:
  // In this case we presume that the reason we are in this state is due to
  // adaptive crawling or retry, so DON'T bump up the schedule!
  // The existing doc priority field should also be preserved.
  if (desiredExecuteTime == -1L)
    map.put(checkTimeField,new Long(0L));
  else
    map.put(checkTimeField,new Long(desiredExecuteTime));
  break;
{code}
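The check-time choice in that branch can be isolated as a small sketch. CheckTimeSketch, NO_DESIRED_TIME and the "checktime" column name are hypothetical; reading -1 as "no desired execute time" and 0 as "due immediately" is my interpretation of the snippet, not something the framework code confirms here.

```java
import java.util.HashMap;
import java.util.Map;

class CheckTimeSketch {
  // Assumption: -1 is the sentinel meaning "no desired execute time was supplied".
  static final long NO_DESIRED_TIME = -1L;

  // Mirrors the branch above: fall back to 0 when no time is given, else use the given time.
  static long checkTimeFor(long desiredExecuteTime) {
    return desiredExecuteTime == NO_DESIRED_TIME ? 0L : desiredExecuteTime;
  }

  // Builds the row update the same way the snippet fills its map.
  static Map<String, Object> buildUpdate(String checkTimeField, long desiredExecuteTime) {
    Map<String, Object> map = new HashMap<>();
    map.put(checkTimeField, Long.valueOf(checkTimeFor(desiredExecuteTime)));
    return map;
  }

  public static void main(String[] args) {
    System.out.println(buildUpdate("checktime", NO_DESIRED_TIME));
    System.out.println(buildUpdate("checktime", 1500L));
  }
}
```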
> Re-index seeded modified documents when the re-crawl interval is infinity and
> connector model is MODEL_ADD_CHANGE
> -------------------------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1497
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Framework agents process
> Affects Versions: ManifoldCF 2.9.1
> Reporter: Ahmed Mahfouz
> Assignee: Karl Wright
> Priority: Major
> Attachments: CONNECTORS-1497.patch, CONNECTORS-1497.patch2,
> CONNECTORS-1497.patch3
>
>
> I am trying to avoid a full scan of all documents, for better efficiency
> with a large number of documents. I tried many different job settings but
> couldn't accomplish that. In particular, when the repository connector model
> is MODEL_ADD_CHANGE, I expected seeded modified documents to be re-indexed
> immediately, like new seeds, but I found that it uses the re-crawl time as
> the scheduled time, so they wait for the full scan before being re-indexed.
> I avoided the full scan by setting the re-crawl interval to infinity, but my
> modified seeded documents were still not getting indexed. After digging into
> the code for quite some time, I made some modifications to the JobManager
> and it worked for me. I would like to share the change with you for review,
> so I opened this ticket.