[ 
https://issues.apache.org/jira/browse/CONNECTORS-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825008#comment-13825008
 ] 

Karl Wright commented on CONNECTORS-781:
----------------------------------------

Some more analysis and thoughts.

  - ManifoldCF.resetAllDocumentPriorities is called whenever the flow of 
documents changes through user intervention or a job abort.  It is written to 
use queueTracker.beginReset() and queueTracker.endReset(), which basically 
stop anything else from doing anything until the reset is complete.  Beyond 
that: (a) bin counts need to be kept globally; (b) I think connector 
statistics can be kept locally.  Whether the system is in the middle of a 
document priority reset is global state, though, because while a reset is in 
progress any document queuing operation must add its documents in such a way 
that the reprioritization catches them before it completes.  Note: document 
priorities are also reset when the system is first started, because the reset 
operations change things around (see the next bullet point).  So every time 
ANY cluster member comes up, all documents need to be reprioritized.
  - Reset operations really need to know which process is involved.  That 
requires a schema change: the jobqueue table needs a column, used while a 
document is in a transient state, that records which actor (process) was 
supposedly doing the deed.  We also need a concept of a "global reset", which 
should basically be called only when the mix of cluster members changes.  
Essentially, if a cluster member goes away permanently, that would allow its 
documents to be reset.
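
To make the second bullet concrete, here is a rough sketch of the idea, not 
actual ManifoldCF code.  It models the jobqueue table as an in-memory list and 
assumes a hypothetical processId field standing in for the proposed column.  
The class name, the status values, and the method names (claim, resetProcess, 
globalReset) are all made up for illustration; the real schema and reset logic 
may well end up looking different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for the jobqueue table; not ManifoldCF code.
class JobQueueSketch {
    enum Status { PENDING, ACTIVE }

    static final class Row {
        final String docId;
        Status status = Status.PENDING;
        // Assumed new column: set only while the row is in a transient state.
        String processId;

        Row(String docId) { this.docId = docId; }
    }

    private final List<Row> rows = new ArrayList<>();

    Row add(String docId) {
        Row row = new Row(docId);
        rows.add(row);
        return row;
    }

    // A process picks up a document and records itself as the owner.
    void claim(Row row, String processId) {
        row.status = Status.ACTIVE;
        row.processId = processId;
    }

    // Per-process reset: only that process's in-flight documents go back to PENDING.
    int resetProcess(String processId) {
        int count = 0;
        for (Row row : rows) {
            if (row.status == Status.ACTIVE && processId.equals(row.processId)) {
                release(row);
                count++;
            }
        }
        return count;
    }

    // Global reset: release documents whose owner is no longer a cluster member.
    int globalReset(Set<String> liveMembers) {
        int count = 0;
        for (Row row : rows) {
            boolean orphaned = row.processId == null || !liveMembers.contains(row.processId);
            if (row.status == Status.ACTIVE && orphaned) {
                release(row);
                count++;
            }
        }
        return count;
    }

    private static void release(Row row) {
        row.status = Status.PENDING;
        row.processId = null;
    }
}
```

With something like this, a restarting agent would call resetProcess with its 
own id and leave other members' documents alone, while globalReset would run 
only when the set of cluster members changes.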


> Fault-Tolerant Setup for ManifoldCF Agent.
> ------------------------------------------
>
>                 Key: CONNECTORS-781
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-781
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process, Framework core, Framework 
> crawler agent
>    Affects Versions: ManifoldCF 1.5
>            Reporter: Swami Rajamohan
>            Assignee: Karl Wright
>              Labels: agents, crawler, fault-tolerance
>             Fix For: ManifoldCF 1.5
>
>
> It should be possible to set up ManifoldCF as a fault-tolerant infrastructure.
> The Agent component of ManifoldCF should support multiple instances of an 
> agent crawling against a single crawl store, to be able to both distribute 
> (share) the crawl load as well as to be able to pick up a request that gets 
> abruptly terminated due to either partitioning of the instance/failure of the 
> instance itself.
> Since there is a proposal to move to a store like Voldemort, it would be nice 
> to be able to have a fault tolerant infrastructure.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
