[ 
https://issues.apache.org/jira/browse/CONNECTORS-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835134#comment-13835134
 ] 

Karl Wright commented on CONNECTORS-781:
----------------------------------------

The approach I am going to try involves tracking the maximum possible 
allocation for each bin, and attempting to do the allocation up-front.  I can 
do this effectively because documents are added in blocks of up to 100, so 
by the time I am done creating all the PriorityCalculator objects, I should 
know exactly what the maximums are.  Of course, this is still just an 
estimate, because there are plenty of cases where the allocated document 
priority will not be used - and there is no way to refine the estimate in 
advance without doing a lot more work.
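
To make the counting step concrete, here is a rough sketch of how the per-bin 
maximums for one block might be tallied.  The class and method names are 
hypothetical and are not ManifoldCF code; the sketch also assumes each 
document simply exposes a list of bin names, which is a simplification.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not ManifoldCF classes.
final class BinDemandSketch {
    /**
     * Counts, for one block of documents, how many priority slots each bin
     * could need if every document's priority were actually used.
     * Each inner list holds the bin names of one document (an assumption).
     */
    static Map<String, Integer> maxSlotsPerBin(List<List<String>> binsPerDocument) {
        Map<String, Integer> maxSlots = new HashMap<>();
        for (List<String> bins : binsPerDocument) {
            for (String bin : bins) {
                // One more document could draw a priority from this bin.
                maxSlots.merge(bin, 1, Integer::sum);
            }
        }
        return maxSlots;
    }
}
```

The result is an upper bound only: a document whose priority ends up unused 
still counts toward its bins.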

The other thing that has to happen is that the allocation must be done outside 
the main addDocuments() transaction.  This decouples the allocation from the 
usage, which should be a big win for parallelism.  But if ANY allocations wind 
up taking place within the addDocuments() transaction, then effectively all 
gains are lost.  That means always reserving the maximum up front, so by 
definition we will wind up throwing some precalculated document priorities 
away on every addDocuments() call.  Not sure what the effects of that would be.
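
A minimal sketch of the decoupling idea, under the assumption that a bin's 
allocation can be modelled as a range of consecutive slot numbers.  All names 
here are hypothetical and do not correspond to ManifoldCF's actual classes; 
the real allocation state lives in shared storage, not in an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: illustrative names only, not ManifoldCF's API.
final class BinReservationSketch {
    // Next unreserved slot per bin; the only shared state in this sketch.
    private final Map<String, Long> nextFreeSlot = new HashMap<>();

    /** A private range of slots for one bin, owned by a single caller. */
    static final class Range {
        private long next;
        private final long end; // exclusive

        Range(long start, long end) {
            this.next = start;
            this.end = end;
        }

        /** Hands out the next reserved slot without touching shared state. */
        long take() {
            if (next >= end) {
                throw new IllegalStateException("reservation exhausted");
            }
            return next++;
        }

        /** Slots reserved but never used; these are simply thrown away. */
        long unused() {
            return end - next;
        }
    }

    /** Reserves the maximum possible demand per bin, before the transaction starts. */
    synchronized Map<String, Range> reserve(Map<String, Integer> maxSlotsPerBin) {
        Map<String, Range> ranges = new HashMap<>();
        for (Map.Entry<String, Integer> e : maxSlotsPerBin.entrySet()) {
            long start = nextFreeSlot.getOrDefault(e.getKey(), 0L);
            long end = start + e.getValue();
            nextFreeSlot.put(e.getKey(), end);
            ranges.put(e.getKey(), new Range(start, end));
        }
        return ranges;
    }
}
```

The caller would invoke reserve() first, then call take() on its own ranges 
inside the transaction, so no shared allocation state is touched there.  
Whatever unused() reports at the end is the discarded portion.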


> Fault-Tolerant Setup for ManifoldCF Agent.
> ------------------------------------------
>
>                 Key: CONNECTORS-781
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-781
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process, Framework core, Framework 
> crawler agent
>    Affects Versions: ManifoldCF 1.5
>            Reporter: Swami Rajamohan
>            Assignee: Karl Wright
>              Labels: agents, crawler, fault-tolerance
>             Fix For: ManifoldCF 1.5
>
>
> It should be possible to set up ManifoldCF as a Fault-Tolerant infrastructure.
> The Agent component of ManifoldCF should support multiple instances of an 
> agent crawling against a single crawl store, to be able to both distribute 
> (share) the crawl load as well as to be able to pick up a request that gets 
> abruptly terminated due to either partitioning of the instance/failure of the 
> instance itself.
> Since there is a proposal to move to a store like Voldemort, it would be nice 
> to be able to have a fault tolerant infrastructure.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
