[jira] [Commented] (CONNECTORS-962) Support multiple output connections for a single job

Karl Wright (JIRA) Wed, 11 Jun 2014 09:57:48 -0700

    [ https://issues.apache.org/jira/browse/CONNECTORS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028010#comment-14028010 ]


Karl Wright commented on CONNECTORS-962:
----------------------------------------

Contract for RepositoryDocument right now:
(1) Repository connector creates RepositoryDocument
(2) Repository connector calls RepositoryDocument.setData(InputStream) (or whatever the setter is actually called)
(3) Repository connector calls the activity method
(4) Various pipeline methods see the RepositoryDocument object; only one of them reads the stream, then returns
(5) Repository connector closes the InputStream itself (NOT the RepositoryDocument object)
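To make the single-reader limitation in step (4) concrete, here is a minimal sketch. `SingleConsumerDocument` is a hypothetical, simplified stand-in, not the real ManifoldCF `RepositoryDocument`; `setData` follows the name used above, and `consume` is a hypothetical pipeline stage. It shows what would happen if a second stage tried to read the same stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified stand-in for RepositoryDocument (not the ManifoldCF class).
// It only holds a reference to the stream; it never opens or closes it.
class SingleConsumerDocument {
    private InputStream data;

    void setData(InputStream data) { this.data = data; }

    InputStream getData() { return data; }
}

class SingleConsumerDemo {
    // Hypothetical pipeline stage: drains the stream and reports how many bytes it saw.
    static int consume(SingleConsumerDocument doc) throws IOException {
        return doc.getData().readAllBytes().length;
    }

    // Walks through the current contract, but with two stages instead of one.
    static int[] run(byte[] content) throws IOException {
        InputStream in = new ByteArrayInputStream(content);           // connector-owned stream
        try {
            SingleConsumerDocument doc = new SingleConsumerDocument(); // (1)
            doc.setData(in);                                           // (2)
            int first = consume(doc);   // (4) the first stage reads everything...
            int second = consume(doc);  // ...so a second stage sees an exhausted stream
            return new int[] { first, second };
        } finally {
            in.close();                 // (5) the connector closes the stream, not the document
        }
    }
}
```

Calling `SingleConsumerDemo.run` on a 5-byte payload yields `{5, 0}`: the second reader gets nothing, which is why the current contract only works with one consumer.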

Maintaining backwards compatibility is hard without a RepositoryDocument close() or cleanup() method. But such a method can be invoked at the ProcessActivity level rather than by the repository connector. The flow becomes:

(1) Repository connector creates RepositoryDocument
(2) Repository connector calls RepositoryDocument.setData(InputStream) (or whatever the setter is actually called)
(3) Repository connector calls the activity method
(4) ProcessActivity calls RepositoryDocument.setMultipleConsumers(), based on the current pipeline
(5) Various pipeline methods see the RepositoryDocument object, and many of them may read the stream
(6) ProcessActivity calls RepositoryDocument.cleanup()
(7) Repository connector closes the InputStream itself (NOT the RepositoryDocument object)
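The proposed flow can be sketched as follows. This is a hypothetical illustration, not ManifoldCF code: `MultiConsumerDocument` and `ProcessActivitySketch` are invented names, and buffering the stream to a temporary file is an assumption about how `setMultipleConsumers()` might work. `setMultipleConsumers()` and `cleanup()` follow the names proposed above. The document copies the connector's stream once, hands each stage its own replay stream, and never closes the original, so step (7) is unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical RepositoryDocument variant that can serve several readers.
// Spilling to a temp file is an assumption made for this sketch.
class MultiConsumerDocument {
    private InputStream data;          // owned and closed by the repository connector
    private boolean multipleConsumers;
    private Path spillFile;            // lazily created copy of the connector's stream
    private final List<InputStream> opened = new ArrayList<>();

    void setData(InputStream data) { this.data = data; }                      // step (2)

    void setMultipleConsumers(boolean multiple) { multipleConsumers = multiple; } // step (4)

    // Each stage asks for a stream; with multiple consumers each one gets a fresh replay.
    InputStream getData() throws IOException {
        if (!multipleConsumers) {
            return data;               // old single-reader behaviour
        }
        if (spillFile == null) {
            spillFile = Files.createTempFile("repodoc", ".bin");
            // Copy once; Files.copy leaves the source stream open for the connector.
            Files.copy(data, spillFile, StandardCopyOption.REPLACE_EXISTING);
        }
        InputStream replay = Files.newInputStream(spillFile);
        opened.add(replay);
        return replay;
    }

    // Step (6): release everything the document itself created.
    void cleanup() throws IOException {
        for (InputStream s : opened) {
            s.close();
        }
        opened.clear();
        if (spillFile != null) {
            Files.deleteIfExists(spillFile);
            spillFile = null;
        }
    }
}

class ProcessActivitySketch {
    // Hypothetical pipeline: every stage drains its own stream; returns bytes seen per stage.
    static int[] process(MultiConsumerDocument doc, int stages) throws IOException {
        doc.setMultipleConsumers(stages > 1);       // step (4), based on the pipeline shape
        int[] seen = new int[stages];
        try {
            for (int i = 0; i < stages; i++) {      // step (5)
                seen[i] = doc.getData().readAllBytes().length;
            }
        } finally {
            doc.cleanup();                          // step (6)
        }
        return seen;
    }
}
```

Because `cleanup()` is called by ProcessActivity and the original stream is never closed by the document, an existing repository connector needs no change. An in-memory buffer would satisfy the same contract; a temp file is used here only to avoid holding large documents on the heap.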


> Support multiple output connections for a single job
> ----------------------------------------------------
>
>                 Key: CONNECTORS-962
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-962
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> Zaizi has a requirement to support multiple outputs for a single job.  In 
> theory this requirement can be met by doing the following:
> - Allow multiple output connections, and multiple pipelines, per job
> - Keep a distinct ingeststatus record for each document/output combination
> - Modify WorkerThread to call IncrementalIndexer multiple times for every 
> document fetched
> Places where different things need to happen are:
> - RepositoryDocument - because one binary stream will not do for multiple 
> outputs
> - UI, obviously, because there will need to be multiple pipelines, not just
> one, and in addition it would probably be important to be able to "split" the
> pipeline at arbitrary points
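The per-output bookkeeping described in the issue (one indexer call per output for each fetched document, and a distinct ingest-status record per document/output combination) might look roughly like this. Everything here is hypothetical: `IngestKey`, `OutputIndexer`, and `MultiOutputWorker` are illustrative stand-ins for the ingeststatus table, IncrementalIndexer, and WorkerThread, not their real APIs. Content is passed as a byte array purely for brevity, which sidesteps the single-stream problem discussed in the comment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key: one ingest-status record per (document, output connection) pair.
record IngestKey(String documentId, String outputConnection) {}

// Hypothetical stand-in for a per-output IncrementalIndexer call; returns a version string.
interface OutputIndexer {
    String index(String documentId, String outputConnection, byte[] content);
}

class MultiOutputWorker {
    private final Map<IngestKey, String> ingestStatus = new HashMap<>();
    private final OutputIndexer indexer;

    MultiOutputWorker(OutputIndexer indexer) { this.indexer = indexer; }

    // Called once per fetched document; the indexer is invoked once per output.
    void processDocument(String documentId, byte[] content, List<String> outputs) {
        for (String output : outputs) {
            String version = indexer.index(documentId, output, content);
            // Distinct status record for this document/output combination.
            ingestStatus.put(new IngestKey(documentId, output), version);
        }
    }

    String statusFor(String documentId, String output) {
        return ingestStatus.get(new IngestKey(documentId, output));
    }

    int statusCount() { return ingestStatus.size(); }
}
```

Keying the status by the pair, rather than by document alone, lets each output track its own indexed version, so one output can be reindexed or removed without touching the others.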



--
This message was sent by Atlassian JIRA
(v6.2#6252)
