[ https://issues.apache.org/jira/browse/CONNECTORS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028010#comment-14028010 ]
Karl Wright commented on CONNECTORS-962:
----------------------------------------
Contract for RepositoryDocument right now:
(1) Repository connector creates RepositoryDocument
(2) Repository connector calls RepositoryDocument.setData(InputStream) (or
whatever it is)
(3) Repository connector calls activity method
(4) Various pipeline methods see the RepositoryDocument object, but only one
of them reads the stream, then returns
(5) Repository connector closes the InputStream itself (NOT the
RepositoryDocument object)
Maintaining backwards compatibility is hard without a RepositoryDocument
close() or cleanup() method. But such a method can be invoked at the
ProcessActivity level, so existing connectors need not change. The flow becomes:
(1) Repository connector creates RepositoryDocument
(2) Repository connector calls RepositoryDocument.setData(InputStream) (or
whatever it is)
(3) Repository connector calls activity method
(4) ProcessActivity calls RepositoryDocument.setMultipleConsumers(), based on
current pipeline
(5) Various pipeline methods see the RepositoryDocument object, and many may
read the stream
(6) ProcessActivity calls RepositoryDocument.cleanup()
(7) Repository connector closes the InputStream itself (NOT the
RepositoryDocument object)
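
The new flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ManifoldCF code: SketchRepositoryDocument and SketchProcessActivity are hypothetical stand-ins, setData()/getData() are placeholder names (the real accessor may be called something else), and buffering in memory is just one possible way to let several consumers read the same stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for RepositoryDocument; NOT the real ManifoldCF class.
class SketchRepositoryDocument {
    private InputStream source;          // owned (and later closed) by the connector
    private byte[] buffered;             // filled only when several consumers will read
    private boolean multipleConsumers;   // set by the framework, never by the connector

    // Step (2): the connector hands over its stream but keeps ownership of it.
    void setData(InputStream in) { this.source = in; }

    // Step (4): the framework announces that more than one reader is coming.
    void setMultipleConsumers() { this.multipleConsumers = true; }

    // Step (5): every pipeline stage asks for the data through this method.
    InputStream getData() throws IOException {
        if (!multipleConsumers) {
            return source;               // old contract: exactly one reader
        }
        if (buffered == null) {
            buffered = readFully(source); // drain the connector's stream once
        }
        return new ByteArrayInputStream(buffered); // fresh view for each reader
    }

    // Step (6): release whatever was buffered; the source stream is NOT closed here.
    void cleanup() { buffered = null; }

    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}

// Hypothetical stand-in for the activity layer that runs steps (4) through (6).
class SketchProcessActivity {
    static String[] process(SketchRepositoryDocument doc, int consumers) throws IOException {
        if (consumers > 1) {
            doc.setMultipleConsumers();
        }
        String[] seen = new String[consumers];
        try {
            for (int i = 0; i < consumers; i++) {
                byte[] bytes = SketchRepositoryDocument.readFully(doc.getData());
                seen[i] = new String(bytes, StandardCharsets.UTF_8);
            }
        } finally {
            doc.cleanup();               // always runs, even if a consumer fails
        }
        return seen;
    }
}
```

The point of the sketch is that the connector side stays exactly as it is today: it creates the document, sets the stream, calls the activity, and closes its own stream afterwards. For large documents a real implementation would more likely spill to a temporary file than hold the bytes in memory; that choice is left open here.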
> Support multiple output connections for a single job
> ----------------------------------------------------
>
> Key: CONNECTORS-962
> URL: https://issues.apache.org/jira/browse/CONNECTORS-962
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Framework crawler agent
> Affects Versions: ManifoldCF 1.7
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
>
> Zaizi has a requirement to support multiple outputs for a single job. In
> theory this requirement can be met by doing the following:
> - Allow multiple output connections, and multiple pipelines, per job
> - Keep a distinct ingeststatus record for each document/output combination
> - Modify WorkerThread to call IncrementalIndexer multiple times for every
> document fetched
> Places where different things need to happen are:
> - RepositoryDocument - because one binary stream will not do for multiple
> outputs
> - UI, obviously, because there will need to be multiple pipelines, not just
> one, and in addition it would probably be important to be able to "split" the
> pipeline at arbitrary points