[
https://issues.apache.org/jira/browse/STANBOL-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler updated STANBOL-1326:
-----------------------------------------
Fix Version/s: (was: 1.0.0)
2.0.0
> Stanbol Enhancer 2.0 API
> ------------------------
>
> Key: STANBOL-1326
> URL: https://issues.apache.org/jira/browse/STANBOL-1326
> Project: Stanbol
> Issue Type: Epic
> Components: Enhancement Engines, Enhancer
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Fix For: 2.0.0
>
>
> h2. Enhancer API v1.0
> =================
> This describes changes and additions to the Stanbol Enhancer API with version
> 1.0.
> Main Features of the new API are
> * Clear separation between
> *# the content and analysis results
> *# metadata and state of the enhancement process
> * Support for
> [EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488)
> (_Note_: a lightweight version is also supported starting from `0.12.1` - see
> [STANBOL-1280](https://issues.apache.org/jira/browse/STANBOL-1280) for
> details). EnhancementProperties can be used for Enhancement Chain /
> ExecutionPlan specific parameters as well as Request specific parameters.
> Typical use cases include: passing credentials for remote services; the
> configuration of dereferenced fields, minimum confidence values, ...
> * Low level support for [Enhancement
> Workflows](https://issues.apache.org/jira/browse/STANBOL-1008): The new API
> will allow creating `EnhancementJobs` directly based on RDF
> [ExecutionPlans](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
> in addition to Enhancement `Chains`. In addition, the `EnhancementJobManager`
> will support partial executions of selected `ExecutionNodes` as well as
> resuming the enhancement after a change of the execution plan. This will
> allow enhancement workflows e.g. to (1) start with a simple language
> detection; (2) add additional `ExecutionNodes` based on the detected language
> and resume processing by passing the `EnhancementJob` again to the
> `EnhancementJobManager`.
> * Low level support for distributed computation of EnhancementJobs: The API
> will allow executing only selected `ExecutionNodes` of an
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
> This will allow having different Stanbol Workers with different
> configurations. An `EnhancementJobManager` running on a worker could then be
> instructed to only execute specific `ExecutionNodes`.
> The following sections provide an overview of API changes and additions.
> h3. EnhancementJob
> --------------
> The `EnhancementJob` represents the process of the enhancement of a
> `ContentItem` by the Stanbol Enhancer. It is a new interface introduced with
> `1.0`. Before 1.0 this was an implementation specific class used by the
> [EventJobManager](http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementjobmanager#eventjobmanager).
> {code:java}
> EnhancementJob
> + getJobId() : NonLiteral
> + getLock() : ReadWriteLock
> + getExecutionMetadata() : MGraph
> + getContentItem() : ContentItem
> {code}
> The `EnhancementJob` provides access to both the `ContentItem` and processing
> information. Only Parsers, Writers and the `EnhancementJobManager` are
> intended to have a reference to the `EnhancementJob`. `EnhancementEngines`
> will only get a reference to the `ContentItem`. Engines will also no longer
> be able to access the `MGraph` with the
> [ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata)
> nor the
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
> Both can be obtained in 0.12.1 via the
> [ContentParts](http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.#contentparts)
> of the processed `ContentItem`.
> The `jobId` of the EnhancementJob is used to reference the Job. It SHOULD be
> different from the URI of the ContentItem to avoid issues with multiple
> requests for the same ContentItem (as described by
> [STANBOL-830](https://issues.apache.org/jira/browse/STANBOL-830)).
> The EnhancementJob API does not distinguish between the
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
> and the
> [ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata).
> There is only a single getter for the ExecutionMetadata that needs to provide
> access to both.
> In most cases it will be sufficient to copy over the triples of the
> ExecutionPlan to the `MGraph` of the ExecutionMetadata before starting the
> enhancement. However, in use cases where the ExecutionPlan might change (e.g.
> in between several partial executions) one can also use a setting where the
> ExecutionPlan is kept in a separate graph. To enforce this the Clerezza
> `UnionMGraph` implementation can be used. This implementation supports
> creating a union view over several TripleCollections while all modifications
> are done on the first one. So creating a `UnionMGraph` with the MGraph holding
> the ExecutionMetadata at index `0` and the TripleCollection with the
> ExecutionPlan at index `1` results in the desired setting.
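> The union-view idea can be illustrated with a minimal sketch. It does not use
> the Clerezza API: `UnionView` and its methods are hypothetical, plain string
> sets stand in for the graphs, and the triple strings are made up for
> illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionViewSketch {

    /** Hypothetical union view over string "triples"; NOT the Clerezza UnionMGraph API. */
    static final class UnionView {
        private final List<Set<String>> graphs;

        /** The graph at index 0 receives all additions; the others are only read. */
        UnionView(List<Set<String>> graphs) {
            if (graphs.isEmpty()) {
                throw new IllegalArgumentException("at least one graph is required");
            }
            this.graphs = new ArrayList<>(graphs);
        }

        /** Adds the triple to the first graph only. */
        boolean add(String triple) {
            return graphs.get(0).add(triple);
        }

        /** A triple is visible if any of the underlying graphs contains it. */
        boolean contains(String triple) {
            for (Set<String> graph : graphs) {
                if (graph.contains(triple)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> executionMetadata = new LinkedHashSet<>(); // index 0: receives writes
        Set<String> executionPlan = new LinkedHashSet<>();     // index 1: only read by the view
        executionPlan.add("node1 ep:engine langdetect");       // illustrative triple

        UnionView view = new UnionView(List.of(executionMetadata, executionPlan));
        view.add("exec1 em:status completed");                 // illustrative triple

        // The plan triple is visible through the view ...
        System.out.println(view.contains("node1 ep:engine langdetect"));
        // ... while the write went to the metadata graph and left the plan untouched.
        System.out.println(executionMetadata.size() + " / " + executionPlan.size());
    }
}
```

> With this layout the ExecutionPlan graph can be replaced or extended between
> partial executions without touching the recorded ExecutionMetadata.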
> h3. EnhancementJobManager
> ---------------------
> The job manager interface is very simple. It only contains the method to
> process an EnhancementJob. Optionally an array of `ep:ExecutionNode`
> instances can be passed.
> {code:java}
> EnhancementJobManager
> + enhance(EnhancementJob job, NonLiteral...executions)
> {code}
> The passed `EnhancementJob` is expected to have its ExecutionMetadata
> initialized. In contrast to earlier Stanbol versions the
> `EnhancementJobManager` is no longer responsible for initializing this metadata
> based on the passed enhancement `Chain`. This is now the responsibility of
> the `EnhancementJobBuilder`.
> The new `EnhancementJobManager` will support _partial executions_. This means
> that callers can request the JobManager to process only some of the
> `ep:ExecutionNode` instances defined by the
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
> If no executions are defined the `EnhancementJobManager` is expected to
> execute all execution nodes.
> If an array of `ep:ExecutionNode` instances is passed, the
> EnhancementJobManager must only process those and ignore all
> others. If those executions `ep:dependsOn` another `ep:ExecutionNode`
> that is not included and not yet completed (not `ep:optional` and not yet
> processed) the job manager is expected to fail with a `ChainException`.
> The `EnhancementJobManager` needs to consider existing `em:EngineExecutions`
> and their `em:status`. This is important to correctly resume the processing of
> partially completed enhancement jobs.
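> A minimal sketch of the dependency check for partial executions, under
> simplifying assumptions: nodes are identified by plain strings, `Node` and
> `checkRequested` are hypothetical names, and the `ChainException` below is a
> local unchecked stand-in rather than the Stanbol class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PartialExecutionSketch {

    /** Local stand-in for the ChainException named above (unchecked to keep the sketch short). */
    static class ChainException extends RuntimeException {
        ChainException(String message) {
            super(message);
        }
    }

    /** Simplified ep:ExecutionNode: id, ep:optional flag and ids of its ep:dependsOn targets. */
    static final class Node {
        final String id;
        final boolean optional;
        final Set<String> dependsOn;

        Node(String id, boolean optional, String... dependsOn) {
            this.id = id;
            this.optional = optional;
            this.dependsOn = new HashSet<>(Arrays.asList(dependsOn));
        }
    }

    /**
     * Fails if a requested node depends on a node that is neither requested,
     * nor already completed, nor optional.
     */
    static void checkRequested(Map<String, Node> plan, Set<String> completed, List<String> requested) {
        for (String id : requested) {
            Node node = plan.get(id);
            if (node == null) {
                throw new ChainException("Unknown execution node: " + id);
            }
            for (String dep : node.dependsOn) {
                Node depNode = plan.get(dep);
                boolean satisfied = requested.contains(dep)
                        || completed.contains(dep)
                        || (depNode != null && depNode.optional);
                if (!satisfied) {
                    throw new ChainException(id + " depends on unprocessed node " + dep);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Node> plan = new HashMap<>();
        plan.put("langdetect", new Node("langdetect", false));
        plan.put("ner", new Node("ner", false, "langdetect"));

        Set<String> completed = new HashSet<>();
        checkRequested(plan, completed, List.of("langdetect")); // accepted: no dependencies
        completed.add("langdetect");                            // would come from em:status
        checkRequested(plan, completed, List.of("ner"));        // accepted: dependency completed
        System.out.println("partial executions accepted");
    }
}
```

> In a real job manager the `completed` set would be derived from the existing
> `em:EngineExecutions` and their `em:status`, which is what makes resuming a
> partially processed job possible.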
> h3. EnhancementJobBuilder
> ---------------------
> The EnhancementJobBuilder allows creating EnhancementJobs. As building an
> EnhancementJob requires selecting specific implementations of the
> `EnhancementJob` and `ContentItem`, the `EnhancementJobBuilder` does not have
> a constructor; instead a dedicated `EnhancementJobFactory` is used. The
> `EnhancementJobFactory` is an OSGi service and can be looked up as such by
> components that need to build `EnhancementJob` instances.
> {code:java}
> EnhancementJobFactory
> + create() : EnhancementJobBuilder
> EnhancementJobBuilder
> + contentSource(ContentSource) : EnhancementJobBuilder
> + id(String id)
> + contentRef(ContentReference)
> + chain(Chain chain)
> + execPlan(TripleCollection ExecutionPlan)
> + **(..)
> + build() : EnhancementJob
> {code}
> Intended Usage:
> {code:java}
> @Reference
> EnhancementJobFactory ejf;
>
> @Reference
> EnhancementJobManager ejm;
>
> ContentSource content; //the parsed content
> Chain chain; //the requested enhancement chain
> ejm.enhance(ejf.create()
> .contentSource(content)
> .chain(chain)
> .build());
> {code}
> The `EnhancementJobBuilder` is obtained by using the
> EnhancementJobFactory#create() method. After creation the builder provides an
> API to set the parsed content, id as well as the enhancement chain. As an
> alternative the ExecutionPlan can also be set as an RDF graph. After the
> configuration the `EnhancementJob` can be built via `#build()` and passed to the
> `EnhancementJobManager`.
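> A self-contained sketch of the factory/builder split, not the proposed
> implementation: plain strings stand in for `ContentSource` and `Chain`, the
> `id(..)` setter is assumed to set the ContentItem id, and the `urn:` prefixes
> and the generated job id are assumptions made for illustration.

```java
import java.util.Objects;
import java.util.UUID;

class JobBuilderSketch {

    /** Hypothetical, much simplified job; strings stand in for ContentSource and Chain. */
    static final class EnhancementJob {
        final String jobId;
        final String contentItemId;
        final String content;
        final String chain;

        EnhancementJob(String jobId, String contentItemId, String content, String chain) {
            this.jobId = jobId;
            this.contentItemId = contentItemId;
            this.content = content;
            this.chain = chain;
        }
    }

    /** Fluent builder; instances are only handed out by the factory below. */
    static final class EnhancementJobBuilder {
        private String id;
        private String content;
        private String chain;

        private EnhancementJobBuilder() {
        }

        EnhancementJobBuilder contentSource(String content) {
            this.content = content;
            return this;
        }

        EnhancementJobBuilder id(String id) {
            this.id = id;
            return this;
        }

        EnhancementJobBuilder chain(String chain) {
            this.chain = chain;
            return this;
        }

        EnhancementJob build() {
            Objects.requireNonNull(content, "content is required");
            Objects.requireNonNull(chain, "chain is required");
            String contentItemId = id != null ? id : "urn:content-item:" + UUID.randomUUID();
            // The job id is generated separately so it differs from the ContentItem id.
            String jobId = "urn:enhancement-job:" + UUID.randomUUID();
            return new EnhancementJob(jobId, contentItemId, content, chain);
        }
    }

    /** Stand-in for the OSGi service that selects the concrete implementations. */
    static final class EnhancementJobFactory {
        EnhancementJobBuilder create() {
            return new EnhancementJobBuilder();
        }
    }

    public static void main(String[] args) {
        EnhancementJob job = new EnhancementJobFactory().create()
                .contentSource("Paris is the capital of France.")
                .chain("default")
                .build();
        System.out.println(job.jobId.equals(job.contentItemId)); // false: the ids are distinct
    }
}
```

> Keeping the constructor private mirrors the intent described above: callers
> depend only on the factory, so the concrete `EnhancementJob` and
> `ContentItem` implementations can be swapped without changing client code.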
> h3. ContentItem
> -----------
> There will also be minor API adaptations to the ContentItem API. The main
> reason for that is the removal of the `ContentItemFactory` combined with the
> requirement of some `EnhancementEngines` to create `Blob` instances. Because
> of that, methods will be added to the ContentItem that allow adding a `Blob`
> content part based on a `ContentSource` as well as a `ContentSink`.
> {code:java}
> ContentItem
> + addContent(UriRef id, ContentSource source) : Blob
> + addContentStream(UriRef id, String mediaType) : ContentSink
> {code}
> These methods will replace the `ContentItemFactory#createBlob(..)` and
> `ContentItemFactory#createContentSink(..)` methods. This means that
> EnhancementEngines that need to create `Blobs` no longer need to care about
> obtaining a `ContentItemFactory` instance. The right `Blob` implementation to
> be used will already be wired when the `ContentItem` is created by the
> `EnhancementJobBuilder`.
> _Notes:_
> * the `ContentItem#addPart(..)` method can still be used to add `Blob`
> instances to the `ContentItem`. This might be useful for Engines that
> provide their own `Blob` implementation.
> * both `addContent*` methods will override any contentPart registered with
> the given id. Unlike the `#addPart(..)` method, they do NOT return the
> previously registered part.
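> The difference in return values noted above can be sketched as follows. The
> classes are hypothetical stand-ins: string ids replace `UriRef`, a byte array
> replaces the `ContentSource`, and the part ids are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class ContentItemSketch {

    /** Hypothetical minimal Blob: raw bytes plus a media type. */
    static final class Blob {
        final byte[] data;
        final String mediaType;

        Blob(byte[] data, String mediaType) {
            this.data = data.clone();
            this.mediaType = mediaType;
        }
    }

    /** Hypothetical ContentItem that keeps its parts in an insertion-ordered map. */
    static final class ContentItem {
        private final Map<String, Object> parts = new LinkedHashMap<>();

        /** Registers a part and returns the one previously stored under the id (or null). */
        Object addPart(String id, Object part) {
            return parts.put(id, part);
        }

        /** Creates a Blob, overrides any part with the same id and returns the NEW Blob. */
        Blob addContent(String id, byte[] data, String mediaType) {
            Blob blob = new Blob(data, mediaType);
            parts.put(id, blob);
            return blob;
        }

        Object getPart(String id) {
            return parts.get(id);
        }
    }

    public static void main(String[] args) {
        ContentItem ci = new ContentItem();
        ci.addPart("urn:part:plain", "old part");
        Blob blob = ci.addContent("urn:part:plain",
                "Hello".getBytes(StandardCharsets.UTF_8), "text/plain");
        // The old part is gone; the returned object is the newly created Blob.
        System.out.println(ci.getPart("urn:part:plain") == blob);
    }
}
```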
> h3. EnhancementEngine
> -----------------
> The API of the `EnhancementEngine` interface will be adapted to pass the
> [EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488) as an
> additional parameter of the `#computeEnhancements(..)` method.
> {code:java}
> EnhancementEngine
> + getName() : String
> + canEnhance(ContentItem ci) : int
> + computeEnhancements(ContentItem ci, Map<String,Object> properties)
> {code}
> A new Map instance with a copy of the properties will be passed to the
> engine. Therefore changes to the map will have no side effects.
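> The copy semantics might look like the sketch below. The reduced `Engine`
> interface, the `callEngine` helper and the `min-confidence` key are
> assumptions for illustration; a string stands in for the `ContentItem`.

```java
import java.util.HashMap;
import java.util.Map;

class EnginePropertiesSketch {

    /** Hypothetical, reduced engine interface; a String stands in for the ContentItem. */
    interface Engine {
        void computeEnhancements(String contentItem, Map<String, Object> properties);
    }

    /** Hands the engine its own copy so that engine-side changes stay local. */
    static void callEngine(Engine engine, String contentItem, Map<String, Object> properties) {
        engine.computeEnhancements(contentItem, new HashMap<>(properties));
    }

    public static void main(String[] args) {
        Map<String, Object> requestProperties = new HashMap<>();
        requestProperties.put("min-confidence", 0.5);

        // An engine that modifies the map it receives.
        Engine engine = (ci, props) -> props.put("min-confidence", 0.9);
        callEngine(engine, "some text", requestProperties);

        System.out.println(requestProperties.get("min-confidence")); // still 0.5
    }
}
```

> Note that `new HashMap<>(..)` is a shallow copy: mutable values stored in the
> map would still be shared between caller and engine.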
> For details about EnhancementProperties see
> [STANBOL-488](https://issues.apache.org/jira/browse/STANBOL-488).
--
This message was sent by Atlassian JIRA
(v6.2#6252)