[ 
https://issues.apache.org/jira/browse/STANBOL-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler updated STANBOL-1326:
-----------------------------------------

    Summary: Stanbol Enhancer 2.0 API  (was: Updates to the Stanbol Enhancer 
API for 1.0)

> Stanbol Enhancer 2.0 API
> ------------------------
>
>                 Key: STANBOL-1326
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1326
>             Project: Stanbol
>          Issue Type: Epic
>          Components: Enhancement Engines, Enhancer
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>             Fix For: 2.0.0
>
>
> h2. Enhancer API v1.0
> =================
> This describes changes and additions to the Stanbol Enhancer API with version 
> 1.0.
> The main features of the new API are:
> * Clear separation between 
>     *# the content and analysis results
>     *# metadata and state of the enhancement process
> * Support for 
> [EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488) 
> (_Note_: a lightweight version is also supported starting from `0.12.1`; see 
> [STANBOL-1280](https://issues.apache.org/jira/browse/STANBOL-1280) for 
> details). EnhancementProperties can be used for Enhancement Chain / 
> ExecutionPlan specific parameters as well as request specific parameters. 
> Typical use cases include passing credentials for remote services and 
> configuring dereferenced fields or minimum confidence values.
> * Low level support for [Enhancement 
> Workflows](https://issues.apache.org/jira/browse/STANBOL-1008): The new API 
> will allow creating `EnhancementJobs` directly from RDF 
> [ExecutionPlans](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
>  in addition to enhancement `Chains`. In addition, the `EnhancementJobManager` 
> will support partial executions of selected `ExecutionNodes` as well as 
> resuming the enhancement after a change of the execution plan. This will 
> allow enhancement workflows to, for example, (1) start with a simple language 
> detection; (2) add additional `ExecutionNodes` based on the detected language 
> and resume processing by passing the `EnhancementJob` to the 
> `EnhancementJobManager` again.
> * Low level support for distributed computation of EnhancementJobs: The API 
> will allow executing only selected `ExecutionNodes` of an 
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
>  This will allow running several Stanbol workers with different 
> configurations. An `EnhancementJobManager` running on a worker could then be 
> instructed to execute only specific `ExecutionNodes`.
> The following sections provide an overview of the API changes and additions.
> h3. EnhancementJob
> --------------
> The `EnhancementJob` represents the process of enhancing a 
> `ContentItem` by the Stanbol Enhancer. It is a new interface introduced with 
> `1.0`. Before 1.0 this was an implementation specific class used by the 
> [EventJobManager](http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementjobmanager#eventjobmanager).
> {code:java}
>     EnhancementJob
>         + getJobId() : NonLiteral
>         + getLock() : ReadWriteLock
>         + getExecutionMetadata() : MGraph
>         + getContentItem() : ContentItem
> {code}
> The `EnhancementJob` provides access to both the `ContentItem` and processing 
> information. Only parsers, writers and the `EnhancementJobManager` are 
> intended to have a reference to the `EnhancementJob`. `EnhancementEngines` 
> will only get a reference to the `ContentItem`. Engines will also no longer 
> be able to access the `MGraph` with the 
> [ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata)
>  nor the 
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
>  Both can be obtained in 0.12.1 via the 
> [ContentParts](http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.#contentparts)
>  of the processed `ContentItem`.
> The `jobId` of the EnhancementJob is used to reference the job. It SHOULD be 
> different from the URI of the ContentItem to avoid issues with multiple 
> requests for the same ContentItem (as described by 
> [STANBOL-830](https://issues.apache.org/jira/browse/STANBOL-830)).
> The EnhancementJob API does not distinguish between the 
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
>  and the 
> [ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata).
>  There is only a single getter for the ExecutionMetadata, which needs to 
> provide access to both.
> In most cases it will be sufficient to copy the triples of the 
> ExecutionPlan to the `MGraph` of the ExecutionMetadata before starting the 
> enhancement. However, in use cases where the ExecutionPlan might change (e.g. 
> in between several partial executions) one can also use a setting where the 
> ExecutionPlan is kept in a separate graph. To achieve this the Clerezza 
> `UnionMGraph` implementation can be used. This implementation creates a 
> union view over several TripleCollections while all modifications 
> are done on the first one. So creating a `UnionMGraph` with the MGraph holding 
> the ExecutionMetadata at index `0` and the TripleCollection with the 
> ExecutionPlan at index `1` results in the desired setting.
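The union-view behaviour described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Clerezza API: `UnionViewSketch` is a hypothetical stand-in, and triples are modelled as plain strings with placeholder values. It only shows the intended semantics, assuming reads see all graphs while writes go to the first one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a union view; NOT the Clerezza UnionMGraph API. */
final class UnionViewSketch {
    private final Set<String> writable;       // index 0: e.g. the ExecutionMetadata
    private final List<Set<String>> readOnly; // index 1..n: e.g. the ExecutionPlan

    @SafeVarargs
    UnionViewSketch(Set<String> writable, Set<String>... others) {
        this.writable = writable;
        this.readOnly = new ArrayList<>(List.of(others));
    }

    /** All modifications are applied to the first graph only. */
    boolean add(String triple) {
        return writable.add(triple);
    }

    /** Reads consult every graph of the union. */
    boolean contains(String triple) {
        if (writable.contains(triple)) {
            return true;
        }
        for (Set<String> graph : readOnly) {
            if (graph.contains(triple)) {
                return true;
            }
        }
        return false;
    }

    /** Number of distinct triples visible through the view. */
    int size() {
        Set<String> all = new LinkedHashSet<>(writable);
        readOnly.forEach(all::addAll);
        return all.size();
    }

    public static void main(String[] args) {
        Set<String> metadata = new LinkedHashSet<>();
        Set<String> plan = new LinkedHashSet<>(Set.of("node1 rdf:type ep:ExecutionNode"));
        UnionViewSketch view = new UnionViewSketch(metadata, plan);
        view.add("exec1 em:status placeholder-status"); // placeholder triple
        System.out.println(view.size()); // 2: both graphs are visible
        System.out.println(plan.size()); // 1: the plan graph was not modified
    }
}
```

Because the plan graph is never written through the view, it can be replaced or extended between partial executions without touching the recorded metadata.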
> h3. EnhancementJobManager
> ---------------------
> The job manager interface is very simple. It only contains the method to 
> process an EnhancementJob. Optionally an array of `ep:ExecutionNode` 
> instances can be passed.
> {code:java}
>     EnhancementJobManager
>         + enhance(EnhancementJob job, NonLiteral...executions)
> {code}
> The passed `EnhancementJob` is expected to have its ExecutionMetadata 
> initialized. In contrast to earlier Stanbol versions the 
> `EnhancementJobManager` is no longer responsible for initializing this 
> metadata based on the passed enhancement `Chain`. This is now the 
> responsibility of the `EnhancementJobBuilder`.
> The new `EnhancementJobManager` will support _partial executions_. This means 
> that callers can request the JobManager to process only some of the 
> `ep:ExecutionNode` instances defined by the 
> [ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
>  If no executions are defined the `EnhancementJobManager` is expected to 
> execute all execution nodes. 
> If an array of `ep:ExecutionNode` instances is passed the 
> EnhancementJobManager must process only those and ignore all 
> others. If one of those executions `ep:dependsOn` another `ep:ExecutionNode` 
> that is not included and not yet completed (not `ep:optional` and not yet 
> processed) the job manager is expected to fail with a `ChainException`.
> The `EnhancementJobManager` needs to consider existing `em:EngineExecutions` 
> and their `em:status`. This is important to correctly resume the processing of 
> partially completed enhancement jobs.
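The partial-execution rules above can be sketched as a small selection check. This is one possible reading of the text, not the Stanbol implementation: node ids are plain strings, `nodesToRun` and its parameters are hypothetical names, and the nested `ChainException` is an unchecked stand-in for the exception named in the description. Ordering of the selected nodes is not handled here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PartialExecutionCheck {
    /** Hypothetical, unchecked stand-in for the ChainException named in the text. */
    static final class ChainException extends RuntimeException {
        ChainException(String message) {
            super(message);
        }
    }

    /**
     * Returns the nodes that still need to be executed.
     * No requested nodes means "all nodes of the plan". A dependency counts as
     * satisfied if it is part of the selection, already completed, or optional.
     */
    static Set<String> nodesToRun(Set<String> allNodes,
                                  Map<String, Set<String>> dependsOn,
                                  Set<String> completed,
                                  Set<String> optional,
                                  String... requested) {
        Set<String> selected = requested.length == 0
                ? new LinkedHashSet<String>(allNodes)
                : new LinkedHashSet<String>(List.of(requested));
        for (String node : selected) {
            for (String dep : dependsOn.getOrDefault(node, Set.of())) {
                boolean satisfied = selected.contains(dep)
                        || completed.contains(dep)
                        || optional.contains(dep);
                if (!satisfied) {
                    throw new ChainException(node + " depends on unprocessed node " + dep);
                }
            }
        }
        // Resuming: nodes with a completed execution are not run again.
        selected.removeAll(completed);
        return selected;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("langdetect", "ner");
        Map<String, Set<String>> deps = Map.of("ner", Set.of("langdetect"));
        // Resume after language detection has completed: only "ner" remains.
        System.out.println(nodesToRun(all, deps, Set.of("langdetect"), Set.of(), "ner"));
    }
}
```

Requesting only `ner` while `langdetect` is neither selected, completed nor optional makes the check throw, which mirrors the failure case described above.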
> h3. EnhancementJobBuilder
> ---------------------
> The EnhancementJobBuilder allows creating EnhancementJobs. As building an 
> EnhancementJob requires selecting specific implementations of the 
> `EnhancementJob` and `ContentItem`, the `EnhancementJobBuilder` does not have 
> a constructor; instead a separate `EnhancementJobFactory` is used. The 
> `EnhancementJobFactory` is an OSGi service and can be looked up as such by 
> components that need to build `EnhancementJob` instances.
> {code:java}
>     EnhancementJobFactory
>         + create() : EnhancementJobBuilder
>     EnhancementJobBuilder
>         + contentSource(ContentSource) : EnhancementJobBuilder
>         + id(String id)
>         + contentRef(ContentReference)
>         + chain(Chain chain)
>         + execPlan(TripleCollection executionPlan)
>         + **(..)
>         + build() : EnhancementJob
> {code}
> Intended Usage:
> {code:java}
>     @Reference
>     EnhancementJobFactory ejf;
>     
>     @Reference
>     EnhancementJobManager ejm;
>     
>     ContentSource content; //the parsed content
>     Chain chain; //the requested enhancement chain
>     ejm.enhance(ejf.create()
>         .contentSource(content)
>         .chain(chain)
>         .build());
> {code}
> The `EnhancementJobBuilder` is obtained by using the 
> `EnhancementJobFactory#create()` method. After creation the builder provides 
> an API to set the content and id as well as the enhancement chain. As an 
> alternative the ExecutionPlan can also be set as an RDF graph. After 
> configuration the `EnhancementJob` can be created with `#build()` and passed 
> to the `EnhancementJobManager`.
> h3. ContentItem
> -----------
> There will also be minor adaptations to the ContentItem API. The main 
> reason for that is the removal of the `ContentItemFactory` combined with the 
> requirement of some `EnhancementEngines` to create `Blob` instances. Because 
> of that, methods will be added to the ContentItem that allow adding a `Blob` 
> content part based on a `ContentSource` as well as a `ContentSink`.
> {code:java}
>     ContentItem
>         + addContent(UriRef id, ContentSource source) : Blob
>         + addContentStream(UriRef id, String mediaType) : ContentSink
> {code}
> These methods will replace the `ContentItemFactory#createBlob(..)` and 
> `ContentItemFactory#createContentSink(..)` methods. This means that 
> EnhancementEngines that need to create `Blobs` no longer need to care about 
> obtaining a `ContentItemFactory` instance. The right `Blob` implementation to 
> be used will already be wired when the `ContentItem` is created by the 
> `EnhancementJobBuilder`.
> _Notes:_ 
> * the `ContentItem#addPart(..)` method can still be used to add `Blob` 
> instances to the `ContentItem`. This might be useful for engines that 
> provide their own `Blob` implementation.
> * both `addContent*` methods will override any content part registered with 
> the passed id. Unlike the `#addPart(..)` method, they do NOT return the 
> previously registered part.
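The difference between the two ways of registering a part, as described in the notes above, can be sketched as follows. `ContentPartsSketch` is a hypothetical model, not the Stanbol `ContentItem`: ids are plain strings and a `byte[]` stands in for a `Blob`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the content-part semantics; not the Stanbol ContentItem. */
final class ContentPartsSketch {
    private final Map<String, Object> parts = new LinkedHashMap<>();

    /** Like addPart(..): returns the previously registered part, or null. */
    Object addPart(String id, Object part) {
        return parts.put(id, part);
    }

    /** Like addContent(..): overrides silently and returns the NEW part. */
    byte[] addContent(String id, byte[] source) {
        byte[] blob = source.clone(); // stand-in for creating a Blob from a ContentSource
        parts.put(id, blob);
        return blob;
    }

    Object getPart(String id) {
        return parts.get(id);
    }

    public static void main(String[] args) {
        ContentPartsSketch ci = new ContentPartsSketch();
        ci.addPart("urn:example:part", "old part");
        byte[] blob = ci.addContent("urn:example:part",
                "plain text".getBytes(StandardCharsets.UTF_8));
        // The old part is gone; the caller only ever sees the new blob.
        System.out.println(ci.getPart("urn:example:part") == blob); // true
    }
}
```

An engine that needs the previously registered part would therefore have to read it before calling one of the `addContent*` methods.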
> h3. EnhancementEngine
> -----------------
> The API of the `EnhancementEngine` interface will be adapted to pass the 
> [EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488) as 
> an additional parameter of the `#computeEnhancements(..)` method.
> {code:java}
>     EnhancementEngine
>         + getName() : String
>         + canEnhance(ContentItem ci) : int
>         + computeEnhancements(ContentItem ci, Map<String,Object> properties)
> {code}
> A new Map instance with a copy of the properties will be passed to the 
> engine. Therefore changes to the map will have no side effects.
> For details about EnhancementProperties see 
> [STANBOL-488](https://issues.apache.org/jira/browse/STANBOL-488).
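The copy semantics for the properties map described above might look like the following sketch. The `Engine` interface, the `callEngine` helper and the property keys are hypothetical, and a `String` stands in for the `ContentItem`; only the defensive copy itself is the point.

```java
import java.util.HashMap;
import java.util.Map;

final class PropertiesCopySketch {
    /** Hypothetical callback mirroring computeEnhancements(ci, properties). */
    interface Engine {
        void computeEnhancements(String contentItem, Map<String, Object> properties);
    }

    /** Hands the engine its own Map instance so that changes stay local to it. */
    static void callEngine(Engine engine, String contentItem,
                           Map<String, Object> requestProperties) {
        engine.computeEnhancements(contentItem, new HashMap<>(requestProperties));
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("example.min-confidence", 0.5); // hypothetical property key
        callEngine((ci, p) -> p.put("added.by.engine", true), "ci-1", props);
        System.out.println(props.containsKey("added.by.engine")); // false
    }
}
```

Note that this is a shallow copy: if a property value is itself mutable (for example a list), changes to that value would still be visible to other engines unless the values are copied or immutable as well.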



--
This message was sent by Atlassian JIRA
(v6.2#6252)
