Hi Fabian

Short version:

I totally agree. Our vocabulary has changed over time, but the Engines
still use the names from when they were introduced. Changing them
(artifactIds and class names) is dangerous, as this breaks
backwards compatibility. So I would suggest changing names only if we
can also come up with a better implementation/design.

Regarding vocabulary, I think we should prefer the terms
"EntityLinking" and "NamedEntityLinking" and deprecate all others, such
as "keyword" (instead of "entity") or "extraction" and "tagging"
(instead of "linking").

The 'engines/entitylinking' and 'engines/entityhublinking' introduced
by STANBOL-733 do already use this new terminology. They also
deprecate the 'engines/keywordextraction'.

- - -

Long version with more background information

Regarding the linking of Entities there are currently two different principles:

* "NamedEntityLinking": A "NamedEntity" has a 'selected text' AND a
'type'. So the selected text AND the type can be used for linking.
* "EntityLinking": An "Entity" has only a 'selected text'. Here
linking is only possible based on the selected text.
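
To make the difference concrete, here is a rough sketch in plain Java.
All names in it (LinkingModels, Candidate, linkEntity, linkNamedEntity)
are made up for illustration and are NOT the Stanbol API; a small
in-memory list stands in for the real vocabulary.

```java
import java.util.ArrayList;
import java.util.List;

final class LinkingModels {

    /** Hypothetical vocabulary entry: a label plus a type. */
    public static final class Candidate {
        public final String label;
        public final String type;

        public Candidate(String label, String type) {
            this.label = label;
            this.type = type;
        }
    }

    /** "EntityLinking": only the selected text is available for matching. */
    public static List<Candidate> linkEntity(List<Candidate> vocab, String selectedText) {
        List<Candidate> matches = new ArrayList<>();
        for (Candidate c : vocab) {
            if (c.label.equalsIgnoreCase(selectedText)) {
                matches.add(c);
            }
        }
        return matches;
    }

    /** "NamedEntityLinking": the selected text AND the type are both used. */
    public static List<Candidate> linkNamedEntity(List<Candidate> vocab,
                                                  String selectedText, String type) {
        List<Candidate> matches = new ArrayList<>();
        for (Candidate c : linkEntity(vocab, selectedText)) {
            if (c.type.equals(type)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Candidate> vocab = new ArrayList<>();
        vocab.add(new Candidate("Paris", "Place"));
        vocab.add(new Candidate("Paris", "Person"));

        // Text-only linking keeps both candidates; adding the type narrows it down.
        System.out.println(linkEntity(vocab, "Paris").size());
        System.out.println(linkNamedEntity(vocab, "Paris", "Place").size());
    }
}
```

With the ambiguous label "Paris", the text-only variant returns both
candidates, while the variant that also uses the type returns only the
one typed as a place.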

The plan would be to also have two Engine implementations that support
those linking models.

* 'NamedEntityLinkingEngine' (currently /engines/entitytagging)
* 'EntityLinkingEngine' (was /engines/keywordextraction, now
deprecated; since yesterday /engines/entitylinking)

Those should not have external dependencies (meaning no dependencies
on Stanbol components other than Stanbol Commons and the Enhancer
module, none on other major frameworks such as Solr or OpenNLP, and no
calls to external services). That would allow us to keep those Engines
within the enhancer module, but it also means that those
implementations cannot be used directly by the user (as the Service
used for linking will just be defined by an Interface without an actual
implementation).

Because of that there will be "Engines" that are based on the above,
but come with adapters to Services that support the EntityLookup.
The default will be implementations based on the Stanbol Entityhub, but
Stanbol users could also implement versions for their own
infrastructure needs.
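
The split between a generic engine and a lookup adapter can be sketched
roughly like this. Again, every name here (AdapterSketch, EntityLookup,
LinkingEngine, InMemoryLookup) is hypothetical and heavily simplified;
the in-memory map only stands in for an Entityhub backed
implementation, and the real EntitySearcher interface looks different.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AdapterSketch {

    /** Lookup contract owned by the engine (simplified, hypothetical). */
    public interface EntityLookup {
        List<String> lookup(String selectedText);
    }

    /** Generic engine: depends only on the interface, not on a backend. */
    public static final class LinkingEngine {
        private final EntityLookup lookup;

        public LinkingEngine(EntityLookup lookup) {
            this.lookup = lookup;
        }

        /** Maps each mention to the entity ids suggested by the lookup. */
        public Map<String, List<String>> link(List<String> mentions) {
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (String mention : mentions) {
                result.put(mention, lookup.lookup(mention));
            }
            return result;
        }
    }

    /** Adapter: a map standing in for a real backend such as the Entityhub. */
    public static final class InMemoryLookup implements EntityLookup {
        private final Map<String, List<String>> index = new HashMap<>();

        public void add(String label, String entityId) {
            index.put(label.toLowerCase(Locale.ROOT), Collections.singletonList(entityId));
        }

        @Override
        public List<String> lookup(String selectedText) {
            List<String> ids = index.get(selectedText.toLowerCase(Locale.ROOT));
            return ids != null ? ids : Collections.<String>emptyList();
        }
    }

    public static void main(String[] args) {
        InMemoryLookup backend = new InMemoryLookup();
        backend.add("Paris", "urn:example:Paris");

        // The engine is configured with an adapter; swapping the backend
        // does not require any change to the engine itself.
        LinkingEngine engine = new LinkingEngine(backend);
        System.out.println(engine.link(java.util.Arrays.asList("Paris", "Unknown")));
    }
}
```

The point of the design is that the engine module only ships the
interface, and a separate module contributes the adapter plus the
configuration that wires both together.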

The "EntityhubLinking" module [1] is the first example. When you look
at the module you will recognize that it does not contain a single
EnhancementEngine implementation. It only provides Entityhub specific
implementations of the EntitySearcher interface defined by the
"EntityLinkingEngine" and an OSGi component that allows users to
configure an EntityLinkingEngine instance that uses the Entityhub to
look up Entities.

Current state:

We are not there yet. The '/engines/entitytagging' still
implements both NamedEntityLinking AND Lookup via the Entityhub. This
engine could be replaced by an 'engines/namedentitylinking' that
follows the design described above. The new
'/engines/entitylinking' already implements the above design. However,
it still depends on the Entityhub, because the EntitySearcher
interface [3] is still using the Entityhub Model classes.

'engines/entityhublinking' currently provides the ability to do
'entitylinking' with the Entityhub. As soon as the
'engines/namedentitylinking' is available I would add named entity
linking functionality to that module. In a last step this module will
also move out of the /enhancer component (as already suggested by
STANBOL-805 [4]).


BTW this design was the result of this [2] discussion on the Stanbol
dev mailing list.

best
Rupert



[1] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/engines/entityhublinking/
[2] http://markmail.org/message/nptkntyuthv7wwqh
[3] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entitysearcher
[4] https://issues.apache.org/jira/browse/STANBOL-805


On Tue, Nov 27, 2012 at 11:14 AM, Fabian Christ
<[email protected]> wrote:
> Hi,
>
> enhancement engines in Stanbol can have several names and this is confusing
> myself and very likely our users. Here are some examples that I came across
> when trying to identify the running engines. I started to look at the
> Web-UI and clicked through the OSGi console.
>
> dbpediaLinking (NamedEntityTaggingEngine) ->
> Named Entity Tagging -> Entity Tagging ->
> /engines/entitytagging
>
> entityhubExtraction (EntityLinkingEngine) ->
> Entityhub Linking -> Entityhub Linking ->
> /engines/entityhublinking
>
> Could we simplify this a bit to make it more obvious especially for new
> users what is going on?
>
> Best,
>  - Fabian
>
> --
> Fabian
> http://twitter.com/fctwitt



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
