Hi Fabian,

Short version:
I totally agree. Our vocabulary has changed over time, but the Engines still use the names they had when they were introduced. Changing them (artifactIds and class names) is dangerous, as this breaks backwards compatibility. So I would suggest changing names only if we can also come up with a better implementation/design.

Regarding vocabulary, I think we should prefer the terms "EntityLinking" and "NamedEntityLinking" and deprecate all others, such as "keyword" instead of "entity", or "extraction" or "tagging" instead of "linking". The 'engines/entitylinking' and 'engines/entityhublinking' introduced by STANBOL-733 already use this new terminology. They also deprecate the 'engines/keywordextraction'.

- - -

Long version with more background information

Regarding the linking of Entities there are currently two different principles:

* "NamedEntityLinking": A "NamedEntity" has a 'selected text' AND a 'type'. So the selected text AND the type can be used for linking.
* "EntityLinking": An "Entity" only has a 'selected text'. Here linking is only possible based on the selected text.

The plan would be to also have two Engine implementations that support those linking models:

* 'NamedEntityLinkingEngine' (currently /engines/entitytagging)
* 'EntityLinkingEngine' (was /engines/keywordextraction (now deprecated); since yesterday /engines/entitylinking)

Those should not have external dependencies (meaning no dependencies on Stanbol components other than Stanbol Commons and the Enhancer module; also none on other major frameworks such as Solr or OpenNLP; no calls to external services). That would allow us to keep those Engines within the enhancer module, but it also means that those implementations cannot be used directly by the user, as the Service used for linking will be defined only by an Interface without an actual implementation. Because of that there will be "Engines" that are based on the above, but come with adapters to Services that do support the EntityLookup.
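
To make the two linking principles and the adapter idea more concrete, here is a minimal Java sketch. It is not the actual Stanbol API: the EntitySearcher shown here is a simplified, hypothetical stand-in for the real interface, and LinkingSketch, Candidate, InMemorySearcher and the example ids are made up for illustration. The linking logic only depends on the interface; the in-memory class plays the role that a lookup-service adapter would play.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkingSketch {

    /** Hypothetical, simplified lookup interface (NOT the real Stanbol EntitySearcher). */
    public interface EntitySearcher {
        /** Returns entities whose label matches the text; a null type means "any type". */
        List<Candidate> lookup(String selectedText, String type);
    }

    /** Illustrative candidate entity: id, label and type. */
    public static final class Candidate {
        final String id;
        final String label;
        final String type;

        Candidate(String id, String label, String type) {
            this.id = id;
            this.label = label;
            this.type = type;
        }
    }

    /** "EntityLinking": only the selected text is available for the lookup. */
    public static List<String> linkEntity(EntitySearcher searcher, String selectedText) {
        return ids(searcher.lookup(selectedText, null));
    }

    /** "NamedEntityLinking": the selected text AND the type are used for the lookup. */
    public static List<String> linkNamedEntity(EntitySearcher searcher,
                                               String selectedText, String type) {
        return ids(searcher.lookup(selectedText, type));
    }

    private static List<String> ids(List<Candidate> candidates) {
        List<String> result = new ArrayList<>();
        for (Candidate c : candidates) {
            result.add(c.id);
        }
        return result;
    }

    /** Toy adapter backed by a list; a real adapter would delegate to a lookup service. */
    public static final class InMemorySearcher implements EntitySearcher {
        private final List<Candidate> entities = new ArrayList<>();

        public InMemorySearcher add(String id, String label, String type) {
            entities.add(new Candidate(id, label, type));
            return this;
        }

        @Override
        public List<Candidate> lookup(String selectedText, String type) {
            List<Candidate> result = new ArrayList<>();
            for (Candidate c : entities) {
                boolean labelMatches = c.label.equalsIgnoreCase(selectedText);
                boolean typeMatches = type == null || type.equals(c.type);
                if (labelMatches && typeMatches) {
                    result.add(c);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        EntitySearcher searcher = new InMemorySearcher()
                .add("ex:Paris_City", "Paris", "Place")
                .add("ex:Paris_Person", "Paris", "Person");
        // Text only: both entities labelled "Paris" are candidates.
        System.out.println(linkEntity(searcher, "Paris"));
        // Text and type: only the Place candidate remains.
        System.out.println(linkNamedEntity(searcher, "Paris", "Place"));
    }
}
```

Running main prints both candidates for plain entity linking and only the Place candidate once the type is taken into account, which is the practical difference between the two models.
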
The default will be implementations based on the Stanbol Entityhub, but Stanbol users could also implement versions for their own infrastructure needs. The "EntityhubLinking" module [1] is the first example. When you look at the module you will recognize that it does not contain a single EnhancementEngine implementation. It only provides Entityhub-specific implementations of the EntitySearcher interface defined by the "EntityLinkingEngine" and an OSGi component that allows users to configure an EntityLinkingEngine instance that uses the Entityhub to look up Entities.

Current state:

We are not there yet. The '/engines/entitytagging' still implements both NamedEntityLinking AND lookup via the Entityhub. This engine could be replaced by an 'engines/namedentitylinking' that follows the design described above.

The new '/engines/entitylinking' already implements the above design. However, it still depends on the Entityhub, because the EntitySearcher interface [3] is still using the Entityhub Model classes.

'engines/entityhublinking' currently provides the ability to do 'entitylinking' with the Entityhub. As soon as the 'engines/namedentitylinking' is available I would add named entity linking functionality to that module. In a last step this module will also move out of the /enhancer component (as already suggested by STANBOL-805 [4]).

BTW this design was the result of this [2] discussion on the Stanbol dev mailing list.

best
Rupert

[1] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/engines/entityhublinking/
[2] http://markmail.org/message/nptkntyuthv7wwqh
[3] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entitysearcher
[4] https://issues.apache.org/jira/browse/STANBOL-805

On Tue, Nov 27, 2012 at 11:14 AM, Fabian Christ <[email protected]> wrote:
> Hi,
>
> enhancement engines in Stanbol can have several names and this is confusing
> myself and very likely our users. Here are some examples that I came across
> when trying to identify the running engines. I started to look at the
> Web-UI and clicked through the OSGi console.
>
> dbpediaLinking (NamedEntityTaggingEngine) ->
> Named Entity Tagging -> Entity Tagging ->
> /engines/entitytagging
>
> entityhubExtraction (EntityLinkingEngine) ->
> Entityhub Linking -> Entityhub Linking ->
> /engines/entityhublinking
>
> Could we simplify this a bit to make it more obvious especially for new
> users what is going on?
>
> Best,
> - Fabian
>
> --
> Fabian
> http://twitter.com/fctwitt

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
