Re: The KeywordLinkingEngine and the Stanbol NLP processing module (STANBOL-740)

Fabian Christ Wed, 21 Nov 2012 11:46:37 -0800

Hi,

what about creating a branch from the trunk with the current version
(before the merge) that is known to be working? People could switch to that
branch to keep the status quo and we should make clear that this branch
will not be maintained in the future. From this branch we could even cut
releases as 0.10.0 components.


Then go with option 1) and cut a 1.0.0 release as soon as possible to mark
this milestone.

- Fabian



2012/11/21 Rupert Westenthaler <[email protected]>

> Hi all,
>
> After about two months the Stanbol NLP module (STANBOL-733) is ready to
> be merged with the trunk. As part of this work the
> KeywordLinkingEngine was adapted to make use of the new features
> (STANBOL-740). However, as soon as those changes are merged back
> into the trunk, this will affect currently used Stanbol configurations.
>
> Currently the minimal Enhancement Chain used for the
> KeywordLinkingEngine looks as follows:
>
>     {language-detection}
>     {keyword-linking}
>
> where the KeywordLinkingEngine does all of the tokenizing, sentence
> detection, POS tagging, chunking and finally the linking with the
> vocabulary.
>
> With the Stanbol NLP module this will change. The KeywordLinkingEngine
> in the stanbol-nlp-processing branch is now only concerned with the
> linking task. All the text processing steps are done by other
> EnhancementEngines. However, this also means that the minimal/typical
> Enhancement Chain used for the KeywordLinkingEngine changes quite
> dramatically:
>
>     {language-detection}
>     {sentence-detection} (optional)
>     {tokenizing}
>     {phrase-detection} (optional)
>     {keyword-linking}
>
> So even though the current configurations for the trunk version of the
> KeywordLinkingEngine still work with the branch version, users will
> need to adapt their Enhancement Chain configurations as soon as the
> Stanbol NLP processing is re-integrated with the trunk.
>
> So basically there are two options to deal with that:
>
> (1) Reintegrate the KeywordLinkingEngine and break existing
> Enhancement Chains: While this will affect most Stanbol users, it will
> be easily recognized because the affected chains will no longer provide
> the expected results. The fix is also relatively easy, because current
> chains would only need to be extended by the four new OpenNLP-based
> NLP processing engines:
>
>     {possible-other-engines-like-tika}
>     {language-detection}
>     opennlp-sentence
>     opennlp-token
>     opennlp-pos
>     opennlp-chunker
>     {keyword-linking}
>     {possible-other-engines-like-refactor}
>
> (2) Change the name (and artifactId) of the KeywordLinkingEngine in
> the branch and reintegrate it as a new engine (e.g. as
> EntityLinkingEngine). The KeywordLinkingEngine in the trunk would then
> be deprecated and, after the next release of Stanbol, moved to the
> /contrib folder. While this would ensure that current configurations
> do not become invalid, it would also make it likely that Stanbol
> users keep using an outdated engine. In addition, users would
> need to adapt all KeywordLinkingEngine configurations to the new
> EntityLinkingEngine.
>
> Personally I see advantages and disadvantages in both solutions and do
> not have a clear preference, so I would really appreciate feedback
> regarding this.
>
> best
> Rupert
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
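Under option (1) in the quoted message, migrating an existing chain means inserting the four OpenNLP engines after language detection and before keyword linking. The sketch below illustrates that step under loose assumptions: a chain is modelled as a plain ordered list of engine names, the placeholder names ("language-detection", "keyword-linking", "tika", "refactor") stand in for whatever a real configuration uses, and `migrate_chain` is a hypothetical helper, not part of any Stanbol API.

```python
# Hypothetical migration helper for option (1). Chains are modelled as
# plain ordered lists of engine names; this is NOT a Stanbol API.

OPENNLP_ENGINES = ["opennlp-sentence", "opennlp-token", "opennlp-pos", "opennlp-chunker"]


def migrate_chain(chain: list[str], language_engine: str, linking_engine: str) -> list[str]:
    """Return a new chain with the OpenNLP engines placed directly before
    the linking engine (and therefore after language detection)."""
    if language_engine not in chain or linking_engine not in chain:
        raise ValueError("chain must contain the language detection and linking engines")
    if chain.index(language_engine) > chain.index(linking_engine):
        raise ValueError("language detection must run before keyword linking")
    # Drop OpenNLP engines that are already present so re-running is harmless.
    rest = [engine for engine in chain if engine not in OPENNLP_ENGINES]
    insert_at = rest.index(linking_engine)
    return rest[:insert_at] + OPENNLP_ENGINES + rest[insert_at:]


if __name__ == "__main__":
    old_chain = ["tika", "language-detection", "keyword-linking", "refactor"]
    print(migrate_chain(old_chain, "language-detection", "keyword-linking"))
```

Removing existing OpenNLP entries before inserting makes the helper idempotent, so applying it to an already migrated chain leaves that chain unchanged.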



-- 
Fabian
http://twitter.com/fctwitt
