Rupert Westenthaler created STANBOL-1251:
--------------------------------------------

             Summary: Pos tag based Phrase extraction Engine
                 Key: STANBOL-1251
                 URL: https://issues.apache.org/jira/browse/STANBOL-1251
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Implement an Enhancement Engine that uses POS tags to extract noun and verb 
phrases.

In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see 
the documentation at [1] for details). This alignment allows engines to 
determine the lexical categories of tokens in the text in a language-independent 
way.

The Pos-Chunker Engine will use those lexical categories to extract noun and 
verb phrases by applying the following rules:

### Noun Phrases

* start: noun, pronoun, determiner, adjective
* continuation: noun, adposition, pronoun, determiner, adjective, punctuation
* end: noun, pronoun, determiner, adjective
* required: noun

### Verb Phrases

* start: verb, adverb
* continuation: verb, adverb, punctuation
* end: verb, adverb
* required: verb
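
The two rule sets above can be illustrated with a minimal, language-neutral sketch. This is not the engine's implementation and does not use the Stanbol API: the `ChunkRule` class, the `extract_chunks` function and the plain-string category names are hypothetical, and the sketch assumes a greedy left-to-right scan that trims trailing tokens which are not allowed at the end of a phrase. It also assumes that a single-token phrase is acceptable, which the issue does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRule:
    """Hypothetical container for one rule set (not a Stanbol class)."""
    name: str
    start: frozenset
    continuation: frozenset
    end: frozenset
    required: frozenset


NOUN_PHRASE = ChunkRule(
    name="NP",
    start=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    continuation=frozenset({"noun", "adposition", "pronoun", "determiner",
                            "adjective", "punctuation"}),
    end=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    required=frozenset({"noun"}),
)

VERB_PHRASE = ChunkRule(
    name="VP",
    start=frozenset({"verb", "adverb"}),
    continuation=frozenset({"verb", "adverb", "punctuation"}),
    end=frozenset({"verb", "adverb"}),
    required=frozenset({"verb"}),
)


def extract_chunks(tokens, rule):
    """Return (start, end) index spans, end exclusive, matching `rule`.

    `tokens` is a list of (text, lexical_category) pairs.
    """
    chunks = []
    i = 0
    n = len(tokens)
    while i < n:
        # A phrase may only begin on a token from the start set.
        if tokens[i][1] not in rule.start:
            i += 1
            continue
        # Extend the candidate while tokens are allowed as continuation.
        j = i + 1
        while j < n and tokens[j][1] in rule.continuation:
            j += 1
        # Trim trailing tokens that are not allowed to end a phrase.
        end = j
        while end > i and tokens[end - 1][1] not in rule.end:
            end -= 1
        # Keep the candidate only if it contains a required category.
        if end > i and any(cat in rule.required for _, cat in tokens[i:end]):
            chunks.append((i, end))
        # Resume scanning after the run that was just examined.
        i = j
    return chunks


sentence = [
    ("The", "determiner"), ("quick", "adjective"), ("brown", "adjective"),
    ("fox", "noun"), ("jumps", "verb"), ("over", "adposition"),
    ("the", "determiner"), ("lazy", "adjective"), ("dog", "noun"),
    (".", "punctuation"),
]

for rule in (NOUN_PHRASE, VERB_PHRASE):
    for start, end in extract_chunks(sentence, rule):
        print(rule.name, " ".join(text for text, _ in sentence[start:end]))
```

In this sketch the trailing full stop is accepted as a continuation of "the lazy dog" but trimmed again because punctuation is not in the end set. The adposition "over" never starts a noun phrase, so it is skipped even though it is in the continuation set.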

This engine will allow configuring the processed languages (e.g. to deactivate 
it for languages where other chunkers are available).

The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.

The current plan is to make this engine also available in the 0.12 branch.

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
