[ 
https://issues.apache.org/jira/browse/STANBOL-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler updated STANBOL-1251:
-----------------------------------------

    Description: 
Implement an Enhancement Engine that uses POS tags to extract Noun and Verb 
Phrases

In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see 
documentation at [1] for detailed information). This alignment allows engines 
to determine the lexical categories of tokens in the text in a 
language-independent way.

The Pos-Chunker Engine will use those lexical categories of tokens to extract 
Noun and Verb phrases using the following rules:

### Noun Phrases

* start: noun, pronoun, determiners, adjectives
* continuation: nouns, adpositions, adjectives, punctuation
* end: noun, pronoun, determiners, adjectives
* required: noun

### Verb Phrases

* start: verb, adverb
* continuation: verb, adverb, punctuation
* end: verb, adverb
* required: verb
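
The rules above can be read as a small scanning procedure. The following is a 
minimal, language-neutral sketch in Python, not the Stanbol implementation: 
the category strings, the `PhraseRule` class and the `extract_phrases` function 
are hypothetical names chosen for illustration, and the plain strings stand in 
for the OLIA lexical categories. It assumes a phrase begins at a "start" token, 
is extended while the following tokens are "continuation" tokens, is trimmed 
back to the last "end" token, and is kept only if it contains a "required" 
token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseRule:
    """Hypothetical container for one set of chunking rules."""
    name: str
    start: frozenset
    continuation: frozenset
    end: frozenset
    required: frozenset


# Category names are illustrative stand-ins for OLIA lexical categories.
NOUN_PHRASE = PhraseRule(
    name="NP",
    start=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    continuation=frozenset({"noun", "adposition", "adjective", "punctuation"}),
    end=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    required=frozenset({"noun"}),
)

VERB_PHRASE = PhraseRule(
    name="VP",
    start=frozenset({"verb", "adverb"}),
    continuation=frozenset({"verb", "adverb", "punctuation"}),
    end=frozenset({"verb", "adverb"}),
    required=frozenset({"verb"}),
)


def extract_phrases(tokens, rule):
    """Return (start, end_exclusive) token spans matching the rule.

    tokens is a list of (text, category) pairs.
    """
    spans = []
    i = 0
    n = len(tokens)
    while i < n:
        if tokens[i][1] not in rule.start:
            i += 1
            continue
        # Extend the candidate while tokens are valid continuations.
        j = i + 1
        while j < n and tokens[j][1] in rule.continuation:
            j += 1
        # Trim trailing tokens that may not end a phrase (e.g. punctuation).
        end = j
        while end > i and tokens[end - 1][1] not in rule.end:
            end -= 1
        # Keep the candidate only if it contains a required category.
        if end > i and any(cat in rule.required for _, cat in tokens[i:end]):
            spans.append((i, end))
        i = j  # resume scanning after the run that was just examined
    return spans


def phrase_text(tokens, span):
    """Join the token texts of a span with single spaces."""
    start, end = span
    return " ".join(text for text, _ in tokens[start:end])


SAMPLE = [
    ("The", "determiner"), ("big", "adjective"), ("house", "noun"),
    ("of", "adposition"), ("cards", "noun"), ("quickly", "adverb"),
    ("fell", "verb"), (".", "punctuation"),
]
```

With the sample tokens, `extract_phrases(SAMPLE, NOUN_PHRASE)` yields the span 
covering "The big house of cards", and `extract_phrases(SAMPLE, VERB_PHRASE)` 
yields the span covering "quickly fell"; the trailing full stop is accepted as 
a continuation but trimmed because punctuation may not end a phrase.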

The engine will allow the processed languages to be configured (e.g. to 
deactivate it for languages where other chunkers are available).

The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.

The current plan is to make this engine available in the 0.12 branch as well.

[1] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations

  was:
Implement an Enhancement Engine that uses POS tags to extract Noun and Verb 
Phrases

In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see 
documentation at [1] for detailed information). This alignment allows engines 
to determine the lexical categories of tokens in the text in a 
language-independent way.

The Pos-Chunker Engine will use those lexical categories of tokens to extract 
Noun and Verb phrases using the following rules:

### Noun Phrases

* start: noun, pronoun, determiners, adjectives
* continuation: nouns, adpositions, pronouns, determiners, adjectives, 
punctuation
* end: noun, pronoun, determiners, adjectives
* required: noun

### Verb Phrases

* start: verb, adverb
* continuation: verb, adverb, punctuation
* end: verb, adverb
* required: verb

The engine will allow the processed languages to be configured (e.g. to 
deactivate it for languages where other chunkers are available).

The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.

The current plan is to make this engine available in the 0.12 branch as well.

[1] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations


> Pos tag based Phrase extraction Engine
> --------------------------------------
>
>                 Key: STANBOL-1251
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1251
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancement Engines
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
