[ https://issues.apache.org/jira/browse/STANBOL-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler updated STANBOL-1151:
-----------------------------------------

    Description: 
The SentimentClassifier interface currently only uses the `String word` when 
classifying words.

    public double classifyWord(String word);

However, this is not sufficient in all cases, because different words may share 
exactly the same lexical form and can therefore only be distinguished by using 
the POS (part of speech) in addition to the lexical form.

An example is the German word form "gefahren":

* gefahren (past participle of "fahren", to drive)
* Gefahren (plural of "Gefahr", danger)

So only the additional information Verb('gefahren') vs. Noun('gefahren') allows 
the word to be correctly classified with the sentiment entry:

    Gefahr|NN   -1.0    Gefahren

NN ... Common Noun
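To illustrate why the category has to be part of the lookup key, here is a 
minimal Java sketch that parses an entry like the one above and stores every 
form under a (category, form) key. It assumes the entry is whitespace-separated 
as `lemma|TAG weight inflections` with comma-separated inflections; the 
`Category` enum, the tag-to-category mapping (only `NN` comes from the entry 
above) and the class name are hypothetical and not Stanbol API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PosAwareLexiconDemo {

    // Hypothetical coarse word classes (NOT the Stanbol LexicalCategory enum).
    enum Category { Noun, Verb, Adjective }

    // Hypothetical tag mapping: NN is from the issue, the other tags are illustrative assumptions.
    static Category categoryOf(String tag) {
        switch (tag) {
            case "NN": return Category.Noun;
            case "VVINF": return Category.Verb;
            case "ADJX": return Category.Adjective;
            default: throw new IllegalArgumentException("unmapped tag: " + tag);
        }
    }

    // The key combines category and lower-cased form, so Noun:gefahren and Verb:gefahren differ.
    static String key(Category cat, String form) {
        return cat + ":" + form.toLowerCase(Locale.GERMAN);
    }

    private final Map<String, Double> weights = new HashMap<>();

    // Parses an entry of the assumed form "lemma|TAG  weight  inflection1,inflection2,...".
    void addEntry(String line) {
        String[] cols = line.trim().split("\\s+");
        String[] lemmaAndTag = cols[0].split("\\|");
        Category cat = categoryOf(lemmaAndTag[1]);
        double weight = Double.parseDouble(cols[1]);
        weights.put(key(cat, lemmaAndTag[0]), weight);
        if (cols.length > 2) {
            for (String form : cols[2].split(",")) {
                weights.put(key(cat, form), weight);
            }
        }
    }

    // Returns 0.0 (neutral) for any (category, form) pair that is not in the lexicon.
    double classifyWord(Category cat, String word) {
        return weights.getOrDefault(key(cat, word), 0.0);
    }

    public static void main(String[] args) {
        PosAwareLexiconDemo lexicon = new PosAwareLexiconDemo();
        lexicon.addEntry("Gefahr|NN   -1.0    Gefahren");
        System.out.println(lexicon.classifyWord(Category.Noun, "Gefahren")); // -1.0
        System.out.println(lexicon.classifyWord(Category.Verb, "gefahren")); // 0.0
    }
}
```

With a form-only key both uses of "gefahren" would hit the same -1.0 entry; 
with the category in the key the verb reading falls through to neutral.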

To support this, the SentimentClassifier needs to be updated to take the word 
type as an additional parameter. Because only the major word types are needed, 
this will use the LexicalCategory enumeration defined by the stanbol.enhancer.pos 
module (based on the OLiA ontology).

The new method signature will be:

    public double classifyWord(LexicalCategory cat, String word);
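A minimal Java sketch of the proposed method shape. The `LexicalCategory` enum 
below is a hypothetical stand-in (a small subset, not the real Stanbol 
enumeration), `DictionaryClassifier` is a hypothetical dictionary-backed 
implementation, and treating unknown (category, word) pairs as neutral (0.0) is 
an assumption rather than documented behaviour.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CategoryAwareClassifierSketch {

    // Hypothetical subset of coarse lexical categories (stand-in, not the Stanbol enum).
    enum LexicalCategory { Noun, Verb, Adjective, Adverb }

    // The proposed method shape: the lexical category is passed alongside the word.
    interface SentimentClassifier {
        double classifyWord(LexicalCategory cat, String word);
    }

    // Hypothetical implementation: one word map per category.
    static final class DictionaryClassifier implements SentimentClassifier {
        private final Map<LexicalCategory, Map<String, Double>> dict =
                new EnumMap<>(LexicalCategory.class);

        void put(LexicalCategory cat, String word, double weight) {
            dict.computeIfAbsent(cat, c -> new HashMap<>())
                .put(word.toLowerCase(Locale.GERMAN), weight);
        }

        @Override
        public double classifyWord(LexicalCategory cat, String word) {
            Map<String, Double> words = dict.get(cat);
            if (words == null) {
                return 0.0; // no entries at all for this category: treat as neutral (assumption)
            }
            return words.getOrDefault(word.toLowerCase(Locale.GERMAN), 0.0);
        }
    }

    public static void main(String[] args) {
        DictionaryClassifier classifier = new DictionaryClassifier();
        classifier.put(LexicalCategory.Noun, "Gefahren", -1.0);
        System.out.println(classifier.classifyWord(LexicalCategory.Noun, "Gefahren")); // -1.0
        System.out.println(classifier.classifyWord(LexicalCategory.Verb, "gefahren")); // 0.0
    }
}
```

Keeping one word map per category means the noun entry for "Gefahren" is simply 
never consulted when the word is classified as a verb.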

In addition, the isAdjective(PosTag pos) and isNoun(PosTag pos) methods will be 
removed in favor of the new method:

    public Set<LexicalCategory> getCategories(PosTag posTag);

NOTE: As PosTag already provides a PosTag#getCategories() method, implementing 
this will be easy in most cases. However, this method allows implementations to 
override the data provided by the PosTag, or to provide mappings for PosTags 
that do not define any category mapping for their String POS tag.
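The override/fallback behaviour described in the NOTE could look roughly like 
the following Java sketch. `PosTag` and `LexicalCategory` here are hypothetical 
minimal stand-ins (not the Stanbol classes), the override table keyed by the 
String POS tag is an assumed configuration mechanism, and `VVPP` is only an 
example of a tag that arrives without a category mapping.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class GetCategoriesSketch {

    // Hypothetical subset of coarse lexical categories.
    enum LexicalCategory { Noun, Verb, Adjective, Adverb }

    // Hypothetical stand-in for a POS tag: the String tag plus any categories it already maps to.
    static final class PosTag {
        final String tag;
        final Set<LexicalCategory> categories;

        PosTag(String tag, Set<LexicalCategory> categories) {
            this.tag = tag;
            this.categories = categories;
        }

        Set<LexicalCategory> getCategories() {
            return categories;
        }
    }

    // Classifier-specific overrides keyed by the String POS tag (assumed configuration).
    private final Map<String, Set<LexicalCategory>> overrides = new HashMap<>();

    void override(String tag, Set<LexicalCategory> categories) {
        overrides.put(tag, categories);
    }

    // An override wins; otherwise fall back to the categories already provided by the PosTag.
    public Set<LexicalCategory> getCategories(PosTag posTag) {
        Set<LexicalCategory> mapped = overrides.get(posTag.tag);
        if (mapped != null) {
            return mapped;
        }
        return posTag.getCategories();
    }

    public static void main(String[] args) {
        GetCategoriesSketch classifier = new GetCategoriesSketch();
        // A tag that already carries a category mapping.
        PosTag nn = new PosTag("NN", EnumSet.of(LexicalCategory.Noun));
        // A tag that arrives without any category mapping.
        PosTag vvpp = new PosTag("VVPP", Collections.<LexicalCategory>emptySet());
        classifier.override("VVPP", EnumSet.of(LexicalCategory.Verb));

        System.out.println(classifier.getCategories(nn));   // [Noun]
        System.out.println(classifier.getCategories(vvpp)); // [Verb]
    }
}
```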


  was:
The SentimentClassifier interface needs to be changed so that the PosTag of the 
current word is also passed to the

    public double classifyWord(String word);

method. The reason for this is that the same word might have different meanings 
for different POS tags.

An example is the German word form "gefahren":

* gefahren (past participle of "fahren", to drive)
* Gefahren (plural of "Gefahr", danger)

While the verb does not have a sentiment assigned, the noun has the following 
entry:

    Gefahr|NN   -1.0    Gefahren

Because of that, the SentimentClassifier currently incorrectly assigns a 
sentiment value of '-1' even if the word is used as a verb in the text.

This will also require adapting the implementation to use the POS tag when 
looking up words in the vocabulary.



    
> Consider Lexical Categories for Sentiment Classification of Words
> -----------------------------------------------------------------
>
>                 Key: STANBOL-1151
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1151
>             Project: Stanbol
>          Issue Type: Improvement
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
