[
https://issues.apache.org/jira/browse/STANBOL-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler updated STANBOL-1151:
-----------------------------------------
Description:
The SentimentClassifier interface currently uses only the `String word` when
classifying words.
public double classifyWord(String word);
However, this is not sufficient in all cases, because different words can share
the same lexical form and can therefore only be distinguished by using
the POS in addition to the lexical form.
An example is the German word
* gefahren (past participle of "to drive")
* Gefahren (plural of "danger")
Only the distinction between Verb('gefahren') and Noun('gefahren') makes it
possible to decide correctly whether the following sentiment entry applies
Gefahr|NN -1.0 Gefahren
NN ... Common Noun
To support this, the SentimentClassifier needs to be updated to accept the word
type as an additional parameter. Because only the major types are needed, this
will use the LexicalCategory enumeration defined by the stanbol.enhancer.pos
module (based on the OLiA ontology).
The new method will be
public double classifyWord(LexicalCategory cat, String word);
Also, the isAdjective(PosTag pos) and isNoun(PosTag pos) methods will be removed
in favor of the new method
public Set<LexicalCategory> getCategories(PosTag posTag);
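The following is only a minimal sketch of how a dictionary-backed classifier
might implement the two-argument method. The enum and interface are simplified
stand-ins for the Stanbol types, and DictionaryClassifier, its add() method,
the key format and the neutral default of 0.0 are hypothetical choices made
for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Simplified stand-in for the Stanbol LexicalCategory enumeration (assumption).
enum LexicalCategory { Noun, Verb, Adjective }

// Stand-in for the proposed interface; only classifyWord is shown.
interface SentimentClassifier {
    double classifyWord(LexicalCategory cat, String word);
}

// Hypothetical implementation that keys its lexicon by category AND word.
class DictionaryClassifier implements SentimentClassifier {
    private final Map<String, Double> lexicon = new HashMap<>();

    // Builds a lookup key such as "Noun|gefahren" (case-insensitive on the word).
    private static String key(LexicalCategory cat, String word) {
        return cat.name() + '|' + word.toLowerCase(Locale.GERMAN);
    }

    void add(LexicalCategory cat, String word, double sentiment) {
        lexicon.put(key(cat, word), sentiment);
    }

    @Override
    public double classifyWord(LexicalCategory cat, String word) {
        // Unknown (category, word) pairs are treated as neutral in this sketch.
        return lexicon.getOrDefault(key(cat, word), 0.0);
    }
}

class ClassifyWordDemo {
    public static void main(String[] args) {
        DictionaryClassifier c = new DictionaryClassifier();
        c.add(LexicalCategory.Noun, "Gefahren", -1.0);
        System.out.println(c.classifyWord(LexicalCategory.Noun, "Gefahren")); // -1.0
        System.out.println(c.classifyWord(LexicalCategory.Verb, "gefahren")); // 0.0
    }
}
```

With such a key the verb reading no longer picks up the sentiment of the noun
entry, which is the behaviour this issue asks for.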
NOTE: because PosTag already provides a PosTag#getCategories() method,
implementing this will be easy for most implementations. However, this method
makes it possible to override data provided by the PosTag, or to provide
mappings for PosTags that do not provide any mapping for the String POS tag.
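As a rough sketch of the NOTE above, an implementation could delegate to the
PosTag and fall back to its own table only when the tag carries no categories.
PosTag and LexicalCategory below are simplified stand-ins for the Stanbol
classes, and CategoryResolver, its fallback table and the STTS-style tag
"VVPP" are assumptions made for this example.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the Stanbol LexicalCategory enumeration (assumption).
enum LexicalCategory { Noun, Verb, Adjective }

// Simplified stand-in for PosTag: a string tag plus a possibly empty category set.
final class PosTag {
    private final String tag;
    private final Set<LexicalCategory> categories;

    PosTag(String tag, Set<LexicalCategory> categories) {
        this.tag = tag;
        this.categories = categories;
    }

    String getTag() { return tag; }

    Set<LexicalCategory> getCategories() { return categories; }
}

// Hypothetical helper showing one way getCategories(PosTag) could behave.
class CategoryResolver {
    // Fallback mappings for string tags that come without categories.
    private final Map<String, Set<LexicalCategory>> fallback = new HashMap<>();

    CategoryResolver() {
        fallback.put("NN", EnumSet.of(LexicalCategory.Noun));
        fallback.put("VVPP", EnumSet.of(LexicalCategory.Verb));
    }

    Set<LexicalCategory> getCategories(PosTag posTag) {
        Set<LexicalCategory> cats = posTag.getCategories();
        if (!cats.isEmpty()) {
            return cats; // the tag already knows its categories
        }
        // Otherwise consult the local table; unknown tags yield an empty set.
        return fallback.getOrDefault(posTag.getTag(), Collections.emptySet());
    }
}
```

An implementation that wants to override the PosTag data rather than only fill
gaps would consult its own table first and delegate second.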
was:
The SentimentClassifier interface needs to be changed so that the PosTag
for the current word is also passed to the
public double classifyWord(String word);
method. The reason is that the same word might have a different
meaning for different POS tags.
An example is the German word
* gefahren (past participle of "to drive")
* Gefahren (plural of "danger")
While the verb does not have a sentiment assigned, the noun has the following
entry
Gefahr|NN -1.0 Gefahren
Because of that, the SentimentClassifier currently incorrectly assigns a
sentiment value of '-1' even if the word is used as a verb in the text.
This will also require adapting the implementation to support using the POS tag
when looking up words in the vocabulary.
> Consider Lexical Categories for Sentiment Classification of Words
> -----------------------------------------------------------------
>
> Key: STANBOL-1151
> URL: https://issues.apache.org/jira/browse/STANBOL-1151
> Project: Stanbol
> Issue Type: Improvement
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler