In my local copy I have these methods in the interface:
Map<String, Double> scoreMap(String text);
SortedMap<Double, Set<String>> sortedScoreMap(String text);
and these implementations of them in the ME implementation:
public Map<String, Double> scoreMap(String text) {
    Map<String, Double> probDist = new HashMap<String, Double>();
    double[] categorize = categorize(text);
    int catSize = getNumberOfCategories();
    for (int i = 0; i < catSize; i++) {
        String category = getCategory(i);
        probDist.put(category, categorize[getIndex(category)]);
    }
    return probDist;
}
public SortedMap<Double, Set<String>> sortedScoreMap(String text) {
    SortedMap<Double, Set<String>> descendingMap =
        new TreeMap<Double, Set<String>>().descendingMap();
    double[] categorize = categorize(text);
    int catSize = getNumberOfCategories();
    for (int i = 0; i < catSize; i++) {
        String category = getCategory(i);
        double score = categorize[getIndex(category)];
        if (descendingMap.containsKey(score)) {
            descendingMap.get(score).add(category);
        } else {
            Set<String> newset = new HashSet<>();
            newset.add(category);
            descendingMap.put(score, newset);
        }
    }
    return descendingMap;
}
They are pretty simple, but if everyone agrees I can commit them (with some
Javadoc).
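
With the sorted map, the best category (or categories, on a tie) should be
reachable in a single call, since the first key of the descending map is the
highest score. A minimal standalone sketch of that idea follows. The class
name ScoreMapSketch, the bestCategories helper and the array arguments are
hypothetical stand-ins for getCategory(i) and categorize(text); none of them
are part of the existing interface.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class ScoreMapSketch {

    // Hypothetical stand-in for sortedScoreMap(text): groups categories by
    // score, highest score first. The two arrays play the role of
    // getCategory(i) and the result of categorize(text).
    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] scores) {
        SortedMap<Double, Set<String>> descendingMap =
            new TreeMap<Double, Set<String>>().descendingMap();
        for (int i = 0; i < categories.length; i++) {
            Set<String> tied = descendingMap.get(scores[i]);
            if (tied == null) {
                tied = new HashSet<>();
                descendingMap.put(scores[i], tied);
            }
            tied.add(categories[i]);
        }
        return descendingMap;
    }

    // Hypothetical helper: the first key of the descending map is the top
    // score, so its value holds every category tied for best.
    static Set<String> bestCategories(String[] categories, double[] scores) {
        SortedMap<Double, Set<String>> sorted = sortedScoreMap(categories, scores);
        return sorted.isEmpty() ? new HashSet<String>() : sorted.get(sorted.firstKey());
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"};
        double[] scores = {0.2, 0.5, 0.3};
        System.out.println(bestCategories(categories, scores)); // prints [politics]
    }
}
```

Returning a Set per score keeps ties visible instead of silently dropping one
of two categories that happen to share a score.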
On Sat, Apr 26, 2014 at 8:39 AM, Jörn Kottmann wrote:
> On Thu, 2014-04-24 at 19:54 -0300, William Colen wrote:
> > Yes, it looks nice. Maybe we should redo all the DocumentCategorizer
> > interface. It is different from other tools, for example, we can't get the
> > best category of one document with only one call, we need to use two
> > methods.
>
> Yes that is right. +1 to change it. Can we deprecate the old methods and
> just add new ones to not break backward compatibility?
>
> Jörn
>
>