On 17/04/12 17:27, Jörn Kottmann wrote:
On 04/17/2012 06:23 PM, Jim - FooBar(); wrote:
Yes, I get your point. We've discussed this before and generally I do
agree... it is just that I hadn't thought that people would use the
AggregateNameFinder to produce names. The whole idea from day 1 was
to improve the evaluation.
Should I go ahead and modify the AggregateNameFinder to sort all
the predicted spans with a comparator that looks at the
start offsets (in increasing order) and checks for overlaps?
+1, and we should get a better name for it.
Jörn
OK, so I had a go at what was discussed and here is how it looks
(allFindings comes in sorted by ascending start offset):
----------------------------------------------------------------------------------------------------
// needs java.util.ArrayList, java.util.List and opennlp.tools.util.Span
private Span[] untangle(List<Span> allFindings) {
  List<Span> problems = new ArrayList<Span>(); // overlaps of different types
  for (int i = 1; i < allFindings.size(); i++) { // start from 1
    Span current = allFindings.get(i);
    Span previous = allFindings.get(i - 1); // safe, since i >= 1
    if (current.intersects(previous) || current.crosses(previous)) {
      if (current.getType().equals(previous.getType())) { // same type
        // keep only the longest one in findings: remove the shorter span
        allFindings.remove(current.length() > previous.length() ? i - 1 : i);
        i--; // compare the surviving span with the next one
      } else { // add both as problems
        problems.add(current);
        problems.add(previous);
      }
    }
  }
  if (problems.isEmpty()) { // if no problems do the usual
    return allFindings.toArray(new Span[allFindings.size()]);
  } else {
    // don't know what to do in this method
    return sortProblems(allFindings, problems);
  }
}
--------------------------------------------------------------------------------------------------------------
As you can see, I'm stuck at the very last line... How do we sort
overlapping spans of different types? On what basis? At this point I've
lost information such as "which finder did this prediction come from?" and
thus cannot do any reasoning. I do keep a Map with the finder class
as key and a list of its predictions as value, but this was intended only
for debugging. I cannot rely on that to reason about what
should stay and what should go in the final span array. If they are the
same type we keep the longest, but what about different types? Whom do we
trust?
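
One possible direction, only as a rough sketch: carry the producing finder along with each span, so a conflict between different types can be settled by a caller-supplied trust order. Everything below is hypothetical and not part of OpenNLP: TaggedSpan is a stand-in for Span with an extra finder label, and OverlapResolver, resolve and trustOrder are names made up for illustration. Whether a fixed trust order is the right basis at all is an assumption, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these names exist in OpenNLP. */
class OverlapResolver {

  /** Stand-in for Span that also remembers which finder produced it. */
  static final class TaggedSpan {
    final int start;     // inclusive
    final int end;       // exclusive
    final String type;   // e.g. "person"
    final String finder; // hypothetical label of the producing finder

    TaggedSpan(int start, int end, String type, String finder) {
      this.start = start;
      this.end = end;
      this.type = type;
      this.finder = finder;
    }

    int length() {
      return end - start;
    }

    boolean overlaps(TaggedSpan other) {
      return start < other.end && other.start < end;
    }
  }

  /**
   * Greedy pairwise resolution over spans sorted by start offset.
   * trustOrder lists finder labels from most to least trusted (an assumption).
   */
  static List<TaggedSpan> resolve(List<TaggedSpan> findings, List<String> trustOrder) {
    List<TaggedSpan> sorted = new ArrayList<TaggedSpan>(findings);
    sorted.sort(Comparator.comparingInt((TaggedSpan s) -> s.start));

    List<TaggedSpan> kept = new ArrayList<TaggedSpan>();
    for (TaggedSpan current : sorted) {
      int last = kept.size() - 1;
      if (last < 0 || !current.overlaps(kept.get(last))) {
        kept.add(current); // no conflict with the last kept span
      } else if (wins(current, kept.get(last), trustOrder)) {
        kept.set(last, current); // current replaces the span it conflicts with
      }
      // otherwise current is dropped
    }
    return kept;
  }

  /** Same type: the strictly longer span wins. Different type: the more trusted finder wins. */
  static boolean wins(TaggedSpan current, TaggedSpan previous, List<String> trustOrder) {
    if (current.type.equals(previous.type)) {
      return current.length() > previous.length();
    }
    return rank(current.finder, trustOrder) < rank(previous.finder, trustOrder);
  }

  /** Finders missing from trustOrder rank last. */
  static int rank(String finder, List<String> trustOrder) {
    int index = trustOrder.indexOf(finder);
    return index < 0 ? Integer.MAX_VALUE : index;
  }
}
```

Note that this is greedy: a span dropped early is not reconsidered if the span that beat it is itself replaced later, so it may discard more than strictly necessary.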
Any pointers?
Jim