Re: Merging the output of multiple name finders

[email protected] Wed, 18 Apr 2012 07:13:38 -0700

* You said that allFindings was already sorted, maybe allFindings should be
a SortedSet already, or you don't need to sort outside anymore.
* You don't need to pass in a comparator because Span already implements
Comparable. I think it is better to use the same comparable implementation
so the order is consistent with the rest of the code.


Ok, anyway, I think it is still too complex. Check how the method
NameFinderME.dropOverlappingSpans is simpler.

On Wed, Apr 18, 2012 at 10:47 AM, Jim - FooBar(); <[email protected]>wrote:

> William,
>
> Were you thinking something along these lines?
> ------------------------------**------------------------------**
> --------------------
> private Span[] untangle2(List<Span> allFindings){
>
>    List<Span> problems = new ArrayList<Span>();//all the overlaps
>    SortedSet<Span> cleanFindings = new TreeSet<Span>(new
> Comparator<Span>() {
>        @Override
>        public int compare(Span s1, Span s2) {
>          return (s1.getStart()-s2.getStart());**//sort in ascending order
> of start offsets
>        }
>      });
>
>
>    for (int i=1;i<allFindings.size();i++){**//start from 1
>      Span current = allFindings.get(i);
>      Span previous = allFindings.get(i-1);//safe
>
>      if (current.intersects(previous)|**|current.crosses(previous)){
>        if (current.getType().equals(**previous.getType())){//if same type
>
>          if ((current.length() - previous.length()) > 0)
>            cleanFindings.add(current); //current is longer
>          else
>            cleanFindings.add(previous); //previous is longer
>
>
>        }
>        else {   //add both as problems
>          problems.add(current);
>          problems.add(previous);
>        }
>      }
>      else{    //add both as clean - each iteration will produce one
> duplicate
>               //which will be blocked by the set
>        cleanFindings.add(previous);
>        cleanFindings.add(current);
>      }
>    }
>    //by this point the set is ordered and contains no nulls
>    //----------------------------**----------------------------
>
>    if(problems.isEmpty()) //if no problems so far do the usual
>      return cleanFindings.toArray(new Span[cleanFindings.size()]);
>    else
>      return sortProblems(cleanFindings, problems);//don't know what to do
> in this method
>    }
> ------------------------------**------------------------------**
> ------------------------------**------------------------------**---
>
> Jim
>
>
>
> On 18/04/12 14:19, Jim - FooBar(); wrote:
>
>> Actually the sortedSet would result in a lot more concise and cleaner
>> code...ok I'll translate it into a version with Sets and then we put them
>> side by side and decide which one is prettier (and perhaps more
>> performant)...
>>
>> Jim
>>
>> On 18/04/12 14:11, Jim - FooBar(); wrote:
>>
>>> William,
>>>
>>> I did think about using a sortedSet - it is very convenient since it can
>>> take a comparator as an argument to determine the ordering (exactly what we
>>> need)...however i preferred to stay faithful to the general way openNLP is
>>> implemented which is using lists and arrays mainly - my guess for
>>> performance...
>>>
>>> Trust me, i hate writing code which mutates data-structures, but that is
>>> the imperative way, hence the Java way...i could try a more functional
>>> approach...let's see how this evolves at the moment and your suggestion
>>> will be seriously considered at a later stage...do you think performance
>>> will be affected at all when using sets instead of lists?I'm reading on
>>> various articles that if you're sorting just once a list could be slightly
>>> faster...on the other hand if one wants the items to be sorted at any given
>>> time then probably a sorted set is better...that makes sense i guess.
>>>
>>> Jim
>>>
>>>
>>> On 18/04/12 13:47, [email protected] wrote:
>>>
>>>> Jim,
>>>>
>>>> A comment about your code. Maybe instead of working with sort methods,
>>>> lists and arrays you should work with SortedSet. This would help you to
>>>> avoid duplicates and the order would be preserved after a removal. I
>>>> think
>>>> you could also avoid the nulls, maybe creating an output set instead of
>>>> modifying the input.
>>>>
>>>> Note: I checked the Span class and there is an issue with it. The
>>>> compareTo
>>>> method is not checking the Span type, and it inconsistent with equals. I
>>>> will open a Jira and fix it.
>>>>
>>>> On Wed, Apr 18, 2012 at 7:13 AM, Jim - FooBar();<[email protected]**
>>>> >wrote:
>>>>
>>>>  On 18/04/12 11:12, Jim - FooBar(); wrote:
>>>>>
>>>>>  On 18/04/12 11:06, Jörn Kottmann wrote:
>>>>>>
>>>>>>  On 04/18/2012 11:41 AM, Jim - FooBar(); wrote:
>>>>>>>
>>>>>>>  I keep the longest span whenever i'm encountering 2
>>>>>>>> overlapping/intersecting spans of the same type...it makes sense to
>>>>>>>> do that
>>>>>>>> cos one of them must be wrong...what about different types though?
>>>>>>>>
>>>>>>>>  When they are intersecting the following span could be longer,
>>>>>>> thats
>>>>>>> right. To keep things simple
>>>>>>> I suggested to always keep the first span.
>>>>>>>
>>>>>>> Jörn
>>>>>>>
>>>>>>>  well that sounds a bit arbitrary don't you think? It sounds like
>>>>>> flipping
>>>>>> a coin!
>>>>>> also I'm not talking about span length here...if the types are
>>>>>> different
>>>>>> it makes no sense to compare lengths...the chances of any of them
>>>>>> being the
>>>>>> correct one are equal...if we always keep the earliest one it is easy
>>>>>> to
>>>>>> implement but we're hard-coding something with no good
>>>>>> justification...see
>>>>>> what i mean?
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>  I wish there was a way to keep both...
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>
>>
>

Re: Merging the output of multiple name finders

Reply via email to