Actually the sortedSet would result in a lot more concise and cleaner code...ok I'll translate it into a version with Sets and then we put them side by side and decide which one is prettier (and perhaps more performant)...

Jim

On 18/04/12 14:11, Jim - FooBar(); wrote:
William,

I did think about using a sortedSet - it is very convenient since it can take a comparator as an argument to determine the ordering (exactly what we need)...however i preferred to stay faithful to the general way openNLP is implemented which is using lists and arrays mainly - my guess for performance...

Trust me, i hate writing code which mutates data-structures, but that is the imperative way, hence the Java way...i could try a more functional approach...let's see how this evolves at the moment and your suggestion will be seriously considered at a later stage...do you think performance will be affected at all when using sets instead of lists?I'm reading on various articles that if you're sorting just once a list could be slightly faster...on the other hand if one wants the items to be sorted at any given time then probably a sorted set is better...that makes sense i guess.

Jim


On 18/04/12 13:47, [email protected] wrote:
Jim,

A comment about your code. Maybe instead of working with sort methods,
lists and arrays you should work with SortedSet. This would help you to
avoid duplicates and the order would be preserved after a removal. I think
you could also avoid the nulls, maybe creating an output set instead of
modifying the input.

Note: I checked the Span class and there is an issue with it. The compareTo
method is not checking the Span type, and it inconsistent with equals. I
will open a Jira and fix it.

On Wed, Apr 18, 2012 at 7:13 AM, Jim - FooBar();<[email protected]>wrote:

On 18/04/12 11:12, Jim - FooBar(); wrote:

On 18/04/12 11:06, Jörn Kottmann wrote:

On 04/18/2012 11:41 AM, Jim - FooBar(); wrote:

I keep the longest span whenever i'm encountering 2
overlapping/intersecting spans of the same type...it makes sense to do that
cos one of them must be wrong...what about different types though?

When they are intersecting the following span could be longer, thats
right. To keep things simple
I suggested to always keep the first span.

Jörn

well that sounds a bit arbitrary don't you think? It sounds like flipping
a coin!
also I'm not talking about span length here...if the types are different it makes no sense to compare lengths...the chances of any of them being the correct one are equal...if we always keep the earliest one it is easy to implement but we're hard-coding something with no good justification...see
what i mean?

Jim

I wish there was a way to keep both...

Jim




Reply via email to