Actually the sortedSet would result in a lot more concise and cleaner
code...ok I'll translate it into a version with Sets and then we put
them side by side and decide which one is prettier (and perhaps more
performant)...
Jim
On 18/04/12 14:11, Jim - FooBar(); wrote:
William,
I did think about using a sortedSet - it is very convenient since it
can take a comparator as an argument to determine the ordering
(exactly what we need)...however i preferred to stay faithful to the
general way openNLP is implemented which is using lists and arrays
mainly - my guess for performance...
Trust me, i hate writing code which mutates data-structures, but that
is the imperative way, hence the Java way...i could try a more
functional approach...let's see how this evolves at the moment and
your suggestion will be seriously considered at a later stage...do you
think performance will be affected at all when using sets instead of
lists?I'm reading on various articles that if you're sorting just once
a list could be slightly faster...on the other hand if one wants the
items to be sorted at any given time then probably a sorted set is
better...that makes sense i guess.
Jim
On 18/04/12 13:47, [email protected] wrote:
Jim,
A comment about your code. Maybe instead of working with sort methods,
lists and arrays you should work with SortedSet. This would help you to
avoid duplicates and the order would be preserved after a removal. I
think
you could also avoid the nulls, maybe creating an output set instead of
modifying the input.
Note: I checked the Span class and there is an issue with it. The
compareTo
method is not checking the Span type, and it inconsistent with equals. I
will open a Jira and fix it.
On Wed, Apr 18, 2012 at 7:13 AM, Jim -
FooBar();<[email protected]>wrote:
On 18/04/12 11:12, Jim - FooBar(); wrote:
On 18/04/12 11:06, Jörn Kottmann wrote:
On 04/18/2012 11:41 AM, Jim - FooBar(); wrote:
I keep the longest span whenever i'm encountering 2
overlapping/intersecting spans of the same type...it makes sense
to do that
cos one of them must be wrong...what about different types though?
When they are intersecting the following span could be longer, thats
right. To keep things simple
I suggested to always keep the first span.
Jörn
well that sounds a bit arbitrary don't you think? It sounds like
flipping
a coin!
also I'm not talking about span length here...if the types are
different
it makes no sense to compare lengths...the chances of any of them
being the
correct one are equal...if we always keep the earliest one it is
easy to
implement but we're hard-coding something with no good
justification...see
what i mean?
Jim
I wish there was a way to keep both...
Jim