On 6/8/06, Stephen Colebourne <[EMAIL PROTECTED]> wrote:

Paul Dlug wrote:
> I'm happy to provide a patch since I already have an implementation
> completed. The best way I can think of to take this is to provide two
> methods to AbstractMapBag:
>
> public Object[] getSortedCount();
>
> public Object[] getSortedCount(Comparator c);
>
> Please let me know if List() is preferable to Object[], from the
research
> I've done so far it appears that a Set must be copied into a List or
Array
> in order to sort it. It seems to me that better performance would be
> achieved with Set.toArray() and Arrays.sort()
>
> The second method would be used to provide a second level comparator
(first
> sort by count, if there's a tie sort by Comparator). This is useful to
do
> something like sorting categories by count and then alphabetically.
>
> This could alternatively be placed in BagUtils but I think it's cleaner
to
> have it as an instance method on the Bag itself.

Hmmm. I've just taken a look at the code, and its not as simple as I
thought. I believe that we could take your methods as you propose and it
would work, however, I'm not sure thats the best way.

My problem is that the methods involve copying the data to another list
(List not Object[]) to return it. It feels like a core bag competancy.


That was the exact problem I had. I had to look at the API a few times since
it seemed like this functionality is so obvious for a bag to provide. The
data duplication problem is a big one, none of the core Collections API
methods are available to sort Set's in place (for obvious reasons). The way
to solve this might be with a different backing data structure.

Really, this would seem to be a SortedBag (TreeBag) where the comparator
checks the count. But it could be hard to write a TreeBag comparator
that does this (I haven't really checked). Thus the whole bag is kept
sorted by count, and normal bag methods can then be directly used to
extract the data.


I had the same thought, but I don't know enough about the TreeMap
implementation that backs TreeBag to know if this will introduce too much
reshuffling of the Tree and result in really horrid performance.

I'll continue looking at some other options for doing this in a performant
way, please let me know if you have any ideas.


--Paul

Reply via email to