Hi,

I'm working on an abstraction to sort random-access data-structures
for Lucene[1] and would like to know if you think this is interesting
enough for inclusion in Apache Commons.

The reason for such an abstraction is that we often need to sort
parallel arrays and the JDK provides no way to do it easily. This is
why we first borrowed SorterTemplate[2] from CGlib: provided the
definition of some primitive methods (compare and swap), it can sort
any sub-sequence of any random-access data-structure.

We recently faced the need to implement TimSort (LUCENE-4839[3]) for
some use-cases, but unfortunately it requires a different set of
primitive operations to be properly implemented. This is why I started
working on LUCENE-4946[1] to be able to have different primitive
operations depending on the sorting algorithm.

A few more notes about this sorting abstraction:
 - since it only assumes the data supports random-access, it can be
used to sort off-heap or commons-primitives collections,
 - it provides some useful sorting algorithms, in particular
introsort[4] and a modified Timsort[5],
 - performance is comparable to the JDK's Arrays.sort in spite of the
abstraction,
 - it is faster and more memory-efficient than Collections.sort for
random-access lists, since the latter dumps the content of the list
into an array to sort it with Arrays.sort.

What do you think?

[1] https://issues.apache.org/jira/browse/LUCENE-4946
[2] 
http://grepcode.com/file/repo1.maven.org/maven2/cglib/cglib/2.2.2/net/sf/cglib/util/SorterTemplate.java#SorterTemplate
[3] https://issues.apache.org/jira/browse/LUCENE-4839
[4] http://fr.wikipedia.org/wiki/Introsort
[5] http://en.wikipedia.org/wiki/Timsort

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to