"Philippe De Muyter" <[EMAIL PROTECTED]> writes: > Do you think that this new option would fit easily in the current `sort' > implementation, and how ?
It could be done and would I think be useful, though there would be a few issues. 1. You need to handle the case of "ties" correctly. A tie occurs if two lines compare equal: this can happen even if their byte-for-byte contents differ. In this case, you want the output of "sort --tail 1000" to be equivalent to "sort | tail -n 1000". As you're sorting if a new "tie" comes in for the first line in the tail, you need to discard that first line and insert the "tie" in the correct position somewhere in the "tail". Ties for "sort --head" are easier, as you can simply discard new ties for the last line in the head. 2. Have you looked at the literature for the best data structure to solve your problem? For example, the people doing simulations have worked hard on algorithms to implement priority queues (where the most important operations are "delete min" and "insert"), and this seems to be close to what you need. Splay heaps seem to be one of the current champions in that area. 3 (optional). There should be no arbitrary limits, so ideally the performance should be good even if N is quite large, so large that the number of saved lines doesn't fit into memory. However, this might require adding major structural complexity to your algorithm so perhaps it isn't worth doing. Anyway, thanks for looking into the problem. _______________________________________________ Bug-coreutils mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/bug-coreutils
