On Fri, Jan 22, 2021 at 10:37 AM Tony Fischetti <tony.fische...@gmail.com> wrote:
>
> For a while, I've been using a small program I wrote (with help from a
> GPL AVL-library) to filter unsorted duplicate lines. I thought I might
> see if this can be added to `uniq` (or some other way) but I saw that
> a nearly identical proposal
> (https://lists.gnu.org/archive/html/coreutils/2011-11/msg00016.html)
> was already put forth and rejected.
>
> I thought it might be worth it to make the case again, with an expanded
> rationale, and especially as I already have a proof of concept (available
> below) and I'm willing to write the code, documentation, translation,
> etc...
>
> It was said in the replies to the original proposal that it's up to
> the user to decide whether they want to run `sort` and then pipe it
> to `uniq`. But in all the years I've used coreutils, I've never once
> used `uniq` without `sort`. I've spoken to many others, and their
> experience comports with mine.
>
> But this was not because I wanted the output to be sorted; in fact,
> I specifically didn't. Most times, I want (and even require that) the
> duplicated lines be stripped as soon as the data becomes available,
> and remain in the original order. This is especially useful for log
> files, journals, output from statistical software, etc...
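To make the requirement above concrete: given a small sample input (invented for illustration), `sort | uniq` does remove the duplicates, but it also destroys the original line order, which is exactly what the proposal wants to preserve:

```shell
# Sample input with out-of-order duplicates (invented for illustration):
input='beta
alpha
beta
gamma
alpha'

# sort | uniq removes duplicates but reorders the lines:
printf '%s\n' "$input" | sort | uniq
# alpha
# beta
# gamma

# The desired behavior instead: drop repeats as they arrive and keep
# first-occurrence order, i.e.:
# beta
# alpha
# gamma
```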
I'm sure it's been mentioned in previous threads, but I didn't see an
alternate-language implementation listed in the 2011 coreutils thread
you linked above, so here's one:

    perl -ne '$seen{$_}++ or print'

You can also wrap a function around that, e.g.:

    uniq_stream() { perl -ne '$seen{$_}++ or print' "$@"; }
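For completeness, here is that one-liner run against a small sample input (invented for illustration), together with the well-known awk equivalent, which uses the same first-seen-wins logic:

```shell
# The perl one-liner prints a line only the first time it is seen:
# $seen{$_}++ is 0 (false) on the first occurrence, so print runs;
# on later occurrences it is nonzero, so the line is skipped.
printf 'beta\nalpha\nbeta\ngamma\nalpha\n' | perl -ne '$seen{$_}++ or print'
# beta
# alpha
# gamma

# The equivalent awk idiom: the pattern !seen[$0]++ is true only when
# the line has not been counted before, and awk's default action for a
# true pattern is to print the record.
printf 'beta\nalpha\nbeta\ngamma\nalpha\n' | awk '!seen[$0]++'
# beta
# alpha
# gamma
```

Both versions stream: each line is emitted (or dropped) as soon as it is read, with memory proportional to the number of distinct lines, which matches the "strip duplicates as the data becomes available" use case.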