On Fri, Jan 22, 2021 at 10:37 AM Tony Fischetti <tony.fische...@gmail.com> wrote:
>
> For a while, I've been using a small program I wrote (with help from a
> GPL AVL-library) to filter unsorted duplicate lines. I thought I might
> see if this can be added to `uniq` (or some other way) but I saw that
> a nearly identical proposal
> (https://lists.gnu.org/archive/html/coreutils/2011-11/msg00016.html)
> was already put forth and rejected.
>
> I thought it might be worth it to make the case again, with an expanded
> rationale, and especially as I already have a proof of concept (available
> below) and I'm willing to write the code, documentation, translation,
> etc...
>
> It was said in the replies to the original proposal that it's up to
> the user to decide whether they want to run `sort` and then pipe it
> to `uniq`. But in all the years I've used coreutils, I've never once
> used `uniq` without `sort`. I've spoken to many others, and their
> experience comports with mine.
>
> But this was not because I wanted the output to be sorted; in fact,
> I specifically didn't. Most times, I want (and even require that) the
> duplicated lines be stripped as soon as the data becomes available,
> and remain in the original order. This is especially useful for log
> files, journals, output from statistical software, etc...
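To make the requirement above concrete: given a small sample input (invented for illustration), `sort | uniq` does remove the duplicates, but it also destroys the original line order, which is exactly what the proposal wants to preserve:

```shell
# Sample input with out-of-order duplicates (invented for illustration):
input='beta
alpha
beta
gamma
alpha'

# sort | uniq removes duplicates but reorders the lines:
printf '%s\n' "$input" | sort | uniq
# alpha
# beta
# gamma

# The desired behavior instead: drop repeats as they arrive and keep
# first-occurrence order, i.e.:
# beta
# alpha
# gamma
```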
I'm sure it's been mentioned in previous threads, but I didn't see an
alternate-language implementation listed in the 2011 coreutils thread
you linked above, so here's one:

    perl -ne '$seen{$_}++ or print'

You can also wrap a function around that, e.g.:

    uniq_stream() { perl -ne '$seen{$_}++ or print' "$@"; }
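For completeness, here is that one-liner run against a small sample input (invented for illustration), together with the well-known awk equivalent, which uses the same first-seen-wins logic:

```shell
# The perl one-liner prints a line only the first time it is seen:
# $seen{$_}++ is 0 (false) on the first occurrence, so print runs;
# on later occurrences it is nonzero, so the line is skipped.
printf 'beta\nalpha\nbeta\ngamma\nalpha\n' | perl -ne '$seen{$_}++ or print'
# beta
# alpha
# gamma

# The equivalent awk idiom: the pattern !seen[$0]++ is true only when
# the line has not been counted before, and awk's default action for a
# true pattern is to print the record.
printf 'beta\nalpha\nbeta\ngamma\nalpha\n' | awk '!seen[$0]++'
# beta
# alpha
# gamma
```

Both versions stream: each line is emitted (or dropped) as soon as it is read, with memory proportional to the number of distinct lines, which matches the "strip duplicates as the data becomes available" use case.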