David Crisp <david.cr...@gmail.com> writes:

> name  | value
> ==========
> ItemOne : 10
> ItemOne : 10
> ItemOne : 10
> ItemOne : 10
> ItemTwo : 20
> ItemTwo : 20
> ItemTwo : 20
> ItemTwo : 20
> ItemThree : 30
> ItemThree : 30
> ItemThree : 30
> ItemThree : 30

If you're confident there will frequently be duplicated lines, and you
want to ignore the duplicates, I'd recommend (on Unix) filtering the
list to remove them::

    $ cat items | sort | uniq > items_dedup

Then you can read the ‘items_dedup’ file in your Python program.
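
For example (a minimal sketch; the ‘Name : value’ parsing and the dict of
values are my assumptions about what you want to do with each line)::

    values = {}
    with open('items_dedup') as infile:
        for line in infile:
            line = line.strip()
            # Skip blank lines and the header/divider lines from the sample.
            if not line or line.startswith('name') or line.startswith('='):
                continue
            name, _, value = line.partition(':')
            values[name.strip()] = int(value)

    # e.g. {'ItemOne': 10, 'ItemTwo': 20, 'ItemThree': 30}
    print(values)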

You can even write your Python program as a filter (read the input lines
from ‘sys.stdin’, write the result to ‘sys.stdout’) and just hook it
into that command pipeline. If the program you're writing is named
‘do_more_processing’::

    $ cat items | sort | uniq | do_more_processing > outputfile
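
Here, ‘do_more_processing’ need only read standard input and write standard
output; this sketch just passes each line through unchanged (the placeholder
body stands in for whatever processing you actually need)::

    #! /usr/bin/env python3

    import sys

    def main():
        for line in sys.stdin:
            # Placeholder: write each input line back out unchanged.
            # Replace this with the real per-line processing.
            sys.stdout.write(line)

    if __name__ == '__main__':
        main()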

-- 
 \         “Science is a way of trying not to fool yourself. The first |
  `\     principle is that you must not fool yourself, and you are the |
_o__)               easiest person to fool.” —Richard P. Feynman, 1964 |
Ben Finney
