2011-05-07 07:28, Robert Bonomi wrote:
 From listrea...@lazlarlyricon.com  Fri May  6 20:14:09 2011
Date: Sat, 07 May 2011 03:13:39 +0200
From: Rolf Nielsen<listrea...@lazlarlyricon.com>
To: Robert Bonomi<bon...@mail.r-bonomi.com>
CC: freebsd-questions@freebsd.org
Subject: Re: Comparing two lists

2011-05-07 02:54, Robert Bonomi wrote:
   From owner-freebsd-questi...@freebsd.org  Fri May  6 19:27:54 2011
Date: Sat, 07 May 2011 02:09:26 +0200
From: Rolf Nielsen<listrea...@lazlarlyricon.com>
To: FreeBSD<freebsd-questions@freebsd.org>
Subject: Comparing two lists

Hello all,

I have two text files, quite extensive ones. They have some lines in
common and some lines are unique to one of the files. The lines that do
exist in both files are not necessarily in the same location. Now I need
to compare the files and output a list of lines that exist in both
files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
combination of two or more of them?


If the files have only 'minor' differences (i.e. no long runs of lines
that are in only one file) *and* the common lines are in the same order
in each file, you can use diff(1) without any other shenanigans.

If the above is -not- true, and if you need _only_ the common lines, AND
order is not important, then sort(1) both files and run diff(1) on the
two sorted versions; the lines diff does not report are the common ones.
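For the sorted case, comm(1) (a POSIX tool built for exactly this comparison) prints the common lines directly instead of leaving them to be inferred from diff output. This is a swapped-in technique, not the one suggested above. The sketch below assumes POSIX sort(1), comm(1) and mktemp(1); the function name common_sorted is hypothetical. LC_ALL=C is set so that sort and comm agree on collation, since comm expects its inputs sorted in the current locale's order.

```shell
# Hypothetical helper: print lines present in both files, each once,
# in sorted (byte-wise) order. Assumes POSIX sort, comm and mktemp.
common_sorted() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # Sort and deduplicate each file with the same collation comm will use.
    LC_ALL=C sort -u "$1" > "$tmp1"
    LC_ALL=C sort -u "$2" > "$tmp2"
    # -1 hides lines only in the first file, -2 hides lines only in the
    # second; what is left is the column of lines common to both.
    LC_ALL=C comm -12 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Usage would be something like `common_sorted file1 file2 > common.txt`.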


Beyond that, it depends on what you mean by 'extensive'. Megabytes?
Gigabytes? Or what?




Some 10,000 to 20,000 lines each. I do need only the common lines. Order
is not essential, but would make life easier. I've tried a little with
uniq, as suggested by Polyptron, but I guess 3am is not quite the right
time to do these things. Anyway, thanks.

OK, 20k lines is only a medium-size file. There's no problem fitting
the entire file 'in memory'. ('Big' files are ones that are larger than
available memory.) :)

By "quite extensive" I was referring to the number of lines rather than the byte size, and 20k lines is, by my standards, quite a lot for a plain-text file. :P
But that's beside the point. :)


Using uniq:
    sort {{file1}} {{file2}} | uniq -d
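One caveat with `sort file1 file2 | uniq -d`: a line that is repeated inside just one of the files is also reported, even if it never appears in the other file. A minimal sketch of a variant, assuming POSIX sort(1) and uniq(1), deduplicates each file first so that a line can only occur twice in the merged stream if it is in both files. The function name common_uniq is hypothetical.

```shell
# Hypothetical helper: like "sort file1 file2 | uniq -d", but each file is
# deduplicated first (sort -u), so a line repeated within a single file
# is no longer mistaken for a common line.
common_uniq() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```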

Yes, I found that solution at
http://www.catonmat.net/blog/set-operations-in-unix-shell,
which is mainly about comm but also lists other ways of doing things. I also found
    grep -xF -f file1 file2
there, and I've tested that one too. Both seem to do what I want.
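In `grep -xF -f file1 file2`, -f reads the patterns from file1 (one per line), -F treats them as fixed strings rather than regular expressions, and -x requires a pattern to match an entire line. The output is every line of file2 that also occurs in file1, in file2's order (repeated lines in file2 are printed each time). The sketch below, with the hypothetical name common_grep, wraps the same command; the sample in the test shows why -F matters, since as a regular expression "a.c" would also match "abc".

```shell
# Hypothetical wrapper around the grep approach from the thread.
# -f "$1": take patterns from the first file, one per line
# -F:      patterns are fixed strings, so "." is not a wildcard
# -x:      a pattern must match the whole line, not a substring
common_grep() {
    grep -xF -f "$1" "$2"
}
```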


To maintain order, put the following in a file; call it 'common.awk':

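      # First file (NR==FNR): remember each line, then go to the next one.
      NR==FNR   { array[$0]=1; next; }
      # Second file: print the line if it occurred in the first file.
                { if ($0 in array) print $0; }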

then use the command:

   awk -f common.awk {{file1}} {{file2}}

This will output common lines, in the order they occur in _file2_.
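The awk script prints a common line every time it appears in file2, so duplicates in file2 come through repeatedly. A minimal inline variant, assuming a POSIX awk and using the hypothetical function name common_awk, prints each common line only once, at its first occurrence in the second file. If the first file is empty, NR==FNR stays true while reading the second file, so nothing is printed, which is the correct (empty) result here.

```shell
# Hypothetical helper: same idea as common.awk, written inline.
# While reading the first file (NR == FNR), record each line in "seen".
# For the second file, print a line if it was seen and has not been
# printed yet, keeping the order of the second file.
common_awk() {
    awk 'NR == FNR { seen[$0]; next }
         ($0 in seen) && !printed[$0]++' "$1" "$2"
}
```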



I took the liberty of sending a copy of this to the list although you replied privately.
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"