On Sat, Oct 01, 2011 at 03:07:09PM -0700, archaeal wrote:
> Hello all,
> I would like to identify or eliminate pairs of "words" from different
> lines.
>
> An example (all words are separated by a tab):
>
> 53_G16I9RF01EUP2C 53_G16I9RF02JZUJU
> 53_G16I9RF02JZUJU 53_G16I9RF01EUP2C
> 53_G16I9RF02JZV1E 33_G0JCAX402GV9YC
> 53_G16I9RF02JZV1E 33_G16I9RF02FOVF0
> or:
> A B
> B A
> C D
> E F
>
> Lines one and two contain the same words but in inverted order. I
> would like to eliminate one of these "duplicates". I thought it could
> work with Process Duplicate Lines with: [a-z0-9_]{17}\t[a-z0-9_]{17}
> but this didn't work.
> I would be glad if someone could help me out with this. Perhaps there
> is a more simple way to do this

Process Duplicate Lines allows you to specify parts of the lines to
compare using a pattern, but it doesn't reorder the parts when finding
duplicates. For example, if your pattern is "(A) . (B)", and you have
"All sub-patterns" checked, these lines are duplicates:

A x B
A y B

but these lines are not:

A x B
B y A

because AB is not the same as BA.

To find the duplicates, you should first convert each line to a
canonical form. In this case, you could sort the words on each line.
Here's a Perl script that does it, which you can use in BBEdit as a
Unix Filter:

#!perl -ln
print unless $seen{join "\t", sort split /\t/}++;
__END__

It splits each input line on tabs, sorts the words, joins them back
together with tabs, and increments the tally. The first time a
particular set of words is seen, the original line is printed.

Or, you could print the canonical form of the line instead:

#!perl -ln
$canonical = join "\t", sort split /\t/;
print $canonical unless $seen{$canonical}++;
__END__

Ronald