On Sat, Oct 01, 2011 at 03:07:09PM -0700, archaeal wrote:
> Hello all,
> I would like to identify or eliminate pairs of "words" from different
> lines.
> 
> An example (all words are separated by a tab):
> 
> 53_G16I9RF01EUP2C     53_G16I9RF02JZUJU
> 53_G16I9RF02JZUJU     53_G16I9RF01EUP2C
> 53_G16I9RF02JZV1E     33_G0JCAX402GV9YC
> 53_G16I9RF02JZV1E     33_G16I9RF02FOVF0
> or:
> A B
> B A
> C D
> E F
> 
> Lines one and two contain the same words but in inverted order. I
> would like to eliminate one of these "duplicates". I thought it could
> work with process duplicate lines with: [a-z0-9_]{17}\t[a-z0-9_]{17}
> but this didn't work.

> I would be glad if someone could help me out with this. Perhaps there
> is a simpler way to do this.

Process Duplicate Lines allows you to specify parts of the lines to compare
using a pattern, but it doesn't reorder the parts when finding duplicates.

For example, if your pattern is "(A) . (B)", and you have "All
sub-patterns" checked, these lines are duplicates:
  A x B
  A y B
but these lines are not:
  A x B
  B y A
because AB is not the same as BA.


To find the duplicates, you should first convert each line to a canonical
form.  In this case, you could sort the words on each line.
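
As a rough illustration of what "canonical form" means here, the sketch
below reduces both orderings of a pair to the same string.  The helper name
canonical() is made up for this example, and it assumes the words are
separated by single tabs, as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts in this message):
# returns the canonical form of a line by sorting its tab-separated words.
sub canonical {
    my ($line) = @_;
    return join "\t", sort split /\t/, $line;
}

# Both orderings of the same pair reduce to the same string,
# so a plain string comparison can now detect the duplicate.
print canonical("B\tA"), "\n";
print canonical("A\tB"), "\n";
```

Once every line is reduced this way, "A B" and "B A" compare equal.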


Here's a Perl script that does it, which you can use in BBEdit as a Unix
Filter:

#!perl -ln

print unless $seen{join "\t", sort split /\t/}++;

__END__

It splits each input line on tabs, sorts the words, and joins them back
together with tabs to form a key, then increments that key's tally in
%seen.  The original line is printed only the first time a particular set
of words is seen.
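
For anyone who wants to try the idea outside BBEdit, here is a sketch of
the same logic written out without the -l and -n switches (-n wraps the
code in a loop over the input lines; -l strips the line ending and adds it
back on print).  The subroutine name keep_first_of_each_pair() and the
sample data are only for illustration; the sample assumes the short
"A B" example from the question is tab-separated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative helper: returns the lines whose set of words has not
# been seen before, in their original order and original form.
sub keep_first_of_each_pair {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        # Sort the words so that "A\tB" and "B\tA" share one key.
        my $key = join "\t", sort split /\t/, $line;
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Sample input modelled on the short example in the question.
my @sample = ("A\tB", "B\tA", "C\tD", "E\tF");
print "$_\n" for keep_first_of_each_pair(@sample);
```

With that sample, the second line (B, A) is dropped and the other three
lines are printed unchanged.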


Or, you could print the canonical form of the line instead:

#!perl -ln

$canonical = join "\t", sort split /\t/;
print $canonical unless $seen{$canonical}++;

__END__


Ronald
