Re: Position Weight Matrix of Set of Strings with Perl

2006-09-07 Thread Rob Dixon

Wijaya Edward wrote:
>
> From: Rob Dixon
>>
>> use strict;
>> use warnings;
>>
>> my %pwm;
>>
>> while () {
>>my $col = 0;
>>foreach my $c (/\S/g) {
>>  $pwm{$c}[$col++]++;
>>}
>> }
>>
>> foreach my $freq (values %pwm) {
>>$_ = $_ ? $_ / keys %pwm : 0 foreach @$freq;
>> }
>>
>> use Data::Dumper;
>> print Dumper \%pwm;
>>
>>
>> __END__
>> AAA
>> ATG
>> TTT
>> GTC
>
> I was trying your script with this set of strings:
>
> __DATA__
> CAGGTG
> CAGGTG
>
> But how come it returns:
>
> $VAR1 = {
> 'A' => [ 0, '0.5' ],
> 'T' => [ 0, 0, 0, 0, '0.5' ],
> 'C' => [ '0.5' ],
> 'G' => [ 0, 0, '0.5', '0.5', 0, '0.5' ]
> };
>
> Instead of the correct:
>
> $VAR1 = {
> 'A' => [ '0', '1', '0', '0', '0', '0' ],
> 'T' => [ '0', '0', '0', '0', '1', '0' ],
> 'C' => [ '1', '0', '0', '0', '0', '0' ],
> 'G' => [ '0', '0', '1', '1', '0', '1' ]
> };

Hi Edward

(Please bottom-post your replies)

Yes, I'm sorry, Ruud pointed out that I was dividing by the number of symbols
instead of the number of rows to get the probabilities. You have also found that
each symbol was given no probability value after the last column where it
occurred. This revision fixes both of these problems. I've also borrowed Ruud's
neater construct to normalise the hash values in the final loop. Thanks Ruud!

use strict;
use warnings;

my %pwm;
my ($rows, $cols) = (0, 0);

while () {
  my $col = 0;
  $rows++ if /\S/;
  foreach my $c (/\S/g) {
$pwm{$c}[$col++]++;
$cols = $col if $col > $cols;
  }
}

foreach my $freq (values %pwm) {
  ($_ |= 0) /= $rows foreach @$freq[0 .. $cols - 1];
}

use Data::Dumper;
print Dumper \%pwm;


__DATA__
CAGGTG
CAGGTG

*OUTPUT*

$VAR1 = {
  'A' => [
   '0',
   '1',
   '0',
   '0',
   '0',
   '0'
 ],
  'T' => [
   '0',
   '0',
   '0',
   '0',
   '1',
   '0'
 ],
  'C' => [
   '1',
   '0',
   '0',
   '0',
   '0',
   '0'
 ],
  'G' => [
   '0',
   '0',
   '1',
   '1',
   '0',
   '1'
 ]
};

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Mumia W.

On 09/06/2006 05:41 AM, Mumia W. wrote:

On 09/06/2006 04:02 AM, Wijaya Edward wrote:

Dear Experts,
 
I am looking for a really efficient way to compute a position weight 
matrix (PWM) [...]


Although I'm sure that smarter posters than I will [...]


do it right.

Ugh, I forgot about Wijaya's requirement that the PWM be 
calculated in probabilities, and I also forgot that the 
lengths of the base-pair strings can be different. Here is my 
updated code:


use strict;
use warnings;
use Data::Dumper;
local our @deep;
local $" = ', ';

my $length = 5;
my @data = qw(AAA ATG TTT GTC);
@data = map [ split // ], @data;

my (%hash);
for my $entry (@data) {
*deep = $entry;
for my $nx (0..$#deep) {
$hash{$deep[$nx]}[$nx]++;
}
}

my $count = keys %hash;
while (my ($key, $values) = each %hash) {
$#{$values} = $length;
@$values = map defined $_ ? $_ / ($count) : 0, @$values;
@$values = map sprintf('%4.2f',$_), @$values;
print "$key => [ @{$hash{$key}} ]\n";
}

__END__

Output:
A => [ 0.50, 0.25, 0.25, 0.00, 0.00, 0.00 ]
T => [ 0.25, 0.75, 0.25, 0.00, 0.00, 0.00 ]
C => [ 0.00, 0.00, 0.25, 0.00, 0.00, 0.00 ]
G => [ 0.25, 0.00, 0.25, 0.00, 0.00, 0.00 ]
End output.

Note: I use $length to set the maximal array index value.


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




RE: Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Wijaya Edward

Dear Rob,
 
I was trying your script with this set of strings:
 
__DATA__
CAGGTG
CAGGTG
 
But how come it returns:
 
$VAR1 = {
'A' => [ 0, '0.5' ], 
'T' => [ 0, 0, 0, 0, '0.5' ], 
'C' => [ '0.5' ],
'G' => [ 0, 0, '0.5', '0.5', 0, '0.5' ]
};

Instead of the correct:

$VAR1 = {
'A' => [ '0', '1', '0', '0', '0', '0' ],
'T' => [ '0', '0', '0', '0', '1', '0' ],
'C' => [ '1', '0', '0', '0', '0', '0' ],
'G' => [ '0', '0', '1', '1', '0', '1' ] 
};


Hope to hear from you again.


Regards,
Edward WIJAYA
SINGAPORE

 


 



From: Rob Dixon [mailto:[EMAIL PROTECTED]
Sent: Wed 9/6/2006 8:14 PM
To: beginners@perl.org
Subject: Re: Position Weight Matrix of Set of Strings with Perl



use strict;
use warnings;

my %pwm;

while () {
   my $col = 0;
   foreach my $c (/\S/g) {
 $pwm{$c}[$col++]++;
   }
}

foreach my $freq (values %pwm) {
   $_ = $_ ? $_ / keys %pwm : 0 foreach @$freq;
}

use Data::Dumper;
print Dumper \%pwm;


__END__
AAA
ATG
TTT
GTC


OUTPUT


$VAR1 = {
   'A' => [
'0.5',
'0.25',
'0.25'
  ],
   'T' => [
'0.25',
'0.75',
'0.25'
  ],
   'C' => [
0,
0,
'0.25'
  ],
   'G' => [
'0.25',
0,
'0.25'
  ]
 };

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>





 Institute For Infocomm Research - Disclaimer -
This email is confidential and may be privileged.  If you are not the intended 
recipient, please delete it and notify us immediately. Please do not copy or 
use it for any purpose, or disclose its contents to any other person. Thank you.


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>




Re: Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Dr.Ruud
Rob Dixon schreef:

> use strict;
> use warnings;
>
> my %pwm;
>
> while () {
>my $col = 0;
>foreach my $c (/\S/g) {
>  $pwm{$c}[$col++]++;
>}
> }
>
> foreach my $freq (values %pwm) {
>$_ = $_ ? $_ / keys %pwm : 0 foreach @$freq;
> }
>
> use Data::Dumper;
> print Dumper \%pwm;
>
>
> __END__
> AAA
> ATG
> TTT
> GTC


Is "keys %pwm" the right divisor, or is the number of lines?


Variant:

#!/usr/bin/perl
  use warnings ;
  use strict ;

  my ($c, $r, %pwm) ;
  /\s/ ? ($r++, $c=0) : $pwm{$_}[$c++]++
  for do {local $/;  =~ /(.)/sg} ;
  for (values %pwm) { ($_||=0)/=$r for @$_ } ;

  use Data::Dumper ;
  print Dumper \%pwm ;

__DATA__
AAA
ATG
TTT
GTC

:)

-- 
Affijn, Ruud

"Gewoon is een tijger."



-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Rob Dixon

Wijaya Edward wrote:

> Dear Experts,
>
> I am looking for a really efficient way to compute a position weight matrix
> (PWM) from a set of strings. In each set the strings are of the same length.
> Basically PWM compute the frequency (or probabilities) of bases [ATCG] occur
> in each position/column of a string. For example the set of strings below:
>
> AAA
> ATG
> TTT
> GTC
>
> Note that the length of these strings in the set
> maybe greater than 3.
>
> Would give the following result:
>
> $VAR1 =  {
> 'A' => [2,1,1],
> 'T' => [1,3,1],
> 'C' => [0,0,1],
> 'G' => [1,0,1]
>  };
>
> So the size of the array is the same with the length of the string.
> In my case I need the variation of it, namely the probability of the
> each base occur in the particular position:
>
> $VAR = {
> 'A' => ['0.5','0.25','0.25'],
> 'T' => ['0.25','0.75','0.25'],
> 'C' => ['0','0','0.25'],
> 'G' => ['0.25','0','0.25']
>   }
>
> In this link you can  find my incredibly naive and inefficient code.
> Can any body suggest a better and faster solution than this:
>
> http://www.rafb.net/paste/results/c6T7B629.html

Hi Edward.

A nice little problem. Thank you.

The main reason for the length of your own solution is that you haven't taken
the opportunity to use hashes to store data that is parallel across the four
possible characters, so the code is about four times as long as it needs to be!

Here is my solution. I have written it to pull data from the pseudo-filehandle
DATA, as it is unlikely that you will want your actual data hard-coded as an
array.

HTH.

Rob Dixon


use strict;
use warnings;

my %pwm;

while () {
  my $col = 0;
  foreach my $c (/\S/g) {
$pwm{$c}[$col++]++;
  }
}

foreach my $freq (values %pwm) {
  $_ = $_ ? $_ / keys %pwm : 0 foreach @$freq;
}

use Data::Dumper;
print Dumper \%pwm;


__END__
AAA
ATG
TTT
GTC


OUTPUT


$VAR1 = {
  'A' => [
   '0.5',
   '0.25',
   '0.25'
 ],
  'T' => [
   '0.25',
   '0.75',
   '0.25'
 ],
  'C' => [
   0,
   0,
   '0.25'
 ],
  'G' => [
   '0.25',
   0,
   '0.25'
 ]
};

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Mumia W.

On 09/06/2006 04:02 AM, Wijaya Edward wrote:

Dear Experts,
 
I am looking for a really efficient way to compute a position weight matrix (PWM) from a set of strings. In each set the strings are of the same length. Basically PWM compute the frequency (or probabilities) of bases [ATCG] occur in each position/column of a string. For example the set of strings below:
 
AAA

ATG
TTT
GTC

Note that the length of these strings in the set 
maybe greater than 3. 

Would give the following result: 
 
$VAR1 =  {

'A' => [2,1,1],
'T' => [1,3,1],
'C' => [0,0,1],
'G' => [1,0,1]
 };
 
So the size of the array is the same with the length of the string.
In my case I need the variation of it, namely the probability of the 
each base occur in the particular position:


$VAR = {
'A' => ['0.5','0.25','0.25'],
'T' => ['0.25','0.75','0.25'],
'C' => ['0','0','0.25'],
'G' => ['0.25','0','0.25']
  }
 
In this link you can  find my incredibly naive and inefficient code. 
Can any body suggest a better and faster solution than this:
 
http://www.rafb.net/paste/results/c6T7B629.html
 
 
Thanks and Regards,

Edward WIJAYA
SINGAPORE



Although I'm sure that smarter posters than I will turn this 
into a one-liner, I think that my solution is not so atrocious:


use strict;
use warnings;
use Data::Dumper;
local our @deep;
local $; = ','; # A vestige of a previous version

my @data = qw(AAA ATG TTT GTC);
my @d2 = map [ split // ], @data;

my (%hash);
for my $entry (@d2) {
*deep = $entry;
for my $nx (0..$#deep) {
$hash{$deep[$nx]}[$nx]++;
}
}
foreach my $entry (values %hash) {
$entry = [ map defined $_ ? $_ : 0, @$entry ];
}
print Dumper(\%hash);

__HTH__


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Position Weight Matrix of Set of Strings with Perl

2006-09-06 Thread Wijaya Edward

Dear Experts,
 
I am looking for a really efficient way to compute a position weight matrix 
(PWM) from a set of strings. In each set the strings are of the same length. 
Basically PWM compute the frequency (or probabilities) of bases [ATCG] occur in 
each position/column of a string. For example the set of strings below:
 
AAA
ATG
TTT
GTC

Note that the length of these strings in the set 
maybe greater than 3. 

Would give the following result: 
 
$VAR1 =  {
'A' => [2,1,1],
'T' => [1,3,1],
'C' => [0,0,1],
'G' => [1,0,1]
 };
 
So the size of the array is the same with the length of the string.
In my case I need the variation of it, namely the probability of the 
each base occur in the particular position:

$VAR = {
'A' => ['0.5','0.25','0.25'],
'T' => ['0.25','0.75','0.25'],
'C' => ['0','0','0.25'],
'G' => ['0.25','0','0.25']
  }
 
In this link you can  find my incredibly naive and inefficient code. 
Can any body suggest a better and faster solution than this:
 
http://www.rafb.net/paste/results/c6T7B629.html
 
 
Thanks and Regards,
Edward WIJAYA
SINGAPORE

 Institute For Infocomm Research - Disclaimer -
This email is confidential and may be privileged.  If you are not the intended 
recipient, please delete it and notify us immediately. Please do not copy or 
use it for any purpose, or disclose its contents to any other person. Thank you.


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]