parsing script help please

2012-05-31 Thread nathalie



Hi
I have a file in this format (see the attached example):
1   3206102-3207048 3411782-3411981 3660632-3661428
2   4481796-4482748 4483180-4483486


and I would like to change it to this:
1   3206102-3207048
1   3411782-3411981
1   3660632-3661428
2   4481796-4482748
2   4483180-4483486


I have tried this script, which creates an array for each line and prints
the first element (1 or 2) with each of the other fields, but the
output doesn't seem to be right. Could you please advise?

#!/software/bin/perl
use warnings;
use strict;
my $file="example.txt";
my $in;
open(  $in , '<' , $file ) or die( $! );
#open(  $out, txtout);


while (<$in>){
next if /^#/;
my @lines=split(/\t/);
chomp;
for (@lines) { print $lines[0],"\t",$_,"\n"; };
}


output:
1   1  (I don't want this)
1   3206102-3207048
1   3411782-3411981
1   3660632-3661428
1   (I don't want this)
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

1   1
1   4334680-4340171
1   4341990-4342161
1   4342282-4342905
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

1   1
1   4481796-4482748
1   4483180-4483486
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

1   1
1   4797994-4798062
1   4798535-4798566
1   4818664-4818729
1   4820348-4820395
1   4822391-4822461
1   4827081-4827154
1   4829467-4829568
1   4831036-4831212
1   4835043-4835096

many thanks
Nathalie




--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
Attached example:

1   3206102-3207048 3411782-3411981 3660632-3661428
1   4334680-4340171 4341990-4342161 4342282-4342905
1   4481796-4482748 4483180-4483486
1   4797994-4798062 4798535-4798566 4818664-4818729 4820348-4820395 4822391-4822461 4827081-4827154 4829467-4829568 4831036-4831212 4835043-4835096
1   4900554-4900742 4902394-4902628 4906977-4907060 4913927-4914069 5009391-5009459
1   5074531-5074643 5079089-5079191 5083443-5083532 5085695-5085808 5088109-5088213 5091150-5091203 5107470-5107567 5114357-5114549 5123164-5123342 5125892-5126017 5133830-5133931 5140028-5140141 5152185-5152245
1   5579129-5579385 5588670-5589022 5592332-5592864
1   6758978-6759032 6761024-6761054 6785466-6785656 6792400-6793185 6797623-6797757 6799357-6799447 6800654-6800758 6807683-6807876 6810014-6810225 6817856-6818101 6823643-6823758 6832219-6832281 6832953-6833021 6834201-6834263 6834371-6834451 6835651-6835872 6838987-6839075 6845449-6845552 6847380-6847523 6848540-6848680
1   9535604-9536701
1   9538173-9538231 9539987-9540024 9541797-9541837 9543003-9543056 9543398-9543552 9543703-9543899 9545618-9545695 9547080-9547185 9548213-9548365 9550189-9550287 9553589-9553667 9553877-9553973 9556861-9557018 9566271-9566354


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/


Re: parsing script help please

2012-05-31 Thread Rob Coops
On Thu, May 31, 2012 at 11:37 AM, nathalie n...@sanger.ac.uk wrote:





Hi Nathalie,

Instead of using the split function, I would personally go for a regular
expression, as it gives you a lot more control over what you want to find.
Here is my solution...

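#!/usr/local/bin/perl

use strict;
use warnings;

my $fh;

my %results;

open ( $fh, '<', 'temp.txt' ) or die $!;
while ( <$fh> ) {
 chomp;
 my $line = $_;
 # capture the whole leading number (substr($line, 0, 1) would
 # turn a row number like 12 into 1)
 my ($rownum) = $line =~ /^(\d+)/;
 next unless defined $rownum;    # skip blank lines

 my @othernumbers;
 while ( /(\d{7}-\d{7})/g ) {
  push ( @othernumbers, $1 );
 }

 # append rather than assign, so lines sharing a row number
 # (as in the attached file) don't overwrite each other
 push @{ $results{$rownum} }, @othernumbers;
}
close $fh;

use Data::Dumper;
print Dumper %results;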

This should print the results below:

$VAR1 = '1';
$VAR2 = [
  '3206102-3207048',
  '3411782-3411981',
  '3660632-3661428'
];
$VAR3 = '2';
$VAR4 = [
  '4481796-4482748',
  '4483180-4483486'
];

And this is, I believe, where you wanted to go. Of course you could just
print the values directly without the temporary variables, but I assume
that you want to do something more with the found values than just dump
them on your screen.

Regards,

Rob
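For the "print it directly" option Rob mentions, a minimal sketch might look like the following. It is not from the thread: it reads an in-memory sample (the two example lines, tab-separated) instead of `temp.txt` so it runs on its own, widens the pattern to `\d+-\d+` on the assumption that coordinates may not always have seven digits, and collects the lines in a hypothetical `@out` array only so the result is easy to check; printing inside the loop works just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input in the thread's layout; in practice open 'example.txt' instead.
my $sample = "1\t3206102-3207048\t3411782-3411981\t3660632-3661428\n"
           . "2\t4481796-4482748\t4483180-4483486\n";

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";

my @out;
while ( my $line = <$in> ) {
    chomp $line;
    next if $line =~ /^#/;                       # skip comment lines
    my ($rownum) = $line =~ /^(\d+)/ or next;    # skip lines with no row number

    # one "rownum<TAB>start-end" line per range, however many digits
    while ( $line =~ /(\d+-\d+)/g ) {
        push @out, "$rownum\t$1\n";
    }
}
close $in;

print @out;
```

Because the row number is captured per line rather than used as a hash key, repeated row numbers (every line in the attached file starts with `1`) are handled without any extra work.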


Re: parsing script help please

2012-05-31 Thread John W. Krahn

nathalie wrote:



Hello,



You want something like this:

#!/software/bin/perl
use warnings;
use strict;

my $file = "example.txt";

open my $in, '<', $file or die "Cannot open '$file' because: $!";

while ( <$in> ) {
    next if /^#/;
    chomp;
    my ( $key, @fields ) = split /\t/;
    print map "$key\t$_\n", @fields;
}

__END__



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein
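Comparing John's version with the original loop, two changes account for the unwanted output: `my ( $key, @fields ) = split ...` takes the key out of the list being printed (which removes the `1   1` line), and `chomp` now runs before `split` rather than after it. A small sketch of the second point follows. It assumes, since the thread does not show the raw file, that each line ends with extra tabs; that would fit the runs of bare `1` lines and the blank line after each record in the original output. The sample line is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line: a key, one range, then trailing tabs before the newline
my $raw = "1\t3206102-3207048\t\t\t\n";

# Splitting BEFORE chomp (as in the original loop): the last field is "\n",
# which is not empty, so the empty fields in front of it are kept too.
my @before = split /\t/, $raw;

# Splitting AFTER chomp (as in John's version): split discards trailing
# empty fields by default, so only the key and the range remain.
my $clean = $raw;
chomp $clean;
my @after = split /\t/, $clean;

printf "before chomp: %d fields\n", scalar @before;
printf "after chomp:  %d fields\n", scalar @after;
```

Run as-is, it reports 5 fields before `chomp` and 2 after. Printing the key with each of those 5 fields reproduces the pattern in the original output: the key paired with itself, the range, two bare key lines, and a key followed by an extra newline.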





Re: parsing script help please

2012-05-31 Thread nathalie

Thanks a lot, Rob.
I wanted output without the $VAR... lines, so I added this after the
loop that does the push, and it is perfect. Thanks a lot again!


foreach my $number (@othernumbers) { print $rownum, "\t", $number, "\n"; }
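If you would rather keep Rob's `%results` hash (for example, to do more with the values later) and still get plain output instead of the `$VAR` lines, the `Dumper` call can be replaced by a loop over the hash. A minimal sketch, with a hard-coded `%results` standing in for the one built from `temp.txt` (values copied from the example) and a hypothetical `@lines` array used only so the result can be checked; the numeric sort is a choice made here because Perl does not keep hash keys in any particular order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the %results hash built by Rob's loop (values from the example)
my %results = (
    1 => [ '3206102-3207048', '3411782-3411981', '3660632-3661428' ],
    2 => [ '4481796-4482748', '4483180-4483486' ],
);

my @lines;
# Hash order is undefined, so sort the row numbers numerically
for my $rownum ( sort { $a <=> $b } keys %results ) {
    # each value is an array reference holding that row's ranges
    for my $range ( @{ $results{$rownum} } ) {
        push @lines, "$rownum\t$range\n";
    }
}

print @lines;
```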






Re: parsing script help please

2012-05-31 Thread nathalie





Hi John,
it works perfectly, thanks!

thanks
Nat

