parsing script help please
Hi

I have this format of file (see attached example):

    1 3206102-3207048 3411782-3411981 3660632-3661428
    2 4481796-4482748 4483180-4483486

and I would like to change it to this:

    1 3206102-3207048
    1 3411782-3411981
    1 3660632-3661428
    2 4481796-4482748
    2 4483180-4483486

I have tried this script, which creates an array for each line and prints the first element (1 or 2) with each of the other elements of the line, but the output doesn't seem to be right. Could you please advise?

    #!/software/bin/perl
    use warnings;
    use strict;

    my $file = "example.txt";
    my $in;
    open( $in, '<', $file ) or die( $! );
    #open( $out, txtout);
    while (<$in>) {
        next if /^#/;
        my @lines = split(/\t/);
        chomp;
        for (@lines) { print $lines[0], "\t", $_, "\n"; };
    }

Output:

    1 1                  (I don't want this)
    1 3206102-3207048
    1 3411782-3411981
    1 3660632-3661428
    1                    (I don't want this)
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 4334680-4340171
    1 4341990-4342161
    1 4342282-4342905
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 4481796-4482748
    1 4483180-4483486
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 4797994-4798062
    1 4798535-4798566
    1 4818664-4818729
    1 4820348-4820395
    1 4822391-4822461
    1 4827081-4827154
    1 4829467-4829568
    1 4831036-4831212
    1 4835043-4835096

many thanks
Nathalie

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Attached example:

    1 3206102-3207048 3411782-3411981 3660632-3661428
    1 4334680-4340171 4341990-4342161 4342282-4342905
    1 4481796-4482748 4483180-4483486
    1 4797994-4798062 4798535-4798566 4818664-4818729 4820348-4820395 4822391-4822461 4827081-4827154 4829467-4829568 4831036-4831212 4835043-4835096
    1 4900554-4900742 4902394-4902628 4906977-4907060 4913927-4914069 5009391-5009459
    1 5074531-5074643 5079089-5079191 5083443-5083532 5085695-5085808 5088109-5088213 5091150-5091203 5107470-5107567 5114357-5114549 5123164-5123342 5125892-5126017 5133830-5133931 5140028-5140141 5152185-5152245
    1 5579129-5579385 5588670-5589022 5592332-5592864
    1 6758978-6759032 6761024-6761054 6785466-6785656 6792400-6793185 6797623-6797757 6799357-6799447 6800654-6800758 6807683-6807876 6810014-6810225 6817856-6818101 6823643-6823758 6832219-6832281 6832953-6833021 6834201-6834263 6834371-6834451 6835651-6835872 6838987-6839075 6845449-6845552 6847380-6847523 6848540-6848680
    1 9535604-9536701
    1 9538173-9538231 9539987-9540024 9541797-9541837 9543003-9543056 9543398-9543552 9543703-9543899 9545618-9545695 9547080-9547185 9548213-9548365 9550189-9550287 9553589-9553667 9553877-9553973 9556861-9557018 9566271-9566354

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: parsing script help please
On Thu, May 31, 2012 at 11:37 AM, nathalie n...@sanger.ac.uk wrote:
> I have tried with this script to create an array for each line, and to print
> the first element (1 or 2) with the rest of the line but the output don't
> seem to be right, could you please advise?

Hi Nathalie,

Instead of using the split function I would personally go for a regular expression, as it allows for a lot more control over what you want to find. Here is my solution...
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $fh;
    my %results;
    open ( $fh, '<', 'temp.txt' ) or die $!;
    while ( <$fh> ) {
      chomp;
      my $line = $_;
      # the row number is taken to be the first character of the line
      my $rownum = substr($line, 0, 1);
      my @othernumbers;
      # collect every 7-digit "start-end" range on the line
      while ( /(\d{7}-\d{7})/g ) {
        push ( @othernumbers, $1 );
      }
      $results{$rownum} = \@othernumbers;
    }
    close $fh;

    use Data::Dumper;
    print Dumper %results;

This should print the results below:

    $VAR1 = '1';
    $VAR2 = [
              '3206102-3207048',
              '3411782-3411981',
              '3660632-3661428'
            ];
    $VAR3 = '2';
    $VAR4 = [
              '4481796-4482748',
              '4483180-4483486'
            ];

And this is, I believe, where you wanted to go. Of course you could just print it directly without the need for the temp variables etc., but I assume that you want to do something more with the found values than just dump them on your screen.

Regards,

Rob
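Rob's remark about printing directly, without the temporary hash, could look roughly like the sketch below. It is not Rob's code: it assumes the key is the leading run of digits on the line, it relaxes `\d{7}` to `\d+` so that ranges of other lengths still match, and the helper name `pairs_from_line` and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one "key<TAB>range" line per range found.
sub pairs_from_line {
    my ($line) = @_;
    # Assumption: the key is the run of digits at the start of the line.
    my ($key) = $line =~ /^(\d+)/ or return;
    # In list context a /g match returns every captured "start-end" range.
    my @ranges = $line =~ /(\d+-\d+)/g;
    return map { "$key\t$_\n" } @ranges;
}

# Sample lines in the format from the original question.
my @sample = (
    "1\t3206102-3207048\t3411782-3411981\t3660632-3661428\n",
    "2\t4481796-4482748\t4483180-4483486\n",
);

print pairs_from_line($_) for @sample;
```

In a real script the loop over `@sample` would be replaced by `while ( my $line = <$fh> )` on the opened file handle.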
Re: parsing script help please
nathalie wrote:
> Hi

Hello,

> I have tried with this script to create an array for each line, and to print
> the first element (1 or 2) with the rest of the line but the output don't
> seem to be right, could you please advise?

You want something like this:

    #!/software/bin/perl
    use warnings;
    use strict;

    my $file = "example.txt";
    open my $in, '<', $file or die "Cannot open '$file' because: $!";
    while ( <$in> ) {
        next if /^#/;
        chomp;
        my ( $key, @fields ) = split /\t/;
        print map "$key\t$_\n", @fields;
    }
    __END__

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction.
                                                        -- Albert Einstein
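The unwanted lines in the original output appear to come from two things: the loop pairs `$lines[0]` with itself, and `split` runs before `chomp`, so the final field still holds the newline and the empty fields in front of it are kept. The sketch below illustrates the chomp-then-split order on in-memory data. It assumes the rows are padded with trailing tabs (inferred from the output, not confirmed in the thread); `expand_row` and the sample rows are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one tab-separated row into "key<TAB>field" lines.
sub expand_row {
    my ($line) = @_;
    chomp $line;                                # strip the newline BEFORE splitting
    my ( $key, @fields ) = split /\t/, $line;   # split drops trailing empty fields
    # grep also discards any empty field in the middle of a row
    return map { "$key\t$_\n" } grep { length } @fields;
}

# Sample rows padded with trailing tabs (an assumption about the real file).
my @rows = (
    "1\t3206102-3207048\t3411782-3411981\t\t\t\n",
    "2\t4481796-4482748\t4483180-4483486\t\t\t\n",
);

print expand_row($_) for @rows;
```

Taking the key off the front with `my ( $key, @fields ) = ...` is what keeps it from being printed against itself.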
Re: parsing script help please
thanks a lot Rob

I would like an output without the $VAR..., so I added this after the push function and it is perfect, thanks a lot again:

    foreach my $number (@othernumbers) { print $rownum, "\t", $number, "\n"; }
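If the values are needed later as well as printed, one option is to accumulate them per key rather than assign a fresh array each time; assigning `$results{$rownum} = \@othernumbers` replaces whatever an earlier row with the same key stored, and every row in the attached example starts with 1. A minimal sketch, with made-up sample lines standing in for the file and `\d+` used instead of `\d{7}` as an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; a real script would read these lines from a file.
my @lines = (
    "1\t3206102-3207048\t3411782-3411981\n",
    "1\t4334680-4340171\n",
    "2\t4481796-4482748\t4483180-4483486\n",
);

my %results;
for my $line (@lines) {
    my ($rownum) = $line =~ /^(\d+)/ or next;
    # push onto the existing list so rows sharing a key are not overwritten
    push @{ $results{$rownum} }, $line =~ /(\d+-\d+)/g;
}

# Print "key<TAB>range" pairs, keys in numeric order.
for my $rownum ( sort { $a <=> $b } keys %results ) {
    print "$rownum\t$_\n" for @{ $results{$rownum} };
}
```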
Re: parsing script help please
> You want something like this:

Hi John,

It works perfectly, thanks.

Nat