Re: Extract text from file

2005-12-07 Thread Shawn Corey

Andrej Kastrin wrote:
I tried your code; now I am trying to write each potential target as one
tab-separated record, like:

TI- xx HD x  AB- xxx HD x   #record 1
TI- yy AB x  AB- xxx AB x   #record 2
etc...

So \t separates fields within a record and \n separates records. I tried

for my $term (@terms) {
   if (/$term/) {
      print "$_\t";

but it's not OK.


You have to do a substitution: replace the newlines inside the record
with tabs, then print the record with a single newline after it.

s/\n/\t/g;
print "$_\n";
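
A minimal sketch of that substitution, assuming each record was read with
$/ set to "\n\n" and therefore still ends in a blank line. The helper name
flatten_record() and the sample record are only for illustration; they are
not from the thread. The trailing newlines are removed first, otherwise
they would become trailing tabs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one multi-line record into a single tab-separated line.
# flatten_record() is a hypothetical helper name used for this sketch.
sub flatten_record {
    my ($record) = @_;
    $record =~ s/\n+\z//;    # drop the newline(s) that end the record
    $record =~ s/\n/\t/g;    # the remaining newlines separate the fields
    return $record;
}

# Sample record in the format shown earlier in the thread.
my $record = "ID-  001\n"
           . "TI-   analysis of HD patients.\n"
           . "AB- The present article deals with HD patients.\n\n";

print flatten_record($record), "\n";
```

Each record then comes out as one line, with \t between the fields and a
single \n at the end.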


--

Just my 0.0002 million dollars worth,
   --- Shawn

"Probability is now one. Any problems that are left are your own."
   SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Extract text from file

2005-12-07 Thread Shawn Corey

Andrej Kastrin wrote:
And an additional question: how does Perl know where the input files are
(since we only wrote: open (TERM, $file_terms) or die "Can't open...)?


It comes from the command line. The statements:

  my $file_terms = shift;
  my $file_medline = shift;

are a shortened version of:

  my $file_terms = shift @ARGV;
  my $file_medline = shift @ARGV;

When a Perl script starts, the command line arguments are placed in the 
special array @ARGV. See `perldoc -f shift` and also see `perldoc 
perlvar` and search for ARGV.
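
A small sketch of this, which fills @ARGV by hand so that it runs without
any command line arguments. The two file names are made up for the
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulate the command line "script.pl terms.txt medline.txt".
# Both file names are invented for this sketch.
@ARGV = ('terms.txt', 'medline.txt');

my $file_terms   = shift;          # outside a sub, shift works on @ARGV
my $file_medline = shift @ARGV;    # the explicit form does the same thing

print "terms file:     $file_terms\n";
print "medline file:   $file_medline\n";
print "arguments left: ", scalar(@ARGV), "\n";
```

Inside a subroutine a bare shift works on @_ instead, so writing
`shift @ARGV` there avoids the ambiguity.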


To print each matching field (line) rather than the whole record:

#!/usr/bin/perl

use strict;
use warnings;

my $file_terms = shift;
my $file_medline = shift;
# open list of terms
open (TERM, $file_terms) or die "Can't open $file_terms: $!\n";
# open records file
open (MEDL, $file_medline) or die "Can't open $file_medline: $!\n";

chomp( my @terms = <TERM> );

while( my $line = <MEDL> ){
  print $line if grep { $line =~ /\b$_\b/ } @terms;
}

__END__







Re: Extract text from file

2005-12-07 Thread Andrej Kastrin

Chris Charley wrote:






Don't know if you really want to have the output lines
tab separated as in your output.


Chris


I tried your code; now I am trying to write each potential target as one
tab-separated record, like:

TI- xx HD x  AB- xxx HD x   #record 1
TI- yy AB x  AB- xxx AB x   #record 2
etc...

So \t separates fields within a record and \n separates records. I tried

for my $term (@terms) {
   if (/$term/) {
      print "$_\t";

but it's not OK.





Re: Extract text from file

2005-12-07 Thread Andrej Kastrin

Shawn Corey wrote:



Question: Do you want to extract just the fields or the full record if
a field contains a term from file 1?




Answer to your question: just the fields.

And an additional question: how does Perl know where the input files are
(since we only wrote: open (TERM, $file_terms) or die "Can't open...)?


Cheers, Andrej





Re: Extract text from file

2005-12-07 Thread Shawn Corey

Andrej Kastrin wrote:


I'm still learning...


So aren't the folks at BioPerl.

Question: Do you want to extract just the fields or the full record if a
field contains a term from file 1? The following will print the entire record.


#!/usr/bin/perl

use strict;
use warnings;

my $file_terms = shift;
my $file_medline = shift;
# open list of terms
open (TERM, $file_terms) or die "Can't open $file_terms: $!";
# open records file
open (MEDL, $file_medline) or die "Can't open $file_medline: $!";

chomp( my @terms = <TERM> );

{
  local $/ = "\n\n";

  while( my $record = <MEDL> ){
    print $record if grep { $record =~ /\b$_\b/ } @terms;
  }
}

__END__
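
To see what the change to $/ does on its own, here is a small sketch that
reads the sample records from a string instead of a file, so it runs
without any input files. The helper name read_records() is an assumption
made for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data held in a string so the sketch needs no files.
my $data = <<'END';
ID-  001
TI-   analysis of HD patients.
AB- The present article deals with HD patients.

ID-  002
TI-   In reply to analysis of HD patients.
AB- The present article deals with HDD patients.
END

# Read blank-line-separated records from a filehandle.
# read_records() is a hypothetical helper name.
sub read_records {
    my ($fh) = @_;
    local $/ = "\n\n";      # a record now ends at the first blank line
    my @records = <$fh>;    # list context: one element per record
    return @records;
}

open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @records = read_records($fh);
close $fh;

print scalar(@records), " records read\n";
```

Because $/ is changed with local inside the sub, the normal line-by-line
behaviour is restored as soon as the sub returns. If the records may be
separated by more than one blank line, setting $/ to "" (paragraph mode)
is the usual alternative; see `perldoc perlvar`.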








Re: Extract text from file

2005-12-07 Thread Chris Charley


- Original Message - 
From: "Andrej Kastrin" <[EMAIL PROTECTED]>

Newsgroups: perl.beginners
To: "Perl Beginners List" 
Sent: Wednesday, December 07, 2005 12:00 PM
Subject: Extract text from file




#!/usr/bin/perl

use strict;
use warnings;

my $file_terms = shift;
my $file_medline = shift;
open (TERM, $file_term) or die "Can't open TERM"; #open list of terms
open (MEDL, $file_medline) or die "Can't open MEDL"; #open records file

my @terms = <TERM>;


chomp(my @terms = <TERM>);



while (my ($pmid, $ti, $ab) = split <MEDL>) {


This line doesn't work. split takes the form: split /PATTERN/,EXPR
Even if you had split stated correctly, it will not give you $pmid, $ti, $ab 
in this program.

See: perldoc -f split


for my $term (@terms) {
if (/$term/ for ($pmid, $ti, $ab)) {


You can't use a 'for' loop as an expression for an if statement.


print "$pmid\t$ti\t$ab";
}
}
}

I'm a little confused now, since the above example doesn't work and I don't
know why (compilation errors on the 15th and 19th lines).

I'm still learning...

Thanks for any suggestion, Andrej


I think the program below will give the results you want. Also, it leaves 
the second file, $file_medline, in its original format when printed out. 
Don't know if you really want to have the output lines tab separated as in 
your output.


#!/usr/bin/perl
use strict;
use warnings;

open TERM, "o33.txt" or die $!;
chomp(my @terms = <TERM>);
close TERM or die $!;

open MEDL, "o44.txt" or die $!;

{   # enclose these statements in a block so that the change to $/ is
    # confined to these statements

    local $/ = "\n\n";   # set input record separator to 1 'blank line'
    while (<MEDL>) {
        for my $term (@terms) {
            if (/$term/) {
                print;
                last;   # get out of 'for' loop when the first term is
                        # found - no need to check the rest
            }
        }
    }
}
close MEDL or die $!;
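
One thing to keep in mind with a plain /$term/ match: the term HD is also
found inside HDD. A minimal sketch of a whole-word match follows, assuming
the terms are plain words such as gene names. The helper name
matching_terms() is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the terms that occur in $text as whole words.
# matching_terms() is a hypothetical helper name.
sub matching_terms {
    my ($text, @terms) = @_;
    # \Q...\E quotes any regex metacharacters in the term;
    # \b keeps "HD" from matching inside "HDD".
    return grep { $text =~ /\b\Q$_\E\b/ } @terms;
}

my @terms = qw(ABH HD HDD);
my $text  = "AB- The present article deals with HDD patients.";

print "plain match: ", join(',', grep { $text =~ /$_/ } @terms), "\n";
print "whole words: ", join(',', matching_terms($text, @terms)), "\n";
```

The plain match reports both HD and HDD for this abstract, while the
whole-word match reports only HDD.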


Chris 








Extract text from file

2005-12-07 Thread Andrej Kastrin

Hello dears,

I have a file in row data format, which stores different terms (e.g. 
genes) and looks like:


ABH
HD
HDD
etc.


Then I have second file which looks like:
--
ID-  001 #ID number
TI-   analysis of HD patients. #title of article
AB- The present article deals with HD patients. #abstract

ID-  002 #ID number
TI-   In reply to analysis of HD patients. #title of article
AB- The present article deals with HDD patients. #abstract
--
etc., where the separator between records is a blank line.

Now I have to extract those ID, TI and AB fields from the second file
that contain any term from the first file.


A colleague from the BioPerl mailing list helped me with the following code:

#!/usr/bin/perl

use strict;
use warnings;

my $file_terms = shift;
my $file_medline = shift;
open (TERM, $file_term) or die "Can't open TERM"; #open list of terms
open (MEDL, $file_medline) or die "Can't open MEDL"; #open records file

my @terms = <TERM>;

while (my ($pmid, $ti, $ab) = split <MEDL>) {
for my $term (@terms) {
if (/$term/ for ($pmid, $ti, $ab)) {
print "$pmid\t$ti\t$ab";
}
}
}   


I'm a little confused now, since the above example doesn't work and I don't
know why (compilation errors on the 15th and 19th lines).
I'm still learning...

Thanks for any suggestion, Andrej

