RE: understanding code

mcdavis941 Sat, 31 Jan 2004 06:33:40 -0800

"Nilay   Puri, Noida" <[EMAIL PROTECTED]> wrote:

>Can any one walk me thru this piece of code ::
>
>while(<STDIN>)
>{
>    chomp ;
>    $isbn =(split(/^_/, $_))[0] ;  --- not able to understand what is being accessed (......)[0]
>    unless ($KEYS{$isbn} )   ---- isbn is a scalar variable, how keys work on it ?
>    {
>        print "$_\n" ;
>        $KEYS{$isbn} =1 ;
>    }
>}


I'm not sure what the intent of the code is, but I would guess you're reading a set of 
lines, each containing an ISBN and some other data, and printing each line only the 
first time its ISBN appears.  As a side effect, you end up with a hash containing all 
the unique ISBNs.

It would be useful to see some of the data the code is intended to process.

Assuming that's the goal, you're not far from it.  The (......)[0] says "take the 
result of the split, which is a list, and get the 0th or first element from it".  This 
works because you can subscript a list the same way you can subscript an array 
variable.  In other words, the expression gets the first thing from the list that 
results from splitting, which is probably meant to be the first thing on the line, 
which is probably meant to be the ISBN.
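
To make the list subscript concrete, here is a small sketch.  The sample line and the 
colon separator are made up for illustration; they are not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical sample line; the colon separator is an assumption.
my $line = "1111111111:Some Title:Some Author";

# Subscript the list returned by split directly (a "list slice").
my $first = (split /:/, $line)[0];

# The same thing in two steps, going through an array variable.
my @fields = split /:/, $line;
my $also_first = $fields[0];

print "$first\n";        # prints 1111111111
print "$also_first\n";   # prints 1111111111
```

Both forms give the same value; the subscripted list just saves you the temporary array.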

The $KEYS{$isbn} expression is a hash access: it gets from the hash %KEYS the value 
associated with the key $isbn.  You're right that 'keys' is a function used with 
hashes, but Perl is case-sensitive, so KEYS here is simply the name of a hash variable 
and has nothing to do with that function.
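
As a rough sketch of how the hash is being used to remember what has already been 
seen (the values in @isbns are made up for the example):

```perl
use strict;
use warnings;

my %KEYS;                       # a hash whose name happens to be KEYS
my @isbns = qw(111 222 111);    # made-up values for illustration

my @first_seen;
for my $isbn (@isbns) {
    unless ($KEYS{$isbn}) {     # hash lookup: false until a value is stored
        push @first_seen, $isbn;
        $KEYS{$isbn} = 1;       # remember that this key has been seen
    }
}

print "@first_seen\n";                    # prints 111 222
# Here the built-in keys function is applied to the hash %KEYS.
print join(",", sort keys %KEYS), "\n";   # prints 111,222
```

The second 111 is skipped because $KEYS{111} is already true by the time it comes 
around.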

The split pattern seems unusual.  As written, it says "split the string at the 
beginning if the line starts with an underscore".  The caret character in the pattern 
will match on the start of the string; the string will consist of an entire line read 
from the file, put into the variable $_ by the read operation <STDIN>.  I've never 
tried to split on the beginning of the string, so let's write a test script that does 
that and see what happens.

testsplitBOL.pl
---------------
use warnings;
use strict;
my @result;
while(<STDIN>){
    chomp;
    print "Processing line: >$_<\n";
    @result = split( /^_/, $_);
    print "  Split resulted in ", scalar(@result), " items.\n";
    print "  First element of split is >", $result[0], "<.\n";
    }
    
result:
-------

D:\MCD\dvl\scripts>type testsplitBOL.txt
1234 some text
_5678 some other text

_9012_some_other_text_separated_by_underscores
_7654
0987

D:\MCD\dvl\scripts>type testsplitBOL.txt | perl testsplitBOL.pl
Processing line: >1234 some text<
  Split resulted in 1 items.
  First element of split is >1234 some text<.
Processing line: >_5678 some other text<
  Split resulted in 2 items.
  First element of split is ><.
Processing line: ><
  Split resulted in 0 items.
Use of uninitialized value in print at testsplitBOL.pl line 11, <STDIN> line 3.
  First element of split is ><.
Processing line: >_9012_some_other_text_separated_by_underscores<
  Split resulted in 2 items.
  First element of split is ><.
Processing line: >_7654<
  Split resulted in 2 items.
  First element of split is ><.
Processing line: >0987<
  Split resulted in 1 items.
  First element of split is >0987<.

From these results we can see several things:

- Splitting on the beginning of the string, when the pattern matches, gives you an 
empty string as the first element of the resulting list.  Your code would take this to 
be an ISBN and use it as a hash key, which is certainly not correct.
- When the split does not match its pattern, it yields a list consisting of a single 
element, the original string.
- The pattern in split only matches lines that begin with underscore.  Whether or not 
that's what you want depends on your data.
- The code should test that the line is not just an empty string: splitting an empty 
line yields an empty list, so element 0 is undefined (hence the warning above).

Note that the Perl documentation for split says 

    A PATTERN of /^/ is treated as if it were /^/m, since it isn't much use otherwise
    
but that doesn't seem to apply here, both because your pattern is not /^/ (rather, it 
is /^_/) and because that doesn't seem to be what's happening in the test results.
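
For comparison, here is a small sketch of the special case the documentation 
describes, next to your pattern.  The multi-line strings are made up for the example; 
in your loop each string is a single line, so the difference would not show up there.

```perl
use strict;
use warnings;

# A made-up string holding several lines at once.
my $text = "first line\nsecond line\nthird line\n";

# A bare /^/ is treated as /^/m, so this splits at the start of every line.
my @lines = split /^/, $text;
print scalar(@lines), "\n";   # prints 3

# /^_/ gets no such treatment: without /m the caret matches only at the
# very start of the string, so there is a single split point.
my @parts = split /^_/, "_a\n_b\n";
print scalar(@parts), "\n";   # prints 2 (an empty string, then the rest)
```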

I'd be glad to help you code up your loop, but we really need to see a sample of data 
to understand the task.  In any event, I think you want something like this:

use warnings;
use strict;

my %KEYS = ();
my $isbn;
my @result;

while(<STDIN>){
    chomp;
    # select only those lines to split: not empty and start with underscore, or whatever
    if( /^_/ ){
        # split on whatever separates ISBN from what follows it on line
        @result = split( /:/, $_);
        # make sure the split actually split something
        if( scalar( @result ) > 1 ){
            $isbn = $result[0]; # we assume the ISBN is first thing on the line
            # make sure this ISBN hasn't already been printed before printing it
            unless( $KEYS{$isbn} ){
                print "$_\n";
                $KEYS{$isbn} = 1;
                }
            }
        else {
            die "Error processing line $.: $_ could not be split.\n";
            }
        }
    }

Or something like that.  You could make it more concise, but that's the basic idea.  
Show us your data!
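
For what it's worth, one possible shape for a more concise version is sketched below.  
The sub name first_lines_per_isbn is made up, and the colon separator is still only a 
placeholder until we know what your data really looks like.

```perl
use strict;
use warnings;

# Returns the input lines whose first field has not been seen before.
# The sub name and the colon separator are assumptions for illustration.
sub first_lines_per_isbn {
    my @lines = @_;
    my %seen;
    my @keep;
    for my $line (@lines) {
        next unless length $line;                    # skip empty lines
        my ($isbn) = split /:/, $line;               # keep only the first field
        next unless defined $isbn && length $isbn;   # skip lines with no first field
        # $seen{$isbn}++ is false the first time a key is used, true afterwards.
        push @keep, $line unless $seen{$isbn}++;
    }
    return @keep;
}

print "$_\n" for first_lines_per_isbn("111:first", "222:second", "", "111:duplicate");
# prints:
# 111:first
# 222:second
```

To run it over real input you could read the lines from STDIN, chomp them, and pass 
them to the sub.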

    

