Re: Comparing arrays

Dan Brown Fri, 20 Apr 2001 09:07:43 -0700
Excellent description, Collin.  I have just a couple of comments to add.

Collin Rogowski wrote:
> 
> A hash is a data structure which associates a key with a value.
> In Perl the key is given in the curly braces. A key/value
> pair is entered like this: $hash{$key} = $value (assuming
> the variables $key and $value hold the key and value,
> respectively).

It's very true that a hash lookup is much faster than searching through
an array.  One very important thing to note about hashes is that a
hash's keys are unique (I believe I've seen modules that let you get
around this, but this is the built-in behavior).  With this bit of
information, you can use a hash to weed out duplicates.  I believe that
understanding this concept is key to understanding this script.

For example, say I have this array:

    my @rray = ( 'table', 'stool', 'desk', 'table', 'stool', 'lamp', 'monitor' );

To weed out the duplicates, I can do any of the following (the results
are basically the same; this is just to show that there is indeed More
Than One Way To Do It):

For each example, assume that the variable %hash is an empty hash
initialized thus:

    my %hash = ();

1. Pretty standard

       foreach( @rray ) {
         $hash{$_} = 1;
       }

2. Slightly more "interesting"

       foreach( @rray ) {
         $hash{$_}++;
       }

3. Somewhat common

       map( $hash{$_}++, @rray );

4. Less common use of grep

       grep( $hash{$_}++, @rray );


For example 1 above, all the values will be 1.  In the other 3 examples
(2, 3 and 4), the value for a key is incremented by one each time that
key is seen.  These 3 ways therefore keep track of how many times each
item occurs, in case you need to know that later.
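As a small illustration of why those counts can be handy, here is a self-contained sketch that uses the counting approach from example 2 and then lists only the items that occurred more than once.  The name @dupes is just something I made up for the example.

```perl
use strict;
use warnings;

my @rray = ( 'table', 'stool', 'desk', 'table', 'stool', 'lamp', 'monitor' );

# Count how many times each item occurs (same idea as example 2)
my %hash = ();
$hash{$_}++ foreach @rray;

# Keep only the keys whose count is greater than 1, sorted for a stable order
my @dupes = sort { $a cmp $b } grep { $hash{$_} > 1 } keys %hash;

print "Duplicated: @dupes\n";    # prints "Duplicated: stool table"
```

With example 1 every value is 1, so this kind of question can't be answered from that hash.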

The variable %hash ends up with 5 key/value pairs.  Here's a quick dump
of them (using example 2, 3, or 4):

    Key: [lamp]     Value: [1]
    Key: [monitor]  Value: [1]
    Key: [table]    Value: [2]
    Key: [desk]     Value: [1]
    Key: [stool]    Value: [2]
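Something like the following sketch would produce a listing in that format.  The printf widths are only my guess at matching the layout above.  I sort the keys so the output is the same on every run.  The listing above is in no particular order, because it is whatever order the hash happened to return.

```perl
use strict;
use warnings;

my @rray = ( 'table', 'stool', 'desk', 'table', 'stool', 'lamp', 'monitor' );

my %hash = ();
$hash{$_}++ foreach @rray;    # example 2, written as a statement modifier

# Sorting the keys makes the listing repeatable from run to run
foreach my $key ( sort keys %hash ) {
    printf "Key: %-10s Value: [%d]\n", "[$key]", $hash{$key};
}
```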

You see, @rray has 7 elements, but 2 of them are duplicates: there are 2
tables and 2 stools.  The first time the loop saw 'table', $hash{table}
did not exist yet, so ++ treated its value as 0 and set it to 1 ( 0 + 1
= 1 ).  The next time it came across 'table', the entry $hash{table}
already existed, so no new entry was created; its value was simply
incremented again ( 1 + 1 = 2 ).
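If you want to watch this happen, a sketch like the following prints each step of the arithmetic.  The $before variable exists only to show the old value.  The comment about warnings reflects Perl's documented behavior: ++ on an undefined value doesn't trigger an "uninitialized" warning.

```perl
use strict;
use warnings;

my @rray = ( 'table', 'stool', 'desk', 'table', 'stool', 'lamp', 'monitor' );
my %hash = ();

foreach my $item (@rray) {
    # A key that doesn't exist yet counts as 0 here
    my $before = exists $hash{$item} ? $hash{$item} : 0;

    # ++ on a missing (undef) entry treats it as 0, with no warning
    $hash{$item}++;

    print "$item: $before + 1 = $hash{$item}\n";
}
```

The fourth line printed is "table: 1 + 1 = 2", which is the second visit described above.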

A VERY important thing to note about hashes, which can save you trouble
later, is that, unlike an array, a hash makes no guarantee about the
order in which its keys will be returned.  Perl uses an internal hashing
algorithm to make accessing the elements of a hash very fast, and part
of the price of that speed is the loss of order.  Do not blindly depend
on the order of keys in a hash.
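When you do need a predictable order, two common approaches are sketched below: sort the keys, or keep the original order with the well-known grep { !$seen{$_}++ } idiom.  That idiom is a close cousin of example 4.  There, grep( $hash{$_}++, @rray ) keeps the items that had already been seen (the repeats).  Negating the test keeps only the first occurrence of each item, in the order it first appeared.  The names %seen, @sorted and @unique are just for illustration.

```perl
use strict;
use warnings;

my @rray = ( 'table', 'stool', 'desk', 'table', 'stool', 'lamp', 'monitor' );

# Option 1: sort the keys whenever a predictable order is needed
my %hash = ();
$hash{$_}++ foreach @rray;
my @sorted = sort keys %hash;      # desk lamp monitor stool table

# Option 2: keep first-seen order; !$seen{$_}++ is true only the
# first time an item is encountered (its count is still 0/undef)
my %seen = ();
my @unique = grep { !$seen{$_}++ } @rray;    # table stool desk lamp monitor

print "Sorted:           @sorted\n";
print "First-seen order: @unique\n";
```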

> now you go through the list with the new users. Maybe you should use
> chomp $item here as well. It depends on where you got the data from.
> (It can't hurt to use chomp, as it will only remove \n, and nothing
> else).

More precisely, chomp removes a trailing string that matches the current
value of the global variable $/ (the input record separator, which is
"\n" by default).  Perl has some global variables that are global in the
fullest sense: they mean the same thing in every package.  I used one
($_) in my examples above.  To get more information on these, try running
information on these, try running

   perldoc perlvar

from a command line (on a machine on which Perl is installed).

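You can use this information to extend what chomp does.  For example,
if you set $/ to "" (paragraph mode), each read from a filehandle
returns a whole paragraph, treating any run of consecutive blank lines
as a single separator, and chomp then removes all of the trailing
newlines from that paragraph.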
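Here is a minimal sketch of paragraph mode.  To stay self-contained, it reads from an in-memory filehandle, which assumes Perl 5.8 or later.  The sample text is made up.  Localizing $/ inside a block keeps the change from leaking into the rest of the program.

```perl
use strict;
use warnings;

# Made-up sample input: three paragraphs separated by blank lines
my $text = "first paragraph\nline two\n\n\nsecond paragraph\n\nthird\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @paragraphs;
{
    local $/ = "";    # paragraph mode, only inside this block
    while ( my $para = <$fh> ) {
        chomp $para;  # in paragraph mode, removes all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

print scalar(@paragraphs), " paragraphs read\n";
```

Note that the two blank lines after "line two" still produce only one paragraph break.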

As to your final question (why names in the original list got passed
on), it's hard to tell without sample data.  But I bet Collin's
suggestion about using chomp is right on the mark.

I hope this is clear.

Best wishes,

Dan