re: hashes

Paul Fri, 15 Jun 2001 08:45:35 -0700

--- Oliver Glass <[EMAIL PROTECTED]> wrote:
> Paul, thanks so much for all this information!

You're welcome. I'm glad somebody appreciated it. =o)
 
> As well as helping me learn your kindness really brightened up a
> dreary day at the office :)

lol -- worth the time, also.

> Well, I've been working on hashes for the last few weeks and have
> some more questions now. I've dropped them into our original
> correspondence.
> 
> Best wishes,
> Oliver!

I'll do what I can.

> --- Oliver Glass <[EMAIL PROTECTED]> wrote:
> > Could anyone recommend a tutorial on the subject, preferably aimed
> > at a little higher than novice level?
> 
> I can blather, and folk can ask for clarification. That usually works
> pretty well, since other folk are pretty good at answering specific
> question! lol! Let me start at a little more basic level, though, and
> work up. It gives me more chance to put my foot in my mouth, and then
> when someone corrects me, *I* learn stuff!
> 
> A hash is an associative array. Whereas "arrays" use square brackets
> and numbers to indicate an index,
> 
>   $array[1]
>  
> "hashes" use curly braces and strings.
> 
>   $hash{'this'}
> 
> ---
> what do the curly braces mean? are they used anywhere else in perl?

In this case, they indicate that you are referencing a cell in a hash,
much the way square brackets "mean" you are accessing a cell in a plain
array. Just as $array[4] accesses the 5th cell of @array, $hash{'foo'}
accesses the 'foo' cell of %hash. The cells are effectively named,
rather than numbered.
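
Here's a tiny sketch of the two side by side. The variable names and the data are made up purely for illustration:

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e');
my %hash  = (foo => 'bar', baz => 'quux');

my $fifth = $array[4];      # cells are numbered from 0, so index 4 is the 5th
my $named = $hash{'foo'};   # cells are named by a string key

print "$fifth $named\n";
```

Note that both accesses start with $, because what comes back in each case is a single scalar.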

Curlies are also an operator in Perl that returns a reference to an
anonymous hash. (At that point, most people say "Huh??" the first time
or three they hear it, so if you did, don't feel bad.)

A reference in Perl is a value that points to something else, like a
pointer in C but smarter. References to most things are taken with the
backslash operator, like this:

  $ref = \%hash;

Now $ref is a simple scalar, but can be passed around efficiently and
used to get to the stuff in %hash (cf. perldoc perlref). Using
curlies, however, you can create a hash that never gets a name, and
they pass back a reference to it that you can use the same way.

  $ref = { a => 1, b => 2, c => 3, };

Now $ref is a reference to the hash, so to get to the elements you use
a pointer dereference, like in C:

  $ref->{b} # returns 2
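
To see both kinds of reference working together, here is a minimal sketch. The names $anon and the changed value 20 are mine, just for the demonstration:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);
my $ref  = \%hash;                       # reference to a named hash
my $anon = { a => 1, b => 2, c => 3 };   # reference to an anonymous hash

$ref->{b} = 20;                          # changes %hash itself, not a copy

print "named: $hash{b}  anon: $anon->{b}\n";
print "both are ", ref($ref), " references\n";
```

The point to notice is that writing through $ref changes %hash, while the anonymous hash is a separate thing altogether.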

Curly braces are in fact used in a *lot* of other places in Perl, some
of them more obvious than others. The most immediate use is to create
the scoping blocks of if(){}, while(){}, and friends. Perl's if
statement is a little different from languages like C and Java; in C,
this would be fine:

  if (greet) printf("hi\n");

Notice that there are no braces. In Perl you must either add braces,

  if ($greet) { print "hi\n"; }

or make the condition a postfix, putting it after the operation it
qualifies:

 print "hi\n" if $greet;

Note that in this case, you can even leave off the parentheses.
This syntax also works for while(), foreach(), and of course unless()
and until()....
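
A quick sketch of the postfix forms, one of each. The variables and the strings are invented for the example:

```perl
use strict;
use warnings;

my $greet = 1;
my @out;

push @out, "hi"      if     $greet;   # runs, $greet is true
push @out, "bye"     unless $greet;   # skipped
push @out, "item $_" for    1 .. 3;   # runs three times

my $n = 0;
$n++ while $n < 5;                    # loops until $n reaches 5

my $m = 10;
$m-- until $m <= 7;                   # loops until $m drops to 7

print join(", ", @out), "\n";
print "n=$n m=$m\n";
```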

Braces for scoping do not have to be associated with an if() or while()
type construct, however. Any time you want a new, limited scope, open a
brace. This is used for lots of things.

One common use is to "slurp" a file using a localized record separator.

 open FILE, $file or die $!;
 my $content;
 { local $/ = undef;  # CREATE A SCOPE TO LIMIT THIS LOCAL()
   $content = <FILE>; # no $/ means this grabs the entire file!
 } # scope closes, local() restores $/
 close FILE;

This is efficient, if not terribly readable to the neophyte.
In general, I don't recommend doing this until you can explain it
thoroughly to someone else. As an aside, I recommend the much more
readable but much less efficient code here:

  open FILE, $file or die $!;
  my $content = join '', <FILE>;
  close FILE;

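
If you want to try the slurp yourself, here's a self-contained sketch. It is not the code above verbatim: I've assumed a scratch file made with File::Temp, a lexical filehandle instead of FILE, and a do{} block to provide the scope.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a scratch file to read back; the contents are made up.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "line one\nline two\n";
close $tmp or die $!;

open my $fh, '<', $file or die "Can't open $file: $!";
my $content = do {
    local $/;        # undef inside this block only
    <$fh>;           # so a single read returns the whole file
};
close $fh;

my $restored = ($/ eq "\n");   # outside the block, $/ is the default again
print length($content), " characters read\n";
```

The do{} block is just another way to get the limited scope; its last expression becomes the value assigned to $content.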
Okay, back to braces. =o)

Another example of creating a block with braces is for a loop
structure. Consider this:

  LOOP1: { # this just creates a label and opens a block.
    # code . . .
    last if $cond1;
    redo LOOP1 if $cond2;
  }

Nothing you couldn't do with an until(), but it does provide a good
example, and I have used this structure for elaborate condition sets.
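
Here's a runnable version of that idea, as a sketch only. The label ATTEMPT, the counter, and the limits 3 and 5 are all invented to give the conditions something to test:

```perl
use strict;
use warnings;

my $tries = 0;
my @log;

ATTEMPT: {
    $tries++;
    push @log, "attempt $tries";
    last ATTEMPT if $tries >= 5;      # safety valve: give up
    redo ATTEMPT if $tries < 3;       # not ready yet: run the block again
    push @log, "done after $tries";   # reached only if neither fired
}

print "$_\n" for @log;
```

A bare block counts as a loop that executes once, which is why last and redo work inside it.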

> ---
> 
> Unlike normal arrays, all hash keys must be unique, because (as I
> understand it) a "hashing" algorithm is used on the key value to
> decide where it goes, so that when retrieving the value it can go
> through the same algorithm to go straight to the value, rather than
> having to walk through them individually and do string compares. In
> general, hashes are a little slower than arrays (I've heard 15-20%),
> and take up more memory -- they allocate memory in blocks, and add
> more when the hash is half full (?), so they tend to take up about
> four times as much space as the data in them would in an array. They
> are wonderfully flexible, however....
> 
> The elements of a hash must be scalars (DEFINITELY correct me on this
> if I misrepresent!), 
> 
> ----
> this is correct, i checked in perldsc - data structures cookbook

Thanks. =o)

> ----
> 
> but since references are scalars, a reference may
> be stored, and the hash lookup mechanism will follow such linkages,
> allowing complex and flexible multilevel structures. This is
> basically the same way that multidimensional arrays are done, and as
> a result you can create structures which include hashes of arrays of
> hashes of values, if you want to get that fancy. Such a structure
> would be accessed like this:
> 
>   $top{'key1'}[0]{'key2'} = $val;
> 
> Every indirection must be traced, however, so depending on your
> processing needs, it's sometimes more efficient to compile the keys
> and make one hash out of it:
> 
>   $top{'key1' . 0 . 'key2'} = $val; # has been known to save me a LOT
>                                     # of time, memory, and headache
> 
> ----
> can i create structures in perl, like in c? 
> do i get this functionality with objects?

It depends on what you mean. (And here begins a long digression, but
someone might appreciate it. ;o)

Objects give you the *access* functionality of C structs, with a lot
more flexibility, but you can't just read a record straight into a
generic object and have all the fields in place the way you can in C.
You can, however, put a simple method on the object that parses the
record for you. Let's look at a simple example, shall we? Assume we have
an object module Foo (cf. perldoc for perlreftut, perlref, perldsc,
perlboot, perltoot, perlmod, etc.) that is intended to read a file. If
the file layout matches the following C struct:

  struct {
    char name[20];
    char addr[35];
    char age[2];
  };

then our object module could have private variables like this:

  my $layout = "a20a35a2"; # cf. perldoc -f pack
  my @fields = qw / name addr age /;

and then a method (let's call it readLine) to read and parse the
record. Assuming we used a hash reference to create the object (that's
the most common thing), and that the open filehandle is already saved
on the object as $obj->{_FH}, we could build readLine like this:

  sub readLine {
    my $obj  = shift;                 # object passed as first arg
    my $line = readline $obj->{_FH};  # <$obj->{_FH}> would be taken as a glob
    return unless defined $line;      # end of file: return false
    @$obj{@fields} = unpack $layout, $line;
    return 1;
  }

Again, list, please check my syntax, but to elaborate:

$obj is a reference to the object, which is a bless()'d anonymous hash
(cf. perldoc -f bless). By dereferencing it with a leading symbol that
says what sort of data we're after, we can access it without the
arrow-pointer syntax. For a clearer example, the two accesses below are
equivalent:

  $obj->{a} eq $$obj{a}

Since we're handling several fields at once, we want a list back, so we
use @ instead of $. Thus @$obj{@fields} is a hash slice: the list of
cells in the anonymous hash referenced by $obj that are named by the
values of the array @fields, i.e., $$obj{name}, $$obj{addr}, and
$$obj{age}.

We reference them all in a group this way so that we can assign to them
from the list returned by unpack(). readline() reads one line from the
filehandle in $obj->{_FH}. (The <> input operator only reads from a
bareword or a simple scalar like <$fh>; given anything fancier, such as
<$obj->{_FH}>, it does a filename glob instead.) unpack() then splits
the line into strings as specified by the controls in $layout, i.e.,
the 20 character name field, the 35 character address field, and the 2
character age field. The method returns an explicit true value after a
successful read and a false one at end of file. It can't just return
the assignment, because a list assignment in scalar context yields the
number of items on its right hand side, not the last element, and
unpack() hands back three (empty) strings even for an empty record.
(As always, ANYONE feel free to stick my foot in my mouth on these.
=o)  The end result is that you can say:

  while($obj->readLine) {
     print $obj->{addr} if $obj->{name} =~ /Joe/;
  }

When the file ends, readLine returns false, and the while() exits.
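
For anyone who wants to run the whole thing, here is a self-contained sketch of such a Foo module. Several things in it are my assumptions rather than anything settled above: the constructor new(), the two sample records, the in-memory filehandle, the use of readline() with an explicit true/false return, and the "A" template letters (which strip trailing padding where "a" would keep it).

```perl
use strict;
use warnings;

package Foo;   # hypothetical class, following the description above

my $layout = "A20 A35 A2";        # 'A' strips trailing spaces from each field
my @fields = qw(name addr age);

sub new {                         # assumed constructor: just stores the handle
    my ($class, $fh) = @_;
    return bless { _FH => $fh }, $class;
}

sub readLine {
    my $obj  = shift;
    my $line = readline $obj->{_FH};
    return unless defined $line;  # end of file
    @$obj{@fields} = unpack $layout, $line;   # hash slice assignment
    return 1;
}

package main;

# Two made-up fixed-width records, read from an in-memory filehandle.
my $data = sprintf("%-20s%-35s%-2s\n", 'Joe Bloggs', '1 Main St', '42')
         . sprintf("%-20s%-35s%-2s\n", 'Ann Other',  '2 Side Rd', '37');
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my $obj = Foo->new($fh);
my @joes;
while ($obj->readLine) {
    push @joes, $obj->{addr} if $obj->{name} =~ /Joe/;
}
print "Joe lives at: @joes\n";
```

After the loop the object still holds the last record it managed to read, since a failed read returns before touching the fields.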

> ----
> 
> Hashes can also be used for complex access. When referring to an
> entire hash, we use %, like this:
> 
>   print %hash;
> 
> The symbol used indicates the type of the final value, so in the
> examples above, a scalar was being returned. It is possible, however,
> to return a list by using the array character @. I have a module for
> EBCDIC to ASCII conversions, which uses a hash table lookup; for the
> reverse conversions, I just flipped and copied the table, like this:
> 
>  @ASCEBC{values %EBCASC} = keys %EBCASC; # flip for the complement
> 
> What this does is assigns to %ASCEBC in a particular order, as
> specified by the argument "{values %EBCASC}"
> 
> ---
> how are you setting the order here?

That was the part that made me nervous to start with. =o)
The thing is, hash order isn't random in the sense of shifting under
you while you look at it. It falls out of the hashing algorithm, so
it's just not an order that the human mind easily recognizes, because
it's a mathematical abstraction. Don't count on it matching between
two different hashes, though, or between one build of perl and another.

In this example, however, the only thing that matters about the order
is that it be the *same* for both lists. As long as the correct
key/value pairs are matched, everything else will fall into place. I
neither know nor care which pair gets assigned first, so long as the
proper values get placed with the proper keys. Since keys and values
walk the same unmodified hash in the same order (cf. perldoc -f keys),
the right pairs get matched up, so the result table is, in fact, a
complementary match for the source table. Obviously, however, this is
the sort of thing that you want to check carefully before putting into
production.
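
Here's a small sketch you can use to do that checking. The three-entry table is a made-up stand-in for the real conversion table, and %also is a name I invented to show a second idiom, reverse, that should give the same result:

```perl
use strict;
use warnings;

# A tiny stand-in table: EBCDIC code => ASCII code for 'A', 'B', 'C'.
my %EBCASC = (193 => 65, 194 => 66, 195 => 67);

my %ASCEBC;
@ASCEBC{ values %EBCASC } = keys %EBCASC;   # hash-slice flip

my %also = reverse %EBCASC;                 # flattens to a list, swaps the pairs

print "$_ => $ASCEBC{$_}\n" for sort keys %ASCEBC;
```

Either way, the flip only round-trips cleanly when the values are unique; two keys with the same value would collapse into one entry.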

> ---
> 
> The @ on @ASCEBC tells
> perl I'm assigning an entire list at once, so what this statement
> does is assigns to the values specified (as keys in %ASCEBC) from
> keys %EBCASC.
> 
> ---
> i find this confusing... so you're saying if i have a hash
> my %perl;
> my %hash;
> with some keys and values
> %perl = (
>         'intrest' => 'high',
>         'knowledge' => 'sketchy'
> );
> i can assign to it as an array
> @hash{values %perl} = keys %perl;
> resulting in - if i print %hash:
> sketchyknowledgehighintrest

yup...

> could i do the same assignment with
> %hash = %perl;

nope. This just makes a copy of the hash, while the above switches the
keys and values.

> also, why when i print %perl do i get
> intresthighknowledgesketchy
> something to do with the order / hashing algorithm?

Exactly. %hash = %perl makes %hash an exact copy of %perl, but
 @hash{values %perl} = keys %perl;
does several things differently. The main thing it does is makes the
VALUES of %perl become the KEYS of %hash, with the matching KEYS of
%perl becoming the VALUES associated in %hash. So, $perl{interest}
would be 'high', while $hash{high} would be 'interest'. (oops -- sorry,
I changed the spelling on you, but you get the point). 

The other main difference is that while %hash = %perl makes an exact
copy, blasting anything that might have been in %hash before, the other
method will only write to the specific cells of %hash indexed by some
value in %perl. So if there were already a 'high' in %hash it would be
overwritten, but if $hash{cool} had the value 'beans' before the
assignment, it would remain afterwards.
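
A sketch of that difference, using your %perl (with my spelling of 'interest') plus a pre-filled %hash and a %copy that I've made up for the comparison:

```perl
use strict;
use warnings;

my %perl = (interest => 'high', knowledge => 'sketchy');

my %hash = (cool => 'beans', high => 'old value');
@hash{ values %perl } = keys %perl;    # touches only 'high' and 'sketchy'
my @after_slice = sort keys %hash;

my %copy = (cool => 'beans');
%copy = %perl;                         # wholesale replacement
my @after_copy = sort keys %copy;

print "slice: @after_slice\n";
print "copy:  @after_copy\n";
```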

> ---
> 
> Note that order for hashes, while predictable, is not intuitive; it's
> based on the hashing algorithm, and seems quite random. The above code
> works because the algorithm is constant, but don't think of it as
> sorted. If the order matters, either don't use a hash, or order them
> yourself somehow (which is a topic for another discussion, though
> maybe not here. =o)
> 
> If one takes a reference to a hash, like this:
> 
>   $rh = \%hash;
> 
> then one can access it in several ways:
> 
>   $$rh{foo};  # same as $hash{foo}
>   $rh->{foo}; # again, same as $hash{foo}
>   keys %$rh   # same as keys %hash
> 
> ---
> how do i create a new hash using this reference to populate it?
> i imagined it would be done as:
> my \%hash = $rh;
> i.e. create a %hash, set its \reference to $rh
> but this doesn't seem to work.

To make a *new* hash, you need to *dereference* the old one, like this:

 my %newhash = %$rh;

This makes a new hash and then copies into it the values from the hash
referenced by $rh. Sometimes it's better to use curlies here, too, like
%{$rh}, but often that's more a matter of taste. If it tastes better to
you that way, use'em. =o)
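
To convince yourself that the new hash really is a separate copy, try a sketch like this one. The keys and numbers are invented:

```perl
use strict;
use warnings;

my %hash = (foo => 1, bar => 2);
my $rh   = \%hash;

my %newhash = %{$rh};     # same as %$rh: copies the keys and values

$newhash{foo} = 99;       # changes only the copy
$rh->{bar}    = 42;       # changes %hash, not the copy

print "original: foo=$hash{foo} bar=$hash{bar}\n";
print "copy:     foo=$newhash{foo} bar=$newhash{bar}\n";
```

One caveat: the copy is only one level deep, so if any of the values are themselves references, both hashes will point at the same underlying data.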

> ---
> 
> This is commonly used as the method for objects, though it's best to
> use accessors for the data, rather than fiddling with object
> properties directly.
> 
> The constructor for an anonymous hash is curly braces, which return a
> reference to the data they contain:
> 
>   $rh = { a => 1, b => 2 };
> 
> Now, to print 2, say:
> 
>   print $rh->{b};
>   
> ---
> why is the arrow being used here? what does it mean? where else is it
> used?

The arrow means the thing on its left is a reference, and it's
dereferencing it for you. Think of structs and struct pointers in C. If
sp is a pointer to struct str, then sp->x is the same field as str.x,
right? The arrow simplifies the access. To get to the same value as in
the hashref arrow example above with a more congested syntax, you could
say

   print $$rh{b};

instead. That's not too bad here, but in some cases it gets hairy. The
arrow is usually cleaner. The other place you'll see it is in method
calls, like $obj->readLine above, where the thing on its left is an
object (itself a reference) or a class name. So when you see an arrow
followed by a subscript, it means it's dereferencing a reference. =o)
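
Here's a sketch of where it gets hairy. I've added a made-up 'list' entry holding an array reference so there is a second level to dig into:

```perl
use strict;
use warnings;

my $rh = { a => 1, b => 2, list => [ 10, 20, 30 ] };

my $plain = $$rh{b};             # sigil-style dereference
my $arrow = $rh->{b};            # arrow-style, same cell

my $hairy = ${ $$rh{list} }[1];  # two levels, sigil-style
my $clean = $rh->{list}[1];      # same element; the arrow between
                                 # adjacent subscripts is optional

print "$plain $arrow $hairy $clean\n";
```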

> ---
> 
> A common mistake is to try using this constructor incorrectly, like
> this:
> 
>   %hash = { a => 1, b => 2 };
> 
> This should give you a warning if you enable -w, saying the hash
> assignment has only one element. You gave the hash a key with no
> value.
> What was the key? the reference to the data that the {} returned. It
> will *not* give you *any* clue as to what you have done wrong if you
> don't use -w!
> 
> ---
> the right way to do this is with () instead of {}, right? i.e.
> %hash = (a => 1, b => 2);

Exactly! You're getting it.
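
If you'd like to see what the mistake actually leaves behind, here's a sketch. The %wrong and %right names are mine, and the __WARN__ handler is only there to collect the warning instead of printing it; I'm not relying on its exact wording, which differs between perl versions.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

my %wrong = { a => 1, b => 2 };   # one element: a reference used as a key
my %right = ( a => 1, b => 2 );   # a list of key/value pairs

my ($odd_key) = keys %wrong;      # the stringified reference

print "wrong has ", scalar(keys %wrong), " key: $odd_key\n";
print "right has ", scalar(keys %right), " keys\n";
print "warnings collected: ", scalar(@warnings), "\n";
```

The lone key is the reference turned into a string, something like HASH(0x...), and its value is undef, which is rarely what anyone wanted.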

> ---
> 
> Ok, That should have been plenty of opportunity for me to stick my
> foot in my mouth. =o)
> 
> Corrections?
> Questions?
> 
> ---
> could you please give some examples of uses of hashes?
> 
> i've used them in my own code to hold configurations loaded from a
> config file, like %config{'verbose'} = 0 etc.
> 
> what else are they good for? for what reasons would we choose them
> over conventional arrays?

Oh, my. <blows out a long breath....>

Suppose you're reading a big datafile, but you only want records where
a certain field in the record layout is one of these three hundred
values. When you find one, you want to go look up the charge associated
with that particular type of service, and total up how much each of
those services is making for you, and total them as a whole. There are
50,000 possible entry types in that field, and the file is three
million records. How will you code it?

You probably don't want arrays here. =o)

Instead, you build a hash table of the values you want to accumulate,
assigning the charges as the values (assuming none are free). We'll
call the table something reasonable like %charge for this example.
Let's also assume we've built an object that knows the fields of the
file, like the one we built above. The field we want is called
'service', and we have a method called service that returns that field
(because it's naughty to access object properties directly. ;o)

  # initializations, etc.... then:
  my($grand,%tot);        # places for accumulations
  while($obj->readLine) { # reads and parses the next record
     my $service = $obj->service; # save it to avoid overhead later
     next unless $charge{$service}; # skips irrelevant records
     $tot{$service} += $charge{$service}; # keep per-service totals
  }
  
  # now sum the sums
  for (keys %charge) { # the services we want reported
     unless ($tot{$_}) {
         warn "No revenue from $_!\n";
         next;
     }
     print "$_:\t$tot{$_}\n"; # print revenue per service
     $grand += $tot{$_};      # accumulate for a grand total
  }
  print "Grand total:\t$grand\n";
  

Does that give you a good example?
Try to do *that* with arrays. =o)
Your program will run forever. ;o]
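
Here's a cut-down sketch of the same accumulation that runs without any object or datafile. Everything in it is invented for the demonstration: the service names, the charges, the @records list standing in for the 'service' field of each record, and the @idle list I use in place of warn so the output stays in one place.

```perl
use strict;
use warnings;

my %charge  = (dsl => 30, dialup => 10, hosting => 25);   # services we bill for
my @records = qw(dsl voice dialup dsl fax dsl dialup);    # one field per record

my (%tot, @idle);
my $grand = 0;

for my $service (@records) {
    next unless exists $charge{$service};   # skip irrelevant records
    $tot{$service} += $charge{$service};    # keep per-service totals
}

for my $service (sort keys %charge) {       # the services we want reported
    unless ($tot{$service}) {
        push @idle, $service;               # nothing billed for this one
        next;
    }
    print "$service:\t$tot{$service}\n";
    $grand += $tot{$service};
}
print "Grand total:\t$grand\n";
print "No revenue from: @idle\n" if @idle;
```

Each record costs one hash lookup no matter how many services are in the table, which is the whole reason to reach for a hash here. I used exists for the test so that a service with a zero charge would still count as relevant.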

> ---


=====
print "Just another Perl Hacker\n"; # edited for readability =o)
=============================================================
Real friends are those who, when you inconvenience them, are bothered
less by it than you are. -- me. =o)
=============================================================
"There are trivial truths and there are great Truths.
 The opposite of a trivial truth is obviously false.
 The opposite of a great Truth is also true."  -- Niels Bohr
