On Wed, Jan 04, 2006 at 05:44:04PM +0100, Axel Liljencrantz wrote:
> Sorry for not partaking in this discussion sooner...

> On 1/3/06, Netocrat <[EMAIL PROTECTED]> wrote:

> > John Brock wrote:

[...]

> > > might reasonably do with an array -- such as deleting a single
> > > element -- would totally redefine the hash lookup.  But as the joke
> > > goes, if it hurts when you do that, then don't do that!
> > >
> > > I'm less optimistic about one thing though.  Originally I thought
> > > that the hash lookup could at first be implemented by just searching
> > > an array from beginning to end for a key (which would be a fast
> > > and easy thing to do), and then at some later date something more
> > > sophisticated could be done by attaching indexes to a variable the
> > > first time it was used as a hash, without changing the semantics
> > > (i.e., the array values would not be scrambled when you used a
> > > variable as a hash).  I assumed there was some way to do this was
> > > at least reasonably efficient, even if you didn't end up with the
> > > the most highly optimized hashes on the market.  But I'm not so
> > > sure any more, and I suspect that if you want even moderately
> > > efficient hashes you are going to have to accept that using a
> > > variable as a hash scrambles the array in some way.  Logically this
> > > isn't necessary, but in practice it may be, and it detracts from
> > > the prettiness of the scheme.

> > I noticed something similar but didn't want to focus on it in my last
> > response.  Assuming that an index is used, and that the structure is
> > initially treated as a hash, then when mapping from hash to array, how
> > is order determined?  Alphabetically?  By time-of-insertion?  By
> > internal index order?  The answers can be specified, but they're not
> > necessarily going to be intuitive.

> If more speed is desired, then the order must be either arbitrary(hash
> algorithm) or Alphabetically(Binary search algorithm). The latter should be
> a reasonable tradeof between inuitivity and speed. It should be possible to
> use a redblack tree, giving logarithmic performance on all operations. But
> the initial implementation would probably use time-of-insertion since that
> is the easiest for a naive implementation.

My original idea was that you keep the original array structure
intact always, and then at some point optimize hash lookups by
adding an index to variables when they are first used as hashes,
pointing to the array positions of all keys.  This is simple, and
it works fine for everything but deletes, because when you delete
hash elements (as opposed to adding or modifying) the entire index
may need to be recreated, which is expensive.

My main concern was whether you can add reasonably efficient lookup
opimization which *completely* preserves the original naive "go to
the beginning of the array and look for the key" semantics for hash
key lookups.  I'm not a hash guru, so maybe it's possible, but it's
not clear to me how.  Of course you don't have to promise that the
original semantics will be preserved, which would solve that problem
at, I think, a fairly minor cost, since most users probably won't
be using the same variable both ways anyway.  But it would be
conceptually nice if the naive semantics could be preserved.

> > I still think it's a good idea
> > > though; it gives you all the power of Perl hashes, with only one
> > > variable type and namespace.
> > >
> > > Or almost all the power.  Does fish support sparse arrays?  I.e.,
> > > can you:
> > >
> > > set a[1] Hello
> > > set a[1000] Goodbye
> > >
> > > and then use a[500] and a[1500] (returning the value "")?

> > Did you try it?
> >
> > Yes, it does, but the missing empty elements are space-separated on
> > expansion.  Last I looked at the source, fish would internally be
> > storing 999 ARRAY_SEP characters for the missing elements.

> The 'missing' elements become empty elements, i.e. each one of them contains
> the empty string.

I think the empty string is the proper default.  There are a couple
of other ideas I had intended to suggest for fish, and I might as
well make another one of them here.

I used to work with REXX, which was a very nice language to work
with, even though it had only hashes, and no true arrays.  One nice
feature was that you could give a hash a default value, say "foo",
so that any hash element which had not been explicitly set would
come back as "foo".  This was remarkable useful, and the way I
would implement it in fish would look like this:

set $a[!] foo

Note that I am not setting a hash element (that would be $a{!}),
but rather an array element with a non-numeric index.  This is a
way of assigning meta-data to a variable (you would not see "foo"
if you said "echo $a"), and could perhaps be used for other sorts
of meta-data as well.  After the above statement, the following
statements:

set $a[1] Hello
set $a[1000] Goodbye

would create 998 values of "foo" between the two elements of $a
which had been explicitly set.  In addition, if someone referenced
$a[2000] (or any element beyond the highest which had been set),
the result would also come back as "foo".  And of course you can
also stipulate that all unassigned hashes will also come back as
"foo".  One nice thing about this idea is that it adds *no* new
syntax to the language, it just adds a limited set of non-numeric
strings which are acceptable as array indexes!

Another point is that the 998 values between $a[1] and $a[1000]
don't have to actually be filled in.  If you had a true sparse
array then $a[500] could return the default "foo" for the same
reason that $a[2000] does -- because it hasn't actually been set.
This introduces a subtlety though: an array element which was
explicitly set to "foo" would behave differently than an unset
element if the default was changed.  In a way this sneaks in
something like Perl's "undefined" values, which may be more trouble
than it is worth.

> >>[...]
> > >>
> > >>>>The advantage to all this is that you don't have to introduce a
> > >>>>new data type for hashes,
> > >
> > >>PHP supports a similar data type to that previously discussed on this
> > >>list - an array that may be indexed traditionally (numeric keys) or as a
> > >>hash aka map (string keys) using the same indexing operators [].  This
> > >>seems to be compatible with the model John suggests for the hash
> > >>indexing operators {}.

> > > But what if you want to use "1" as a hash key?  Either you need

> > String "1" is equivalent to integer 1 (in a shell the difference is not
> > very pronounced anyhow); both are a key that maps to a value.  If
> > treating the hash as an array, conceptually that key is the first
> > element, but practically (internal representation) it could be located
> > anywhere.  Where's the problem?

> The problem that I can see is all the potential inconsistency when silently
> moving from array to map.
> 
> >#create array
> >set arr foo bar baz
> >echo $arr
> foo bar baz
> >#Use it as a map
> >set arr[smurf] blue
> >echo $arr
> 1 2 3 smurf
> 
> Not 100% intuitive, IMO.

Wow!  This example baffles me, and I have to admit that I must not
really understand Netocrat's suggestion at all.

My feeling is that people usually expect variables to exhibit either
pure array behavior or pure hash behavior, and to have some mix
would be very confusing.  For example:

set $a[1] foo
set $a[2] bar
set $a[3] baz

If you are setting array elements you expect one kind of behavior,
e.g., that if you delete $a[1] then $a[3] becomes undefined, while
$a[1] and $a[2] still exist, but with different values.

OTOH, if you were setting hash elements you would expect a different
behavior: deleting $a[1] would make $a[1] undefined, while $a[2]
and $a[3] still both exist, with their values unchanged.

The virtue of my suggestion, as I see it, is that at the cost of
just a little new syntax ($a{1}, or $a[[1]], or @a[1], or whatever),
you get to use the same variable type as either a pure array or a
pure hash.  You do get some funny interactions if you treat the
same variable as both, but then you are never required to do that.

> > some kind of new systax, or you have to accept that some strings
> > > cannot be used as hash keys.  I think the former is preferable.

John Brock
[EMAIL PROTECTED]


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Fish-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fish-users

Reply via email to