Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-27 Thread Tom Hughes

In message a05111b09b960bd6a030f@[63.120.19.221]
  Dan Sugalski [EMAIL PROTECTED] wrote:

 Keys are either constant key structs, constant integers, string
 registers, or integer registers. Encoding shouldn't be any different
 than any other constant or register. Jeff's got an opcode function
 naming scheme--I've not browsed back far enough in the discussion to
 see if it's considered insufficient or not.

The problem isn't really the encoding in the sense of the bytecode
encoding - as you say that is no different to any other constant or
register.

The problem is the opcode names. I don't know where Jeff's scheme
is documented, but none of the information I can find in the docs
seems to provide a workable scheme for keyed ops.

Given your description of the valid key types I would suggest that
the key arguments be encoded in opcode names as follows:

   constant key struct   k
   constant integer      kic
   string register       ks
   integer register      ki
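
For example, assuming the existing convention of one type suffix per
operand, the variants of a keyed fetch into a string register might come
out looking something like this (the names here are purely illustrative):

   set S1, P0[key]   =>  set_s_p_k     (constant key struct)
   set S1, P0[5]     =>  set_s_p_kic   (constant integer key)
   set S1, P0[S2]    =>  set_s_p_ks    (string register key)
   set S1, P0[I2]    =>  set_s_p_ki    (integer register key)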

Also, am I right in believing that when you talk about a string register
being used as a key you mean that the register will be assumed to be a
pointer to a KEY instead of a pointer to a string, and that this will be
used to handle the case where a key has been built dynamically with the
key ops?

If I am right about that then it means that it is impossible to write
an instruction that does a hash lookup from a string register, as this:

   set S1, P0[S0]

Will be assumed to mean that S0 is a key. Instead you would either have
to put the string into a PMC first, like this:

   set P1, S0
   set S1, P0[P1]

Which could be handled by building a constant key struct containing a
single element. The other option would be to build a key dynamically
and then use that to do the access.
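
Something along these lines, perhaps, assuming key-building ops of the
sort suggested elsewhere in this thread (new_key and add_key_str here are
purely hypothetical - neither exists today):

   new_key S2            # S2 now holds a fresh, empty key
   add_key_str S2, S0    # append the string in S0 as a key atom
   set S1, P0[S2]        # S2 is treated as a key, not as a plain string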

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-24 Thread Mike Lambert

 This would only automate the generation of large amounts of code, not get rid
 of the large amount of code being generated. Once again, my complaint here is that
 the L2 cache would buckle under the weight of a dozen PMCs each defining a few dozen
 recursive accessors. The performance gain of making the code smaller is worth
 taking two iterations through the main interpreter.
 Stepping back a bit, my suggestion would be more general: factor out the recursive
 fetch from the operation. Keeping that as a single PVM op, the PVM could use the
 vtable API to fetch the buried PMC/int/num/string, then fall through to the logic
 that performs operations on immediately available values. Having two separate VM
 ops isn't strictly required to factor this out. Restating, any time you massively
 cut and paste code, it is an indication of trouble ;)

I think the main problem is that we aren't guaranteed recursion. Take a
look at Damian's Regex::Common to see what he does. Since Perl5's TIEd
interface only supports a single level, you have to jump through hoops to
perform multi-level tying.

Likewise, matrices and other explicitly multidimensional arrays don't want
to recurse either. By storing the 'get the value for this key' in the PMC,
we allow each PMC to DTRT.

And if I can clarify: are you concerned about the code bloat occurring
from PMC methods, or from opcodes? It looks like you're referring to PMCs,
but I just wanted to be sure.

I agree there's a lot of c+p going on in the PMC keyed method code. The
various get_XXX_keyed and get_XXX_keyed_int methods have a lot of
duplication that doesn't need to be there, but that's not a strike against
keys. We could change these to return a KEY_ATOM and let the caller deal
with the results. Or we could make all the keyed functions be stubs which
merely retrieve the PMC from the aggregate and then call the corresponding
non-keyed method (get_bignum, get_string, etc.) on it, somewhat like:

STRING* get_string_keyed (KEY *key) {
    PMC *value = get_pmc_keyed(key);
    if (key->next != NULL)
        return value->vtable->get_string_keyed(INTERP, value, key->next);
    else
        return value->vtable->get_string(INTERP, value);
}

That should certainly cut down on the code bloat, at the expense of more
recursion.

  variable *contents* requires changing any key structure. This:
 
   for my $i (1..1000) {
     for my $j (1..1000) {
       for my $k (1..1000) {
         @foo[$i;$j;$k]++;
       }
     }
   }
 
  requires the construction of exactly *one* key structure, and
  requires that it be updated exactly *once*, through the entire loop.
 
  PMCs don't move. Keys can cache pointers to them. This isn't a problem.

 The definition I have for KEY * here is a linked list. However, if you're
 right that nested access builds one key and then reuses it, that's awesome,
 and the allocation overhead is a total win.
 I'm sorry I missed that gem looking at the source. The rough edges do
 obscure the gems. If I can't find this gem at all, I'll be looking for pointers
 on bringing it to light.

Yes, it is a linked list.  Currently, we don't have multidim key support,
so no one sets up a linked list, and our keyed ops merely create
single-element linked lists.
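
For reference, the rough shape being talked about is something like this
(a sketch only - the names and layout are guesses, not a copy of the real
key headers):

typedef struct _key_atom {
    int type;                 /* which member of the union is live */
    union {
        INTVAL  int_val;
        STRING *str_val;
        PMC    *pmc_val;      /* e.g. $i - dereferenced at access time */
    } val;
} KEY_ATOM;

typedef struct _key {
    KEY_ATOM     atom;        /* this dimension's index */
    struct _key *next;        /* NULL when this is the last dimension */
} KEY;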

Each KEY_ATOM in the KEY structure can reference a PMC. In the above
triple-for loop, there are only three PMCs, $i, $j, and $k. These PMCs
have a constant address in that function. It means we can create some PASM
that looks like: (note, new_key and add_key_ref don't currently exist ;)
new P0, .PerlInt #i
new P1, .PerlInt #j
new P2, .PerlInt #k
new_key S1
add_key_ref S1, P0
add_key_ref S1, P1
add_key_ref S1, P2

Now we can run the loop, using the key in S1 (or wherever it's stored... might
not be Sn registers). Since we're merely incrementing the values in P0,
P1, and P2 each time through the loop, the key that references these PMCs
is valid throughout the life of that triple-nested loop.
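
The loop body can then keep reusing that key; something like this, where
the keyed-op syntax is illustrative only and P3 stands in for @foo:

   set I0, P3[S1]     # fetch @foo[$i;$j;$k] through the cached key
   inc I0
   set P3[S1], I0     # store it back; S1 never has to be rebuilt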


  So what? Recursion's not bad. Besides, aggregates may well be
  multidimensional, in which case

 Right, I did address that. It was part of my 'this *is* a win when...' diatribe.
 When indexing objects that aren't multidim, there is a performance enhancement
 that isn't insignificant and requires no change to the design. This is your
 cue to jump with joy ;)

Isn't non-multi-dim, single-level access the common case? Most perl
programmers deal with one level of access into an aggregate at a time, and
if there's a performance improvement with this case, it should put some
big weight on the side of keys.

  Given your objectives of speed, generality and elegance,
 
  I should point out here that elegance appears third in your list
  here. (It's fourth or fifth on mine)

 Ooops.

Yes, Dan's coding objectives are somewhat of a mystery to me as
well. :)

 * function calls consume resources
  Generally incorrect. Function calls are, while not free, damned cheap
  on most 

Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-22 Thread Angel Faus


Sunday 21 July 2002 21:34, Dan Sugalski wrote:

 No. They are not. You're missing the important case where the data
 structure is inherently and intrinsically multidimensional.

my int Array @foo : dimensions(3);
@foo[1;2;3] = 12;

 Or whatever the syntax is in perl to declare a 3D array and access
 the element at (1,2,3).


In my opinion, there are actually two different things to discuss:

- keyed opcodes 
- keyed methods on the PMC vtable

You can keep keyed opcodes, without actually implementing the keyed 
methods. Actually, ALL our keyed methods look like:

... stuff to get the value of SELF[key] into VALUE ...
VALUE->vtable->method(...)

Putting this code into the opcode, and removing the vtable methods
(except get_keyed_*), shouldn't make it slower at all IMHO: the same job
is being done, in the same way, with the same number (two) of vtable calls.
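
For instance, a keyed increment op could boil down to something like this
(a rough sketch only, following the style of the vtable calls quoted
elsewhere in the thread, not actual Parrot source):

    /* inc P0[key] */
    PMC *value = agg->vtable->get_pmc_keyed(INTERP, agg, key);
    value->vtable->increment(INTERP, value);

The first call is the only keyed entry the aggregate has to provide; the
second is the ordinary, non-keyed method.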

Keyed opcodes can stay in the interest of code density. 


 No. Keyed access for all methods stays. You're forgetting one very
 important case, the one where the contents of an aggregate isn't a
 PMC. This scheme would then require the aggregate PMC to construct
 fake PMCs to do operations on. Bad, definitely bad.

Is it really so bad? As long as what is ultimately stored in the
aggregate is one of our base types (pmc, int, string, float), the
opcode shouldn't need to build the fake PMC.

Or are you talking about multi-dimensional PMCs? I don't see why
these should need to build fake PMCs: you just call the get_keyed
method, and it will do the Right Thing (probably getting the value
directly, maybe propagating a portion of the key to an inner PMC).

As I said, there should not be any speed cost at all, and there should be
a significant improvement in code size. But this is only an opinion. I am
willing to submit a benchmarking patch if there is interest.

Best,

-àngel





Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-22 Thread Scott Walters



On Mon, 22 Jul 2002, Angel Faus wrote:

 In my opinion, there are actually two different things to discuss:
 
 - keyed opcodes 
 - keyed methods on the PMC vtable
 
...
 
 Keyed opcodes can stay in the interest of code density. 
 
 
  No. Keyed access for all methods stays. You're forgetting one very
  important case, the one where the contents of an aggregate isn't a
  PMC. This scheme would then require the aggregate PMC to construct
  fake PMCs to do operations on. Bad, definitely bad.
 
 Is it really so bad? As long as what is ultimately stored in the
 aggregate is one of our base types (pmc, int, string, float), the
 opcode shouldn't need to build the fake PMC.
 
 Or are you talking about multi-dimensional PMCs? I don't see why
 these should need to build fake PMCs: you just call the get_keyed
 method, and it will do the Right Thing (probably getting the value
 directly, maybe propagating a portion of the key to an inner PMC).

By the time you've told Parrot what you've actually returned, you might
as well have just returned it in the form of a PMC.

With fetching PMCs it's easy. Atomic values are harder. A few things
spring to mind [for keyed access to atomic data items]:


1. Returning a reference. This would work, since the atomic types
   have no active behavior. This would require all PMCs that store
   things to work in terms of references in addition to values, and
   would create another variant of ops [I could be wrong].

2. Load/Store. This would require a fetch, then the operation performed,
   then a store (see the sketch after this list). Twice as many traversals
   would be required in the case of:
   %ref->{$x;$y;$z}++;
   However,
   print %ref->{$x;$y;$z}, "\n";
   %ref->{$x;$y;$z} = 100;
   ...would both be unaffected, since only a fetch or store would be required.

3. Very light wrapper PMCs. A special int, num, string wrapper could
   be created on startup for each operand position, and persist throughout
   the life of the interpreter, being reused each time a keyed atomic value
   is addressed. Keeping a finite dedicated pool eliminates the overhead of
   calculating which to use, adding and removing to free lists, etc.
   This would require a mere 3*3=9 dedicated wrappers.
   This would seriously damage the value of the primitive register pools
   and place undue strain on the PMC registers. I'm just being thorough ;)

4. Establish an interpreter-global register (Foo) that contains information
   (an enum) about which operation is to be performed, and a UnionVal
   describing the other operand.
   This would require Parrot to look up one operand - either from the
   register file directly or via get_pmc_*, populate the single, fixed
   UnionVal with its information, then invoke do_operation_keyed on
   the other operand. When the 2nd operand was able to eventually track
   down the value in question, it could look to Foo to discover
   which operation to perform, and with which other value, and implement
   an internal switch() on the operation to be performed.
   This would allow for 4 get_* methods, 4 set_* methods, and one do_
   method.
   The overhead that this would create is merely the switch() at the
   end of the second resolution, which is negligible compared to the
   handful of method calls already being performed.
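
For what it's worth, here is roughly what option 2 would look like for
%ref->{$x;$y;$z}++, as hypothetical PASM with P0 holding the aggregate and
S1 holding a key built from $x, $y and $z:

   set I0, P0[S1]     # fetch the atomic value, traversing the key once
   inc I0             # do the operation on a plain register
   set P0[S1], I0     # store it back, traversing the key a second time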

1 seems to be the fastest while being elegant and flexible. 4 follows 
closely, having the benefit of not generating additional op permutations
but being a bit more contrived. 2 certainly has elegance on its side
but would actually cost significantly more in a frequent case. 3
is not elegant or fast but makes a case for 'everything is an object'
not being quite so bad.

 
 As I said, there should not be any speed cost at all, and there should be
 a significant improvement in code size. But this is only an opinion. I am
 willing to submit a benchmarking patch if there is interest.

The reason I replied to this was not to rant [ranters anonymous?] but
to express interest in this comparison. I'd like to see that. With my
8k instruction cache, any refactoring is welcome =)

 
 Best,
 
 -àngel
 

Cheers,
-scott




Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-21 Thread Dan Sugalski

At 10:26 AM -0700 7/21/02, Scott Walters wrote:
It is pretty clear that no one is happy with the keyed system. It
doesn't do what people want (eg, let you use arrays as keys).
While keys are supposed to be fast, constructing them takes a series
of instructions.

Perspective:

Keys are not needed. They are an optimization over the interpreter
doing a series of lookups in containers using scalar keys.

No. They are not. You're missing the important case where the data 
structure is inherently and intrinsically multidimensional.

   my int Array @foo : dimensions(3);
   @foo[1;2;3] = 12;

Or whatever the syntax is in perl to declare a 3D array and access 
the element at (1,2,3).

@foo is a three-dimensional array of integers. There is none of 
perl's arrays of refs to arrays of refs nonsense--@foo is a single 
PMC. On top of that, each of its elements is *not* a PMC, it is an 
integer.

If we
stuck to that premise, limiting the scope, lots of ugly bad things
would go away.

I still fail to see any ugly bad things. (And, bluntly, I don't much 
care if something's ugly if it provides a speed win, and bad is a 
value judgement that I don't think is appropriate here)

I propose that keyed access do exactly eight things:

* fetch a PMC using a key
* fetch an integer using a key
* fetch a number using a key
* fetch a string using a key
* store PMC
* store int
* store num
* store string

No. Keyed access for all methods stays. You're forgetting one very 
important case, the one where the contents of an aggregate isn't a 
PMC. This scheme would then require the aggregate PMC to construct 
fake PMCs to do operations on. Bad, definitely bad.

-- 
 Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-21 Thread Dan Sugalski

At 7:06 PM +0100 7/21/02, Tom Hughes wrote:
In message 20020721174150$[EMAIL PROTECTED]
   Scott Walters [EMAIL PROTECTED] wrote:

  I propose that keyed access do exactly eight things:

  * fetch a PMC using a key
  * fetch an integer using a key
  * fetch a number using a key
  * fetch a string using a key
  * store PMC
  * store int
  * store num
  * store string

  To add to a PMC, the PMC would be fetched, then a separate instruction
  would add to it. This returns keys to their roots of merely optimizing
  access to deeply stored items.

You may well be right. I am certainly concerned about the amount of
cut and paste duplication involved at the moment.

Sounds like a good case for adding some smarts to the pmc and opcode 
preprocessor.

Your argument is orthogonal to my question though - even if we decide
to restrict keyed access to just the set opcode we still have to decide
how to encode that keyed access in the bytecode.

Keys are either constant key structs, constant integers, string 
registers, or integer registers. Encoding shouldn't be any different 
than any other constant or register. Jeff's got an opcode function 
naming scheme--I've not browsed back far enough in the discussion to 
see if it's considered insufficient or not.
-- 
 Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-21 Thread Scott Walters



Dan,

Thanks for being a good sport. I'm not in a hurry here - don't feel like you
need to be. 

   I propose that keyed access do exactly eight things:
 
   * fetch a PMC using a key
   * fetch an integer using a key
   * fetch a number using a key
   * fetch a string using a key
   * store PMC
   * store int
   * store num
   * store string
 
   To add to a PMC, the PMC would be fetched, then a separate instruction
   would add to it. This returns keys to their roots of merely optimizing
   access to deeply stored items.
 
 You may well be right. I am certainly concerned about the amount of
 cut and paste duplication involved at the moment.

 [Dan:]
 
 Sounds like a good case for adding some smarts to the pmc and opcode 
 preprocessor.
 

This would only automate the generation of large amounts of code, not get rid
of the large amount of code being generated. Once again, my complaint here is that
the L2 cache would buckle under the weight of a dozen PMCs each defining a few dozen
recursive accessors. The performance gain of making the code smaller is worth 
taking two iterations through the main interpreter.
Stepping back a bit, my suggestion would be more general: factor out the recursive
fetch from the operation. Keeping that as a single PVM op, the PVM could use the
vtable API to fetch the buried PMC/int/num/string, then fall through to the logic
that performs operations on immediately available values. Having two separate VM
ops isn't strictly required to factor this out. Restating, any time you massively
cut and paste code, it is an indication of trouble ;)

1) KEY *'s and their atoms are allocated from memory. The memory allocation
   hurts more than it saves, cycle wise.
   The number of CPU cycles used doing a virtual machine instruction *pales*
   in comparison to what is needed to allocate memory.
 
 Wrong. Keys are fixed sized objects, and perfect candidates for the 
 object allocator, which is screamingly fast.

 variable *contents* requires changing any key structure. This:
 
 for my $i (1..1000) {
   for my $j (1..1000) {
     for my $k (1..1000) {
       @foo[$i;$j;$k]++;
     }
   }
 }
 
 requires the construction of exactly *one* key structure, and 
 requires that it be updated exactly *once*, through the entire loop. 

 PMCs don't move. Keys can cache pointers to them. This isn't a problem.

The definition I have for KEY * here is a linked list. However, if you're
right that nested access builds one key and then reuses it, that's awesome,
and the allocation overhead is a total win.
I'm sorry I missed that gem looking at the source. The rough edges do
obscure the gems. If I can't find this gem at all, I'll be looking for pointers
on bringing it to light.

 So what? Recursion's not bad. Besides, aggregates may well be 
 multidimensional, in which case

Right, I did address that. It was part of my 'this *is* a win when...' diatribe.
When indexing objects that aren't multidim, there is a performance enhancement
that isn't insignificant and requires no change to the design. This is your
cue to jump with joy ;)

 Speed. The reason for them is speed. Generality and elegance are 
 nice, but the point is to go faster, which is why the special cases 
 are there.
 
 Given your objectives of speed, generality and elegance,
 
 I should point out here that elegance appears third in your list 
 here. (It's fourth or fifth on mine)

Ooops.

 3. Let PMCs create KEY * lists.
 
 That's what the interpreter is for. While there may not be sufficient 
 information available to the interpreter and compiler to generate 
 efficient key lists (which is possible, definitely) I can guarantee 
 you that the interpreter has more information than any individual PMC 
 vtable function will have.
 
 Keys are the interpreter's way of indicating the element of an 
 aggregate. Construction of the key belongs to the interpreter. 
 Interpretation of the key belongs to the PMC vtable functions. Leave 
 them on the appropriate sides of the wall, please.
 

I certainly have a better feel for how this is supposed to work. I know
being pestered isn't fun. I'll submit diffs on the PDDs so that this isn't
lost to the archives ;)

If I'm a little antagonistic, it's because I'm used to being in *your* seat,
and I can't stand not having the tools to get things done. That being said,
some of the people you've got volunteering are smart as a whip. Make the 
most of them. Bring them up to speed and turn them loose. You'll be very
glad you did. You don't need to turn out working code and quality documentation
to pass the torch - you can harness these people merely by passing them
well thought out ideas. Most people won't resort to debate if there is a
workable solution on the table - only when we're completely lost and
confused about why a proposal is being rejected out of hand.

A truly great language is one in which people can do things the designers
never thought of. A truly great development