Hi,

Some first thoughts that come to mind after reading Leo's two proposals.

+A typical C structure:
+
+  struct foo {
+    int a;
+    char b;
+  };
+
+could be created in PIR with:
+
+  cs = subclass 'CStruct', 'foo'   # or maybe  cs = new_c_class 'foo'
+  addattribute cs, 'a'
+  addattribute cs, 'b'
+
+The semantics of a C structure are the same as of a Parrot Class.
+But we need the types of the attributes too:
+
+Handwavingly TBD 1)
+
+with ad-hoc existing syntax:
+
+  .include "datatypes.pasm"
+  cs['a'] = .DATATYPE_INT
+  cs['b'] = .DATATYPE_CHAR
+
This (and the addattribute for native types) is one thing that would certainly simplify code generation for the .Net translator by eliminating various boxing and unboxing code that I emit now. I imagine it will help with other languages too.

+Handwavingly TBD 2)
+
+with new variants of the C<addattribute> opcode:
+  addattribute cs, 'a', .DATATYPE_INT
+  addattribute cs, 'b', .DATATYPE_CHAR
Certainly preferable to syntax 1.

+Probably desired and with not much effort TBD 3):
+
+  addattribute(s) cs, <<'DEF'
+    int a;
+    char b;
+ DEF
I'm not so keen on this part of the proposal. It means the CStruct PMC needs to parse the above syntax (on the plus side, that means no additions to the PIR parser are needed to support it, but the previous two suggestions didn't need any either).

I think if we could "magically" have the .DATATYPE_INT constants available without needing to .include them, the previous syntax (number 2) would be preferable. It compiles down to just a sequence of bytecode instructions, rather than a constants table entry for the string. More importantly, all syntax checking is done at PIR compile time, whereas the string describing the struct elements and types would not be parsed until runtime, so typos in type names or general syntax errors wouldn't be detected until then.
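To illustrate the runtime point, here is a rough C sketch of the kind of parsing the CStruct PMC would have to do on the heredoc string. parse_spec and the list of accepted type names are made up for illustration; the point is only that a misspelled type is caught when this function runs, not when the PIR is compiled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of type names a CStruct spec parser might accept. */
static const char *const known_types[] = { "int", "char", "double" };

static int is_known_type(const char *t)
{
    for (size_t i = 0; i < sizeof known_types / sizeof known_types[0]; i++)
        if (strcmp(t, known_types[i]) == 0)
            return 1;
    return 0;
}

/* Parse a spec such as "int a;\nchar b;\n". Returns the number of
   attributes, or -1 at the first unknown type or malformed line. */
static int parse_spec(const char *spec)
{
    int count = 0;
    assert(spec != NULL);
    while (*spec) {
        const char *nl = strchr(spec, '\n');
        size_t len = nl ? (size_t)(nl - spec) : strlen(spec);
        char line[128], type[32], name[32];
        int consumed = -1;

        if (len >= sizeof line)
            return -1;
        memcpy(line, spec, len);
        line[len] = '\0';
        spec += nl ? len + 1 : len;

        /* %n is only reached if the trailing ';' matched. */
        int n = sscanf(line, " %31s %31[^; ] ;%n", type, name, &consumed);
        if (n == EOF)
            continue;                 /* blank line */
        if (n != 2 || consumed < 0 || !is_known_type(type))
            return -1;                /* the error only surfaces now */
        count++;
    }
    return count;
}
```

With this sketch, parse_spec("int a;\nchar b;\n") returns 2, while the same spec with "chra b;" returns -1, but only once it is actually called.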

+The generalization of arbitrary attribute names would of course be
+possible too, but isn't likely needed.
I'm unsure what this means; could you clarify it a bit?

+=head2 Syntax variant
+
+  cs = subclass 'CStruct', <<'DEF'
+    struct foo {
+      int a;
+      char b;
+    };
+  DEF
+
+I.e. create all in one big step.
Same issues as above.

+=head2 Object creation and attribute usage
+
+This is straightforward and conforms to current ParrotObjects:
+
+  o = new 'foo'                 # a ManagedStruct instance
+  setattribute o, 'a', 4711
+  setattribute o, 'b', 22
+  ...
+
+The only needed extension would be C<{get,set}attribute> variants with
+natural types.
This is the real place, of course, where the .Net translator (and I think other compilers) will save on spitting out box/unbox code.

+=head2 Nested Structures
+
+  foo_cs = subclass 'CStruct', 'foo'
+  addattribute(s) foo_cs, <<'DEF'
+    int a;
+    char b;
+ DEF
+
+  bar_cs = subclass 'CStruct', 'bar'
+  addattribute(s) bar_cs, <<'DEF'
+    double x;
+    foo foo;        # the foo class is already defined
May I suggest changing the second foo there to something else? I know it's the attribute name, but it made me scratch my head to check something odd wasn't going on.
+    foo *fptr;
+ DEF
+
+  o = new 'bar'
+  setattribute o, 'x', 3.14
+  setattribute o, ['foo'; 'a'], 4711         # o.foo.a = 4711
+  setattribute o, ['fptr'; 'b'], 255
Can you describe the semantics of foo vs *foo (or *fptr, as it appears in the above code) more clearly? I guess it's just that in one case the foo structure is part of the bar one, and in the other case it's a pointer to it, like in C? But please don't rely too much on knowledge of C semantics when describing Parrot ones.

+=head2 Array Structure Elements
+
+  foo_cs = subclass 'CStruct', 'foo'
+  addattribute(s) foo_cs, <<'DEF'
+    int a;
+    char b[100];
+ DEF
With bounds checking on accesses to b, right?
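Something like the following is what I'd expect checked access to b to amount to underneath. It is plain C, and foo_get_b/foo_set_b are hypothetical names, not existing Parrot functions.

```c
#include <assert.h>
#include <stddef.h>

#define FOO_B_LEN 100

struct foo {
    int a;
    char b[FOO_B_LEN];
};

/* Checked accessors for the array attribute b: an index outside
   0..FOO_B_LEN-1 is rejected instead of touching memory past the
   end of the struct. Return 0 on success, -1 on a bad index. */
static int foo_get_b(const struct foo *f, size_t i, char *out)
{
    assert(f != NULL && out != NULL);
    if (i >= FOO_B_LEN)
        return -1;
    *out = f->b[i];
    return 0;
}

static int foo_set_b(struct foo *f, size_t i, char v)
{
    assert(f != NULL);
    if (i >= FOO_B_LEN)
        return -1;
    f->b[i] = v;
    return 0;
}
```

Using an unsigned index means a negative index, once converted to size_t, wraps to a very large value and is rejected by the same check.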

+=head2 Managed vs. Unmanaged Structs
+
+The term "managed" in current structure usage defines the owner of the
+structure memory. C<ManagedStruct> means that parrot is the owner of
+the memory and that GC will eventually free the structure memory. This
+is typically used when C structures are created in parrot and passed
+into external C code.
+
+C<UnManagedStruct> means that there's some external owner of the
+structure memory. Such structures are typically return results of
+external code.
+
I think for safety reasons we will later want some way of only letting approved code that uses UnManagedStructs run, as with them anyone can segfault the VM in no time at all...but that's for a security PDD or something.

+This proposal alone doesn't solve all inheritance problems. It is also
+needed that the memory layout of PMCs and ParrotObjects deriving from
+PMCs is the same. E.g.
+
+  cl = subclass 'Integer', 'MyInt'
+
...
+
+With the abstraction of a C<CStruct> describing the C<Integer> PMC and
+with differently sized PMCs, we can create an object layout, where the
+C<int_val> attribute of C<Integer> and C<MyInt> are at the same
+location and of the same type.
+
+Given this (internal) definition of the C<Integer> PMC:
+
+  intpmc_cl = subclass 'CStruct', 'Integer'
+  addattribute(s) intpmc_cl, <<'DEF'
+    INTVAL int_val;            # PMC internals are hidden
+ DEF
+
+we can transparently subclass it as C<MyInt>, as all the needed
+information is present in the C<CStruct intpmc_cl> class.
+
Maybe a side issue, but how do you propose dealing with languages that allow:

class A {
   private int x;
   ...
}

class B is A {
   private int x; /* Parent's x not visible, but name is the same. */
   ...
}

Where methods in A will access the x defined in A and methods in B will access the x defined in B?

+=head1 Implementation
+
+C<CStruct> is basically yet another PMC and can be implemented and put
+to use without any interference with existing code. It is
+also orthogonal to possible PMC layout changes.
+
+The internals of C<CStruct> can largely reuse code from F<src/objects.c>
+to deal with inheritance or object instantiation. The main difference
+is that attributes additionally have a type attached to them and
+consequently that the attribute offsets are calculated differently
+depending on type, alignment, and padding. These calculations are
+already done in F<unmanagedstruct.pmc>.
I am curious how this hurts our portability. Alignment and padding can differ somewhat between platforms. And don't optimizing compilers sometimes re-order struct elements for better packing? Yes, there will (should!) be flags to disable that of course, but what burden are we putting on people porting Parrot?

(Put another way: how portable is the UnManagedStruct PMC?)
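For what it's worth, C itself does not let compilers reorder struct members (they must appear in declaration order), so I suppose the real worry is padding and alignment, which each platform's ABI decides. Below is a sketch of the usual natural-alignment rule; I'm assuming this is roughly what unmanagedstruct.pmc does, but I haven't checked. struct field, align_up and layout are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor: just size and alignment. */
struct field { size_t size, align; };

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Lay fields out in declaration order, padding each to its alignment,
   then pad the total to the largest alignment so arrays of the struct
   stay aligned. Writes each offset to offs[] and returns the size. */
static size_t layout(const struct field *f, size_t n, size_t *offs)
{
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, f[i].align);
        offs[i] = off;
        off += f[i].size;
        if (f[i].align > max_align)
            max_align = f[i].align;
    }
    return align_up(off, max_align);
}
```

On common ABIs such as x86-64 this agrees with what the compiler reports via offsetof, but nothing in the C standard guarantees it. For structs known when Parrot is built, letting the C compiler supply the offsets (via offsetof) might be the more portable option.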

+C<CStruct> classes can be attached to existing PMCs gradually (and far
+from all PMCs need that abstract backing). But think e.g. of the
+C<Sub> PMC. Attaching a C<CStruct> to it would instantly give access
+to all its attributes and vastly simplify introspection.
A Good Thing. Also we will want an interface to get hold of the attribute names and types...

+=head1 Additional PMC attributes
+
+=head2 pmc->_next_for_GC / opmc->pmc_ext->_next_for_GC
+
+All PMCs that refer to other PMCs have a 3rd mandatory attribute
+C<_next_for_GC>, used for garbage collection. The presence of this
+attribute is signaled by the flag bit C<PObj_is_PMC_EXT_FLAG>.
+
+  +---------------+
+  |   vtable      |    
+  +---------------+
+  |   flags       |
+  +---------------+
+  | _next_for_GC  |
+  +---------------+
+  |   ...         |
+  +---------------+
+
Another side-thought - if we know the types of the things in the attributes of the PMC, can we not auto-generate the mark code for any we know are PMC* or STRING*?
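I'd imagine something along these lines: the class already knows each attribute's type and offset, so one generic routine can find the PMC* and STRING* fields, and the same table would serve the introspection interface for attribute names and types mentioned earlier. The PMC/STRING structs, attr_desc and collect_refs below are stand-ins I made up; a real version would mark each reference rather than collect it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for Parrot's PMC and STRING, for illustration only. */
typedef struct PMC { int unused; } PMC;
typedef struct STRING { int unused; } STRING;

/* Hypothetical attribute type tags and per-attribute descriptor, the
   kind of information a CStruct class would already hold. */
enum attr_type { ATTR_INTVAL, ATTR_FLOATVAL, ATTR_PMC, ATTR_STRING };

struct attr_desc {
    const char *name;
    enum attr_type type;
    size_t offset;
};

/* Descriptor-driven "mark" pass: store every non-NULL PMC* or STRING*
   attribute of obj in out[] (up to cap entries) and return how many
   were found in total. Other attribute types are skipped. */
static size_t collect_refs(const void *obj, const struct attr_desc *d,
                           size_t n, const void **out, size_t cap)
{
    size_t found = 0;
    assert(obj != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *field = (const char *)obj + d[i].offset;
        const void *ref = NULL;
        if (d[i].type == ATTR_PMC) {
            PMC *p;
            memcpy(&p, field, sizeof p);   /* read the stored PMC* */
            ref = p;
        } else if (d[i].type == ATTR_STRING) {
            STRING *s;
            memcpy(&s, field, sizeof s);   /* read the stored STRING* */
            ref = s;
        }
        if (ref != NULL) {
            if (found < cap)
                out[found] = ref;
            found++;
        }
    }
    return found;
}
```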

+=head2 Locking or opmc->pmc_ext->_synchronize
+
+PMCs do not support locking universally. Creating sharable PMCs at
+runtime (from standard PMCs) is again done by transparent Refs like
+C<SharedRef> or C<STMRef>.
+
+=head2 Shared PMCs
+
+If needed, we can define shared PMCs by allocating the C<_Sync>
+structure in front of the PMC:
+
+  +---------------+
+  |   struct      |    
+  |   _Sync       |    
+  +---------------+ <--- pmc points here
+  |   vtable      |    
+  +---------------+
+  |   flags       |
+  +---------------+
+
+This only works, of course, if PMCs are created as C<shared> in the
+first place. The presence of the C<_Sync> structure is stated by a PMC
+flag bit.
Why put it before the location that is pointed to? That seems confusing to me, and inconsistent with the _next_for_GC entry, which is placed after the flags rather than before the start of the PMC. Plus I imagine it complicates de-allocation - you have to check the flag and subtract sizeof(struct _Sync) if it's set...
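To make the de-allocation concern concrete, here is a small C sketch of a header-in-front layout. All of the names (struct Sync, struct pmc, PMC_SHARED_FLAG, pmc_alloc, pmc_sync, pmc_free) are hypothetical, not Parrot's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; names and fields are illustrative only. */
struct Sync { void *owner; int lock; };
struct pmc  { const void *vtable; unsigned flags; };

#define PMC_SHARED_FLAG 0x1u      /* hypothetical "has Sync header" bit */

/* A shared PMC is a Sync header immediately followed by the PMC. */
struct shared_pmc { struct Sync sync; struct pmc pmc; };

static struct pmc *pmc_alloc(int shared)
{
    if (!shared)
        return calloc(1, sizeof(struct pmc));
    struct shared_pmc *sp = calloc(1, sizeof *sp);
    if (sp == NULL)
        return NULL;
    sp->pmc.flags = PMC_SHARED_FLAG;
    return &sp->pmc;              /* callers only ever see the PMC */
}

/* Reaching the header needs the flag plus pointer arithmetic. */
static struct Sync *pmc_sync(struct pmc *p)
{
    assert(p->flags & PMC_SHARED_FLAG);
    struct shared_pmc *sp =
        (struct shared_pmc *)((char *)p - offsetof(struct shared_pmc, pmc));
    return &sp->sync;
}

/* Freeing must undo the same offset, or free() gets a bad pointer. */
static void pmc_free(struct pmc *p)
{
    if (p == NULL)
        return;
    if (p->flags & PMC_SHARED_FLAG)
        free((char *)p - offsetof(struct shared_pmc, pmc));
    else
        free(p);
}
```

Every path that frees or inspects such a PMC needs the flag check, and the amount to subtract is the member offset, which may include padding, rather than just sizeof(struct Sync).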

That's "all" that comes to me right now. ;-)

Thanks,

Jonathan
