Jonathan Worthington Wed, 27 Sep 2006 16:31:15 -0700

Hi,

Some first thoughts that come to mind after reading leo's two proposals.

+A typical C structure:
+
+  struct foo {
+    int a;
+    char b;
+  };
+
+could be created in PIR with:
+
+  cs = subclass 'CStruct', 'foo'   # or maybe  cs = new_c_class 'foo'
+  addattribute cs, 'a'
+  addattribute cs, 'b'
+
+The semantics of a C struture are the same as of a Parrot Class.
+But we need the types of the attributes too:
+
+Handwavingly TBD 1)
+
+with ad-hoc existing syntax:
+
+  .include "datatypes.pasm"
+  cs['a'] = .DATATYPE_INT
+  cs['b'] = .DATATYPE_CHAR
+

This (and the addattribute for native types) is one thing that would certainly simplify code generation for the .Net translator by eliminating various boxing and unboxing code that I emit now. I imagine it will help with other languages too.

+Handwavingly TBD 2)
+
+with new variants of the C<addattribute> opcode:
+
+  addattribute cs, 'a', .DATATYPE_INT
+  addattribute cs, 'b', .DATATYPE_CHAR

Certainly preferable to syntax 1.

+Probably desired and with not much effort TBD 3):
+
+  addattribute(s) cs, <<'DEF'
+    int a;
+    char b;
+  DEF

I'm not so keen on this part of the proposal. It means the CStruct PMC needs to parse the above syntax (though at least that means no additions to PIR parsing are needed to support it; then again, the previous two suggestions did not need any either).

I think if we could "magically" have the .DATATYPE_INT constants existing without needing to .include them, the previous syntax (number 2) would be preferable. It compiles down to just a sequence of bytecode instructions then, rather than a constants table entry for the string. But more importantly, all syntax checking is done at PIR compile time, whereas the string describing the struct elements and types would not be parsed until runtime, so typos in type names or general syntax errors aren't detected until then.
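To illustrate the runtime-detection worry, here is a rough C sketch of the kind of lookup a string-parsing CStruct would need. The enum and the function name are made up for illustration (they stand in for the .DATATYPE_* constants and are not existing Parrot code); the point is only that a misspelt type name cannot be rejected until this code actually runs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type codes standing in for the .DATATYPE_* constants. */
enum datatype { DT_UNKNOWN = 0, DT_INT, DT_CHAR, DT_DOUBLE };

/* Map a type name found in the DEF string to a type code.
   A typo such as "itn" only shows up here, at runtime, as DT_UNKNOWN. */
enum datatype datatype_from_name(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "int") == 0)    return DT_INT;
    if (strcmp(name, "char") == 0)   return DT_CHAR;
    if (strcmp(name, "double") == 0) return DT_DOUBLE;
    return DT_UNKNOWN;
}
```

With syntax 2 the equivalent mistake would be an unknown constant, which the PIR compiler can reject before any bytecode is produced.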

+The generalization of arbitrary attribute names would of course be
+possible too, but isn't likely needed.

Unsure what this means - please clarify this a bit.

+=head2 Syntax variant
+
+  cs = subclass 'CStruct', <<'DEF
+    struct foo {
+      int a;
+      char b;
+    };
+  DEF
+
+I.e. create all in one big step.

Same issues as above.

+=head2 Object creation and attribute usage
+
+This is straight forward and conforming to current ParrotObjects:
+
+  o = new 'foo'                 # a ManagedStruct instance
+  setattribute o, 'a', 4711
+  setattribute o, 'b', 22
+  ...
+
+The only needed extension would be C<{get,set}attribute> variants with
+natural types.

This is the real place, of course, where the .Net translator (and I think other compilers) will save on spitting out box/unbox code.

+=head2 Nested Structures
+
+  foo_cs = subclass 'CStruct', 'foo'
+  addattribute(s) foo_cs, <<'DEF'
+    int a;
+    char b;
+  DEF
+  bar_cs = subclass 'CStruct', 'bar'
+  addattribute(s) bar_cs, <<'DEF'
+    double x;
+    foo foo;        # the foo class is already defined

May I suggest changing the second foo there to something else? I know it's the attribute name, but it made me scratch my head to check something odd wasn't going on.

+    foo *fptr;
+  DEF
+  o = new 'bar'
+  setattribute o, 'x', 3.14
+  setattribute o, ['foo'; 'a'], 4711         # o.foo.a = 4711
+  setattribute o, ['fptr'; 'b'], 255

Can you describe the semantics of foo vs *foo (or *fptr as it appears in the above code) more clearly? I guess it's just that in one case the foo structure is a part of the bar one, and in the other case it's a pointer to it, like in C? But please don't rely too much on knowledge of C semantics when describing Parrot ones.
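For reference, this is the C behaviour I am assuming the proposal mirrors; whether Parrot's semantics really match is exactly what I'd like spelled out. The member name inner and the function name are mine, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int a; char b; };

struct bar {
    double      x;
    struct foo  inner;  /* the whole foo is stored inside bar */
    struct foo *fptr;   /* only an address; the foo lives elsewhere */
};

/* Write through both paths and report whether each write landed
   where C says it should. Returns 1 on success. */
int demo_embedded_vs_pointer(void)
{
    struct foo target = { 0, 0 };
    struct bar o = { 3.14, { 0, 0 }, &target };

    o.inner.a = 4711;   /* modifies storage that belongs to o */
    o.fptr->b = 22;     /* modifies target, which o merely refers to */

    assert(o.fptr == &target);
    return o.inner.a == 4711 && target.b == 22;
}
```

In the embedded case copying or freeing bar takes the foo with it; in the pointer case the lifetime of the pointed-to foo is a separate question, which is the part the PDD should state explicitly.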

+=head2 Array Structures Elements
+
+  foo_cs = subclass 'CStruct', 'foo'
+  addattribute(s) foo_cs, <<'DEF'
+    int a;
+    char b[100];
+  DEF

With bounds checking on accesses to b, right?
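Something along these lines is what I have in mind, sketched in C. The accessor name and the -1 error return are assumptions for illustration; in Parrot the failure would presumably be an exception instead.

```c
#include <assert.h>
#include <stddef.h>

#define FOO_B_LEN 100

struct foo { int a; char b[FOO_B_LEN]; };

/* Store value at b[idx]. Returns 0 on success, or -1 if idx is out of
   range, in which case the struct is left untouched. */
int foo_set_b(struct foo *f, size_t idx, char value)
{
    assert(f != NULL);
    if (idx >= FOO_B_LEN)
        return -1;
    f->b[idx] = value;
    return 0;
}
```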

+=head2 Managed vs. Unmanaged Structs
+
+The term "managed" in current structure usage defines the owner of the
+structure memory. C<ManagedStruct> means that parrot is the owner of
+the memory and that GC will eventually free the structure memory. This
+is typically used when C structures are created in parrot and passed
+into external C code.
+
+C<UnManagedStruct> means that there's some external owner of the
+structure memory. Such structures are typically return results of
+external code.

I think for safety reasons we will later want some way of letting only approved code that uses UnManagedStructs run, as with them anyone can segfault the VM in no time at all... but that's for a security PDD or something.

+This proposal alone doesn't solve all inheritance problems. It is also
+needed that the memory layout of PMCs and ParrotObjects deriving from
+PMCs is the same. E.g.
+
+  cl = subclass 'Integer', 'MyInt'
+
...
+
+With the abstraction of a C<CStruct> describing the C<Integer> PMC and
+with differently sized PMCs, we can create an object layout, where the
+C<int_val> attribute of C<Integer> and C<MyInt> are at the same
+location and of the same type.
+
+Given this (internal) definition of the C<Integer> PMC:
+
+  intpmc_cl = subclass 'CStruct', 'Integer'
+  addattribute(s) intpmc_cl, <<'DEF'
+    INTVAL int_val;            # PMC internals are hidden
+  DEF
+
+we can transparently subclass it as C<MyInt>, as all the needed
+information is present in the C<CStruct intpmc_cl> class.
+

Maybe a side issue, but how do you propose dealing with languages that allow:

class A {
   private int x;
   ...
}

class B is A {
   private int x; /* Parent's x not visible, but name is the same. */
   ...
}

Where methods in A will access the x defined in A and methods in B will access the x defined in B?

+=head1 Implementation
+
+C<CStruct> is basically yet another PMC and can be implemented and put
+to functionality without any interference with existing code. It is
+also orthogonal with possible PMC layout changes.
+
+The internals of C<CStruct> can vastly reuse code from F<src/objects.c>
+to deal with inheritance or object instantiation. The main difference
+is that attributes have additionally a type attached to it and
+consequently that the attribute offsets are calculated differently
+depending on type, alignment, and padding. These calculations are
+already done in F<unmanagedstruct.pmc>.

I am curious how this hurts our portability. Alignment and padding can differ somewhat between platforms. And don't optimizing compilers sometimes re-order struct elements for better packing? Yes, there will (should!) be flags to disable that of course, but what burden are we putting on people porting Parrot?

(Put another way: how portable is the UnmanagedStruct PMC?)
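One way to keep the porting burden visible would be a sanity check that compares hand-computed offsets with what the platform's C compiler actually does. This is only a sketch under my own assumptions: the helper names are invented, it needs a C11 compiler for _Alignof, and I have not looked at how unmanagedstruct.pmc really does its calculation.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int a; char b; };

/* Round offset up to the next multiple of align.
   align must be a power of two. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Lay out 'a' then 'b' by hand, the way a CStruct would have to,
   and return the offset it arrives at for 'b'. */
size_t computed_offset_of_b(void)
{
    size_t off = 0;
    off = align_up(off, _Alignof(int));   /* place a */
    off += sizeof(int);
    off = align_up(off, _Alignof(char));  /* place b */
    return off;
}
```

A port would then compare computed_offset_of_b() against offsetof(struct foo, b) at build or test time, so a mismatch shows up as a failing test rather than as corrupted data.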

+C<CStruct> classes can be attached to existing PMCs gradually (and by
+far not all PMCs need that abstract backing). But think e.g. of the
+C<Sub> PMC. Attaching a C<CStruct> to it, would instantly give access
+to all it's attributes and vastly simplify introspection.

A Good Thing. Also we will want an interface to get hold of the attribute names and types...

+=head1 Additional PMC attributes
+
+=head2 pmc->_next_for_GC / opmc->pmc_ext->_next_for_GC
+
+All PMCs that refer to other PMCs have a 3rd mandatory attribute
+C<_next_for_GC>, used for garbage collection, The presence of this
+attribute is signaled by the flag bit C<PObj_is_PMC_EXT_FLAG>.
+
+  +---------------+
+  |   vtable      |    
+  +---------------+
+  |   flags       |
+  +---------------+
+  | _next_for_GC  |
+  +---------------+
+  |   ...         |
+  +---------------+
+

Another side-thought - if we know the types of the things in the attributes of the PMC, can we not auto-generate the mark code for any we know are PMC* or STRING*?
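Roughly what I am picturing, as a C sketch. Everything here is hypothetical: the descriptor table, the type codes, the callback signature, and the use of void* as a stand-in for PMC* and STRING* are mine, not Parrot's GC interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute type codes. */
enum attr_type { ATTR_INT, ATTR_PMC, ATTR_STRING };

/* One entry per attribute, as a CStruct could record it. */
struct attr_desc { const char *name; enum attr_type type; size_t offset; };

typedef void (*mark_fn)(void *obj, void *ctx);

/* Example struct; void* stands in for STRING* and PMC*. */
struct sub_like { int arity; void *name; void *outer; };

/* Example callback: just counts how often it is called. */
void count_mark(void *obj, void *ctx)
{
    (void)obj;
    ++*(size_t *)ctx;
}

/* Generic mark: walk the table and call mark on every non-NULL
   PMC or STRING slot. Returns the number of slots marked. */
size_t mark_attributes(void *base, const struct attr_desc *attrs,
                       size_t n, mark_fn mark, void *ctx)
{
    size_t marked = 0;
    assert(base != NULL && attrs != NULL && mark != NULL);
    for (size_t i = 0; i < n; i++) {
        if (attrs[i].type == ATTR_PMC || attrs[i].type == ATTR_STRING) {
            void *ref = *(void **)((char *)base + attrs[i].offset);
            if (ref != NULL) {
                mark(ref, ctx);
                marked++;
            }
        }
    }
    return marked;
}
```

With a table like this, one generic routine could replace the hand-written mark code of each PMC that has a CStruct attached.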

+=head2 Locking or opmc->pmc_ext->_synchronize
+
+PMCs do no support locking universally. Creating sharable PMCs at
+runtime (from standard PMCs) is again done by transparent Refs like
+C<SharedRef> or C<STMRef>.
+
+=head2 Shared PMCs
+
+If needed, we can define shared PMCs by allocating the C<_Sync>
+structure in front of the PMC:
+
+  +---------------+
+  |   struct      |    
+  |   _Sync       |    
+  +---------------+ <--- pmc points here
+  |   vtable      |    
+  +---------------+
+  |   flags       |
+  +---------------+
+
+This works of course only, if PMCs are created as C<shared> in the
+first place. The presence of the C<_Sync> structure is stated by a PMC
+flag bit.

Why put it before the location that is pointed to? That seems confusing to me, and inconsistent with the _next_for_GC entry that is placed after the flags rather than before the PMC starts. Plus I imagine it complicates de-allocation - you have to check the flag and subtract sizeof(struct _Sync) if it's set...
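To show the de-allocation wrinkle I mean, here is a small C sketch. The struct layouts, the flag value and the function names are stand-ins I made up; only the "header in front of the pointer" idea comes from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct _Sync. It holds pointers so that its size is a
   multiple of pointer alignment and the PMC after it stays aligned. */
struct sync_hdr { void *owner; void *lock; };

struct pmc { void *vtable; unsigned flags; };

#define FLAG_SHARED 0x1u   /* hypothetical "has _Sync in front" bit */

/* Allocate a PMC, optionally with the sync header in front of it.
   Returns NULL if the allocation fails. */
struct pmc *pmc_alloc(int shared)
{
    size_t prefix = shared ? sizeof(struct sync_hdr) : 0;
    char *raw = calloc(1, prefix + sizeof(struct pmc));
    if (raw == NULL)
        return NULL;
    struct pmc *p = (struct pmc *)(raw + prefix);
    p->flags = shared ? FLAG_SHARED : 0;
    return p;
}

/* Recover the address that calloc returned: the flag has to be
   checked and the header size subtracted before freeing. */
void *pmc_raw_block(struct pmc *p)
{
    char *raw = (char *)p;
    assert(p != NULL);
    if (p->flags & FLAG_SHARED)
        raw -= sizeof(struct sync_hdr);
    return raw;
}

void pmc_free(struct pmc *p)
{
    free(pmc_raw_block(p));
}
```

Every piece of code that frees or moves a PMC has to go through something like pmc_raw_block(), which is the extra complication I would like to avoid.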


That's "all" that comes to me right now. ;-)

Thanks,

Jonathan

Re: [svn:parrot-pdd] r14774 - in trunk: . docs/pdds/clip
