I followed up on the perlbug thread on this, but so far it hasn't
shown up in p6i, so here's a manual resend.

--- cut here ---

I am unfortunately running out of time to look more into the matter of
bytecode reading being broken on Alpha.  However, here are some notes
for those who want to try, as of src/byteorder.c 1.20 and
src/packfile.c 1.142.  First of all, note that I'm no Parrot or PBC
guru; I'm mostly going by what I think I can understand from
docs/parrotbyte.pod, version 2003.11.22.

(1) What fails is ./parrot t/native_pbc/{integer_1,number_{1,2}.t},
all of them saying:

PackFile_unpack: Not a Parrot PackFile!
Magic number was [0x4c524550] not [0x013155a1]
Parrot VM: Can't unpack packfile t/native_pbc/integer_1.pbc.
error:imcc:main: Packfile loading failed

(2) After some glaring at the hex dump of the pbc and the parrotbyte.pod
and pf/pf_items.c:PF_fetch_opcode() and src/byteorder.c:fetch_op_be()
(since pf/pf_items.c:PackFile_assign_transforms() has assigned
fetch_op_mixed() to be the transform, OPCODE_T_SIZE being 8 and
PARROT_BIGENDIAN being 0 for the 64-bit little-endian Alpha) it is
pretty obvious (?) what is happening:

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 ............p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP........
...

The fetch_op_be() reverses the eight bytes 50 45 52 4c 01 31 55 a1
to become a1 55 31 01 4c 52 45 50, and then in fetch_op_mixed()
the 0xa15531014c524550 gets masked down to 0x4c524550.
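
To make the arithmetic concrete, here is a minimal sketch of what
happens to the eight bytes at offset 16 of the dump.  The helpers
load_le64() and swap64() are hypothetical stand-ins, not Parrot's
actual fetch_op_be() or fetch_op_mixed(); they only assume the
behaviour described here (a full 64-bit byte reversal followed by
a mask to the low 32 bits).

```c
#include <stdint.h>

/* Hypothetical helper: load 8 bytes the way a little-endian 64-bit
   host (such as Alpha) would see them in a register. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: reverse all eight bytes of a 64-bit value,
   which is what a big-endian fetch amounts to on such a host. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* For the bytes a1 55 31 01 4c 52 45 50:
     load_le64(bytes)                      == 0x5045524c013155a1
     swap64(load_le64(bytes))              == 0xa15531014c524550
     swap64(load_le64(bytes)) & 0xffffffff == 0x4c524550  (the bogus magic)
     load_le64(bytes) & 0xffffffff         == 0x013155a1  (the real magic) */
```

The last line of the comment is the interesting one: without the
reversal, the low four bytes already hold the expected magic.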

(3) Now, does this make any sense?  Not to me, not right now. Allow me
to list the issues I have (or things I don't understand at the moment):

(3a) Why is fetch_op_mixed() reading in 8 bytes at a time when the
.pbc is saying the wordsize is 4 (the first byte)?  Yes, the native
wordsize is eight-sized, but the bytecode is four-sized.

(3b) The byteorder of the .pbc is 0 (the second byte), or little-endian.
Neat, that is the same as ours.  But why are we then reading the
parrot magic (offset 16) in as a big-endian (fetch_op_be()) opcode,
and therefore reversing the bytes?  Had we read in 4 bytes (see 3a)
we would have had the expected PARROT_MAGIC or 0x013155a1 right there
in the bytes a1 55 31 01.
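
As an illustration of (3a) and (3b), the following sketch reads one
word using the wordsize and byteorder that the file header itself
declares, rather than the host's.  fetch_file_word() is a made-up
name for this example, not a Parrot function, and the header layout
(wordsize in byte 0, byteorder in byte 1, magic at offset 16) is
taken from the dump and the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Parrot's API: assemble one word of
   `wordsize` bytes (1..8) starting at `p`, honoring the byte order
   the file declares (0 = little-endian, nonzero = big-endian). */
static uint64_t fetch_file_word(const unsigned char *p,
                                size_t wordsize, int big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < wordsize; i++) {
        /* Walk from the most significant byte to the least. */
        size_t idx = big_endian ? i : wordsize - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

With the dump above, fetch_file_word(pbc + 16, pbc[0], pbc[1]), that
is wordsize 4 and little-endian, yields 0x013155a1.  Because the
word is built byte by byte, the result does not depend on the host's
own word size or endianness.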

(3c) In PF_fetch_opcode() we have
    o = (pf->fetch_op)(**stream);
    *((unsigned char **) (stream)) += pf->header->wordsize;
where stream is opcode_t** (and the pf->fetch_op is here the fetch_op_mixed).
This is supposed to read in the next opcode and advance the opcode cursor.
But I have a strong suspicion and spotty evidence that this cannot work
reliably. If the opcode_t requires alignment by eight, but the packfile
(pf) bytecode header says the wordsize is four, we have just set up
a time bomb that will go off real soon: at the next opcode fetch.
(3c1) Assume *stream is X, something nicely aligned by eight.
(3c2) Assume an opcode is read.
(3c3) *stream is increased by four, it then being X+4.
(3c4) The next time around an attempt is made to call (pf->fetch_op)
with the *stream pointing to an address aligned by four but not by eight.
Kaboom.  What I mean by "spotty evidence" is that after some hacking
around and getting the PARROT_MAGIC read properly (I replaced the o&0xffffffff
with (o>>32)&0xffffffff in the last branch of fetch_op_mixed() and added
one more byte reverse for the magic in src/packfile.c:PackFile_unpack(), IIRC)
I got a SIGBUS at the o = (pf->fetch_op)(**stream) line, the next time
around.  That was the point where I had to give up hacking this.

In general it is not portable across architectures to cast aligned
(like opcode_t, or long) and "non-aligned" (char, void) pointers back
and forth (as is done at the PF_fetch_opcode() cursor increment
line).  For example on x86 I believe one can, with impunity, but all
the world's not x86.  In the case of wordsizes of the runtime and the
bytecode being different, I think only a non-aligned pointer could work
as the cursor.
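
A minimal sketch of the non-aligned-cursor idea, assuming a 4-byte
bytecode word in the host's own byte order (as in the Alpha case
above).  fetch_word32() is a hypothetical name, not an existing
Parrot function.  The cursor is an unsigned char pointer, and the
word is copied out with memcpy() instead of being dereferenced
through an opcode_t pointer, so no aligned load from a possibly
misaligned address is ever issued.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch one 4-byte word (host byte order) from
   a byte cursor and advance the cursor past it.  memcpy() has no
   alignment requirement on its source, so *cursor may point to any
   address. */
static uint32_t fetch_word32(const unsigned char **cursor)
{
    uint32_t w;
    memcpy(&w, *cursor, sizeof w);
    *cursor += sizeof w;
    return w;
}
```

Called repeatedly, this steps through X, X+4, X+8 and so on without
caring whether those addresses are aligned by eight.  A file whose
byte order differs from the host's would additionally need a swap
after the copy.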

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
