Re: [perl #17490] Magic is useless unless verifiable.

2008-03-16 Thread Jonathan Worthington

James Keenan via RT wrote:

Jonathan:  You took this RT some time back.  Could you give us an update
on its status?  (It's the oldest outstanding RT.)
  
Resolved; PDD13 specified doing it a Different Way and that bit of PDD13 
is one of the bits I've gotten around to implementing too, so this is 
fixed in both spec and implementation.


Thanks,

Jonathan


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-28 Thread Joshua Hoblitt
On Tue, Sep 27, 2005 at 01:49:52PM -0700, Chip Salzenberg wrote:
> On Mon, Sep 26, 2005 at 03:29:52PM -1000, Joshua Hoblitt wrote:
> > An updated patch is attached.
> 
> All OK now with me, thanks.

The ASCII art of the 'padding' was wrong.  A corrected patch is
attached.

-J

--
Index: docs/parrotbyte.pod
===
--- docs/parrotbyte.pod (revision 9235)
+++ docs/parrotbyte.pod (working copy)
@@ -7,8 +7,33 @@
 
 =head1 Format of the Parrot bytecode
 
+Parrot's bytecode format consists of a small endian-neutral header region
+followed by a series of segments.  ALL words (non-bytes) following the header
+are stored in native order, unless otherwise specified.
+
+=head1 PBC Header
+
+The PBC header is a fixed 32 bytes in length.  Header values are all encoded as
+either a single byte or a string so that it can be parsed without having to
+consider the endianness of the data.
+
   0  1  2  3
   +--+--+--+--+
+  | 0xfe   0x50   0x42   0x43 |
+  +--+--+--+--+
+  | 0x0d   0x0a   0x1a   0x0a |
+  +--+--+--+--+
+
+The header begins with an eight byte I or I.
+This is equivalent to the C strings C<\376PBC\r\n\032\n> (ASCII) and
+C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a> sans the terminating C bytes.  Bytes
+0 and 4-7 are designed to catch common types of file corruption caused by
+transport encoding mechanisms (for example, FTP ASCII transfers).  This format
+was inspired by the PNG Specification.  Please see RFC 2083 for an explanation
+of the advantages of this strategy.
+
+  8  9  10 11
+  +--+--+--+--+
   | Wordsize | Byteorder|  Major   |  Minor   |
   +--+--+--+--+
 
@@ -20,7 +45,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4  5
+  12 13 14
   +--+--+--+--+
   | INT size | FloatType|  10 Byte  ...   |
   +--+--+--+--+
@@ -29,26 +54,38 @@
   |   core.ops is here|
   +--+--+--+--+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  24 25 26 27
   +--+--+--+--+
-  | Parrot Magic = 0x 13155a1 |
+  |  padding  |
   +--+--+--+--+
+  |   |
+  +--+--+--+--+
 
-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify. More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
+Following the core.ops fingerprint, the header I<must> be padded with C
+bytes to be an overall 32 bytes in length.
 
-  20*
-  +--+--+--+--+
-  | Opcode Type (Perl = 0x5045524c)   |
-  +--+--+--+--+
+All words following the header will be interpreted as Op codes.
 
-The asterisk for the offset states, from here we have opcodes. The given
-offsets are for 32 bit opcode types only.
+=head2 Magic Description
 
+The following is C description of the PBC Header format.
+
+0   string  \xfe\x50\x42\x43\x0d\x0a\x1a\x0a Parrot Bytecode (PBC)
+>10 bytex
+>11 bytex   version %2$d.%1$d,
+>8  bytex   wordsize is %d bytes,
+>9  byte=0  byteorder is little endian,
+>9  byte=1  byteorder is big endian,
+>9  byte>1  byteorder is unknown,
+>12 bytex   integers are %d bytes,
+>13 byte=0  floats are IEEE 754
+>13 byte=1  floats are i387 96-bit
+>13 byte>1  float type is unknown
+
 =head1 PBC FORMAT 1
 
 All segments are aligned at a 16 byte boundary. All segments share a common
@@ -293,6 +330,12 @@
 Eventually there will be a more complete and useful PackFile specification, but
 this simple format works well enough for now (c. Parrot 0.0.5).
 
+=head1 REFERENCES
+
+=head2 RFC 2083
+
+L
+
 =head1 SEE ALSO
 
 F, F, F, F, F, and the
@@ -306,7 +349,9 @@
 
 Variable argument opcodes update by Jonathan Worthington C<[EMAIL PROTECTED]>
 
+The header format was mangled by Joshua Hoblitt (JHOBLITT) C<[EMAIL PROTECTED]>
+
 =head1 VERSION
 
-2005.09.19
+2005.09.25
 




Re: [perl #17490] Magic is useless unless verifiable.

2005-09-27 Thread Joshua Hoblitt
Jonathan,

Chip gave an official OK via irc.

<^conner> chip, Jonathan said that he'd try to do it as part of his changes and 
commit the doc patch when he's done
 ^conner: Oh, that's a good plan

-J

--
On Tue, Sep 27, 2005 at 12:13:06PM +0100, Jonathan Worthington wrote:
> "Joshua Hoblitt" <[EMAIL PROTECTED]> wrote:
> >An updated patch is attached.
> >
> Looks good.  Provided there's no further issues brought up with it, I'll 
> put it on my "to implement" list and do it when I'm doing the changes 
> relating to the PASM/PIR debug segment (bytecode format changes are a pain, 
> so it's best to munge them together).  Then I'll apply the doc patch at the 
> same time as the implementation changes so they're kept in sync.
> 
> Jonathan 
> 




Re: [perl #17490] Magic is useless unless verifiable.

2005-09-27 Thread Chip Salzenberg
On Mon, Sep 26, 2005 at 03:29:52PM -1000, Joshua Hoblitt wrote:
> An updated patch is attached.

All OK now with me, thanks.
-- 
Chip Salzenberg <[EMAIL PROTECTED]>


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-27 Thread Jonathan Worthington

"Joshua Hoblitt" <[EMAIL PROTECTED]> wrote:

An updated patch is attached.

Looks good.  Provided there's no further issues brought up with it, I'll put 
it on my "to implement" list and do it when I'm doing the changes relating 
to the PASM/PIR debug segment (bytecode format changes are a pain, so it's 
best to munge them together).  Then I'll apply the doc patch at the same 
time as the implementation changes so they're kept in sync.


Jonathan 



Re: [perl #17490] Magic is useless unless verifiable.

2005-09-27 Thread Joshua Hoblitt
On Sun, Sep 25, 2005 at 09:43:15PM -0700, Chip Salzenberg wrote:
> On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> > *   The magic number is no longer an opcode outside the header.  It is
> > now an 8 byte magic string at the beginning of the header.
> 
> I should think four would do, but no matter.

It's so 'large' because of an idea 'borrowed' from the PNG spec.  One or
more of the bytes 0 & 4-7 are likely to be damaged by common transport
encoding errors.  I've changed my proposal to explicitly note this.

> > *   Bytes 20 through 31 are now padding so the core.op fingerprint can
> > be expanded in the future.
> 
> Marvy.  Important note: All those bytes *must* be zeros in the current
> implementation.  See below.

That was already in my proposal but I've changed the wording to include
I<must>.

> > * Do we need to keep the Opcode Type?  It's not clear to me what it's used
> >   for.
> > 
> >+--+--+--+--+
> >| Opcode Type (Perl = 0x5045524c)   |
> >+--+--+--+--+
> 
> I don't think it's useful.  A pbc file is Parrot byte code; if Parrot
> learns to translate .NET, Python, or JVM files, it'll read them in
> their native formats.

Sounds reasonable.  It's been dumped.

> 
> > * Does it make sense to use a fixed-size header?  The offset of the first
> > segment could be calculated by multiplying an "offset byte" and the
> > wordsize.
> 
> We don't have to decide that.  A fixed size header now does not
> foreclose the possibility that byte #31 will be that "how many more
> words should be considered part of the header" feature you suggest.

Fair enough.

An updated patch is attached.

-J

--
Index: docs/parrotbyte.pod
===
--- docs/parrotbyte.pod (revision 9235)
+++ docs/parrotbyte.pod (working copy)
@@ -7,8 +7,33 @@
 
 =head1 Format of the Parrot bytecode
 
+Parrot's bytecode format consists of a small endian-neutral header region
+followed by a series of segments.  ALL words (non-bytes) following the header
+are stored in native order, unless otherwise specified.
+
+=head1 PBC Header
+
+The PBC header is a fixed 32 bytes in length.  Header values are all encoded as
+either a single byte or a string so that it can be parsed without having to
+consider the endianness of the data.
+
   0  1  2  3
   +--+--+--+--+
+  | 0xfe   0x50   0x42   0x43 |
+  +--+--+--+--+
+  | 0x0d   0x0a   0x1a   0x0a |
+  +--+--+--+--+
+
+The header begins with an eight byte I or I.
+This is equivalent to the C strings C<\376PBC\r\n\032\n> (ASCII) and
+C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a> sans the terminating C bytes.  Bytes
+0 and 4-7 are designed to catch common types of file corruption caused by
+transport encoding mechanisms (for example, FTP ASCII transfers).  This format
+was inspired by the PNG Specification.  Please see RFC 2083 for an explanation
+of the advantages of this strategy.
+
+  8  9  10 11
+  +--+--+--+--+
   | Wordsize | Byteorder|  Major   |  Minor   |
   +--+--+--+--+
 
@@ -20,7 +45,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4  5
+  12 13 14
   +--+--+--+--+
   | INT size | FloatType|  10 Byte  ...   |
   +--+--+--+--+
@@ -29,26 +54,40 @@
   |   core.ops is here|
   +--+--+--+--+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  20 21 22 23
   +--+--+--+--+
-  | Parrot Magic = 0x 13155a1 |
+  |  padding  |
   +--+--+--+--+
-
-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify. More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
-
-  20*
+  |   |
   +--+--+--+--+
-  | Opcode Type (Perl = 0x5045524c)   |
+  |   |
   +--+--+--+--+
 
-The asterisk for the offset states, from here we have opcodes. The given
-offsets are for 32 bit opcode types only.
+Following the core.ops fingerprint, the header I<must> be padded with C
+bytes to be an overall 32 bytes in length.
 
+All words following the header will be interpreted as Op codes.
+
+=head2 Magic Description
+
+The fol

Re: [perl #17490] Magic is useless unless verifiable.

2005-09-25 Thread Chip Salzenberg
On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> *   Expands the header to be 32 bytes in size.

OK

> *   The magic number is no longer an opcode outside the header.  It is
> now an 8 byte magic string at the beginning of the header.

I should think four would do, but no matter.

> *   Bytes 20 through 31 are now padding so the core.op fingerprint can
> be expanded in the future.

Marvy.  Important note: All those bytes *must* be zeros in the current
implementation.  See below.

> * Do we need to keep the Opcode Type?  It's not clear to me what it's used
>   for.
> 
>+--+--+--+--+
>| Opcode Type (Perl = 0x5045524c)   |
>+--+--+--+--+

I don't think it's useful.  A pbc file is Parrot byte code; if Parrot
learns to translate .NET, Python, or JVM files, it'll read them in
their native formats.

> * Does it make sense to use a fixed-size header?  The offset of the first
> segment could be calculated by multiplying an "offset byte" and the
> wordsize.

We don't have to decide that.  A fixed size header now does not
foreclose the possibility that byte #31 will be that "how many more
words should be considered part of the header" feature you suggest.
-- 
Chip Salzenberg <[EMAIL PROTECTED]>


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-25 Thread Joshua Hoblitt
On Sun, Sep 25, 2005 at 12:24:52PM -0700, Chip Salzenberg via RT wrote:
> I think the right answer is to use a magic string rather than a
> magic number.

Leo and I have been discussing this on #parrot and we've come to the same
conclusion.  Attached is a possible patch for parrotbyte.pod that
implements a number of changes to the header region.  It:

*   Expands the header to be 32 bytes in size.
*   The magic number is no longer an opcode outside the header.  It is
now an 8 byte magic string at the beginning of the header.
*   Bytes 20 through 31 are now padding so the core.op fingerprint can
be expanded in the future.

Remaining issues are:

* Do we need to keep the Opcode Type?  It's not clear to me what it's used
for.

   +--+--+--+--+
   | Opcode Type (Perl = 0x5045524c)   |
   +--+--+--+--+

* Does it make sense to use a fixed-size header?  The offset of the first
segment could be calculated by multiplying an "offset byte" and the
wordsize.  That would allow more than enough room for growth (at least
1KB) and ensure that the first segment is always 32-bit aligned.  Leo
and I disagree on this but I think it makes sense.  Additional metadata
could be added to the header without breaking backwards compatibility.

-J

--
Index: docs/parrotbyte.pod
===
--- docs/parrotbyte.pod (revision 9235)
+++ docs/parrotbyte.pod (working copy)
@@ -7,8 +7,30 @@
 
 =head1 Format of the Parrot bytecode
 
+ALL words (non-bytes) in the bytecode file are stored in native order, unless
+otherwise specified.
+
+=head1 PBC Header
+
+A PBC file starts with a header that is a fixed 32 bytes in length.  Header
+values are all encoded as either a single byte or a string so that it can be
+parsed without having to consider the endianness of the data.
+
   0  1  2  3
   +--+--+--+--+
+  | 0xfe   0x50   0x42   0x43 |
+  +--+--+--+--+
+  | 0x0d   0x0a   0x1a   0x0a |
+  +--+--+--+--+
+
+The header begins with an eight byte  I.  This is equivalent to
+the C strings C<\376PBC\r\n\032\n> (ASCII) and
+C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a> sans the terminating C bytes.  This
+format was inspired by the PNG Specification.  Please see RFC 2083 for an
+explanation of the advantages of this strategy.
+
+  8  9  10 11
+  +--+--+--+--+
   | Wordsize | Byteorder|  Major   |  Minor   |
   +--+--+--+--+
 
@@ -20,7 +42,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4  5
+  12 13 14
   +--+--+--+--+
   | INT size | FloatType|  10 Byte  ...   |
   +--+--+--+--+
@@ -29,19 +51,43 @@
   |   core.ops is here|
   +--+--+--+--+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  20 21 22 23
   +--+--+--+--+
-  | Parrot Magic = 0x 13155a1 |
+  |  padding  |
   +--+--+--+--+
+  |   |
+  +--+--+--+--+
+  |   |
+  +--+--+--+--+
 
-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify. More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
+Following the core.ops fingerprint, the header is padded with C bytes to
+be an overall 32 bytes in length.
 
-  20*
+All words following the header will be interpreted as Op codes.
+
+=head2 Magic Description
+
+The following is C description of the PBC Header format.
+
+0   string  \xfe\x50\x42\x43\x0d\x0a\x1a\x0a Parrot Bytecode (PBC)
+>10 bytex
+>11 bytex   version %2$d.%1$d,
+>8  bytex   wordsize is %d bytes,
+>9  byte=0  byteorder is little endian,
+>9  byte=1  byteorder is big endian,
+>9  byte>1  byteorder is unknown,
+>12 bytex   integers are %d bytes,
+>13 byte=0  floats are IEEE 754
+>13 byte=1  floats are i387 96-bit
+>13 byte>1  float type is unknown
+
+FIXME: do we still need this Opcode?
+
+  32*
   +--+--+--+--+
   | Opcode Type (Perl = 0x5045524c)   |
   +--+--+--+--+
@@ -293,6 +339,12 @@
 Eventually there wi

Re: [perl #17490] Magic is useless unless verifiable.

2005-09-25 Thread Chip Salzenberg
On Thu, Sep 22, 2005 at 12:07:48PM -0400, Matt Fowles wrote:
> 
> > Mark Biggar writes:
> > > d) use a magic number that can also be used as the byte order indicator.
> 
> I have seen architectures that swap byte ordering for 8 byte things
> (like doubles) but not 4 byte things.  So that gives 3 options and
> requires an 8 byte magic number if you want to do it that way.

"Ordering" is at least three potentially independent variables: byte
order in words, word order in dwords, and dword order in quads.
Writing a quad magic number in native order thus produces eight
possible eight-byte strings in 'file' databases.  Seems like we're
not playing to the strengths of the system that way.

Worse, a quad integer can't express other variations in machine
ordering that may arise, e.g. if dword order in quad integers differs
from dword order in doubles.

I think the right answer is to use a magic string rather than a
magic number.
-- 
Chip Salzenberg <[EMAIL PROTECTED]>


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-23 Thread Joshua Hoblitt
On Thu, Sep 22, 2005 at 05:00:11PM +0100, Jonathan Worthington wrote:
> Interesting, thanks - they make some good suggestions there.  Our current 
> magic number is "13155a1" - I'm unsure of the rationale behind it, but 
> there may be a reason.  If we're going to change the packfile format, we 
> may as well make sure we're squeezing whatever use we can out of our magic 
> number.

You raise a good question; how was the magic number chosen?

> "Mark A. Biggar" <[EMAIL PROTECTED]> wrote:
> >Joshua Hoblitt wrote:
> >
> >>a) live with it
> >>b) change the magic number to be two identical bytes so the byte
> >>   ordering doesn't matter
> >>c) shrink the magic number to be a single byte

I left out another good option ... 4 identical bytes. ;)

> When I talked about doing something endian-independent, I meant something 
> along the lines of store a sequence of, say, 4 bytes that will have certain 
> values.  Forget reading the 4 bytes as an int at all, read it as a char[4] 
> and check each element is what it should be.  Makes adding support to 
> "file" easy enough, and is my preferred solution.

That would work if the magic 'number' were written as a 'string', which
it is not.  Currently on x86 the magic number as written by parrot is
0x55a1 0x0131.

I've figured out how to make C<file> understand the current scheme
but it's rather ugly.

--
16  lelong  0x013155a1 Parrot Bytecode (PBC),
>0  bytex   wordsize %d bytes,
>1  byte=0  little endian,
>1  byte=1  big endian,
>2  bytex   major %d,
>3  bytex   minor %d,
>4  bytex   sizeof(INTVAL) == %d,
>5  byte=0  FloatType is IEEE 754
>5  byte=1  FloatType is i387 `long double'

16  belong  0x013155a1 Parrot Bytecode (PBC),
>0  bytex   wordsize %d bytes,
>1  byte=0  little endian,
>1  byte=1  big endian,
>2  bytex   major %d,
>3  bytex   minor %d,
>4  bytex   sizeof(INTVAL) == %d,
>5  byte=0  FloatType is IEEE 754
>5  byte=1  FloatType is i387 `long double'
--

> So, now we have two design decisions:-
> 1) How to store the magic "number"
> 2) What the magic "number" should be

Good questions.

Cheers,

-J

--




Re: [perl #17490] Magic is useless unless verifiable.

2005-09-22 Thread Matt Fowles
Jonathan~

On 9/22/05, Jonathan Worthington <[EMAIL PROTECTED]> wrote:
> "Roger Browne" <[EMAIL PROTECTED]> wrote:
> > If you do tweak the signature for the packfile format, I suggest you
> > take a leaf out of the PNG specification and ensure that the signature
> > will robustly detect common errors such as byte order transpositions,
> > CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
> > mode), etc.
> >
> > See section 12.11 of the PNG specification:
> > http://www.faqs.org/rfcs/rfc2083.html
> >
> Interesting, thanks - they make some good suggestions there.  Our current
> magic number is "13155a1" - I'm unsure of the rationale behind it, but there
> may be a reason.  If we're going to change the packfile format, we may as
> well make sure we're squeezing whatever use we can out of our magic number.
>
> "Mark A. Biggar" <[EMAIL PROTECTED]> wrote:
> > Joshua Hoblitt wrote:
> >
> >> a) live with it
> >> b) change the magic number to be two identical bytes so the byte
> >>ordering doesn't matter
> >> c) shrink the magic number to be a single byte
> >
> When I talked about doing something endian-independent, I meant something
> along the lines of store a sequence of, say, 4 bytes that will have certain
> values.  Forget reading the 4 bytes as an int at all, read it as a char[4]
> and check each element is what it should be.  Makes adding support to "file"
> easy enough, and is my preferred solution.
>
> > d) use a magic number that can also be used as the byte order indicator.
> >
> Clever, though not sure it helps with writing something to independently
> identify a Parrot packfile, if it can be one of a number of things (though I
> guess in this case, one of only two things - unless there's some insane
> ordering scheme I've not heard of).

I have seen architectures that swap byte ordering for 8 byte things
(like doubles) but not 4 byte things.  So that gives 3 options and
requires an 8 byte magic number if you want to do it that way.

Matt
--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-Stan Kelly-Bootle, The Devil's DP Dictionary


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-22 Thread Jonathan Worthington

"Roger Browne" <[EMAIL PROTECTED]> wrote:

If you do tweak the signature for the packfile format, I suggest you
take a leaf out of the PNG specification and ensure that the signature
will robustly detect common errors such as byte order transpositions,
CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
mode), etc.

See section 12.11 of the PNG specification:
http://www.faqs.org/rfcs/rfc2083.html

Interesting, thanks - they make some good suggestions there.  Our current 
magic number is "13155a1" - I'm unsure of the rationale behind it, but there 
may be a reason.  If we're going to change the packfile format, we may as 
well make sure we're squeezing whatever use we can out of our magic number.


"Mark A. Biggar" <[EMAIL PROTECTED]> wrote:

Joshua Hoblitt wrote:


a) live with it
b) change the magic number to be two identical bytes so the byte
   ordering doesn't matter
c) shrink the magic number to be a single byte


When I talked about doing something endian-independent, I meant something 
along the lines of store a sequence of, say, 4 bytes that will have certain 
values.  Forget reading the 4 bytes as an int at all, read it as a char[4] 
and check each element is what it should be.  Makes adding support to "file" 
easy enough, and is my preferred solution.



d) use a magic number that can also be used as the byte order indicator.

Clever, though not sure it helps with writing something to independently 
identify a Parrot packfile, if it can be one of a number of things (though I 
guess in this case, one of only two things - unless there's some insane 
ordering scheme I've not heard of).


Before rushing into fixing this, it's worth pondering why the designer of 
the packfile format might have chosen to have the magic number in native 
endian format.  All I came up with was that it was a good way of making sure 
we really had transformed the input to the correct byte ordering.  If we 
didn't find out at the magic, we probably wouldn't until we got to byte 24 - 
the directory format.


So, now we have two design decisions:-
1) How to store the magic "number"
2) What the magic "number" should be

Jonathan 



Re: [perl #17490] Magic is useless unless verifiable.

2005-09-22 Thread Mark A. Biggar

Joshua Hoblitt wrote:


a) live with it
b) change the magic number to be two identical bytes so the byte
   ordering doesn't matter
c) shrink the magic number to be a single byte


d) use a magic number that can also be used as the byte order indicator.

--
[EMAIL PROTECTED]
[EMAIL PROTECTED]


Re: [perl #17490] Magic is useless unless verifiable.

2005-09-22 Thread Joshua Hoblitt
On Wed, Sep 21, 2005 at 11:44:17AM +0100, Jonathan Worthington wrote:
> "Joshua Hoblitt via RT" <[EMAIL PROTECTED]> wrote:
> >>[jhoblitt - Mon Sep 19 22:28:00 2005]:
> >>
> >>> [EMAIL PROTECTED] - Sun Sep 22 07:13:56 2002]:
> >>>
> >>> If you're going to check the magic after the wordsize and bytecode, you
> >>> might as well get rid of it altogether.
> >>>
> The only way we can *really* fix this is by not storing the magic number in 
> native endian form.  At the moment we have to read the byteorder before the 
> magic number so we can transform it into native form.
> 
> Of course, there's nothing to prevent us putting in a "hack" that says "is 
> this magic number OK in any of the byte orderings we support".

I was looking at adding pbc support to 'file' this morning and the only
way to handle that would be to test for both byte orderings of the magic
number.

> This is a design decision - Chip (or leo), which road should we go down? 
> Change the packfile format, or code around the current way we do it?

I agree.  Some possible options are:

a) live with it
b) change the magic number to be two identical bytes so the byte
   ordering doesn't matter
c) shrink the magic number to be a single byte

> >>The issue seems to be related to the jit core being in use.  I can't
> >>recreate it on amd64 (no jit)
> I can't see any way it could be something to do with the JIT core, or any 
> runcore.  We haven't even entered one at the point the above error is given.

Fair enough.  I should have said it's related to the '-j' flag.

> >Jonathan has volunteered to look into this.  Thanks.
> >
> I'll do what I can.

Your willingness to help is much appreciated.  

-J

--




Re: [perl #17490] Magic is useless unless verifiable.

2005-09-21 Thread chromatic
On Wed, 2005-09-21 at 11:44 +0100, Jonathan Worthington wrote:

> >> but I can cause a segfault from random input on x86.
> >>
> >> --
> >> $ ./parrot -j docs/running.pod
> >> Segmentation fault

> This is a Bad Thing and needs fixing.  I'll see what I can find - I don't 
> even see a segfault or any other error message under Win32, which is at least
> as bad.

It segfaults on me in Linux.  The problem is that the JIT core always
expects there to be valid op_start and op_end members in
interpreter->code, so when there's no code there, it blindly
dereferences them.  I don't have time now to trace what the other
runcores do in that situation, but I put a couple of guards in
src/interpreter.c in init_jit() and caused different errors.

-- c



Re: [perl #17490] Magic is useless unless verifiable.

2005-09-21 Thread Roger Browne
simon:
> >> > If you're going to check the magic after the wordsize and bytecode, you
> >> > might as well get rid of it altogether.
...
Jonathan:
> ...Change the packfile format, or code around the current way

If you do tweak the signature for the packfile format, I suggest you
take a leaf out of the PNG specification and ensure that the signature
will robustly detect common errors such as byte order transpositions,
CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
mode), etc.

See section 12.11 of the PNG specification:
http://www.faqs.org/rfcs/rfc2083.html

Regards,
Roger Browne



Re: [perl #17490] Magic is useless unless verifiable.

2005-09-21 Thread Jonathan Worthington

"Joshua Hoblitt via RT" <[EMAIL PROTECTED]> wrote:

[jhoblitt - Mon Sep 19 22:28:00 2005]:

> [EMAIL PROTECTED] - Sun Sep 22 07:13:56 2002]:
>
> The point of having a verifiable magic number at the start
> of a bytecode file is to avoid this sort of thing:
>
>  % ../../parrot -j mops.pasm
> PackFile_unpack: Unimplemented wordsize transform.
> File has wordsize: 35 (native is 4)
> Parrot VM: Can't unpack packfile mops.pasm.
>
> If you're going to check the magic after the wordsize and bytecode, you
> might as well get rid of it altogether.
>
The only way we can *really* fix this is by not storing the magic number in 
native endian form.  At the moment we have to read the byteorder before the 
magic number so we can transform it into native form.


Of course, there's nothing to prevent us putting in a "hack" that says "is 
this magic number OK in any of the byte orderings we support".


This is a design decision - Chip (or leo), which road should we go down? 
Change the packfile format, or code around the current way we do it?



The issue seems to be related to the jit core being in use.  I can't
recreate it on amd64 (no jit)
I can't see any way it could be something to do with the JIT core, or any 
runcore.  We haven't even entered one at the point the above error is given.



but I can cause a segfault from random input on x86.

--
$ ./parrot -j docs/running.pod
Segmentation fault
--

This is a Bad Thing and needs fixing.  I'll see what I can find - I don't 
even see a segfault or any other error message under Win32, which is at least
as bad.



Jonathan has volunteered to look into this.  Thanks.


I'll do what I can.

Jonathan