Q: MMD and non PMC value (was: keyed vtables and mmd)

2004-05-01 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

 ... And... we
 move *all* the operator functions out of the vtable and into the MMD
 system. All of it.

This *all* includes vtable functions like add_int() or add_float() too,
I presume. For these we have left argument dispatch only. But what is
the right argument? A PerlInt, TclInt, PyInt (or ..Float)? Or is it
assumed to be the same as the left argument type?

leo


Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
 
 The bitshift operations on S-register contents are valid, so long as 
 the thing hanging off the register support it. Binary data ought 
 allow this. Most 8-bit string encodings will have to support it 
 whether it's a good idea or not, since you can do it now. If Jarkko 
 tells me you can do bitwise operations with unicode text now in Perl 
 5, well... we'll support it there, too, though we shan't like it at 
 all.

We can and I don't like it at all :-)  What they basically operate on
are the internal UTF-8 bit patterns, in other words utter crapola from
the viewpoint of traditional bit strings.  Especially fun was
getting the semantics of ~ to make any sense whatsoever.  None of it
is anything I want to propagate anywhere.

 I *think* most of the variable-width encodings, and the character 
 sets that sit on top of them, can reasonably forbid this.


RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote:
 Parrot, at the very low levels, makes no distinction between strings 
 and buffers--as far as it's concerned they're the same thing, and 
 either can hang off an S register. (Ultimately, when *I* talk of 
 strings I mean "a thing I can hang off an S register", though I'm in 
 danger of turning into Humpty Dumpty here.) That's part of the 
 problem. There are already bitwise operations on S-register things in 
 the core, which is OK.

Ahhh, now things are beginning to make a little more sense.  Bear with
me for a question or two more.

 
 The bitshift operations on S-register contents are valid, so long as 
 the thing hanging off the register support it. Binary data ought 
 allow this. Most 8-bit string encodings will have to support it 
 whether it's a good idea or not, since you can do it now. If Jarkko 
 tells me you can do bitwise operations with unicode text now in Perl 
 5, well... we'll support it there, too, though we shan't like it at 
 all.
 
 I *think* most of the variable-width encodings, and the character 
 sets that sit on top of them, can reasonably forbid this.

<mode=dave barry>
  Since text strings are a proper subset of a binary buffer,
which is really what the string registers are, what we've
logically got is this:
</mode>

LAYER    1          2              3

            +-- Text Ops --- (Hosted Language)
    SREG ---+
            +-- Bin Ops  --- (Hosted Language)

or maybe this:

    SREG --- Bin Ops ------ (Hosted Language)
                 |
                 +-- Text Ops --- (Hosted Language)

where semantics are found in Layers 2 and 3.  (Layer 3 could also be
merged.)

Now I think that's more or less what Parrot has, right?  Except that the
Layer 2 semantics are tracked (and locked in?) at Layer 1?  (To prevent
the aforementioned bit-shifting of WTF strings.)


-- 
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)



RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote:
 If you want, you could think of the S-register strings as mini-PMCs. 
 The encoding and charset stuff (we'll ignore language semantics for 
 the moment) are essentially small vtables that hang off the string, 
 and whatever we do with it mostly goes through those vtable functions.

Yeah, I was thinking that perhaps all the non-buffer semantics 
should have been a PMC (that then wrapped and used the SREG with 
its byte-buffer semantics).  The PMCs would then be as lax or strict
with text semantics as they needed to be, without causing semantic
interference to different languages' needs.  Everything's possible with
enough abstraction, and all that.  Slow and bulky, though.
Oh, well, two years late and a dollar short.
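The "mini-PMC" picture above — a string carrying a small vtable of encoding/charset operations, with everything going through those function pointers — can be sketched in C. The names here (mini_string, char_count, etc.) are illustrative, not Parrot's actual string API:

```c
#include <stddef.h>

/* Hypothetical mini-vtable hanging off a string.  The two-function
   interface and the names are assumptions for illustration only. */
typedef struct string_vtable {
    const char *name;
    size_t (*char_count)(const unsigned char *buf, size_t bytes);
} string_vtable;

typedef struct {
    const unsigned char *buf;
    size_t bytes;
    const string_vtable *vt;   /* all operations dispatch through this */
} mini_string;

/* Fixed-width 8-bit data: one character per byte. */
static size_t bytes_char_count(const unsigned char *b, size_t n) {
    (void)b;
    return n;
}

/* UTF-8: count bytes that are not continuation bytes (10xxxxxx). */
static size_t utf8_char_count(const unsigned char *b, size_t n) {
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if ((b[i] & 0xC0) != 0x80)
            count++;
    return count;
}

static const string_vtable byte_vt = { "binary", bytes_char_count };
static const string_vtable utf8_vt = { "utf-8",  utf8_char_count };

/* The caller never looks at the encoding itself. */
static size_t string_length(const mini_string *s) {
    return s->vt->char_count(s->buf, s->bytes);
}
```

The point of the sketch: the same buffer gets different answers depending on which vtable hangs off it, which is exactly why the bitstring question is really a question about which vtable entries exist.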

 
 Which sort of argues for putting the bitstring stuff in there 
 somewhere as well. (And may well argue for MMD on string operations, 
 but I think that makes my head hurt so I'm not going there right now)

Good 'nuff.  Thanks,

-- 
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)



MMD syntax in PMCs (was: keyed vtables and mmd)

2004-05-01 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:
 ... We rework the current pmc
 processor to take the entries that are getting tossed and
 automatically add them to the MMD tables on PMC load instead.

I've now implemented MMD for PerlInt's bitwise_xor as a test case. Syntax
looks like this:

void bitwise_xor (PMC* value, PMC* dest) {
MMD_PerlInt: {
        VTABLE_set_integer_native(INTERP, dest,
            PMC_int_val(SELF) ^ PMC_int_val(value));
    }
MMD_DEFAULT: {
        VTABLE_set_integer_native(INTERP, dest,
            PMC_int_val(SELF) ^
            VTABLE_get_integer(INTERP, value));
    }
}

This creates two functions:

Parrot_PerlInt_bitwise_xor()
Parrot_PerlInt_bitwise_xor_PerlInt()

with the body parts from above and these initializer code snippets:

{ MMD_BXOR, enum_class_PerlInt, 0,
(funcptr_t) Parrot_PerlInt_bitwise_xor },
{ MMD_BXOR, enum_class_PerlInt, enum_class_PerlInt,
(funcptr_t) Parrot_PerlInt_bitwise_xor_PerlInt }
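Entries like those feed a dispatch table keyed on (op, left type, right type), with a default entry per (op, left type) pair for unknown right types. A minimal sketch of how such a two-argument lookup might work — the struct layout, the right_type == 0 sentinel, and the names are assumptions for illustration, not Parrot's actual mmd_dispatch:

```c
#include <stddef.h>

typedef void (*funcptr_t)(int left, int right);

/* Hypothetical analogue of the initializer entries above:
   op number, left type, right type (0 = default), function pointer. */
typedef struct {
    int op, left_type, right_type;
    funcptr_t func;
} mmd_entry;

static int last_called;  /* records which body ran, for demonstration */
static void bxor_perlint_perlint(int l, int r) { (void)l; (void)r; last_called = 1; }
static void bxor_perlint_default(int l, int r) { (void)l; (void)r; last_called = 2; }

enum { MMD_BXOR = 1, enum_class_PerlInt = 10, enum_class_Integer = 11 };

static mmd_entry table[] = {
    { MMD_BXOR, enum_class_PerlInt, 0,                  bxor_perlint_default },
    { MMD_BXOR, enum_class_PerlInt, enum_class_PerlInt, bxor_perlint_perlint },
};

/* Prefer an exact (op, left, right) match; fall back to the
   right_type == 0 default entry for that (op, left) pair. */
static funcptr_t mmd_lookup(int op, int lt, int rt) {
    size_t i, n = sizeof table / sizeof table[0];
    funcptr_t dflt = NULL;
    for (i = 0; i < n; i++) {
        if (table[i].op != op || table[i].left_type != lt)
            continue;
        if (table[i].right_type == rt)
            return table[i].func;
        if (table[i].right_type == 0)
            dflt = table[i].func;
    }
    return dflt;
}
```

So a PerlInt ^ PerlInt pair hits the specialized body, and PerlInt ^ anything-else falls through to the MMD_DEFAULT body, matching the two generated functions above.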


leo


Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
  If Jarkko 
  tells me you can do bitwise operations with unicode text now in Perl 
  5, well... we'll support it there, too, though we shan't like it at 
  all.
 
 We can and I don't like it at all [...]
 None of it anything I want to propagate anywhere.

Please correct me if I'm wrong here, but I'm going to lay out my
understanding as a set of assertions:

  * Parrot will be able to convert any encoding to any other
encoding
  * though, some conversions will result in an exception, that's
still a defined behavior
  * We've agreed that only raw binary 8-bit strings make sense for
bit vector operations

So it seems to me that the obvious way to go is to have all bit-string
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.
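A minimal C sketch of that "convert to raw bytes first, then do the bit-op" scheme — the function names and the error-return convention (standing in for Parrot's exception throw) are assumptions for illustration:

```c
#include <stddef.h>

/* Returns 0 on success, -1 if any code point won't fit in a byte
   (the analogue of throwing a conversion exception). */
static int to_raw_bytes(const unsigned int *cps, size_t n,
                        unsigned char *out) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (cps[i] > 0xFF)
            return -1;
        out[i] = (unsigned char)cps[i];
    }
    return 0;
}

/* Bitwise OR of two equal-length strings of code points, performed on
   the raw-byte conversions.  Fails if either side won't convert. */
static int bit_or(const unsigned int *a, const unsigned int *b,
                  size_t n, unsigned char *out) {
    unsigned char ba[256], bb[256];
    size_t i;
    if (n > 256 || to_raw_bytes(a, n, ba) || to_raw_bytes(b, n, bb))
        return -1;
    for (i = 0; i < n; i++)
        out[i] = ba[i] | bb[i];
    return 0;
}
```

With this shape, "\x01" +| "\x02" yields "\x03", 'A' | 'B' yields 'C', and anything containing a code point above 0xFF fails at the conversion step rather than in the bit-op itself.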

This means that UTF-8 strings will be handled just fine, and (as I
understand it) some subset of Unicode-at-large will be handled as well.
In other words, the burden goes on the conversion functions, not on the
bit ops.

It's not that it's going to be meaningful in the general case, but if
you have code like:

sub foo() { return "\x01" +| "\x02" }

I would expect to get the bit-string "\x03" back even though strings
may default to Unicode in Perl 6.

You could put this on the shoulders of the client language (by saying
that the operands must be pre-converted), but that seems to be contrary
to Parrot's usual MO.

Let me know. I'm happy to do it either way, and I'll look at modifying
the other bit-string operators if they don't conform to the decision.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part


Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
 
 So it seems to me that the obvious way to go is to have all bit-string
 operations first convert to raw bytes (possibly throwing an exception)
 and then proceed to do their work.

If these conversions croak if there are code points beyond \x{ff}, I'm
fine with it.  But trying to mix \x{100} or higher just leads into silly
discontinuities (basically we would need to decide on a word width, and
I think that would be a silly move).

 This means that UTF-8 strings will be handled just fine, and (as I

Please don't mix encodings and code points.  That strings might be
serialized or stored as UTF-8 should have no consequence with bitops.

 understand it) some subset of Unicode-at-large will be handled as well.
 In other words, the burden goes on the conversion functions, not on the
 bit ops.
 
 It's not that it's going to be meaningful in the general case, but if

I'd rather have meaningful results.

 you have code like:
 
   sub foo() { return "\x01" +| "\x02" }

Please consider what happens when the operands have code points beyond 0xff.

 I would expect to get the bit-string "\x03" back even though strings
 may default to Unicode in Perl 6.

Of course.  But I would expect a horrible flaming death for
"\x{100}" +| "\x02".

 You could put this on the shoulders of the client language (by saying
 that the operands must be pre-converted), but that seems to be contrary
 to Parrot's usual MO.
 
 Let me know. I'm happy to do it either way, and I'll look at modifying
 the other bit-string operators if they don't conform to the decision.
 


-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen


Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote:

As for codepoints outside of \x00-\xff, I vote exception. I don't think
there's any other logical choice, but I think it's just an encoding
conversion exception, not a special bit-op exception (that's arm-waving,
I have not looked at Parrot's exception model yet... miles to go...)

  This means that UTF-8 strings will be handled just fine, and (as I
 
 Please don't mix encodings and code points.  That strings might be
 serialized or stored as UTF-8 should have no consequence with bitops.

What I meant was that UTF-8 IS going to be represented in a way that
will guarantee you won't get an exception when trying to do bit-ops. All
bets are off for many other encodings. While you're right that you might
get lucky, that wasn't really the point I was making. Many languages
(Perl included, I think) are going to encode strings as UTF-8 by
default, and this means that in the general case, we should not expect
exceptions to be thrown around any time we do a bit-op and 'A'|'B' will
still be 'C' :-)

 Of course.  But I would expect a horrible flaming death for
 \x{100}|+\x02.

Well, if you consider a string conversion exception to be horrible
flaming death, then I'd hate to see what you do with a divide-by-zero ;-)

None of your response sounds overly scary to me, so I'll start looking
at what Parrot does NOW for bit-string-ops and see if it needs to mutate
to fit this model. Then I'll add in the rest. Then I get to see what
evil Dan and Leo perform upon my patch ;-)
 
-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part


Re: MMD performance

2004-05-01 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

[ another MMD performance compare ]

Just an update. The last benchmark still called MMD via the vtable. Here
is now a comparison of calling MMD from the run loop:

$ parrot -C mmd-bench.imc
vtbl add  PerlInt PerlInt 1.072931
vtbl add  PerlInt Integer 1.085116
MMD  bxor PerlInt PerlInt 0.849723
MMD  bxor PerlInt Integer 0.989387

$ parrot -j mmd-bench.imc
vtbl add  PerlInt PerlInt 0.685505
vtbl add  PerlInt Integer 0.692237
MMD  bxor PerlInt PerlInt 0.628078
MMD  bxor PerlInt Integer 0.790955

JITed vtable add calls directly into the vtable, while the MMD bxor is
still a function that calls mmd_dispatch.

Compiled with -O3, 5 Meg operations on Athlon 800.

leo


Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:

> > So it seems to me that the obvious way to go is to have all bit-string
> > operations first convert to raw bytes (possibly throwing an exception)
> > and then proceed to do their work.
>
> If these conversions croak if there are code points beyond \x{ff}, I'm
> fine with it.  But trying to mix \x{100} or higher just leads into
> silly discontinuities (basically we would need to decide on a word
> width, and I think that would be a silly move).
Just FYI, the way I implemented bitwise-not so far, was to bitwise-not 
code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{FFFF} as 
uint16-sized things, and > 0x{FFFF} as uint32-sized things (but then 
bit-masking them with 0xFFFFF to make sure that they fell into a valid 
code point range). That's pretty arbitrary, but if you bitwise-not as 
though everything were 32 bits wide, you'll end up with a string 
containing no assigned code points at all (they'll all be > 0x10FFFF). 
But from a text point of view, bitwise-not on a string isn't a sensible 
operation no matter how you slice it (that is, even for 0x{00}-0x{FF}), 
so one flavor of arbitrary is just about as good as any other. We could 
also make anything > 0x{FF} map to either 0x{00} or 0x{FF}, or mask it 
with 0xFF to push it into that range. It's all pretty meaningless, as 
text transformations go, and I can't imagine anyone using it for 
anything, except maybe weak encryption.
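The width-dependent bitwise-not described above can be sketched in a few lines of C. The final mask keeping the result inside code-point range is an assumption for illustration (the message's exact constant was garbled in the archive):

```c
/* Bitwise-not a code point at the smallest "natural" width that holds
   it: 8 bits up to 0xFF, 16 bits up to 0xFFFF, otherwise 32 bits
   masked back into a plausible code-point range. */
static unsigned int cp_bitwise_not(unsigned int cp) {
    if (cp <= 0xFF)
        return (~cp) & 0xFF;
    if (cp <= 0xFFFF)
        return (~cp) & 0xFFFF;
    return (~cp) & 0xFFFFF;  /* assumed mask; keeps result <= 0x10FFFF */
}
```

Note the discontinuities this produces at the width boundaries, which is exactly the arbitrariness the message is pointing at: ~0xFF is 0x00, but ~0x100 is 0xFEFF.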

> > This means that UTF-8 strings will be handled just fine, and (as I
>
> Please don't mix encodings and code points.  That strings might be
> serialized or stored as UTF-8 should have no consequence with bitops.
Exactly. And also realize that if you bitwise-not (or shift or 
something similar) the bytes of a UTF-8 serialization of something, the 
result isn't going to be valid UTF-8, so you'd be hard-pressed to lay 
text semantics down on top of it.

> > understand it) some subset of Unicode-at-large will be handled as
> > well.  In other words, the burden goes on the conversion functions,
> > not on the bit ops.
> >
> > It's not that it's going to be meaningful in the general case, but if
>
> I'd rather have meaningful results.
Exactly--and, meaningful operations to begin with.

I'm beginning to wonder if we're going to be square-rooting strings, 
and taking the array-th root of a hash :)

JEff



Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
 On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:

 Just FYI, the way I implemented bitwise-not so far, was to bitwise-not 
 code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{FFFF} as 
 uint16-sized things, and > 0x{FFFF} as uint32-sized things (but then 
 bit-masking them with 0xFFFFF to make sure that they fell into a valid 
 code point range). That's pretty arbitrary, but if you bitwise-not as 
 though everything were 32 bits wide, you'll end up with a string 
 containing no assigned code points at all (they'll all be > 0x10FFFF). 
 But from a text point of view, bitwise-not on a string isn't a sensible 
 operation no matter how you slice it (that is, even for 0x{00}-0x{FF}), 
 so one flavor of arbitrary is just about as good as any other. We could 
 also make anything > 0x{FF} map to either 0x{00} or 0x{FF}, or mask it 
 with 0xFF to push it into that range. It's all pretty meaningless, as 
 text transformations go, and I can't imagine anyone using it for 
 anything, except maybe weak encryption.

I think Dan and I were both thinking in terms of bit-vector operations
on byte-streams for any purpose that would require such a beast. In
Perl, you have the vec function to make this slightly easier.

This is one of those places where thinking about strings as text is
highly misleading. They're used for an awful lot more.

 Exactly. And also realize that if you bitwise-not (or shift or 
 something similar) the bytes of a UTF-8 serialization of something, the 
 result isn't going to be valid UTF-8, so you'd be hard-pressed to lay 
 text semantics down on top of it.

How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
between \x00 and \xff that isn't valid? Is there a reason to ever do
bitwise operations on anything other than 8-bit codepoints?

 I'm beginning to wonder if we're going to be square-rooting strings, 
 and taking the array-th root of a hash :)

Strings are not numbers, but there's a heck of a lot of code out there
that treats existing strings as bit-vectors (note: bit vectors are not
numbers either), and that code needs to be supported, no?

Now, shift operations aren't usually part of the package, but I figured
that as long as we were going to have the rest of the bit-manipulators,
finishing off the set would be of value.

More to the point, I said all of this at the beginning of this thread.
You should not, at this point, be confused about the scope of what I
want to do, as it was very narrowly and clearly defined up-front.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part


Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
 How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
 between \x00 and \xff that isn't valid? Is there a reason to ever do

Like, half of them?  \x80 .. \xff are all invalid as UTF-8.

 bitwise operations on anything other than 8-bit codepoints?

I am very confused.  THIS IS WHAT WE ALL SEEM TO BE SAYING.  BITOPS ONLY
ON EIGHT-BIT DATA.  AM I WRONG?

 


Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 12:00 PM, Aaron Sherman wrote:

> On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
>
> > Exactly. And also realize that if you bitwise-not (or shift or
> > something similar) the bytes of a UTF-8 serialization of something,
> > the result isn't going to be valid UTF-8, so you'd be hard-pressed
> > to lay text semantics down on top of it.
>
> How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
> between \x00 and \xff that isn't valid? Is there a reason to ever do
> bitwise operations on anything other than 8-bit codepoints?
If you're dealing in terms of code points, then the UTF-8 encoding (or 
any other) has nothing to do with it.

If you are dealing in terms of bytes, then there are byte sequences 
which don't encode any code point in the UTF-8 encoding. By "valid 
UTF-8," I'm referring to the definition of that encoding (and I should 
have said "well-formed")--see section 3.9, item D36 of the Unicode 
Standard. In particular, bytes 0xC0, 0xC1, and 0xF5-0xFF cannot occur 
in UTF-8.

But if you're speaking in terms of code points, that's not relevant, 
but then neither is the encoding.
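The byte-level claim above — that certain byte values can never appear anywhere in well-formed UTF-8 — is easy to capture as a predicate. A sketch, with the reasons as comments (the function name is illustrative):

```c
/* Can this byte value appear anywhere in well-formed UTF-8?
   Per the Unicode Standard (section 3.9), 0xC0, 0xC1, and
   0xF5-0xFF can never occur; every other value is legal in
   at least one position. */
static int utf8_byte_possible(unsigned int byte) {
    if (byte == 0xC0 || byte == 0xC1)
        return 0;  /* could only start overlong two-byte encodings */
    if (byte >= 0xF5 && byte <= 0xFF)
        return 0;  /* would encode code points beyond 0x10FFFF */
    return 1;
}
```

This is also why bitwise-notting the bytes of a UTF-8 string so easily falls out of the well-formed set: ~0x41 is 0xBE, a continuation byte that now lacks a lead byte.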

> More to the point, I said all of this at the beginning of this thread.
> You should not, at this point, be confused about the scope of what I
> want to do, as it was very narrowly and clearly defined up-front.

And yet, I am confused. You said near the beginning of the thread:

> On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote:
>
> > Bitstring operations ought only be valid on binary data, though,
> > unless someone can give me a good reason why we ought to allow
> > bitshifting on Unicode. (And then give me a reasoned argument *how*,
> > too)
>
> 100% agree. If you want to play games with any other encoding, you may
> proceed to write your own damn code ;-)
Given that, I'm not sure how UTF-8 is coming into the picture.

JEff



[perl #29299] [PATCH] MSWin32 Fix spawn stdout handling and PerlNum.get_string()

2004-05-01 Thread via RT
# New Ticket Created by  Ron Blaschke 
# Please include the string:  [perl #29299]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=29299 


spawn on win32 should inherit the filehandles to the child process,
because the child is supposed to write on the parents stdout.
(t/pmc/sys.t#1)

config/gen/platform/win32/exec.c


PerlNum.get_string() should print "-0.00" for the value -0.0, but
prints "0.00" on win32.  get_string() now prints the sign symbol
itself, instead of relying on sn?printf.
(t/pmc/perlnum.t#36)

classes/perlnum.pmc
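The -0.0 fix described above boils down to testing the sign bit directly rather than trusting the C library's formatting. A simplified illustration of the idea, not the actual perlnum.pmc code (the function name is hypothetical; signbit() is C99):

```c
#include <math.h>
#include <stdio.h>

/* Print the sign symbol ourselves, so -0.0 renders as "-0.00" even on
   platforms whose printf would emit "0.00".  Note that (v < 0) cannot
   detect negative zero, since -0.0 == 0.0 compares true; signbit() can. */
static void num_to_string(double v, char *out, size_t outsize) {
    if (signbit(v))
        snprintf(out, outsize, "-%.2f", fabs(v));
    else
        snprintf(out, outsize, "%.2f", v);
}
```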


mswin32_spawn_and_perlnum.patch
Description: Binary data


Re: Win32 build fails on src\interpreter.str

2004-05-01 Thread Ron Blaschke
On Tue, 27 Apr 2004 10:09:43 +0200, Leopold Toetsch wrote:

 Does anyone need the Edit and Continue feature?
 If yes, it can be easily turned on in the local Makefile.

Just a final remark that popped up: since parrot doesn't compile with
-ZI (because of __LINE__), it would make little sense to enable it in the
Makefile.  For me that's ok, as I have never used it anyway. ;-)

Ron


[perl #29300] [PATCH] MSWin32 passing libnci.def to linker

2004-05-01 Thread via RT
# New Ticket Created by  Ron Blaschke 
# Please include the string:  [perl #29300]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=29300 


link needs to be told that libnci.def is a module definition file, via
the -def: flag.  The patch changes libnci.def to -def:libnci.def.

config/gen/makefiles/root.in


mswin32_libnci_flag.patch
Description: Binary data


[perl #29302] [PATCH] Invalid HTML doc links for Win32 Firefox

2004-05-01 Thread via RT
# New Ticket Created by  Philip Taylor 
# Please include the string:  [perl #29302]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=29302 


On a Windows system, File::Spec returns paths with backslashes. The 
HTML documentation generator (write_docs.pl etc) uses these paths in 
the HTML code, resulting in links like
<a href="docs\pdds\pdd00_pdd.pod.html">...</a>

IE handles these with no problems. Firefox (0.8) follows the link to 
the right place, but then refers to itself as something like
file:///e:/parrot/cvs/parrot/docs/html/docs%5Cpdds%5Cpdd00_pdd.pod.html
(apparently forgetting that it used to think \ was a path delimiter 
and now considering it part of the filename) and so all the relative 
links in that page, like
<a href="..\..\../html/index.html">Contents</a>
(as well as all the images and stylesheets) are incorrect.

All appears to work (in IE and Firefox) after changing 
relative_path() in lib/Parrot/IO/Directory.pm to replace backslashes 
with forward-slashes before returning. (The same can be achieved by 
altering the two link-generating bits in lib/Parrot/Docs/Item.pm, but 
I have no idea whether that would be a better place to do it.)

-- 
Philip Taylor
[EMAIL PROTECTED]


Index: parrot/lib/Parrot/IO/Directory.pm
===
RCS file: /cvs/public/parrot/lib/Parrot/IO/Directory.pm,v
retrieving revision 1.9
diff -u -b -r1.9 Directory.pm
--- parrot/lib/Parrot/IO/Directory.pm   27 Mar 2004 22:22:54 -  1.9
+++ parrot/lib/Parrot/IO/Directory.pm   1 May 2004 17:00:40 -
@@ -161,7 +161,9 @@

$path = $path->path if ref $path;

-   return File::Spec->abs2rel($path, $self->path);
+   my $rel_path = File::Spec->abs2rel($path, $self->path);
+   $rel_path =~ tr~\\~/~;
+   return $rel_path;
 }
 
=item C<parent()>


Re: Outstanding parrot issues?

2004-05-01 Thread Arthur Bergman
On 30 Apr 2004, at 12:54, Leopold Toetsch wrote:


> > ... Would it be possible for parrot to
> > provide an embedder's interface to all the (exported) functions that
> > checks whether the stack top pointer is set, and if not (ie NULL) it
> > pulls the address of a local variable in it
>
> This doesn't work:
>
>   {
>     PMC *x = pmc_new(..);
>     {
>        some_parrot_func();
>     }
>   }
>
> C<x> would be outside of the visible range of stack items. The braces
> do of course indicate stack frames.
Since in this case I am outside of parrot and have chosen to use the 
interface, I'd better use register_pmc, and if I did, then this scheme 
would work?



Arthur




Re: Outstanding parrot issues?

2004-05-01 Thread Arthur Bergman
On 30 Apr 2004, at 19:30, Leopold Toetsch wrote:

> Like it or not DOD/GC has different impacts on the embedder. Above
> rules are simple. There is no "when the PMC isn't used any more
> decrement a refcount" and "when you do that and that then increment a
> refcount" or some such like in XS. This is really simple. Simplest is
> to just set the top of stack.

I am now going to be impolite.

THERE ARE CASES WHERE YOU CAN NOT SET A TOP OF STACK, FOR EXAMPLE IF 
YOU ARE WRITING A PLUGIN TO A BINARY ONLY APPLICATION LIKE INTERNET 
EXPLORER OR WRITING AN APACHE2 SHARED LIBRARY THAT IS SUPPOSED TO WORK 
WITH PRE COMPILED BINARIES, NOT TO MENTION A LOT OF APPLICATIONS THAT 
MIGHT WANT TO EMBED PARROT AS AN OPTION MIGHT FEEL IT IS A TAD FUCKING 
UNCLEAN TO RUN THEIR ENTIRE APPLICATION THROUGH PARROT (THINK 
OPENOFFICE)

I am amazed by the fact that parrot seems determined to redo the same 
misstakes perl5 did.

Arthur



Re: Outstanding parrot issues?

2004-05-01 Thread Brent 'Dax' Royal-Gordon
Arthur Bergman wrote:
I am now going to be impolite.
Meh...

Leo: There are some embedding applications where it's simply not 
possible to get the top of the stack.  For example, let's say I want to 
write a Parrot::Interp module for Perl 5 (on a non-Ponie core):

my $i = new Parrot::Interp;
my $argv = $i->new_pmc('PerlArray');
$argv->push($i->new_pmc('PerlString')->set_string('foo'));
$i->load_bytecode("foo.pbc");
$i->run_bytecode($argv);
Now, theoretically Parrot::Interp::new should capture the top of the C 
stack, but there's no way it could do so.  If it captured an auto 
variable in its own body, that variable might not even be part of the 
stack by the time run_bytecode is invoked.

Having said that, the PMC registration technique ought to be good enough 
for this particular application.

Arthur: Embedding Parrot will never be quite as simple conceptually as 
embedding Perl.  The garbage collection system ensures that.  Even so, 
there does need to be a way to embed Parrot without having it take over 
your program--and it appears that PMC registration and other alternative 
methods of dealing with the GC will do that.  There's no need to disable 
the GC outside of a runloop, and in fact I could easily imagine someone 
using Parrot buffers and the GC system without using the runloop itself 
as a convenient memory management system for an application otherwise 
written in straight C.  (Not to mention that Parrot I/O and strings 
should be a lot nicer than the straight C equivalents...)

Parrot must be embeddable in virtually any environment Perl can be. 
That doesn't mean it has to be as easy, but it has to be possible.  If 
it isn't, we might as well give up on the embedding interface altogether.

--
Brent Dax Royal-Gordon [EMAIL PROTECTED]
Perl and Parrot hacker
Oceania has always been at war with Eastasia.


Re: Bit ops on strings

2004-05-01 Thread Andrew E Switala
It's been said that what the masses think of as "binary data" is outside
the concept of a string, and this lurker just doesn't see that.  A binary
string is a string over a character set of size two, just like an ASCII
string is a string over a character set of size 128.  [Like character
strings, so-called binary data can even have different encodings besides
the usual eight bits packed into a byte, e.g. Base64 or 7E1 (7 bits, even
parity, 1 stop bit).]  And shifting is not at all limited to bit strings.
 If I have the bit string of length 5 (1, 0, 0, 0, 1), or 17 for short,
and the text string "Hello", then I can shift the second left by two to
get "llo  " as easily as I can shift the first left by two to get 4.
(The choice of fill character is of course up for debate.) Or
arithmetically shift the second right by two to form "HHHel", analogous
to an ASR of the bit string to yield 28. AND, XOR, etc. are less
obvious because there are multiple ways to define such operations when
there are more than two truth values. But they can, at any rate, be
defined: you can apply a function of two arguments element-wise to two
character strings of equal length to produce a third character string of
the same length. It seems right to leave these ops undefined by default
for non-binary strings (since there is definitely no one right
definition), but the prevailing notion that they *can't* be applied to,
say, Unicode text without making a horrible mess is just wrong.
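The generalized shifts described above translate directly to C. A sketch of both operations on 8-bit text (function names are illustrative; the fill-character choice is, as the message says, up for debate):

```c
#include <string.h>

/* Logical shift left by n positions, filling with a chosen character:
   the text analogue of a bit-string shift ("Hello" << 2 -> "llo  "). */
static void str_shift_left(char *s, size_t n, char fill) {
    size_t len = strlen(s);
    size_t i;
    if (n > len)
        n = len;
    memmove(s, s + n, len - n);
    for (i = len - n; i < len; i++)
        s[i] = fill;
}

/* Arithmetic shift right by n: shift right, replicating the first
   character, analogous to sign extension ("Hello" >> 2 -> "HHHel"). */
static void str_shift_right_arith(char *s, size_t n) {
    size_t len = strlen(s);
    size_t i;
    if (len == 0)
        return;
    if (n > len)
        n = len;
    /* s[0..n-1] keep their old values after the move; s[0] is still
       the original first character, so it can serve as the "sign". */
    memmove(s + n, s, len - n);
    for (i = 1; i < n; i++)
        s[i] = s[0];
}
```

The bit-string cases check out the same way: 10001 (17) shifted left by two within its length of 5 gives 00100 (4), and an ASR by two gives 11100 (28).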

 Dan Sugalski [EMAIL PROTECTED] 04/30/04 10:25 PM 
At 7:07 PM -0700 4/30/04, Jeff Clites wrote:
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote:

At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something 
of a string type as raw binary data and vice versa, but don't mix 
binary data with strings: they are completely different types, and 
raw binary data should never be able to be put into a string 
register.  Maybe some blurring of binary data/strings should 
happen at the Perl layer, but Parrot should keep them as distinct 
as possible, IMHO.

I'm trying to make sure that keeping them separate is possible, but 
it's important for everyone to remember that we're limited in what 
we can do.

Parrot *can't* dictate semantics. That's not what we get to do.

But your plan seems to be very much dictating semantics--treating a 
whole class of reasonable string operations as "in that case, punt 
and throw an exception."

That's why it's overridable. I fully expect most languages will do so 
by default, but the option to leave the exceptions on as a debugging 
aid.

  And it's not clear that the semantics it is dictating in fact match 
any of the target languages (or in fact, any existing language at 
all). The at-runtime association of character set/encoding/language, 
and the semantics it implies, is what I'm referring to here.

Yep, but with the exceptions disabled things'll act the way they should.
-- 
 Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk



Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
  How are you defining valid UTF-8? Is there a codepoint in UTF-8
  between \x00 and \xff that isn't valid? Is there a reason to ever do
 
 Like, half of them?  \x80 .. \xff are all invalid as UTF-8.

Heh, damn Ken Thompson and his placemat!

I am too new to UCS and UTF-8, and had thought it was always 8-bit. I
stand corrected, having read up on the UTF-8 and Unicode FAQ.

Jeff, yeah I have to take back my statement. If Perl defaults to UTF-8,
then it's not a valid assumption that a UTF-8 input string won't throw
an exception. I still think that's ok, and better than
representation-expanding to the larger representation and doing the
bit-op in that, since that  means that bit-vectors would have to be
valid in enum_stringrep_one, _two and _four as sort of alternate
datastructures. I don't think we want to go there.

For everything else, as Jeff correctly points out, this has nothing to
do with encoding. Only in the sense that default encoding in a language
like (only one example) Perl 6 dictates what representation you will
have to expect to be the common case.

  bitwise operations on anything other than 8-bit codepoints?
 
 I am very confused.  THIS IS WHAT WE ALL SEEM TO BE SAYING.  BITOPS ONLY
 ON EIGHT-BIT DATA.  AM I WRONG?

No, it's not, and could you please not get emotional about this? It's
what you, Dan and I have been saying, but I was responding to Jeff who
said:

Just FYI, the way I implemented bitwise-not so far, was to
bitwise-not code points 0x{00}-0x{FF} as uint8-sized things,
0x{100}-0x{FFFF} as uint16-sized things, and > 0x{FFFF} as
uint32-sized things (but then bit-masking them with 0xFFFFF to
make sure that they fell into a valid code point range).

It was kind of important that I deal with the fact that I was proposing
a very different behavior for bit-shifting than exists currently for
boolean operations, I thought.

The question becomes: should I CHANGE the existing bit-ops so that they
don't work on representations in two or four bytes, for symmetry?

If this continues to be so contentious, I'm tempted to agree with the
nay-sayers and say that Parrot shouldn't do bit-vectors on strings, and
we should just implement a bit-vector class later on. Perl will just
have to suffer the overhead of translation. This just IS NOT important
enough to waste this many brain cells on.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part


Re: Strings Manifesto

2004-05-01 Thread Jeff Clites
[Finishing this discussion on p6i, since it began here.]

On Apr 28, 2004, at 5:05 PM, Larry Wall wrote:

On Wed, Apr 28, 2004 at 03:30:07PM -0700, Jeff Clites wrote:
: Outside. Conceptually, JPEG isn't a string any more than an XML
: document is an MP3.
I'm not vehemently opposed to redefining the meaning of "string"
this way, but I would like to point out that the term used to have
a more general meaning.  Witness terms like "bit string".
Good point. However, the more general usage seems to have largely 
fallen out of use (to the extent to which I'd forgotten about it until 
now). For instance, the Java String class lacks this generality. 
Additionally, ObjC's NSString and (from what I can tell) Python and 
Ruby conceive of strings as textual.

[And of course, it would be permissible in terms of English usage to 
say that a bit string isn't a string, much like a fire house isn't a 
house, and a suspected criminal isn't necessarily a criminal, and 
melted ice isn't ice.]

: Some languages make this very clear by providing a separate data type
: to hold a blob of bytes. Java uses a byte[] for this (an array of
: bytes), rather than a String. And Objective-C (via the Foundation
: framework) has an NSData class for this (whereas strings are
: represented via NSString).
Another approach is to say that (in general) strings are sequences
of abstract integers, and byte strings (and their ilk) impose size
constraint, while text strings impose various semantic constraints.
This is more in line with the historical usage of string.
Yes, though I think that this diverges from current usage (in general 
programming contexts), and more importantly promotes the confusion that 
text is inherently byte-based (or even, semantically, number-based). 
The parenthesized point there is that representing text as a 
sequence of numbers is an implementation detail--it's not inherent in 
the notion of text. The semantics of text do not imply that it is a 
semantic constraint layered on top of a sequence of numbers. In the 
vein of the Perl philosophy of making different things look different, 
I think it's important to linguistically distinguish between the two. 
Many programming languages do that, and users of those languages suffer 
less confusion in this area.

The key point is that text and uninterpreted byte sequences are 
semantically oceans apart. I'd say that as data types, byte sequences 
are semantically much simpler than hashes (for instance), and 
strings-as-text are much more complex. It makes little sense to 
bitwise-not text, or to uppercase bytes.

: (And it implies that you can uppercase a JPEG, for instance).
: Only some encodings let you get away with this--for example, not every
: byte sequence is valid UTF-8, so an arbitrary byte blob likely wouldn't
: decode if you tried to pretend that it was the UTF-8-encoded version of
: something. The major practical downside of doing something like this is
: that it leads to confusion, and propagates the viewpoint that a string
: is just a blob of bytes. And the conceptual downside is that if a
: string is fundamentally intended to represent textual data, then it
: doesn't make much sense to use it to represent something non-textual.
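The UTF-8 point is easy to demonstrate: an arbitrary byte blob
generally is not the UTF-8 encoding of any text at all. A quick sketch
(Python here, but any strict UTF-8 decoder behaves the same way):

```python
# Arbitrary bytes: 0xC0 and 0xFF can never appear in well-formed UTF-8.
blob = bytes([0xC0, 0xFF, 0xFE])

try:
    blob.decode("utf-8")
    decoded = True
except UnicodeDecodeError:
    decoded = False

print(decoded)  # False: this blob doesn't decode as UTF-8 text
```

So a bytes-to-text "cast" is really a decode, and it can fail; the two
types are not interchangeable views of the same data.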

I think of a string as a fundamental data type that can be *used* to
represent text when properly typed.  But strings are more fundamental
than text--you can have a string of tokens, for instance.  Just because
various string types were confused in the past is no reason to settle
on a single string type as the only true string.  If you can do it,
fine, but you'll have to come up with a substitute name for the more
general concept, or you're going to be fighting the culture continually
from here on out.  I don't like culture wars...
I think the more general concept is array.

The major problem with using "string" for the more general concept is 
confusion. People do tend to get really confused here. If you define 
"string of blahs" to mean "sequence of blahs" (to match the historical 
usage), that's on its face reasonable. But people jump to the 
conclusion that a string-as-bytes is re-interpretable as a 
string-as-text (and vice-versa) via something like a cast--a 
reinterpretation of the bytes of some in-memory representation. As a 
general sequence, one wouldn't be tempted to think that a 
string-of-quaternions was necessarily re-interpretable as a 
string-of-PurchaseOrders. I don't think it's culturally possible to 
shake this text-is-really-just-bytes view without using distinct 
terminology.

I'm not vehemently opposed to jettisoning the word "string" entirely, 
and instead using "Text" and "Sequence" for the above concepts--that's 
the usual way to deal with an ambiguous term. But the downside is that 
it forms a learning barrier for people coming from other languages. I 
think that "string" meaning text, and "array" meaning general sequence, 
would be the most consistent with current general usage. But my main 
concern is that we distinguish 

Re: Outstanding parrot issues?

2004-05-01 Thread Leopold Toetsch
Arthur Bergman [EMAIL PROTECTED] wrote:

 THERE ARE CASES

Arthur, please let's quietly talk about possible issues.

Many libraries that you want to use demand that you call
The_lib_init(bla). This isn't inappropriate, it's a rule. (dot).
Parrot is GC based. (dot).

This imposes different semantics on embedders. I've listed four
different, very simple ways to not get your PMC collected too early.

GC and refcounting are different schemes to achieve the same thing. You
know that. But nevertheless you have to follow these GC-specific rules.
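The GC-specific rule boils down to: keep an explicit root for anything
the collector can't see on its own. A self-contained sketch of the idea
(Python stand-ins; `PMC`, `register`, and `unregister` are hypothetical
names for this illustration, not the Parrot API):

```python
import gc
import weakref

class PMC:
    """Stand-in for an embedder-created PMC."""

registry = set()   # explicit root set the collector always scans

def register(pmc):
    registry.add(pmc)

def unregister(pmc):
    registry.discard(pmc)

p = PMC()
register(p)
ref = weakref.ref(p)
del p                            # embedder drops its only reference
gc.collect()
anchored = ref() is not None     # still alive: rooted by the registry

unregister(ref())
gc.collect()
collected = ref() is None        # gone: nothing roots it any more
print(anchored, collected)
```

As you can see, the register/unregister pair plays the same role as an
incref/decref pair; only the mechanism underneath differs.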

leo


Re: Outstanding parrot issues?

2004-05-01 Thread Leopold Toetsch
Brent 'Dax' Royal-Gordon [EMAIL PROTECTED] wrote:
 Arthur Bergman wrote:
 I am now going to be impolite.

 Meh...

 Leo: There are some embedding applications where it's simply not
 possible to get the top of the stack.

Not possible, or some of ... just don't like that ;)

 write a Parrot::Interp module for Perl 5 (on a non-Ponie core):

   my $i=new Parrot::Interp;
  my $argv=$i->new_pmc('PerlArray');

If there is such an interface, it's responsible for anchoring the PMC.
This is *one* simple rule.

Shit: really. I don't get it. Please read (again):

$ perldoc perlguts
/increment
... and does not increment
... do not increment the reference count
... As a side effect, it increments
... has been incremented to two.

...If it is not the same as the sv
   argument, the reference count of the obj object is
   incremented.  If it is the same, or if the how argument is
   PERL_MAGIC_arylen, or if it is a NULL pointer, then obj is
   merely stored, without the reference count being
   incremented.

*That could make me cry*

Think different,
have fun,

leo


Re: Outstanding parrot issues?

2004-05-01 Thread Arthur Bergman
On 2 May 2004, at 00:20, Leopold Toetsch wrote:

Arthur Bergman [EMAIL PROTECTED] wrote:

THERE ARE CASES
Arthur, please let's quietly talk about possible issues.

Many libraries that you want to use demand that you call
The_lib_init(bla). This isn't inappropriate, it's a rule. (dot).
Parrot is GC based. (dot).
Yes, but they don't demand that at the top level. By demanding it at 
the top level you cut out all non-open-source applications with a 
plugin-based API; if this is your goal, then I am going to stop playing 
right now.

This imposes different semantics on embedders. I've listed four
different, very simple ways to not get your PMC collected too early.
GC and refcounting are different schemes to achieve the same thing. You
know that. But nevertheless you have to follow these GC-specific 
rules.

Leo, I am not an idiot, please do not treat me like one. I fail to see 
how the register/unregister PMC issue is semantically different from a 
reference count.

All I want to do is:

1) create a parrot interpreter
2) create some pmcs
3) call some code inside parrot with those pmcs

Now I am fine registering the PMCs that I create and unregistering them 
afterwards, but inside the call to parrot everything should behave as 
normal. Currently there is no easy way to do this. The obvious answer 
seems to be to have the embedding interface set the top of stack in 
each embedding function if it is not already set. This would do the 
right thing and make it easy to embed parrot.

Arthur