Re: [PATCH] The Code Police [1/

2001-12-30 Thread Bart Lateur

On Sun, 30 Dec 2001 16:11:35 -0800 (PST), Boris Tschirschwitz wrote:

>Yeah,
>
>int *num;
>
>is customary in C, but for some reason C++ people like to write
>
>int* num;
>
>I am sure I saw some rationale for that in gcc's C++ part, but I can't
>find it anymore. Apparently C programmers do not fall for that.

It declares a pointer, not an integer?

FWIW, it's one of the reasons I really really don't like C at all.
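
To make the complaint concrete, here is a small C11 sketch (the names p and n are made up for illustration). The '*' belongs to the declarator rather than to the type, so only the first name becomes a pointer, however the whitespace is arranged:

```c
/* The '*' binds to the declarator p, not to the type name:
   p is an int *, while n is a plain int. */
int* p, n;

/* _Generic (C11) picks a branch by the static type of its operand. */
int p_is_pointer(void) { return _Generic(p, int *: 1, default: 0); }
int n_is_pointer(void) { return _Generic(n, int *: 1, default: 0); }
```

Writing "int *p, n;" gives exactly the same declarations; it only makes the binding easier to see.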

The other main reason is the mess you likely get when trying to write
cross-platform code. Just watch these perl6 mailing lists for a while,
and you'll see that it really is a major problem.

-- 
Bart.



Re: Moving string -> number conversions to string libs

2001-12-06 Thread Bart Lateur

On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote:

>So far I have added an is_digit() call to the character type layer
>to replace the existing isdigit() calls.

There seems to be an overlap with the /\d/ character class in regexes.
Can't you use the same test? Can't you use the definition of that
character class, whatever form it may be in?

-- 
Bart.



Re: Parrot FAQ

2001-12-05 Thread Bart Lateur

On Wed, 05 Dec 2001 13:32:32 -0500, Dan Sugalski wrote:

>Right, but FORTH's not an interpreted language, generally speaking.

The old FORTHs of the '80s worked pretty much like the p-code
interpreter.

Nowadays, FORTH compilers are really optimizing compilers. There are
excellent commercial offerings, like VFX from MPE (MPE's Stephen Pelc
showed some examples in comp.lang.forth of what machine code got
generated from some FORTH code, and it was really cool -- but I can't
find it back with groups.google.com, as Stephen Pelc advertises for VFX
in every single post. I did manage to find something, see link below),
and some less ambitious free FORTHs, like iFORTH (if I'm not mistaken)
and BigFORTH. The latter has a really simple optimizer consisting of 3
screens (1 screen = 1k) of source code.

  



-- 
Bart.



Re: Parrot FAQ

2001-12-05 Thread Bart Lateur

On Tue, 04 Dec 2001 15:57:56 -0500, Dan Sugalski wrote:

>Q: Don't you know that stack machines are the way to go in software?
>A: No, in fact, I don't.
>
>Q: But look at all the successful stack-based VMs!
>A: Like what? There's just the JVM.
>
>Q: What about all the others?
>A: *What* others? That's it, unless you count perl, python, or ruby.

I thought Pascal's (ancient) p-code was a stack VM... Yup, some web
pages that I can find in a hurry, confirm that.



-- 
Bart.



Re: Benchmarking the proposed RE engine

2001-11-26 Thread Bart Lateur

On Sun, 25 Nov 2001 19:34:15 -0800, Brent Dax wrote:

>Perl 5's REs will always appear faster because Perl 5 has an
>intelligent, optimizing regex compiler.  For example, take the following
>simple regex:
>
>   /a+bc+/
>
>pregcomp will optimize that by searching for a 'b' and working outwards
>both ways from there.  (Actually, it might search for 'abc' and work
>from there; I'm not really sure.)  Without considering pregcomp's
>optimizations, that RE is pretty easy to write in Parrot:
>
>RE_0:
>   #/a+bc+/
>   rx_minlength 3
>   branch $start
>$advance:
>   rx_advance $fail
>$start:
>   rx_literal P0, "a", $advance
>$a_loop:
>   rx_literal P0, "a", $b
>   rx_pushindex P0
>   branch $a_loop
>$a_back:
>   rx_popindex P0, $advance
>$b:
>   rx_literal P0, "bc", $a_back
>$c_loop:
>   rx_literal P0, "c", $succeed
>   branch $c_loop
>$succeed:
>   rx_succeed P0
>$fail:
>   rx_fail P0

Before we go down that road for good, may I draw your attention to the
principle of a regex matcher that appeared in an article in DDJ a few
years ago? The name was "Grouse Grep", it has a website
(), but I think (I'm not absolutely
sure) that the latest version has stepped away from the principle that
made the original interesting.

That principle was, in a nutshell: implement each character class as a
jump table.

I guess that is infeasible for Unicode, but for 8 bit characters it's
easy to do.

What you need is a table of 256 entries for each state, and using the
next character in the input stream, you simply look up the address of
the next state in the lookup table. Yes, that means that you *always*
use character classes, even if just to match one literal character.

The original used i386 machine code to do it, (I think it was a
combination of LODSB to fetch the byte, and XLAT to look up the lowest
byte of the address in the jump table), but I would think that jumping
through a lookup table should be fairly easy to implement on top of
Parrot VM instructions.
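
A rough sketch of the table-per-state idea in portable C, for the /a+bc+/ example quoted above. This is not Grouse Grep's code: the state names, the table layout and match_here() are all made up for illustration, and state numbers stand in for the jump addresses the original looked up.

```c
#include <stddef.h>
#include <string.h>

/* One state per position in /a+bc+/, plus a dead state. */
enum { S_START, S_A, S_B, S_C, S_DEAD, N_STATES };

/* One 256-entry table per state: next_state[state][byte]. */
static unsigned char next_state[N_STATES][256];
static int tables_ready = 0;

static void build_tables(void)
{
    /* Every byte leads to the dead state unless stated otherwise. */
    memset(next_state, S_DEAD, sizeof next_state);
    next_state[S_START]['a'] = S_A;   /* first 'a'          */
    next_state[S_A]['a']     = S_A;   /* a+                 */
    next_state[S_A]['b']     = S_B;   /* b                  */
    next_state[S_B]['c']     = S_C;   /* first 'c'          */
    next_state[S_C]['c']     = S_C;   /* c+                 */
    tables_ready = 1;
}

/* Length of the longest match of /a+bc+/ anchored at s, or -1 if none.
   The input is treated as a NUL-terminated string of 8 bit characters. */
long match_here(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int state = S_START;
    long best = -1;
    size_t i;

    if (!tables_ready)
        build_tables();
    for (i = 0; p[i] != '\0' && state != S_DEAD; i++) {
        state = next_state[state][p[i]];   /* one lookup per input byte */
        if (state == S_C)                  /* S_C is the accepting state */
            best = (long)(i + 1);
    }
    return best;
}
```

Note that even the single literal 'b' costs a full 256-entry table here, which is the trade-off described above.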

The DDJ article appeared in November 1997, but it looks like it's not
online. (The table of contents for that issue is at
)

(Grouse Grep 2 appears to be released under the GNU license, but I
wouldn't *use* the code, only re-implement the idea.)

-- 
Bart.



Re: PMC Classes Preprocessor

2001-11-25 Thread Bart Lateur

On Sun, 25 Nov 2001 13:14:22 +, Simon Cozens wrote:

>On Sun, Nov 25, 2001 at 02:32:34AM -0500, Angel Faus wrote:
>> use Text::Balanced 'extract_bracketed';
>
>Urgh.  We need to work around this.

Can somebody fill me in exactly how this is supposed to behave?

I think that this may come close:

$_ = "  {  a { b } c { d } } { e { f } g } ";

print "Extracted: '". extract_balanced($_) . "'\n";
print "Remains: '$_'\n";

sub extract_balanced {
    my $balance = 0;
    for(shift) {
        s/^\s+//;
        /^\{/ or die "bad block open";
        while(/(\{)|(\})/g) {
            if($1) {
                $balance++;
            } else { # $2
                --$balance or return substr($_, 0, pos, "");
            }
        }
        die "Badly balanced" if $balance;
    }
}

This prints:

Extracted: '{  a { b } c { d } }'
Remains: ' { e { f } g } '

(p.s. for various reasons I am not capable of submitting an actual
patch)

-- 
Bart.



Re: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)

2001-11-23 Thread Bart Lateur

On Wed, 21 Nov 2001 13:46:09 -0500, Dan Sugalski wrote:

>Nah, using an I register as a host-machine-address for jumps doesn't argue 
>for sizeof(INTVAL) >= sizeof(void *). Instead, it argues that the design 
>that uses an int as an absolute address is wrong.
>
>I'm going to rewrite the docs and ops to use a S register instead. Now all 
>I need to do is figure out something to make S stand for that encompasses 
>both uses. (Buffer pointer and generic pointer)

That sounds equally bad. This opens the door into jumping into user data
as if it was code. Plus, will your code be garbage collected too?

-- 
Bart.



Re: Revamping the build system

2001-10-23 Thread Bart Lateur

On Tue, 23 Oct 2001 08:39:29 -0400, John Siracusa wrote:

>As one of the few rabid Mac users on this list, let me just say that I
>personally have no problem with classic Mac OS support being totally dropped
>from Parrot if it'll get stuff out the door sooner :)  Classic Mac OS is
>(somewhat sadly) a dead OS at this point.  By the time Parrot is "done",
>Apple will probably be shipping hardware that won't even *boot* classic Mac
>OS outside of a virtual machine in OS X.

I disagree. OS X is but slowly catching on. You may drop 68k support if
you want, but please don't drop MacOS 8.x/9.x for PPC. Those Macs aren't
dead yet, and most of them will never be "upgraded" to OS X.

But I am not happy of having to use a proprietary mechanism for building
things.

-- 
Bart.



Re: string weirdness

2001-10-16 Thread Bart Lateur

On Mon, 15 Oct 2001 22:12:58 -0400 (EDT), Dan Sugalski wrote:

>>doing:
>>   save  S0
>>   restore S1
>>
>>(since there's no set S1,S0)
>>
>>binds the registers together, so a change to one is a change to
>>both...which doesn't happen on int registers.

>Right. Save on a string register pushes the pointer to the string
>structure in the register onto the stack. The same thing happens with
>PMCs, or will when they're implemented.
>
>The assumption is that, when you push a register onto the stack, you'll
>then stomp on the contents of the register. (Rather than what the register
>points to...) Otherwise a push would need to create a copy of the string
>structure and a copy of the string contents.

Aren't you the guy who kept shouting "Copy on Write! Copy on Write!" all
the time? ;-)

Of course, there's a level at which this also must be implemented, and
likely the level has become just too low to do something still as
magical at this time. Perhaps it is time just to implement this copy on
write scheme, right here.

Perhaps push on a string register could be the ticket: if you do push on
a string register, you're going to modify its contents, and you still
want to hang on to the register. So it might be the perfect time to copy
the string.

Or you need a "string clone" op, and leave the "push" op alone.

-- 
Bart.



Re: Fetching the PC?

2001-10-12 Thread Bart Lateur

On Fri, 12 Oct 2001 18:18:27 +0200, Ritz Daniel wrote:

>within the vm address 0 should be address 0 of the bytecode, not the
>real cpu. but it would be nice to have a null pointer... so what about the first 
>instruction in bytecode is at vm address 1?

All you have to do is reserve the location at address 0 for a special
purpose, for example, put a "magic word" there, by which the bytecode
can be identified as such.

-- 
Bart.



Re: Fetching the PC?

2001-10-12 Thread Bart Lateur

On Thu, 11 Oct 2001 22:23:06 -0400, Dan Sugalski wrote:

>>Are they going to be segmented somehow
>>so there's a "far jump" which takes us out of the current block?
>
>Nope. Jumps and jsrs take absolute addresses, so they can go anywhere. 
>Branches are relative so fixing them up to bounce between segments would be 
>tough, but we're not going to do that. :)

So you'll actually jump to the absolute *machine* address? That way
you'll need to patch them up at load time. If they were relative to the
start of the code, you could just save them and get reloadable bytecode,
without a need for patching up anything at all.
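
To illustrate, here is a toy interpreter in C whose jump operands are offsets from the start of the code block. The opcode names and run() are invented for this sketch and have nothing to do with the real Parrot ops:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy VM. */
enum { OP_HALT, OP_INC, OP_JUMP, OP_DEC_JNZ };

/* Runs code and returns the accumulator. Jump operands are offsets
   from code[0], never machine addresses, so the same words run
   unchanged wherever the block happens to be loaded. */
int32_t run(const int32_t *code, int32_t counter)
{
    int32_t acc = 0;
    size_t pc = 0;

    for (;;) {
        switch (code[pc]) {
        case OP_INC:                     /* acc = acc + 1 */
            acc++;
            pc += 1;
            break;
        case OP_JUMP:                    /* unconditional jump */
            pc = (size_t)code[pc + 1];
            break;
        case OP_DEC_JNZ:                 /* loop while --counter != 0 */
            if (--counter != 0)
                pc = (size_t)code[pc + 1];
            else
                pc += 2;
            break;
        case OP_HALT:
        default:
            return acc;
        }
    }
}
```

Two copies of the same program at different addresses behave identically without any load-time fix-up. The price is one extra addition (base + offset) on every jump.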

Oh well. There are pros and cons to everything. I just get the feeling
that your choice as displayed here, originates in conservatism, i.e. do
the same as physical CPU's do. There's no need for that restriction.
This isn't a real CPU.

-- 
Bart.



Re: Revamping the build system

2001-10-11 Thread Bart Lateur

On Thu, 11 Oct 2001 09:59:56 -0400, Dan Sugalski wrote:

>At 06:10 PM 10/10/2001 -0700, Dave Storrs wrote:
>>Any interest in using something less painful than Make for this?  I was
>>thinking of Cons, myself...built in Perl 5 (which we are already requiring
>>you to have), and much more friendly than Make.
>
>Don't forget that our requirement for perl 5 is ultimately temporary. The 
>build system is the one thing that *can't* be in perl, since we'd need a 
>working simple makefile to build the first go-round of perl 6 which then 
>configures and rebuilds itself.

OTOH, "make" isn't very user friendly (the tabs vs. spaces thing is
notorious), and not all "make" tools work the same on all platforms.

-- 
Bart.



Re: Transcoding patch

2001-10-10 Thread Bart Lateur

On Tue, 09 Oct 2001 21:12:00 -0400, Dan Sugalski wrote:

>Does anyone handy have 
>an 8-bit set that's not US ASCII as their default character set?

EBCDIC?

Not me.

-- 
Bart.



Re: Cygwin Problems (was: [PATCH assemble.pl] Fix binary values in bytecode)

2001-10-02 Thread Bart Lateur

On Fri, 28 Sep 2001 22:53:35 +0200, Andreas Buggs Hauser wrote:

>On Friday 28 September 2001 19:55, Gibbs Tanton - tgibbs wrote:
>> Ooohh, that's bad.  Cygwin works fine for me.  What test is it failing on?
>> What version of perl?
>
>I reinstalled Cygwin with "Default Text File Type" set to Unix
>instead of DOS and the tests went ok.
>Don't know if that's good or bad.

Sniff... sniff... do I smell the need for binmode()?

-- 
Bart.



Re: Wow.

2001-09-25 Thread Bart Lateur

On Mon, 24 Sep 2001 11:29:10 -0400, Dan Sugalski wrote:

>However...
>
>I was talking about a different instance of "bitmap". More like:
>
>   newbm P3, (640, 480, 24, 8)   # Make a 640X480, 24 bit image
> # with 8 bits of alpha
>   drawline P3, (100, 100, 200, 200, green)  # Draw a green line from
> # 100, 100 to 200,200
>
>and so on.

Tell me you're joking. Perl 5 has no clue about graphics, and rightfully
so. Implementing graphics primitives into the core sounds like an
extremely bad idea. If anything belongs in a module, this is it.

What underlying graphics engine would you use? Would it provide for
anti-aliasing (aka "getting rid of the jaggies")? How about support for
text as graphics? Fonts? Etc. The list is endless.

-- 
Bart.



Re: Parrot multithreading?

2001-09-25 Thread Bart Lateur

On Thu, 20 Sep 2001 14:04:43 -0700, Damien Neil wrote:

>On Thu, Sep 20, 2001 at 04:57:44PM -0400, Dan Sugalski wrote:
>> >For clarification: do you mean async I/O, or non-blocking I/O?
>> 
>> Async. When the interpreter issues a read, for example, it won't assume the 
>> read completes immediately.
>
>That sounds like what I would call non-blocking I/O. 

Nonono. Nonblocking IO returns immediately. Async IO lets the
interpreter go on with another thread, until the read is done.

-- 
Bart.



Re: RFC: Bytecode file format

2001-09-24 Thread Bart Lateur

On Fri, 14 Sep 2001 16:42:21 -0400, Dan Sugalski wrote:

>Nope. At the very least, a bytecode file needs to start with:
>
>8-byte word:endianness (magic value 0x123456789abcdef0)
>byte:   word size
>byte[7]:empty
>word:   major version
>word:   minor version
>
>Where all word values are as big as the word size says they are.

I'm just wondering... Since we need a conversion tool for reading
non-native bytecode formats anyway, and since all bytecodes will be
limited to 32 bit... could it not be that on current day processors,
reading and converting of 32 bit bytecodes to 64 bit, if that is the
native format, could actually be faster than reading in 64 bit bytecodes
with no conversion? I would think that CPU cycles are cheap when
compared to disk I/O.
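
The conversion step itself would be tiny. A sketch, assuming the file stores opcodes as little-endian 32 bit words (the format under discussion does not say so; that and the name widen_le32 are assumptions made for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n little-endian 32 bit words from src into native 64 bit
   slots, sign-extending each value. Works on any host byte order
   because the bytes are assembled explicitly. */
void widen_le32(const unsigned char *src, size_t n, int64_t *dst)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t w = (uint32_t)src[4 * i]
                   | (uint32_t)src[4 * i + 1] << 8
                   | (uint32_t)src[4 * i + 2] << 16
                   | (uint32_t)src[4 * i + 3] << 24;
        /* Portable sign extension: subtract 2^32 when the top bit is set. */
        dst[i] = (w & 0x80000000u) ? (int64_t)w - 4294967296LL
                                   : (int64_t)w;
    }
}
```

Whether this really beats reading 64 bit words straight from disk would have to be measured; the sketch only shows that the per-word work is a handful of shifts and ORs.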

I can't test myself, I don't have that kind of machine.

-- 
Bart.



Re: Using int32_t instead of IV for code

2001-09-23 Thread Bart Lateur

On Sun, 23 Sep 2001 21:45:39 -0400, Dan Sugalski wrote:

>>We're talking bytecode. That will indeed be a case of "huge arrays of
>>tightly packed integers".
>
>For bytecode, it's not a big problem, certainly not one I'm worried about. 
>Machines that want 64-bit ints have, likely speaking, more than enough 
>memory to handle the larger bytecode.

I'm more worried about the cache. For 32 bit bytecodes, the same program
will be only half the size it is for 64 bit. Or: you can fit a program
twice as large in the cache, or two of them (for multitasking). That will
mean a speed-up, and likely a vast one for programs with sizes close
enough to the cache size.

-- 
Bart.



Re: Using int32_t instead of IV for code

2001-09-23 Thread Bart Lateur

On Thu, 13 Sep 2001 06:27:27 +0300 [ooh I'm far behind on these lists],
Jarkko Hietaniemi wrote:

>I always see this claim ("why would you use 64 bits unless you really
>need them big, they must be such a waste") being bandied around, without
>much hard numbers to support the claims.

>Unless you are thinking of huge and/or multidimensional arrays
>of tightly packed integers, I don't think you should care.

We're talking bytecode. That will indeed be a case of "huge arrays of
tightly packed integers".

-- 
Bart.



Re: Parrot coredumps on Solaris 8

2001-09-20 Thread Bart Lateur

[I'm behind on my mail  :-)]

On Wed, 12 Sep 2001 13:19:40 -0400, Dan Sugalski wrote:

>We're trying to align to a power-of-two boundary, and mask is set to 
>chop off the low bits, not the high ones. It should be something like:
>
>
>
>The calc:
>
> mem & mask + (~mask + 1)
>
>will chop the low bits off of mem, making it too small, but power-of-two 
>aligned. Then we add in the inverse of mask + 1 (in the above example, 
>that'd be 1) to jump it to the next power-of-two boundary.
>
>Horribly wasteful of memory, definitely, and the final allocation system 
>will do things better, but this is OK to start.

So to stop it wasting memory, subtract 1 first and add it again later.

((mem-1) | ~mask) + 1

or (equivalent):

(mem+~mask) & mask


mask is a constant, like -8 (...11000), so ~mask is a constant too,
like 7 (...0111).
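
Both forms written out as C functions, as a sketch: the function names are mine, the alignment is passed in rather than hard-coded, and uintptr_t is used so the arithmetic is unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Round mem up to the next multiple of align, which must be a power
   of two. A value that is already aligned is returned unchanged. */
uintptr_t align_up_or(uintptr_t mem, uintptr_t align)
{
    uintptr_t mask;

    assert(align != 0 && (align & (align - 1)) == 0);
    mask = ~(align - 1);            /* align 8 -> ...11111000 */
    return ((mem - 1) | ~mask) + 1;
}

/* The equivalent form: add align-1, then chop off the low bits. */
uintptr_t align_up_and(uintptr_t mem, uintptr_t align)
{
    uintptr_t mask;

    assert(align != 0 && (align & (align - 1)) == 0);
    mask = ~(align - 1);
    return (mem + ~mask) & mask;
}
```

With an alignment of 8, 13 rounds up to 16 while 16 stays at 16, so no block is wasted on addresses that were fine to begin with.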

There must be even more tricks like this.

-- 
Bart.



Re: Math functions? (Particularly transcendental ones)

2001-09-11 Thread Bart Lateur

On Mon, 10 Sep 2001 18:48:01 -0400, Dan Sugalski wrote:

>At 12:35 AM 9/11/2001 +0200, Bart Lateur wrote:
>>On Mon, 10 Sep 2001 17:13:44 -0400, Dan Sugalski wrote:
>>
>> >Who the heck is going to override arctangent? (No, don't tell me, I don't
>> >want to know)
>>
>>Perhaps you do. Think BigFloat. Or Complex.
>
>I'm not too worried about bigfloats, since the precision loss you get 
>converting the argument to a float isn't a big deal. (All the 
>transcendentals are only good to four or six places anyway, so...) 

Perhaps that might just be the reason to overload them? Somebody might
want higher precision transcendentals? It won't be fast...

-- 
Bart.



Re: Math functions? (Particularly transcendental ones)

2001-09-10 Thread Bart Lateur

On Mon, 10 Sep 2001 17:13:44 -0400, Dan Sugalski wrote:

>Who the heck is going to override arctangent? (No, don't tell me, I don't 
>want to know)

Perhaps you do. Think BigFloat. Or Complex.

-- 
Bart.



Re: Math functions? (Particularly transcendental ones)

2001-09-09 Thread Bart Lateur

On Sat, 08 Sep 2001 13:02:04 -0400, Dan Sugalski wrote:

>>Uri mentioned exp(x) = e^x, but I think if you are going to include
>>log2, log10, log, etc, you should also include ln.
>
>Added.

Er... aren't ln and log synonyms?

-- 
Bart.



Re: Should MY:: be a real symbol table?

2001-09-06 Thread Bart Lateur

On Mon, 03 Sep 2001 19:30:33 -0400, Dan Sugalski wrote:

>The less real question, "Should pads be hashes or arrays", can be answered 
>by "whichever is ultimately cheaper". My bet is we'll probably keep the 
>array structure with embedded names, and do a linear search for those rare 
>times you're actually looking by name.

Perhaps a lookup hash for the names, containing the offsets?

-- 
Bart.



Re: Should MY:: be a real symbol table?

2001-09-06 Thread Bart Lateur

On Mon, 03 Sep 2001 19:29:09 -0400, Ken Fox wrote:

>> *How* are they "fundamentally different"?
>
>Perl's "local" variables are dynamically scoped. This means that
>they are *globally visible* -- you never know where the actual
>variable you're using came from. If you set a "local" variable,
>all the subroutines you call see *your* definition.
>
>Perl's "my" variables are lexically scoped. This means that they
>are *not* globally visible. Lexicals can only be seen in the scope
>they are introduced and they do not get used by subroutines you
>call. This is safer and a bit easier to use because you can tell
>what code does just by reading it.
>
>> But in this case the pad is actually a full symbol table.  The
>> concept is the same, the data structure is different.
>
>The concept isn't the same. "local" variables are globals. 

This is nonsense.

First of all, currently, you can localize an element from a hash or an
array, even if the variable is lexically scoped. This works:

use Data::Dumper;
my %hash = ( foo => 42, bar => '007' );
{
 local $hash{foo} = 123;
 print "Inner: ", Dumper \%hash;
}
print "Outer: ", Dumper \%hash;
-->
Inner: $VAR1 = {
  'foo' => 123,
  'bar' => '007'
};
Outer: $VAR1 = {
  'foo' => 42,
  'bar' => '007'
};

So local and global are not one and the same concept.

Unfortunately, this doesn't work with plain lexical scalars. I wonder
why. Really.

How are globals conceptually different than, say, globally scoped
lexicals? Your description of global variables might just as well apply
to file scoped lexicals. Currently, that is the largest possible scope,
but why stop there?

Typeglobs are on the verge of extinction. Perhaps the current concept of
symbol tables may well follow the same route? A symbol table will be
different in perl6, anyway. If the implementation of lexicals is
consistently faster than that of globals, perhaps globals ought to be
implemented in the same way as lexicals?

Off the top of my head, I can already think of one reason against:
dynamic creation of new global variables, for example while loading new
source code. It's a situation you just can't have with lexicals. For
globals, it can, and will, happen, and it would require extending the
"global pad" or something like that.

-- 
Bart.



Re: Something to hash out

2001-08-27 Thread Bart Lateur

On Sat, 25 Aug 2001 18:58:50 +0100, Simon Cozens wrote:

>I was using .pas and .pac. Gotta think about 8.3ness, unfortunately.

The "8" might not be that relevant nowadays, but the "3" still matters.
On Win32, file extensions get cut off after 3 characters. So a ".html"
file is actually the same as a ".htm" file, I think.

-- 
Bart.



Re: Draft assembly PDD

2001-08-07 Thread Bart Lateur

On Mon, 06 Aug 2001 21:55:07 -0400, Dan Sugalski wrote:

>>But I do not agree that calculated jumps should be done in such a hard
>>way.
>
>Nothing hard about it, really. 

I was referring to Hong Zhang's proposal, not yours.

-- 
Bart.



Re: Draft assembly PDD

2001-08-06 Thread Bart Lateur

On Mon, 6 Aug 2001 15:41:59 -0700 , Hong Zhang wrote:

>>Branches should work from 
>> both constants and registers.
>
>Even so, the "branch #num" should have better performance, and
>it is part of any machine language. Since we already have jump 
>instruction, do we really need the "branch %r", which can be
>simulated by "add %r, %pc, #num; jump %r".

In a way, I feel like agreeing. Isn't branch #num the normal case?
Should you waste time on the normal case because you want to be able to
do exceptional stuff too?

And don't calculated jumps kill caching efficiency? (Or is this "old
CPU" wisdom?)

But I do not agree that calculated jumps should be done in such a hard
way.

-- 
Bart.



Re: Modules, Versioning, and Beyond

2001-07-30 Thread Bart Lateur

On Tue, 31 Jul 2001 07:24:45 +0200, Bart Lateur wrote:

>For example, with simple file names, it's impossible to run a perl 5.005
>and a perl 5.6 both using XML::Parser, at the same time.

It's also impossible, on Win32, to use XML::Parser and (an XS version
of) HTML::Parser at the same time, because the DLL is called
"Parser.dll" for both.

>This came up on comp.lang.perl.misc once, and Ilya Z. then wrote, IIRC,
>that there's no reason why the DLL (if I may call it this way) should
>have a name identical to the module name. His example was that on his
>port, for OS/2, he added a (machine generated) versioning string.

I looked, and found that message back on groups.google.com.

http://groups.google.com/groups?as_umsgid=8sverk%244k3%241%40charm.magnus.acs.ohio-state.edu

In case the URL got corrupted in the mail, go to "advanced search",
<http://groups.google.com/advanced_group_search>, and search for message
with ID <8sverk$4k3$[EMAIL PROTECTED]>.

-- 
Bart.



Re: Modules, Versioning, and Beyond

2001-07-30 Thread Bart Lateur

On Mon, 30 Jul 2001 22:32:54 -0400 (EDT), Sam Tregar wrote:

>On Mon, 30 Jul 2001, Dan Sugalski wrote:
>
>> When you actually use a module, the simple name (like IO) will be
>> internally expanded out to the three value thing. So if you have two
>> modules that each use a different version of the same module, they won't
>> interact because each will be dealing with a separate thing.
>
>How will this work with XS modules that load external libraries?  Won't
>trying to load two versions of mysql.so cause symbol collision?

On Windows, it most certainly will. That OS simply refuses to load more
than one DLL with the same name, and it will load a second copy of the
first one, even if you used an overriding explicit path to the file.

For example, with simple file names, it's impossible to run a perl 5.005
and a perl 5.6 both using XML::Parser, at the same time.

This came up on comp.lang.perl.misc once, and Ilya Z. then wrote, IIRC,
that there's no reason why the DLL (if I may call it this way) should
have a name identical to the module name. His example was that on his
port, for OS/2, he added a (machine generated) versioning string.

I think this is a good time to generalise that practice.

-- 
Bart.



Re: The internal string API

2001-06-28 Thread Bart Lateur

On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote:

>But a locale is a collection of user preferences.  How I want
>my dates to be formatted, how I want my strings to be sorted.

That's not right. If I do a text conversion from Windows to Mac, I would
want the source to use the CP-1252 locale, and the output the Mac-Roman
locale. If I have a file in French, and a file in Chinese, I want one to
be treated as French, and the other as Chinese.

If this can't be done, I don't need locales. I'll make my own kludges
thank you very much.

-- 
Bart.



Re: The internal string API

2001-06-20 Thread Bart Lateur

On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote:

>> * Do a substr operation by character and glyph
>
>The byte based is more useful. I have utf-8, and I want to substr it
>to another utf-8. It is painful to convert it or linear search for
>character position.

I tend to agree.

I currently use substr(), length() and read()/sysread(), based on a byte
count. It's a mindset. Even if my encoding is in (16 bit) Unicode or
UTF8, I still prefer to use bytes as my count base.

Personally, I would prefer if it stayed this way, i.e. that the raw,
non-OO keywords for the above kept counting in bytes.

Why? Just imagine processing a binary file like a JPEG file, with
embedded comments in (16-bit) Unicode. You wouldn't want Perl preventing
you from treating this comment as Unicode, or having to process this
entire binary file as Unicode, would you? I'd hate that. I want to
remain in control.

I would not mind if OO versions of these words were smarter, and did
their count in characters for whatever character mode they're set to.
For example, if $string is a UTF8 object, then $string->length may
return a length in (UTF8) characters.
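
The two ways of counting differ as soon as a string contains anything beyond ASCII. A small C sketch (utf8_chars is a made-up name, and it assumes the input is valid UTF-8):

```c
#include <stddef.h>
#include <string.h>

/* Count characters in a NUL-terminated UTF-8 string by skipping
   continuation bytes, which always look like 10xxxxxx.
   Assumes well-formed input; no validation is done. */
size_t utf8_chars(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "h\xC3\xA9llo" strlen() reports 6 bytes while utf8_chars() reports 5 characters. The byte count is a constant-time fact about the buffer; the character count needs a scan.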

-- 
Bart.



Re: More character matching bits

2001-06-15 Thread Bart Lateur

On Fri, 15 Jun 2001 06:52:32 -0400, Bryan C. Warnock wrote:

>On a side note (and this *will* sound stupid, but there is a reason I'm 
>asking).  Why is there no logical opposite to '.'; that is, a character 
>which never matches another character?  (Besides, of course, that it's 
>utterly useless from a classic regex perspective.)

You mean, like

(?!)

?

Actually that's a lookahead that always fails.

For single byte character sets, there's also

[^\000-\377]

-- 
Bart.



Re: More character matching bits

2001-06-14 Thread Bart Lateur

On Wed, 13 Jun 2001 13:39:16 -0400, Dan Sugalski wrote:

>> > Something that should be part of the core? I'll leave
>> >that for you to decide.
>>
>>Most definitely NOT.
>
>Most definitely sort of.
>
>>There is no reason to put functionality for free matching of Japanese
>>characters into the basic perl executable.
>
>No, you're right. But the core must take into account the capabilities that 
>need to be available for comparison and matching of the languages perl's 
>going to make at least some effort to support.

If you're saying that the perl core should include hooks into the regex
engine for custom character classes, I agree. But nothing more.
Currently, Perl5 provides a hook for "use locale;", but I wish there was
something more general than this, more customizable. For example, I
sometimes have user defined character encodings, that don't follow any
standard. I wish there was a simple, perl-only, way to cope with them.

Also, for example, I would like to be able to match "á" with /[a]/, but
without changing the sort order. "locale" is a bit too much "all or
nothing" for me.

-- 
Bart.



Re: More character matching bits

2001-06-13 Thread Bart Lateur

On Wed, 13 Jun 2001 01:22:32 +0100, Simon Cozens wrote:

> Something that should be part of the core? I'll leave
>that for you to decide.

Most definitely NOT.

There is no reason to put functionality for free matching of Japanese
characters into the basic perl executable. There were already voices
here, on how to strip Unicode support completely from the perl core,
because these people don't feel like they need it. So this is most
definitely going too far.

If you want free matching on Japanese text, stuff it in a module. If
you want the same kind of support for Korean, stuff it in another
module. Very few people will need both at the same time. Perhaps it can
even be put into locale...?

But, as a summary: Unicode support in the perl core should be minimal,
so nobody feels the need to strip anything.

-- 
Bart.



Re: should vtables be vtables?

2001-06-13 Thread Bart Lateur

On Wed, 13 Jun 2001 12:00:21 +0100 (BST), Dave Mitchell wrote:

>I was thinking back to the earlier discusions on opcode dispatch,
>and the fact that some people thought that a big switch was as good as,
>or possibly faster than a dispatch table. Which led me to think...

I would think that a switch could be optimized by the compiler by
turning it into a jump table. That way, it's not surprising that it "can
be as fast". Under the surface, it is the same thing!

>should we abandon vtables (ie arrays of fn pointers indexed by op),
>and just have a single handler function per type which has the op as an arg?

You mean, like Windows' "window function"? Shudder.

I wouldn't do that. For one thing, a vtable can grow dynamically,
functions can be added at runtime. It doesn't matter, the machine code
to execute remains the same for built-in, or for freshly added
functions.

In a statically compiled switch, it is impossible to add a new function,
i.e. a new switch branch.

Furthermore, unoptimized switches are definitely slower than vtables.
Internally, a switch is turned into something like:

if($fn == FN1) {
    ...
} elsif($fn == FN2) {
    ...
} elsif($fn == FN3) {
    ...
} elsif($fn == FN4) {
    ...
} elsif ...
}

The further down the chain, the slower the dispatching. Do you really
want that?
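
A sketch of the table-based alternative in C. Everything here (op_fn, register_op, dispatch, the fixed MAX_OPS limit) is invented for illustration and is not the proposed Parrot vtable layout:

```c
#include <assert.h>
#include <stddef.h>

typedef long (*op_fn)(long, long);

#define MAX_OPS 16                /* arbitrary limit for this sketch */
static op_fn vtable[MAX_OPS];
static size_t n_ops = 0;

long op_add(long a, long b) { return a + b; }
long op_mul(long a, long b) { return a * b; }

/* Add a function at run time; returns its op number, or -1 if full. */
int register_op(op_fn fn)
{
    if (n_ops == MAX_OPS)
        return -1;
    vtable[n_ops] = fn;
    return (int)n_ops++;
}

/* One indexed load and one indirect call, whatever the op number is. */
long dispatch(int op, long a, long b)
{
    assert(op >= 0 && (size_t)op < n_ops);
    return vtable[op](a, b);
}
```

The dispatch code is the same for the first op and the last, and for ops registered after the program started, which is the point made above about adding functions at runtime.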

-- 
Bart.



Re: Should the op dispatch loop decode?

2001-06-13 Thread Bart Lateur

On Tue, 12 Jun 2001 18:12:35 -0400, Dan Sugalski wrote:

>'Kay, here's a question to ponder. Should the op dispatch loop handle 
>argument decoding, or should that be left to the opcode functions?

Are you talking about lazy vs. normal evaluation?

Lisp knows basically two modes, normal evaluation, where parameters are
interpreted before the function is called, and lazy evaluation, where
each parameter is only interpreted as it is needed. The latter is useful
in Perl for short-circuiting operators like &&, ||, or, and, but also for ?:

I'd say that for the general case, you need these two modi. How you
implement them, is for you to choose.
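
The difference can be shown in a few lines of C, using function pointers as stand-ins for unevaluated operands. All names are made up, and the global counter exists only to show how many operands got evaluated:

```c
/* An operand that has not been evaluated yet. */
typedef int (*thunk)(void);

int calls = 0;                     /* how many operands were evaluated */

int yes(void) { calls++; return 1; }
int no(void)  { calls++; return 0; }

/* Lazy: the second operand is only evaluated if the first is true. */
int lazy_and(thunk a, thunk b)
{
    return a() ? b() : 0;
}

/* Normal: both operands were already evaluated by the caller. */
int eager_and(int a, int b)
{
    return a && b;
}
```

lazy_and(no, yes) evaluates one operand; eager_and(no(), yes()) evaluates both before the function even starts, which is why && and || can't be ordinary functions under normal evaluation.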

-- 
Bart.



Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bart Lateur

On 05 Jun 2001 11:07:11 -0700, Russ Allbery wrote:

>Particularly since part of his contention is that 16 bits isn't enough,
>and I think all the widely used national character sets are no more than
>16 bits, aren't they?

It's not really important.

UTF-8 is NOT limited to 16 bits (3 bytes). With 4 bytes, UTF-8 can
represent 21 bit characters, i.e. 6 times more than the "desired number"
of 17. See  for how this is done.

And the major flaw that I see in acceptance of Unicode, is that the
Unicode "text" files are not Ascii compatible. UTF-8 files are. That
makes for a very nice upgrade path.
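
For reference, a sketch of the encoding for the 1 to 4 byte forms in C. The function name is mine, and it deliberately does no policing of surrogates or of values Unicode does not assign; it only shows the bit layout. The 4 byte form carries 3 + 6 + 6 + 6 = 21 payload bits.

```c
#include <stdint.h>

/* Encode code point cp as UTF-8 into out. Returns the number of
   bytes written (1 to 4), or 0 if cp does not fit in 21 bits. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx: plain Ascii */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x200000) {                   /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The first branch is the Ascii compatibility: any character below 0x80 is written as the same single byte it always was.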

-- 
Bart.



Re: PDD 2nd go: Conventions and Guidelines for Perl Source Code

2001-06-05 Thread Bart Lateur

On Tue, 29 May 2001 18:25:45 +0100 (BST), Dave Mitchell wrote:

>diffs:
>
>-"K&R" style for indenting control constructs
>+"K&R" style for indenting control constructs: ie the closing C<}> should
>+line up with the opening C etc.

On Wed, 30 May 2001 10:37:06 -0400, Dan Sugalski wrote:

>I realize that no matter what style we choose, there will be a good crop of 
>people who won't be thrilled with it. (For the record, we can count me as 
>one, if that makes anyone feel any better :) That's inevitable.

If you have a diff/patching suite that falls over whitespace, you have a
problem with diff, not with style.

One can always do a pretty-print cleanup of the code, before doing the
diff, if all else fails.

IMO this is not worth bickering over.

-- 
Bart.



Re: Tying & Overloading

2001-04-25 Thread Bart Lateur

On Wed, 25 Apr 2001 11:01:07 -0300, Branden wrote:

>If the idea is supporting arbitrary add-on operators, which I believe will 
>be done seldom, for only some specific classes, wouldn't it be better to 
>have a ``catch all'' entry for operators different than the built-in ones?
>
>Of course, add-on operators would not have the same ``performance'' of 
>built-in ones

I think I second that. I would think of a fixed table for the built-in
ones, and a linked list for the add-ons. It's not necessary that a new
node is added for each and every method; instead, a structure similar to
those used in TIFF files could be used, where each linked-in node
contains a table with several items, and a new node is only added when
that table is full.
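
A rough sketch of that chunked list, written in Perl only to show the shape.
The chunk size of 4 and the names new_node, add_operator and find_operator
are assumptions for the demo, not a proposed vtable layout:

```perl
use strict;
use warnings;

my $CHUNK_SIZE = 4;    # assumed capacity per node, picked for the demo

sub new_node { return { entries => [], next => undef } }

my $head = new_node();

sub add_operator {
    my ($name, $handler) = @_;
    my $node = $head;
    $node = $node->{next} while $node->{next};    # walk to the last node
    if (@{ $node->{entries} } >= $CHUNK_SIZE) {    # table full: link in a new node
        $node = $node->{next} = new_node();
    }
    push @{ $node->{entries} }, [ $name, $handler ];
    return;
}

sub find_operator {
    my ($name) = @_;
    for (my $node = $head; $node; $node = $node->{next}) {
        for my $entry (@{ $node->{entries} }) {
            return $entry->[1] if $entry->[0] eq $name;
        }
    }
    return;
}

for my $n (1 .. 6) {
    add_operator("op$n", sub { return "handler for op$n" });
}

my $nodes = 0;
for (my $node = $head; $node; $node = $node->{next}) { $nodes++ }
print "nodes in use: $nodes\n";
print find_operator('op5')->(), "\n";
```

Six add-ons need only two nodes here, instead of six separate allocations.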

-- 
Bart.



Re: So, we need a code name...

2001-04-24 Thread Bart Lateur

On Tue, 24 Apr 2001 19:17:08 -0500, Jarkko Hietaniemi wrote:

>Wasn't Perl also taken, so why care...?  I vaguely remember reading
>about another language called PERL...

It was "Pearl", AFAIK. That's why the "a" went missing. So I've been
told... ("Practical Extracting And Reporting Language"... yup, there is
an "a" there.)

-- 
Bart.



Re: PDD 4 internal data types, version 1.1

2001-03-30 Thread Bart Lateur

On Thu, 29 Mar 2001 19:24:21 +0200 (CEST), Tels wrote:

>And then, if we have BigFloat, we need a way to specify rounding and
>precision. Otherwise 1/3 eats up all memory or provides limits ;o)

Er... may I suggest ratios as a data format? It won't work for sqrt(2)
or PI, but it can easily store 1/3 as two (long) integers. You can
postpone doing integer divisions until you need a result, at which time
you can reorder calculations between * and /, so

(2/3)*9 

will return exactly 6.
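
A minimal sketch of the idea with plain Perl integers, assuming a ratio is
just a [numerator, denominator] pair kept in lowest terms. The helper names
are invented for the illustration, and sign handling is left out:

```perl
use strict;
use warnings;

sub gcd {
    my ($m, $n) = @_;
    ($m, $n) = ($n, $m % $n) while $n;
    return abs $m;
}

# Build a ratio and reduce it to lowest terms.
sub ratio {
    my ($num, $den) = @_;
    die "zero denominator\n" unless $den;
    my $g = gcd($num, $den);
    return [ $num / $g, $den / $g ];
}

sub multiply {
    my ($x, $y) = @_;
    return ratio($x->[0] * $y->[0], $x->[1] * $y->[1]);
}

sub to_string {
    my ($r) = @_;
    return $r->[1] == 1 ? "$r->[0]" : "$r->[0]/$r->[1]";
}

my $third   = ratio(1, 3);
my $product = multiply(ratio(2, 3), ratio(9, 1));

print to_string($third), "\n";
print to_string($product), "\n";
```

No division with a fractional result ever takes place, so nothing is lost
to rounding until somebody asks for a decimal representation.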

-- 
Bart.



Re: vtables: Assignment vs. Aliasing

2001-02-07 Thread Bart Lateur

[CC'ed to language, because I think it's there that it belongs]

On Mon, 5 Feb 2001 15:35:18 -0200, Branden wrote:

>There are two possible things that could happen when you say:
>$a = $b;
>@a = @b;  # or
>%a = %b;
>
>These two things are assignment and aliasing.

No way. I think aliasing is a great tool, but assignment is by
value. Always. (Well, except for referenced things...)

>In perl5 terms:
>*a = \$b;
>*a = \@b;  # or
>*a = \%b;

>However, typeglobs are said to disappear from Perl6,

I think Larry wants to drop typeglobs themselves, i.e. keeping different
kinds of variables of the same name in one record, but not the
possibilities they offer. Aliasing is likely the most interesting
feature of them all.

...

My preference:

>* Alias when assigning to a reference:
>\$a = \$b;
>\@a = \@b;
>\%a = \%b;

I think this is a nice symmetrical syntax.

>* Make aliasing the default for = and provide another way of assigning (NO
>WAY!!!)

Indeed, no way.

Look, if you'd do the latter, you would not only make Perl effectively a
different language, but you'd also be missing out on one of the great
benefits of aliasing. For example, you pass a reference of a hash to a
sub, so the original hash can be accessed and modified. With the latter
syntax, you can't even do that through an alias. In the former syntax:

foo(\%bar);

sub foo {
    my \%hash = shift;  # alias through reference
    print $hash{FOO};
}

You can now access the passed hash as a hash, and not through the
slightly awkward syntax of accessing it through a reference:

sub foo2 {
    my $hash = shift;
    print $hash->{FOO};
}

(You don't think it's that awkward? Try getting a hash slice through a
hash reference. Ugh.)
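
For comparison, here is what the two forms look like in current Perl 5.
The sub names are made up for the example:

```perl
use strict;
use warnings;

my %bar = (FOO => 1, BAR => 2, BAZ => 3);

# Slice of a real hash; the sub receives a flattened copy.
sub slice_direct {
    my %hash = @_;
    return @hash{qw(FOO BAZ)};
}

# Slice through a reference; no copy, but noisier syntax.
sub slice_by_ref {
    my ($hash) = @_;
    return @{$hash}{qw(FOO BAZ)};
}

my @direct = slice_direct(%bar);
my @by_ref = slice_by_ref(\%bar);

print "@direct\n";
print "@by_ref\n";
```

An alias would give the syntax of the first sub with the cost of the second.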

-- 
Bart.



Re: Magic [Slightly Off-Topic... please point me to documentation]

2001-02-07 Thread Bart Lateur

On Tue, 6 Feb 2001 17:53:17 -0200, Branden wrote:

>It appears you're blessing one reference and returning another... like
>
>sub new {
>my $key;
>my $a = \$key;
>my $b = \$key;
>bless $a;
>return $b;
>}
>
>I think the problem is not with the overloading magic, but with the code
>snippet...

A recent thread on comp.lang.perl.misc discussed how bless() works: you
pass it a reference, but allegedly it's the underlying thing that gets
blessed, not the reference itself.

my $a = \$x;
my $b = \$x;
bless $a, 'FOO';
print $b;
-->
FOO=SCALAR(0x8a652e4)

It sure looks like they're right. Oh, this is perl 5.6.0.
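
The same check as a complete script, using Scalar::Util's blessed() instead
of printing the reference. The variable names are only for the demo:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

my $x      = 42;
my $first  = \$x;
my $second = \$x;

bless $first, 'FOO';

# Neither of these references was ever passed to bless().
my $class_via_second = blessed($second);
my $class_via_fresh  = blessed(\$x);

print "$class_via_second $class_via_fresh\n";
```

Any reference to $x, old or new, reports the class, so the blessing lives
on the referenced thing.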

-- 
Bart.



Re: perl IS an event loop (was Re: Speaking of signals...)

2001-01-08 Thread Bart Lateur

On Sat, 6 Jan 2001 00:45:11 +, Simon Cozens wrote:

>No, it's exactly what Perl 5 does.
>
>This is the Perl interpreter:
>while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
>PERL_ASYNC_CHECK();
>}
>
>The only problem is that right now, PERL_ASYNC_CHECK doesn't actually
>do anything. :)

I don't get it. Does this *have* to give a 3-5% performance hit? Even if
you do it this way (syntax is a Perlish extension to C):

while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
    async_waiting or next;
    PERL_ASYNC_PROCESS();
}


BTW I agree with Nicholas Clark's remark (but I don't subscribe to p5p,
so I won't post there either):

>Hmm. No-one produced a patch with 2 loops, 1 for normal use, and 1 when
>%SIG has handlers other than default or ignore assigned to it.
>Would that be an acceptable perl5 compromise?

Since this "event loop" is so tiny, this doesn't even look like much of
a compromise.
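
A toy model of the flag test, in Perl rather than C, just to show where the
cost sits. The op list, $async_waiting and process_async are all invented
for the sketch; in a real script a %SIG handler would be what sets the flag:

```perl
use strict;
use warnings;

my $async_waiting = 0;    # would be set by a signal handler
my $counter       = 0;
my @handled;

my @ops = (
    sub { $counter++ },
    sub { $async_waiting = 1; $counter++ },    # pretend a signal arrived here
    sub { $counter++ },
);

sub process_async {
    my ($pc) = @_;
    push @handled, "after op $pc";
    $async_waiting = 0;
    return;
}

for my $pc (0 .. $#ops) {
    $ops[$pc]->();
    next unless $async_waiting;    # common case: one cheap flag test
    process_async($pc);            # rare case: deal with the pending event
}

print "ops run: $counter, handled: @handled\n";
```

The handler only ever runs between two ops, which is the whole point of
safe signals.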

Apropos safe signals, isn't it possible to let perl6 handle avoiding
zombie processes internally? What use is having to do wait() yourself,
anyway?

-- 
Bart.



Re: SvPV*

2000-11-24 Thread Bart Lateur

On Fri, 24 Nov 2000 08:54:43 +0100, Roland Giersig wrote:

>Maybe the title should be :
>
>"Perl should use XML as its basic data type instead of linear strings"

Horrible.

I kinda liked your original proposal. But you should NOT focus on XML.
That leaves out too many other possible data sources: RTF, for example,
or TeX. What is typical, is that it is marked up text, in the form of a
tree, i.e. properly nested.

The internal structure might as well be easily representable as XML.

I do think that the term "non-linear text" is absolutely unclear.

-- 
Bart.



Re: RFC 361 (v1) Simplifying split()

2000-10-07 Thread Bart Lateur

On Fri, 6 Oct 2000 23:26:44 -0500, Jonathan Scott Duff wrote:

>   @foo = split;
>   # BECOMES
>   @foo = split; pop @foo until $foo[-1];

That doesn't fly. What if that last field is "0"?


>   @foo = split ' ';
>   # BECOMES
>   @foo = split /\s+/; shift @foo;

What if there is no leading whitespace? You shift out a perfectly valid
field.

shift @foo if @foo and not length $foo[0];
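
Both pitfalls in one runnable sketch. The sample strings and the helper
name words() are made up for the demo:

```perl
use strict;
use warnings;

# Trailing fields: "pop until true" also throws away a legitimate "0".
my @naive = split /:/, 'a:0::', -1;
pop @naive until $naive[-1];

# Testing for emptiness instead of truth keeps the "0".
my @careful = split /:/, 'a:0::', -1;
pop @careful while @careful and not length $careful[-1];

# Leading whitespace: only drop the first field when it really is empty.
sub words {
    my ($text) = @_;
    my @fields = split /\s+/, $text;
    shift @fields if @fields and not length $fields[0];
    return @fields;
}

my @with_lead    = words('  a b');
my @without_lead = words('a b');

print "naive:   @naive\n";
print "careful: @careful\n";
print "words:   @with_lead | @without_lead\n";
```

Note that the naive pop would loop forever on input where every field is
false, because $naive[-1] of an empty array is undef.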

-- 
Bart.



Re: A tentative list of vtable functions

2000-10-02 Thread Bart Lateur

On Mon, 2 Oct 2000 09:40:33 -0500, Jarkko Hietaniemi wrote:

>
>For the record: I hate the current policy of defaulting to NVs for
>arithmetic ops.  If I say '2' I do mean an IV of 2, not an NV of
>2.000.  Currently if I say
>
>  $a = 2;
>  $b = 3;
>  $c = $a + $3;

s/\$3/\$b/

>the $c will be an NV of 5.000, or thereabouts, even
>while $a and $b are IVs.

Note: integers have an exact representation in floating point. There is
no "thereabouts". It is exactly 5.000.

However, 0.1 + 0.2 is not exactly 0.3, because 0.1, 0.2 and 0.3 are all
approximations. 0.5 has an exact representation in FP, as has 1/1024.
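
A quick check, assuming perl was built with ordinary IEEE double precision
NVs (the default on most platforms; a long double build may print something
else for the last line):

```perl
use strict;
use warnings;

# Small integers and powers of two are stored exactly.
my $int_sum_exact = (2.0 + 3.0 == 5.0)      ? 1 : 0;
my $pow2_exact    = (1 / 1024 * 1024 == 1)  ? 1 : 0;

# Decimal fractions such as 0.1 are only approximated.
my $frac_sum_exact = (0.1 + 0.2 == 0.3)     ? 1 : 0;

print "2 + 3 == 5:       $int_sum_exact\n";
print "1/1024 exact:     $pow2_exact\n";
print "0.1 + 0.2 == 0.3: $frac_sum_exact\n";
printf "0.1 + 0.2 = %.17g\n", 0.1 + 0.2;
```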

Since FP calculations on modern processors are at least as fast as
integer calculations, there is hardly any reason to prefer integers to
FP. Bigints, that's another matter.

-- 
Bart.



Re: RFC 361 (v1) Simplifying split()

2000-10-01 Thread Bart Lateur

On Sun, 01 Oct 2000 11:18:58 +0200, Bart Lateur wrote:

>   my @a = split /:/, "", -1;

Oops. That should be 

my @a = split /:/, $_, -1;

-- 
Bart.



Re: RFC 361 (v1) Simplifying split()

2000-10-01 Thread Bart Lateur

On 1 Oct 2000 06:40:08 -, Perl6 RFC Librarian wrote:

>Perl 5 split does five things that I think are just annoying, and
>which I suggest be removed:

I've got one more problem.

for my $i (0 .. 4) {
    $_ = ':' x $i;
    my @a = split /:/, "", -1;
    my $count = @a;
    print "$i - $count\n";
}
-->
0 - 0
1 - 2
2 - 3
3 - 4
4 - 5

See the jump at the front?
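
The same experiment with the string passed to split explicitly, and the
counts collected so they can be checked:

```perl
use strict;
use warnings;

my @counts;
for my $i (0 .. 4) {
    my $text  = ':' x $i;
    my @parts = split /:/, $text, -1;    # -1 keeps trailing empty fields
    push @counts, scalar @parts;
    print "$i - ", scalar(@parts), "\n";
}
```

The empty string yields zero fields, while every other input yields one
field more than it has separators, so there is never a result of exactly
one field.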

-- 
Bart.



Re: RFC 136 (v3) Implementation of hash iterators

2000-09-30 Thread Bart Lateur

On 28 Sep 2000 19:40:01 -, Perl6 RFC Librarian wrote:

>=head2 How iterators might work in perl 6
>
>In perl 6 the keys and values functions should no longer use the
>same iterator as the each function - each use of keys and values
>should use its own private iterator instead.

Is that per Damian? The iterator stored in the syntax tree? So, what
happens if there's a recursive function call, to a function containing
keys()?

And what if I want the old effect? Pretty much like /PATTERN/g continues
where the previous pattern match stopped, even if it was another regex.

-- 
Bart.



Re: RFC 313 (v1) Perl 6 should support I18N and L10N

2000-09-25 Thread Bart Lateur

On 25 Sep 2000 20:15:19 -, Perl6 RFC Librarian wrote:

>Erreur de syntaxe. Syntaxfehler. Errore di sintassi. suntaktik'o sphalm'a.
>
>Perl 6 needs some kind of internationalisation and therefore message
>catalogue support. Really needs, with great urgency.

Eh? Are you saying that Perl's error messages should be adapted to the
language of the computer user? I don't like that. And I speak Dutch
natively. Computers speak English.

How would Perl decide on what language to use? Some environment
variable?

And what about programmer supplied error messages? Should the programmer
supply lots of language versions as well?

It reminds me of Applescript, where the system error messages and
buttons are localized, but the custom error messages and buttons are
not. So you get an annoying mix of English and Dutch buttons, and
English and Dutch texts. I want one language, please. Even if it's not
my native language.

But if you insist on going ahead and implementing this, I won't stop
you.

-- 
Bart.



Re: RFC 214 (v1) Emit warnings and errors based on unoptimized code

2000-09-15 Thread Bart Lateur

On Thu, 14 Sep 2000 15:47:43 -0700, Steve Fink wrote:

>Currently, toke.c turns "foo$bar" into "foo".$bar before the parser or
>anything else sees it. So any features implemented in the tokenizer have
>to get smarter about remembering what they did.

This sounds pretty much like the same problem you face when designing a
source level debugger, for any compiled language.

-- 
Bart.



Re: RFC 130 (v4) Transaction-enabled variables for Perl6

2000-09-07 Thread Bart Lateur

On Wed, 06 Sep 2000 11:23:37 -0400, Dan Sugalski wrote:

>>Here's some high-level emulation of what it should do.
>>
>> eval {
>> my($_a, $_b, $_c) = ($a, $b, $c);
>> ...
>> ($a, $b, $c) = ($_a, $_b, $_c);
>> }
>
>Nope. That doesn't get you consistency. What you need is to make a local 
>alias of $a and friends and use that.

My example should have been clearer. I actually intended that $_a would
be a variable of the same name as $a. It's a bit hard to write currently
valid code that way. Second attempt:

eval {
    ($a, $b, $c) = do {
        local($a, $b, $c) = ($a, $b, $c); #or my(...)
        ... # code which may fail
        ($a, $b, $c);
    };
};

So the final assignment of the local values to the outer scoped
variables will happen, and in one go, only if the whole block has been
executed successfully.
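
The same idea spelled out as a helper, as a rough sketch only. The name
atomically() and its calling convention (an array of scalar references plus
a code block that works on copies) are assumptions for the example, and it
does nothing at all about threads:

```perl
use strict;
use warnings;

# Run $code on private copies; write them back only if $code did not die.
sub atomically {
    my ($vars, $code) = @_;           # $vars: array ref of scalar refs
    my @work = map { $$_ } @$vars;    # working copies, invisible outside
    my $ok = eval { $code->(\@work); 1 };
    return 0 unless $ok;              # failure: originals untouched
    ${ $vars->[$_] } = $work[$_] for 0 .. $#work;    # commit in one place
    return 1;
}

my ($x, $y) = (10, 20);

my $committed = atomically([ \$x, \$y ], sub {
    my ($w) = @_;
    $w->[0] += 5;
    $w->[1] -= 5;
});

my $failed = atomically([ \$x, \$y ], sub {
    my ($w) = @_;
    $w->[0] = 999;
    die "oops\n";
});

print "x=$x y=$y committed=$committed failed=$failed\n";
```

The second block changed its copy of $x before dying, yet the outer $x
never saw the 999.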

>You also need to lock down those 
>variables so other threads will block if they write to them, and make 
>copies if they need to only read them.

That is partly why I used lexical variables. Other threads will NOT see
the new values, but the old values, as long as the final back assignment
hasn't happened.

I would simply block ALL other threads while the final group assignment
is going on. This should finish typically in a few milliseconds.

>It also means that if we're including *any* sort of external pieces (even 
>files) in the transaction scheme we need to have some mechanism to roll 
>back changes. If a transaction fails after truncating a 12G file and 
>writing out 3G of data, what do we do?

That does not belong in the kernel of a language. All that you may
expect, is transactions on simple variables; plus maybe some hooks to
attach external transaction code (transactions on files etc) to it. A
simple "create a new file, and rename to the old filename when done"
will usually do.
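
A sketch of that file trick with core modules only. The ".new" suffix and
the name replace_file() are arbitrary choices for the demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write the new content next to the old file, then swap it in with rename.
sub replace_file {
    my ($path, $content) = @_;
    my $tmp = "$path.new";    # hypothetical naming for the scratch file
    open my $out, '>', $tmp or die "cannot write $tmp: $!";
    print {$out} $content;
    close $out or die "cannot close $tmp: $!";
    # The old file stays intact until this point.
    rename $tmp, $path or die "cannot rename $tmp: $!";
    return;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'data.txt');

replace_file($file, "version 1\n");
replace_file($file, "version 2\n");

open my $in, '<', $file or die "cannot read $file: $!";
my $got = do { local $/; <$in> };
close $in;

print $got;
```

If anything dies before the rename, the "rollback" is simply that the old
file was never touched.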

-- 
Bart.



Re: RFC 130 (v5) Transaction-enabled variables for Perl6

2000-09-05 Thread Bart Lateur

On Tue, 5 Sep 2000 10:48:45 +0200, dLux wrote:

>/--- On Mon, Sep 04, 2000 at 07:18:56PM -0500, Greg Rollins wrote:
>| Will perl monitor the commit and rollback actions of transactions?
>\---
>
>What exactly you mean?

And did you have to quote 500+ lines of the RFC just to add this one
sentence?

-- 
Bart.



Re: RFC 130 (v4) Transaction-enabled variables for Perl6

2000-09-05 Thread Bart Lateur

On Tue, 05 Sep 2000 11:48:38 -0400, Dan Sugalski wrote:

>>- two-phase  commit handler,  rollback coordinator  (the above  two is
>>   connected to this: very simple algorithm!)
>
>Here's the killer. This is *not* simple. At all. Not even close.
>
>Doing this properly with data sources you completely control in a 
>multi-access situation (read: with threads) is *hard*.

Is it?

Here's some high-level emulation of what it should do.

eval {
    my($_a, $_b, $_c) = ($a, $b, $c);
    ...
    ($a, $b, $c) = ($_a, $_b, $_c);
}

Now, "all" that needs to be taken care of, is make sure that the final
assignment from the localized and changed variables to their
outer-scoped counterparts happens in *one step*, i.e. no task switching
while this is going on.

-- 
Bart.



Re: RFC 146 (v1) Remove socket functions from core

2000-08-25 Thread Bart Lateur

On Fri, 25 Aug 2000 12:19:24 -0400, Dan Sugalski wrote:

>Code you don't call won't eat up any cache space, nor crowd 
>out some other code. And if you do call it, well, it ought to be in the cache.

Probably a stupid question... But can't you group the code for the most
often used constructs? So that, if one of those things is loaded in the
cache, the others are in there with it?

If all the less needed stuff is more at the back of the executable, it
wouldn't even have to be loaded, most of the time.

Besides, I'm more worried about unnecessarily loading 600k from disk,
than from main memory to cache. For short-lived scripts, this loading
overhead could be quite significant.

-- 
Bart.



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-24 Thread Bart Lateur

On Thu, 24 Aug 2000 09:38:28 +0100, Hildo Biersma wrote:

>> I expect that we'll get more compile-time benefit from
>> 
>> my HASH sub foo {
>> ...
>> }
>> 
>> %bar = foo();
>
>Ah, the Return Value Optimization so loved in C++...
>
>For those who haven't seen it before, you can optimize this by passing
>in a reference to %bar to foo() and then use that in the function.

Just a remark: this is only safe if all other references to the hash
returned are abandoned. Otherwise you'd have an alias where you should
have gotten a copy.
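
A small sketch of the hazard. The sub names and the $kept variable are
invented; the point is only what happens when the callee holds on to a
reference to the hash it "returns":

```perl
use strict;
use warnings;

my $kept;    # a reference the callee hangs on to

# "Optimized" variant: fill the caller's hash in place through a reference.
sub fill_in_place {
    my ($out) = @_;
    %$out = (colour => 'red');
    $kept = $out;    # now two names exist for the same hash
    return;
}

# Plain variant: return a list, which the caller copies into its own hash.
sub return_copy {
    my %result = (colour => 'red');
    $kept = \%result;
    return %result;
}

my %aliased;
fill_in_place(\%aliased);
$aliased{colour} = 'blue';
my $seen_through_alias = $kept->{colour};

my %copied = return_copy();
$copied{colour} = 'blue';
my $seen_through_copy = $kept->{colour};

print "in place: $seen_through_alias, copy: $seen_through_copy\n";
```

With the in-place variant the caller's later change leaks back into the
callee's data, which a real copy would have prevented.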

-- 
Bart.



Re: Hooks for array notation (was Re: Ramblings on "base class" for SV etc.)

2000-08-11 Thread Bart Lateur

On Thu, 10 Aug 2000 05:03:38 +0200, Bart Lateur wrote:

[description of a mechanism for storing sparse arrays:]

>Imagine
>that it will be traversed based upon the groups of bits in the array
>index. Say, with 32 bit indices, subdivided into 4 bytes. You can start
>with the lower byte, which can give you one of 256 pointers to the next
>table -- or null (zero), if that segment is completely empty and the
>next table nonexistent. Repeat with the second byte, get another table
>pointer, or null. Repeat and follow the pointer, at most 4 times in
>total. In the last pointer table, a nonzero value points to the thing
>itself.

Strike that. You can start with the *higher* byte, instead of the lower,
and this mechanism could even be used to store ordinary one-dimensional
arrays, if you must.

(I am *not* familiar with the current system behind arrays; reading the
source to extract the idea doesn't sound smart. If somebody could point
me to an explanation on how it works, I would be much obliged.)

I thought that starting with the lower byte would nicely split all
indices up into groups. It will; but there's absolutely no advantage in
it. You'll always get a lot of tables. The number of required steps is
always the same anyway. The only advantage would be ease of
implementation: do a bitwise AND to get the lower byte, and do a shift
to the right to get at the next chunk.

I think that the common case would be that used array places will be
used in chunks, for example, indices 1 till 10100 are in use, and
the rest is empty. If you start with the higher byte, there will be a
lot more NULL pointers, i.e. no 1k pointer block attached. For more
randomly distributed indices, the pointer blocks would still be largely
empty. The situation would be much the same, as far as I can gather.

But: the number of steps need not be a fixed number! Say that the array
descriptor contains a field containing the number of steps, for example,
3 if the largest index is above 65k but below 16 million. (It is the
number of bytes that this largest index can fit in.) You can traverse
the tree with a simple loop.

Say that you start with an ordinary array of 250 items. This fits into
one byte, so you just need one step, and one 1k block to hold the
pointers to the array items. So it's almost as fast as it can be, just
one indirection and you've got your pointer to the scalar value.

Now it turns out that you need an array index of 300, beyond the 255
limit. What do you do? You allocate another 1k block, clear it, and put
a pointer to the original block in slot #0. Increment the steps field to
2, and set the tree root pointer to the address of this new block. Tada!

You can now simply allocate and clear a new 1k block for the second
page, array indices 256 to 511, and add the pointer to slot #1 of the
root block.

All in all, you need just 3 1k blocks to hold the whole tree, and
extending it is a simple algorithm. That's why I said it can be used for
ordinary arrays as well.
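
Here is a rough model of the growing tree, with Perl arrays standing in for
the 256-slot pointer blocks. The package name SparseArray and its methods
are made up for the sketch, and indices are assumed to fit in 32 bits:

```perl
use strict;
use warnings;

package SparseArray;

sub new { my ($class) = @_; return bless { steps => 1, root => [] }, $class }

# Number of bytes needed to hold $index (between 1 and 4).
sub _steps_for {
    my ($index) = @_;
    my $steps = 1;
    $steps++ while $steps < 4 and $index >> (8 * $steps);
    return $steps;
}

sub store {
    my ($self, $index, $value) = @_;
    # Grow upwards: the old root becomes slot 0 of a fresh root table.
    while ($self->{steps} < _steps_for($index)) {
        $self->{root} = [ $self->{root} ];
        $self->{steps}++;
    }
    my $table = $self->{root};
    for my $level (reverse 1 .. $self->{steps} - 1) {    # high byte first
        my $slot = ($index >> (8 * $level)) & 0xFF;
        $table->[$slot] ||= [];                          # allocate on demand
        $table = $table->[$slot];
    }
    $table->[ $index & 0xFF ] = $value;
    return;
}

sub fetch {
    my ($self, $index) = @_;
    return if _steps_for($index) > $self->{steps};
    my $table = $self->{root};
    for my $level (reverse 1 .. $self->{steps} - 1) {
        $table = $table->[ ($index >> (8 * $level)) & 0xFF ] or return;
    }
    return $table->[ $index & 0xFF ];
}

package main;

my $array = SparseArray->new;
$array->store(5, 'five');
my $steps_small = $array->{steps};
$array->store(300, 'three hundred');
my $steps_grown = $array->{steps};
$array->store(70_000, 'seventy thousand');

print "steps: $steps_small, $steps_grown, $array->{steps}\n";
print join(', ', map { $array->fetch($_) } 5, 300, 70_000), "\n";
```

Growing never moves existing data: the old tree is reused as-is under
slot 0 of the new root.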

Problems? Yes: first of all, doing a splice isn't as simple as it might
have been without this tree structure. You'll need lots of copying of
chunks of pointers between the 1k blocks, a few per block.

Secondly, the subdivision of the index into bit groups isn't as easy (or
as fast) as before. A simple "mask and shift" won't do. A "roll and
mask" would; but that's in assembler. C doesn't support rolls (shift
left, but move top byte of register into lower byte), AFAIK.
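
On second thought, a sketch suggests this is less of a problem than it
looks: shifting by a multiple of 8 and then masking picks out any byte,
high byte first if you like, and a rotate can be put together from two
shifts and an OR. Shown in Perl with made-up helper names; the same
expressions work in C:

```perl
use strict;
use warnings;

# Byte number $level of $index, where level 0 is the lowest byte.
sub byte_at {
    my ($index, $level) = @_;
    return ($index >> (8 * $level)) & 0xFF;
}

# Rotate a 32-bit value left by 8 bits.
sub rol8_32 {
    my ($value) = @_;
    return (($value << 8) | ($value >> 24)) & 0xFFFFFFFF;
}

my @bytes   = map { byte_at(0x12345678, $_) } reverse 0 .. 3;    # high byte first
my $rotated = rol8_32(0x12345678);

printf "%02X %02X %02X %02X\n", @bytes;
printf "%08X\n", $rotated;
```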

-- 
Bart.



Re: Hooks for array notation (was Re: Ramblings on "base class" for SV etc.)

2000-08-09 Thread Bart Lateur

On Wed, 09 Aug 2000 12:03:40 -0400, Dan Sugalski wrote:

>> >I hope this RFC will be "Arrays should be sparse when possible, and
>> >compact" and just about nothing else. :)
>>
>>You mean, something like hashes?
>
>Nope.
>
>>Faster hashes, maybe, with a hash function optimized for numerical
>>integer keys.
>
>I was thinking we might keep a bitmap for used/unused cells (with unused 
>doubling as undef) and a two-level array pointing to chunks of real 
>elements/element pointers.

Ouch. If you can have any index that fits in a 32-bit integer, your
bitmap may use up 2^32 bits, i.e. 2^29 bytes, or 512Mb. For one array!

>So, for example, if you did a:
>
>$foo[$elem];
>
>perl would first check bit $elem to see if that element is in use. If so, 
>it'd do a ($elem / chunk_size) to get the index to the chunk pointer in the 
>array structure, then access element ($elem % chunk_size) to get a pointer 
>to the ultimate thing. (Or the thing itself, if it were an int, say)

Hmm.. that may turn out to be a sparsely filled table in itself.

>Other methods are possible, of course, and which would be a win depends on 
>your element distribution. (A linked list would be a win for very sparse 
>arrays--$foo[time])

A linked list? How that? It sounds like a far slower approach even than
a hash.

I myself would have thought of something remotely like B-trees. Imagine
that it will be traversed based upon the groups of bits in the array
index. Say, with 32 bit indices, subdivided into 4 bytes. You can start
with the lower byte, which can give you one of 256 pointers to the next
table -- or null (zero), if that segment is completely empty and the
next table nonexistent. Repeat with the second byte, get another table
pointer, or null. Repeat and follow the pointer, at most 4 times in
total. In the last pointer table, a nonzero value points to the thing
itself.

In case of one item, you'd have 4 tables of 1k each, and one slot in the
fourth table that points to the value.

How's that for an idea from the top of my head?

-- 
Bart.



Re: vector and matrix calculations in core? (was: Re: Ramblings on "base class" for SV etc.)

2000-08-09 Thread Bart Lateur

On Wed, 09 Aug 2000 12:46:32 -0400, Dan Sugalski wrote:

>> >   @foo = @bar * @baz;

>Given that the default action of the multiply routine for an array in 
>non-scalar context would be to die, allowing user-overrides of the 
>functions would probably be a good idea... :)

[Is this still -internals? Or should we stop CC'ing?]

One problem: overloading requires objects, or at least one. Objects are
(currently) scalars. You can't make an array into an object.

Well, you can try:

bless \@ary, 'Vector';
print ref \@ary;

which says: 

Vector

so you COULD get there in the end. A nicer syntax would be along the
proposed syntax of

my Vector @ary;

but a problem is:

@copy = @ary;

which only copies the items by value, as a list, and thus ignores the
blessing. @copy is a plain array.
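
The whole sequence as one script, with the class names captured so the
difference is visible:

```perl
use strict;
use warnings;

my @ary = (1, 2, 3);
bless \@ary, 'Vector';

my @copy = @ary;    # copies the elements, not the blessing

my $orig_class = ref \@ary;
my $copy_class = ref \@copy;

print "original: $orig_class, copy: $copy_class\n";
```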

-- 
Bart.



Re: Hooks for array notation (was Re: Ramblings on "base class" for SV etc.)

2000-08-09 Thread Bart Lateur

On Wed, 09 Aug 2000 10:04:15 -0400, Dan Sugalski wrote:

>>5- Compact array storage: RFC still coming
>
>I hope this RFC will be "Arrays should be sparse when possible, and 
>compact" and just about nothing else. :)

You mean, something like hashes?

Faster hashes, maybe, with a hash function optimized for numerical
integer keys.

-- 
Bart.



vector and matrix calculations in core? (was: Re: Ramblings on "base class" for SV etc.)

2000-08-09 Thread Bart Lateur

On Wed, 09 Aug 2000 09:41:22 -0400, Dan Sugalski wrote:

>> >>  @foo = @bar * 12;

>> @foo = map { $_ * 12 } @bar;

>>I don't see the need for a new notation.
>
>Well, compactness for one. With a scalar on one side it's less odd (it was 
>a bad example). When funkier, though:
>
>   @foo = @bar * @baz;
>
>the expansion becomes less obvious and quite a bit larger, especially if 
>the arrays are multidimensional.

(Isn't this becoming a "language" issue? -language CC'ed.)

If you're talking about matrix manipulations, I should immediately hold
you back. Perl arrays are pretty bad as is for representing matrices.
Don't let anybody tell you otherwise: Perl data structures are
one-dimensional by nature.

For vector manipulation, I can understand that, *in principle*, but not
really. I remember that a few functional language extensions have been
proposed, including "apply a function (code block) to each combination
of item i from the first list with item i of the second list, for each
i". You can easily roll your own. It will do what you want, not what the
implementors thought useful. For example: vector multiplication. What
will @foo*@bar do? Will it return the scalar product
(abs(@foo)*abs(@bar)*cos(angle)), the vector product (returning a vector
orthogonal to both others), or a matrix multiplication? If you roll your
own, you can choose.

No, I am more in favor of a solid yet flexible overloading mechanism, so
that you can use the appropriate module, which does the multiplication
of your choice, and which still allows you to use something like the
above notation. $foo and $bar will then be (vector or matrix) objects,
instead of plain Perl arrays. Much like BigInt now.
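
A minimal sketch of such a module with today's overload pragma. The Vector
class here is hypothetical, and its choice of the scalar (dot) product for
vector * vector is exactly the kind of decision that belongs to the module
author, not to the core:

```perl
use strict;
use warnings;

package Vector;
use Scalar::Util qw(blessed);
use overload
    '*'  => \&multiply,
    '""' => \&to_string;

sub new { my ($class, @items) = @_; return bless [@items], $class }

# vector * vector is the dot product, vector * number scales each element.
sub multiply {
    my ($self, $other) = @_;
    if (blessed($other) && $other->isa('Vector')) {
        die "size mismatch\n" unless @$self == @$other;
        my $sum = 0;
        $sum += $self->[$_] * $other->[$_] for 0 .. $#$self;
        return $sum;
    }
    return Vector->new(map { $_ * $other } @$self);
}

sub to_string { my ($self) = @_; return '(' . join(', ', @$self) . ')' }

package main;

my $foo = Vector->new(1, 2, 3);
my $bar = Vector->new(4, 5, 6);

my $dot    = $foo * $bar;
my $scaled = $foo * 12;

print "$dot\n";
print "$scaled\n";
```

A different module could map * to the cross product instead, and the
calling code would look the same.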

-- 
Bart.



Re: Ramblings on "base class" for SV etc.

2000-08-09 Thread Bart Lateur

On Wed, 9 Aug 2000 09:11:55 +0100 (BST), Nick Ing-Simmons wrote:

>>  @foo = @bar * 12;
>
>I like it. 

>It is pretty obvious what above should do:
>
>@foo = ();
>foreach my $elem (@bar)
> {
>  push(@foo,$elem * 12);
> } 

@foo = map { $_ * 12 } @bar;


I don't see the need for a new notation.

-- 
Bart.



Re: pramgas as compile-time-only

2000-08-09 Thread Bart Lateur

On Tue, 8 Aug 2000 20:58:46 -0400 (EDT), Dan Sugalski wrote:

>On Tue, 8 Aug 2000, Bart Lateur wrote:
>
>> Time for subroutine threading, instead of op threading?
>
>Probably, depending on your definition of subroutine threading.
>
>> That would definitely make the "compiled" code at least twice as big.
>
>Could, yep. (Depending, of course, on what you're talking about... :)

FORTH lingo again. "Threading" is the common name there for any kind of
P-code interpretation.

"Subroutine threading" is simply a very primitive form of native code
generation, where every op (aka "token") is simply a call instruction,
and inlined branches for if/else processing. You may optimize a little
by inlining some very often used instructions.

Because you need to compile in a complete (relative?) address for the
call, it's a bit of a wasteful way of storing P-code. But, at least, it
really is native code and thus it will reside in the code cache.

p.s. I wonder why nobody said anything about that typo in the subject
line. I get visions of a dangerous baby...

-- 
Bart.



Re: pramgas as compile-time-only

2000-08-08 Thread Bart Lateur

On Tue, 08 Aug 2000 11:33:06 -0400, Dan Sugalski wrote:

>The problem perl will always run into is that our executable code counts as 
>data to CPUs, and lives in the D cache, along with all the data we work on. 
>Ripping through a few 100K strings'll kill any sort of benefits to keeping 
>the optree small

Time for subroutine threading, instead of op threading?

That would definitely make the "compiled" code at least twice as big.

Er, I should shut up, because I haven't got a clue how Perl is (or would
be) implemented. I suspect it's similar to P-code.

-- 
Bart.



Re: Language RFC Summary 4th August 2000

2000-08-06 Thread Bart Lateur

On Sun, 06 Aug 2000 01:38:13 -0400, Dan Sugalski wrote:

>>Even in perl5 an XS module can do _anything at all_.
>
>It can't access data the lexer's already tossed out. That's where the 
>current format format (so to speak) runs you into trouble.

Only if you insist on the identical syntax as it has been until now. You
can cobble something together with here-docs. Implementation can
at least partly be based upon sprintf.
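
As a rough idea of the sprintf part, with made-up data and column widths
(this is not an attempt to reproduce format's picture syntax):

```perl
use strict;
use warnings;

my @rows = ( [ 'apple', 1.5 ], [ 'kiwi', 12.25 ] );

my $report = '';
for my $row (@rows) {
    # Left-justified name in 8 columns, price right-justified in 6.
    $report .= sprintf "%-8s %6.2f\n", @$row;
}

print $report;
```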

-- 
Bart.