Re: ICU data file location issues

2004-04-15 Thread Jeff Clites
On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote:

> Finding stuff relative to the executable/DLL would be coolest scheme,
> but that is admittedly somewhat tricky to get working cross-platform.

Excellent idea. Pretty much every single resource in Cocoa applications 
and frameworks on Mac OS X is located using a scheme such as this, and 
I believe it all used to work correctly for OpenStep applications on 
Windows, so there's a good chance it could be made to work.

For Unix platforms at least, you should be able to do this:

	executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)

	(to mix a bunch of syntaxes)

during initialization before you've had a chance to chdir, and store 
that away on the interpreter struct. That should work unless you've 
gone out of your way to execute parrot with argv[0] set to something 
fake. I don't know what you'd do on Windows, but there must be 
something.
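
A minimal C sketch of the argv[0] approach above, under stated assumptions: the helper name exe_dir_from_argv0 is hypothetical, the caller captures cwd before any chdir, and a bare command name (one found through $PATH) is reported as an error rather than searched for.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the executable's directory from argv[0] and
 * the working directory captured at startup (before any chdir).
 * Returns 0 on success, -1 if argv0 contains no '/' (it was found via
 * $PATH, which this sketch does not search) or the result does not fit. */
static int exe_dir_from_argv0(const char *argv0, const char *cwd,
                              char *out, size_t outlen)
{
    const char *slash = strrchr(argv0, '/');
    size_t dirlen;
    int n;

    if (slash == NULL)
        return -1;                       /* bare name: needs a $PATH search */

    dirlen = (size_t)(slash - argv0);
    if (argv0[0] == '/') {
        /* absolute: dirname(argv0); "/parrot" yields "/" */
        if (dirlen == 0)
            dirlen = 1;
        n = snprintf(out, outlen, "%.*s", (int)dirlen, argv0);
    } else {
        /* relative: cwd + "/" + dirname(argv0) */
        n = snprintf(out, outlen, "%s/%.*s", cwd, (int)dirlen, argv0);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The result is not normalized ("./parrot" gives cwd followed by "/."), which is harmless for later path joins.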

An embedded parrot would need to be told explicitly where to find its 
resources, just by using the API that standalone parrot would call to 
store this information.

JEff



Re: Method Name Truncation in PIR

2004-04-15 Thread Leopold Toetsch
Chromatic [EMAIL PROTECTED] wrote:
 Method 'layou' not found
 in file '(unknown file)' near line -1

Did you turn on debugging? Most of this name mangling and string
constant stuff should be covered, e.g.:

$ parrot -d /tmp/object-meths_15.pasm 2>&1 | grep meth

leo


Re: ICU data file location issues

2004-04-15 Thread Jonathan Worthington
Jeff Clites [EMAIL PROTECTED] wrote:
 On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote:

  Finding stuff relative to the executable/DLL would be coolest scheme,
  but that is admittedly somewhat tricky to get working cross-platform.

 Excellent idea. Pretty much every single resource in Cocoa applications
 and frameworks on Mac OS X is located using a scheme such as this, and
 I believe it all used to work correctly for OpenStep applications on
 Windows, so there's a good chance it could be made to work.

 For Unix platforms at least, you should be able to do this:

 executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)

 (to mix a bunch of syntaxes)

 during initialization before you've had a chance to chdir, and store
 that away on the interpreter struct. That should work unless you've
 gone out of your way to execute parrot with argv[0] set to something
 fake. I don't know what you'd do on Windows, but there must be
 something.

Strangely enough, I'm in the middle of putting something like this in place
for another project...  On Win32 you do:-

GetModuleFileName(NULL, buffer, buffer_size)

Passing NULL in as the first parameter returns the path to the executable
the currently executing process (e.g. Parrot in our case) was created from.
You then just need to chop off the executable name to find your path.
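
A minimal C sketch of that, assuming the ANSI variant GetModuleFileNameA and two hypothetical helper names (chop_leaf, exe_dir_win32). The Win32 call is guarded so the string handling can be compiled and tried anywhere.

```c
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Cut a path at its last separator, leaving only the directory part.
 * Returns 0 on success, -1 if the path contains no separator. */
static int chop_leaf(char *path)
{
    char *bs = strrchr(path, '\\');
    char *fs = strrchr(path, '/');
    char *last = bs;

    if (fs != NULL && (last == NULL || fs > last))
        last = fs;
    if (last == NULL)
        return -1;
    *last = '\0';
    return 0;
}

#ifdef _WIN32
/* Fill buf with the directory of the running executable. */
static int exe_dir_win32(char *buf, size_t size)
{
    DWORD n = GetModuleFileNameA(NULL, buf, (DWORD)size);

    if (n == 0 || n >= size)
        return -1;                      /* call failed or path truncated */
    return chop_leaf(buf);
}
#endif
```

Note that an executable sitting directly in a drive root comes back as "C:" without the trailing backslash.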

Jonathan




Re: Plans for string processing

2004-04-15 Thread Leopold Toetsch
Aaron Sherman [EMAIL PROTECTED] wrote:

 So, why is that:

   my dog Fiffi:language(blah) eq my dog Fi\x{fb03}:language(blah)

 and not

   use language blah;
   my dog Fiffi eq my dog Fi\x{fb03}

What if this is:

$dog eq my dog Fi\x{fb03}

and $dog has no language info attached?

leo


Re: ICU data file location issues

2004-04-15 Thread Nicholas Clark
On Wed, Apr 14, 2004 at 11:25:22PM -0700, Jeff Clites wrote:

 For Unix platforms at least, you should be able to do this:
 
   executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)
 
   (to mix a bunch of syntaxes)
 
 during initialization before you've had a chance to chdir, and store 
 that away on the interpreter struct. That should work unless you've 
 gone out of your way to execute parrot with argv[0] set to something 
 fake. I don't know what you'd do on Windows, but there must be 
 something.

I think that it can be fun on HP-UX (where for #! the kernel sets argv[0]
to the path of the script not the interpreter, despite the fact that the
script's path is going to be somewhere else in argv) and AIX (where it seems
that the kernel sets argv[0] to only the leafname of the interpreter,
rather than the full path).

But all this is from memory, and in turn for #! invocation one can always
parse the #! line to work out where the interpreter was (mmm. race
condition)
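
A minimal C sketch of the #! parsing idea, assuming the script's first line has already been read into a buffer; interp_from_shebang is a hypothetical name. It does nothing about the race mentioned above: the file may have changed between the kernel reading it and us reading it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: extract the interpreter path from the first line
 * of a script ("#!/path/to/interp [arg]"). Returns 0 on success, -1 if
 * the line is not a #! line or the path does not fit in out. */
static int interp_from_shebang(const char *line, char *out, size_t outlen)
{
    size_t len;

    if (line[0] != '#' || line[1] != '!')
        return -1;
    line += 2;
    line += strspn(line, " \t");         /* optional blanks after "#!" */
    len = strcspn(line, " \t\r\n");      /* path ends at a blank or EOL */
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, line, len);
    out[len] = '\0';
    return 0;
}
```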

Nicholas Clark


Re: Plans for string processing

2004-04-15 Thread Michael Scott
On 14 Apr 2004, at 20:16, Larry Wall wrote:

> I think the idea of tagging complete strings with language is not
> terribly useful.  If it's to be of much use at all, then it should
> be generalized to a metaproperty system for applying any property to
> any range of characters within a string, such that the properties
> float along with the characters they modify.  The whole point of
> doing such properties is to be able to ignore them most of the time,
> and then later, after you've constructed your entire XML document,
> you can say, "Oh, by the way, does this character have the 'toetsch'
> property?"  There's no point in tagging text with language if 99%
> of it gets turned into "Dunno", or "English, but not really".

It seems natural to associate language with utterances. When these 
utterances are written down - or as I'm doing here, skipping the 
speaking part and uttering straight to text - then the association 
still works. But once we start emitting written things (strings) in a 
less aural way, then the notion of an associated language can easily 
become forced or inaccurate.

The process whereby we read a string like

Is <b>this</b> string in Englisch?

is generally a kind of lossy conversion to our language of preference 
for that particular string. It's very difficult for us to do otherwise. 
This natural generalization means that there will always be a demand 
for strings to have language associated with them, no matter how 
illogical it may seem to those who reflect upon it a bit.

I think it is this user state that Dan is trying to support. And, in so 
far as it models natural and common perception, I think I agree with 
him.

Lossy conversion is a kind of info-sin, especially when it should be 
avoided. There are circumstances where it would be more natural to read 
the above string as

Is open-bold-tag this close-bold-tag string in 
the-German-word-for-English question mark

i.e. when we are being more precise.

It is for this more precise user state that we would be preserving 
information on substrings.

There are plenty of strings which are simply never intended to be 
uttered, and therefore are effectively language-less. And many strings 
obviously in particular languages are often treated as if they weren't. 
It would be odd to subject the processing of such strings to a 
requirement to preserve nonexistent or useless information. Any sensible 
user would want to turn off language processing in such cases.

So, we need to ask the user their state, and have the necessary level 
of support in place to be able to behave accordingly.

Looking at this from an object-oriented perspective I can't help but 
wonder why we don't have a hierarchy of Parrot string types

	String
	LanguageString
	MultiLanguageString

with a "left wins" rule for composition.
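
One possible reading of that rule, as a minimal C sketch: all type and function names here are hypothetical, and "left wins" is taken to mean that composing two strings yields the kind and language of the left operand.

```c
#include <stddef.h>

/* Hypothetical tags for the three proposed string kinds. */
typedef enum { STR_PLAIN, STR_LANGUAGE, STR_MULTI_LANGUAGE } StrKind;

typedef struct {
    StrKind kind;
    const char *lang;   /* language tag for STR_LANGUAGE, otherwise NULL */
} StrType;

/* "Left wins": the composed string takes the kind and language of the
 * left operand, whatever the right operand carries. */
static StrType compose_type(StrType left, StrType right)
{
    (void)right;        /* deliberately ignored under this rule */
    return left;
}
```

Under this reading a plain String on the left silently drops the language of whatever is appended to it, which is the "turn off language processing" behaviour described above.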

Mike






Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
> TT (Tangentially Topical): it would be nice if Parrot could avoid as
> many hardcoded paths as possible for configs, libraries, and such, so
> that the Parrot installation could be relocated as freely as possible.

Well, then...

Given that everyone's weighing in on this one, it seems worthy of 
sane consideration. (I keep not thinking about this, as I'm used to 
the nicely sane VMS logical system :)

As we've got the unpleasant issues of OSes with Really Lame schemes, 
and embedders that may want to use alternate resource locations, it 
seems like the right thing to do here is to make this a part of the 
embedding interface and have the main parrot wrapper set it.

So, I'm thinking a few things:

1) We add a Parrot_set_library_base(char *lib_path, int length) 
function to set the base library path
2) We add a Parrot_get_base_library_path() function to the 
platform-specific interface so platforms can return the base path
3) Parrot itself (the main executable) has a static, global 1K buffer 
in it that starts and ends with some recognizable string (like, say, 
***+++***START| and |END***+++***) so we can find it and 
overwrite the contents if the library gets moved, for use on 
platforms where the only way to put a path in is to stick it 
statically in the executable.

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
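
A minimal C sketch of how #3 could work, assuming a separate relocation tool that has the executable's bytes in memory. The marker strings are the ones suggested above; find_slot and patch_slot are hypothetical names, and the payload is kept NUL-terminated so the runtime side can read it as a C string.

```c
#include <stddef.h>
#include <string.h>

#define PATH_START "***+++***START|"
#define PATH_END   "|END***+++***"

/* Locate the payload: the bytes between the START and END markers.
 * Returns a pointer to it and stores its size, or NULL if not found. */
static unsigned char *find_slot(unsigned char *image, size_t len,
                                size_t *payload_len)
{
    const size_t sl = sizeof PATH_START - 1, el = sizeof PATH_END - 1;
    size_t i, j;

    for (i = 0; i + sl + el <= len; i++) {
        if (memcmp(image + i, PATH_START, sl) != 0)
            continue;
        for (j = i + sl; j + el <= len; j++) {
            if (memcmp(image + j, PATH_END, el) == 0) {
                *payload_len = j - (i + sl);
                return image + i + sl;
            }
        }
    }
    return NULL;
}

/* Overwrite the payload with new_path, NUL-padding the rest.
 * Returns 0 on success, -1 if there is no slot or the path is too long. */
static int patch_slot(unsigned char *image, size_t len, const char *new_path)
{
    size_t cap, n = strlen(new_path);
    unsigned char *p = find_slot(image, len, &cap);

    if (p == NULL || n >= cap)
        return -1;                      /* keep at least one trailing NUL */
    memset(p, 0, cap);
    memcpy(p, new_path, n);
    return 0;
}
```

A real tool would also have to make sure its own copy of the marker strings is not mistaken for the slot when it patches an executable that contains this code.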

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to argue 
that, though :)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi
Dan Sugalski wrote:
 At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
 
TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.
 
 
 Well, then...
 
 Given that everyone's weighing in on this one, it seems worthy of 
 sane consideration. (I keep not thinking about this, as I'm used to 
 the nicely sane VMS logical system :)

Brag :-)

(in case someone is wondering, the VMS logicals nicely solve this
problem, basically by each piece of software being installed into and
used/accessed through a super environment variable-- so basically Dan
can't understand why us others are having these problems and talk of it
as a new fancy thing :-)





Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 6:23 PM +0300 4/15/04, Jarkko Hietaniemi wrote:
Dan Sugalski wrote:
 At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:

TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.


 Well, then...

 Given that everyone's weighing in on this one, it seems worthy of
 sane consideration. (I keep not thinking about this, as I'm used to
 the nicely sane VMS logical system :)
> Brag :-)

:-P

> (in case someone is wondering, the VMS logicals nicely solve this
> problem, basically by each piece of software being installed into and
> used/accessed through a super environment variable-- so basically Dan
> can't understand why us others are having these problems and talk of it
> as a new fancy thing :-)

Oh, and have I mentioned they're group and system wide, persistent, 
group-protected, and leveled by protection, so they're actually safe 
to trust? (So if you look for an entry in a system logical table you 
can trust it, since someone needed compromise-the-world privs to set 
it in the first place so you've got bigger things to worry about if 
it's bad? :)

Not to, y'know, show off or anything. :)
--
Dan


Re: ICU data file location issues

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 3:03 AM, Nicholas Clark wrote:

> But all this is from memory, and in turn for #! invocation one can always
> parse the #! line to work out where the interpreter was (mmm. race
> condition)

And a race isn't too bad here actually--even if we know the path 
reliably, it's always possible to move or alter the resources which 
we're trying to locate, at any time (before, during, or after launching 
the process). So we need to treat them with as much skepticism as 
anything else on the file system.

And for parrot-the-executable we should offer a command-line parameter 
to override the location. That would give people an escape hatch for 
special situations (for instance, if you are going to chroot or 
something).

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

> Sound sane? I can see splitting up the library base path into 
> sections, but I'm not sure it's worth it. Now'd be the time to argue 
> that, though :)

Makes sense to me to just store the path--keep it simple. As long as 
we've stored it away, anything using it later can chop it up into 
pieces itself if it wants to--anything we could have done in splitting 
it up, the consumer can do too. The only thing we really have to do is 
grab the info before it's too late--before something might have 
chdir'd, and before argv is either inaccessible, or could have been 
overwritten.

JEff



Re: Method Name Truncation in PIR

2004-04-15 Thread chromatic
On Thu, 2004-04-15 at 00:58, Leopold Toetsch wrote:

 Did you turn on debugging? Most of these name mangling and string
 constant stuff should be covered, e.g.:
 
 $ parrot -d /tmp/object-meths_15.pasm 2>&1 | grep meth

Aha, here's an interesting difference.  I've been using single quotes
for string constants.  Here's what happens when I change the double
quotes around the method name to single quotes in that test:

emit newclass P3, Foo
emit find_type I0, Foo
emit new P2, I0
emit set S0, 'meth
emit fetchmethod P0, P2, S0
emit print main\n
emit invokecc 
emit print back\n
emit fetchmethod P0, P3, S0
emit set P2, P3
emit invokecc 
emit print back\n
emit end 
emit  _Foo@@@meth:
emit print in meth\n
emit invoke P1

For what it's worth, if I switch back and forth, as in this PASM:

.local pmc args
new args, .PerlHash
set args['height'], 100
set args["width"],  100
set args['bpp'],  0
set args["flags"],1

The debug output indicates:

emit new P16, 33
emit set P16['height], 100
emit set P16[width], 100
emit set P16['bpp], 0
emit set P16[flags], 1

That may not be the root cause, but it's certainly suspicious.

-- c



Re: Plans for string processing

2004-04-15 Thread Aaron Sherman
On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote:
 Aaron Sherman [EMAIL PROTECTED] wrote:
 
  So, why is that:
 
  my dog Fiffi:language(blah) eq my dog Fi\x{fb03}:language(blah)
 
  and not
 
  use language blah;
  my dog Fiffi eq my dog Fi\x{fb03}
 
 What if this is:
 
   $dog eq my dog Fi\x{fb03}
 
 and $dog has no language info attached?

Looks good to me. Great example!

Seriously, why is that a problem? That was my entry-point to this
conversation: I just don't see any case in which performing a comparison
of ANY two strings according to whatever arbitrary SINGLE language rules
is a problem. I cannot imagine the case where you need two or more
language rules AND could start off with any sense of what that would
mean, and even if you could contrive such a case, I would suggest that
its rarity should dictate it being attached to a class that defines a
string-like object which mutates its behavior based on the language
spoken by the maintainer of the database from which it was fetched or
somesuch.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

> At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>
>>> Sound sane? I can see splitting up the library base path into 
>>> sections, but I'm not sure it's worth it. Now'd be the time to argue 
>>> that, though :)
>>
>> Makes sense to me to just store the path--keep it simple.
>
> That's what I'm thinking, but I can see wanting to have separate paths 
> for parrot's low-level libraries (basically the things we need for 
> parrot to run in the first place) and higher-level libraries (modules 
> installed off of CPAN and whatnot).

That's true. But as long as we grab the "here's where the executable 
is", we can (later) build API on top of that if we want. For instance, 
we could decide that core, low-level resources will be located relative 
to that path, and one of those resources will undoubtedly be a config 
file of some sort, and that config file could contain the path(s) to 
look for higher-level stuff. As long as we've rescued and stored our 
location, we've sort of bootstrapped that process.
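
A minimal C sketch of the "locate things relative to the stored path" step, assuming the base directory has already been captured at startup. resource_path is a hypothetical helper, and the relative names passed to it are up to whatever layout gets chosen.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<base_dir>/<rel>" for a resource that lives
 * at a conventional location under the stored base directory.
 * Returns 0 on success, -1 on an empty base or if the result does not fit. */
static int resource_path(const char *base_dir, const char *rel,
                         char *out, size_t outlen)
{
    size_t blen = strlen(base_dir);
    const char *sep;
    int n;

    if (blen == 0)
        return -1;
    /* avoid a doubled separator when base_dir already ends in '/' */
    sep = (base_dir[blen - 1] == '/') ? "" : "/";
    n = snprintf(out, outlen, "%s%s%s", base_dir, sep, rel);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The config file found this way could then list the paths for higher-level libraries, as described above.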

(And to loop back a bit, the nice thing about bootstrapping this stuff 
based on our executable's location is that it makes it a no-brainer to 
have multiple, relocatable installs of parrot. And people would even be 
able to have 10 different versions of parrot sitting around, but have 
them all configured to share the same high-level resources.)

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>
>> Sound sane? I can see splitting up the library base path into 
>> sections, but I'm not sure it's worth it. Now'd be the time to 
>> argue that, though :)
>
> Makes sense to me to just store the path--keep it simple.

That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we need 
for parrot to run in the first place) and higher-level libraries 
(modules installed off of CPAN and whatnot). I'm firmly in the "Don't 
care" camp here, so I figured I'd open it to discussion before 
enshrining the result in the API. :)
--
Dan



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Brent 'Dax' Royal-Gordon
Dan Sugalski wrote:
> 1) We add a Parrot_set_library_base(char *lib_path, int length) function 
> to set the base library path
> 2) We add a Parrot_get_base_library_path() function to the 
> platform-specific interface so platforms can return the base path

Works for me...

> 3) Parrot itself (the main executable) has a static, global 1K buffer in 
> it that starts and ends with some recognizable string (like, say, 
> ***+++***START| and |END***+++***) so we can find it and overwrite 
> the contents if the library gets moved, for use on platforms where the 
> only way to put a path in is to stick it statically in the executable.

That's pretty disgusting, but I don't know that I have a better idea.

> #3, I should point out, will *only* be used on those platforms that 
> don't have a better scheme, and only by the 
> Parrot_get_base_library_path() function.

System registry on Windows?  /etc file on Unixen?

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.

> Sound sane? I can see splitting up the library base path into sections, 
> but I'm not sure it's worth it. Now'd be the time to argue that, though :)
--
Brent Dax Royal-Gordon [EMAIL PROTECTED]
Perl and Parrot hacker
Oceania has always been at war with Eastasia.




Re: Method Name Truncation in PIR

2004-04-15 Thread Leopold Toetsch
chromatic wrote:

	emit set P16['height], 100
Ah. A relic of Jeff's patch. If that constant got reused elsewhere, e.g. 
as a method name, it was one character too short.

Fixed.
leo


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
> On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:
>
>> At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>>> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>>
>>>> Sound sane? I can see splitting up the library base path into 
>>>> sections, but I'm not sure it's worth it. Now'd be the time to 
>>>> argue that, though :)
>>>
>>> Makes sense to me to just store the path--keep it simple.
>>
>> That's what I'm thinking, but I can see wanting to have separate 
>> paths for parrot's low-level libraries (basically the things we 
>> need for parrot to run in the first place) and higher-level 
>> libraries (modules installed off of CPAN and whatnot).
>
> That's true. But as long as we grab the "here's where the executable 
> is", we can (later) build API on top of that if we want.

Well, yeah, but... where the executable is ought, honestly, to be 
irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that 
I'll have parrot's library files hanging off of /usr/bin. And if I've 
got a few hundred machines with parrot's library NFS mounted in 
different places (to match conflicting vendor standards and other 
whackjob breakage which is endemic in, well, the world) it really 
falls down. :) Add to that you can't always figure out where Parrot 
really is both because of chroot behaviour and some odd "where am I 
really" problems with suid scripts in some places.

There are a couple of folks who could make your brain melt and flow 
out your ears with all this stuff too.

Having the executable path as an optional way to get the info's not 
necessarily a bad thing, but I think it's safe to say that it's not 
The Right Thing. (If there even is one)

If nothing else this has convinced me we need a way to specify site 
policy at build time for all this nonsense^Wfun. :)
--
Dan



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 9:05 AM -0700 4/15/04, Brent 'Dax' Royal-Gordon wrote:
> Dan Sugalski wrote:
>> 3) Parrot itself (the main executable) has a static, global 1K 
>> buffer in it that starts and ends with some recognizable string 
>> (like, say, ***+++***START| and |END***+++***) so we can find 
>> it and overwrite the contents if the library gets moved, for use on 
>> platforms where the only way to put a path in is to stick it 
>> statically in the executable.
>
> That's pretty disgusting, but I don't know that I have a better idea.

There isn't one, alas, at least for some people.

>> #3, I should point out, will *only* be used on those platforms that 
>> don't have a better scheme, and only by the 
>> Parrot_get_base_library_path() function.
>
> System registry on Windows?  /etc file on Unixen?

That's global. Bad idea, it messes up multiple installs of the same 
version, or similar-enough versions that they're indistinguishable.

> Actually, one thing I'd like to see is if it wasn't the library's base
> path hardcoded in, but the base path of a frozen data structure or
> program that encoded Parrot's settings.  That would allow it to carry
> the runtime library path, the paths to ICU's tables, the paths to search
> for PMCs, and whatever else we can think of, without a hardcoded limit.

This wouldn't be a bad thing, nope. I could see security issues--it'd 
probably be better to link the config file right into parrot.

--
Dan


Re: Method Name Truncation in PIR

2004-04-15 Thread chromatic
On Thu, 2004-04-15 at 09:18, Leopold Toetsch wrote:

 Ah. A relic of Jeff's patch. If that constant got reused elsewhere, e.g. 
 as a method name, it was one character too short.

Confirmed.  Thanks, Leo!

Would a test patch such as the following be good to catch regressions,
or should it go elsewhere?  If elsewhere, do you prefer a separate test
in object-meths.t or somewhere in imcc/t?

-- c


Index: t/pmc/object-meths.t
===
RCS file: /cvs/public/parrot/t/pmc/object-meths.t,v
retrieving revision 1.17
diff -u -u -r1.17 object-meths.t
--- t/pmc/object-meths.t	10 Apr 2004 12:50:23 -	1.17
+++ t/pmc/object-meths.t	15 Apr 2004 16:54:04 -
@@ -428,7 +428,7 @@
 find_type I0, "Foo"
 new P2, I0
 
-set S0, "meth"
+set S0, 'meth'
 fetchmethod P0, P2, S0
 print "main\n"
 # P2, S0 are as in callmethod


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Brent 'Dax' Royal-Gordon
Dan Sugalski wrote:
>>> #3, I should point out, will *only* be used on those platforms that 
>>> don't have a better scheme, and only by the 
>>> Parrot_get_base_library_path() function.
>>
>> System registry on Windows?  /etc file on Unixen?
>
> That's global. Bad idea, it messes up multiple installs of the same 
> version, or similar-enough versions that they're indistinguishable.

Good point.

>> Actually, one thing I'd like to see is if it wasn't the library's base
>> path hardcoded in, but the base path of a frozen data structure or
>> program that encoded Parrot's settings.  That would allow it to carry
>> the runtime library path, the paths to ICU's tables, the paths to search
>> for PMCs, and whatever else we can think of, without a hardcoded limit.
>
> This wouldn't be a bad thing, nope. I could see security issues--it'd 
> probably be better to link the config file right into parrot.

Install it with root ownership and 644 permissions, in a directory with 
similar settings.  (Or the system's equivalent, of course.)  Then put 
big blinking security warnings wherever the documentation talks about 
editing that file.  We can't protect sysadmins from their own idiocy.
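
A minimal POSIX C sketch of checking those settings before trusting the file. The function names are hypothetical, and "safe" is assumed to mean owned by uid 0 and not writable by group or others; the containing directory would need the same check.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* True if a file with this owner and mode can only be modified by root:
 * owned by uid 0 and not writable by group or others (e.g. mode 644). */
static int mode_is_root_only(uid_t owner, mode_t mode)
{
    return owner == 0 && (mode & (S_IWGRP | S_IWOTH)) == 0;
}

/* Returns 1 if the file at path passes the check, 0 if it does not,
 * and -1 if it cannot be examined at all. */
static int config_is_trustworthy(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return mode_is_root_only(st.st_uid, st.st_mode);
}
```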

--
Brent Dax Royal-Gordon [EMAIL PROTECTED]
Perl and Parrot hacker
Oceania has always been at war with Eastasia.


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi
 Well, yeah, but... where the executable is ought, honestly, to be 
 irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that 
 I'll have parrot's library files hanging off of /usr/bin.

Bah.  BAH, I say.  The /usr/bin/parrot is of course a symlink
to, say, /platform/os/version/parrot/version/bin/parrot, and we
parse the real path, not the symlink.

  And if I've got a few hundred machines with parrot's library NFS mounted in 
 different places (to match conflicting vendor standards and other 
 whackjob breakage which is endemic in, well, the world) it really 
 falls down. :) Add to that you can't always figure out where Parrot 
 really is both because of chroot behaviour and some odd where am I 
 really problems with suid scripts in some places.
 
 There are a couple of folks who could make your brain melt and flow 
 out your ears with all this stuff too.

Yes, I was once one of those people :-)

 Having the executable path as an optional way to get the info's not 
 necessarily a bad thing, but I think it's safe to say that it's not 
 The Right Thing. (If there even is one)
 
 If nothing else this has convinced me we need a way to specify site 
 policy at build time for all this nonsense^Wfun. :)



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:

> At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
>> On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:
>>
>>> At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>>>> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>>>
>>>>> Sound sane? I can see splitting up the library base path into 
>>>>> sections, but I'm not sure it's worth it. Now'd be the time to 
>>>>> argue that, though :)
>>>>
>>>> Makes sense to me to just store the path--keep it simple.
>>>
>>> That's what I'm thinking, but I can see wanting to have separate 
>>> paths for parrot's low-level libraries (basically the things we need 
>>> for parrot to run in the first place) and higher-level libraries 
>>> (modules installed off of CPAN and whatnot).
>>
>> That's true. But as long as we grab the "here's where the executable 
>> is", we can (later) build API on top of that if we want.
>
> Well, yeah, but... where the executable is ought, honestly, to be 
> irrelevant.

Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 
with a particular copy of parrot. It's the only thing (that I can 
think of) which continues to work if you move your distro around, and 
which naturally avoids problems with having multiple copies, and lets 
things work even if you don't install.

> If I've stuck Parrot in /usr/bin it seems unlikely that I'll have 
> parrot's library files hanging off of /usr/bin.

Right, so you do what Mac OS X does with the java executable--you put a 
symlink in /usr/bin, pointing to the real location. And your path to 
the executable has to call realpath() or the equivalent to resolve 
such symlinks (which you need to do in order for path logic to 
do-the-right-thing).
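
A minimal POSIX C sketch of that symlink resolution, using realpath() as mentioned above. real_exe_dir is a hypothetical name, and the caller is assumed to supply a buffer of at least PATH_MAX bytes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for realpath() under strict -std modes */
#endif
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Resolve symlinks in the executable's path and leave its real directory
 * in out (which must hold PATH_MAX bytes). Returns 0 on success, -1 if
 * the path cannot be resolved. */
static int real_exe_dir(const char *exe_path, char *out)
{
    char *slash;

    if (realpath(exe_path, out) == NULL)
        return -1;
    slash = strrchr(out, '/');
    if (slash == NULL)
        return -1;
    if (slash == out)
        slash++;                        /* keep "/" for files in the root */
    *slash = '\0';
    return 0;
}
```

With /usr/bin/parrot as a symlink into a versioned install tree, this yields the bin directory of the real install rather than /usr/bin.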

> And if I've got a few hundred machines with parrot's library NFS 
> mounted in different places (to match conflicting vendor standards and 
> other whackjob breakage which is endemic in, well, the world) it 
> really falls down. :)

I'm not sure I get your meaning here. By executable, I mean 
standalone-parrot, not libparrot, of course. If you mean that libparrot 
might end up in 100 different places, then you'll not end up with the 
dynamic linker finding things properly, so you'll have a bigger problem 
to solve. If you mean that standalone-parrot could end up in 100 
different places, then you're going to have 100 different ways you need 
to set up $PATH just to launch it, but once it's executing you'd still 
be fine. Or each host will have its own separate symlink in /usr/bin to 
the right location for that host, and everything will just be fine.

> Add to that you can't always figure out where Parrot really is both 
> because of chroot behaviour and some odd "where am I really" problems 
> with suid scripts in some places.

With chroot, frankly, you have the same problem with DLLs, and you end 
up needing to have all of your necessary external resources located in 
your chroot-dir so that their paths after the chroot match their paths 
before. So that was a bad example on my part, really. (And, if you are 
chroot-ing from within a parrot script, you're in a place where you'd 
want to re-point your config dir path to match.)

But with interpreter files we could have the problem that the kernel 
hides the info from us. But for bytecode files, if they're launched 
like java apps are launched, with parrot foo, then that problem 
wouldn't come up.

> Having the executable path as an optional way to get the info's not 
> necessarily a bad thing, but I think it's safe to say that it's not 
> The Right Thing. (If there even is one)

Yeah, I don't think there's a 100% solution, but it would be nice to 
have something which works 95% of the time and is flexible/convenient, 
in preference to something that works 96% of the time and is less 
powerful.

I think a reasonable approach would be:

1) Always allow the config location to be overridden via a command-line 
parameter, and change-able from bytecode. (That lets you be 100% 
unambiguous, at the cost of needing to execute parrot in a particular 
way. And it's convenient for testing against a whole bunch of different 
sets of configs without rebuilding.)

2a) On platforms which support it, auto-find the executable, and base 
the config path on that.

2b) On platforms which don't support that (and even, as a compile-time 
option for those which support it), have a compiled-in path to use.

This basically matches the API you mentioned before, and boils down to 
what gets passed to Parrot_set_library_base() (or, call it 
Parrot_set_configuration_base maybe) at launch time--it gets passed 
either an explicitly supplied value, an inferred value, or a 
compiled-in value.
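
A minimal C sketch of that precedence, assuming NULL or an empty string means "not available". choose_base and PARROT_COMPILED_BASE are hypothetical names, and the default path is only a placeholder for whatever gets set at build time.

```c
#include <stddef.h>

#ifndef PARROT_COMPILED_BASE
#define PARROT_COMPILED_BASE "/usr/local/lib/parrot"  /* assumed default */
#endif

/* Pick the base path in the order proposed above: an explicit override
 * first (step 1), then the location inferred from the executable
 * (step 2a), then the compiled-in value (step 2b). */
static const char *choose_base(const char *explicit_base,
                               const char *inferred_base)
{
    if (explicit_base != NULL && explicit_base[0] != '\0')
        return explicit_base;
    if (inferred_base != NULL && inferred_base[0] != '\0')
        return inferred_base;
    return PARROT_COMPILED_BASE;
}
```

Whatever this returns is what the launcher would hand to the proposed Parrot_set_library_base() call.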

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Leopold Toetsch
Brent 'Dax' Royal-Gordon [EMAIL PROTECTED] wrote:
 Dan Sugalski wrote:
 ***+++***START| and |END***+++***) so we can find it and overwrite

 That's pretty disgusting, but I don't know that I have a better idea.

Same scheme as with fingerprint.c?

leo


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:05 AM, Brent 'Dax' Royal-Gordon wrote:

Dan Sugalski wrote:
3) Parrot itself (the main executable) has a static, global 1K buffer 
in it that starts and ends with some recognizable string (like, say, 
***+++***START| and |END***+++***) so we can find it and 
overwrite the contents if the library gets moved, for use on 
platforms where the only way to put a path in is to stick it 
statically in the executable.
That's pretty disgusting, but I don't know that I have a better idea.
It's yucky, but it matches what's done for dynamic libs, at least on 
some platforms. (That is, at build-time a library gets its 
path-where-I'll-be-installed compiled into it, and apps linked against 
that lib copy that path into themselves, so that at runtime the dynamic 
linker searches that location, in addition to standard locations, to 
find the library. And, there's then a tool which lets you modify your 
library to change its built-in install location, without re-compiling.) 
So at least there's precedent.
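
For illustration, the marker-delimited buffer idea could be sketched like this in Python. The two marker strings are the ones suggested in the quoted proposal; everything else (the function names, NUL padding, operating on an in-memory byte string instead of a real executable) is an assumption of this sketch, not an existing Parrot tool.

```python
START = b"***+++***START|"
END = b"|END***+++***"


def _payload_bounds(image):
    """Return (start, end) offsets of the bytes between the two markers."""
    start = image.find(START)
    if start < 0:
        raise ValueError("start marker not found")
    payload_start = start + len(START)
    payload_end = image.find(END, payload_start)
    if payload_end < 0:
        raise ValueError("end marker not found")
    return payload_start, payload_end


def read_embedded_path(image):
    """Return the stored path with its NUL padding stripped."""
    payload_start, payload_end = _payload_bounds(image)
    return image[payload_start:payload_end].rstrip(b"\0")


def patch_embedded_path(image, new_path):
    """Return a copy of image with the buffer contents replaced.

    The payload is NUL-padded so the image keeps its exact length and
    nothing after the buffer moves.
    """
    payload_start, payload_end = _payload_bounds(image)
    capacity = payload_end - payload_start
    if len(new_path) > capacity:
        raise ValueError("new path does not fit in the embedded buffer")
    padded = new_path.ljust(capacity, b"\0")
    return image[:payload_start] + padded + image[payload_end:]
```

Keeping the length fixed is the point of reserving a static buffer up front: the patch tool never has to understand the executable format, only find the markers.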

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
System registry on Windows?  /etc file on Unixen?

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.
The idea (for me, at least) was to specify a directory, and the config 
file could be a conventional name relative to that--that lets you 
locate multiple resources without having to read the config file in 
order to find them. And semantically, I think of it not as the 
executable's path--that just happens to be something that's 1:1 with a 
particular copy of parrot. And definitely not libparrot's 
path--embedded cases would have to specify the path explicitly, though 
they could partially mimic the same scheme.

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 10:23 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:

At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to 
argue that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we 
need for parrot to run in the first place) and higher-level 
libraries (modules installed off of CPAN and whatnot).
That's true. But as long as we grab the "here's where the 
executable is", we can (later) build API on top of that if we want.
Well, yeah, but... where the executable is ought, honestly, to be irrelevant.
Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 
with a particular copy of parrot. It's the only thing (that I can 
think of) which continues to work if you move your distro around, 
and which naturally avoids problems with having multiple copies, and 
lets things work even if you don't install.
At this point I can say I don't honestly care all that much, and most 
of my worries are based on vague feelings that there are platforms 
out there where finding the actual executable name is somewhere 
between hard and impossible. I will, then, do the sensible thing and 
just punt on this--we can work out a best practices thing and 
enshrine it as the default on systems which can support it and be 
done with it.

The other question, then, is do we see the need for multiple 
categories of library which would want separately settable library 
paths? (Don't, here, forget the potential needs of embedders such as 
Apache) Once we get that thumped out I'll make the API additions.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:41 AM, Dan Sugalski wrote:

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.
This wouldn't be a bad thing, nope. I could see security issues--it'd 
probably be better to link the config file right into parrot.
There'll be the same security issue with anything located on the 
filesystem--the config is not particularly worse than anything else 
(DLLs, etc.). The security of anything you run is only as good as the 
integrity of the filesystem used to locate the resources. 
(Specifically, if I were a hacker and could compromise your system by 
replacing the config, I could just as easily replace parrot itself.) But it 
would be nice to bake in things which you can't really change without 
rebuilding anyway--things like UINTVAL size, etc. Monkeying with them 
after-the-fact would be a definite security risk (buffer overruns, 
etc.), and wouldn't ever be useful. But stuff like finding ICU's data 
files (or add-on libraries) we'd want to be easily changeable without a 
rebuild. (And again, if you have to rebuild to change them, then people 
will tend to keep around the tools needed to do that, which would give 
a hacker the tools they need to do the same.) But we certainly need to 
define/articulate a security model, no matter what approach we take. 
(But my gut reaction is always against something which decreases 
flexibility, and only _seems_ to increase security.)

But there are of course security issues with anything located relative 
to the cwd(). (That is, if resources are located relative to the cwd, 
then I can trick you into loading my copies by talking you into 
chdir-ing into my home directory.)

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 10:30 AM, Jeff Clites wrote:

And semantically, I think of it not as the executable's path--that  
just happens to be something that's 1:1 with a particular copy of  
parrot. And definitely not libparrot's path--embedded cases would have  
to specify the path explicitly, though they could partially mimic the  
same scheme.
I take that back--the path to the library might actually work just as  
well (and may or may not be less ambiguous to find; the dynamic linker  
had to find it, and may have left breadcrumbs). This is all, by the  
way, exactly the NSBundle/CFBundle API from Mac OS X (and before that,  
OpenStep). See:
http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSBundle.html.

JEff



ICU data loading and platform support

2004-04-15 Thread George R
Hello Perl6 people,

I couldn't help but notice that you were talking about ICU on this 
mailing list.  Let me interject with some suggestions.

I should mention that ICU 2.6 can build a static data library.  I 
recommend that ICU be built without the --with-data-packaging=archive 
configure option. You will probably have fewer path issues if you do 
that and use the ICU static data library.

For those people that are interested here is some more information about 
building ICU data and its options: 
http://oss.software.ibm.com/icu/userguide/icudata.html.

If you are having problems and need to patch ICU, you should consider 
submitting the patches to ICU to our jitterbug system 
http://www.jtcsv.com/cgibin/icu-bugs.  If you submit them early 
enough, the changes may be able to make ICU 3.0.  ICU 3.0 should be out in 
mid-June.  I'm sure that you don't want to keep patching ICU every time 
you upgrade to a new version of ICU.

ICU 3.0 should work a little better with Windows and Cygwin.  ICU 3.0 
should also build faster than before due to some build changes done 
recently.  I'm sure some people on this list would be interested in that.

When the alpha and beta releases of ICU come out in a few weeks, I 
recommend someone from this group try building it on your machines.  
Some people here seem to have access to some machine configurations that 
are unavailable to the ICU team.  Testing these pre-releases will help 
to verify that the ICU release is as portable as possible.

I'm glad to see that perl will be improving its Unicode support :-)

George Rhoten
http://oss.software.ibm.com/icu/


Re: ICU data loading and platform support

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 11:12 AM, George R wrote:

I couldn't help but notice that you were talking about ICU on this 
mailing list.  Let me interject with some suggestions.
Thanks much for the message.

I should mention that ICU 2.6 can build a static data library.  I 
recommend that ICU be built without the --with-data-packaging=archive 
configure option. You will probably have fewer path issues if you do 
that and use the ICU static data library.
I went with the --with-data-packaging=archive initially for 3 pragmatic 
reasons: (1) it seems to take a really, really long time to build them 
into a library, and (2) once parrot ships, if we use 
--with-data-packaging=archive or --with-data-packaging=files then that 
would permit end users to add/remove encodings without needing access to 
a compiler, and (3) our automated tests end up linking in our 
libraries, so building the ICU data as a static library slows this down 
significantly, since it has to copy around all of the bits for each 
test.

But (1) and (3) are just short-term, convenient-for-ongoing-development 
reasons.

Long term, it makes sense to expose all of the packaging choices to the 
parrot build-configuration process, and let end users decide what's 
best for them.

In the short term, this is having the nice side effect of forcing some 
issues of resource location that we'll need to solve for other 
resources anyway.

If you are having problems and need to patch ICU, you should consider 
submitting the patches to ICU to our jitterbug system 
http://www.jtcsv.com/cgibin/icu-bugs.
I have a few (bugs if not patches) to submit. (For instance, with ICU 
2.8, building the tools seems to fail in the case of --enable-static 
--disable-shared, because of the s prefix.) I'll be in touch!

Thanks again,

JEff



Re: ICU data loading and platform support

2004-04-15 Thread George R
Jeff Clites wrote:

On Apr 15, 2004, at 11:12 AM, George R wrote:

I went with the --with-data-packaging=archive initially for 3 
pragmatic reasons: (1) it seems to take a really, really long time to 
build them into a library, and (2) once parrot ships, if we use 
--with-data-packaging=archive or --with-data-packaging=files then that 
would permit end users to add/remove encodings without needing access 
to a compiler, and (3) our automated tests end up linking in our 
libraries, so building the ICU data as a static library slows this 
down significantly, since it has to copy around all of the bits for 
each test.

But (1) and (3) are just short-term, 
convenient-for-ongoing-development reasons.
Item 1 should improve in ICU 3.0 on most platforms when a shared or 
static data library is used.

<detail>
Basically, ICU writes out the .dat file to computer assembly, and lets 
the C compiler create the object code from the assembly.  This is 
similar to how the Windows builds work.  It's all data, and there are no 
instructions in the assembly.  It's also much quicker than the original 
implementation.  There are some platforms that can't work with this 
build speed improvement without some porting help (please contact me if 
you want to help out).  If the new building process can't be done on a 
platform, it uses the original slow building process (this only happens 
when we don't have access to the compiler or platform for testing).
</detail>
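
To make the "it's all data, no instructions" point concrete, here is a small Python sketch that renders a byte string as assembly source consisting only of data directives. It is not ICU's actual tool or output format: the function name, the symbol name and the choice of GNU-assembler-style directives are assumptions made for illustration.

```python
def data_to_assembly(data, symbol, per_line=8):
    """Render bytes as assembler source that contains only data directives."""
    lines = ["\t.data", f"\t.globl {symbol}", f"{symbol}:"]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        # One .byte directive per row; no instructions are ever emitted.
        lines.append("\t.byte " + ", ".join(f"0x{b:02x}" for b in chunk))
    return "\n".join(lines) + "\n"


# Example with a hypothetical symbol name:
print(data_to_assembly(b"\x01\x02\xff", "example_data_blob", per_line=2))
```

The assembler only has to lay out bytes, which is why this route can be much faster than compiling the same data as a large C array.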

Item 2 can already be done when a static or shared data library is used 
(at least the add part).  If the ICU_DATA environment variable is set, 
or u_setDataDirectory() is used, you can add or override the data used 
within ICU's library.  If a user wanted to remove data, the user would 
need some of ICU's tools to unpackage and repackage the .dat archive, 
which requires a little detailed knowledge about what all the data and 
tools are used for.  The ICU User's Guide should help in those cases.

I'm not sure what you mean by item 3, but the new 3.0 data build process 
should hopefully help out.

I have a few (bugs if not patches) to submit. (For instance, with ICU 
2.8, building the tools seems to fail in the case of --enable-static 
--disable-shared, because of the s prefix.)
That's done on purpose.  The static library names collide with AIX 
shared library names and Windows import library names.  I recommend that 
you don't use --disable-shared and build both the static and shared 
libraries.  I don't think autoconf allowed us to remove the 
--disable-shared option.  I also doubt that the --disable-shared would 
work on all platforms anyway because library versioning is done 
differently (this is due to poor shared library versioning support on 
some platforms).  You can use the static libraries, but the tools kind 
of require shared libraries.  Patches to make ICU work with just static 
libraries, and with the current naming scheme, will probably be accepted.

I'll be in touch!
:-)

George
http://oss.software.ibm.com/icu/


Re: Method Name Truncation in PIR

2004-04-15 Thread Leopold Toetsch
Chromatic [EMAIL PROTECTED] wrote:

 On Thu, 2004-04-15 at 09:18, Leopold Toetsch wrote:

 Ah. Relic of Jeff's patch. If that constant got reused elsewhere, e.g.
 as a method name, it was one too short.

 Confirmed.  Thanks, Leo!

Good.

 Would a test patch such as the following be good to catch regressions,

I didn't boil it down to a test. It was just a look at the patch, that
*could* have caused the bug. Simple usage of single quotes was ok.
Multiple usage too (constant folding jumps in). But name mangling +
shortened original symbol could cause the problem.

Just replacing double quotes with single is and was working. Only the
combination of different usage of one string constant could have
triggered the bug.

For a test you need (probably) usage of that string constant as a single
quoted string and as a method name - maybe in a different namespace.

leo


Re: Plans for string processing

2004-04-15 Thread Leopold Toetsch
Aaron Sherman [EMAIL PROTECTED] wrote:
 On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote:
  $dog eq "my dog Fi\x{fb03}"

 and C<$dog> hasn't some language info attached?

 Looks good to me. Great example!

 Seriously, why is that a problem?

Dan's problem to come up with better examples--or explanations :)

leo - resisting from further utterances WRT that topic in the absence of
The Plan(tm).


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 10:48 AM, Dan Sugalski wrote:

At this point I can say I don't honestly care all that much, and most 
of my worries are based on vague feelings that there are platforms out 
there where finding the actual executable name is somewhere between 
hard and impossible. I will, then, do the sensible thing and just punt 
on this--we can work out a best practices thing and enshrine it as the 
default on systems which can support it and be done with it.
I think it's worth trying out--if it works out, we can build on it; if 
it doesn't, we can rip it out/redo it. (And, the API could probably 
stay the same.)

The other question, then, is do we see the need for multiple 
categories of library which would want separately settable library 
paths? (Don't, here, forget the potential needs of embedders such as 
Apache) Once we get that thumped out I'll make the API additions.
We should probably start simple and build, but this would make sense to 
me (API names are just suggestions):

Parrot_get_configuration_base_path() -- returns the automagically 
determined path, unless the corresponding 
Parrot_set_configuration_base_path() had been called to set it to 
something else.

We could then have individual API to pick out specific resources based 
on that, but instead, this would be cleaner/simpler:

Parrot_get_path_for_resource(STRING *resource_name) -- returns the 
equivalent of Parrot_get_configuration_base_path() . "/" . resource_name, 
unless you had called Parrot_set_path_for_resource(STRING 
*resource_name, STRING *path) to set the path for this particular 
resource to something else. Internally, this could special case certain 
resources, if needed.

This setup lets us have a stable API, but over time add to the list of 
things we would look up.

So (assuming for the moment a default layout similar to what we currently 
have), in-core I can call 
Parrot_get_path_for_resource("library/config.pimc") and 
Parrot_get_path_for_resource("runtime/parrot/dynext") to locate these 
resources, by default inside of the base dir. But if I want to have a 
totally funky layout (in an embedding context, or just if I'm in a 
weird mood), all I need to do is explicitly call the set method (from 
setup code or from bytecode) to re-point where I find a particular 
resource.

(So the logic for that could just be to do a hash lookup for any 
explicitly set values, and fall back to simple concatenation if nothing 
was in the hash.)
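
A minimal Python sketch of that lookup logic follows, assuming Unix-style paths. The class and method names only loosely mirror the API names suggested above and are not an existing Parrot interface.

```python
import posixpath


class ResourceLocator:
    """Hash of explicit overrides, with base-path concatenation as fallback."""

    def __init__(self, base_path):
        self._base_path = base_path
        self._overrides = {}

    def set_configuration_base_path(self, path):
        self._base_path = path

    def get_configuration_base_path(self):
        return self._base_path

    def set_path_for_resource(self, resource_name, path):
        # Re-point a single resource without touching the others.
        self._overrides[resource_name] = path

    def get_path_for_resource(self, resource_name):
        override = self._overrides.get(resource_name)
        if override is not None:
            return override
        return posixpath.join(self._base_path, resource_name)
```

An embedder with an unusual layout would call the set methods once during setup; every later lookup goes through the same get call, so the callers never need to know whether a path was overridden.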

That would all be fairly simple, yet expandable.

JEff



Re: Plans for string processing

2004-04-15 Thread Dan Sugalski
At 11:55 PM +0200 4/15/04, Leopold Toetsch wrote:
Aaron Sherman [EMAIL PROTECTED] wrote:
 On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote:
	$dog eq "my dog Fi\x{fb03}"

 and C<$dog> hasn't some language info attached?

 Looks good to me. Great example!

 Seriously, why is that a problem?
Dan's problem to come up with better examples--or explanations :)
Nah, that turns out not to be the case. It's my plan, and it's 
reasonable to say I'm OK with it. :) While I'd prefer to have 
everyone agree, I can live with it if people don't.

leo - resisting from further utterances WRT that topic in the absence of
The Plan(tm).
The Plan is in progress, though I admit I'm tempted to hit easier and 
less controversial things (like, say, threads or events) first.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: {CVS ci] alternate object initializer calling scheme

2004-04-15 Thread chromatic
On Sat, 2004-04-10 at 01:49, Leopold Toetsch wrote:

  This initializer is available as first param in the init method.
  I'm happy with this.
 Good.

What needs to be done before making it the default?  I'm anxious to
remove CALL__BUILD=1 from my parrot alias.

 We don't have it yet. We could use vtable->destroy but I'd rather have
 vtable->finalize. ->destroy does low-level cleanup of Parrot classes
 (i.e. free(3) memory). ->finalize (a distinct vtable method) could do the
 higher-level object finalization. This could also be the place where
 destruction ordering is done.

That sounds reasonable.  It'd certainly be nice to be able to free the
memory of external resources I hold in SDL::* objects.

-- c



Re: Plans for string processing

2004-04-15 Thread Aaron Sherman
On Thu, 2004-04-15 at 23:13, Dan Sugalski wrote:

 Nah, that turns out not to be the case. It's my plan, and it's 
 reasonable to say I'm OK with it. :) While I'd prefer to have 
 everyone agree, I can live with it if people don't.

Perhaps, as usual, I've been too verbose and everyone just skipped over
what I thought were useful questions, but I came into this thinking I
must just not get it... now I'm left with the feeling that there are
some basic questions no one is asking here. Don't respond to this
message, but please keep these questions in mind as you start to
implement... whatever it is that you're going to implement for this.

 1. People have referred to comparing names, but most of the things
that make comparing names hard exist with respect to NAMES, and
not arbitrary strings (e.g. McLean is very different from
substr("358dsMcLeannbv35d",5,6)). That is not something that
attaching metadata to a string is likely to resolve.
 2. There is no universal interchange rule-set (that I have ever
heard of) for operating on sequences of characters with respect
to two or more different languages at once, you have to pick a
language's (or culture's) rules to use, otherwise you are
comparing (or operating on) apples and oranges.
 3. In any given comparison type operation, one side's rules will
have to become dominant for that operation. Woefully, you have
no realistic way to decide this at run-time (e.g. because going
with LHS-wins would result in sorts potentially getting C<($a
cmp $b) == 1> and C<($b cmp $a) == 1>, which can result in
infinite sort times).
 4. Given 1..3, you will probably have to implement some kind of
language context system (in most languages, this is handled by
locale) at some point, and it may need to take priority over the
language property of the strings that it operates on in certain
cases.
 5. Given 4, all unary operators become, for example,
{
set_current_locale($s.language);
uc($s.data)
}
Which is, after all what most languages do anyway, but they keep
that language information as a piece of global state. Allowing
just for lexical scoping of such things would be very nice.
 6. Separate from 1..5, language is an interesting property to
associate with strings, but so are a vast number of other
properties. Why are all of them second class citizens WRT
parrot, but not language? Why not build a class one level of
abstraction above raw strings which can bear arbitrary
properties?
 7. Which programming language does Parrot wish to host which
requires unique language tagging of all string data? Would this
perhaps be better left for a 2.0 feature, once the needs of the
client languages are better understood?
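
The inconsistency described in point 3 can be reproduced with a toy example. This Python sketch assumes two made-up alphabets, one that files a-umlaut right after "a" and one that files it after "z", and a comparison that always uses the left operand's rules. The alphabets, names and tagging scheme are illustrative only, not real collation data or any Parrot design.

```python
from collections import namedtuple

# Toy alphabets, assumed for illustration only (not real collation tables).
# "\u00e4" is a-umlaut: "de" files it right after "a", "sv" after "z".
ALPHABETS = {
    "de": "a\u00e4bcdefghijklmnopqrstuvwxyz",
    "sv": "abcdefghijklmnopqrstuvwxyz\u00e4",
}

TaggedString = namedtuple("TaggedString", ["text", "lang"])


def sort_key(text, lang):
    """Map each character to its position in the given toy alphabet."""
    order = ALPHABETS[lang]
    return [order.index(ch) for ch in text]


def cmp_lhs_wins(left, right):
    """Three-way compare that uses the LEFT operand's language rules."""
    left_key = sort_key(left.text, left.lang)
    right_key = sort_key(right.text, left.lang)
    return (left_key > right_key) - (left_key < right_key)


a = TaggedString("\u00e4", "sv")
b = TaggedString("z", "de")
print(cmp_lhs_wins(a, b), cmp_lhs_wins(b, a))  # prints "1 1"
```

Each operand claims to be greater than the other, so the comparison is not antisymmetric and a sort routine built on it has no consistent order to converge to.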

Ok, that's my piece. Thanks for taking the time. I'll be over here
watching now.

 easier and less controversial things (like, say, threads or events) first.

Hah! That's rich!

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback


