Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.
Well, then...

Given that everyone's weighing in on this one, it seems worthy of 
sane consideration. (I keep not thinking about this, as I'm used to 
the nicely sane VMS logical system :)

As we've got the unpleasant issues of OSes with Really Lame schemes, 
and embedders that may want to use alternate resource locations, it 
seems like the right thing to do here is to make this a part of the 
embedding interface and have the main parrot wrapper set it.

So, I'm thinking a few things:

1) We add a Parrot_set_library_base(char *lib_path, int length) 
function to set the base library path
2) We add a Parrot_get_base_library_path() function to the 
platform-specific interface so platforms can return the base path
3) Parrot itself (the main executable) has a static, global 1K buffer 
in it that starts and ends with some recognizable string (like, say, 
"***+++***START|" and "|END***+++***") so we can find it and 
overwrite the contents if the library gets moved, for use on 
platforms where the only way to put a path in is to stick it 
statically in the executable.

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
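To make item 3 concrete, here is a minimal C sketch of the embedded slot and a reader that Parrot_get_base_library_path() might sit on. Only the marker strings and the 1K size come from the proposal; the variable name, the helper name and the default path are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Marker strings and slot size from the proposal above. */
#define PATH_START_MARK "***+++***START|"
#define PATH_END_MARK   "|END***+++***"
#define PATH_BUF_SIZE   1024

/* Hypothetical 1K slot baked into the executable. It is not `static`
 * or `const`, so (absent whole-program optimization) the compiler must
 * read it at run time instead of folding it into the code. The start
 * marker is not used anywhere else in this file, so a patching tool's
 * first match in the binary is this buffer. The default path is a
 * placeholder that a build step would fill in. */
char parrot_base_path_slot[PATH_BUF_SIZE] =
    PATH_START_MARK "/usr/local/lib/parrot" PATH_END_MARK;

/* Hypothetical reader: copy the path stored between the markers into
 * out (outlen bytes). Returns 0 on success, -1 if the end marker is
 * missing or the path does not fit. */
int get_base_library_path(char *out, size_t outlen)
{
    /* sizeof on the literal yields its length without emitting it */
    const char *start = parrot_base_path_slot + sizeof PATH_START_MARK - 1;
    const char *end = strstr(start, PATH_END_MARK);
    size_t len;

    if (end == NULL)
        return -1;
    len = (size_t)(end - start);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

A relocation tool then only has to rewrite the bytes between the markers in the binary. The reader itself never changes.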

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to argue 
that, though :)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi
Dan Sugalski wrote:
> At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
> 
>>TT (Tangentially Topical): it would be nice if Parrot could avoid as
>>many hardcoded paths as possible for configs, libraries, and such, so
>>that the Parrot installation could be relocated as freely as possible.
> 
> 
> Well, then...
> 
> Given that everyone's weighing in on this one, it seems worthy of 
> sane consideration. (I keep not thinking about this, as I'm used to 
> the nicely sane VMS logical system :)

Brag :-)

(in case someone is wondering, the VMS "logicals" nicely solve this
problem, basically by each piece of software being installed into and
used/accessed through a "super environment variable"-- so basically Dan
can't understand why us others are having these problems and talk of it
as a new fancy thing :-)





Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 6:23 PM +0300 4/15/04, Jarkko Hietaniemi wrote:
Dan Sugalski wrote:
 At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:

TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.


 Well, then...

 Given that everyone's weighing in on this one, it seems worthy of
 sane consideration. (I keep not thinking about this, as I'm used to
 the nicely sane VMS logical system :)
Brag :-)
:-P

(in case someone is wondering, the VMS "logicals" nicely solve this
problem, basically by each piece of software being installed into and
used/accessed through a "super environment variable"-- so basically Dan
can't understand why us others are having these problems and talk of it
as a new fancy thing :-)
Oh, and have I mentioned they're group and system wide, persistent, 
group-protected, and leveled by protection, so they're actually safe 
to trust? (So if you look for an entry in a system logical table you 
can trust it, since someone needed compromise-the-world privs to set 
it in the first place so you've got bigger things to worry about if 
it's bad? :)

Not to, y'know, show off or anything. :)
--
Dan


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to argue 
that, though :)
Makes sense to me to just store the path--keep it simple. As long as 
we've stored it away, anything using it later can chop it up into 
pieces itself if it wants to--anything we could have done in splitting 
it up, the consumer can do too. The only thing we really have to do is 
grab the info before it's too late--before something might have 
chdir'd, and before argv is either inaccessible, or could have been 
overwritten.
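Grabbing the information before it is too late can be sketched in C as a startup hook that copies argv[0] and the current directory before anything else runs. This is a hedged illustration assuming a POSIX system with PATH_MAX defined; the function names are made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical copies taken at startup, before anything can chdir()
 * or overwrite argv. PATH_MAX is assumed to exist (true on Linux, the
 * BSDs and Mac OS X, but not guaranteed by POSIX). */
static char startup_cwd[PATH_MAX];
static char startup_argv0[PATH_MAX];

/* Call this first thing in main(). Returns 0 on success. */
int save_startup_location(const char *argv0)
{
    if (argv0 == NULL || strlen(argv0) >= sizeof startup_argv0)
        return -1;
    if (getcwd(startup_cwd, sizeof startup_cwd) == NULL)
        return -1;
    strcpy(startup_argv0, argv0);
    return 0;
}

/* Rebuild an absolute path to the executable from the saved copies;
 * this keeps working after a later chdir(). Symlinks are not resolved,
 * and a bare name such as "parrot" (found by the shell through $PATH)
 * is rejected, since that case needs a $PATH search not shown here. */
int startup_exe_path(char *out, size_t outlen)
{
    int n;

    if (startup_argv0[0] == '\0' || strchr(startup_argv0, '/') == NULL)
        return -1;
    if (startup_argv0[0] == '/')
        n = snprintf(out, outlen, "%s", startup_argv0);
    else
        n = snprintf(out, outlen, "%s/%s", startup_cwd, startup_argv0);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```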

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to argue 
that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate paths 
for parrot's low-level libraries (basically the things we need for 
parrot to run in the first place) and higher-level libraries (modules 
installed off of CPAN and whatnot).
That's true. But as long as we grab the "here's where the executable 
is", we can (later) build API on top of that if we want. For instance, 
we could decide that core, low-level resources will be located relative 
to that path, and one of those resources will undoubtedly be a config 
file of some sort, and that config file could contain the path(s) to 
look for higher-level stuff. As long as we've "rescued" and stored our 
location, we've sort of bootstrapped that process.

(And to loop back a bit, the nice thing about bootstrapping this stuff 
based on our executable's location is that it makes it a no-brainer to 
have multiple, relocatable installs of parrot. And people would even be 
able to have 10 different versions of parrot sitting around, but have 
them all configured to share the same high-level resources.)
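Locating core resources relative to the executable is mostly string surgery on the resolved path. A minimal sketch, assuming a hypothetical `<prefix>/bin/parrot` layout with resources at conventional relative names under the prefix; neither helper is an existing Parrot API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: strip the last two components ("bin/parrot") from the
 * resolved executable path to get the install prefix. The component
 * before the file name is not checked to really be "bin". Returns 0
 * on success, -1 for paths with fewer than two components. */
int install_prefix_from_exe(const char *exe, char *out, size_t outlen)
{
    const char *last = strrchr(exe, '/');
    const char *p;
    size_t len;

    if (last == NULL || last == exe)
        return -1;
    /* walk back to the slash in front of the "bin" directory */
    for (p = last - 1; p > exe && *p != '/'; p--)
        ;
    if (*p != '/')
        return -1;
    len = (size_t)(p - exe);
    if (len == 0) {                 /* the executable was /bin/parrot */
        if (outlen < 2)
            return -1;
        strcpy(out, "/");
        return 0;
    }
    if (len + 1 > outlen)
        return -1;
    memcpy(out, exe, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical: join the prefix and a relative resource name. */
int resource_path(const char *prefix, const char *rel,
                  char *out, size_t outlen)
{
    size_t plen = strlen(prefix);
    const char *sep = (plen > 0 && prefix[plen - 1] == '/') ? "" : "/";
    int n = snprintf(out, outlen, "%s%s%s", prefix, sep, rel);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Ten side-by-side installs each compute their own prefix this way, and each one's config file is free to point at a shared location for higher-level modules.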

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to 
argue that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we need 
for parrot to run in the first place) and higher-level libraries 
(modules installed off of CPAN and whatnot). I'm firmly in the "Don't 
care" camp here, so I figured I'd open it to discussion before 
enshrining the result in the API. :)
--
Dan



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Brent 'Dax' Royal-Gordon
Dan Sugalski wrote:
1) We add a Parrot_set_library_base(char *lib_path, int length) function 
to set the base library path
2) We add a Parrot_get_base_library_path() function to the 
platform-specific interface so platforms can return the base path
Works for me...

3) Parrot itself (the main executable) has a static, global 1K buffer in 
it that starts and ends with some recognizable string (like, say, 
"***+++***START|" and "|END***+++***") so we can find it and overwrite 
the contents if the library gets moved, for use on platforms where the 
only way to put a path in is to stick it statically in the executable.
That's pretty disgusting, but I don't know that I have a better idea.

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
System registry on Windows?  /etc file on Unixen?

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.
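Storing one pointer to a settings blob instead of one path might look like this. As a rough stand-in for a frozen data structure, the C sketch below parses plain `key=value` text into a growable array, so the number of entries has no fixed limit. The text format, the key names and the functions are all assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* One setting, e.g. key "icu_data", value "/opt/parrot/share/icu". */
struct setting {
    char *key;
    char *value;
};

/* Hypothetical growable list: no fixed upper bound on entries. */
struct settings {
    struct setting *items;
    size_t count;
    size_t cap;
};

static char *dup_range(const char *b, const char *e)
{
    size_t n = (size_t)(e - b);
    char *r = malloc(n + 1);

    if (r != NULL) {
        memcpy(r, b, n);
        r[n] = '\0';
    }
    return r;
}

/* Parse "key=value" lines; blank lines and '#' comments are skipped.
 * Returns 0 on success, -1 on a malformed line or allocation failure
 * (call settings_free() either way). */
int settings_parse(struct settings *s, const char *text)
{
    const char *p = text;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');

        if (eol == NULL)
            eol = p + strlen(p);
        if (eol > p && *p != '#') {
            const char *eq = memchr(p, '=', (size_t)(eol - p));
            struct setting *it;

            if (eq == NULL)
                return -1;
            if (s->count == s->cap) {
                size_t ncap = s->cap ? s->cap * 2 : 8;
                struct setting *ni = realloc(s->items, ncap * sizeof *ni);

                if (ni == NULL)
                    return -1;
                s->items = ni;
                s->cap = ncap;
            }
            it = &s->items[s->count++];
            it->key = dup_range(p, eq);
            it->value = dup_range(eq + 1, eol);
            if (it->key == NULL || it->value == NULL)
                return -1;
        }
        p = (*eol == '\n') ? eol + 1 : eol;
    }
    return 0;
}

/* Return the first value stored for key, or NULL if it is not set. */
const char *settings_get(const struct settings *s, const char *key)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->items[i].key, key) == 0)
            return s->items[i].value;
    return NULL;
}

void settings_free(struct settings *s)
{
    for (size_t i = 0; i < s->count; i++) {
        free(s->items[i].key);
        free(s->items[i].value);
    }
    free(s->items);
    s->items = NULL;
    s->count = s->cap = 0;
}
```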
Sound sane? I can see splitting up the library base path into sections, 
but I'm not sure it's worth it. Now'd be the time to argue that, though :)
--
Brent "Dax" Royal-Gordon <[EMAIL PROTECTED]>
Perl and Parrot hacker
Oceania has always been at war with Eastasia.




Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to 
argue that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we 
need for parrot to run in the first place) and higher-level 
libraries (modules installed off of CPAN and whatnot).
That's true. But as long as we grab the "here's where the executable 
is", we can (later) build API on top of that if we want.
Well, yeah, but... where the executable is ought, honestly, to be 
irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that 
I'll have parrot's library files hanging off of /usr/bin. And if I've 
got a few hundred machines with parrot's library NFS mounted in 
different places (to match conflicting vendor standards and other 
whackjob breakage which is endemic in, well, the world) it really 
falls down. :) Add to that you can't always figure out where Parrot 
really is both because of chroot behaviour and some odd "where am I 
really" problems with suid scripts in some places.

There are a couple of folks who could make your brain melt and flow 
out your ears with all this stuff too.

Having the executable path as an optional way to get the info's not 
necessarily a bad thing, but I think it's safe to say that it's not 
The Right Thing. (If there even is one)

If nothing else this has convinced me we need a way to specify site 
policy at build time for all this nonsense^Wfun. :)
--
Dan



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 9:05 AM -0700 4/15/04, Brent 'Dax' Royal-Gordon wrote:
Dan Sugalski wrote:
3) Parrot itself (the main executable) has a static, global 1K 
buffer in it that starts and ends with some recognizable string 
(like, say, "***+++***START|" and "|END***+++***") so we can find 
it and overwrite the contents if the library gets moved, for use on 
platforms where the only way to put a path in is to stick it 
statically in the executable.
That's pretty disgusting, but I don't know that I have a better idea.
There isn't one, alas, at least for some people.

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
System registry on Windows?  /etc file on Unixen?
That's global. Bad idea, it messes up multiple installs of the same 
version, or similar-enough versions that they're indistinguishable.

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.
This wouldn't be a bad thing, nope. I could see security issues--it'd 
probably be better to link the config file right into parrot.

--
Dan


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Brent 'Dax' Royal-Gordon
Dan Sugalski wrote:
#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
System registry on Windows?  /etc file on Unixen?
That's global. Bad idea, it messes up multiple installs of the same 
version, or similar-enough versions that they're indistinguishable.
Good point.

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.
This wouldn't be a bad thing, nope. I could see security issues--it'd 
probably be better to link the config file right into parrot.
Install it with root ownership and 644 permissions, in a directory with 
similar settings.  (Or the system's equivalent, of course.)  Then put 
big blinking security warnings wherever the documentation talks about 
editing that file.  We can't protect sysadmins from their own idiocy.
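The ownership and permission rule could be checked at load time before the config is trusted. A minimal POSIX sketch with hypothetical helper names: it requires root ownership and no group or other write bits, which is slightly looser than demanding exactly 644.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical: returns 1 if path (a regular file or a directory) is
 * owned by root and not writable by group or others, 0 if it exists
 * but fails that test, and -1 if stat() fails. */
int is_locked_down(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode))
        return 0;
    if (st.st_uid != 0)
        return 0;
    if (st.st_mode & (S_IWGRP | S_IWOTH))
        return 0;
    return 1;
}

/* Hypothetical: trust the config only if it and its directory pass. */
int config_is_trusted(const char *dir, const char *file)
{
    return is_locked_down(dir) == 1 && is_locked_down(file) == 1;
}
```

Only the file and its immediate directory are checked here. A stricter version would walk every ancestor directory, since any writable parent lets someone swap the file out.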

--
Brent "Dax" Royal-Gordon <[EMAIL PROTECTED]>


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi
> Well, yeah, but... where the executable is ought, honestly, to be 
> irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that 
> I'll have parrot's library files hanging off of /usr/bin.

Bah.  BAH, I say.  The /usr/bin/parrot is of course a symlink
to, say, /platform/os/version/parrot/version/bin/parrot, and we
parse the real path, not the symlink.

>  And if I've got a few hundred machines with parrot's library NFS mounted in 
> different places (to match conflicting vendor standards and other 
> whackjob breakage which is endemic in, well, the world) it really 
> falls down. :) Add to that you can't always figure out where Parrot 
> really is both because of chroot behaviour and some odd "where am I 
> really" problems with suid scripts in some places.
> 
> There are a couple of folks who could make your brain melt and flow 
> out your ears with all this stuff too.

Yes, I was once one of those people :-)

> Having the executable path as an optional way to get the info's not 
> necessarily a bad thing, but I think it's safe to say that it's not 
> The Right Thing. (If there even is one)
> 
> If nothing else this has convinced me we need a way to specify site 
> policy at build time for all this nonsense^Wfun. :)



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:

At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to 
argue that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we need 
for parrot to run in the first place) and higher-level libraries 
(modules installed off of CPAN and whatnot).
That's true. But as long as we grab the "here's where the executable 
is", we can (later) build API on top of that if we want.
Well, yeah, but... where the executable is ought, honestly, to be 
irrelevant.
Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 
with a particular "copy" of parrot. It's the only thing (that I can 
think of) which continues to work if you move your distro around, and 
which naturally avoids problems with having multiple copies, and lets 
things work even if you don't "install".

If I've stuck Parrot in /usr/bin it seems unlikely that I'll have 
parrot's library files hanging off of /usr/bin.
Right, so you do what Mac OS X does with the java executable--you put a 
symlink in /usr/bin, pointing to the real location. And your "path to 
the executable" has to call realpath() or the equivalent to resolve 
such symlinks (which you need to do in order for path logic to 
do-the-right-thing).
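On POSIX systems, resolving such a symlink is one realpath() call. A hedged sketch that turns a path such as /usr/bin/parrot into the directory holding the real executable; the helper name is invented for this example.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: resolve every symlink and ".." in path (for example a
 * /usr/bin/parrot symlink into a versioned install tree) and return
 * the directory that holds the real file. Returns 0 on success. */
int real_exe_dir(const char *path, char *out, size_t outlen)
{
    char resolved[PATH_MAX];
    char *slash;
    size_t len;

    if (realpath(path, resolved) == NULL)
        return -1;
    /* realpath() output is absolute, so a '/' is always present */
    slash = strrchr(resolved, '/');
    len = (slash == resolved) ? 1 : (size_t)(slash - resolved);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, resolved, len);
    out[len] = '\0';
    return 0;
}
```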

And if I've got a few hundred machines with parrot's library NFS 
mounted in different places (to match conflicting vendor standards and 
other whackjob breakage which is endemic in, well, the world) it 
really falls down. :)
I'm not sure I get your meaning here. By "executable", I mean 
standalone-parrot, not libparrot, of course. If you mean that libparrot 
might end up in 100 different places, then you'll not end up with the 
dynamic linker finding things properly, so you'll have a bigger problem 
to solve. If you mean that standalone-parrot could end up in 100 
different places, then you're going to have 100 different ways you need 
to set up $PATH just to launch it, but once it's executing you'd still 
be fine. Or each host will have its own separate symlink in /usr/bin to 
the right location for that host, and everything will just be fine.

Add to that you can't always figure out where Parrot really is both 
because of chroot behaviour and some odd "where am I really" problems 
with suid scripts in some places.
With chroot, frankly, you have the same problem with DLLs, and you end 
up needing to have all of your necessary external resources located in 
your chroot-dir so that their paths after the chroot match their paths 
before. So that was a bad example on my part, really. (And, if you are 
chroot-ing from within a parrot script, you're in a place where you'd 
want to re-point your config dir path to match.)

But with interpreter files we could have the problem that the kernel 
hides the info from us. But for bytecode files, if they're launched 
like java apps are launched, with "parrot foo", then that problem 
wouldn't come up.

Having the executable path as an optional way to get the info's not 
necessarily a bad thing, but I think it's safe to say that it's not 
The Right Thing. (If there even is one)
Yeah, I don't think there's a 100% solution, but it would be nice to 
have something which works 95% of the time and is flexible/convenient, 
in preference to something that works 96% of the time and is less 
powerful.

I think a reasonable approach would be:

1) Always allow the config location to be overridden via a command-line 
parameter, and changeable from bytecode. (That lets you be 100% 
unambiguous, at the cost of needing to execute parrot in a particular 
way. And it's convenient for testing against a whole bunch of different 
sets of configs without rebuilding.)

2a) On platforms which support it, auto-find the executable, and base 
the config path on that.

2b) On platforms which don't support that (and even, as a compile-time 
option for those which support it), have a compiled-in path to use.

This basically matches the API you mentioned before, and boils down to 
what gets passed to Parrot_set_library_base() (or, call it 
Parrot_set_configuration_base maybe) at launch time--it gets passed 
either an explicitly supplied value, an inferred value, or a 
compiled-in value.
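The three-step fallback can be sketched as a small selection function plus an argv scan. The `--config-base` option name and the macros PARROT_COMPILED_IN_BASE and PARROT_FORCE_COMPILED_IN_BASE are hypothetical, chosen only to show the precedence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compile-time default (step 2b). */
#ifndef PARROT_COMPILED_IN_BASE
#define PARROT_COMPILED_IN_BASE "/usr/local/lib/parrot"
#endif

/* Find "name=value" in argv and return value, or NULL. The option
 * name is illustrative, not an existing parrot flag. */
const char *find_option_value(int argc, char **argv, const char *name)
{
    size_t n = strlen(name);

    for (int i = 1; i < argc; i++)
        if (strncmp(argv[i], name, n) == 0 && argv[i][n] == '=')
            return argv[i] + n + 1;
    return NULL;
}

/* Precedence from the proposal: an explicit command-line value (1),
 * then a path inferred from the executable (2a), then the compiled-in
 * path (2b). Defining the hypothetical PARROT_FORCE_COMPILED_IN_BASE
 * skips inference even on platforms where it would work. */
const char *choose_config_base(const char *cmdline_value,
                               const char *inferred_value)
{
    if (cmdline_value != NULL && cmdline_value[0] != '\0')
        return cmdline_value;
#ifndef PARROT_FORCE_COMPILED_IN_BASE
    if (inferred_value != NULL && inferred_value[0] != '\0')
        return inferred_value;
#else
    (void)inferred_value;
#endif
    return PARROT_COMPILED_IN_BASE;
}
```

Whatever choose_config_base() returns is the single value that would be handed to Parrot_set_library_base() (or whatever it ends up being called) at launch.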

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Leopold Toetsch
Brent 'Dax' Royal-Gordon <[EMAIL PROTECTED]> wrote:
> Dan Sugalski wrote:
>> "***+++***START|" and "|END***+++***") so we can find it and overwrite

> That's pretty disgusting, but I don't know that I have a better idea.

Same scheme as with fingerprint.c?

leo


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:05 AM, Brent 'Dax' Royal-Gordon wrote:

Dan Sugalski wrote:
3) Parrot itself (the main executable) has a static, global 1K buffer 
in it that starts and ends with some recognizable string (like, say, 
"***+++***START|" and "|END***+++***") so we can find it and 
overwrite the contents if the library gets moved, for use on 
platforms where the only way to put a path in is to stick it 
statically in the executable.
That's pretty disgusting, but I don't know that I have a better idea.
It's yucky, but it matches what's done for dynamic libs, at least on 
some platforms. (That is, at build-time a library gets its 
path-where-I'll-be-installed compiled into it, and apps linked against 
that lib copy that path into themselves, so that at runtime the dynamic 
linker searches that location, in addition to standard locations, to 
find the library. And, there's then a tool which lets you modify your 
library to change its built-in install location, without re-compiling.) 
So at least there's precedent.
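A tool like that for the marker slot could look roughly like this C sketch, which patches the slot in an in-memory copy of the executable. It assumes the start marker occurs exactly once in the image and that the slot is 1K as proposed; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define PATH_START_MARK "***+++***START|"
#define PATH_END_MARK   "|END***+++***"

/* Naive byte search (memmem() would do, but it is not standard C). */
static unsigned char *find_bytes(unsigned char *hay, size_t hlen,
                                 const char *needle)
{
    size_t nlen = strlen(needle);

    for (size_t i = 0; nlen != 0 && i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/* Hypothetical patcher. image holds the executable's bytes (read in
 * with fread(), say); slot_size is the whole slot including markers.
 * Writes new_path and the end marker after the start marker and
 * zero-fills the rest of the slot. Returns 0 on success, -1 if the
 * marker is missing, the slot would run off the end of the image, or
 * the new path does not fit. */
int patch_embedded_path(unsigned char *image, size_t image_len,
                        size_t slot_size, const char *new_path)
{
    size_t start_len = strlen(PATH_START_MARK);
    size_t path_len = strlen(new_path);
    size_t end_len = strlen(PATH_END_MARK);
    unsigned char *slot = find_bytes(image, image_len, PATH_START_MARK);
    unsigned char *body;

    if (slot == NULL)
        return -1;
    if ((size_t)(image + image_len - slot) < slot_size)
        return -1;
    /* the +1 keeps a terminating NUL inside the slot */
    if (start_len + path_len + end_len + 1 > slot_size)
        return -1;
    body = slot + start_len;
    memset(body, 0, slot_size - start_len);
    memcpy(body, new_path, path_len);
    memcpy(body + path_len, PATH_END_MARK, end_len);
    return 0;
}
```

Writing the patched image back out with fwrite() is left out of the sketch.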

#3, I should point out, will *only* be used on those platforms that 
don't have a better scheme, and only by the 
Parrot_get_base_library_path() function.
System registry on Windows?  /etc file on Unixen?

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to 
search
for PMCs, and whatever else we can think of, without a hardcoded limit.
The idea (for me, at least) was to specify a directory, and the config 
file could be a conventional name relative to that--that lets you 
locate multiple resources without having to read the config file in 
order to find them. And semantically, I think of it not as the 
executable's path--that just happens to be something that's 1:1 with a 
particular copy of parrot. And definitely not libparrot's 
path--embedded cases would have to specify the path explicitly, though 
they could partially mimic the same scheme.

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Dan Sugalski
At 10:23 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:

At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

Sound sane? I can see splitting up the library base path into 
sections, but I'm not sure it's worth it. Now'd be the time to 
argue that, though :)
Makes sense to me to just store the path--keep it simple.
That's what I'm thinking, but I can see wanting to have separate 
paths for parrot's low-level libraries (basically the things we 
need for parrot to run in the first place) and higher-level 
libraries (modules installed off of CPAN and whatnot).
That's true. But as long as we grab the "here's where the 
executable is", we can (later) build API on top of that if we want.
Well, yeah, but... where the executable is ought, honestly, to be irrelevant.
Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 
with a particular "copy" of parrot. It's the only thing (that I can 
think of) which continues to work if you move your distro around, 
and which naturally avoids problems with having multiple copies, and 
lets things work even if you don't "install".
At this point I can say I don't honestly care all that much, and most 
of my worries are based on vague feelings that there are platforms 
out there where finding the actual executable name is somewhere 
between hard and impossible. I will, then, do the sensible thing and 
just punt on this--we can work out a best practices thing and 
enshrine it as the default on systems which can support it and be 
done with it.

The other question, then, is do we see the need for multiple 
categories of library which would want separately settable library 
paths? (Don't, here, forget the potential needs of embedders such as 
Apache) Once we get that thumped out I'll make the API additions.
--
Dan



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 9:41 AM, Dan Sugalski wrote:

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings.  That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to 
search
for PMCs, and whatever else we can think of, without a hardcoded 
limit.
This wouldn't be a bad thing, nope. I could see security issues--it'd 
probably be better to link the config file right into parrot.
There'll be the same security issue with anything located on the 
filesystem--the config is not particularly worse than anything else 
(DLLs, etc.). The security of anything you run is only as good as the 
integrity of the filesystem used to locate the resources. 
(Specifically, if I were a hacker and could compromise your system by 
replacing the config, I could just as easily replace parrot itself.) But it 
would be nice to "bake in" things which you can't really change without 
rebuilding anyway--things like UINTVAL size, etc. Monkeying with them 
after-the-fact would be a definite security risk (buffer overruns, 
etc.), and wouldn't ever be useful. But stuff like finding ICU's data 
files (or add-on libraries) we'd want to be easily changeable without a 
rebuild. (And again, if you have to rebuild to change them, then people 
will tend to keep around the tools needed to do that, which would give 
a hacker the tools they need to do the same.) But we certainly need to 
define/articulate a security model, no matter what approach we take. 
(But my gut reaction is always against something which decreases 
flexibility, and only _seems_ to increase security.)

But there of course are security issues with anything located relative 
to the cwd(). (That is, if resources are located relative to the cwd, 
then I can trick you into loading my copies by getting you to chdir 
into my home directory.)
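One way to close that cwd hole is to refuse relative entries in any resource search list. A minimal sketch, assuming POSIX-style paths; the helper name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical: copy only absolute (leading '/') entries from a search
 * list into out, dropping "", ".", "lib" and similar entries that
 * would be resolved against whatever the current directory happens to
 * be. Windows drive-letter paths are not handled. out must have room
 * for n entries. Returns the number of entries kept. */
size_t keep_absolute_dirs(const char *const *dirs, size_t n,
                          const char **out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (dirs[i] != NULL && dirs[i][0] == '/')
            out[kept++] = dirs[i];
    return kept;
}
```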

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 10:30 AM, Jeff Clites wrote:

And semantically, I think of it not as the executable's path--that  
just happens to be something that's 1:1 with a particular copy of  
parrot. And definitely not libparrot's path--embedded cases would have  
to specify the path explicitly, though they could partially mimic the  
same scheme.
I take that back--the path to the library might actually work just as  
well (and may or may not be less ambiguous to find; the dynamic linker  
had to find it, and may have left breadcrumbs). This is all, by the  
way, exactly the NSBundle/CFBundle API from Mac OS X (and before that,  
OpenStep). See:  
.

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jeff Clites
On Apr 15, 2004, at 10:48 AM, Dan Sugalski wrote:

At this point I can say I don't honestly care all that much, and most 
of my worries are based on vague feelings that there are platforms out 
there where finding the actual executable name is somewhere between 
hard and impossible. I will, then, do the sensible thing and just punt 
on this--we can work out a best practices thing and enshrine it as the 
default on systems which can support it and be done with it.
I think it's worth trying out--if it works out, we can build on it; if 
it doesn't, we can rip it out/redo it. (And, the API could probably 
stay the same.)

The other question, then, is do we see the need for multiple 
categories of library which would want separately settable library 
paths? (Don't, here, forget the potential needs of embedders such as 
Apache) Once we get that thumped out I'll make the API additions.
We should probably start simple and build, but this would make sense to 
me (API names are just suggestions):

Parrot_get_configuration_base_path() -- returns the automagically 
determined path, unless the corresponding 
Parrot_set_configuration_base_path() had been called to set it to 
something else.

We could then have individual API to pick out specific resources based 
on that, but instead, this would be cleaner/simpler:

Parrot_get_path_for_resource(STRING *resource_name) -- returns the 
equivalent of Parrot_get_configuration_base_path()."/".resource_name, 
unless you had called Parrot_set_path_for_resource(STRING 
*resource_name, STRING *path) to set the path for this particular 
resource to something else. Internally, this could special case certain 
resources, if needed.

This setup lets us have a stable API, but over time add to the list of 
things we would look up.

So (assuming for the moment a default layout similar to what we currently 
have), in-core I can call 
Parrot_get_path_for_resource("library/config.pimc") and 
Parrot_get_path_for_resource("runtime/parrot/dynext") to locate these 
resources, by default inside of the base dir. But if I want to have a 
totally funky layout (in an embedding context, or just if I'm in a 
weird mood), all I need to do is explicitly call the "set" method (from 
setup code or from bytecode) to re-point where I find a particular 
resource.

(So the logic for that could just be to do a hash lookup for any 
explicitly set values, and fall back to simple concatenation if nothing 
was in the hash.)

That would all be fairly simple, yet expandable.
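The get/set scheme (override table first, then plain concatenation onto the base path) might look like this in C. It mirrors the suggested names but takes plain `char *` instead of Parrot `STRING *` and uses a small chained hash table; it is a sketch, not Parrot's API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 32

/* One explicitly set resource location (hypothetical). */
struct override {
    char *name;
    char *path;
    struct override *next;
};

static struct override *overrides[NBUCKETS]; /* chained hash table */
static char *config_base;                    /* NULL until set */

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;                       /* djb2-style string hash */

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static char *dup_str(const char *s)
{
    char *r = malloc(strlen(s) + 1);

    if (r != NULL)
        strcpy(r, s);
    return r;
}

int set_configuration_base_path(const char *path)
{
    char *copy = dup_str(path);

    if (copy == NULL)
        return -1;
    free(config_base);
    config_base = copy;
    return 0;
}

/* Record an explicit location for one resource, replacing any old one. */
int set_path_for_resource(const char *name, const char *path)
{
    unsigned b = hash_name(name);
    struct override *o;
    char *copy = dup_str(path);

    if (copy == NULL)
        return -1;
    for (o = overrides[b]; o != NULL; o = o->next)
        if (strcmp(o->name, name) == 0) {
            free(o->path);
            o->path = copy;
            return 0;
        }
    o = malloc(sizeof *o);
    if (o == NULL || (o->name = dup_str(name)) == NULL) {
        free(o);
        free(copy);
        return -1;
    }
    o->path = copy;
    o->next = overrides[b];
    overrides[b] = o;
    return 0;
}

/* Explicit override first, else "<base>/<name>". Returns 0 on success. */
int get_path_for_resource(const char *name, char *out, size_t outlen)
{
    const struct override *o;
    int n;

    for (o = overrides[hash_name(name)]; o != NULL; o = o->next)
        if (strcmp(o->name, name) == 0) {
            n = snprintf(out, outlen, "%s", o->path);
            return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
        }
    if (config_base == NULL)
        return -1;
    n = snprintf(out, outlen, "%s/%s", config_base, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

In this sketch an embedder re-points just one resource with, say, set_path_for_resource("runtime/parrot/dynext", "/srv/apache/parrot-ext"), while every other name still falls back to the configuration base.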

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-19 Thread Gordon Henriksen
On Saturday, April 17, 2004, at 10:35 , Gordon Henriksen wrote:

Which suggests to me a linked list of resource resolvers. First one in 
the chain to return a file handle to the data or PBC wins. The head of 
parrot's own "system" chain would be available to be appended to any 
other chains that wanted it.
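A chain of resolvers like that can be modeled as a linked list of function pointers, where the first non-NULL `FILE *` wins. A hedged C sketch; the struct layout and the directory-based resolver are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical link in a resolver chain. resolve() tries to open name
 * and returns an open FILE * or NULL; ctx is whatever that particular
 * resolver needs (here, a directory name). */
struct resolver {
    FILE *(*resolve)(const void *ctx, const char *name);
    const void *ctx;
    const struct resolver *next;
};

/* A simple resolver: look for name inside the directory in ctx. */
FILE *dir_resolver(const void *ctx, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", (const char *)ctx, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, "rb");
}

/* Ask each resolver in turn; the first one to return a handle wins. */
FILE *resolve_resource(const struct resolver *chain, const char *name)
{
    for (; chain != NULL; chain = chain->next) {
        FILE *f = chain->resolve(chain->ctx, name);

        if (f != NULL)
            return f;
    }
    return NULL;
}
```

Appending parrot's own "system" chain to another chain then amounts to pointing that chain's last `next` field at the head of the system chain.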
And the more I mull this over, the more I really come up with maybe 4 
slots in the search chain which are logically important. The order is up 
for debate, but they all need to be in there (whenever they apply, that 
is).

1.  Paths relative to the PBC binary which is searching for a library.
2.  Paths relative to the embedding application.
3.  Paths relative to parrot itself (be that libparrot.shlib or parrot).
4.  Paths to "system" libraries as specified by the administrator.
When searching for resources, only #1 should be used. Here are some 
examples:

    PBC File: (whatever)
    Host app: /usr/local/bin/parrot
    Parrot:   /usr/local/lib/libparrot.shlib
    Consider searches for:
        icu.dat
    Search path:
        1.  /usr/local/shared   # Relative to executable

    PBC File: D:\inetpub\wwwroot\example.pbchtml
    Host app: C:\Apache\libexec\httpd.exe
    Parrot:   C:\Parrot\lib
    Consider searches for:
        icu.dat
        mod_parrot.pbc
        My::WWWUtil.pbc
        Time::HiRes.pbc
    Search path:
        D:\inetpub\wwwroot{,\lib,\..\lib}   # Relative to PBC
        C:\Apache\libexec{,\lib,\..\lib}    # Relative to host app
        C:\CPAN\lib                         # System libraries
        C:\Parrot\lib                       # Relative to parrot

    PBC File: ./bin/fib
    Host app: /home/me/bin/parrot
    Parrot:   /home/me/bin/parrot
    Consider searches for:
        icu.dat
        Time::HiRes.pbc
        fib.parrot_resource_file
    One possible search path:
        ./bin{,/lib,/../lib}   # Relative to PBC
        /usr/local/lib         # System libraries
        /home/me/lib           # Relative to parrot
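The `{,/lib,/../lib}` notation in these examples expands to three candidate directories per anchor file. A small C sketch of that expansion, assuming '/' separators (the Windows example would need backslash handling) and a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

#define CANDIDATE_LEN 256

/* Hypothetical helper: expand "<dir of file_path>{,/lib,/../lib}" into
   out[0..2]. A path with no '/' is taken to live in ".". Returns the number
   of candidates written, or -1 if out is too small or a path is truncated. */
int relative_candidates(const char *file_path, char out[][CANDIDATE_LEN],
                        size_t n_out) {
    static const char *const suffixes[] = { "", "/lib", "/../lib" };
    const size_t n_suffixes = sizeof suffixes / sizeof suffixes[0];
    if (n_out < n_suffixes)
        return -1;
    const char *slash = strrchr(file_path, '/');
    const char *dir = slash ? file_path : ".";
    int dir_len = slash ? (int)(slash - file_path) : 1;
    for (size_t i = 0; i < n_suffixes; i++) {
        int n;
        if (dir_len == 0 && suffixes[i][0] == '\0')
            n = snprintf(out[i], CANDIDATE_LEN, "/");  /* file sits in "/" */
        else
            n = snprintf(out[i], CANDIDATE_LEN, "%.*s%s", dir_len, dir, suffixes[i]);
        if (n < 0 || n >= CANDIDATE_LEN)
            return -1;
    }
    return (int)n_suffixes;
}
```

For example, `relative_candidates("./bin/fib", dirs, 3)` yields `./bin`, `./bin/lib`, and `./bin/../lib`, the first entry of the third search path.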


The scenario which gives me a little bit of heartburn is one like this, 
though:

Consider, say, an e-commerce site package. Call it OneStep::ECS. Runs 
under mod_parrot in Apache. Has hooks to load plugins:

	• Third-party plugins to provide connectivity to payment processing 
engines (call it OneStep::VeriSignPayflow.pbc).
	• First-party plugins allowing the customer to integrate his 
storefront with his database (call it 
MySite::OneStepECSCustomizations.pbc).

Now consider searches for VeriSign::PayflowPro.pbc, PayFlowPro.dll, 
MySite::CRM.pbc, MySite::Reporting.pbc, mysite_logo.png, 
Time::HiRes.pbc, libparrot.pbc, CGI.pbc.

So maybe some libraries are "hosts" and need to be included in the 
search paths of libraries which are linked to them. One could even look 
at libparrot that way, in which case the search path model becomes:

    Paths relative to this PBC file.
    Paths relative to its hosts.
    Paths relative to its hosts' hosts.
    Paths relative to its hosts' hosts' hosts.
    ...
    Paths configured by the system administrator.
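A sketch of that host-chain model in C. It assumes each loaded module has at most one host (the text allows for several) and that libparrot is the outermost host. The struct, the function, and the directory names in the usage note are hypothetical.

```c
#include <stddef.h>

/* Hypothetical: each loaded PBC (or libparrot itself) records the
   directory it came from and the "host" it is linked into, if any. */
typedef struct module {
    const char *dir;
    const struct module *host;  /* NULL for the outermost host */
} module;

/* Fill out[] with the module's own dir, then its host's, its host's
   host's, and so on, followed by the administrator's system dirs.
   Returns the number written; max_out also bounds any accidental cycle. */
size_t build_search_path(const module *m, const char *const *system_dirs,
                         size_t n_system, const char **out, size_t max_out) {
    size_t n = 0;
    for (const module *cur = m; cur != NULL && n < max_out; cur = cur->host)
        out[n++] = cur->dir;
    for (size_t i = 0; i < n_system && n < max_out; i++)
        out[n++] = system_dirs[i];
    return n;
}
```

For the OneStep::ECS case, a payment plugin would search its own directory, then OneStep::ECS's, then mod_parrot's, then libparrot's, and finally the administrator's paths.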
--

Gordon Henriksen
[EMAIL PROTECTED]


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-19 Thread Jeff Clites
On Apr 17, 2004, at 6:18 PM, Gordon Henriksen wrote:

> Dan Sugalski wrote:
>
>> Brent 'Dax' Royal-Gordon wrote:
>>
>>> Dan Sugalski wrote:
>>>
>>>> 3) Parrot itself (the main executable) has a static, global 1K 
>>>> buffer in it that starts and ends with some recognizable string 
>>>> (like, say, "***+++***START|" and "|END***+++***") so we can find 
>>>> it and overwrite the contents if the library gets moved, for use on 
>>>> platforms where the only way to put a path in is to stick it 
>>>> statically in the executable.
>>> That's pretty disgusting, but I don't know that I have a better idea.
>> There isn't one, alas, at least for some people.
> Everyone running tripwire, et al. (or simply md5sum'ing files to 
> verify integrity) will just love this strategy to death.

It would be no different than what would happen if you had to rebuild 
to change the built-in path. This just has the advantage of not 
requiring a compiler or the source code.
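For concreteness, a C sketch of the runtime side of the quoted 1K-buffer idea: a 1024-byte slot made of the start marker, the path, and the end marker, plus a getter that refuses to trust the slot if either marker is damaged. The struct layout, the default path, and the function name are assumptions for illustration, not a settled design.

```c
#include <string.h>

#define PATH_START_MARK "***+++***START|"
#define PATH_END_MARK   "|END***+++***"
#define PATH_SLOT_SIZE  1024

/* Hypothetical layout: start marker (with its NUL), path bytes, end marker
   (with its NUL), 1024 bytes in all. Arrays of char need no padding, so the
   fields are contiguous in practice. */
struct embedded_path_slot {
    char start[sizeof PATH_START_MARK];
    char path[PATH_SLOT_SIZE - sizeof PATH_START_MARK - sizeof PATH_END_MARK];
    char end[sizeof PATH_END_MARK];
};

/* The build writes the configured path here; an installer may patch it. */
static struct embedded_path_slot path_slot = {
    PATH_START_MARK, "/usr/local/lib/parrot", PATH_END_MARK
};

/* Return the embedded base path, or NULL if the slot looks damaged. */
const char *get_embedded_library_path(void) {
    if (memcmp(path_slot.start, PATH_START_MARK, sizeof PATH_START_MARK) != 0 ||
        memcmp(path_slot.end, PATH_END_MARK, sizeof PATH_END_MARK) != 0)
        return NULL;
    /* A patched path must still be NUL-terminated inside its field. */
    if (memchr(path_slot.path, '\0', sizeof path_slot.path) == NULL)
        return NULL;
    return path_slot.path;
}
```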

> Of course, one can find pathological cases -- especially on Unix, which 
> seems designed to thwart this sort of easy-to-administer technology:
>
>   • parrot binary unlink'd between exec and main(). (Can't happen on 
>     Windows.)
>   • Launched through a symlink to the binary.
>   • Launched through a hard link to the binary.
>   • bin/ is a symlink, so ../share won't work.
>   • Platform can't find the binary. (Can't happen on Windows, Linux, or 
>     Mac OS X.)
>   • chroot (which, in general, near-the-binary solves rather than 
>     complicates).

Pick any strategy, and there will be an opportunity to thwart it. 
Launch the binary, and then force-unmount the filesystem containing it 
and all of its resources. That would thwart any strategy with external 
resources. The point here (which you're probably agreeing with) is to 
provide a solution that gives people flexibility they can use, not a 
solution that will work if they are actively trying to trip it up.

> As for the security concerns of trusting anything but one's current 
> binary*, parrot could adopt a cryptographic solution for verifying 
> integrity of resource files, if anybody's really all that worried 
> about an errant Unicode character database.

It's really no different than loading an external Perl module today, as 
I see it.

> * - Is the binary itself really all that trustworthy in the first 
> place? If a user is executing a program through an untrusted or 
> compromised path, they're already putting their life in their hands, 
> and accessing ${bin}/../share won't make the configuration any more 
> trustworthy.

Exactly. The contents of the filesystem are either secure, or they're 
not.

JEff



Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-19 Thread Gordon Henriksen
Dan Sugalski wrote:

> Brent 'Dax' Royal-Gordon wrote:
>
>> Dan Sugalski wrote:
>>
>>> 3) Parrot itself (the main executable) has a static, global 1K buffer 
>>> in it that starts and ends with some recognizable string (like, say, 
>>> "***+++***START|" and "|END***+++***") so we can find it and 
>>> overwrite the contents if the library gets moved, for use on 
>>> platforms where the only way to put a path in is to stick it 
>>> statically in the executable.
>> That's pretty disgusting, but I don't know that I have a better idea.
> There isn't one, alas, at least for some people.

Everyone running tripwire, et al. (or simply md5sum'ing files to verify 
integrity) will just love this strategy to death.

Finding resource and library files relative to the binary really is a 
very good strategy. Windows is adopting the placed-near-the-binary 
strategy for locating resources and libraries. It has completely 
eliminated "DLL hell" for .NET programs. Mac OS 7 through X have all 
used the same strategy. They have never had major problems with library 
or resource location. Looks like a strong precedent and a proven 
technique.

Of course, one can find pathological cases—especially on Unix, which 
seems designed to thwart this sort of easy-to-administer technology:

	• parrot binary unlink'd between exec and main(). (Can't happen on 
Windows.)
	• Launched through a symlink to the binary.
	• Launched through a hard link to the binary.
	• bin/ is a symlink, so ../share won't work.
	• Platform can't find the binary. (Can't happen on Windows, Linux, 
or Mac OS X.)
	• chroot (which, in general, near-the-binary solves rather than 
complicates).

But I'd say these are all heavily outweighed by the advantages. And, 
in any case, it's a trivial matter at this point in design to offer 
support for replacing a call to Parrot_get_bin_path() (or whatever) with 
"/usr/local/bin" at configure time. That resolves all of the above. With 
a loss of functionality, true, but users on platforms which can't 
support this feature won't, after all, expect /opt/parrot to work after 
it has been mv'd.
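A rough C sketch of a bin-path lookup with the configure-time escape hatch described here. It assumes Linux's /proc/self/exe for run-time discovery and a hypothetical PARROT_FIXED_BIN_DIR macro for everything else. The function name is likewise made up, since the thread only says "Parrot_get_bin_path() (or whatever)".

```c
#include <stdio.h>
#include <string.h>
#if defined(__linux__)
#include <unistd.h>
#endif

/* Hypothetical configure-time fallback for platforms that can't find
   their own binary, e.g. -DPARROT_FIXED_BIN_DIR='"/usr/local/bin"'. */
#ifndef PARROT_FIXED_BIN_DIR
#define PARROT_FIXED_BIN_DIR "/usr/local/bin"
#endif

/* Write the directory holding the running executable into out.
   Returns 0 if found at run time, 1 if the configure-time value was
   used, -1 if out is too small. */
int get_bin_dir(char *out, size_t out_len) {
#if defined(__linux__)
    char exe[4096];
    ssize_t n = readlink("/proc/self/exe", exe, sizeof exe - 1);
    /* Errors and possibly truncated results fall through to the fallback. */
    if (n > 0 && (size_t)n < sizeof exe - 1) {
        exe[n] = '\0';  /* readlink doesn't NUL-terminate */
        char *slash = strrchr(exe, '/');
        if (slash != NULL) {
            /* Cut off the file name, keeping "/" if the binary is at the root. */
            slash[slash == exe ? 1 : 0] = '\0';
            int m = snprintf(out, out_len, "%s", exe);
            return (m < 0 || (size_t)m >= out_len) ? -1 : 0;
        }
    }
#endif
    int m = snprintf(out, out_len, "%s", PARROT_FIXED_BIN_DIR);
    return (m < 0 || (size_t)m >= out_len) ? -1 : 1;
}
```

A caller would then build `${bin}/../share` with something like `snprintf(buf, sizeof buf, "%s/../share", dir)`.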

As for the security concerns of trusting anything but one's current 
binary*, parrot could adopt a cryptographic solution for verifying 
integrity of resource files, if anybody's really all that worried about 
an errant Unicode character database.

--

Gordon Henriksen
[EMAIL PROTECTED]
* - Is the binary itself really all that trustworthy in the first 
place? If a user is executing a program through an untrusted or 
compromised path, they're already putting their life in their hands, and 
accessing ${bin}/../share won't make the configuration any more 
trustworthy.


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-19 Thread Gordon Henriksen
On Thursday, April 15, 2004, at 01:48, Dan Sugalski wrote:

> At this point I can say I don't honestly care all that much, and most 
> of my worries are based on vague feelings that there are platforms out 
> there where finding the actual executable name is somewhere between 
> hard and impossible. I will, then, do the sensible thing and just punt 
> on this--we can work out a best practices thing and enshrine it as the 
> default on systems which can support it and be done with it.
>
> The other question, then, is do we see the need for multiple categories 
> of library which would want separately settable library paths?

Wouldn't it be sensible to build something robust enough to also solve 
the problems of finding parrot user libraries and user resources? In 
which case, a static search path is decidedly retro. It would hardly 
make sense to not include, at the front of the search path, directories 
relative to the PBC file trying to find its libraries or resources.*

For finding resources, one doesn't generally want to fall back to system 
paths. Finding libraries is another matter.

Then there's the mention of using URLs to load resources (e.g., over 
HTTP). Which seems sensible and forward-thinking to me.

Which suggests to me a linked list of resource resolvers. First one in 
the chain to return a file handle to the data or PBC wins. The head of 
parrot's own "system" chain would be available to be appended to any 
other chains that wanted it.
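A minimal C sketch of such a resolver chain, assuming each link is a function pointer plus a directory and a next pointer (all names hypothetical). Because links only point forward, several application chains can share parrot's system chain as their common tail; an HTTP or URL resolver would just be another `try_open` function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver link: try_open gets a chance to produce a handle
   for `name`; the first non-NULL FILE* wins. */
typedef struct resolver {
    FILE *(*try_open)(const struct resolver *self, const char *name);
    const char *dir;              /* data used by directory resolvers */
    const struct resolver *next;  /* NULL terminates the chain */
} resolver;

/* A resolver that looks for `name` inside self->dir. */
FILE *open_in_dir(const resolver *self, const char *name) {
    char path[1024];
    int n = snprintf(path, sizeof path, "%s/%s", self->dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, "rb");
}

/* Walk the chain and return the first handle any resolver produces. */
FILE *resolve(const resolver *chain, const char *name) {
    for (const resolver *r = chain; r != NULL; r = r->next) {
        FILE *fp = r->try_open(r, name);
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```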

--

Gordon Henriksen
[EMAIL PROTECTED]
(* - The directory containing every loaded PBC file is not at all 
important; consider an application like Apache+mod_parrot which is 
loading multiple independent PBC files. It would be useful to allow the 
administrator to install both the production PBC and a development 
release of the same application on the same web server [just 
at different paths], with confidence that mod_parrot won't get the two 
confused. [IIS can do this. It's very cool.])


Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-19 Thread Dan Sugalski
At 9:18 PM -0400 4/17/04, Gordon Henriksen wrote:
> Dan Sugalski wrote:
>
>> Brent 'Dax' Royal-Gordon wrote:
>>
>>> Dan Sugalski wrote:
>>>
>>>> 3) Parrot itself (the main executable) has a static, global 1K 
>>>> buffer in it that starts and ends with some recognizable string 
>>>> (like, say, "***+++***START|" and "|END***+++***") so we can find 
>>>> it and overwrite the contents if the library gets moved, for use 
>>>> on platforms where the only way to put a path in is to stick it 
>>>> statically in the executable.
>>> That's pretty disgusting, but I don't know that I have a better idea.
>> There isn't one, alas, at least for some people.
> Everyone running tripwire, et al. (or simply md5sum'ing files to 
> verify integrity) will just love this strategy to death.

No, not really. This only gets done once, when the package is installed.
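To make "done once, at install time" concrete, here is a C sketch of the patching step over an in-memory copy of the executable (a real installer would read the file, patch it, and write it back). It assumes a hypothetical 1024-byte slot laid out as start marker (with its NUL), path bytes, then end marker (with its NUL). It requires both markers at their exact offsets, so a stray copy of the start string elsewhere in the binary (for instance, a string literal used to check it) is not mistaken for the slot.

```c
#include <string.h>

#define PATH_START_MARK "***+++***START|"
#define PATH_END_MARK   "|END***+++***"
#define PATH_SLOT_SIZE  1024

/* Overwrite the embedded path in an in-memory image of the executable.
   Returns 0 on success, -1 if no valid slot is found or the new path
   (plus its NUL) would not fit between the markers. */
int patch_embedded_path(unsigned char *image, size_t len, const char *new_path) {
    const size_t end_off = PATH_SLOT_SIZE - sizeof PATH_END_MARK;
    const size_t cap = end_off - sizeof PATH_START_MARK;  /* room for path + NUL */
    size_t path_len = strlen(new_path);
    if (path_len + 1 > cap)
        return -1;
    for (size_t i = 0; i + PATH_SLOT_SIZE <= len; i++) {
        unsigned char *slot = image + i;
        /* Both markers must sit at their exact offsets within the slot. */
        if (memcmp(slot, PATH_START_MARK, sizeof PATH_START_MARK) == 0 &&
            memcmp(slot + end_off, PATH_END_MARK, sizeof PATH_END_MARK) == 0) {
            unsigned char *path = slot + sizeof PATH_START_MARK;
            memset(path, 0, cap);  /* clear the old path completely */
            memcpy(path, new_path, path_len);
            return 0;
        }
    }
    return -1;
}
```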

> Finding resource and library files relative to the binary really is 
> a very good strategy.

I'm not saying it isn't, just that it's not possible on some systems. 
Granted, fairly old ones generally, but...
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk