Re: ICU data file location issues
On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote: Finding stuff relative to the executable/DLL would be coolest scheme, but that is admittedly somewhat tricky to get working cross-platform. Excellent idea. Pretty much every single resource in Cocoa applications and frameworks on Mac OS X is located using a scheme such as this, and I believe it all used to work correctly for OpenStep applications on Windows, so there's a good chance it could be made to work. For Unix platforms at least, you should be able to do this: executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0) (to mix a bunch of syntaxes) during initialization before you've had a chance to chdir, and store that away on the interpreter struct. That should work unless you've gone out of your way to execute parrot with argv[0] set to something fake. I don't know what you'd do on Windows, but there must be something. An embedded parrot would need to be told explicitly where to find its resources, just by using the API that standalone parrot would call to store this information. JEff
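The one-liner above (executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)) could be sketched in C roughly as follows. This is only an illustration: exe_dir_from_argv0 is a made-up name, not an existing Parrot function, and the caller is assumed to pass in the working directory it captured at startup, before any chdir.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Parrot): derive the directory that
 * holds the executable from argv[0] and the startup working directory.
 * Returns 0 on success, -1 if the result does not fit in out. */
int exe_dir_from_argv0(const char *argv0, const char *cwd,
                       char *out, size_t outlen)
{
    const char *slash;
    size_t dirlen;
    int n;

    assert(argv0 != NULL && cwd != NULL && out != NULL);

    slash = strrchr(argv0, '/');
    /* Length of the directory part of argv0 (0 if there is no slash). */
    dirlen = slash ? (size_t)(slash - argv0) : 0;

    if (argv0[0] == '/') {
        /* Absolute path: dirname(argv0); "/parrot" yields "/". */
        if (dirlen == 0)
            dirlen = 1;
        n = snprintf(out, outlen, "%.*s", (int)dirlen, argv0);
    } else if (slash) {
        /* Relative path with a directory part: cwd + "/" + dirname(argv0). */
        n = snprintf(out, outlen, "%s/%.*s", cwd, (int)dirlen, argv0);
    } else {
        /* Bare name: assumed relative to cwd here. If the shell found it
         * through $PATH this guess is wrong, and a $PATH search is needed. */
        n = snprintf(out, outlen, "%s", cwd);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The result is not normalized ("./parrot" gives "<cwd>/."), and it is only as trustworthy as argv[0] itself.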
Re: Method Name Truncation in PIR
Chromatic [EMAIL PROTECTED] wrote: Method 'layou' not found in file '(unknown file)' near line -1 Did you turn on debugging? Most of these name mangling and string constant stuff should be covered, e.g.: $ parrot -d /tmp/object-meths_15.pasm 2>&1 | grep meth leo
Re: ICU data file location issues
Jeff Clites [EMAIL PROTECTED] wrote: On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote: Finding stuff relative to the executable/DLL would be coolest scheme, but that is admittedly somewhat tricky to get working cross-platform. Excellent idea. Pretty much every single resource in Cocoa applications and frameworks on Mac OS X is located using a scheme such as this, and I believe it all used to work correctly for OpenStep applications on Windows, so there's a good chance it could be made to work. For Unix platforms at least, you should be able to do this: executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0) (to mix a bunch of syntaxes) during initialization before you've had a chance to chdir, and store that away on the interpreter struct. That should work unless you've gone out of your way to execute parrot with argv[0] set to something fake. I don't know what you'd do on Windows, but there must be something. Strangely enough, I'm in the middle of putting something like this in place for another project... On Win32 you do:- GetModuleFileName(NULL, buffer, buffer_size) Passing NULL in as the first parameter returns the path to the executable the currently executing process (e.g. Parrot in our case) was created from. You then just need to chop off the executable name to find your path. Jonathan
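For the Win32 case, a minimal sketch of "call GetModuleFileName, then chop off the executable name" might look like this. strip_exe_name and win32_exe_dir are hypothetical names chosen for illustration; the only real API used is GetModuleFileNameA, and the Windows-specific part is guarded so the rest compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: cut a full executable path down to its directory,
 * in place. Accepts both '\\' and '/' as separators. A path with no
 * separator is left untouched; "C:\parrot.exe" becomes "C:". */
void strip_exe_name(char *path)
{
    char *back, *fwd, *last;

    assert(path != NULL);
    back = strrchr(path, '\\');
    fwd  = strrchr(path, '/');
    last = back;
    if (fwd && (!last || fwd > last))
        last = fwd;            /* use whichever separator comes last */
    if (last)
        *last = '\0';
}

#ifdef _WIN32
/* Returns 0 on success, -1 on failure or if the path was truncated. */
int win32_exe_dir(char *buffer, size_t buffer_size)
{
    /* NULL module handle: path of the executable of the current process. */
    DWORD n = GetModuleFileNameA(NULL, buffer, (DWORD)buffer_size);
    if (n == 0 || n >= buffer_size)
        return -1;
    strip_exe_name(buffer);
    return 0;
}
#endif
```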
Re: Plans for string processing
Aaron Sherman [EMAIL PROTECTED] wrote: So, why is that: my dog Fiffi:language(blah) eq my dog Fi\x{fb03}:language(blah) and not use language blah; my dog Fiffi eq my dog Fi\x{fb03} What if this is: $dog eq my dog Fi\x{fb03} and C<$dog> has no language info attached? leo
Re: ICU data file location issues
On Wed, Apr 14, 2004 at 11:25:22PM -0700, Jeff Clites wrote: For Unix platforms at least, you should be able to do this: executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0) (to mix a bunch of syntaxes) during initialization before you've had a chance to chdir, and store that away on the interpreter struct. That should work unless you've gone out of your way to execute parrot with argv[0] set to something fake. I don't know what you'd do on Windows, but there must be something. I think that it can be fun on HP-UX (where for #! the kernel sets argv[0] to the path of the script not the interpreter, despite the fact that the script's path is going to be somewhere else in argv) and AIX (where it seems that the kernel sets argv[0] to only the leafname of the interpreter, rather than the full path). But all this is from memory, and in turn for #! invocation one can always parse the #! line to work out where the interpreter was (mmm. race condition) Nicholas Clark
Re: Plans for string processing
On 14 Apr 2004, at 20:16, Larry Wall wrote: I think the idea of tagging complete strings with language is not terribly useful. If it's to be of much use at all, then it should be generalized to a metaproperty system for applying any property to any range of characters within a string, such that the properties float along with the characters they modify. The whole point of doing such properties is to be able to ignore them most of the time, and then later, after you've constructed your entire XML document, you can say, Oh, by the way, does this character have the toetsch property? There's no point in tagging text with language if 99% of it gets turned into Dunno, or English, but not really. It seems natural to associate language with utterances. When these utterances are written down - or as I'm doing here, skipping the speaking part and uttering straight to text - then the association still works. But once we start emitting written things (strings) in a less aural way, then the notion of an associated language can easily become forced or inaccurate. The process whereby we read a string like Is <b>this</b> string in Englisch? is generally a kind of lossy conversion to our language of preference for that particular string. It's very difficult for us to do otherwise. This natural generalization means that there will always be a demand for strings to have language associated with them, no matter how illogical it may seem to those who reflect upon it a bit. I think it is this user state that Dan is trying to support. And, in so far as it models natural and common perception, I think I agree with him. Lossy conversion is a kind of info-sin, especially when it should be avoided. There are circumstances where it would be more natural to read the above string as Is open-bold-tag this close-bold-tag string in the-German-word-for-English question mark i.e. when we are being more precise. It is for this more precise user state that we would be preserving information on substrings. 
There are plenty of strings which are simply never intended to be uttered, and therefore are effectively language-less. And many strings obviously in particular languages are often treated as if they weren't. It would be odd to submit the processing of such strings to a requirement of non or useless information preservation. Any sensible user would want to turn off language processing in such cases. So, we need to ask the user their state, and have the necessary level of support in place to be able to behave accordingly. Looking at this from an object-oriented perspective I can't help but wonder why we don't have a hierarchy of Parrot string types String LanguageString MultiLanguageString with a left wins rule for composition. Mike
Basic Library Paths (was Re: ICU data file location issues)
At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote: TT (Tangentially Topical): it would be nice if Parrot could avoid as many hardcoded paths as possible for configs, libraries, and such, so that the Parrot installation could be relocated as freely as possible. Well, then... Given that everyone's weighing in on this one, it seems worthy of sane consideration. (I keep not thinking about this, as I'm used to the nicely sane VMS logical system :) As we've got the unpleasant issues of OSes with Really Lame schemes, and embedders that may want to use alternate resource locations, it seems like the right thing to do here is to make this a part of the embedding interface and have the main parrot wrapper set it. So, I'm thinking a few things: 1) We add a Parrot_set_library_base(char *lib_path, int length) function to set the base library path 2) We add a Parrot_get_base_library_path() function to the platform-specific interface so platforms can return the base path 3) Parrot itself (the main executable) has a static, global 1K buffer in it that starts and ends with some recognizable string (like, say, ***+++***START| and |END***+++***) so we can find it and overwrite the contents if the library gets moved, for use on platforms where the only way to put a path in is to stick it statically in the executable. #3, I should point out, will *only* be used on those platforms that don't have a better scheme, and only by the Parrot_get_base_library_path() function. Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
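To make point 3 concrete, here is a rough sketch of how a relocation tool might overwrite the path stored between the two marker strings in an executable image. The marker text is taken from the proposal; patch_library_path and find_bytes are invented names, and the layout (markers immediately surrounding a NUL-padded slot) is an assumption, not something Parrot defines.

```c
#include <assert.h>
#include <string.h>

#define PATH_START "***+++***START|"
#define PATH_END   "|END***+++***"

/* Find the first occurrence of needle in a binary image, which may
 * contain NUL bytes (so strstr cannot be used). */
static unsigned char *find_bytes(unsigned char *hay, size_t haylen,
                                 const char *needle)
{
    size_t nlen = strlen(needle);
    size_t i;

    if (nlen == 0 || haylen < nlen)
        return NULL;
    for (i = 0; i + nlen <= haylen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/* Overwrite the path stored between the markers. The slot keeps its
 * size; unused bytes are NUL-padded. Returns 0 on success, -1 if a
 * marker is missing or new_path (plus its NUL) does not fit. */
int patch_library_path(unsigned char *image, size_t len, const char *new_path)
{
    unsigned char *start, *slot, *end;
    size_t slot_len, path_len;

    assert(image != NULL && new_path != NULL);

    start = find_bytes(image, len, PATH_START);
    if (!start)
        return -1;
    slot = start + strlen(PATH_START);
    end = find_bytes(slot, len - (size_t)(slot - image), PATH_END);
    if (!end)
        return -1;

    slot_len = (size_t)(end - slot);
    path_len = strlen(new_path);
    if (path_len >= slot_len)
        return -1;              /* leave the old contents untouched */

    memset(slot, 0, slot_len);
    memcpy(slot, new_path, path_len);
    return 0;
}
```

A tool built on this would read the executable into memory, call patch_library_path, and write the image back; the slot size caps the path length, which is the hardcoded limit the static buffer implies.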
Re: Basic Library Paths (was Re: ICU data file location issues)
Dan Sugalski wrote: At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote: TT (Tangentially Topical): it would be nice if Parrot could avoid as many hardcoded paths as possible for configs, libraries, and such, so that the Parrot installation could be relocated as freely as possible. Well, then... Given that everyone's weighing in on this one, it seems worthy of sane consideration. (I keep not thinking about this, as I'm used to the nicely sane VMS logical system :) Brag :-) (in case someone is wondering, the VMS logicals nicely solve this problem, basically by each piece of software being installed into and used/accessed through a super environment variable-- so basically Dan can't understand why us others are having these problems and talk of it as a new fancy thing :-)
Re: Basic Library Paths (was Re: ICU data file location issues)
At 6:23 PM +0300 4/15/04, Jarkko Hietaniemi wrote: Dan Sugalski wrote: At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote: TT (Tangentially Topical): it would be nice if Parrot could avoid as many hardcoded paths as possible for configs, libraries, and such, so that the Parrot installation could be relocated as freely as possible. Well, then... Given that everyone's weighing in on this one, it seems worthy of sane consideration. (I keep not thinking about this, as I'm used to the nicely sane VMS logical system :) Brag :-) :-P (in case someone is wondering, the VMS logicals nicely solve this problem, basically by each piece of software being installed into and used/accessed through a super environment variable-- so basically Dan can't understand why us others are having these problems and talk of it as a new fancy thing :-) Oh, and have I mentioned they're group and system wide, persistent, group-protected, and leveled by protection, so they're actually safe to trust? (So if you look for an entry in a system logical table you can trust it, since someone needed compromise-the-world privs to set it in the first place so you've got bigger things to worry about if it's bad? :) Not to, y'know, show off or anything. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: ICU data file location issues
On Apr 15, 2004, at 3:03 AM, Nicholas Clark wrote: But all this is from memory, and in turn for #! invocation one can always parse the #! line to work out where the interpreter was (mmm. race condition) And a race isn't too bad here actually--even if we know the path reliably, it's always possible to move or alter the resources which we're trying to locate, at any time (before, during, or after launching the process). So we need to treat them with as much skepticism as anything else on the file system. And for parrot-the-executable we should offer a command-line parameter to override the location. That would give people an escape hatch for special situations (for instance, if you are going to chroot or something). JEff
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. As long as we've stored it away, anything using it later can chop it up into pieces itself if it wants to--anything we could have done in splitting it up, the consumer can do too. The only thing we really have to do is grab the info before it's too late--before something might have chdir'd, and before argv is either inaccessible, or could have been overwritten. JEff
Re: Method Name Truncation in PIR
On Thu, 2004-04-15 at 00:58, Leopold Toetsch wrote: Did you turn on debugging? Most of these name mangling and string constant stuff should be covered, e.g.: $ parrot -d /tmp/object-meths_15.pasm 2>&1 | grep meth
Aha, here's an interesting difference. I've been using single quotes for string constants. Here's what happens when I change the double quotes around the method name to single quotes in that test:
    emit newclass P3, "Foo"
    emit find_type I0, "Foo"
    emit new P2, I0
    emit set S0, 'meth
    emit fetchmethod P0, P2, S0
    emit print "main\n"
    emit invokecc
    emit print "back\n"
    emit fetchmethod P0, P3, S0
    emit set P2, P3
    emit invokecc
    emit print "back\n"
    emit end
    emit _Foo@@@meth:
    emit print "in meth\n"
    emit invoke P1
For what it's worth, if I switch back and forth, as in this PASM:
    .local pmc args
    new args, .PerlHash
    set args['height'], 100
    set args["width"], 100
    set args['bpp'], 0
    set args["flags"], 1
The debug output indicates:
    emit new P16, 33
    emit set P16['height], 100
    emit set P16["width"], 100
    emit set P16['bpp], 0
    emit set P16["flags"], 1
That may not be the root cause, but it's certainly suspicious. -- c
Re: Plans for string processing
On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote: Aaron Sherman [EMAIL PROTECTED] wrote: So, why is that: my dog Fiffi:language(blah) eq my dog Fi\x{fb03}:langauge(blah) and not use language blah; my dog Fiffi eq my dog Fi\x{fb03} What, if this is: $dog eq my dog Fi\x{fb03} and C$dog hasn't some language info attached? Looks good to me. Great example! Seriously, why is that a problem? That was my entry-point to this conversation: I just don't see any case in which performing a comparison of ANY two strings according to whatever arbitrary SINGLE language rules is a problem. I cannot imagine the case where you need two or more language rules AND could start off with any sense of what that would mean, and even if you could contrive such a case, I would suggest that its rarity should dictate it being attached to a class that defines a string-like object which mutates its behavior based on the language spoken by the maintainer of the database from which it was fetched or somesuch. -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote: At 8:35 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. That's what I'm thinking, but I can see wanting to have separate paths for parrot's low-level libraries (basically the things we need for parrot to run in the first place) and higher-level libraries (modules installed off of CPAN and whatnot). That's true. But as long as we grab the here's where the executable is, we can (later) build API on top of that if we want. For instance, we could decide that core, low-level resources will be located relative to that path, and one of those resources will undoubtedly be a config file of some sort, and that config file could contain the path(s) to look for higher-level stuff. As long as we've rescued and stored our location, we've sort of bootstrapped that process. (And to loop back a bit, the nice thing about bootstrapping this stuff based on our executable's location is that it makes it a no-brainer to have multiple, relocatable installs of parrot. And people would even be able to have 10 different versions of parrot sitting around, but have them all configured to share the same high-level resources.) JEff
Re: Basic Library Paths (was Re: ICU data file location issues)
At 8:35 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. That's what I'm thinking, but I can see wanting to have separate paths for parrot's low-level libraries (basically the things we need for parrot to run in the first place) and higher-level libraries (modules installed off of CPAN and whatnot). I'm firmly in the Don't care camp here, so I figured I'd open it to discussion before enshrining the result in the API. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Basic Library Paths (was Re: ICU data file location issues)
Dan Sugalski wrote: 1) We add a Parrot_set_library_base(char *lib_path, int length) function to set the base library path 2) We add a Parrot_get_base_library_path() function to the platform-specific interface so platforms can return the base path Works for me... 3) Parrot itself (the main executable) has a static, global 1K buffer in it that starts and ends with some recognizable string (like, say, ***+++***START| and |END***+++***) so we can find it and overwrite the contents if the library gets moved, for use on platforms where the only way to put a path in is to stick it statically in the executable. That's pretty disgusting, but I don't know that I have a better idea. #3, I should point out, will *only* be used on those platforms that don't have a better scheme, and only by the Parrot_get_base_library_path() function. System registry on Windows? /etc file on Unixen? Actually, one thing I'd like to see is if it wasn't the library's base path hardcoded in, but the base path of a frozen data structure or program that encoded Parrot's settings. That would allow it to carry the runtime library path, the paths to ICU's tables, the paths to search for PMCs, and whatever else we can think of, without a hardcoded limit. Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) -- Brent Dax Royal-Gordon [EMAIL PROTECTED] Perl and Parrot hacker Oceania has always been at war with Eastasia.
Re: Method Name Truncation in PIR
chromatic wrote: emit set P16['height], 100 Ah. Relic of Jeff's patch. If that constant got reused elsewhere, e.g. as a method name, it was one character too short. Fixed. leo
Re: Basic Library Paths (was Re: ICU data file location issues)
At 8:54 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote: At 8:35 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. That's what I'm thinking, but I can see wanting to have separate paths for parrot's low-level libraries (basically the things we need for parrot to run in the first place) and higher-level libraries (modules installed off of CPAN and whatnot). That's true. But as long as we grab the here's where the executable is, we can (later) build API on top of that if we want. Well, yeah, but... where the executable is ought, honestly, to be irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that I'll have parrot's library files hanging off of /usr/bin. And if I've got a few hundred machines with parrot's library NFS mounted in different places (to match conflicting vendor standards and other whackjob breakage which is endemic in, well, the world) it really falls down. :) Add to that you can't always figure out where Parrot really is both because of chroot behaviour and some odd where am I really problems with suid scripts in some places. There are a couple of folks who could make your brain melt and flow out your ears with all this stuff too. Having the executable path as an optional way to get the info's not necessarily a bad thing, but I think it's safe to say that it's not The Right Thing. (If there even is one) If nothing else this has convinced me we need a way to specify site policy at build time for all this nonsense^Wfun. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Basic Library Paths (was Re: ICU data file location issues)
At 9:05 AM -0700 4/15/04, Brent 'Dax' Royal-Gordon wrote: Dan Sugalski wrote: 3) Parrot itself (the main executable) has a static, global 1K buffer in it that starts and ends with some recognizable string (like, say, ***+++***START| and |END***+++***) so we can find it and overwrite the contents if the library gets moved, for use on platforms where the only way to put a path in is to stick it statically in the executable. That's pretty disgusting, but I don't know that I have a better idea. There isn't one, alas, at least for some people. #3, I should point out, will *only* be used on those platforms that don't have a better scheme, and only by the Parrot_get_base_library_path() function. System registry on Windows? /etc file on Unixen? That's global. Bad idea, it messes up multiple installs of the same version, or similar-enough versions that they're indistinguishable. Actually, one thing I'd like to see is if it wasn't the library's base path hardcoded in, but the base path of a frozen data structure or program that encoded Parrot's settings. That would allow it to carry the runtime library path, the paths to ICU's tables, the paths to search for PMCs, and whatever else we can think of, without a hardcoded limit. This wouldn't be a bad thing, nope. I could see security issues--it'd probably be better to link the config file right into parrot. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Method Name Truncation in PIR
On Thu, 2004-04-15 at 09:18, Leopold Toetsch wrote: Ah. Relic of Jeff's patch. If that constant got reused elsewhere, e.g. as a method name, it was one character too short. Confirmed. Thanks, Leo! Would a test patch such as the following be good to catch regressions, or should it go elsewhere? If elsewhere, do you prefer a separate test in object-meths.t or somewhere in imcc/t? -- c
Index: t/pmc/object-meths.t
===================================================================
RCS file: /cvs/public/parrot/t/pmc/object-meths.t,v
retrieving revision 1.17
diff -u -u -r1.17 object-meths.t
--- t/pmc/object-meths.t 10 Apr 2004 12:50:23 -0000 1.17
+++ t/pmc/object-meths.t 15 Apr 2004 16:54:04 -0000
@@ -428,7 +428,7 @@
 find_type I0, "Foo"
 new P2, I0
-set S0, "meth"
+set S0, 'meth'
 fetchmethod P0, P2, S0
 print "main\n"
 # P2, S0 are as in callmethod
Re: Basic Library Paths (was Re: ICU data file location issues)
Dan Sugalski wrote: #3, I should point out, will *only* be used on those platforms that don't have a better scheme, and only by the Parrot_get_base_library_path() function. System registry on Windows? /etc file on Unixen? That's global. Bad idea, it messes up multiple installs of the same version, or similar-enough versions that they're indistinguishable. Good point. Actually, one thing I'd like to see is if it wasn't the library's base path hardcoded in, but the base path of a frozen data structure or program that encoded Parrot's settings. That would allow it to carry the runtime library path, the paths to ICU's tables, the paths to search for PMCs, and whatever else we can think of, without a hardcoded limit. This wouldn't be a bad thing, nope. I could see security issues--it'd probably be better to link the config file right into parrot. Install it with root ownership and 644 permissions, in a directory with similar settings. (Or the system's equivalent, of course.) Then put big blinking security warnings wherever the documentation talks about editing that file. We can't protect sysadmins from their own idiocy. -- Brent Dax Royal-Gordon [EMAIL PROTECTED] Perl and Parrot hacker Oceania has always been at war with Eastasia.
Re: Basic Library Paths (was Re: ICU data file location issues)
Well, yeah, but... where the executable is ought, honestly, to be irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that I'll have parrot's library files hanging off of /usr/bin. Bah. BAH, I say. The /usr/bin/parrot is of course a symlink to, say, /platform/os/version/parrot/version/bin/parrot, and we parse the real path, not the symlink. And if I've got a few hundred machines with parrot's library NFS mounted in different places (to match conflicting vendor standards and other whackjob breakage which is endemic in, well, the world) it really falls down. :) Add to that you can't always figure out where Parrot really is both because of chroot behaviour and some odd where am I really problems with suid scripts in some places. There are a couple of folks who could make your brain melt and flow out your ears with all this stuff too. Yes, I was once one of those people :-) Having the executable path as an optional way to get the info's not necessarily a bad thing, but I think it's safe to say that it's not The Right Thing. (If there even is one) If nothing else this has convinced me we need a way to specify site policy at build time for all this nonsense^Wfun. :)
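The "parse the real path, not the symlink" step could be sketched with POSIX realpath() as below. real_exe_dir is a hypothetical name; the sketch assumes a POSIX.1-2008 system where realpath(path, NULL) returns a malloc'd buffer, and it resolves whatever path it is handed, so finding that path in the first place is a separate problem.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 realpath() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resolve symlinks in path, then return the
 * directory part of the resolved name in out.
 * Returns 0 on success, -1 if resolution fails or out is too small. */
int real_exe_dir(const char *path, char *out, size_t outlen)
{
    char *resolved, *slash;
    size_t n;

    assert(path != NULL && out != NULL);

    resolved = realpath(path, NULL);    /* malloc'd, absolute, no symlinks */
    if (!resolved)
        return -1;

    /* resolved is absolute, so it always contains at least one '/'. */
    slash = strrchr(resolved, '/');
    if (slash == resolved)
        slash[1] = '\0';                /* "/parrot" or "/" -> "/" */
    else
        *slash = '\0';

    n = strlen(resolved);
    if (n >= outlen) {
        free(resolved);
        return -1;
    }
    memcpy(out, resolved, n + 1);
    free(resolved);
    return 0;
}
```

With /usr/bin/parrot symlinked to a versioned install tree, this would return the bin directory inside that tree rather than /usr/bin.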
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote: At 8:54 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote: At 8:35 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. That's what I'm thinking, but I can see wanting to have separate paths for parrot's low-level libraries (basically the things we need for parrot to run in the first place) and higher-level libraries (modules installed off of CPAN and whatnot). That's true. But as long as we grab the here's where the executable is, we can (later) build API on top of that if we want. Well, yeah, but... where the executable is ought, honestly, to be irrelevant. Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 with a particular copy of parrot. It's the only thing (that I can think of) which continues to work if you move your distro around, and which naturally avoids problems with having multiple copies, and lets things work even if you don't install. If I've stuck Parrot in /usr/bin it seems unlikely that I'll have parrot's library files hanging off of /usr/bin. Right, so you do what Mac OS X does with the java executable--you put a symlink in /usr/bin, pointing to the real location. And your path to the executable has to call realpath() or the equivalent to resolve such symlinks (which you need to do in order for path logic to do-the-right-thing). And if I've got a few hundred machines with parrot's library NFS mounted in different places (to match conflicting vendor standards and other whackjob breakage which is endemic in, well, the world) it really falls down. :) I'm not sure I get your meaning here. By executable, I mean standalone-parrot, not libparrot, of course. 
If you mean that libparrot might end up in 100 different places, then you'll not end up with the dynamic linker finding things properly, so you'll have a bigger problem to solve. If you mean that standalone-parrot could end up in 100 different places, then you're going to have 100 different ways you need to set up $PATH just to launch it, but once it's executing you'd still be fine. Or each host will have its own separate symlink in /usr/bin to the right location for that host, and everything will just be fine. Add to that you can't always figure out where Parrot really is both because of chroot behaviour and some odd where am I really problems with suid scripts in some places. With chroot, frankly, you have the same problem with DLLs, and you end up needing to have all of your necessary external resources located in your chroot-dir so that their paths after the chroot match their paths before. So that was a bad example on my part, really. (And, if you are chroot-ing from within a parrot script, you're in a place where you'd want to re-point your config dir path to match.) But with interpreter files we could have the problem that the kernel hides the info from us. But for bytecode files, if they're launched like java apps are launched, with parrot foo, then that problem wouldn't come up. Having the executable path as an optional way to get the info's not necessarily a bad thing, but I think it's safe to say that it's not The Right Thing. (If there even is one) Yeah, I don't think there's a 100% solution, but it would be nice to have something which works 95% of the time and is flexible/convenient, in preference to something that works 96% of the time and is less powerful. I think a reasonable approach would be: 1) Always allow the config location to be overridden via a command-line parameter, and change-able from bytecode. (That lets you be 100% unambiguous, at the cost of needing to execute parrot in a particular way. 
And it's convenient for testing against a whole bunch of different sets of configs without rebuilding.) 2a) On platforms which support it, auto-find the executable, and base the config path on that. 2b) On platforms which don't support that (and even, as a compile-time option for those which support it), have a compiled-in path to use. This basically matches the API you mentioned before, and boils down to what gets passed to Parrot_set_library_base() (or, call it Parrot_set_configuration_base maybe) at launch time--it gets passed either an explicitly supplied value, an inferred value, or a compiled-in value). JEff
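The precedence in steps 1, 2a and 2b (explicit override, then inferred from the executable, then compiled-in) can be written down as a small selection function. Everything here is illustrative: choose_config_base and base_source are invented names, and the chosen string is simply what a launcher would then hand to Parrot_set_library_base() or whatever that call ends up being named.

```c
#include <assert.h>
#include <stddef.h>

/* Where the chosen base path came from. */
typedef enum {
    BASE_NONE,          /* nothing usable was supplied */
    BASE_EXPLICIT,      /* 1)  command-line override */
    BASE_INFERRED,      /* 2a) derived from the executable's location */
    BASE_COMPILED_IN    /* 2b) path baked in at build time */
} base_source;

static int usable(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Hypothetical helper: pick the configuration base by precedence.
 * Any argument may be NULL or empty if that source is unavailable.
 * Returns the chosen string (not copied) and reports its origin. */
const char *choose_config_base(const char *cli_override,
                               const char *inferred,
                               const char *compiled_in,
                               base_source *source)
{
    assert(source != NULL);

    if (usable(cli_override)) {
        *source = BASE_EXPLICIT;
        return cli_override;
    }
    if (usable(inferred)) {
        *source = BASE_INFERRED;
        return inferred;
    }
    if (usable(compiled_in)) {
        *source = BASE_COMPILED_IN;
        return compiled_in;
    }
    *source = BASE_NONE;
    return NULL;
}
```

Reporting the source alongside the path is a design choice of this sketch; it makes it easy to log which rule fired when a relocated install misbehaves.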
Re: Basic Library Paths (was Re: ICU data file location issues)
Brent 'Dax' Royal-Gordon [EMAIL PROTECTED] wrote: Dan Sugalski wrote: ***+++***START| and |END***+++***) so we can find it and overwrite That's pretty disgusting, but I don't know that I have a better idea. Same scheme as with fingerprint.c? leo
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 9:05 AM, Brent 'Dax' Royal-Gordon wrote: Dan Sugalski wrote: 3) Parrot itself (the main executable) has a static, global 1K buffer in it that starts and ends with some recognizable string (like, say, ***+++***START| and |END***+++***) so we can find it and overwrite the contents if the library gets moved, for use on platforms where the only way to put a path in is to stick it statically in the executable. That's pretty disgusting, but I don't know that I have a better idea. It's yucky, but it matches what's done for dynamic libs, at least on some platforms. (That is, at build-time a library gets its path-where-I'll-be-installed compiled into it, and apps linked against that lib copy that path into themselves, so that at runtime the dynamic linker searches that location, in addition to standard locations, to find the library. And, there's then a tool which lets you modify your library to change its built-in install location, without re-compiling.) So at least there's precedent. #3, I should point out, will *only* be used on those platforms that don't have a better scheme, and only by the Parrot_get_base_library_path() function. System registry on Windows? /etc file on Unixen? Actually, one thing I'd like to see is if it wasn't the library's base path hardcoded in, but the base path of a frozen data structure or program that encoded Parrot's settings. That would allow it to carry the runtime library path, the paths to ICU's tables, the paths to search for PMCs, and whatever else we can think of, without a hardcoded limit. The idea (for me, at least) was to specify a directory, and the config file could be a conventional name relative to that--that lets you locate multiple resources without having to read the config file in order to find them. And semantically, I think of it not as the executable's path--that just happens to be something that's 1:1 with a particular copy of parrot. 
And definitely not libparrot's path--embedded cases would have to specify the path explicitly, though they could partially mimic the same scheme. JEff
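To make Dan's point 3 concrete, here is a rough sketch of how a relocation tool might find and rewrite the marker-delimited buffer inside a copy of the executable image. The marker strings are the ones suggested in the message; the function names (find_bytes, patch_base_path, read_base_path) and the NUL-padded payload layout are assumptions made for illustration, not anything Parrot actually ships.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker strings as suggested in the thread. */
#define SLOT_START "***+++***START|"
#define SLOT_END   "|END***+++***"

/* Return the offset of the first occurrence of needle at or after `from`,
   or -1 if it does not occur. The image is binary, so strstr() won't do. */
static long find_bytes(const unsigned char *hay, size_t hay_len,
                       const char *needle, size_t from)
{
    size_t n = strlen(needle);
    size_t i;
    if (n == 0 || hay_len < n)
        return -1;
    for (i = from; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: overwrite the payload between the two markers with
   new_path, NUL-padding the rest. Returns 0 on success, -1 if the markers
   are missing or the path (plus its terminator) does not fit. */
int patch_base_path(unsigned char *image, size_t image_len,
                    const char *new_path)
{
    long start, end;
    size_t payload_off, capacity, path_len;

    assert(image != NULL && new_path != NULL);
    start = find_bytes(image, image_len, SLOT_START, 0);
    if (start < 0)
        return -1;
    payload_off = (size_t)start + strlen(SLOT_START);
    end = find_bytes(image, image_len, SLOT_END, payload_off);
    if (end < 0)
        return -1;
    capacity = (size_t)end - payload_off;
    path_len = strlen(new_path);
    if (path_len + 1 > capacity)
        return -1;
    memset(image + payload_off, 0, capacity);
    memcpy(image + payload_off, new_path, path_len);
    return 0;
}

/* Hypothetical helper: return the NUL-terminated payload, or NULL if the
   start marker is not present. Assumes the payload was written by
   patch_base_path(), which always leaves at least one NUL. */
const char *read_base_path(const unsigned char *image, size_t image_len)
{
    long start = find_bytes(image, image_len, SLOT_START, 0);
    if (start < 0)
        return NULL;
    return (const char *)(image + (size_t)start + strlen(SLOT_START));
}
```

One caveat with this approach: a tool that embeds the marker strings as constants will contain them in its own image as well, so it should only ever be pointed at a different file.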
Re: Basic Library Paths (was Re: ICU data file location issues)
At 10:23 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote: At 8:54 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote: At 8:35 AM -0700 4/15/04, Jeff Clites wrote: On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote: Sound sane? I can see splitting up the library base path into sections, but I'm not sure it's worth it. Now'd be the time to argue that, though :) Makes sense to me to just store the path--keep it simple. That's what I'm thinking, but I can see wanting to have separate paths for parrot's low-level libraries (basically the things we need for parrot to run in the first place) and higher-level libraries (modules installed off of CPAN and whatnot). That's true. But as long as we grab the here's where the executable is, we can (later) build API on top of that if we want. Well, yeah, but... where the executable is ought, honestly, to be irrelevant. Yes, in a sense it's irrelevant, but it's the only thing that's 1:1 with a particular copy of parrot. It's the only thing (that I can think of) which continues to work if you move your distro around, and which naturally avoids problems with having multiple copies, and lets things work even if you don't install. At this point I can say I don't honestly care all that much, and most of my worries are based on vague feelings that there are platforms out there where finding the actual executable name is somewhere between hard and impossible. I will, then, do the sensible thing and just punt on this--we can work out a best practices thing and enshrine it as the default on systems which can support it and be done with it. The other question, then, is do we see the need for multiple categories of library which would want separately settable library paths? (Don't, here, forget the potential needs of embedders such as Apache) Once we get that thumped out I'll make the API additions. 
-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 9:41 AM, Dan Sugalski wrote: Actually, one thing I'd like to see is if it wasn't the library's base path hardcoded in, but the base path of a frozen data structure or program that encoded Parrot's settings. That would allow it to carry the runtime library path, the paths to ICU's tables, the paths to search for PMCs, and whatever else we can think of, without a hardcoded limit. This wouldn't be a bad thing, nope. I could see security issues--it'd probably be better to link the config file right into parrot. There'll be the same security issue with anything located on the filesystem--the config is not particularly worse than anything else (DLLs, etc.). The security of anything you run is only as good as the integrity of the filesystem used to locate the resources. (Specifically, if I were a hacker and could compromise your system by replacing the config, I could just as easily replace parrot itself.) But it would be nice to bake in things which you can't really change without rebuilding anyway--things like UINTVAL size, etc. Monkeying with them after-the-fact would be a definite security risk (buffer overruns, etc.), and wouldn't ever be useful. But stuff like finding ICU's data files (or add-on libraries) we'd want to be easily changeable without a rebuild. (And again, if you have to rebuild to change them, then people will tend to keep around the tools needed to do that, which would give a hacker the tools they need to do the same.) But we certainly need to define/articulate a security model, no matter what approach we take. (But my gut reaction is always against something which decreases flexibility, and only _seems_ to increase security.) But there of course are security issues with anything located relative to the cwd(). (That is, if resources are located relative to the cwd, then I can trick you into loading my copies by talking you into chdir-ing into my home directory.) JEff
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 10:30 AM, Jeff Clites wrote: And semantically, I think of it not as the executable's path--that just happens to be something that's 1:1 with a particular copy of parrot. And definitely not libparrot's path--embedded cases would have to specify the path explicitly, though they could partially mimic the same scheme. I take that back--the path to the library might actually work just as well (and may or may not be less ambiguous to find; the dynamic linker had to find it, and may have left breadcrumbs). This is all, by the way, exactly the NSBundle/CFBundle API from Mac OS X (and before that, OpenStep). See: http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSBundle.html. JEff
ICU data loading and platform support
Hello Perl6 people, I couldn't help but notice that you were talking about ICU on this mailing list. Let me interject with some suggestions. I should mention that ICU 2.6 can build a static data library. I recommend that ICU be built without the --with-data-packaging=archive configure option. You will probably have fewer path issues if you did that, and used the ICU static data library. For those people that are interested here is some more information about building ICU data and its options: http://oss.software.ibm.com/icu/userguide/icudata.html. If you are having problems and need to patch ICU, you should consider submitting the patches to ICU to our jitterbug system http://www.jtcsv.com/cgibin/icu-bugs. If you submit them early enough, the changes may be able to make ICU 3.0. ICU 3.0 should be out in mid June. I'm sure that you don't want to keep patching ICU every time you upgrade to a new version of ICU. ICU 3.0 should work a little better with Windows and Cygwin. ICU 3.0 should also build faster than before due to some build changes done recently. I'm sure some people on this list would be interested in that. When the alpha and beta releases of ICU come out in a few weeks, I recommend someone from this group try building it on your machines. Some people here seem to have access to some machine configurations that are unavailable to the ICU team. Testing these pre-releases will help to verify that the ICU release is as portable as possible. I'm glad to see that perl will be improving its Unicode support :-) George Rhoten http://oss.software.ibm.com/icu/
Re: ICU data loading and platform support
On Apr 15, 2004, at 11:12 AM, George R wrote: I couldn't help but notice that you were talking about ICU on this mailing list. Let me interject with some suggestions. Thanks much for the message. I should mention that ICU 2.6 can build a static data library. I recommend that ICU be built without the --with-data-packaging=archive configure option. You will probably have fewer path issues if you did that, and used the ICU static data library. I went with the --with-data-packaging=archive initially for 3 pragmatic reasons: (1) it seems to take a really, really long time to build them into a library, and (2) once parrot ships, if we use --with-data-packaging=archive or --with-data-packaging=files then that would permit end users to add/remove encoding without needing access to a compiler, and (3) our automated tests end up linking in our libraries, so building the ICU data as a static library slows this down significantly, since it has to copy around all of the bits for each test. But (1) and (3) are just short-term, convenient-for-ongoing-development reasons. Long term, it makes sense to expose all of the packaging choices to the parrot build-configuration process, and let end users decide what's best for them. In the short term, this is having the nice side effect of forcing some issues of resource location that we'll need to solve for other resources anyway. If you are having problems and need to patch ICU, you should consider submitting the patches to ICU to our jitterbug system http://www.jtcsv.com/cgibin/icu-bugs. I have a few (bugs if not patches) to submit. (For instance, with ICU 2.8, building the tools seems to fail in the case of --enable-static --disable-shared, because of the s prefix.) I'll be in touch! Thanks again, JEff
Re: ICU data loading and platform support
Jeff Clites wrote: On Apr 15, 2004, at 11:12 AM, George R wrote: I went with the --with-data-packaging=archive initially for 3 pragmatic reasons: (1) it seems to take a really, really long time to build them into a library, and (2) once parrot ships, if we use --with-data-packaging=archive or --with-data-packaging=files then that would permit end users to add/remove encoding without needing access to a compiler, and (3) our automated tests end up linking in our libraries, so building the ICU data as a static library slows this down significantly, since it has to copy around all of the bits for each test. But (1) and (3) are just short-term, convenient-for-ongoing-development reasons. Item 1 should improve in ICU 3.0 on most platforms when a shared or static data library is used. <detail> Basically, ICU writes out the .dat file to computer assembly, and lets the C compiler create the object code from the assembly. This is similar to how the Windows builds work. It's all data, and there are no instructions in the assembly. It's also much quicker than the original implementation. There are some platforms that can't work with this build speed improvement without some porting help (please contact me if you want to help out). If the new building process can't be done on a platform, it uses the original slow building process (this only happens when we don't have access to the compiler or platform for testing). </detail> Item 2 can already be done when a static or shared data library is used (at least the add part). If the ICU_DATA environment variable is set, or u_setDataDirectory() is used, you can add or override the data used within ICU's library. If a user wanted to remove data, the user would need some of ICU's tools to unpackage and repackage the .dat archive, which requires a little detailed knowledge about what all the data and tools are used for. The ICU User's Guide should help in those cases. 
I'm not sure what you mean by item 3, but the new 3.0 data build process should hopefully help out. I have a few (bugs if not patches) to submit. (For instance, with ICU 2.8, building the tools seems to fail in the case of --enable-static --disable-shared, because of the s prefix.) That's done on purpose. The static library names collide with AIX shared library names and Windows import library names. I recommend that you don't use --disable-shared and build both the static and shared libraries. I don't think autoconf allowed us to remove the --disable-shared option. I also doubt that the --disable-shared would work on all platforms anyway because library versioning is done differently (this is due to poor shared library versioning support on some platforms). You can use the static libraries, but the tools kind of require shared libraries. Patches to make ICU work with just static libraries, and with the current naming scheme, will probably be accepted. I'll be in touch! :-) George http://oss.software.ibm.com/icu/
Re: Method Name Truncation in PIR
Chromatic [EMAIL PROTECTED] wrote: On Thu, 2004-04-15 at 09:18, Leopold Toetsch wrote: Ah. Relic of Jeff's patch. If that constant got reused elsewhere, e.g. as a method name, it would be one character too short. Confirmed. Thanks, Leo! Good. Would a test patch such as the following be good to catch regressions? I didn't boil it down to a test. It was just a look at the patch, that *could* have caused the bug. Simple usage of single quotes was ok. Multiple usage too (constant folding jumps in). But name mangling + shortened original symbol could cause the problem. Just replacing double quotes with single is and was working. Only the combination of different usage of one string constant could have triggered the bug. For a test you need (probably) usage of that string constant as a single quoted string and as a method name - maybe in a different namespace. leo
Re: Plans for string processing
Aaron Sherman [EMAIL PROTECTED] wrote: On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote: $dog eq my dog Fi\x{fb03} and C$dog hasn't some language info attached? Looks good to me. Great example! Seriously, why is that a problem? Dan's problem to come up with better examples--or explanations :) leo - resisting from further utterances WRT that topic in the absence of The Plan(tm).
Re: Basic Library Paths (was Re: ICU data file location issues)
On Apr 15, 2004, at 10:48 AM, Dan Sugalski wrote: At this point I can say I don't honestly care all that much, and most of my worries are based on vague feelings that there are platforms out there where finding the actual executable name is somewhere between hard and impossible. I will, then, do the sensible thing and just punt on this--we can work out a best practices thing and enshrine it as the default on systems which can support it and be done with it. I think it's worth trying out--if it works out, we can build on it; if it doesn't, we can rip it out/redo it. (And, the API could probably stay the same.) The other question, then, is do we see the need for multiple categories of library which would want separately settable library paths? (Don't, here, forget the potential needs of embedders such as Apache) Once we get that thumped out I'll make the API additions. We should probably start simple and build, but this would make sense to me (API names are just suggestions): Parrot_get_configuration_base_path() -- returns the automagically determined path, unless the corresponding Parrot_set_configuration_base_path() had been called to set it to something else. We could then have individual API to pick out specific resources based on that, but instead, this would be cleaner/simpler: Parrot_get_path_for_resource(STRING *resource_name) -- returns the equivalent of Parrot_get_configuration_base_path() . "/" . resource_name, unless you had called Parrot_set_path_for_resource(STRING *resource_name, STRING *path) to set the path for this particular resource to something else. Internally, this could special case certain resources, if needed. This setup lets us have a stable API, but over time add to the list of things we would look up. 
So (assuming for the moment a default layout similar to what we currently have), in-core I can call Parrot_get_path_for_resource("library/config.pimc") and Parrot_get_path_for_resource("runtime/parrot/dynext") to locate these resources, by default inside of the base dir. But if I want to have a totally funky layout (in an embedding context, or just if I'm in a weird mood), all I need to do is explicitly call the set method (from setup code or from bytecode) to re-point where I find a particular resource. (So the logic for that could just be to do a hash lookup for any explicitly set values, and fall back to simple concatenation if nothing was in the hash.) That would all be fairly simple, yet expandable. JEff
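The lookup logic described above (explicit override first, then base path plus resource name) could be sketched roughly as follows. This is only an illustration: it uses plain C strings rather than Parrot STRINGs, a small linear table stands in for the hash, and the function names are shortened, hypothetical stand-ins for the suggested Parrot_* API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OVERRIDES 16
#define MAX_PATH_LEN  512

/* One explicitly set resource path. A real implementation would likely
   use a hash; a linear table keeps the sketch short. */
struct override {
    char name[MAX_PATH_LEN];
    char path[MAX_PATH_LEN];
};

static char base_path[MAX_PATH_LEN] = ".";
static struct override overrides[MAX_OVERRIDES];
static size_t n_overrides = 0;

/* Stand-in for Parrot_set_configuration_base_path(). */
int set_configuration_base_path(const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(base_path, path);
    return 0;
}

/* Stand-in for Parrot_set_path_for_resource(): add or replace an override. */
int set_path_for_resource(const char *name, const char *path)
{
    size_t i;
    assert(name != NULL && path != NULL);
    if (strlen(name) >= MAX_PATH_LEN || strlen(path) >= MAX_PATH_LEN)
        return -1;
    for (i = 0; i < n_overrides; i++) {
        if (strcmp(overrides[i].name, name) == 0) {
            strcpy(overrides[i].path, path);
            return 0;
        }
    }
    if (n_overrides == MAX_OVERRIDES)
        return -1;
    strcpy(overrides[n_overrides].name, name);
    strcpy(overrides[n_overrides].path, path);
    n_overrides++;
    return 0;
}

/* Stand-in for Parrot_get_path_for_resource(): an explicit override wins,
   otherwise fall back to base_path "/" name. Returns 0 on success, -1 if
   the result does not fit in out. */
int get_path_for_resource(const char *name, char *out, size_t out_len)
{
    size_t i;
    int n;
    assert(name != NULL && out != NULL);
    for (i = 0; i < n_overrides; i++) {
        if (strcmp(overrides[i].name, name) == 0) {
            n = snprintf(out, out_len, "%s", overrides[i].path);
            return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
        }
    }
    n = snprintf(out, out_len, "%s/%s", base_path, name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

An embedder with an unusual layout would call set_path_for_resource() once during setup; everything else keeps calling the getter and never needs to know the layout changed.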
Re: Plans for string processing
At 11:55 PM +0200 4/15/04, Leopold Toetsch wrote: Aaron Sherman [EMAIL PROTECTED] wrote: On Thu, 2004-04-15 at 05:00, Leopold Toetsch wrote: $dog eq my dog Fi\x{fb03} and C$dog hasn't some language info attached? Looks good to me. Great example! Seriously, why is that a problem? Dan's problem to come up with better examples--or explanations :) Nah, that turns out not to be the case. It's my plan, and it's reasonable to say I'm OK with it. :) While I'd prefer to have everyone agree, I can live with it if people don't. leo - resisting from further utterances WRT that topic in the absence of The Plan(tm). The Plan is in progress, though I admit I'm tempted to hit easier and less controversial things (like, say, threads or events) first. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [CVS ci] alternate object initializer calling scheme
On Sat, 2004-04-10 at 01:49, Leopold Toetsch wrote: This initializer is available as first param in the init method. I'm happy with this. Good. What needs to be done before making it the default? I'm anxious to remove CALL__BUILD=1 from my parrot alias. We don't have it yet. We could use vtable->destroy but I'd rather have vtable->finalize. ->destroy does low-level cleanup of Parrot classes (i.e. free(3) memory). ->finalize (a distinct vtable method) could do the higher-level object finalization. Here could be also the place, where destruction ordering is done. That sounds reasonable. It'd certainly be nice to be able to free the memory of external resources I hold in SDL::* objects. -- c
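The split Leo proposes could look something like the sketch below: a high-level finalize step that releases what the object holds, followed by a low-level destroy step that frees the object itself. All struct and function names here are made up for illustration and do not reflect Parrot's actual vtable layout.

```c
#include <assert.h>
#include <stdlib.h>

struct object;

/* Hypothetical two-slot vtable: finalize is optional, destroy is required. */
struct vtable {
    void (*finalize)(struct object *self);  /* high-level: release resources */
    void (*destroy)(struct object *self);   /* low-level: free the object */
};

struct object {
    const struct vtable *vtable;
    void *external;   /* e.g. memory held on behalf of an SDL wrapper */
};

/* Records the order of teardown steps so it can be inspected afterwards. */
static char teardown_log[8];
static size_t teardown_len = 0;

static void log_step(char step)
{
    if (teardown_len + 1 < sizeof teardown_log) {
        teardown_log[teardown_len++] = step;
        teardown_log[teardown_len] = '\0';
    }
}

static void surface_finalize(struct object *self)
{
    free(self->external);     /* object is still fully usable here */
    self->external = NULL;
    log_step('F');
}

static void surface_destroy(struct object *self)
{
    log_step('D');
    free(self);               /* nothing may touch self after this */
}

static const struct vtable surface_vtable = {
    surface_finalize,
    surface_destroy
};

struct object *surface_new(void)
{
    struct object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->vtable = &surface_vtable;
    o->external = malloc(64);
    return o;
}

/* What a collector might do when it reclaims an object: finalize first,
   then destroy. Any destruction ordering between objects would be decided
   before this point. */
void reclaim(struct object *o)
{
    assert(o != NULL && o->vtable != NULL && o->vtable->destroy != NULL);
    if (o->vtable->finalize != NULL)
        o->vtable->finalize(o);
    o->vtable->destroy(o);
}
```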
Re: Plans for string processing
On Thu, 2004-04-15 at 23:13, Dan Sugalski wrote: Nah, that turns out not to be the case. It's my plan, and it's reasonable to say I'm OK with it. :) While I'd prefer to have everyone agree, I can live with it if people don't. Perhaps, as usual, I've been too verbose and everyone just skipped over what I thought were useful questions, but I came into this thinking I must just not get it... now I'm left with the feeling that there are some basic questions no one is asking here. Don't respond to this message, but please keep these questions in mind as you start to implement... whatever it is that you're going to implement for this. 1. People have referred to comparing names, but most of the things that make comparing names hard exist with respect to NAMES, and not arbitrary strings (e.g. McLean is very different from substr("358dsMcLeannbv35d",5,6)). That is not something that attaching metadata to a string is likely to resolve. 2. There is no universal interchange rule-set (that I have ever heard of) for operating on sequences of characters with respect to two or more different languages at once, you have to pick a language's (or culture's) rules to use, otherwise you are comparing (or operating on) apples and oranges. 3. In any given comparison type operation, one side's rules will have to become dominant for that operation. Woefully, you have no realistic way to decide this at run-time (e.g. because going with LHS-wins would result in sorts potentially getting C($a cmp $b) == 1 and C($b cmp $a) == 1 which can result in infinite sort times). 4. Given 1..3, you will probably have to implement some kind of language context system (in most languages, this is handled by locale) at some point, and it may need to take priority over the language property of the strings that it operates on in certain cases. 5. 
Given 4, all unary operators become, for example, { set_current_locale($s.language); uc($s.data) } Which is, after all, what most languages do anyway, but they keep that language information as a piece of global state. Allowing just for lexical scoping of such things would be very nice. 6. Separate from 1..5, language is an interesting property to associate with strings, but so are a vast number of other properties. Why are all of them second class citizens WRT parrot, but not language? Why not build a class one level of abstraction above raw strings which can bear arbitrary properties? 7. Which programming language does Parrot wish to host which requires unique language tagging of all string data? Would this perhaps be better left for a 2.0 feature, once the needs of the client languages are better understood? Ok, that's my piece. Thanks for taking the time. I'll be over here watching now. easier and less controversial things (like, say, threads or events) first. Hah! That's rich! -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback
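Point 3 in the list above can be shown with a toy example. The sketch assumes Latin-1 text and two deliberately simplified collation rules: German dictionary order, where "ä" sorts next to "a", and Swedish order, where "ä" sorts after "z". The weights, type names and cmp_lhs_wins() are invented for illustration and are nothing like a real collation algorithm.

```c
#include <assert.h>
#include <stddef.h>

enum lang { LANG_DE, LANG_SV };

/* A string tagged with the language whose rules it wants applied. */
struct tagged {
    enum lang lang;
    const unsigned char *text;   /* Latin-1, NUL-terminated */
};

#define A_UMLAUT 0xE4            /* "ä" in Latin-1 */

/* Toy collation weight: a..z get even weights in alphabetical order;
   "ä" falls just after "a" in German and after "z" in Swedish. */
static int weight(enum lang lang, unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (c - 'a') * 2;
    if (c == A_UMLAUT)
        return lang == LANG_DE ? 1 : 26 * 2;
    return 1000 + c;
}

/* Compare two strings under one language's rules: -1, 0 or 1. */
static int cmp_with(enum lang rules,
                    const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    for (; *a && *b; a++, b++) {
        int wa = weight(rules, *a);
        int wb = weight(rules, *b);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    if (*a)
        return 1;
    if (*b)
        return -1;
    return 0;
}

/* "LHS wins": the left operand's language decides the rules. */
int cmp_lhs_wins(struct tagged a, struct tagged b)
{
    return cmp_with(a.lang, a.text, b.text);
}
```

With a German-tagged "z" and a Swedish-tagged "ä", cmp_lhs_wins() reports each one as greater than the other, depending only on which is passed first. A sort routine handed that comparator has no consistent order to converge on, which is the failure mode the message warns about.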