Re: RFC: Logging used Perl Modules (was Re: API Design Question)
Doug MacEachern [EMAIL PROTECTED] wrote:
> On Tue, 3 Jul 2001, James G Smith wrote:
> > The current code I have uses %INC, but I wanted to write something
> > like the following:
> >
> >     sub use : immediate {
> >         # do stuff here if logging
> >         return CORE::use(@_);
> >     }
>
> you could just override CORE::GLOBAL::require.  you don't need to
> override the import, and your version of require will be called at the
> same time as the 'use'.

Thanks!  I will see what I can do with that.

--
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix
Re: RFC: Logging used Perl Modules (was Re: API Design Question)
On Tue, 3 Jul 2001, James G Smith wrote:

> The current code I have uses %INC, but I wanted to write something like
> the following:
>
>     sub use : immediate {
>         # do stuff here if logging
>         return CORE::use(@_);
>     }

you could just override CORE::GLOBAL::require.  you don't need to
override the import, and your version of require will be called at the
same time as the 'use'.
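For illustration, a minimal sketch of the CORE::GLOBAL::require approach;
the package name and log path are invented, and the override has to be
compiled before the modules you want to count (e.g. load it at the top of
startup.pl):

    package My::LogRequire;     # hypothetical name
    use strict;

    our %count;

    BEGIN {
        *CORE::GLOBAL::require = sub {
            my ($what) = @_;
            # 'use Foo::Bar' and 'require Foo::Bar' both arrive here as
            # "Foo/Bar.pm"; skip version checks like 'require 5.6.0'.
            $count{$what}++ unless $what =~ /^v?[\d._]+$/;
            CORE::require($what);
        };
    }

    END {
        # Dump a per-process frequency count when the interpreter exits.
        if (open my $fh, '>>', "/tmp/used-modules.$$") {
            printf $fh "%6d %s\n", $count{$_}, $_ for sort keys %count;
            close $fh;
        }
    }

    1;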
Re: RFC: Logging used Perl Modules (was Re: API Design Question)
James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
> How would something like this do:
>
> NAME
>     Apache::Use
>
> SYNOPSIS
>     use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');
>
> DESCRIPTION
>     Apache::Use will record the modules used over the course of the
>     Perl interpreter's lifetime.  If the logging module is able, the
>     old logs are read and frequently used modules are automatically
>     loaded.  Note that no symbols are imported into packages.

You can get this information from %INC, can't you? e.g.:

    use Time::Local;
    use Data::Dumper;
    use Apache;

    warn map sprintf("%-20.20s\t%s\n", $_, $INC{$_}), keys %INC;

    Exporter.pm           /usr/local/perl/5.6.0/Exporter.pm
    Carp.pm               /usr/local/perl/5.6.0/Carp.pm
    XSLoader.pm           /usr/local/perl/5.6.0/i686-linux/XSLoader.pm
    mod_perl.pm           /usr/local/perl/site_perl/5.6.0/i686-linux/mod_perl.pm
    strict.pm             /usr/local/perl/5.6.0/strict.pm
    Apache/Connection.pm  /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Connection.pm
    Time/Local.pm         /usr/local/perl/5.6.0/Time/Local.pm
    Apache/Table.pm       /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Table.pm
    DynaLoader.pm         /usr/local/perl/5.6.0/i686-linux/DynaLoader.pm
    overload.pm           /usr/local/perl/5.6.0/overload.pm
    Apache/Constants/Exp  /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Constants/Exports.pm
    AutoLoader.pm         /usr/local/perl/5.6.0/AutoLoader.pm
    Apache/Server.pm      /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Server.pm
    Data/Dumper.pm        /usr/local/perl/5.6.0/i686-linux/Data/Dumper.pm
    Apache.pm             /usr/local/perl/site_perl/5.6.0/i686-linux/Apache.pm

Isn't this more or less what you mean?

(darren)

--
My studies in Speculative Philosophy, metaphysics, and science are all
summed up in the image of a mouse called man running in and out of every
hole in the Cosmos hunting for the Absolute Cheese.
    -- Edmund Burke
Re: RFC: Logging used Perl Modules (was Re: API Design Question)
darren chamberlain [EMAIL PROTECTED] wrote:
> James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
> > How would something like this do:
> >
> > NAME
> >     Apache::Use
> >
> > SYNOPSIS
> >     use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');
> >
> > DESCRIPTION
> >     Apache::Use will record the modules used over the course of the
> >     Perl interpreter's lifetime.  If the logging module is able, the
> >     old logs are read and frequently used modules are automatically
> >     loaded.  Note that no symbols are imported into packages.
>
> You can get this information from %INC, can't you? e.g.:

Most definitely.  However, you lose information about which modules are
needed more often than others.  There's no difference between all scripts
needing CGI.pm and one script needing Foo::Bar.

We also lose timing information.  If 90% of the modules are loaded into
the process with the last request before the child is destroyed, there's
no point in loading them during the configuration phase.  We can help
this a little by taking snapshots of %INC at regular intervals (at the
end of each request, for example).

The current code I have uses %INC, but I wanted to write something like
the following:

    sub use : immediate {
        # do stuff here if logging
        return CORE::use(@_);
    }

--
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix
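A rough sketch of the per-request snapshot idea; the package name, log
path, and the choice of a cleanup handler are only assumptions:

    package Apache::SnapshotINC;    # hypothetical name
    use strict;
    use Apache::Constants qw(OK);

    # Append the current %INC to a per-process log after each request;
    # comparing successive snapshots shows when each module was first
    # loaded relative to the child's lifetime.
    sub handler {
        my $r = shift;
        if (open my $fh, '>>', "/tmp/inc-snapshot.$$") {
            print $fh "# ", scalar localtime, " ", $r->uri, "\n";
            print $fh "$_\n" for sort keys %INC;
            close $fh;
        }
        return OK;
    }

    1;

and in httpd.conf:

    PerlCleanupHandler Apache::SnapshotINC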
Re: RFC: Logging used Perl Modules (was Re: API Design Question)
James G Smith [EMAIL PROTECTED] said something to this effect on 07/03/2001:
> darren chamberlain [EMAIL PROTECTED] wrote:
> > James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
> > > Apache::Use
> >
> > You can get this information from %INC, can't you? e.g.:
>
> Most definitely.  However, you lose information about which modules are
> needed more often than others.  There's no difference between all
> scripts needing CGI.pm and one script needing Foo::Bar.

Good point.

> We also lose timing information.  If 90% of the modules are loaded into
> the process with the last request before the child is destroyed, there's
> no point in loading them during the configuration phase.  We can help
> this a little by taking snapshots of %INC at regular intervals (at the
> end of each request, for example).
>
> The current code I have uses %INC, but I wanted to write something like
> the following:
>
>     sub use : immediate {
>         # do stuff here if logging
>         return CORE::use(@_);
>     }

To go OT here, what would 'immediate' be doing here, if Perl supported it?

(darren)

--
The three most dangerous things are a programmer with a soldering iron,
a manager who codes, and a user who gets ideas.
Re: RFC: Logging used Perl Modules (was Re: API Design Question)
On Tuesday 03 July 2001 16:46, darren chamberlain wrote:
> James G Smith [EMAIL PROTECTED] said something to this effect:
> > The current code I have uses %INC, but I wanted to write something
> > like the following:
> >
> >     sub use : immediate {
> >         # do stuff here if logging
> >         return CORE::use(@_);
> >     }
>
> To go OT here, what would 'immediate' be doing here, if Perl supported it?

It would run, well, immediately :)  C<use> is run before the rest of the
code (apart from BEGIN blocks), which is why one can't overload it (now),
iirc.

--
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
In which level of metalanguage are you now speaking?
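For anyone following along, the reason a plain sub can't hook 'use' is
that use happens at compile time; per perldoc -f use, a use statement is
exactly equivalent to a BEGIN-wrapped require plus import:

    use Data::Dumper;

    # ...is equivalent to:
    BEGIN {
        require Data::Dumper;
        Data::Dumper->import;
    }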
RFC: Logging used Perl Modules (was Re: API Design Question)
How would something like this do:

NAME
    Apache::Use

SYNOPSIS
    use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');

DESCRIPTION
    Apache::Use will record the modules used over the course of the Perl
    interpreter's lifetime.  If the logging module is able, the old logs
    are read and frequently used modules are automatically loaded.  Note
    that no symbols are imported into packages.

---

I really wish we had `use' as a function instead of a keyword and had an
`immediate' property for subs (kind of a Forth thing).  Then we could do
reference counting of `use' and `require'.

If the above seems reasonable, I'll try to get a 0.01 out asap.  Passing
this by the modules list for comment also.  The current code I have does
not actually depend on Apache and mod_perl.

--
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix
Re: API Design Question
On Sat, 30 Jun 2001, Steven Lembark wrote:

> > Note that if they do get called this will end up using more memory
> > than if you had just loaded them during startup, since they won't be
> > shared between child processes.
>
> Original assumption is that they are called infrequently.  You'll also
> find that the amount of memory sucked up by a single subroutine isn't
> much, less than pre-loading possibly 10's of sub's that never get
> called.

The optimal approach would be:

1. Use an import tag like CGI.pm's -compile, or AutoSplit/AutoLoader, to
   provide the interface for loading only the wanted subs.

2. Use the DB::DB hook to collect the stats on what subs are actually
   used.  See this nice article for more info:
   http://www.ddj.com/columns/perl/2001/0103pl002.htm?topic=perl

3. Use ab or something else to exercise your service to call all possible
   URIs/args.  Here you can use the access_log to learn what to feed to
   ab, assuming that the access_log is big enough to exercise all your
   services (which of course won't work for new services, and then you
   have to supply the possible URIs/args by yourself).

4. Feed the results of 2 and 3 into 1 in startup.pl and voila, you have
   the perfect optimization.

5. If you modify your code you need either to rerun the stats collection
   or manually adjust the startup.pl file.

Depending on how important it is to squeeze the most out of your boxes
and how big your code base is, this scenario may or may not apply to your
situation, but it gives you a good idea of how Perl can help you.  All
these stages can be completely automated.

This seems to be an interesting project for someone to implement and
release as a general module: a pluggable stats handler which collects all
the used modules (so you can preload them all in startup.pl) and all used
package::sub's to be fed into modules using AutoSplit/AutoLoader to load
these from startup.pl.

Here is a simple Apache::UsedModules:

    package Apache::UsedModules;

    use strict;
    use Apache;

    if ($ENV{MOD_PERL}) {
        Apache->push_handlers(PerlChildExitHandler => \&handler);
    }

    sub handler {
        my $r = shift;
        my $file = "/tmp/modules.$$";
        open LOG, ">$file" or die "cannot open $file: $!";
        print LOG "\n# Used modules\n\n";
        for (sort grep !/^main$/, keys %INC) {
            next if m!^/|\.pl$!;              # skip non-modules
            print LOG qq{require $_;\n};      # ($INC{$_})
        }
        close LOG;
    }

    1;

usage:

    PerlModule Apache::UsedModules

or

    use Apache::UsedModules;   # in startup.pl

For subs stats you actually need to rework the DB::DB hook from Apache::DB
or write a new one based on Apache::DB (preferably).

_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://apachetoday.com http://eXtropia.com/
http://singlesheaven.com http://perl.apache.org   http://perlmonth.com/
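As a rough, standalone illustration of the idea behind step 2 (not a
drop-in replacement for Apache::DB's hook): perldebguts documents that,
with the debugger enabled, every sub call is routed through &DB::sub with
$DB::sub holding the sub's name.  The package name and output path below
are invented:

    package Devel::CountSubs;    # hypothetical; load via perl -d:CountSubs

    package DB;
    our %sub_count;

    sub DB {}                    # per-line hook; unused here, but must exist

    sub sub {
        $sub_count{$sub}++;      # $DB::sub holds the called sub's name
        &$sub;                   # run the real sub with the original @_
    }

    END {
        # Under mod_perl you may prefer a child-exit handler to an END block.
        open my $fh, '>>', "/tmp/sub-stats.$$" or return;
        printf $fh "%8d  %s\n", $sub_count{$_}, $_ for sort keys %sub_count;
        close $fh;
    }

    1;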
Re: API Design Question
On Friday, June 29, 2001, at 07:25, Shawn Devlin wrote:

> What advantages do I gain by grouping the functions based on
> functionality?  As per my response to Mr. Worrall, one of my concerns
> with placing each function call into its own module is the amount of
> memory used by the various .pm files that will be loaded numerous times.
> I can see that grouping functions based on functionality would reduce
> the number of .pm files in memory.  However, if I go that route, then
> would I not be better just to leave the API as one file?

A good reason for grouping related functions is not so much functionality
as common dependencies, and ease of change management.

If everything is in one huge module, then change management becomes
tricky, especially with multiple developers.  Giving every function its
own module avoids this, but can make tracking down dependencies tricky
(and there may be a small memory overhead for each module, but I've never
looked).

A happy medium is to group together functions that share a dependency on
underlying database objects.  For example, if you have a family of
library functions that retrieve, insert, update, or delete user records,
it might make sense to group these together in a module.  If you need to
add a new field to your user records, then you change only that module
(as well as any changes required to your scripts).
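For instance, a sketch of that kind of grouping; the package, table, and
column names here are invented:

    package MyApp::User;
    use strict;

    # Everything that touches the "users" table lives in this one module,
    # so a schema change to user records is contained here.

    sub fetch {
        my ($dbh, $id) = @_;
        return $dbh->selectrow_hashref(
            'SELECT * FROM users WHERE id = ?', undef, $id);
    }

    sub update_email {
        my ($dbh, $id, $email) = @_;
        return $dbh->do(
            'UPDATE users SET email = ? WHERE id = ?', undef, $email, $id);
    }

    sub remove {
        my ($dbh, $id) = @_;
        return $dbh->do('DELETE FROM users WHERE id = ?', undef, $id);
    }

    1;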
Re: API Design Question
> memory used by the various .pm files that will be loaded numerous
> times.  I can see that grouping functions based on functionality would
> reduce the number of .pm files in memory.  However, if I go that route,

use only loads the .pm once.  Multiple uses don't eat up any more
resource than having it done once.

The minimal-module approach can be managed nicely via AutoSplit, which
puts each sub in its own module with a stub AUTOLOAD that snags things
into core only when they are called (the ultimate in laziness, no?).

This is particularly nice for rarely called modules.  One example is
special exceptions in database apps.  You can put the exception handler
into a sub, have AutoSplit stuff it into a module and only load it into
memory if the error does show up.  This helps with code release issues
because the related code lives in a single module for editing and testing
purposes but only sucks up core when needed.

sl
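A rough sketch of what that looks like in practice; the module and sub
names are invented, and it assumes AutoSplit has been run over the module
(ExtUtils::MakeMaker normally does this at build time for modules that
use AutoLoader):

    package MyApp::Exceptions;
    use strict;
    use AutoLoader 'AUTOLOAD';   # stub AUTOLOAD pulls in split subs on demand

    1;
    __END__

    # Everything after __END__ is split by AutoSplit into
    # auto/MyApp/Exceptions/*.al files and is only compiled
    # the first time the sub is actually called.

    sub handle_deadlock {
        my ($dbh, $err) = @_;
        warn "retrying after deadlock: $err";
        $dbh->rollback;
        return;
    }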
Re: API Design Question
> The minimal-module approach can be managed nicely via AutoSplit, which
> puts each sub in its own module with a stub AUTOLOAD that snags things
> into core only when they are called

Note that if they do get called this will end up using more memory than
if you had just loaded them during startup, since they won't be shared
between child processes.

- Perrin
Re: API Design Question
> Note that if they do get called this will end up using more memory than
> if you had just loaded them during startup, since they won't be shared
> between child processes.

Original assumption is that they are called infrequently.  You'll also
find that the amount of memory sucked up by a single subroutine isn't
much, less than pre-loading possibly 10's of sub's that never get called.

sl
Re: API Design Question
Adam Worrall wrote:

> SD == Shawn Devlin [EMAIL PROTECTED] writes:
>
> SD My first thought is to break the API up so that there is a
> SD module per API call (there are some 70 calls in the API). My
> SD reasoning is that I can modify existing calls and add new ones
> SD without affecting everything else. Does this make sense or is it
> SD better to have the API as one large program as I have it now?
>
> I'd have thought you'd be best to put the API in a large module, and
> then make calls to it from mod_perl handlers.  You could even write a
> generic handler which chose which function to execute based on the
> arguments.  Having a module per function may start to do your head in :)

The bulk of the API is in 4 or 5 .pm files.  My cgi script basically
determines the call being made, vets the parameters, calls various
functions in the .pm files, and then returns the result.  The current
format for the call is:

    server.com/cgi-bin/api.pl?command=foo&parm1=&parm2=

What I want to have is:

    server.com/api/foo?parm1=&parm2=

The module that handles foo would check the parameters, make its calls to
the various internal functions, and then compose and send the results.
What I like about this is I can add a new function without needing to
disturb the existing code.  Also, each function call is then
self-contained.  Currently, my existing API script is essentially a big
switch statement.

My concern is that each handler links the .pm files so with 50 or so
functions I will have 50 or so copies of the various .pm files in memory.

> Yes - when some Perl in an Apache child process executes DBI::connect
> (which has been overridden by Apache::DBI), it first looks in a hash of
> existing connections before opening a new one.

Good news!  Thanks for the confirmation.

Shawn
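A rough sketch of what a generic handler for the /api/foo scheme could
look like; the package names and the process() convention are invented,
and the /api prefix is assumed:

    package API::Dispatch;
    use strict;
    use Apache::Constants qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;
        my ($cmd) = $r->uri =~ m{^/api/(\w+)};    # /api/foo  ->  "foo"
        return NOT_FOUND unless $cmd;

        my $class = 'API::' . ucfirst $cmd;       # e.g. API::Foo
        eval "require $class";                    # load the module on demand
        return NOT_FOUND if $@;

        my %args = $r->args;                      # query-string parameters
        return $class->process($r, \%args);       # module sends the response
    }

    1;

with something like this in httpd.conf:

    <Location /api>
        SetHandler perl-script
        PerlHandler API::Dispatch
    </Location>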
Re: API Design Question
James G Smith wrote:

[snip]

> > My first thought is to break the API up so that there is a module per
> > API call (there are some 70 calls in the API).  My reasoning is that I
> > can modify existing calls and add new ones without affecting
> > everything else.  Does this make sense or is it better to have the API
> > as one large program as I have it now?
>
> If it's an API, I'd not make one module per function, if by function you
> mean a call (e.g., fork() is a function in the Unix kernel API).
> Instead, I'd group them by functionality (as in OS/2 - VIO, KBD, DOS,
> ...).  So one module might handle customer accounts, another handle news
> items, etc.

What advantages do I gain by grouping the functions based on
functionality?  As per my response to Mr. Worrall, one of my concerns
with placing each function call into its own module is the amount of
memory used by the various .pm files that will be loaded numerous times.
I can see that grouping functions based on functionality would reduce the
number of .pm files in memory.  However, if I go that route, then would I
not be better just to leave the API as one file?

Thanks,
Shawn

--
This communication is intended to be received by the individual or entity
to whom or to which it is addressed and it contains information that is
privileged, confidential and subject to copyright of Recognia Inc.  Any
unauthorized use, copying, review or disclosure is prohibited.  If
received in error, please contact me by phone at 613-623-6159 or by email
at mailto:[EMAIL PROTECTED].
RE: API Design Question
Shawn,

We have taken the approach here of a format like the one laid out below:

(in startup.pl: use lib '/usr/local/apache/lib'; - add the directories
to @INC)

/usr/local/apache/lib/APP - where APP is the main name of our
application.  In this directory we will have perl modules that are shared
by all the handlers that make up the application.

Inside this directory we have a directory for our handlers - the handler
would relate to the 'command=foo' part of your current call - like:

    /usr/local/apache/lib/APP/Command

Inside the directory for the handler we have at least a Handler.pm which
is referenced in our perl.conf something (well, exactly) like:

    <Location /command>
        SetHandler perl-script
        PerlHandler APP::Command::Handler
    </Location>

Inside the Handler.pm perl module is a sub called handler that processes
the requests made to that command.  We also have a perl module for each
action related to a command, like List.pm - to process things related to
a /command?action=list call (see the sketch at the end of this message).

For our situation, this works well because it allows several developers
to work on different parts of the same Command without stepping on each
other's toes, so to speak.  Of course, this does add a level of
complexity to the project, one that in the beginning was met with some
resistance by long time perl programmers - myself included, and I thought
of the layout - but in practice it has let our team cut development times
significantly.  Of course a good versioning system will allow multiple
users to access the same file and not cause problems, and we use CVS to
provide version control.

As with anything, your mileage may vary with this method.  I'm sure there
are hidden pitfalls involved with this method, but for the time being it
does seem to work for us.

I hope this helps.

Good Luck

Joe Breeden

-----Original Message-----
From: Shawn Devlin [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 29, 2001 1:18 PM
To: [EMAIL PROTECTED]
Subject: Re: API Design Question

Adam Worrall wrote:

> SD == Shawn Devlin [EMAIL PROTECTED] writes:
>
> SD My first thought is to break the API up so that there is a
> SD module per API call (there are some 70 calls in the API). My
> SD reasoning is that I can modify existing calls and add new ones
> SD without affecting everything else. Does this make sense or is it
> SD better to have the API as one large program as I have it now?
>
> I'd have thought you'd be best to put the API in a large module, and
> then make calls to it from mod_perl handlers.  You could even write a
> generic handler which chose which function to execute based on the
> arguments.  Having a module per function may start to do your head in :)

The bulk of the API is in 4 or 5 .pm files.  My cgi script basically
determines the call being made, vets the parameters, calls various
functions in the .pm files, and then returns the result.  The current
format for the call is:

    server.com/cgi-bin/api.pl?command=foo&parm1=&parm2=

What I want to have is:

    server.com/api/foo?parm1=&parm2=

The module that handles foo would check the parameters, make its calls to
the various internal functions, and then compose and send the results.
What I like about this is I can add a new function without needing to
disturb the existing code.  Also, each function call is then
self-contained.  Currently, my existing API script is essentially a big
switch statement.

My concern is that each handler links the .pm files so with 50 or so
functions I will have 50 or so copies of the various .pm files in memory.
> Yes - when some Perl in an Apache child process executes DBI::connect
> (which has been overridden by Apache::DBI), it first looks in a hash of
> existing connections before opening a new one.

Good news!  Thanks for the confirmation.

Shawn
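A small sketch of the Handler.pm arrangement described above; the package
names and the dispatch table are invented, and each per-action module is
assumed to expose its own handler-style sub:

    package APP::Command::Handler;
    use strict;
    use Apache::Constants qw(OK NOT_FOUND);
    use APP::Command::List ();    # one module per action

    my %action = (
        list => \&APP::Command::List::handler,
        # add more actions here as their modules are written
    );

    sub handler {
        my $r = shift;
        my %args = $r->args;
        my $code = $action{ lc($args{action} || 'list') }
            or return NOT_FOUND;
        return $code->($r, \%args);
    }

    1;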
Re: API Design Question
----- Original Message -----
From: Shawn Devlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 29, 2001 8:18 PM
Subject: Re: API Design Question

> What I like about this is I can add a new function without needing to
> disturb the existing code.  Also, each function call is then
> self-contained.  Currently, my existing API script is essentially a big
> switch statement.

I see the point in separating it.  I do the same thing myself.

> My concern is that each handler links the .pm files so with 50 or so
> functions I will have 50 or so copies of the various .pm files in
> memory.

That's not quite right.  In its simplest form, I can say that Apache gets
one copy of each module per *child*, not per file, so 50 files doesn't
mean you'll have 50 modules loaded.  For example, if one child serves
/api/foo, and /api/foo loads API1.pm and API2.pm, those will stay in
memory, so that when the same child serves /api/bar, and /api/bar
attempts to use API1.pm and API2.pm, the perl interpreter will find out
that these 2 modules have already been loaded, and not reload them.

But if you use preloading, as you should do, you get even more benefit
from shared memory.  If you preload your modules in startup.pl or with
PerlModule in httpd.conf, they'll stay shared in memory, thus reducing
the memory overhead.

Per Einar Ellefsen
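For concreteness, a minimal startup.pl along those lines; the lib path
and module list are only placeholders:

    # httpd.conf:
    #   PerlRequire /usr/local/apache/conf/startup.pl
    #   PerlModule  Apache::DBI

    # startup.pl
    use strict;
    use lib '/usr/local/apache/lib';

    # Preload the modules every handler shares, so their compiled code
    # lives in the parent and is shared copy-on-write by the children.
    use DBI ();
    use CGI ();
    CGI->compile(':all');   # precompile CGI.pm's autoloaded methods

    1;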