Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-10 Thread James G Smith

Doug MacEachern [EMAIL PROTECTED] wrote:
On Tue, 3 Jul 2001, James G Smith wrote:
 
 The current code I have uses %INC, but I wanted to write
 something like the following:
 
 sub use : immediate {
   # do stuff here if logging
   return CORE::use(@_);
 }

you could just override CORE::GLOBAL::require.  you don't need to
override the import, and your version of require will be called at the
same time as the 'use'. 

Thanks!  I will see what I can do with that.
-- 
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix



Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-09 Thread Doug MacEachern

On Tue, 3 Jul 2001, James G Smith wrote:
 
 The current code I have uses %INC, but I wanted to write
 something like the following:
 
 sub use : immediate {
   # do stuff here if logging
   return CORE::use(@_);
 }

you could just override CORE::GLOBAL::require.  you don't need to
override the import, and your version of require will be called at the
same time as the 'use'. 
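
For the record, a bare-bones version of that override might look like the
following (the logging target and details here are illustrative, not code
from this thread):

# Sketch only: install the override before anything else compiles.
BEGIN {
    *CORE::GLOBAL::require = sub {
        my ($file) = @_;
        # 'use Foo' calls require at compile time, so this sees both;
        # note a bare version check like 'require 5.005' also lands here
        warn "require: $file\n" unless $INC{$file};
        CORE::require($file);
    };
}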




Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-03 Thread darren chamberlain

James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
 How would something like this do:
 
 NAME
 
 Apache::Use
 
 SYNOPSIS
 
use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');
 
 DESCRIPTION
 
 Apache::Use will record the modules used over the course of the 
 Perl interpreter's lifetime.  If the logging module is able, the 
 old logs are read and frequently used modules are automatically 
 loaded.  Note that no symbols are imported into packages.

You can get this information from %INC, can't you? e.g.:

use Time::Local;
use Data::Dumper;
use Apache;

warn map sprintf("%-20.20s\t%s\n", $_, $INC{$_}), keys %INC;

Exporter.pm /usr/local/perl/5.6.0/Exporter.pm
Carp.pm /usr/local/perl/5.6.0/Carp.pm
XSLoader.pm /usr/local/perl/5.6.0/i686-linux/XSLoader.pm
mod_perl.pm /usr/local/perl/site_perl/5.6.0/i686-linux/mod_perl.pm
strict.pm   /usr/local/perl/5.6.0/strict.pm
Apache/Connection.pm    /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Connection.pm
Time/Local.pm   /usr/local/perl/5.6.0/Time/Local.pm
Apache/Table.pm /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Table.pm
DynaLoader.pm   /usr/local/perl/5.6.0/i686-linux/DynaLoader.pm
overload.pm /usr/local/perl/5.6.0/overload.pm
Apache/Constants/Exp    /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Constants/Exports.pm
AutoLoader.pm   /usr/local/perl/5.6.0/AutoLoader.pm
Apache/Server.pm    /usr/local/perl/site_perl/5.6.0/i686-linux/Apache/Server.pm
Data/Dumper.pm  /usr/local/perl/5.6.0/i686-linux/Data/Dumper.pm
Apache.pm   /usr/local/perl/site_perl/5.6.0/i686-linux/Apache.pm

Isn't this more or less what you mean?

(darren)

-- 
My studies in Speculative Philosophy, metaphysics, and science are all
summed up in the image of a mouse called man running in and out of every
hole in the Cosmos hunting for the Absolute Cheese.
-- Edmund Burke



Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-03 Thread James G Smith

darren chamberlain [EMAIL PROTECTED] wrote:
James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
 How would something like this do:
 
 NAME
 
 Apache::Use
 
 SYNOPSIS
 
 use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');
 
 DESCRIPTION
 
 Apache::Use will record the modules used over the course of the 
 Perl interpreter's lifetime.  If the logging module is able, the 
 old logs are read and frequently used modules are automatically 
 loaded.  Note that no symbols are imported into packages.

You can get this information from %INC, can't you? e.g.:

Most definitely.  However, you lose information about which 
modules are needed more often than others.  There's no difference 
between all scripts needing CGI.pm and one script needing 
Foo::Bar.  

We also lose timing information.  If 90% of the modules are 
loaded into the process with the last request before the child is 
destroyed, there's no point in loading them during the 
configuration phase.  We can help this a little by taking 
snapshots of %INC at regular intervals (at the end of each 
request, for example).
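
A rough sketch of such a snapshot, assuming a mod_perl 1 cleanup handler
(the package name and counting scheme are only illustrations):

package Apache::INCSnapshot;    # hypothetical name
use strict;
use Apache::Constants qw(OK);

my %request_count;  # module file => number of requests it was loaded for

sub handler {
    my $r = shift;
    $request_count{$_}++ for keys %INC;  # snapshot at end of request
    return OK;
}

1;

# httpd.conf:  PerlCleanupHandler Apache::INCSnapshot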

The current code I have uses %INC, but I wanted to write
something like the following:

sub use : immediate {
  # do stuff here if logging
  return CORE::use(@_);
}
-- 
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix



Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-03 Thread darren chamberlain

James G Smith [EMAIL PROTECTED] said something to this effect on 07/03/2001:
 darren chamberlain [EMAIL PROTECTED] wrote:
  James G Smith [EMAIL PROTECTED] said something to this effect on 07/02/2001:
   Apache::Use
 
  You can get this information from %INC, can't you? e.g.:
 
 Most definitely.  However, you lose information about which 
 modules are needed more often than others.  There's no difference 
 between all scripts needing CGI.pm and one script needing 
 Foo::Bar.  

Good point.

 We also lose timing information.  If 90% of the modules are 
 loaded into the process with the last request before the child is 
 destroyed, there's no point in loading them during the 
 configuration phase.  We can help this a little by taking 
 snapshots of %INC at regular intervals (at the end of each 
 request, for example).
 
 The current code I have uses %INC, but I wanted to write
 something like the following:
 
 sub use : immediate {
   # do stuff here if logging
   return CORE::use(@_);
 }

To go OT here, what would 'immediate' be doing here, if Perl
supported it?

(darren)

-- 
The three most dangerous things are a programmer with a soldering
iron, a manager who codes, and a user who gets ideas.



Re: RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-03 Thread Robin Berjon

On Tuesday 03 July 2001 16:46, darren chamberlain wrote:
 James G Smith [EMAIL PROTECTED] said something to this effect:
  The current code I have uses %INC, but I wanted to write
  something like the following:
 
  sub use : immediate {
# do stuff here if logging
return CORE::use(@_);
  }

 To go OT here, what would 'immediate' be doing here, if Perl
 supported it?

It would run, well, immediately :) C<use> is run before the rest of the code 
(apart from BEGIN blocks), which is why one can't overload it (for now), IIRC.
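
To illustrate, a use statement executes while the surrounding code is still
being compiled, so ordinary run-time code can't get in front of it:

print "runtime\n";                # printed last, at run time
BEGIN { print "compile time\n" }  # printed first
use POSIX ();                     # also loaded at compile time, even
                                  # though it appears "after" the print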

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
In which level of metalanguage are you now speaking?




RFC: Logging used Perl Modules (was Re: API Design Question)

2001-07-02 Thread James G Smith

How would something like this do:

NAME

Apache::Use

SYNOPSIS

use Apache::Use (Logger => 'DB', File => '/www/apache/logs/modules');

DESCRIPTION

Apache::Use will record the modules used over the course of the 
Perl interpreter's lifetime.  If the logging module is able, the 
old logs are read and frequently used modules are automatically 
loaded.  Note that no symbols are imported into packages.

---

I really wish we had `use' as a function instead of a keyword and 
had an `immediate' property for subs (kind of a Forth thing).  
Then we could do reference counting of `use' and `require'.

If the above seems reasonable, I'll try to get a 0.01 out asap.  
Passing this by the modules list for comment also.  The current 
code I have does not actually depend on Apache and mod_perl.
-- 
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix



Re: API Design Question

2001-07-01 Thread Stas Bekman

On Sat, 30 Jun 2001, Steven Lembark wrote:



  Note that if they do get called this will end up using more memory than if
  you had just loaded them during startup, since they won't be shared between
  child processes.

 Original assumption is that they are called infrequently.  You'll also find
 that the amount of memory sucked up by a single subroutine isn't much,
 less than pre-loading possibly 10s of subs that never get called.

The optimal approach would be

1. Use a CGI.pm-like -compile import tag or AutoSplit/AutoLoader to
provide an interface for loading only the wanted subs.

2. Use the DB::DB hook to collect stats on which subs are actually used. See
this nice article for more info:
http://www.ddj.com/columns/perl/2001/0103pl002.htm?topic=perl

3. Use ab or something else to exercise your service, calling all possible
URIs/args. Here you can use the access_log to learn what to feed to ab,
assuming that the access_log is big enough to exercise all your services
(which of course won't work for new services, in which case you have to
supply the possible URIs/args yourself)

4. Feed the results of 2 and 3 into 1 in startup.pl and voila you have the
perfect optimization.

5. If you modify your code you need either to rerun the stats collection
or manually adjust the startup.pl file.

Depending on how important it is to squeeze the most out of your boxes and
how big your code base is, this scenario may or may not apply to your
situation, but it gives you a good idea of how Perl can help you.

All these stages can be completely automated.

This seems to be an interesting project for someone to implement and
release as a general module. So one can plug in a stats handler which will
collect all the used modules (so you can preload them all in startup.pl)
and all used package::subs, to be fed into modules using
AutoSplit/AutoLoader so they can be loaded from startup.pl.

Here is a simple Apache::UsedModules

package Apache::UsedModules;

use strict;
use Apache;

if ($ENV{MOD_PERL}) {
    Apache->push_handlers(PerlChildExitHandler => \&handler);
}

sub handler {
    my $r = shift;

    my $file = "/tmp/modules.$$";
    open LOG, ">$file" or die "cannot open $file: $!";
    print LOG "\n# Used modules\n\n";
    for (sort grep !/^main$/, keys %INC) {
        next if m!^/|\.pl$!; # skip non-modules
        print LOG qq{require $_;\n}; # ($INC{$_})
    }
    close LOG;
}

1;

usage:
PerlModule Apache::UsedModules

or
use Apache::UsedModules; # in startup.pl

For subs stats you actually need to rework the DB::DB hook from
Apache::DB or write a new one based on Apache::DB (preferably).
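
The bare bones of such a hook, following the profiler examples in
perldebguts (the module and file names here are made up):

# Devel/SubStats.pm -- run with: perl -d:SubStats script.pl
package Devel::SubStats;

my %calls;

sub DB::DB { }   # per-line hook; must exist for -d, unused here

sub DB::sub {
    $calls{$DB::sub}++;  # $DB::sub holds the name of the sub being called
    &$DB::sub;           # call the real sub; as the last expression this
                         # preserves its return value and calling context
}

END {
    printf "%6d  %s\n", $calls{$_}, $_
        for sort { $calls{$b} <=> $calls{$a} } keys %calls;
}

1;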

_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://eXtropia.com/
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/






Re: API Design Question

2001-06-30 Thread Martin Redington


On Friday, June 29, 2001, at 07:25 , Shawn Devlin wrote:

 What advantages do I gain by grouping the functions based on 
 functionality? As per my response to Mr. Worrall, one of my concerns 
 with placing each function call into its own module is the amount of 
 memory used by the various .pm files that will be loaded numerous 
 times. I can see that grouping functions based on functionality would 
 reduce the number of .pm files in memory. However, if I go that route, 
 then would I not be better just to leave the API as one file?

A good reason for grouping related functions is not so much 
functionality as common dependencies, and ease of change management.

If everything is in one huge module, then change management becomes 
tricky, especially with multiple developers. Giving every function its 
own module avoids this, but can make tracking down dependencies tricky 
(and there may be a small memory overhead for each module, but I've 
never looked).

A happy medium is to group together functions that share a dependency 
on underlying database objects.

For example, if you have a family of library functions that retrieve, 
insert, update, or delete user records, it might make sense to group 
these together in a module. If you need to add a new field to your user 
records, then you change only that module (as well as any changes 
required to your scripts).
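
A sketch of that kind of grouping (the table and column names are invented
for illustration):

package API::User;
use strict;

# Every sub that touches the users table lives here, so adding a field
# means editing this one module.

sub retrieve {
    my ($dbh, $id) = @_;
    my $sth = $dbh->prepare('SELECT * FROM users WHERE id = ?');
    $sth->execute($id);
    return $sth->fetchrow_hashref;
}

sub remove {
    my ($dbh, $id) = @_;
    return $dbh->do('DELETE FROM users WHERE id = ?', undef, $id);
}

1;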




Re: API Design Question

2001-06-30 Thread Steven Lembark


 memory used by the various .pm files that will be loaded numerous 
 times. I can see that grouping functions based on functionality would 
 reduce the number of .pm files in memory. However, if I go that route, 

use only loads the .pm once.  Multiple uses don't eat up any more resource
than having it done once.

The minimal-module approach can be managed nicely via AutoSplit, which 
puts each sub in its own file with a stub AUTOLOAD that snags things 
into core only when they are called (the ultimate in laziness, no?).  This is
particularly nice for rarely called code.  One example is special exceptions
in database apps.  You can put the exception handler into a sub, have AutoSplit
stuff it into a file and only load it into memory if the error does show up.

This helps with code release issues because the related code lives in a single
module for editing and testing purposes but only sucks up core when needed.
sl 



Re: API Design Question

2001-06-30 Thread Perrin Harkins

 The minimal-module approach can be managed nicely via Autosplit, which
 puts eash sub in its own module with a stub AUTOLOAD that snags things
 into core only when they are called

Note that if they do get called this will end up using more memory than if
you had just loaded them during startup, since they won't be shared between
child processes.
- Perrin




Re: API Design Question

2001-06-30 Thread Steven Lembark



 Note that if they do get called this will end up using more memory than if
 you had just loaded them during startup, since they won't be shared between
 child processes.

Original assumption is that they are called infrequently.  You'll also find 
that the amount of memory sucked up by a single subroutine isn't much,
less than pre-loading possibly 10s of subs that never get called.

sl



Re: API Design Question

2001-06-29 Thread Shawn Devlin

Adam Worrall wrote:

SD == Shawn Devlin [EMAIL PROTECTED] writes:


SD My first thought is to break the API up so that there is a
SD module per API call (there are some 70 calls in the API). My
SD reasoning is that I can modify existing calls and add new ones
SD without affecting everything else. Does this make sense or is it
SD better to have the API as one large program as I have it now?

I'd have thought you'd be best to put the API in a large module, and then
make calls to it from mod_perl handlers. You could even write a generic
handler which chose which function to execute based on the arguments.

Having a module per function may start to do your head in :)

The bulk of the API is in 4 or 5 .pm files. My cgi script basically 
determines the call being made, vets the parameters, calls various 
functions in the .pm files, and then returns the result. The current 
format for the call is

server.com/cgi-bin/api.pl?command=foo&parm1=&parm2=

What I want to have is

server.com/api/foo?parm1=&parm2=

The module that handles foo would check the parameters, make its calls 
to the various internal functions, and then compose and send the results.

What I like about this is I can add a new function without needing to 
disturb the existing code. Also, each function call is then 
self-contained. Currently, my existing API script is essentially a big 
switch statement.

My concern is that each handler links the .pm files so with 50 or so 
functions I will have 50 or so copies of the various .pm files in memory.




Yes - when some Perl in an Apache child process executes DBI::connect
(which has been overridden by Apache::DBI), it first looks in a hash of
existing connections before opening a new one. Good news!

Thanks for the confirmation.


Shawn





Re: API Design Question

2001-06-29 Thread Shawn Devlin

James G Smith wrote:

[snip]

My first thought is to break the API up so that there is a module per 
API call (there are some 70 calls in the API). My reasoning is that I 
can modify existing calls and add new ones without affecting everything 
else. Does this make sense or is it better to have the API as one large 
program as I have it now?


If it's an API, I'd not make one module per function, if by function you mean 
a call (e.g., fork() is a function in the Unix kernel API).  Instead, I'd 
group them by functionality (as in OS/2 - VIO, KBD, DOS, ...).  So one module 
might handle customer accounts, another handle news items, etc.

What advantages do I gain by grouping the functions based on 
functionality? As per my response to Mr. Worrall, one of my concerns 
with placing each function call into its own module is the amount of 
memory used by the various .pm files that will be loaded numerous times. 
I can see that grouping functions based on functionality would reduce 
the number of .pm files in memory. However, if I go that route, then 
would I not be better just to leave the API as one file?

Thanks,

Shawn

-- 
This communication is intended to be received by the individual or
entity to whom or to which it is addressed and it contains information
that is privileged, confidential and subject to copyright of 
Recognia Inc. Any unauthorized use, copying, review or
disclosure is prohibited. If received in error, please contact me by
phone at 613-623-6159 or by email at mailto:[EMAIL PROTECTED].






RE: API Design Question

2001-06-29 Thread Joe Breeden

Shawn,

We have taken the approach here of a format like the one laid out below:

(in startup.pl - use lib '/usr/local/apache/lib'; - add the directories to
@INC)

/usr/local/apache/lib/APP - where APP is the main name of our application.
In this directory we will have perl modules that are shared by all the
handlers that make up the application.

Inside this directory we have a directory for our handlers - the handler
would relate to the 'command=foo' part of your current call - like:

/usr/local/apache/lib/APP/Command

Inside the directory for the handler we have at least a Handler.pm which is
referenced in our perl.conf something like (well, exactly like):

<Location /command>
SetHandler perl-script
PerlHandler APP::Command::Handler
</Location>

Inside the Handler.pm perl module is a sub called handler that processes the
requests made to that command. We also have a perl module for each action
related to a command, like:

List.pm - to process things related to a /command?action=list call. 
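
A skeletal Handler.pm in that layout might dispatch like this (the sub names
and return codes are illustrative, not our actual code):

package APP::Command::Handler;
use strict;
use Apache::Constants qw(OK NOT_FOUND);
use APP::Command::List ();   # one module per action

my %action = (
    list => \&APP::Command::List::run,   # assumed entry point
);

sub handler {
    my $r = shift;
    my %args = $r->args;
    my $code = $action{ $args{action} || '' } or return NOT_FOUND;
    return $code->($r);
}

1;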

For our situation, this works well because it allows several developers to
work on different parts of the same Command without stepping on each other's
toes, so to speak. Of course, this does add a level of complexity to the
project, which in the beginning was met with some resistance by long-time
Perl programmers - myself included, and I thought of the layout - but in
practice it has let our team cut development times significantly. Of course
a good versioning system will allow multiple users to access the same file
and not cause problems, and we use CVS to provide version control. 

As with anything, your mileage may vary with this method. I'm sure there are
hidden pitfalls involved, but for the time being it does
seem to work for us. I hope this helps.

Good Luck

Joe Breeden



 -Original Message-
 From: Shawn Devlin [mailto:[EMAIL PROTECTED]]
 Sent: Friday, June 29, 2001 1:18 PM
 To: [EMAIL PROTECTED]
 Subject: Re: API Design Question
 
 
 Adam Worrall wrote:
 
 SD == Shawn Devlin [EMAIL PROTECTED] writes:
 
 
 SD My first thought is to break the API up so that there is a
 SD module per API call (there are some 70 calls in the API). My
 SD reasoning is that I can modify existing calls and add new ones
 SD without affecting everything else. Does this make sense or is it
 SD better to have the API as one large program as I have it now?
 
 I'd have thought you'd be best to put the API in a large module, and then
 make calls to it from mod_perl handlers. You could even write a generic
 handler which chose which function to execute based on the arguments.
 
 Having a module per function may start to do your head in :)
 
 The bulk of the API is in 4 or 5 .pm files. My cgi script basically 
 determines the call being made, vets the parameters, calls various 
 functions in the .pm files, and then returns the result. The current 
 format for the call is
 
 server.com/cgi-bin/api.pl?command=foo&parm1=&parm2=
 
 What I want to have is
 
 server.com/api/foo?parm1=&parm2=
 
 The module that handles foo would check the parameters, make its calls 
 to the various internal functions, and then compose and send the results.
 
 What I like about this is I can add a new function without needing to 
 disturb the existing code. Also, each function call is then 
 self-contained. Currently, my existing API script is essentially a big 
 switch statement.
 
 My concern is that each handler links the .pm files so with 50 or so 
 functions I will have 50 or so copies of the various .pm files in memory.
 
 
 
 
 Yes - when some Perl in an Apache child process executes DBI::connect
 (which has been overridden by Apache::DBI), it first looks in a hash of
 existing connections before opening a new one. Good news!
 
 Thanks for the confirmation.
 
 
 Shawn
 
 



Re: API Design Question

2001-06-29 Thread Per Einar


- Original Message -
From: Shawn Devlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 29, 2001 8:18 PM
Subject: Re: API Design Question



 What I like about this is I can add a new function without needing to
 disturb the existing code. Also, each function call is then
 self-contained. Currently, my existing API script is essentially a big
 switch statement.

I see the point in separating it. I do the same thing myself.


 My concern is that each handler links the .pm files so with 50 or so
 functions I will have 50 or so copies of the various .pm files in memory.

That's not quite right. In its simplest form, I can say that Apache gets one
copy of each module per *child*, not per handler file, so 50 files doesn't
mean you'll have 50 copies loaded. For example, if one child serves /api/foo,
and /api/foo loads API1.pm and API2.pm, those will stay in memory, so that
when the same child serves /api/bar, and /api/bar attempts to use API1.pm
and API2.pm, the perl interpreter will find out that these 2 modules have
already been loaded, and not reload them.

But if you use preloading, as you should do, you get even more benefit from
shared memory. If you preload your modules in startup.pl or with PerlModule
in httpd.conf, they'll stay shared in memory, thus reducing the memory
overhead.
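
For reference, preloading is just a matter of pulling the modules in once in
the parent process (the paths and module names below are placeholders):

# startup.pl -- run by the parent httpd before forking children
use lib '/usr/local/apache/lib';

use API1 ();   # empty import list: compile only, skip import()
use API2 ();
use CGI  ();

1;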

Per Einar Ellefsen