Refreshing stored data at administrator's signal

2008-01-13 Thread Colin Wetherbee

Greetings.

I have an application that accesses some relatively static database 
tables to create drop-down lists.  As an example, one of these 
tables is a list of common commercial aircraft.


At the moment (and not in a production environment), every time the 
drop-down list is generated for a web page, the script queries the 
database to retrieve the entire list of aircraft.  I would prefer to 
retrieve the list of aircraft when each Perl interpreter starts and then 
not retrieve it again until the administrator sends a signal.  For this 
particular table, the signal would only occur when new aircraft hit the 
market, like the Boeing 787 will (hopefully) in December of this year.


The most UNIX-ish way to do this, I guess, would be to send SIGHUP to 
each running perl process, causing it to reload its configuration, 
update its stored lists, and so forth.  I'd rather do this in a more 
Perl-ish or Apache-ish way, though, and I'd also rather be specific 
about which list it should update.


At the moment, there are about 10 such lists, and I can see that number 
growing to about 20 before the site goes live.  At a guess, the lists 
average about 300 elements each (with the list of aircraft being one of 
the shorter and less-frequently-updated lists).


My ideal solution would be to have an external application (the 
administrator app or whatever it ends up being) update some flag inside 
each mod_perl process every time they need to update a list, and then 
each mod_perl application would see the flag and perform the update.  I 
could do this relatively easily with a combination of threads and file 
hooks or "update sockets" or something, but I don't plan on adding 
threads or sockets to the application, and I think adding that much 
complexity and overhead for this "simple" feature would be overkill.


Any thoughts?

Thanks.

Colin


Re: Refreshing stored data at administrator's signal

2008-01-13 Thread John ORourke

Colin Wetherbee wrote:
> At the moment (and not in a production environment), every time the
> drop-down list is generated for a web page, the script queries the
> database to retrieve the entire list of aircraft.  I would prefer to
> retrieve the list of aircraft when each Perl interpreter starts and
> then not retrieve it again until the administrator sends a signal.
> For this particular table, the signal would only occur when new
> aircraft hit the market, like the Boeing 787 will (hopefully) in
> December of this year.
>
> The most UNIX-ish way to do this, I guess, would be to send SIGHUP to
> each running perl process, causing it to reload its configuration,
> update its stored lists, and so forth.  I'd rather do this in a more
> Perl-ish or Apache-ish way, though, and I'd also rather be specific
> about which list it should update.


Wouldn't a simpler approach be to just restart Apache when you want to 
update the lists?  You could even have the 'add to list' function send 
SIGUSR1 to the parent Apache, causing a graceful restart.


Having said that, if running 20 DB queries returning a few hundred 
records is causing you a speed problem, are you sure the DB is running 
efficiently?  Is this a very high traffic site?  Is there a requirement 
for ultra-fast page generation?  I've got pages that make dozens and 
dozens of DB queries returning hundreds of records and do lots of 
post-processing, and I can generate pages in under a second much of the 
time.


cheers
John



Re: Refreshing stored data at administrator's signal

2008-01-13 Thread Colin Wetherbee

John ORourke wrote:
> Colin Wetherbee wrote:
>> At the moment (and not in a production environment), every time the
>> drop-down list is generated for a web page, the script queries the
>> database to retrieve the entire list of aircraft.  I would prefer to
>> retrieve the list of aircraft when each Perl interpreter starts and
>> then not retrieve it again until the administrator sends a signal.
>> For this particular table, the signal would only occur when new
>> aircraft hit the market, like the Boeing 787 will (hopefully) in
>> December of this year.
>>
>> The most UNIX-ish way to do this, I guess, would be to send SIGHUP to
>> each running perl process, causing it to reload its configuration,
>> update its stored lists, and so forth.  I'd rather do this in a more
>> Perl-ish or Apache-ish way, though, and I'd also rather be specific
>> about which list it should update.
>
> Wouldn't a simpler approach be to just restart Apache when you want to
> update the lists?  You could even have the 'add to list' function send
> SIGUSR1 to the parent Apache, causing a graceful restart.


I'm trying to avoid restarting Apache altogether, although I admit it 
would be a pretty simple solution.


> Having said that, if running 20 DB queries returning a few hundred
> records is causing you a speed problem, are you sure the DB is running
> efficiently?  Is this a very high traffic site?  Is there a requirement
> for ultra-fast page generation?  I've got pages that make dozens and
> dozens of DB queries returning hundreds of records and do lots of
> post-processing, and I can generate pages in under a second much of the
> time.


The point is more like "well, this isn't really super-dynamic data, so 
running a query every time I need it seems like a waste of processor 
time and disk activity."


It's not causing any slow-down right now, though when the site goes 
live, it certainly could.


Colin


Re: Refreshing stored data at administrator's signal

2008-01-13 Thread John ORourke

Colin Wetherbee wrote:
> John ORourke wrote:
>> Wouldn't a simpler approach be to just restart Apache when you want
>> to update the lists?  You could even have the 'add to list' function
>> send SIGUSR1 to the parent Apache, causing a graceful restart.
>
> I'm trying to avoid restarting Apache altogether, although I admit it
> would be a pretty simple solution.



I'd seriously consider it - it's simple and clean and only takes a few 
seconds, and it happens every night when you rotate your logs anyway.  
If you really really don't want to restart Apache, you could get your 
'add data' function to create a file called 'need_restart' somewhere on 
the disk, and after processing each request your mod_perl handler could 
check for the file and call $r->child_terminate if it finds it.  You'd 
have to have some method of stopping it from constantly restarting, 
which could get complicated.


The cynic in me suspects you'll spend too many hours on this 
not-really-a-problem, when there may be other parts of the system that 
would benefit from more attention!


cheers
John



Re: Refreshing stored data at administrator's signal

2008-01-13 Thread Colin Wetherbee

John ORourke wrote:
> Colin Wetherbee wrote:
>> John ORourke wrote:
>>> Wouldn't a simpler approach be to just restart Apache when you want
>>> to update the lists?  You could even have the 'add to list' function
>>> send SIGUSR1 to the parent Apache, causing a graceful restart.
>>
>> I'm trying to avoid restarting Apache altogether, although I admit it
>> would be a pretty simple solution.
>
> I'd seriously consider it - it's simple and clean and only takes a few
> seconds, and it happens every night when you rotate your logs anyway.
> If you really really don't want to restart Apache, you could get your
> 'add data' function to create a file called 'need_restart' somewhere on
> the disk, and after processing each request your mod_perl handler could
> check for the file and call $r->child_terminate if it finds it.  You'd
> have to have some method of stopping it from constantly restarting,
> which could get complicated.


I thought about the file thing... if the file exists, check its last 
modified timestamp; if that timestamp is greater than the stored 
timestamp, then update the data from the database.  It seems like 
unnecessary disk access, though.  Then again, this whole problem is 
riddled with unnecessary disk access. :)


> The cynic in me suspects you'll spend too many hours on this
> not-really-a-problem, when there may be other parts of the system that
> would benefit from more attention!


Well, you're probably right about that. ;)

Perhaps I'll set up a restart-based system and then worry about it later 
if it becomes an "actual" problem.


Thanks for your input.

Colin


Re: Refreshing stored data at administrator's signal

2008-01-13 Thread Perrin Harkins
On Jan 13, 2008 4:19 PM, Colin Wetherbee <[EMAIL PROTECTED]> wrote:
> I thought about the file thing... if the file exists, check its last
> modified timestamp; if that timestamp is greater than the stored
> timestamp, then update the data from the database.  It seems like
> unnecessary disk access, though.  Then again, this whole problem is
> riddled with unnecessary disk access. :)

Using a "touch file" is the classic solution to this problem.  You
check the mod time on a file (it's okay for it to always be there -- it's
just the mod time we care about) and compare that to the last update
time that you keep in a global.  It's dirt simple, avoids messy
problems with signals, and it should end up in your operating system's
disk cache so it really won't do any physical disk reads.
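
A minimal sketch of that check, assuming one touch file per list so the
administrator can be specific about which list to refresh.  The names
get_list, %cache and %last_loaded, the file path, and the loader callback
are all made up for illustration; the loader stands in for whatever
database query builds the list.

```perl
use strict;
use warnings;

# Hypothetical per-process globals (one copy in each Apache child).
my %cache;         # list name => array ref holding the list
my %last_loaded;   # list name => touch-file mod time seen at last load

sub get_list {
    my ($name, $touch_file, $loader) = @_;

    # Element 9 of stat() is the mod time.  If the file is missing,
    # stat() returns an empty list and we fall back to 0.
    my $mtime = (stat $touch_file)[9] // 0;

    # Reload on first use, or when the file was touched after our load.
    if (!exists $cache{$name} || $mtime > $last_loaded{$name}) {
        $cache{$name}       = $loader->();   # e.g. run the SELECT here
        $last_loaded{$name} = $mtime;
    }
    return $cache{$name};
}
```

The admin tool would then only need to update the file's mod time (with
utime or the touch command) after changing the table, and each child
would pick up the change on its next request.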

- Perrin


Re: Refreshing stored data at administrator's signal

2008-01-14 Thread Colin Wetherbee

Scott Gifford wrote:
> Colin Wetherbee <[EMAIL PROTECTED]> writes:
>
> [...]
>
>> At the moment (and not in a production environment), every time the
>> drop-down list is generated for a web page, the script queries the
>> database to retrieve the entire list of aircraft.  I would prefer to
>> retrieve the list of aircraft when each Perl interpreter starts and
>> then not retrieve it again until the administrator sends a signal.
>> For this particular table, the signal would only occur when new
>> aircraft hit the market, like the Boeing 787 will (hopefully) in
>> December of this year.
>
> Essentially what you want is an in-memory cache of a possibly slow
> database query.  There are several modules on CPAN that do this;
> search for "cache".


I'm not sure what you're suggesting.  The first few pages of "cache" on 
CPAN have some modules for caching data in memory and on disk and so 
forth, but I don't see how they relate to my problem.


Which is that of notifying all of my application's perl processes when 
an update has been performed on a table in a database, without having 
them access the database to determine this on their own.


Thanks.

Colin



Re: Refreshing stored data at administrator's signal

2008-01-14 Thread Clinton Gormley

> I'm not sure what you're suggesting.  The first few pages of "cache" on 
> CPAN have some modules for caching data in memory and on disk and so 
> forth, but I don't see how they relate to my problem.
> 
> Which is that of notifying all of my application's perl processes when 
> an update has been performed on a table in a database, without having 
> them access the database to determine this on their own.

There are two ways of achieving your task:
 - active: forcing all the apache processes to update their list of 
   aircraft
 - passive: having each apache child check on whether the copy of the 
   list they already have is still up to date

By far the simplest way of achieving the first option is by having the
parent load and cache the list (which means that memory is shared by all
the child processes) and restarting your apache processes when the list
changes.

For the passive route, each apache child has to perform some kind of
check to see whether their version is up to date.  This requires some
kind of check somewhere, eg:
 - checking the last modified time of a file
 - loading the list from a cache 
 - loading the list from the database

Your intention is to reduce the number of database hits.  That's fine,
but it needs to be weighed against the cost of inflexibility, or the
cost of checking and rebuilding the cache.

For data that almost never changes, I would go the active route.

For data that changes more regularly, but has a certain time-to-live, I
would go the caching route.  For data that changes by the second, get it
directly from the DB.
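
The time-to-live variant can be sketched in a few lines of plain Perl.
This is only an illustration of the idea, not any particular CPAN
module's interface: cached(), %cache and the loader callback are
hypothetical names, and the optional $now argument exists only to make
the expiry easy to exercise.

```perl
use strict;
use warnings;

my %cache;   # key => { value => ..., expires => epoch seconds }

sub cached {
    my ($key, $ttl, $loader, $now) = @_;
    $now //= time;

    my $entry = $cache{$key};

    # Rebuild the entry if it is missing or its time-to-live has run out.
    if (!$entry || $now >= $entry->{expires}) {
        $entry = $cache{$key} = {
            value   => $loader->(),      # e.g. the DB query
            expires => $now + $ttl,
        };
    }
    return $entry->{value};
}
```

With a TTL of, say, an hour, each child would hit the database at most
once an hour per list, at the cost of showing stale data for up to that
long after a change.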

So searching for 'cache' on CPAN does indeed give you a number of very
useful modules that ease your path to reducing the number of DB hits
that you have.

My personal favourite is Cache::Memcached, but that's only relevant if
you have more than one web server.  If not, the file based caches are
the fastest (or you could try looking at SQLite or Cache::BerkeleyDB or
even a memory table in MySQL, but on a different DB server).

regards

Clint


> 
> Thanks.
> 
> Colin
> 



Re: Refreshing stored data at administrator's signal

2008-01-14 Thread Colin Wetherbee

Clinton Gormley wrote:
>> I'm not sure what you're suggesting.  The first few pages of "cache" on
>> CPAN have some modules for caching data in memory and on disk and so
>> forth, but I don't see how they relate to my problem.
>>
>> Which is that of notifying all of my application's perl processes when
>> an update has been performed on a table in a database, without having
>> them access the database to determine this on their own.
>
> My personal favourite is Cache::Memcached, but that's only relevant if
> you have more than one web server.  If not, the file based caches are
> the fastest (or you could try looking at SQLite or Cache::BerkeleyDB or
> even a memory table in MySQL, but on a different DB server)


Memcached sounds like a good idea.  I could cache the update timestamps 
and compare them on each run.
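
One way to read that, as a rough sketch: the admin app bumps a per-list
stamp in the shared cache, and each process compares it with the stamp it
saw at its last load.  The key naming (version:NAME), fresh_list() and the
LocalStore stand-in are assumptions for illustration; the only calls
relied on are get and set, which Cache::Memcached provides.

```perl
use strict;
use warnings;

my %local_list;      # list name => array ref held by this process
my %local_version;   # list name => stamp seen when the list was loaded

sub fresh_list {
    my ($store, $name, $loader) = @_;

    # $store is any object with a get($key) method.  A missing or
    # evicted key is treated as stamp 0.
    my $version = $store->get("version:$name") // 0;

    if (!exists $local_list{$name} || $version != $local_version{$name}) {
        $local_list{$name}    = $loader->();   # e.g. the DB query
        $local_version{$name} = $version;
    }
    return $local_list{$name};
}

# In-memory stand-in so the sketch runs without a memcached server.
package LocalStore;
sub new { return bless {}, shift }
sub get { my ($self, $key) = @_; return $self->{$key} }
sub set { my ($self, $key, $value) = @_; $self->{$key} = $value; return 1 }

package main;
```

The admin app would call something like $store->set("version:aircraft",
time) after changing the table.  The per-request cost is then one cache
lookup per list instead of one database query.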


I guess I wasn't thinking about "cache" the right way around.

Thanks!

Colin


Re: Refreshing stored data at administrator's signal

2008-01-15 Thread Wagner, Chris (GEAE, CBTS)
Hi.  The touch file will definitely work and I've used that myself, but
in this case its inelegance bothers me.  It's also another touch point
for administration.  What I would probably do is put the state
information in the database itself.  The script would keep track of the
age of its data and every 5 minutes or so it would check the state
information in the course of its normal operation.  So when a user hit
causes the script to execute, the last thing it does is see if its state
data is more than 5 minutes old and if so refresh it.  If the state
information has changed it would reload everything indicated right
there.  You want to do this at the very tail end of the script so the
refresh doesn't delay the page draw for the user.  This way you've avoided
expanding your administrative footprint.
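
A rough sketch of that tail-end check, assuming a small state table that
maps each list name to a version or timestamp.  refresh_if_due(), the two
callbacks and the interval constant are hypothetical names; the callbacks
stand in for the real state query and the real reload code.

```perl
use strict;
use warnings;

my $CHECK_INTERVAL = 5 * 60;   # seconds between looks at the state table
my $last_check     = 0;        # epoch time of this process's last look
my %seen_version;              # list name => version last loaded

# $fetch_versions->() returns a hash ref { list name => version }.
# $reload->($name) reloads one list.  Returns the number of lists reloaded.
sub refresh_if_due {
    my ($fetch_versions, $reload, $now) = @_;
    $now //= time;

    # Not due yet: skip the state query entirely.
    return 0 if $now - $last_check < $CHECK_INTERVAL;
    $last_check = $now;

    my $versions = $fetch_versions->();
    my $reloaded = 0;
    for my $name (sort keys %$versions) {
        next if defined $seen_version{$name}
             && $seen_version{$name} eq $versions->{$name};
        $reload->($name);                      # only the lists that changed
        $seen_version{$name} = $versions->{$name};
        $reloaded++;
    }
    return $reloaded;
}
```

Under mod_perl this could be called from a cleanup phase handler
(PerlCleanupHandler), which runs after the response has been sent, so the
refresh would not delay the page.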

Colin Wetherbee wrote:
> 
> Greetings.
> 
> I have an application that accesses some relatively static database
> tables to create drop-down  lists.  As an example, one of these
> tables is a list of common commercial aircraft.
> 
> At the moment (and not in a production environment), every time the
> drop-down list is generated for a web page, the script queries the
> database to retrieve the entire list of aircraft.  I would prefer to
> retrieve the list of aircraft when each Perl interpreter starts and then
> not retrieve it again until the administrator sends a signal. 
> Thanks.
> 
> Colin

-- 
Chris Wagner
CBTS
GE Aircraft Engines
[EMAIL PROTECTED]