Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread shane

> > Actually in my experience the sharing of memory doesn't work as well
> > as one would hope. While compiling, Perl allocates memory for code
> > and data (variables) from the same memory pools, so code and
> > variables are interlaced. Over the lifetime of an Apache/mod_perl
> > child a lot of memory pages are promoted from shared to unshared that
> > contain mostly code and one or two variables... If someone with more
> > knowledge of the Perl internals were to change that, this would make
> > a huge difference for mod_perl users and everybody else writing
> > daemons in Perl that spawn many children.

Well... the structs for the "Code Values" hold both code and, in the
case of lexicals, variables.  However, these are not altered at run
time.  There are structs within the main code value struct that will be
altered at run time, namely the recursive lexical array.  (That's what I
call it, but I'm sure Malcolm uses a more correct word :->)  However,
the actual code (a series of highly optimized opcodes... instruction-set
workalike stuff that takes a few clock cycles each, depending on the op
and the architecture) is not inline in memory with the structs for the
recursive lexical array.

What I'm saying is that if you include all your code at the very
beginning, the design of perl will not alter that code, so it should be
able to stay fixed and shared.  Basically the interpreter just holds an
opcode pointer to the opcode it's working on within the CV.  The "code
value" struct just holds a pointer to the recursive lexical array.  So
that main code struct should never need to be realloc'd, and it's
fairly unlikely that it would need to become unshared.  However, someone
who understands how something gets "unshared" within the kernel could
be quite helpful here.  If you were to change where something was
pointing within a struct, would that cause it to be unshared?  I think
that it's fairly unlikely, but I suppose it's possible.  If that's the
case, then it's quite likely that pieces of code could become unshared.
However, the main hunk of actual function opcodes would remain fixed;
only the execution pointer (where it's pointing within the present
program) would change.  So, in the end (!), the code should always be
shared.  However, if you change the file and the server checks the date
on it and reloads it, obviously it won't be shared :-).

> Perl is a language that uses weak data types, i.e. you don't specify
> variable size (type) like you do in strongly typed languages like C.
> Therefore Perl uses heap memory, allocating memory on demand, rather
> than the (unmodifiable) text and (modifiable) data memory pages used by
> C.  The latter get allocated when the program is loaded into memory,
> before it starts to run; the former is allocated at run time.

Yes that's true.  There is some compile time stuff where it organizes
the variable names within the lexical array, but I'm not sure whether
or not it actually reserves space for those things at that time.  I'm
really not sure about that item.

> On the heap there is no separation between data and text pages. When you
> call malloc it just allocates the chunk you have asked for, which leads
> to the case where the code (static in most cases) and the variables
> (dynamic) land on the same memory pages.

This is a weak area in my knowledge.  I'm not certain how the kernel
actually marks segments as shared and not... so I'll refrain from
commenting.
 
> I'm not sure this is a very good explanation. Perl gurus are very welcome to
> correct/improve my attempt to explain this.

I've tried to explain what I can.  The best book for this is "Advanced
Perl Programming"... published by O'Reilly.  (of course)  There is a
chapter in there with pieces written by Malcolm Beattie that are pretty
good..., but I'm afraid they might not go into enough depth on these
exact issues.  Not only that, but these issues are also very kernel
related..., you have to understand how both pieces fit together, and
frankly I couldn't answer that, and I don't know a person alive that
could :-).  (I'm sure there are some, but who?)

Thanks,
Shane.

> 
> But the main point is that that's how Perl is written, and I don't know
> whether it can be changed.
> 
> __
> Stas Bekman | JAm_pH--Just Another mod_perl Hacker
> http://stason.org/  | mod_perl Guide  http://perl.apache.org/guide 
> mailto:[EMAIL PROTECTED]  | http://perl.org  http://stason.org/TULARC/
> http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
> --
> 
> 
> 



Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Gunther Birznieks

At 01:44 AM 4/20/00 +0300, Stas Bekman wrote:
>On Wed, 19 Apr 2000, Gerd Knops wrote:
>
> > Stas Bekman wrote:
> > > On Wed, 19 Apr 2000, Matt Carothers wrote:
> > > > On Tue, 18 Apr 2000, Stas Bekman wrote:
> > > >
> > > > > Let's assume that you have two different sets of scripts/code
> > > > > which have little or nothing in common (different modules, no
> > > > > base code sharing), that the basic mod_perl process is three
> > > > > Mbytes before any code has been loaded, and that each code base
> > > > > adds ten Mbytes when loaded. That makes each process 23 MB in
> > > > > size when all the code gets loaded.
> > > >
> > > > Can't you share most of that 23mb between the processes by
> > > > pre-loading the scripts/modules in your startup.pl? I'd say the
> > > > main advantage of engineering dissimilar services as if they were
> > > > on separate servers is scalability rather than memory use. When a
> > > > site outgrows the hardware it's on, spreading it out to multiple
> > > > machines requires a lot less ankle grabbing if it was designed
> > > > that way to begin with. :)
> > >
> > > Geez, I always forget something :(
> > >
> > > You are right. I forgot to mention that this was a scenario for the
> > > 23 Mb of unshared memory. I just wanted to give an example. Still
> > > somehow I'm almost sure that there are servers where even with
> > > sharing in place, the hypothetical scenario I've presented is quite
> > > possible.
> > >
> > > Anyway, it's just another patent for squeezing some more juice from
> > > your hardware without upgrading it.
> > >
> > > But, sure I'll add the correction about the sharing memory which
> > > drastically changes the story :)
> > >
> > Actually in my experience the sharing of memory doesn't work as well
> > as one would hope. While compiling, Perl allocates memory for code
> > and data (variables) from the same memory pools, so code and
> > variables are interlaced. Over the lifetime of an Apache/mod_perl
> > child a lot of memory pages are promoted from shared to unshared that
> > contain mostly code and one or two variables... If someone with more
> > knowledge of the Perl internals were to change that, this would make
> > a huge difference for mod_perl users and everybody else writing
> > daemons in Perl that spawn many children.
>
>Perl is a language that uses weak data types, i.e. you don't specify
>variable size (type) like you do in strongly typed languages like C.
>Therefore Perl uses heap memory, allocating memory on demand, rather
>than the (unmodifiable) text and (modifiable) data memory pages used by
>C.  The latter get allocated when the program is loaded into memory,
>before it starts to run; the former is allocated at run time.
>
>On the heap there is no separation between data and text pages. When you
>call malloc it just allocates the chunk you have asked for, which leads
>to the case where the code (static in most cases) and the variables
>(dynamic) land on the same memory pages.
>
>I'm not sure this is a very good explanation. Perl gurus are very welcome to
>correct/improve my attempt to explain this.
>
>But the main point is that that's how Perl is written, and I don't know
>whether it can be changed.
That's how I understand it to be. But I could be wrong as well. :)

By the way, I think your section here is actually forward-thinking. Although
Matt and others have pointed out that this scenario may not be the biggest
memory saver in the world in some cases, there is another consideration
that I think should be mentioned.

Reliability and Troubleshooting.

Right now, I dare say that most of you are probably either doing custom
mod_perl coding or using infrastructural tools such as EmbPerl, ASP,
etc.

In these scenarios, it is relatively easy to troubleshoot your code because
either [a] you wrote it all or [b] you are running code on top of a
well-tested Apache/mod_perl infrastructural application toolkit.

This is, I suspect, because as far as real-world open source applications
are concerned, mod_perl is a bit behind. Thousands of open source Perl
scripts exist for plain CGI. Few (if any) are in a repository and ready to
work with mod_perl right off the bat.

However, I see this changing. Efforts such as mine, SmartWorker's, etc.
will eventually lead to a proliferation of another layer of infrastructural
components above the tool level (EmbPerl, Apache::Session, etc.). In
other words, a component layer at the application level -- plug-and-play
calendars, BBSes, web shopping carts, etc.

If people reach this point on mod_perl, you may find that some modules are
not written as well as others, so subtle side effects may be introduced
when you start throwing everyone's code together in one huge vat of mod_perl.

If this happens, I suspect it will be a lot easier to troubleshoot problems
that occur if you keep major application suites separate from each other.
E.g., don't run SmartWorker on the same server as EmbPerl or the same ser

Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread C. Jon Larsen


My Apache processes are typically 18MB-20MB in size, with all but 500K to
1MB of that shared. We restart our servers in the middle of the night as
part of planned maintenance, of course, but even before we did that,
and even after weeks of uptime, the percentages did not change.

We do not use Apache::Registry at all; everything is a pure handler. We
cache all data structures (lots of Storable things) in the parent process
by thawing refs to the data structures into package variables. We use no
globals, only a few package variables (4) that we access by fully
qualified package name, and they get reset on each request.
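
A minimal sketch of that caching pattern, under stated assumptions: the
package name My::Cache, the %DATA variable, and the sample data are all
hypothetical, and the frozen blob stands in for whatever a real server
would load from disk (for example with Storable::retrieve). It only
shows the shape of the idea: thaw once in the parent, fork, and let the
children read.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package My::Cache;
our %DATA;    # hypothetical package variable for read-mostly lookup data

package main;

# Stand-in for data a real server would load from disk; frozen in memory
# here so the example is self-contained.
my $frozen = freeze({ countries => { us => 'United States', de => 'Germany' } });

# Parent: thaw once, before any child is forked.
%My::Cache::DATA = %{ thaw($frozen) };

# Child: reads the cache through the fully qualified package name.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    my $name = $My::Cache::DATA{countries}{de};
    exit($name eq 'Germany' ? 0 : 1);
}

waitpid($pid, 0);
my $child_ok = ($? >> 8) == 0;
print "child read cached data: ", ($child_ok ? "yes" : "no"), "\n";
```

Under mod_perl the thaw step would sit in code loaded by the parent
httpd, so that the children inherit the populated variable when they are
forked. Even read-only use can unshare some pages, since Perl sometimes
writes to a scalar while reading it (for example when a string is used
as a number and the conversion is cached).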

We use Apache::DBI and MySQL, and it works perfectly other than a few
segfaults that occur once in a while. Having all of the data structures
cached (and shared!) allows us to do some neat things without having to
rely solely on SQL.

On Thu, 20 Apr 2000, Stas Bekman wrote:

> On Wed, 19 Apr 2000, Joshua Chamas wrote:
> 
> > Stas Bekman wrote:
> > > 
> > > Geez, I always forget something :(
> > > 
> > > You are right. I forgot to mention that this was a scenario for the 23 Mb
> > > of unshared memory. I just wanted to give an example. Still somehow I'm
> > > almost sure that there are servers where even with sharing in place, the
> > > hypothetical scenario I've presented is quite possible.
> > > 
> > > Anyway, it's just another patent for squeezing some more juice from your
> > > hardware without upgrading it.
> > > 
> > 
> > Your scenario would be more believable with 5M unshared, even
> > after doing one's best to share everything.  This is pretty typical
> > when connecting to databases, as the database connections cannot
> > be shared, and DBs like Oracle especially take lots of RAM
> > per connection.
> 
> Good idea. 5 MB sounds closer to the real case than 10 MB. I'll make the
> correction. Thanks!!!
> 
> > I'm not sure that your scenario is worthwhile if someone does
> > a good job preloading / sharing code across the forks, and
> > the difference will really be how much of the code gets dirty
> > while you run things, which can be neatly tuned with MaxRequestsPerChild.
> 
> Agreed. But not everybody knows how to do that well. So the presented idea
> might still find good use at some web shops.
> 
> > Interesting & novel approach though.  I would bet that if people
> > went down this path, they would really end up on different machines
> > per web application, or even different web clusters per application ;)
> 
> :)




Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Stas Bekman

On Wed, 19 Apr 2000, Gerd Knops wrote:

> Stas Bekman wrote:
> > On Wed, 19 Apr 2000, Matt Carothers wrote:
> > > On Tue, 18 Apr 2000, Stas Bekman wrote:
> > >
> > > > Let's assume that you have two different sets of scripts/code
> > > > which have little or nothing in common (different modules, no
> > > > base code sharing), that the basic mod_perl process is three
> > > > Mbytes before any code has been loaded, and that each code base
> > > > adds ten Mbytes when loaded. That makes each process 23 MB in
> > > > size when all the code gets loaded.
> > >
> > > Can't you share most of that 23mb between the processes by
> > > pre-loading the scripts/modules in your startup.pl? I'd say the
> > > main advantage of engineering dissimilar services as if they were
> > > on separate servers is scalability rather than memory use. When a
> > > site outgrows the hardware it's on, spreading it out to multiple
> > > machines requires a lot less ankle grabbing if it was designed
> > > that way to begin with. :)
> >
> > Geez, I always forget something :(
> >
> > You are right. I forgot to mention that this was a scenario for the
> > 23 Mb of unshared memory. I just wanted to give an example. Still
> > somehow I'm almost sure that there are servers where even with
> > sharing in place, the hypothetical scenario I've presented is quite
> > possible.
> >
> > Anyway, it's just another patent for squeezing some more juice from
> > your hardware without upgrading it.
> >
> > But, sure I'll add the correction about the sharing memory which
> > drastically changes the story :)
> >
> Actually in my experience the sharing of memory doesn't work as well
> as one would hope. While compiling, Perl allocates memory for code
> and data (variables) from the same memory pools, so code and
> variables are interlaced. Over the lifetime of an Apache/mod_perl
> child a lot of memory pages are promoted from shared to unshared that
> contain mostly code and one or two variables... If someone with more
> knowledge of the Perl internals were to change that, this would make
> a huge difference for mod_perl users and everybody else writing
> daemons in Perl that spawn many children.

Perl is a language that uses weak data types, i.e. you don't specify
variable size (type) like you do in strongly typed languages like C.
Therefore Perl uses heap memory, allocating memory on demand, rather
than the (unmodifiable) text and (modifiable) data memory pages used by
C.  The latter get allocated when the program is loaded into memory,
before it starts to run; the former is allocated at run time.

On the heap there is no separation between data and text pages. When you
call malloc it just allocates the chunk you have asked for, which leads
to the case where the code (static in most cases) and the variables
(dynamic) land on the same memory pages.

I'm not sure this is a very good explanation. Perl gurus are very welcome to
correct/improve my attempt to explain this.

But the main point is that that's how Perl is written, and I don't know
whether it can be changed.







Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Joshua Chamas

Stas Bekman wrote:
> 
> Geez, I always forget something :(
> 
> You are right. I forgot to mention that this was a scenario for the 23 Mb
> of unshared memory. I just wanted to give an example. Still somehow I'm
> almost sure that there are servers where even with sharing in place, the
> hypothetical scenario I've presented is quite possible.
> 
> Anyway, it's just another patent for squeezing some more juice from your
> hardware without upgrading it.
> 

Your scenario would be more believable with 5M unshared, even
after doing one's best to share everything.  This is pretty typical
when connecting to databases, as the database connections cannot
be shared, and DBs like Oracle especially take lots of RAM
per connection.

I'm not sure that your scenario is worthwhile if someone does
a good job preloading / sharing code across the forks, and
the difference will really be how much of the code gets dirty
while you run things, which can be neatly tuned with MaxRequestsPerChild.

Interesting & novel approach though.  I would bet that if people
went down this path, they would really end up on different machines
per web application, or even different web clusters per application ;)

-- Joshua
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051



Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Stas Bekman

On Wed, 19 Apr 2000, Joshua Chamas wrote:

> Stas Bekman wrote:
> > 
> > Geez, I always forget something :(
> > 
> > You are right. I forgot to mention that this was a scenario for the 23 Mb
> > of unshared memory. I just wanted to give an example. Still somehow I'm
> > almost sure that there are servers where even with sharing in place, the
> > hypothetical scenario I've presented is quite possible.
> > 
> > Anyway, it's just another patent for squeezing some more juice from your
> > hardware without upgrading it.
> > 
> 
> Your scenario would be more believable with 5M unshared, even
> after doing one's best to share everything.  This is pretty typical
> when connecting to databases, as the database connections cannot
> be shared, and DBs like Oracle especially take lots of RAM
> per connection.

Good idea. 5 MB sounds closer to the real case than 10 MB. I'll make the
correction. Thanks!!!

> I'm not sure that your scenario is worthwhile if someone does
> a good job preloading / sharing code across the forks, and
> the difference will really be how much of the code gets dirty
> while you run things, which can be neatly tuned with MaxRequestsPerChild.

Agreed. But not everybody knows how to do that well. So the presented idea
might still find good use at some web shops.

> Interesting & novel approach though.  I would bet that if people
> went down this path, they would really end up on different machines
> per web application, or even different web clusters per application ;)

:)






Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Gerd Knops

Stas Bekman wrote:
> On Wed, 19 Apr 2000, Matt Carothers wrote:
> > On Tue, 18 Apr 2000, Stas Bekman wrote:
> >
> > > Let's assume that you have two different sets of scripts/code
> > > which have little or nothing in common (different modules, no
> > > base code sharing), that the basic mod_perl process is three
> > > Mbytes before any code has been loaded, and that each code base
> > > adds ten Mbytes when loaded. That makes each process 23 MB in
> > > size when all the code gets loaded.
> >
> > Can't you share most of that 23mb between the processes by
> > pre-loading the scripts/modules in your startup.pl? I'd say the
> > main advantage of engineering dissimilar services as if they were
> > on separate servers is scalability rather than memory use. When a
> > site outgrows the hardware it's on, spreading it out to multiple
> > machines requires a lot less ankle grabbing if it was designed
> > that way to begin with. :)
>
> Geez, I always forget something :(
>
> You are right. I forgot to mention that this was a scenario for the
> 23 Mb of unshared memory. I just wanted to give an example. Still
> somehow I'm almost sure that there are servers where even with
> sharing in place, the hypothetical scenario I've presented is quite
> possible.
>
> Anyway, it's just another patent for squeezing some more juice from
> your hardware without upgrading it.
>
> But, sure I'll add the correction about the sharing memory which
> drastically changes the story :)
>
Actually in my experience the sharing of memory doesn't work as well
as one would hope. While compiling, Perl allocates memory for code
and data (variables) from the same memory pools, so code and
variables are interlaced. Over the lifetime of an Apache/mod_perl
child a lot of memory pages are promoted from shared to unshared that
contain mostly code and one or two variables... If someone with more
knowledge of the Perl internals were to change that, this would make
a huge difference for mod_perl users and everybody else writing
daemons in Perl that spawn many children.
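
One way to picture the effect described above is a toy page model.
After a fork, parent and children share pages copy-on-write, and a write
to any byte of a page gives the writing process its own private copy of
that whole page. The sketch below is only a model with made-up sizes:
the slots per page, the heap size, and the number of variables are all
assumptions, and it does not inspect real Perl memory. It counts how
many pages end up unshared when variables are scattered among code
versus packed together.

```perl
use strict;
use warnings;

my $SLOTS_PER_PAGE = 10;      # assumed: allocations that fit on one page
my $TOTAL_SLOTS    = 1000;    # assumed heap size, i.e. 100 pages
my $VARIABLES      = 100;     # assumed number of slots that get written to

# Count the distinct pages touched when every variable slot is written once.
sub dirty_pages {
    my @variable_slots = @_;
    my %dirty;
    $dirty{ int($_ / $SLOTS_PER_PAGE) } = 1 for @variable_slots;
    return scalar keys %dirty;
}

# Interleaved layout: one variable after every nine code slots.
my @interleaved = map { $_ * $SLOTS_PER_PAGE + 9 } 0 .. $VARIABLES - 1;

# Segregated layout: all variables packed together at the end of the heap.
my @segregated = ($TOTAL_SLOTS - $VARIABLES) .. ($TOTAL_SLOTS - 1);

my $interleaved_dirty = dirty_pages(@interleaved);
my $segregated_dirty  = dirty_pages(@segregated);
my $total_pages       = $TOTAL_SLOTS / $SLOTS_PER_PAGE;

printf "interleaved: %d of %d pages unshared\n", $interleaved_dirty, $total_pages;
printf "segregated:  %d of %d pages unshared\n", $segregated_dirty,  $total_pages;
```

With these numbers the interleaved layout unshares all 100 pages while
the segregated one unshares only 10, even though the same 100 slots are
written in both cases.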

Gerd



Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Stas Bekman

On Wed, 19 Apr 2000, Matt Carothers wrote:
> On Tue, 18 Apr 2000, Stas Bekman wrote:
> 
> > Let's assume that you have two different sets of scripts/code which
> > have little or nothing in common (different modules, no base code
> > sharing), that the basic mod_perl process is three Mbytes before any
> > code has been loaded, and that each code base adds ten Mbytes when
> > loaded. That makes each process 23 MB in size when all the code gets
> > loaded.
> 
> Can't you share most of that 23mb between the processes by pre-loading 
> the scripts/modules in your startup.pl?  I'd say the main advantage of
> engineering dissimilar services as if they were on separate servers is
> scalability rather than memory use.  When a site outgrows the hardware 
> it's on, spreading it out to multiple machines requires a lot less ankle 
> grabbing if it was designed that way to begin with. :)

Geez, I always forget something :( 

You are right. I forgot to mention that this was a scenario for the 23 Mb
of unshared memory. I just wanted to give an example. Still somehow I'm
almost sure that there are servers where even with sharing in place, the
hypothetical scenario I've presented is quite possible. 
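
For the fully unshared case, the arithmetic can be sketched as below.
The 3 MB and 10 MB figures come from the scenario; the 400 MB of RAM is
purely an assumption for illustration, as is the reading that the
alternative is two servers which each load only one code base.

```perl
use strict;
use warnings;

my $base_mb      = 3;      # bare mod_perl process (from the scenario)
my $code_base_mb = 10;     # each of the two code bases (from the scenario)
my $ram_mb       = 400;    # assumed RAM available to httpd, not from the thread

my $combined_mb = $base_mb + 2 * $code_base_mb;    # one server loads both code bases
my $split_mb    = $base_mb + $code_base_mb;        # each server loads one code base

# Whole processes that fit in RAM when nothing is shared.
my $combined_procs = int($ram_mb / $combined_mb);
my $split_procs    = int($ram_mb / $split_mb);

printf "combined: %d MB per process, %d processes\n", $combined_mb, $combined_procs;
printf "split:    %d MB per process, %d processes\n", $split_mb,    $split_procs;
```

With these inputs that is 17 processes of 23 MB against 30 processes of
13 MB. Any memory that really is shared between processes shrinks the
gap.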

Anyway, it's just another patent for squeezing some more juice from your
hardware without upgrading it.

But, sure I'll add the correction about the sharing memory which
drastically changes the story :)

Thanks, Matt!





Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-19 Thread Matt Carothers



On Tue, 18 Apr 2000, Stas Bekman wrote:

> Let's assume that you have two different sets of scripts/code which
> have little or nothing in common (different modules, no base code
> sharing), that the basic mod_perl process is three Mbytes before any
> code has been loaded, and that each code base adds ten Mbytes when
> loaded. That makes each process 23 MB in size when all the code gets
> loaded.

Can't you share most of that 23mb between the processes by pre-loading 
the scripts/modules in your startup.pl?  I'd say the main advantage of
engineering dissimilar services as if they were on separate servers is
scalability rather than memory use.  When a site outgrows the hardware 
it's on, spreading it out to multiple machines requires a lot less ankle 
grabbing if it was designed that way to begin with. :)
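
A minimal sketch of such a startup.pl, assuming it is pulled in from
httpd.conf with a PerlRequire line so that it runs once in the parent
before the children are forked. The module list is a placeholder that
uses only core modules; a real site would name its own application
modules there.

```perl
# startup.pl -- run once by the parent httpd, before it forks children
use strict;
use warnings;

# Placeholder module list (assumption): replace with the modules your
# scripts actually use, so they are compiled once and inherited by
# every child.
use POSIX ();
use Storable ();
use Data::Dumper ();

# Report what was preloaded; %INC records every module loaded so far.
my @preloaded = grep { exists $INC{$_} } qw(POSIX.pm Storable.pm Data/Dumper.pm);
print STDERR "preloaded: @preloaded\n";

1;    # a required file must return a true value
```

The empty import lists keep the modules from exporting symbols into the
startup file's namespace; the point is only to compile them in the
parent.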

- Matt




Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-18 Thread Vivek Khera

> "ELB" == Eric L Brine <[EMAIL PROTECTED]> writes:

ELB> It used to be one process for everything, or at least one application for
ELB> everything.  Then mod_perl came in and people started using a tiered
ELB> system (plain server + mod_perl server). Now you're talking about
ELB> individual application servers. Someday, maybe the script will load the
ELB> server instead of the other way around!

I think the word "FastCGI" comes to mind... ;-)



Re: [RFC] Do Not Run Everything on One mod_perl Server

2000-04-18 Thread Eric L. Brine


It used to be one process for everything, or at least one application for
everything.  Then mod_perl came in and people started using a tiered
system (plain server + mod_perl server). Now you're talking about
individual application servers. Someday, maybe the script will load the
server instead of the other way around!

  use mod_perl (8080, 3);  #  (port, #processes)

Sorry, feeling philosophical this morning. Thanks for the report.
ELB

--
Eric L. Brine  |  Chicken: The egg's way of making more eggs.
[EMAIL PROTECTED]  |  Do you always hit the nail on the thumb?
ICQ# 4629314   |  An optimist thinks thorn bushes have roses.