Re: [AOLSERVER] Roadmap - 4.6 and beyond

2012-09-28 Thread Gustaf Neumann
As a contributor to aolserver and naviserver i want to add a few comments
- we are running between 30 and 50 servers for various projects,
   i would say that 70% are naviserver right now.
- the reason we switched from aolserver to naviserver was that
   with our load-pattern (using OpenACS) we experienced some problems:
   * up to about 1000 concurrent users aolserver was perfectly fine
   * above this, we saw crashes, running out of resources (connection
     threads), memory growth, thread lockups, and micro-lockups for
     a few seconds. Some of these led to contributions to aolserver
     i did in the past
   * to pinpoint the problem we moved to Zoran's setup
     (tcl version, naviserver), which went through heavy testing
     on his side and was rock-stable
   * some of our problems disappeared/changed, some did not (burst
     creation of threads, ...). We have quite different load patterns
     than Zoran.
   * we found sources for our problems at various places (server, tcl,
     ... machine architecture), depending as well on e.g. tcl versions.

By now, most of the problems are gone; we have been using NaviServer
in production for more than two years. A summary is on the
link referenced below. Even more recently, we exchanged the
hardware for a more mainstream one (this improved the
performance by a factor of 3-4!). The fact that resource consumption
went down (memory consumption; max number of connection threads went
from 80 to 30, etc.) helped a lot to run on a much cheaper machine.

Btw., in this process of moving from POWER to Intel, apparently
the biggest source of our crashes went away. Under heavy load, the way
Tcl handles thread-local storage (Tcl 8.5) seems to cause
race conditions, which lead to crashes in otherwise stable
pieces of code (e.g. regexp). I rewrote some of the usages of the
tls infrastructure in tcl to use GCC's non-standard tls handling via
__thread; the problem then moved from regexp to other
places using tls. The problem was most likely a combination of
dirty reads in tcl + mutex handling + POWER + gcc (from rhel). Tcl 8.6 is
supposed to be better in this regard.
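
[Illustration of the __thread mechanism mentioned above. This is a minimal
sketch only, not the patch to Tcl described in this message. The names
tls_counter, worker and run_workers are hypothetical, and POSIX threads plus
a GCC-compatible compiler are assumed (build with -pthread). Each thread gets
its own copy of a __thread variable, so the per-thread work needs no lock;
only the final merge into shared state is protected by a mutex.]

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread state; every thread sees its own copy,
 * zero-initialized when the thread starts. */
static __thread long tls_counter = 0;

/* Shared state, protected by a mutex. */
static long shared_total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    long iterations;
} worker_arg;

static void *worker(void *p)
{
    const worker_arg *arg = p;

    /* No locking needed here: tls_counter is private to this thread. */
    for (long i = 0; i < arg->iterations; i++) {
        tls_counter++;
    }

    /* Merge the private result into the shared total under the lock. */
    pthread_mutex_lock(&total_lock);
    shared_total += tls_counter;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Starts nthreads workers and returns the merged total,
 * or -1 if the arguments are out of range or a thread could not start. */
long run_workers(int nthreads, long iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    worker_arg arg = { iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) {
        return -1;
    }

    shared_total = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0) {
            break;
        }
    }
    /* Join whatever was started, even on partial failure. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return (started == nthreads) ? shared_total : -1;
}
```

[Calling run_workers(4, 1000) from a main() starts four threads that each
count to 1000 in their own tls_counter before merging. The calling thread's
own copy of tls_counter is never touched.]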

For the changes in naviserver, see [1]. With the recent versions of
naviserver/tcl 8.5/libthread the server runs in a stable memory size,
without the need for daily reboots (although a reboot has some nice
self-healing properties for nsvs, etc.). See [2] for statistics from a
machine running two naviservers with different configurations (alice, nm).
Among other things one can see the stable rss and vsizes of the servers
over the last few months.

Aolserver is not so bad in terms of memory leaks either. One can see
on [3] the statistics from openacs.org and translate.openacs.org which
is running aolserver 4.5.1. One can see where we fixed an application
leak in May [4].

[1] https://bitbucket.org/naviserver/naviserver/src/5df3b1cb9ea6/NEWS
[2] http://alice.wu.ac.at/munin/wu.ac.at/alice.wu.ac.at/index.html#naviserver
[3] http://openacs.org/munin/localdomain/localhost.localdomain/index.html#naviserver
[4] http://openacs.org/munin/localdomain/localhost.localdomain/naviserver_translate_memsize.html

Concerning the comments below
- the documentation of naviserver is at least on par with aolserver
   (the man pages are quite good).
- for me, the biggest pain is the aolserver-naviserver
   config file conversion, but the actual documented
   config files on bitbucket now contain all values read from the
   server with the default values.
- porting all the changes from naviserver into aolserver is
   much more work than the other way round. i have no
   problem with the coexistence of naviserver and aolserver,
   providing urgent changes to both servers (as i have done in the past).
- both aolserver and naviserver are stable and mature (having
   advantages and disadvantages), the people running large sites
   are rather conservative. Having alternatives is rather a selling
   argument. If e.g. aolserver is dropping windows support,
   naviserver can continue it (or vice versa).

-gustaf neumann


On 27.09.12 23:25, John Buckman from BookMooch wrote:
 Naviserver has added a lot of interesting features, and appears to be fairly 
 mature.

 I would have probably switched to Naviserver two years ago if they had 
 documented some of their changes.  The quantity of the contributions, and the 
 interesting nature of many of them, make me feel that Naviserver is far from 
 end of life.

 When I switched (temporarily) to naviserver I found enough things that didn't 
 work like aolserver, yet were totally undocumented, that the experience was 
 very frustrating and I went back to aolserver.  I was spending too much time 
 reading C source code to figure things out.

 So... my personal vote for an aolserver v5 would be merging in lots of the 
 naviserver code changes into aolserver.  There's a lot of bang-for-our-buck 
 there.  Or, simply running with naviserver, if we (the aolserver community) 
 can get it 

Re: [AOLSERVER] Suggestions for a future aolserver

2012-09-28 Thread Andrew Piskorski
On Thu, Sep 27, 2012 at 02:44:16PM -0700, John Buckman wrote:

 The other area where state of the art thinking is occurring, is in
 scaling web sites to many, many machines.

Reputedly, the best toolkit for building that sort of stuff is Erlang
and its OTP libraries.  (Asynchronous message passing, hot code
reloading, fault tolerance, etc.)  I've also long wondered if its
Mnesia distributed RDBMS is any good.

I was quite happy with AOLserver, Tcl, and a decent single-box RDBMS
like Oracle or PostgreSQL.  But if I intended to build stuff for
massive scale out, ideally I'd want to first hack seriously with
Erlang for a year or so to really understand what it's good for,
what it's not, clever approaches the Erlang community came up with,
etc.  If anybody here has done something along those lines and can
report back, compare/contrast to the AOLserver / Tcl / OpenACS world,
that could be really interesting!

A maybe related approach from the enterprise software world
(probably meaning giant investment banks), is Message-Oriented
Middleware; this is also asynchronous message passing.  This guy (Kirk
Wylie, founder of OpenGamma) seems to know what he's talking about,
and recommended the Hohpe book below:

  http://kirkwylie.blogspot.com/

  Enterprise Integration Patterns: Designing, Building, and Deploying
  Messaging Solutions
  by Gregor Hohpe, Bobby Woolf
  http://www.amazon.com/Enterprise-Integration-Patterns-Designing-Deploying/dp/0321200683/

 Ousterhout recently wrote a paper about RAMCloud, which would be very helpful 
 on aolserver:
 http://cacm.acm.org/magazines/2011/7/109885-the-case-for-ramcloud/fulltext

Useful I guess, but that seems pretty low level.  I'd rather look into
shared-nothing parallel RDBMSs.  These were mostly intended for
analytic (data warehouse) loads, but there are OLTP-oriented designs
available now too (e.g. VoltDB, which also happens to be RAM-only).

I don't know how well they scale across nodes.  Years ago, I once
heard that Teradata was only intended to scale to no more than 100 or
so fat nodes.  But even if modern shared-nothing OLTP systems scale no
better, that's still about 100 times better than you could do with a
typical single server Oracle or PostgreSQL installation, which should
give you a lot of helpful leeway before you HAVE to develop and adopt
database sharding and/or more specialized tools, like the Digital City
team did.

The recent Calvin OLTP research is also interesting.  Like RAMCloud,
this is a lower-level tool, not a complete system.  Unlike RAMCloud
though, it seems like an actual tested advance in the state of the
art, not just a conceptual description of a cool piece of
infrastructure someone might want to build:

  http://dbmsmusings.blogspot.com/2012/05/if-all-these-new-dbms-technologies-are.html

-- 
Andrew Piskorski a...@piskorski.com

--
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
aolserver-talk mailing list
aolserver-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aolserver-talk


Re: [AOLSERVER] aolserver startup crash

2012-09-28 Thread Jeff Rogers
Gustaf Neumann wrote:
 On 28.09.12 12:57, Wolfgang Winkler wrote:
 Hi!
 The only problem we have is similar to John's occasional crashes. But
 I have 2 test servers and 1 pre production server, where AOLserver
 crashes with "Fatal: received fatal signal 11" or "alloc: invalid
 block: 0xc595660: b0 6", sometimes even 4 or 5 times while starting
 up. Nearly the same code works for other very similar installations
 with much higher load.
 are you using at the backend SSL?
 do you have a backtrace from gdb?
 We are using SSL a lot, but this is handled by nginx.
 should have asked more precisely ... handled in aolserver.

 These are some gdb backtraces:
 The crash happens in CallCommandTraces() ?
 which version of tcl 8.5 is this?

 Since you say, this happens during bootup, i assume this happens
 before the first request. What is your minthreads setting?

 -gustaf neumann


The stack trace shows it down inside NsConnThread, which suggests it's 
not during server startup.  The location in NsTclICtlObjCmd is in 
'ns_ictl update', in the case where the init script has been updated. 
My first guess would be some weirdness with ns_eval.

-J
