Re: [Opensim-dev] Performance optimization of complex ScienceSim regions

2009-08-31 Thread James Stallings II
wow indeed!

GO INTEL!!

On Mon, Aug 31, 2009 at 7:19 AM, Dr Scofield wrote:

>
> wow!
>
> --
> dr dirk husemann  virtual worlds research  ibm zurich research lab
> SL: dr scofield  drscofi...@xyzzyxyzzy.net  http://xyzzyxyzzy.net/
> RL: h...@zurich.ibm.com - +41 44 724 8573 - 
> http://www.zurich.ibm.com/~hud/
> ___
> Opensim-dev mailing list
> Opensim-dev@lists.berlios.de
> https://lists.berlios.de/mailman/listinfo/opensim-dev
>



-- 
===
http://osgrid.org
http://del.icio.us/SPQR
http://twitter.com/jstallings2
http://www.linkedin.com/pub/5/770/a49


Re: [Opensim-dev] Performance optimization of complex ScienceSim regions

2009-08-31 Thread Dr Scofield


wow!



[Opensim-dev] Performance optimization of complex ScienceSim regions

2009-08-31 Thread Lake, Dan
A few months back, we analyzed and proposed optimizations to scripting and 
timers on homogeneous regions that were created dynamically with up to 40,000 
simple cubes and physics disabled. We achieved considerable reductions in scene 
creation time and CPU utilization.

The regions running on ScienceSim at this time have few scripts (fewer than 1% 
of objects), have large linked sets, are loaded at startup from a database, 
and most have ODE physics enabled although very few objects are physical. This 
represents a completely different workload for OpenSim from the one in our 
previous analysis. Some of these ScienceSim regions are extremely complex, with 
between 60,000 and 140,000 prims. We have noticed that startup on these regions 
can take 45 minutes or more, and that once they reach a steady state with no 
users connected, they consume 50% of a CPU. We did not expect such high 
utilization, since script counts were below 200 and no users were connected.

We have identified 3 areas of optimization.

1. On startup, the region must be loaded from the database and all region 
modules must be started to prepare the region to run. On the largest ScienceSim 
regions, this step takes 20 minutes, after which the command prompt appears. We 
refer to this phase as "startup time".

2. The appearance of the OpenSim command prompt indicates that the Heartbeat 
thread has started. Commands such as "create user" or "show stats" can be 
issued, but the Heartbeat thread itself remains in its first "beat" for up to 
40 more minutes. During this time, users cannot connect, and the stats are all 
listed as 0 and do not update. We refer to this phase as "first heartbeat 
time".

3. Once the region has completely started up, but before any users have 
connected, we notice that CPU utilization seems unusually high for the amount 
of "action" in the scene. Fewer than 200 scripted or physical objects should 
not represent a high load, yet the 140,000 static prims somehow consumed 50% 
of a CPU.

Analysis

Startup

During startup, we identified two issues, and we provide patches that reduce a 
20-minute startup to 4 minutes 30 seconds.

First, the LoadObjects function queries the database separately for each prim 
in the region to determine whether it has any inventory (primitems). This 
results in 140,000 unnecessary database queries, since none of these prims have 
any inventory. We replaced these with a single query that returns only the 
region prims that have inventory, and we then request inventory only for the 
prims we know have items. The patch is for MySQL only.
0001-LoadItems-from-DB-only-for-prims-with-inventory.patch
MANTIS: http://opensimulator.org/mantis/view.php?id=4077
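To make the query-count difference concrete, here is a minimal Python/sqlite3 
sketch of the two access patterns. It is not the patch's code: the schema, 
table columns, and function names are hypothetical simplifications, not 
OpenSim's actual MySQL schema. It counts inventory queries for a region where 
only 3 of 1,000 prims have items.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real OpenSim
# MySQL "prims"/"primitems" tables have many more (and differently named) columns.
SCHEMA = """
CREATE TABLE prims (uuid TEXT PRIMARY KEY, region_uuid TEXT);
CREATE TABLE primitems (item_id TEXT PRIMARY KEY, prim_id TEXT, name TEXT);
CREATE INDEX primitems_prim ON primitems (prim_id);
"""

ITEMS_SQL = "SELECT item_id, name FROM primitems WHERE prim_id = ? ORDER BY item_id"


def load_items_per_prim(conn, prim_ids):
    """Original pattern: one inventory query for every prim (N queries)."""
    items, queries = {}, 0
    for prim_id in prim_ids:
        rows = conn.execute(ITEMS_SQL, (prim_id,)).fetchall()
        queries += 1
        if rows:
            items[prim_id] = rows
    return items, queries


def load_items_for_prims_with_inventory(conn, region_uuid):
    """Patched pattern: one query to find prims that have inventory,
    then one inventory query per such prim (1 + K queries)."""
    with_inventory = conn.execute(
        "SELECT DISTINCT p.uuid FROM prims p "
        "JOIN primitems i ON i.prim_id = p.uuid "
        "WHERE p.region_uuid = ?",
        (region_uuid,),
    ).fetchall()
    items, queries = {}, 1
    for (prim_id,) in with_inventory:
        items[prim_id] = conn.execute(ITEMS_SQL, (prim_id,)).fetchall()
        queries += 1
    return items, queries


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
region = "region-1"
prim_ids = [f"prim-{i}" for i in range(1000)]
conn.executemany("INSERT INTO prims VALUES (?, ?)", [(p, region) for p in prim_ids])
# Only 3 of the 1,000 prims carry an inventory item (e.g. a script).
conn.executemany(
    "INSERT INTO primitems VALUES (?, ?, ?)",
    [(f"item-{i}", f"prim-{i}", "script") for i in range(3)],
)

naive_items, naive_queries = load_items_per_prim(conn, prim_ids)
fast_items, fast_queries = load_items_for_prims_with_inventory(conn, region)
print(naive_queries, fast_queries)  # 1000 4
```

Both functions return the same inventory; only the number of round trips 
differs. The per-prim inventory request is kept in the second function to 
mirror the description above rather than folding everything into one join.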

Also during startup, much time is spent in linear searches through a List in 
ODE that is used to taint each object as it is added to the physics scene. Each 
search is O(n), so adding n objects costs O(n^2) overall. We replaced the List 
with a HashSet and added System.Core to prebuild.xml.
0001-Optimize-startup.-ODE-taint-list-changed-from-List-w.patch
MANTIS: http://opensimulator.org/mantis/view.php?id=4078
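The effect of swapping the List for a HashSet can be sketched in Python, where 
a list membership test scans every element and a set lookup hashes instead. 
(In .NET 3.5, HashSet<T> ships in the System.Core assembly, which is why that 
reference is needed.) The Prim class and the taint functions below are 
hypothetical stand-ins, not OpenSim's ODE plugin; the counter only shows how 
many equality checks each approach performs.

```python
class Prim:
    """Hypothetical stand-in for an ODE prim that counts equality checks."""

    eq_calls = 0  # shared counter, reset by the caller before each run

    def __init__(self, local_id):
        self.local_id = local_id

    def __eq__(self, other):
        Prim.eq_calls += 1
        return isinstance(other, Prim) and self.local_id == other.local_id

    def __hash__(self):
        return hash(self.local_id)


def taint_with_list(prims):
    """List-based taint: every membership test scans the list (O(n) per add)."""
    tainted = []
    for prim in prims:
        if prim not in tainted:
            tainted.append(prim)
    return tainted


def taint_with_set(prims):
    """Set-based taint, analogous to HashSet<T>: O(1) average per add."""
    tainted = set()
    for prim in prims:
        tainted.add(prim)  # no-op if the prim is already tainted
    return tainted


n = 1000
prims = [Prim(i) for i in range(n)]

Prim.eq_calls = 0
listed = taint_with_list(prims)
list_calls = Prim.eq_calls  # n * (n - 1) / 2 comparisons for n distinct prims

Prim.eq_calls = 0
hashed = taint_with_set(prims)
set_calls = Prim.eq_calls  # all hashes differ, so no equality checks are needed

print(list_calls, set_calls)  # 499500 0
```

Both versions deduplicate correctly; the list version does quadratic work as 
the scene grows, while the set version stays roughly linear.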

First Heartbeat

During the first heartbeat, most cycles were again spent in ODE, running some 
seemingly simple mesh functions that convert lists of vertices and triangles 
into lists of indices. These functions used the O(n) IndexOf method on lists of 
vertices, some with thousands of entries. For only 300 prims, 78 million calls 
were made to Object::Equals(Vertex, Vertex). We replaced the public lists with 
private Dictionaries and encapsulated the functionality within Mesh. 
Meshmerizer now adds only Triangles to a Mesh, no longer Vertexes.
0001-Optimized-ODE-initialization-of-meshed-by-changing-v.patch
MANTIS: http://opensimulator.org/mantis/view.php?id=4079
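The change from IndexOf on a vertex list to a dictionary lookup can be 
illustrated with a small Python sketch. Vertices here are plain coordinate 
tuples (an assumption; OpenSim's Vertex is a class with its own Equals), and 
both function names are hypothetical. The sketch shows the technique, not the 
patch's Mesh code.

```python
def index_with_list(triangles):
    """List-based pattern: membership test and index() both scan the list."""
    vertices, indices = [], []
    for tri in triangles:
        for v in tri:
            if v not in vertices:  # O(n) scan
                vertices.append(v)
            indices.append(vertices.index(v))  # another O(n) scan
    return vertices, indices


def index_with_dict(triangles):
    """Dictionary-based pattern: a vertex -> index map gives O(1) average lookups."""
    vertex_index = {}
    indices = []
    for tri in triangles:
        for v in tri:
            # The first sighting of v gets the next free index (the current size).
            indices.append(vertex_index.setdefault(v, len(vertex_index)))
    vertices = list(vertex_index)  # dicts keep insertion order (Python 3.7+)
    return vertices, indices


# Two triangles forming a unit square; they share two vertices.
quad = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]
print(index_with_dict(quad)[1])  # [0, 1, 2, 1, 3, 2]
```

Both functions produce the same vertex and index lists; with thousands of 
vertices per mesh, the list version's repeated scans account for the flood of 
Equals calls described above.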

We found that in ODEPrim.cs there is a Sleep(10) statement during the meshing 
of each prim. When we remove this sleep, ODE behaves erratically. In the best 
case, first heartbeat time is reduced by 75%; in the worst case, ODE crashes or 
consumes all available cores until the process is killed. This patch is 
experimental and not recommended for commit. The Sleep was added in February 
2008 by Teravus to handle a race condition. We propose that if the race 
condition can be identified and fixed so that the Sleep can be removed, first 
heartbeat performance of ODE may improve considerably.
0001-Optimize-ODE-mesh-by-removing-sleep.-On-a-region-wit.patch
MANTIS: http://opensimulator.org/mantis/view.php?id=4080
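As a rough, hypothetical upper bound, assume that Sleep(10) is a 10 ms sleep 
and that every one of the 140,000 prims is meshed once, sleeping once, on a 
single thread (the text above does not state this). The sleeps alone would then 
add:

```latex
T_{\text{sleep}} \le N_{\text{prims}} \times 10\,\text{ms}
  = 140{,}000 \times 0.010\,\text{s} = 1{,}400\,\text{s} \approx 23.3\,\text{min}
```

The real figure depends on how many prims are actually meshed and whether 
meshing overlaps with other work.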

Steady State

We identified 3 issues in the steady state for a region with 140,000 prims, 
and we submit patches for 2 of them here. During each heartbeat, the 
UpdateEntityMovement function is called on SceneGraph, which calls 
UpdateMovement on every Entity in the scene. For SceneObjectGroups this 
function is empty, but the call cannot be inlined, so a call to an empty method 
is set up almost 1 million times per second. By eliminating the empty calls and 
instead calling UpdateMovement only for ScenePresences, we reduced steady-state 
CPU utilization by 35%.
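The heartbeat change can be sketched in Python as two ways of walking the 
scene. All classes and method names below are hypothetical snake_case 
analogues of SceneGraph, SceneObjectGroup, and ScenePresence, and the sketch 
assumes the graph keeps presences in their own collection. Python's call 
overhead differs from .NET's, so this only shows how many calls each pattern 
makes per heartbeat.

```python
class EntityBase:
    """Hypothetical analogue of an OpenSim entity."""

    def update_movement(self):
        pass  # empty by default, as described for SceneObjectGroup


class SceneObjectGroup(EntityBase):
    pass  # inherits the empty update_movement


class ScenePresence(EntityBase):
    def __init__(self):
        self.updates = 0

    def update_movement(self):
        self.updates += 1  # placeholder for real avatar movement work


class SceneGraph:
    """Hypothetical scene graph that also tracks presences separately."""

    def __init__(self):
        self.entities = []
        self.presences = []
        self.calls = 0  # number of update_movement calls made

    def add(self, entity):
        self.entities.append(entity)
        if isinstance(entity, ScenePresence):
            self.presences.append(entity)

    def update_entity_movement_all(self):
        # Original pattern: visit every entity, even though most calls do nothing.
        for entity in self.entities:
            self.calls += 1
            entity.update_movement()

    def update_entity_movement_presences(self):
        # Optimized pattern: only presences have movement to update.
        for presence in self.presences:
            self.calls += 1
            presence.update_movement()


graph = SceneGraph()
for _ in range(10000):
    graph.add(SceneObjectGroup())
avatars = [ScenePresence(), ScenePresence()]
for avatar in avatars:
    graph.add(avatar)

graph.update_entity_movement_all()
calls_all = graph.calls  # 10002
graph.calls = 0
graph.update_entity_movement_presences()
calls_presences = graph.calls  # 2
```

Both passes do the same useful work for the avatars; the first simply adds one 
empty call per object group on every heartbeat.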
0001-Optimize-Heartbeat-