Hi Martin and others,

I just tested what part the Pathfinder code generation plays and generated MIL code for the Aug2008 (0.24), the Nov2008, and the Feb2009 release branches. I ran all queries using the newest stable version (Feb2009) on Mac OS X.

The observations are:
* The problem with gdk_heap.mx, mmap, and Mac OS X still persists (all queries run in 10 seconds instead of 2 seconds). Peter knows what I'm talking about.
* As Nils reported, the queries are getting slower.
* The main performance decrease in my scenario is the document loading.
* The problem does not stem from Pathfinder's MIL code generation.

For more details see the attached file...

q0.aug2008.out         |q0.nov2008.out         |q0.feb2009.out         
Shred     472.523 msec |Shred     232.504 msec |Shred     229.330 msec 
Query    8910.591 msec |Query    7416.427 msec |Query    7328.854 msec 
Print       1.112 msec |Print       0.375 msec |Print       0.502 msec 
                       |                       |                       
Shred    5714.028 msec |Shred    5618.727 msec |Shred    5754.495 msec 
Query    8440.298 msec |Query   12710.321 msec |Query    7253.548 msec 
Print       0.298 msec |Print       0.294 msec |Print       0.275 msec 
                       |                       |                       
Shred    9539.667 msec |Shred   16729.746 msec |Shred    9003.976 msec 
Query   10319.638 msec |Query   10528.145 msec |Query    8899.959 msec 
Print       0.603 msec |Print       0.307 msec |Print       0.829 msec 
                       |                       |                       
Shred   11082.784 msec |Shred   10780.859 msec |Shred   10527.369 msec 
Query   10123.990 msec |Query    9661.684 msec |Query    9794.755 msec 
Print       0.378 msec |Print       0.295 msec |Print       0.292 msec 
                       |                       |                       
Shred   10419.874 msec |Shred   10272.143 msec |Shred    9725.076 msec 
Query    9559.761 msec |Query    9089.654 msec |Query    9240.117 msec 
Print       0.359 msec |Print       0.299 msec |Print       0.314 msec 


q1.aug2008.out         |q1.nov2008.out         |q1.feb2009.out         
Shred     399.395 msec |Shred     396.367 msec |Shred     388.671 msec
Query   11097.419 msec |Query    9754.142 msec |Query   11086.588 msec 
Print       0.657 msec |Print       0.560 msec |Print       0.959 msec 
                       |                       |
Shred    5746.966 msec |Shred    6401.255 msec |Shred    5735.158 msec
Query   10093.130 msec |Query   10372.052 msec |Query   11100.466 msec
Print       0.341 msec |Print       0.803 msec |Print       0.519 msec
                       |                       |
Shred   10141.365 msec |Shred   10549.271 msec |Shred    9458.204 msec
Query   11842.561 msec |Query   12012.305 msec |Query   12793.418 msec
Print       0.304 msec |Print       0.312 msec |Print       0.284 msec
                       |                       |
Shred   10849.639 msec |Shred   10504.439 msec |Shred   11353.584 msec
Query   12063.871 msec |Query   11690.745 msec |Query   10453.142 msec
Print       0.319 msec |Print       0.767 msec |Print       0.283 msec
                       |                       |
Shred    9661.377 msec |Shred   10630.150 msec |Shred   10004.209 msec
Query   11429.812 msec |Query   10841.888 msec |Query    9518.676 msec
Print       0.390 msec |Print       0.333 msec |Print       0.284 msec
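
To compare the columns above without eyeballing them, the per-phase timings can be averaged. The following is only a minimal sketch: it assumes each .out file contains lines of the form "Shred 472.523 msec" as shown in the tables, and the helper name summarize is made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as "Shred     472.523 msec" (format assumed from the tables above).
TIMING_LINE = re.compile(r"^(Shred|Query|Print)\s+([0-9.]+)\s+msec")

def summarize(text):
    """Return the mean time in msec per phase (Shred, Query, Print)."""
    samples = defaultdict(list)
    for line in text.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match:
            samples[match.group(1)].append(float(match.group(2)))
    return {phase: mean(values) for phase, values in samples.items()}

if __name__ == "__main__":
    # First two runs of q0.feb2009.out, copied from the table above.
    example = """
    Shred     229.330 msec
    Query    7328.854 msec
    Print       0.502 msec

    Shred    5754.495 msec
    Query    7253.548 msec
    Print       0.275 msec
    """
    for phase, avg in summarize(example).items():
        print(f"{phase:<6}{avg:12.3f} msec")
```

Note that a plain mean hides the jump in Shred time between the first and the later runs, so the per-run numbers remain worth looking at.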


BTW: For today's HEAD version the results are even worse...

Jan

On Mar 9, 2009, at 18:08, Martin Kersten wrote:

For all interested. Indeed there are performance differences
between the various releases. Some can be traced back to
functional enhancements, others are a result of internal
administrative activities.

Recent experiments with the TPC-H scale-factor 2 on Feb 2009
branch show a performance degradation compared to Aug 2008,
as reported on the website.

It appears that some low-level actions related to allocation
of BATs and their management in memory-scarce situations are
to blame for this situation.

Solutions are integrated with the HEAD, and may (depending
on our resources) be backported into a bugfix release
of the Feb 2009 version.

Nils Grimsmo wrote:
On Wed, Mar 04, 2009 at 11:08:40PM +0100, Jan Rittinger wrote:
Hi Nils,

I just ran your queries with the latest (not yet announced) Feb2009
release (http://monetdb.cwi.nl/downloads/sources/Feb2009/) and
received an answer in 1.5 (Q1) and 2.5 (Q2) seconds. If you still have
problems with the new version, then please let us know.

Thank you for your answer, Jan. Feb2009 is indeed faster than Nov2008,
but on my computer it is still slower than Aug2008.  I also see some
strange and unfavorable performance characteristics on subsequent queries
for Nov2008 and Feb2009 (see below).


Aug2008:
# MonetDB Server v4.24.0
# based on GDK   v1.24.0
# PF/Tijah module v0.5.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.24.0 loaded (default back-end is 'algebra')

Nov2008-SP2:
# MonetDB Server v4.26.4
# based on GDK   v1.26.4
# PF/Tijah module v0.9.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.26.4 loaded (default back-end is 'algebra')

Feb2009:
# MonetDB Server v4.28.0
# Based on GDK   v1.28.0
# PF/Tijah module v0.9.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.28.0 loaded (default back-end is 'algebra')


I run the queries multiple times in different scenarios.

A - Have just indexed the document, first run.
B - Second run (subsequent have similar timing).
C - Restart the server (Mserver), then first run.
D - Second run (subsequent have similar timing).


Query Q0 (times in milliseconds):
    Aug2008    Nov2008    Feb2009
A       1101       3687       1760
B       1031       4510       3015
C       1350       5216       3390
D       1035      12620       9533


Query Q1 (times in milliseconds):
    Aug2008    Nov2008    Feb2009
A       2161      15119       3013
B       2099      19292       4072
C       2526      18523       4567
D       2117      42555      10602
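
The slowdown relative to Aug2008 can be computed directly from the two tables above. This is only a sketch: the numbers are copied from the tables and taken to be milliseconds, and the names TIMINGS and slowdown are made up for illustration.

```python
# Timings copied from the Q0 and Q1 tables above (assumed to be milliseconds).
# Column order: Aug2008, Nov2008, Feb2009.
RELEASES = ("Aug2008", "Nov2008", "Feb2009")
TIMINGS = {
    "Q0": {"A": (1101, 3687, 1760), "B": (1031, 4510, 3015),
           "C": (1350, 5216, 3390), "D": (1035, 12620, 9533)},
    "Q1": {"A": (2161, 15119, 3013), "B": (2099, 19292, 4072),
           "C": (2526, 18523, 4567), "D": (2117, 42555, 10602)},
}

def slowdown(query, scenario):
    """Return each release's time divided by the Aug2008 time."""
    row = TIMINGS[query][scenario]
    baseline = row[0]
    return {release: t / baseline for release, t in zip(RELEASES, row)}

if __name__ == "__main__":
    for query, scenarios in TIMINGS.items():
        for scenario in scenarios:
            ratios = slowdown(query, scenario)
            cells = "  ".join(f"{r}: {ratios[r]:5.1f}x" for r in RELEASES)
            print(f"{query} {scenario}  {cells}")
```

Expressed this way, scenario D stands out in both newer releases, which matches the observation that the second run after a restart is the slowest.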


This seems very strange to me. The timings make sense for Aug2008, where
the query is slightly slower right after restarting the server (C). For
Nov2008 and Feb2009, the second (and subsequent) runs are slower than the
first.  How can this be?  It can make sense for the first run after
restarting the server (C) to be slower (reading stuff from disk etc.),
but why is the second (D) terribly slower? If I just keep running the
query, the timings are similar to D.

Note: If I start mixing Q0 and Q1 after step D, they are both as slow as
in step D.


I hope this feedback is helpful.  Is there something strange with my
setup, or is this a "bug"? (My timings in step (A) seem similar to Jan's
timings).


If I want to compare MonetDB/XQuery to other implementations in a
scientific paper, I typically want to warm up the system, then run the
query multiple times to get an average timing. It is kind of inconvenient
not to be able to close down Mserver between experiments...
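
The warm-up-then-average procedure described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested benchmark setup: it measures wall-clock time around the client process (so it includes client start-up and result printing), it uses the mclient invocation quoted elsewhere in this thread, and the function name time_query and its parameters are made up for illustration.

```python
import statistics
import subprocess
import time

# Client invocation as quoted in this thread; the query is fed on stdin.
MCLIENT_CMD = ["mclient", "--language=xquery", "--algebra", "--time"]

def time_query(query_text, cmd=MCLIENT_CMD, warmup=2, runs=5):
    """Run the query `warmup` times untimed, then `runs` times timed.

    Returns wall-clock statistics in milliseconds.
    """
    def run_once():
        start = time.perf_counter()
        subprocess.run(cmd, input=query_text, text=True,
                       capture_output=True, check=True)
        return (time.perf_counter() - start) * 1000.0

    for _ in range(warmup):
        run_once()  # result discarded: only warms caches

    samples = [run_once() for _ in range(runs)]
    return {
        "mean_ms": statistics.mean(samples),
        "stdev_ms": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "samples_ms": samples,
    }

if __name__ == "__main__":
    q0 = 'count(/dblp//author[text()="Michael Stonebraker"])'
    print(time_query(q0))
```

Keeping the individual samples next to the mean makes a pattern like the A/B/C/D one visible instead of averaging it away.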


P.S.: The E-Mail subject seems slightly off topic here :)

Yes, thought I'd avoid touching the mouse to copy the email address. Cut
away In-Reply-To:, but forgot to change Subject:...


Thank you for your assistance!


Klem fra Nils

On Mar 4, 2009, at 16:30, Nils Grimsmo wrote:

Hi, I just upgraded from the August to the November super-ball, and the
performance has worsened badly.

Example queries on dblp.xml (441 MB):

Q0: count(/dblp//author[text()="Michael Stonebraker"])
Q1: count(/dblp/*/author[text()="Michael Stonebraker"])

Query time in milliseconds:

     August    November
Q0     1100        4867
Q1     3993       17999

I have compiled with --enable-optimise both times.  I query with:

mclient --language=xquery --algebra --time < $QUERYFILE

Is this performance degradation expected?  If so, why?


BTW:  Is there any way of finding how much disk space a collection
uses?


Thank you for contributing free software!


Klem fra Nils

--
Jan Rittinger
Lehrstuhl Datenbanken und Informationssysteme
Wilhelm-Schickard-Institut für Informatik
Eberhard-Karls-Universität Tübingen

http://www-db.informatik.uni-tuebingen.de/team/rittinger
