On Thu, Feb 18, 2010 at 07:05:47PM +0100, Stefan de Konink wrote:
> Op 18-02-10 18:23, Stefan Manegold schreef:
> >On Thu, Feb 18, 2010 at 05:46:31PM +0100, Stefan de Konink wrote:
> >>Attached is a trace of a single column select operation, with a max,
> >>avg and equality.
> >>
> >>The equality takes about 4 times more time than the avg operation.
> >>
> >> From the plan I see that the operation is executed paralel, but I do
> >>wonder why suddenly the uselect takes such a high amount of time.
> >
> >Which version of MonetDB?
> 
> CVS HEAD, 2 am.
> 
> >How configured / compiled?
> 
> --enable-assert --enable-debug
> 
> >How was the mserver5 started (which command line options were given)?
> 
> /opt/monetdb-head/bin/mserver5 --dbfarm=/home/skinkie/monetb
> --dbname=kvk --dbinit="include sphinx; include sql;" --set
> mapi_port=50001
> 
> >Any changes compared to the default monetdb5.conf?
> 
> No, not changed anything in there.
> 
> >Is the "kvk" column sorted?
> 
> No, it does include a partial sort (msb are sorted). A 32bit column
> (kvks) that is (or either should be...) ends up with between 170ms ~
> 220ms.
> 
> >Could you run the select query in the MAL debugger
> >(DEBUG select kvk from kvk where kvk = 412657690010;),
> >single step through it,
> >set the debug mask to 2097152
> >(debug 2097152)
> >just before the algebra.uselect() is executed,
> >and report the output on the server console after the algebra.uselect() has
> >been executed?
> 
> Do you mean this?

indeed

> mdb>#    _44 := 
> algebra.uselect(_40=<tmp_5710>:bat[:oid,:lng][1196532],A0=412657690010:lng);
> mdb>
> #BAT_select_(b=tmp_5710): sampling: tmp1 = BATslice(b=tmp_5710, _lo=598266, 
> _hi=598371);
> #BAT_select_(b=tmp_5710): sampling: tmp2 = BAT_select_(tmp1=tmp_5704, tl, th, 
> tail);
> #BAT_select_(b=tmp_5704): BAT_scanselect(b=tmp_5704, bn=tmp_5703, tl, th, 
> equi=1, nequi=0, lval=1, hval=1, nocheck=1);
> #seqscan_eq_lng_void_tloc_oid_vid_nocheck_noinc[if ( simple_EQ(tl ,v,lng) 
> ),v,oid_ctr,oid_ctr++;](b=tmp_5704, bn=tmp_5703, tl, th, oid_ctr=598266, 
> str_idx=0);
> #BAT_select_(b=tmp_5704): tmp_5703: hkey=1, tkey=0, hsorted=139659451564097, 
> tsorted=0.
> #BAT_select_(b=tmp_5710): BAT_hashselect(b=tmp_5710, bn=tmp_5703, tl); 
> (building hash-table on the fly)
                            
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

"It" (in fact we) choose to do a hash select, and since there is no hash
table, yet, we need to build it, which is infact more expensive than a
simple scan select for this very operation (later operation *might* then
benefit from the hash table ...):
========
$ grep -9n --color 'building hash-table on the fly' MonetDB/src/gdk/gdk_batop.mx
1210-           int nocheck = (estimate >= batcnt);
1211-
1212-           if (!preserve_order && equi && b->T->hash) {
1213-                   ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): 
BAT_hashselect(b=%s, bn=%s, tl); (using existing hash-table)\n", BATgetId(b), 
BATgetId(b), BATgetId(bn));
1214-
1215-                   bn = BAT_hashselect(b, bn, tl);
1216-           } else if (!preserve_order && equi && ATOMsize(b->ttype) > 1 && 
estimate * 100 < batcnt && batcnt * 2 * sizeof(int) < (GDK_mem_maxsize / 4)) {
1217-                   /* Build a hash-table on the fly for equi-select if the 
selectivity is low
1218-                    * and it is not too big */
1219:                   ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): 
BAT_hashselect(b=%s, bn=%s, tl); (building hash-table on the fly)\n", 
BATgetId(b), BATgetId(b), BATgetId(bn));
1220-
1221-                   bn = BAT_hashselect(b, bn, tl);
1222-           } else {
1223-                   ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): 
BAT_scanselect(b=%s, bn=%s, tl, th, equi=%d, nequi=%d, lval=%d, hval=%d, 
nocheck=%d);\n", BATgetId(b), BATgetId(b), BATgetId(bn), equi, nequi, lval, 
hval, nocheck);
1224-
1225-                   bn = BAT_scanselect(b, bn, tl, th, li, hi, equi, nequi, 
lval, hval, nocheck);
1226-           }
1227-   }
1228-   if (bn == NULL) {
========

In fact, I doubt whether investing in building a hash table is a good
decision in such cases where we do/can not know whether it will ever pay off
...

Hence, I'd propose to simply drop the choise to build a hash table on the
fly, and rather fall through to the basic scan select also in this (rare?) case.

For now, you can just locally disable/remove that alternative in the above code,
try again, and report the result.

Stefan


> #BAThash: create hash(1196532);
> #BAT_select_(b=tmp_5710): tmp_5703: hkey=1, tkey=0, hsorted=139659451564032, 
> tsorted=139659451564032.

-- 
| Dr. Stefan Manegold | mailto:[email protected] |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |

------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Monetdb-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-developers

Reply via email to