Re: [GENERAL] Weird behaviour in planner (PostgreSQL v 9.2.14)
Maxim Boguk writes: > [ planner changes behavior when a VALUES RTE reaches 200 elements ] The immediate cause of that is that, lacking any real statistics for the VALUES RTE, eqjoinsel_semi() will fall back to a rather dubious default estimate if it believes it's looking at a default estimate for the number of distinct entries on either side of the join clause: /* * Without MCV lists for both sides, we can only use the heuristic * about nd1 vs nd2. */ doublenullfrac1 = stats1 ? stats1->stanullfrac : 0.0; if (!isdefault1 && !isdefault2) { if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; else selec = (nd2 / nd1) * (1.0 - nullfrac1); } else ===>selec = 0.5 * (1.0 - nullfrac1); And get_variable_numdistinct() changes its mind about whether it's issuing a default estimate when the VALUES size reaches DEFAULT_NUM_DISTINCT (200): /* * With no data, estimate ndistinct = ntuples if the table is small, else * use default. We use DEFAULT_NUM_DISTINCT as the cutoff for "small" so * that the behavior isn't discontinuous. */ if (ntuples < DEFAULT_NUM_DISTINCT) return clamp_row_est(ntuples); *isdefault = true; return DEFAULT_NUM_DISTINCT; So basically, although this is alleged to be continuous behavior, the changeover from isdefault = false to isdefault = true causes a huge change in the result from eqjoinsel_semi. There are a couple of things we might choose to do about this: 1. get_variable_numdistinct doesn't currently pay any attention to what kind of RTE it's considering, but it does have access to that. We could teach it to assume that if the Var comes from a VALUES RTE, the values are all distinct regardless of the length of the VALUES list. That's effectively what it assumes for all VALUES of < 200 elements today, and it's not apparent why we shouldn't make the same assumption for longer lists. Or we could adopt some sort of nonlinear behavior that gradually reduces the assumed stadistinct fraction, but I don't know any rule for that that wouldn't be pretty ad-hoc. 2. We could change the default don't-know-anything selectivity estimate in eqjoinsel_semi to be something less crude than 0.5. But again, it's hard to say what to use instead. The first of these ideas might be something that would be sane to back-patch, but I'd be pretty hesitant about back-patching anything along the lines of #2; the scope of the effects is hard to predict. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Weird behaviour on a join with multiple keys
Charlie Clark <[EMAIL PROTECTED]> writes: > Am 09.03.2007 um 16:15 schrieb Tom Lane: >> There's your problem right there. The string comparison routines are >> built on strcoll(), which is going to expect UTF8-encoded data because >> of the LC_COLLATE setting. If there are any high-bit-set LATIN1 >> characters in the database, they will most likely look like invalid >> encoding to strcoll(), and on most platforms that causes it to behave >> very oddly. You need to keep lc_collate (and lc_ctype) in sync with >> server_encoding. > That does indeed seem to have been the problem even though the > examples I was looking at were all using plain ASCII characters. Glad > to know it wasn't a bug and to have learned something new. Well, it *is* a bug: we really shouldn't let you select incompatible locale and encoding settings. This gotcha has been known for a long time, but it's not clear that there's a bulletproof, portable way to determine which encoding a particular locale setting implies ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Weird behaviour on a join with multiple keys
Am 09.03.2007 um 16:15 schrieb Tom Lane: psytec=# show lc_collate; lc_collate - de_DE.UTF-8 (1 row) psytec=# show server_encoding; server_encoding - LATIN1 (1 row) There's your problem right there. The string comparison routines are built on strcoll(), which is going to expect UTF8-encoded data because of the LC_COLLATE setting. If there are any high-bit-set LATIN1 characters in the database, they will most likely look like invalid encoding to strcoll(), and on most platforms that causes it to behave very oddly. You need to keep lc_collate (and lc_ctype) in sync with server_encoding. That does indeed seem to have been the problem even though the examples I was looking at were all using plain ASCII characters. Glad to know it wasn't a bug and to have learned something new. Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Weird behaviour on a join with multiple keys
Charlie Clark <[EMAIL PROTECTED]> writes: > psytec=# show lc_collate; > lc_collate > - > de_DE.UTF-8 > (1 row) > psytec=# show server_encoding; > server_encoding > - > LATIN1 > (1 row) There's your problem right there. The string comparison routines are built on strcoll(), which is going to expect UTF8-encoded data because of the LC_COLLATE setting. If there are any high-bit-set LATIN1 characters in the database, they will most likely look like invalid encoding to strcoll(), and on most platforms that causes it to behave very oddly. You need to keep lc_collate (and lc_ctype) in sync with server_encoding. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Weird behaviour on a join with multiple keys
Am 09.03.2007 um 05:30 schrieb Tom Lane: Charlie Clark <[EMAIL PROTECTED]> writes: I'm getting unexpected results on a query which involves joining two tables on two common variables (firstname and lastname). That looks like it should work. Given that you describe the columns as "names" I'm supposing they are of textual datatypes. Maybe you have a messed-up encoding or locale situation that is causing the sorts to not work properly? What PG version is this exactly, on what platform, and what do "show lc_collate" and "show server_encoding" say? I'm running PostgreSQL 8.1.4 on Mac OS X psytec=# show lc_collate; lc_collate - de_DE.UTF-8 (1 row) psytec=# show server_encoding; server_encoding - LATIN1 (1 row) I thought that it might be something to do with the encoding - one of the tables has just been imported and I had some "fun" doing that but it "looks" okay now. Is there a way of checking? Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Weird behaviour on a join with multiple keys
Charlie Clark <[EMAIL PROTECTED]> writes: > I'm getting unexpected results on a query which involves joining two > tables on two common variables (firstname and lastname). That looks like it should work. Given that you describe the columns as "names" I'm supposing they are of textual datatypes. Maybe you have a messed-up encoding or locale situation that is causing the sorts to not work properly? What PG version is this exactly, on what platform, and what do "show lc_collate" and "show server_encoding" say? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Weird behaviour on a join with multiple keys
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/08/07 16:09, Charlie Clark wrote: > Hi, > > I'm getting unexpected results on a query which involves joining two > tables on two common variables (firstname and lastname). > > This is the basic query: > > SELECT table1.lastname, table1.firstname > FROM table1 > INNER JOIN table2 ON > (table2.name = table1.name > AND > table2.vorname = table1.vorname) > > This is returning many rows fewer than I expect and is ignoring a lot > where table1.firstname = table2.firstname AND table1.lastname = > table2.lastname. Huh? Why should you? You're not joining on firstname and lastname. What happens if you do it like this: SELECT T1.LASTNAME, T2.FIRSTNAME FROM TABLE1 T1, TABLE2 T2 WHERE T1.NAME = T2.NAME AND T1.VORNAME = T2.VORNAME; -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFF8N1zS9HxQb37XmcRAp3wAKCRJ1kuoqbc8YPOZwx+53+JRqvD/ACfVvFy zK8u0+RYuMiBxEnURVc74Jc= =9oxj -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Weird behaviour on a join with multiple keys
What happens if you do an outer join instead of an inner join? Charlie Clark wrote: Hi, I'm getting unexpected results on a query which involves joining two tables on two common variables (firstname and lastname). This is the basic query: SELECT table1.lastname, table1.firstname FROM table1 INNER JOIN table2 ON (table2.name = table1.name AND table2.vorname = table1.vorname) This is returning many rows fewer than I expect and is ignoring a lot where table1.firstname = table2.firstname AND table1.lastname = table2.lastname. In fact when I extend the query by a WHERE clause such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are not returned by the original query. I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN does not seem, to me at least, to provide an explanation for the missing results. "Merge Join (cost=1987.97..2121.24 rows=34 width=22)" " Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND ("outer"."?column4?" = "inner"."?column4?"))" " -> Sort (cost=364.97..375.99 rows=4409 width=22)" "Sort Key: (table1.lastname)::text, (table1.firstname)::text" "-> Seq Scan on table1 (cost=0.00..98.09 rows=4409 width=22)" " -> Sort (cost=1623.00..1667.00 rows=17599 width=21)" "Sort Key: (table2.lastname)::text, (table2.firstname)::text" "-> Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)" Am I missing something big and obvious here? Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] weird behaviour
sorry, i forgot some version number :) [mat@biniac ~]$ psql --version psql (PostgreSQL) 7.1.2 contains readline, history support Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group Portions Copyright (c) 1996 Regents of the University of California Read the file COPYRIGHT or use the command \copyright to see the usage and distribution terms. [mat@biniac ~]$uname -rs FreeBSD 4.3-RELEASE[mat@biniac ~]$ mat ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly