Re: [GENERAL] Weird behaviour in planner (PostgreSQL v 9.2.14)

2015-12-11 Thread Tom Lane
Maxim Boguk  writes:
> [ planner changes behavior when a VALUES RTE reaches 200 elements ]

The immediate cause of that is that, lacking any real statistics for the
VALUES RTE, eqjoinsel_semi() will fall back to a rather dubious default
estimate if it believes it's looking at a default estimate for the number
of distinct entries on either side of the join clause:

/*
 * Without MCV lists for both sides, we can only use the heuristic
 * about nd1 vs nd2.
 */
doublenullfrac1 = stats1 ? stats1->stanullfrac : 0.0;

if (!isdefault1 && !isdefault2)
{
if (nd1 <= nd2 || nd2 < 0)
selec = 1.0 - nullfrac1;
else
selec = (nd2 / nd1) * (1.0 - nullfrac1);
}
else
===>selec = 0.5 * (1.0 - nullfrac1);

And get_variable_numdistinct() changes its mind about whether it's issuing
a default estimate when the VALUES size reaches DEFAULT_NUM_DISTINCT (200):

/*
 * With no data, estimate ndistinct = ntuples if the table is small, else
 * use default.  We use DEFAULT_NUM_DISTINCT as the cutoff for "small" so
 * that the behavior isn't discontinuous.
 */
if (ntuples < DEFAULT_NUM_DISTINCT)
return clamp_row_est(ntuples);

*isdefault = true;
return DEFAULT_NUM_DISTINCT;

So basically, although this is alleged to be continuous behavior, the
changeover from isdefault = false to isdefault = true causes a huge
change in the result from eqjoinsel_semi.

There are a couple of things we might choose to do about this:

1. get_variable_numdistinct doesn't currently pay any attention to what
kind of RTE it's considering, but it does have access to that.  We could
teach it to assume that if the Var comes from a VALUES RTE, the values
are all distinct regardless of the length of the VALUES list.  That's
effectively what it assumes for all VALUES of < 200 elements today, and
it's not apparent why we shouldn't make the same assumption for longer
lists.  Or we could adopt some sort of nonlinear behavior that gradually
reduces the assumed stadistinct fraction, but I don't know any rule for
that that wouldn't be pretty ad-hoc.

2. We could change the default don't-know-anything selectivity estimate in
eqjoinsel_semi to be something less crude than 0.5.  But again, it's hard
to say what to use instead.

The first of these ideas might be something that would be sane to
back-patch, but I'd be pretty hesitant about back-patching anything
along the lines of #2; the scope of the effects is hard to predict.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Tom Lane
Charlie Clark <[EMAIL PROTECTED]> writes:
> Am 09.03.2007 um 16:15 schrieb Tom Lane:
>> There's your problem right there.  The string comparison routines are
>> built on strcoll(), which is going to expect UTF8-encoded data because
>> of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
>> characters in the database, they will most likely look like invalid
>> encoding to strcoll(), and on most platforms that causes it to behave
>> very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
>> server_encoding.

> That does indeed seem to have been the problem even though the  
> examples I was looking at were all using plain ASCII characters. Glad  
> to know it wasn't a bug and to have learned something new.

Well, it *is* a bug: we really shouldn't let you select incompatible
locale and encoding settings.  This gotcha has been known for a long
time, but it's not clear that there's a bulletproof, portable way to
determine which encoding a particular locale setting implies ...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Charlie Clark


Am 09.03.2007 um 16:15 schrieb Tom Lane:


psytec=# show lc_collate;
lc_collate
-
de_DE.UTF-8
(1 row)



psytec=# show server_encoding;
server_encoding
-
LATIN1
(1 row)


There's your problem right there.  The string comparison routines are
built on strcoll(), which is going to expect UTF8-encoded data because
of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
characters in the database, they will most likely look like invalid
encoding to strcoll(), and on most platforms that causes it to behave
very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
server_encoding.


That does indeed seem to have been the problem even though the  
examples I was looking at were all using plain ASCII characters. Glad  
to know it wasn't a bug and to have learned something new.


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Tom Lane
Charlie Clark <[EMAIL PROTECTED]> writes:
> psytec=# show lc_collate;
> lc_collate
> -
> de_DE.UTF-8
> (1 row)

> psytec=# show server_encoding;
> server_encoding
> -
> LATIN1
> (1 row)

There's your problem right there.  The string comparison routines are
built on strcoll(), which is going to expect UTF8-encoded data because
of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
characters in the database, they will most likely look like invalid
encoding to strcoll(), and on most platforms that causes it to behave
very oddly.  You need to keep lc_collate (and lc_ctype) in sync with 
server_encoding.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Charlie Clark


Am 09.03.2007 um 05:30 schrieb Tom Lane:


Charlie Clark <[EMAIL PROTECTED]> writes:

I'm getting unexpected results on a query which involves joining two
tables on two common variables (firstname and lastname).


That looks like it should work.  Given that you describe the  
columns as

"names" I'm supposing they are of textual datatypes.  Maybe you have a
messed-up encoding or locale situation that is causing the sorts to  
not

work properly?  What PG version is this exactly, on what platform, and
what do "show lc_collate" and "show server_encoding" say?


I'm running PostgreSQL 8.1.4 on Mac OS X

psytec=# show lc_collate;
lc_collate
-
de_DE.UTF-8
(1 row)

psytec=# show server_encoding;
server_encoding
-
LATIN1
(1 row)

I thought that it might be something to do with the encoding - one of  
the tables has just been imported and I had some "fun" doing that but  
it "looks" okay now. Is there a way of checking?


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Tom Lane
Charlie Clark <[EMAIL PROTECTED]> writes:
> I'm getting unexpected results on a query which involves joining two  
> tables on two common variables (firstname and lastname).

That looks like it should work.  Given that you describe the columns as
"names" I'm supposing they are of textual datatypes.  Maybe you have a
messed-up encoding or locale situation that is causing the sorts to not
work properly?  What PG version is this exactly, on what platform, and
what do "show lc_collate" and "show server_encoding" say?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/08/07 16:09, Charlie Clark wrote:
> Hi,
> 
> I'm getting unexpected results on a query which involves joining two
> tables on two common variables (firstname and lastname).
> 
> This is the basic query:
> 
> SELECT table1.lastname, table1.firstname
> FROM  table1
> INNER JOIN table2 ON
> (table2.name = table1.name
> AND
> table2.vorname = table1.vorname)
> 
> This is returning many rows fewer than I expect and is ignoring a lot
> where table1.firstname = table2.firstname AND table1.lastname =
> table2.lastname. 

Huh?  Why should you?  You're not joining on firstname and lastname.

What happens if you do it like this:
SELECT T1.LASTNAME, T2.FIRSTNAME
FROM TABLE1 T1,
 TABLE2 T2
WHERE T1.NAME = T2.NAME
  AND T1.VORNAME = T2.VORNAME;


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF8N1zS9HxQb37XmcRAp3wAKCRJ1kuoqbc8YPOZwx+53+JRqvD/ACfVvFy
zK8u0+RYuMiBxEnURVc74Jc=
=9oxj
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Omar Eljumaily

What happens if you do an outer join instead of an inner join?

Charlie Clark wrote:

Hi,

I'm getting unexpected results on a query which involves joining two 
tables on two common variables (firstname and lastname).


This is the basic query:

SELECT table1.lastname, table1.firstname
FROM  table1
INNER JOIN table2 ON
(table2.name = table1.name
AND
table2.vorname = table1.vorname)

This is returning many rows fewer than I expect and is ignoring a lot 
where table1.firstname = table2.firstname AND table1.lastname = 
table2.lastname. In fact when I extend the query by a WHERE clause 
such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are 
not returned by the original query.


I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN 
does not seem, to me at least, to provide an explanation for the 
missing results.


"Merge Join  (cost=1987.97..2121.24 rows=34 width=22)"
"  Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND 
("outer"."?column4?" = "inner"."?column4?"))"

"  ->  Sort  (cost=364.97..375.99 rows=4409 width=22)"
"Sort Key: (table1.lastname)::text, (table1.firstname)::text"
"->  Seq Scan on table1  (cost=0.00..98.09 rows=4409 width=22)"
"  ->  Sort  (cost=1623.00..1667.00 rows=17599 width=21)"
"Sort Key: (table2.lastname)::text, (table2.firstname)::text"
"->  Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)"

Am I missing something big and obvious here?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] weird behaviour

2001-08-30 Thread matthieuclavier



sorry, i forgot some version number :) 


[mat@biniac ~]$ psql --version
psql (PostgreSQL) 7.1.2
contains readline, history support
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996 Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.


[mat@biniac ~]$uname -rs
FreeBSD 4.3-RELEASE[mat@biniac ~]$

mat

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly