Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Charlie Clark


Am 09.03.2007 um 05:30 schrieb Tom Lane:


Charlie Clark [EMAIL PROTECTED] writes:

I'm getting unexpected results on a query which involves joining two
tables on two common variables (firstname and lastname).


That looks like it should work.  Given that you describe the  
columns as

names I'm supposing they are of textual datatypes.  Maybe you have a
messed-up encoding or locale situation that is causing the sorts to  
not

work properly?  What PG version is this exactly, on what platform, and
what do show lc_collate and show server_encoding say?


I'm running PostgreSQL 8.1.4 on Mac OS X

psytec=# show lc_collate;
lc_collate
-
de_DE.UTF-8
(1 row)

psytec=# show server_encoding;
server_encoding
-
LATIN1
(1 row)

I thought that it might be something to do with the encoding - one of  
the tables has just been imported and I had some fun doing that but  
it looks okay now. Is there a way of checking?


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Tom Lane
Charlie Clark [EMAIL PROTECTED] writes:
 psytec=# show lc_collate;
 lc_collate
 -
 de_DE.UTF-8
 (1 row)

 psytec=# show server_encoding;
 server_encoding
 -
 LATIN1
 (1 row)

There's your problem right there.  The string comparison routines are
built on strcoll(), which is going to expect UTF8-encoded data because
of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
characters in the database, they will most likely look like invalid
encoding to strcoll(), and on most platforms that causes it to behave
very oddly.  You need to keep lc_collate (and lc_ctype) in sync with 
server_encoding.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Charlie Clark


Am 09.03.2007 um 16:15 schrieb Tom Lane:


psytec=# show lc_collate;
lc_collate
-
de_DE.UTF-8
(1 row)



psytec=# show server_encoding;
server_encoding
-
LATIN1
(1 row)


There's your problem right there.  The string comparison routines are
built on strcoll(), which is going to expect UTF8-encoded data because
of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
characters in the database, they will most likely look like invalid
encoding to strcoll(), and on most platforms that causes it to behave
very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
server_encoding.


That does indeed seem to have been the problem even though the  
examples I was looking at were all using plain ASCII characters. Glad  
to know it wasn't a bug and to have learned something new.


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-09 Thread Tom Lane
Charlie Clark [EMAIL PROTECTED] writes:
 Am 09.03.2007 um 16:15 schrieb Tom Lane:
 There's your problem right there.  The string comparison routines are
 built on strcoll(), which is going to expect UTF8-encoded data because
 of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
 characters in the database, they will most likely look like invalid
 encoding to strcoll(), and on most platforms that causes it to behave
 very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
 server_encoding.

 That does indeed seem to have been the problem even though the  
 examples I was looking at were all using plain ASCII characters. Glad  
 to know it wasn't a bug and to have learned something new.

Well, it *is* a bug: we really shouldn't let you select incompatible
locale and encoding settings.  This gotcha has been known for a long
time, but it's not clear that there's a bulletproof, portable way to
determine which encoding a particular locale setting implies ...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Charlie Clark

Hi,

I'm getting unexpected results on a query which involves joining two  
tables on two common variables (firstname and lastname).


This is the basic query:

SELECT table1.lastname, table1.firstname
FROM  table1
INNER JOIN table2 ON
(table2.name = table1.name
AND
table2.vorname = table1.vorname)

This is returning many rows fewer than I expect and is ignoring a lot  
where table1.firstname = table2.firstname AND table1.lastname =  
table2.lastname. In fact when I extend the query by a WHERE clause  
such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are  
not returned by the original query.


I'm not very au fait with the inner workings of PostgreSQL but  
EXPLAIN does not seem, to me at least, to provide an explanation for  
the missing results.


Merge Join  (cost=1987.97..2121.24 rows=34 width=22)
  Merge Cond: ((outer.?column3? = inner.?column3?) AND  
(outer.?column4? = inner.?column4?))

  -  Sort  (cost=364.97..375.99 rows=4409 width=22)
Sort Key: (table1.lastname)::text, (table1.firstname)::text
-  Seq Scan on table1  (cost=0.00..98.09 rows=4409 width=22)
  -  Sort  (cost=1623.00..1667.00 rows=17599 width=21)
Sort Key: (table2.lastname)::text, (table2.firstname)::text
-  Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)

Am I missing something big and obvious here?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Omar Eljumaily

What happens if you do an outer join instead of an inner join?

Charlie Clark wrote:

Hi,

I'm getting unexpected results on a query which involves joining two 
tables on two common variables (firstname and lastname).


This is the basic query:

SELECT table1.lastname, table1.firstname
FROM  table1
INNER JOIN table2 ON
(table2.name = table1.name
AND
table2.vorname = table1.vorname)

This is returning many rows fewer than I expect and is ignoring a lot 
where table1.firstname = table2.firstname AND table1.lastname = 
table2.lastname. In fact when I extend the query by a WHERE clause 
such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are 
not returned by the original query.


I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN 
does not seem, to me at least, to provide an explanation for the 
missing results.


Merge Join  (cost=1987.97..2121.24 rows=34 width=22)
  Merge Cond: ((outer.?column3? = inner.?column3?) AND 
(outer.?column4? = inner.?column4?))

  -  Sort  (cost=364.97..375.99 rows=4409 width=22)
Sort Key: (table1.lastname)::text, (table1.firstname)::text
-  Seq Scan on table1  (cost=0.00..98.09 rows=4409 width=22)
  -  Sort  (cost=1623.00..1667.00 rows=17599 width=21)
Sort Key: (table2.lastname)::text, (table2.firstname)::text
-  Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)

Am I missing something big and obvious here?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/08/07 16:09, Charlie Clark wrote:
 Hi,
 
 I'm getting unexpected results on a query which involves joining two
 tables on two common variables (firstname and lastname).
 
 This is the basic query:
 
 SELECT table1.lastname, table1.firstname
 FROM  table1
 INNER JOIN table2 ON
 (table2.name = table1.name
 AND
 table2.vorname = table1.vorname)
 
 This is returning many rows fewer than I expect and is ignoring a lot
 where table1.firstname = table2.firstname AND table1.lastname =
 table2.lastname. 

Huh?  Why should you?  You're not joining on firstname and lastname.

What happens if you do it like this:
SELECT T1.LASTNAME, T2.FIRSTNAME
FROM TABLE1 T1,
 TABLE2 T2
WHERE T1.NAME = T2.NAME
  AND T1.VORNAME = T2.VORNAME;


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF8N1zS9HxQb37XmcRAp3wAKCRJ1kuoqbc8YPOZwx+53+JRqvD/ACfVvFy
zK8u0+RYuMiBxEnURVc74Jc=
=9oxj
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Weird behaviour on a join with multiple keys

2007-03-08 Thread Tom Lane
Charlie Clark [EMAIL PROTECTED] writes:
 I'm getting unexpected results on a query which involves joining two  
 tables on two common variables (firstname and lastname).

That looks like it should work.  Given that you describe the columns as
names I'm supposing they are of textual datatypes.  Maybe you have a
messed-up encoding or locale situation that is causing the sorts to not
work properly?  What PG version is this exactly, on what platform, and
what do show lc_collate and show server_encoding say?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster