Hey,Tarlika.
I tried to reproduce your test case through a series of inserts and It
seems that the lower case "d" went unnoticed.
That explains the empty list i got as result. My fault. Sorry :-(
Great to hear it helped you
Best,
Oliveiros
Enviado de meu iPhone
Em 24/09/2010, às 05:12 PM, "Tarlika Elisabeth Schmitz" <postgre...@numerixtechnology.d
e> escreveu:
Dear Oliveiros,
Thank you for taking the time to help.
On Fri, 24 Sep 2010 11:22:21 +0100
"Oliveiros d'Azevedo Cristina" <oliveiros.crist...@marktest.pt> wrote:
----- Original Message -----
From: "Tarlika Elisabeth Schmitz" <postgre...@numerixtechnology.de>
To: <pgsql-sql@postgresql.org>
Sent: Thursday, September 23, 2010 10:39 PM
Subject: [SQL] identifying duplicates in table with redundancies
[...] I want to check for duplicates:
1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name
I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it.
CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);
====
EXAMPLE DATA
22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====
select trainer_id, trainer_name from
(
select distinct on (trainer_name) trainer_id, trainer_name
from student
where trainer_id in
(
select distinct on (id) id
from
(
select distinct on (trainer_id,trainer_name)
trainer_id as id,
trainer_name as name from student
) as trainer
group by trainer.id
having count (trainer.name) > 1
)
) as y
order by trainer_id
Howdy, Tarlika.
First, did you past correctly your query into your mail?
I am asking this because your query doesn't seem work for me, it
returns an empty list :-|
Your most nested query, [...]
returns this
1|"MacDonald"
2|"Smith"
3|"Smith"
For me, the innermost query returns:
1|"Macdonald"
1|"MacDonald"
2|"Smith"
3|"Smith"
(note the lower/uppercase "d" in MacDonald)
The whole query returns:
1|"Macdonald"
1|"MacDonald"
1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name
To achieve 2) I would use this query
SELECT DISTINCT trainer_id,trainer_name
FROM (
SELECT trainer_name -- The field you want to test for duplicates
FROM (
SELECT DISTINCT "trainer_id","trainer_name"
FROM student
) x
GROUP BY "trainer_name" -- the field you want to test for
duplicates
HAVING (COUNT(*) > 1)
) z
NATURAL JOIN student y
I see my 2 innermost queries are the same as yours, just a bit more
wordy. I messed up at the third query, which threw up an error when I
tried to add an ORDER BY.
It will give you a list of the trainer names who have more than one
trainer ID and the respective trainer IDS.
For your particular example data result will be
2|"Smith"
3|"Smith"
Splendid! Just what I wanted.
As Smith is the only trainer with records with diferent trainer IDs.
Question : Can this be what you want?
The real table has 250000 entries and quite a few dups.
If you want to achieve 1) just substitute the trainer_name by
trainer_id on the commented places.
1) works as well now - just had to transpose id/name.
--
Best Regards,
Tarlika Elisabeth Schmitz
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql