[SQL] Duplicates Processing

2010-10-08 Thread Gary Chambers
All,

I've been provided a CSV file of parts that contains duplicates of
properties (e.g. resistors have a wattage, tolerance, and temperature
coefficient property) of those parts that differ by a manufacturer
part number.  What I'd like to do is to process this file and, upon
encountering one of the duplicates, take that part with its new part
number and move it to a part substitutes table.  It seems like it
should be pretty simple, but I can't seem to generate a query or a
function to accomplish it.  I'd greatly appreciate any insight or
assistance with solving this problem.  Thank you very much in advance.

-- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Tim Landscheidt
Gary Chambers  wrote:

> I've been provided a CSV file of parts that contains duplicates of
> properties (e.g. resistors have a wattage, tolerance, and temperature
> coefficient property) of those parts that differ by a manufacturer
> part number.  What I'd like to do is to process this file and, upon
> encountering one of the duplicates, take that part with its new part
> number and move it to a part substitutes table.  It seems like it
> should be pretty simple, but I can't seem to generate a query or a
> function to accomplish it.  I'd greatly appreciate any insight or
> assistance with solving this problem.  Thank you very much in advance.

You can - for example - create a query with a call to
ROW_NUMBER() and then process the matching rows (untested):

| INSERT INTO substitutes ([...])
|   SELECT [...] FROM
| (SELECT *,
| ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
| ORDER BY part_number) AS RN
|  FROM parts) AS SubQuery
|   WHERE RN > 1;

| DELETE FROM parts
| WHERE primary_key IN
|   (SELECT primary_key FROM
| (SELECT *,
| ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
|ORDER BY part_number) AS RN
|  FROM parts) AS SubQuery
|WHERE RN > 1);

Tim


-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Gary Chambers
Tim,

Thanks for taking the time to reply!

> | INSERT INTO substitutes ([...])
> |   SELECT [...] FROM
> |     (SELECT *,
> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
> |                         ORDER BY part_number) AS RN
> |      FROM parts) AS SubQuery
> |   WHERE RN > 1;

> | DELETE FROM parts
> | WHERE primary_key IN
> |   (SELECT primary_key FROM
> |     (SELECT *,
> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
> |                                ORDER BY part_number) AS RN
> |      FROM parts) AS SubQuery
> |    WHERE RN > 1);

You have solved the problem precisely as I described it.  In my haste
to make the request for assistance, I omitted one critical piece of
information that may call into question my data model.  In its current
state, my substitute parts table contains only the part number (the
"new" one, so-to-speak), a foreign key reference to the original parts
table, and some location data (which is also in the original parts
table).  Is there any advice you can offer in light of what I have
just described? I apologize for the oversight.

-- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Rob Sargent

On 10/08/2010 01:42 PM, Gary Chambers wrote:
> Tim,
> 
> Thanks for taking the time to reply!
> 
>> | INSERT INTO substitutes ([...])
>> |   SELECT [...] FROM
>> | (SELECT *,
>> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
>> | ORDER BY part_number) AS RN
>> |  FROM parts) AS SubQuery
>> |   WHERE RN > 1;
> 
>> | DELETE FROM parts
>> | WHERE primary_key IN
>> |   (SELECT primary_key FROM
>> | (SELECT *,
>> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
>> |ORDER BY part_number) AS RN
>> |  FROM parts) AS SubQuery
>> |WHERE RN > 1);
> 
> You have solved the problem precisely as I described it.  In my haste
> to make the request for assistance, I omitted one critical piece of
> information that may call into question my data model.  In its current
> state, my substitute parts table contains only the part number (the
> "new" one, so-to-speak), a foreign key reference to the original parts
> table, and some location data (which is also in the original parts
> table).  Is there any advice you can offer in light of what I have
> just described? I apologize for the oversight.
> 
> -- Gary Chambers
> 

Perhaps a trade off between nullable fields and redundant types.  If
your original table simply had a nullable column called
isReplacementFor, into which you place in the subsequent rows the id of
the first instance found.

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Gary Chambers
Rob,

> Perhaps a trade off between nullable fields and redundant types.  If
> your original table simply had a nullable column called
> isReplacementFor, into which you place in the subsequent rows the id of
> the first instance found.

Am I misunderstanding you when you're suggesting a table like:

part_number   INTEGER
is_replacement_for INTEGER references part_number
value   INTEGER
wattage   FLOAT8
...

-- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Rob Sargent
Yes.  With this you can find all part numbers/supplies which match your
value, wattage criteria in one table. Or exclude any which have a
non-null is_replacement_for value.

If you need to drop the "replaceable" variant, you choose which of the
replacements to promote and update the others to match the new
"replaceable".  They're all instances of the same type of thing so in my
view they ought to live in the same table.


Also possible to maintain the replacement structure via a (self) join
record with replacable/is_replacement_for tuples.  You have a similar
but slightly more involve maintenance issue of course.



On 10/08/2010 02:42 PM, Gary Chambers wrote:
> Rob,
> 
>> Perhaps a trade off between nullable fields and redundant types.  If
>> your original table simply had a nullable column called
>> isReplacementFor, into which you place in the subsequent rows the id of
>> the first instance found.
> 
> Am I misunderstanding you when you're suggesting a table like:
> 
> part_number   INTEGER
> is_replacement_for INTEGER references part_number
> value   INTEGER
> wattage   FLOAT8
> ...
> 
> -- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Gary Chambers
Rob,

> Yes.  With this you can find all part numbers/supplies which match your
> value, wattage criteria in one table. Or exclude any which have a
> non-null is_replacement_for value.

I understand -- thanks.  I have received contradictory advice in a
purely data modeling context.  What about the null values that will be
in the properties columns of the part?  It would appear to be more
applicable to an employee database where the columns are populated
regardless and the "replacement_for" in the context of our discussion
would be a self-reference to the employee's manager.  No?

Thanks again for your help.

-- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] counting related rows

2010-10-08 Thread James Cloos
I have a table which includes a text column containing posix-style
paths.  Ie, matching the regexp "^[^/]+(/[^/]+)*$".

I need to do a query of a number of columns from that table, plus the
count of rows which are "children" of the current row.

The query:

  SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016)
AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016);

selects that extra column given the id.

A view containing all of the columns from m plus a column matching the
above select would cover my needs well.

But I haven't been able to get the syntax right.

The current code (which I did not write) uses one select to get the
values of id it wants, and then iterates through them selecting four
columns from the table and then the above.  As you may imagine, two
selects per row is *slow*.  Maybe 20 rows per second.  I expect a
single, complete select to take < 10 ms.

Again, to be clear, for each row I need the count of other rows which
have the same value for column o and whose name is a child path of the
current row's name.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] counting related rows

2010-10-08 Thread Frank Bax

James Cloos wrote:

I have a table which includes a text column containing posix-style
paths.  Ie, matching the regexp "^[^/]+(/[^/]+)*$".

I need to do a query of a number of columns from that table, plus the
count of rows which are "children" of the current row.

The query:

  SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016)
AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016);

selects that extra column given the id.

A view containing all of the columns from m plus a column matching the
above select would cover my needs well.

But I haven't been able to get the syntax right.



It would help if you provided:
a) statements to create sample data
b) expected results from sample data

Does this do what you want?

select * from m, (SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM 
m WHERE id=30016) AND name ILIKE (SELECT name || '/%' FROM m WHERE 
id=30016)) om;




--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Duplicates Processing

2010-10-08 Thread Rob Sargent

My understanding was that the values were in fact in the data of the
"replacers".  If not, you are correct.  In this case the replacers are
more like alias for the only instance you have.

If the replacers are immutable by all means ship them off to some other
table (where I suppose the become pointers to other suppliers of the
type of thing holding the real data). Not sure I've ever met immutable
data but there you go ;)

And to your point of self-reference, it would be to a co-worker more
than a manager.  Managers are often not good replacements for workers. :)


On 10/08/2010 04:12 PM, Gary Chambers wrote:
> Rob,
> 
>> Yes.  With this you can find all part numbers/supplies which match your
>> value, wattage criteria in one table. Or exclude any which have a
>> non-null is_replacement_for value.
> 
> I understand -- thanks.  I have received contradictory advice in a
> purely data modeling context.  What about the null values that will be
> in the properties columns of the part?  It would appear to be more
> applicable to an employee database where the columns are populated
> regardless and the "replacement_for" in the context of our discussion
> would be a self-reference to the employee's manager.  No?
> 
> Thanks again for your help.
> 
> -- Gary Chambers

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql