[SQL] Duplicates Processing
All, I've been provided a CSV file of parts that contains duplicates of properties (e.g. resistors have a wattage, tolerance, and temperature coefficient property) of those parts that differ by a manufacturer part number. What I'd like to do is to process this file and, upon encountering one of the duplicates, take that part with its new part number and move it to a part substitutes table. It seems like it should be pretty simple, but I can't seem to generate a query or a function to accomplish it. I'd greatly appreciate any insight or assistance with solving this problem. Thank you very much in advance. -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
Gary Chambers wrote: > I've been provided a CSV file of parts that contains duplicates of > properties (e.g. resistors have a wattage, tolerance, and temperature > coefficient property) of those parts that differ by a manufacturer > part number. What I'd like to do is to process this file and, upon > encountering one of the duplicates, take that part with its new part > number and move it to a part substitutes table. It seems like it > should be pretty simple, but I can't seem to generate a query or a > function to accomplish it. I'd greatly appreciate any insight or > assistance with solving this problem. Thank you very much in advance. You can - for example - create a query with a call to ROW_NUMBER() and then process the matching rows (untested): | INSERT INTO substitutes ([...]) | SELECT [...] FROM | (SELECT *, | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature | ORDER BY part_number) AS RN | FROM parts) AS SubQuery | WHERE RN > 1; | DELETE FROM parts | WHERE primary_key IN | (SELECT primary_key FROM | (SELECT *, | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature |ORDER BY part_number) AS RN | FROM parts) AS SubQuery |WHERE RN > 1); Tim -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
Tim, Thanks for taking the time to reply! > | INSERT INTO substitutes ([...]) > | SELECT [...] FROM > | (SELECT *, > | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature > | ORDER BY part_number) AS RN > | FROM parts) AS SubQuery > | WHERE RN > 1; > | DELETE FROM parts > | WHERE primary_key IN > | (SELECT primary_key FROM > | (SELECT *, > | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature > | ORDER BY part_number) AS RN > | FROM parts) AS SubQuery > | WHERE RN > 1); You have solved the problem precisely as I described it. In my haste to make the request for assistance, I omitted one critical piece of information that may call into question my data model. In its current state, my substitute parts table contains only the part number (the "new" one, so-to-speak), a foreign key reference to the original parts table, and some location data (which is also in the original parts table). Is there any advice you can offer in light of what I have just described? I apologize for the oversight. -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
On 10/08/2010 01:42 PM, Gary Chambers wrote: > Tim, > > Thanks for taking the time to reply! > >> | INSERT INTO substitutes ([...]) >> | SELECT [...] FROM >> | (SELECT *, >> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature >> | ORDER BY part_number) AS RN >> | FROM parts) AS SubQuery >> | WHERE RN > 1; > >> | DELETE FROM parts >> | WHERE primary_key IN >> | (SELECT primary_key FROM >> | (SELECT *, >> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature >> |ORDER BY part_number) AS RN >> | FROM parts) AS SubQuery >> |WHERE RN > 1); > > You have solved the problem precisely as I described it. In my haste > to make the request for assistance, I omitted one critical piece of > information that may call into question my data model. In its current > state, my substitute parts table contains only the part number (the > "new" one, so-to-speak), a foreign key reference to the original parts > table, and some location data (which is also in the original parts > table). Is there any advice you can offer in light of what I have > just described? I apologize for the oversight. > > -- Gary Chambers > Perhaps a trade off between nullable fields and redundant types. If your original table simply had a nullable column called isReplacementFor, into which you place in the subsequent rows the id of the first instance found. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
Rob, > Perhaps a trade off between nullable fields and redundant types. If > your original table simply had a nullable column called > isReplacementFor, into which you place in the subsequent rows the id of > the first instance found. Am I misunderstanding you when you're suggesting a table like: part_number INTEGER is_replacement_for INTEGER references part_number value INTEGER wattage FLOAT8 ... -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
Yes. With this you can find all part numbers/supplies which match your value, wattage criteria in one table. Or exclude any which have a non-null is_replacement_for value. If you need to drop the "replaceable" variant, you choose which of the replacements to promote and update the others to match the new "replaceable". They're all instances of the same type of thing so in my view they ought to live in the same table. Also possible to maintain the replacement structure via a (self) join record with replacable/is_replacement_for tuples. You have a similar but slightly more involve maintenance issue of course. On 10/08/2010 02:42 PM, Gary Chambers wrote: > Rob, > >> Perhaps a trade off between nullable fields and redundant types. If >> your original table simply had a nullable column called >> isReplacementFor, into which you place in the subsequent rows the id of >> the first instance found. > > Am I misunderstanding you when you're suggesting a table like: > > part_number INTEGER > is_replacement_for INTEGER references part_number > value INTEGER > wattage FLOAT8 > ... > > -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
Rob, > Yes. With this you can find all part numbers/supplies which match your > value, wattage criteria in one table. Or exclude any which have a > non-null is_replacement_for value. I understand -- thanks. I have received contradictory advice in a purely data modeling context. What about the null values that will be in the properties columns of the part? It would appear to be more applicable to an employee database where the columns are populated regardless and the "replacement_for" in the context of our discussion would be a self-reference to the employee's manager. No? Thanks again for your help. -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] counting related rows
I have a table which includes a text column containing posix-style paths. Ie, matching the regexp "^[^/]+(/[^/]+)*$". I need to do a query of a number of columns from that table, plus the count of rows which are "children" of the current row. The query: SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016) AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016); selects that extra column given the id. A view containing all of the columns from m plus a column matching the above select would cover my needs well. But I haven't been able to get the syntax right. The current code (which I did not write) uses one select to get the values of id it wants, and then iterates through them selecting four columns from the table and then the above. As you may imagine, two selects per row is *slow*. Maybe 20 rows per second. I expect a single, complete select to take < 10 ms. Again, to be clear, for each row I need the count of other rows which have the same value for column o and whose name is a child path of the current row's name. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] counting related rows
James Cloos wrote: I have a table which includes a text column containing posix-style paths. Ie, matching the regexp "^[^/]+(/[^/]+)*$". I need to do a query of a number of columns from that table, plus the count of rows which are "children" of the current row. The query: SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016) AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016); selects that extra column given the id. A view containing all of the columns from m plus a column matching the above select would cover my needs well. But I haven't been able to get the syntax right. It would help if you provided: a) statements to create sample data b) expected results from sample data Does this do what you want? select * from m, (SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016) AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016)) om; -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Duplicates Processing
My understanding was that the values were in fact in the data of the "replacers". If not, you are correct. In this case the replacers are more like alias for the only instance you have. If the replacers are immutable by all means ship them off to some other table (where I suppose the become pointers to other suppliers of the type of thing holding the real data). Not sure I've ever met immutable data but there you go ;) And to your point of self-reference, it would be to a co-worker more than a manager. Managers are often not good replacements for workers. :) On 10/08/2010 04:12 PM, Gary Chambers wrote: > Rob, > >> Yes. With this you can find all part numbers/supplies which match your >> value, wattage criteria in one table. Or exclude any which have a >> non-null is_replacement_for value. > > I understand -- thanks. I have received contradictory advice in a > purely data modeling context. What about the null values that will be > in the properties columns of the part? It would appear to be more > applicable to an employee database where the columns are populated > regardless and the "replacement_for" in the context of our discussion > would be a self-reference to the employee's manager. No? > > Thanks again for your help. > > -- Gary Chambers -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql