Re: [GENERAL] is (not) distinct from
On 03/07/2017 01:22 AM, Johann Spies wrote: Thanks (again Adrian) and Tom. The situation was that I had a table with 731million records which I wanted to copy into a partitioned one where there was a unique constraint on the fields used in my query. The "backup" table was the single one. While inserting into the partitioned table from the backup one, several (about 12) records failed to insert. I wanted to find out which records were involved and found that some had "'" characters in the values which broke some of the functions used to do some calculations.. As there were fields that might have null values I have tried the "is not distinct from". Both sides of the query had primary keys and I did not use group by. That was why I used "distinct". Would it not be easier to use a LEFT JOIN between the original table(backup) and the new table: https://www.postgresql.org/docs/9.6/static/sql-select.html join_type "LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards." So something like: SELECT a.pk FROM original_table AS a LEFT JOIN--The OUTER is not required new_table AS b ON a.pk = b.pk WHERE b.pk IS NULL That would show all the rows in the original table that where not transferred over. Anyhow in the end, I made some progress with a modified query: where s.citing_article = A.citing_article and s.cited_article != A.cited_article and s.pubyear is distinct from A.pubyear and s.year_cited is distinct from A.year_cited and s.cited_author is distinct from A.cited_author and regexp_replace(s.cited_title, $$'$$, $$''$$,'g') is distinct from regexp_replace(A.cited_title, $$'$$, $$''$$,'g') and regexp_replace(s.cited_work, $$'$$, $$''$$,'g') is distinct from regexp_replace(A.cited_work, $$'$$, $$''$$,'g') and s.doi is distinct from A.doi Regards. Johann -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3) -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] is (not) distinct from
Thanks (again Adrian) and Tom. The situation was that I had a table with 731million records which I wanted to copy into a partitioned one where there was a unique constraint on the fields used in my query. The "backup" table was the single one. While inserting into the partitioned table from the backup one, several (about 12) records failed to insert. I wanted to find out which records were involved and found that some had "'" characters in the values which broke some of the functions used to do some calculations.. As there were fields that might have null values I have tried the "is not distinct from". Both sides of the query had primary keys and I did not use group by. That was why I used "distinct". Anyhow in the end, I made some progress with a modified query: where s.citing_article = A.citing_article and s.cited_article != A.cited_article and s.pubyear is distinct from A.pubyear and s.year_cited is distinct from A.year_cited and s.cited_author is distinct from A.cited_author and regexp_replace(s.cited_title, $$'$$, $$''$$,'g') is distinct from regexp_replace(A.cited_title, $$'$$, $$''$$,'g') and regexp_replace(s.cited_work, $$'$$, $$''$$,'g') is distinct from regexp_replace(A.cited_work, $$'$$, $$''$$,'g') and s.doi is distinct from A.doi Regards. Johann -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3)
Re: [GENERAL] is (not) distinct from
Adrian Klaver writes: > Where I am going with this, is that it is not clear to me how you are > matching the two sets of records to determine whether they are different > or not. He's not. The query is forming the cartesian product of the two tables and then dropping join rows where the tables match ... but every B row is going to have multiple A rows where it doesn't match, and those join rows will all survive the WHERE. Then "select distinct" gets rid of the duplicates, and since nothing from A is presented in the result, it's not very obvious what's happening. This is a great example of "select distinct" being used as a band-aid over a fundamental misunderstanding of SQL. It's good advice to never use "distinct" unless you know exactly why your query is generating duplicate rows in the first place. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] is (not) distinct from
On 03/01/2017 12:15 AM, Johann Spies wrote: On 28 February 2017 at 17:06, Adrian Klaver mailto:adrian.kla...@aklaver.com>> wrote: I have not worked through all this but at first glance I suspect: select distinct b.* from b ... is distinct from ... constitutes a double negative. What happens if you eliminate the first distinct? Thanks Adrian, The dynamics of the data has changed because of data updates so an exact comparison is not possible. Other tests now confirm that the 28 records are identical in both tables. The results then become more confusing: If I remove the first distinct and use "is distinct from" I get 756 rows and when I use "is not distinct from" I get 28. In the first (756) case when I use "group by" the result of the first query is exactly the same as the second one. To be clear you are looking for records in citation that are different from citationbackup over a subset(Are there more fields?) of 8 fields, correct? What do those 8 fields represent? Is citationbackup really a backup of citation? Is there a Primary Key on either/both tables? What are you grouping by? Where I am going with this, is that it is not clear to me how you are matching the two sets of records to determine whether they are different or not. Your result that yields 756 rows indicates that the comparison is not an apples to apples comparison, but a comparison of two 'shuffled' sets. Adding the group by seems to sort that out. So some idea of what constitutes a difference and how you determine which records from each table you want to match would be helpful. If you could show the table schema and some sample data it would be even better. Regards Johann -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3) -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] is (not) distinct from
On 28 February 2017 at 17:06, Adrian Klaver wrote: > > I have not worked through all this but at first glance I suspect: > > select distinct b.* from b ... > > is distinct from ... > > constitutes a double negative. > > What happens if you eliminate the first distinct? > > > > Thanks Adrian, The dynamics of the data has changed because of data updates so an exact comparison is not possible. Other tests now confirm that the 28 records are identical in both tables. The results then become more confusing: If I remove the first distinct and use "is distinct from" I get 756 rows and when I use "is not distinct from" I get 28. In the first (756) case when I use "group by" the result of the first query is exactly the same as the second one. Regards Johann -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3)
Re: [GENERAL] is (not) distinct from
On 02/28/2017 12:08 AM, Johann Spies wrote: When I query table a I get 18 rows. The same query on table b results in 28 rows. Both tables have the same structure. When I export the results to csv-files and do a diff it confirms that all 18 rows from a are also in b. Table b has 10 new rows. When I combine these queries and use "is (not) distinct from" I get strange results: with a as (select citing_article, cited_article, pubyear, year_cited, cited_author, cited_title, cited_work, doi from wos_2017_1.citation where citing_article='abcdefg' order by 3,4,5,6,8), b as ( select citing_article, cited_article, pubyear, year_cited, cited_author, cited_title, cited_work, doi from wos_2017_1.citationbackup where citing_article='abcdefg' order by 3,4,5,6,8) select distinct b.* from b , a where ( B.citing_article, B.cited_article, B.pubyear, B.year_cited, B.cited_author, B.cited_title, B.cited_work, B.doi) is distinct from (A.citing_article, A.cited_article, A.pubyear, A.year_cited, A.cited_author, A.cited_title, A.cited_work, A.doi) The result of this query is 28 rows - thus that of b. I expected this to be 10. If I change the query to "is not distinct from" it results in 18 rows which is what I would have expected. I have not worked through all this but at first glance I suspect: select distinct b.* from b ... is distinct from ... constitutes a double negative. What happens if you eliminate the first distinct? Regards Johann. -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3) -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] is (not) distinct from
When I query table a I get 18 rows. The same query on table b results in 28 rows. Both tables have the same structure. When I export the results to csv-files and do a diff it confirms that all 18 rows from a are also in b. Table b has 10 new rows. When I combine these queries and use "is (not) distinct from" I get strange results: with a as (select citing_article, cited_article, pubyear, year_cited, cited_author, cited_title, cited_work, doi from wos_2017_1.citation where citing_article='abcdefg' order by 3,4,5,6,8), b as ( select citing_article, cited_article, pubyear, year_cited, cited_author, cited_title, cited_work, doi from wos_2017_1.citationbackup where citing_article='abcdefg' order by 3,4,5,6,8) select distinct b.* from b , a where ( B.citing_article, B.cited_article, B.pubyear, B.year_cited, B.cited_author, B.cited_title, B.cited_work, B.doi) is distinct from (A.citing_article, A.cited_article, A.pubyear, A.year_cited, A.cited_author, A.cited_title, A.cited_work, A.doi) The result of this query is 28 rows - thus that of b. I expected this to be 10. If I change the query to "is not distinct from" it results in 18 rows which is what I would have expected. Regards Johann. -- Because experiencing your loyal love is better than life itself, my lips will praise you. (Psalm 63:3)
Re: [GENERAL] is not distinct from any(...)
"Robert Haas" <[EMAIL PROTECTED]> writes: > That doesn't seem to work, because IS NOT DISTINCT FROM is not an > operator. Yah :-( > So then I tried creating an operator === (anyelement, > anyelement) that just does IS NOT DISTINCT FROM and writing: > select 1 === any(array[1]); > which got me: > ERROR: could not find array type for data type anyelement FWIW, it seems to work in 8.3. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] is not distinct from any(...)
I'm trying to write a SQL statement to determine whether a value is an an array, but I want the comparison to be done using IS NOT DISTINCT FROM rather than =. My first thought was that instead of writing: SELECT value = ANY(array) ...I could simply write: SELECT value IS NOT DISTINCT FROM ANY(array) That doesn't seem to work, because IS NOT DISTINCT FROM is not an operator. So then I tried creating an operator === (anyelement, anyelement) that just does IS NOT DISTINCT FROM and writing: select 1 === any(array[1]); which got me: ERROR: could not find array type for data type anyelement Grr... any suggestions? ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general