Re: [GENERAL] Strange problem with create table as select * from table;
On Sun, Nov 06, 2011 at 09:34:24AM -0500, Tom Lane wrote: hubert depesz lubaczewski dep...@depesz.com writes: Any chance of getting the fix in patch format so we could test it on this system? http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=23998fe99c1220ba3a9eefee194e37ec1f14ae07 hi just to close the loop - finally today I could restart the database with patched 8.4. bug fixed, all works fine. thanks a lot. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: Any chance of getting the fix in patch format so we could test it on this system? http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=23998fe99c1220ba3a9eefee194e37ec1f14ae07 regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote: that. And that they are the only rows that, in addition to the above conditions, contain data fields wide enough to require out-of-line toasting. checked lengths of the text/varchar columns in database. there are 16 such columns in the table. full report of lengths is in http://www.depesz.com/various/lengths.report.gz it was obtained using: select length( first_text_column ) as length_1, count(*) from etsy_v2.receipts group by 1 order by 1; and so on for every text column, and at the end I also made summary of sum-of-lengths. there is also: http://www.depesz.com/various/lengths2.report.gz which has the same summary, but only of the damaged rows. As you can see the length of columns is not really special - somewhere in the middle of all other rows. summarized length is also not special in any way. These conditions together are enough to break the assumption in toast_insert_or_update that the old and new tuples must have the same value of t_hoff. But it can only happen when the source tuple is an original on-disk tuple, which explains why only INSERT ... SELECT * causes the problem, not any variants that require projection of a new column set. When it does happen, toast_insert_or_update correctly computes the required size of the new tuple ... but then it tells heap_fill_tuple to fill the data part at offset olddata-t_hoff, which is wrong (too small) and so the nulls bitmap that heap_fill_tuple concurrently constructs will overwrite the first few data bytes. In your example, the table contains 49 columns so the nulls bitmap requires 7 bytes, just enough to overwrite the first 6 data bytes as observed. (In fact, given the values we see being filled in, I can confidently say that you have two added-since-creation null columns, no more, no less.) I can reproduce the problem with the attached test case (using the regression database). With asserts enabled, the Assert(new_len == olddata-t_hoff); fails. With asserts off, corrupt data. How can I make the onek table for the test? is it standard table from something? This is trivial to fix, now that we know there's a problem --- the function is only using that assumption to save itself a couple lines of code. Penny wise, pound foolish :-( Any chance of getting the fix in patch format so we could test it on this system? Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: checked lengths of the text/varchar columns in database. there are 16 such columns in the table. full report of lengths is in http://www.depesz.com/various/lengths.report.gz it was obtained using: select length( first_text_column ) as length_1, count(*) from etsy_v2.receipts group by 1 order by 1; and so on for every text column, and at the end I also made summary of sum-of-lengths. BTW, that probably doesn't prove a lot since it takes no account of compression, and different values could be compressible to varying degrees. It'd be more useful to look at pg_column_size() numbers to get an idea of whether toasting happened or not. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Friday, November 04, 2011 3:43:48 pm hubert depesz lubaczewski wrote: On Fri, Nov 04, 2011 at 05:49:44PM -0400, Tom Lane wrote: You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? i'm running the pg_dump (it will take some time, so don't hold your breath), but at the same time - I can select these rows, correctly, with normal SELECT from table (xobjects table). Doesn't it disprove this theory? Another question. Between 07/20/11 and this recent attempt did you do a CREATE TABLE AS on this table and not have corrupted rows? Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 05:49:44PM -0400, Tom Lane wrote: You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? pg_dump finished. all ids are correct - i.e. there are no rows with xobject_id = -1. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 05:06:35PM -0700, Adrian Klaver wrote: Another question. Between 07/20/11 and this recent attempt did you do a CREATE TABLE AS on this table and not have corrupted rows? don't remember. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Friday, November 04, 2011 6:04:02 pm Tom Lane wrote: I wrote: A different line of thought is that there's something about these specific source rows, and only these rows, that makes them vulnerable to corruption during INSERT/SELECT. Do they by any chance contain any values that are unusual elsewhere in your table? One thing I'm wondering about right now is the nulls bitmap --- so do these rows have nulls (or not-nulls) in any place that's unusual elsewhere? Hah ... I have a theory. This is trivial to fix, now that we know there's a problem --- the function is only using that assumption to save itself a couple lines of code. Penny wise, pound foolish :-( I killed a few brain cells just reading the explanation:) regards, tom lane -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
I wrote: A different line of thought is that there's something about these specific source rows, and only these rows, that makes them vulnerable to corruption during INSERT/SELECT. Do they by any chance contain any values that are unusual elsewhere in your table? One thing I'm wondering about right now is the nulls bitmap --- so do these rows have nulls (or not-nulls) in any place that's unusual elsewhere? Hah ... I have a theory. I will bet that you recently added some column(s) to the source table using ALTER TABLE ADD COLUMN and no default value, so that the added columns were nulls and no table rewrite happened. And that these troublesome rows predate that addition, but contained no nulls before that. And that they are the only rows that, in addition to the above conditions, contain data fields wide enough to require out-of-line toasting. These conditions together are enough to break the assumption in toast_insert_or_update that the old and new tuples must have the same value of t_hoff. But it can only happen when the source tuple is an original on-disk tuple, which explains why only INSERT ... SELECT * causes the problem, not any variants that require projection of a new column set. When it does happen, toast_insert_or_update correctly computes the required size of the new tuple ... but then it tells heap_fill_tuple to fill the data part at offset olddata-t_hoff, which is wrong (too small) and so the nulls bitmap that heap_fill_tuple concurrently constructs will overwrite the first few data bytes. In your example, the table contains 49 columns so the nulls bitmap requires 7 bytes, just enough to overwrite the first 6 data bytes as observed. (In fact, given the values we see being filled in, I can confidently say that you have two added-since-creation null columns, no more, no less.) I can reproduce the problem with the attached test case (using the regression database). With asserts enabled, the Assert(new_len == olddata-t_hoff); fails. With asserts off, corrupt data. This is trivial to fix, now that we know there's a problem --- the function is only using that assumption to save itself a couple lines of code. Penny wise, pound foolish :-( regards, tom lane drop table wide; create table wide as select ten as firstc, unique1 as unique1_1, unique2 as unique2_1, two as two_1, four as four_1, ten as ten_1, twenty as twenty_1, hundred as hundred_1, thousand as thousand_1, twothousand as twothousand_1, fivethous as fivethous_1, tenthous as tenthous_1, odd as odd_1, even as even_1, stringu1 as stringu1_1, stringu2 as stringu2_1, string4 as string4_1, unique1 as unique1_2, unique2 as unique2_2, two as two_2, four as four_2, ten as ten_2, twenty as twenty_2, hundred as hundred_2, thousand as thousand_2, twothousand as twothousand_2, fivethous as fivethous_2, tenthous as tenthous_2, odd as odd_2, even as even_2, stringu1 as stringu1_2, stringu2 as stringu2_2, string4 as string4_2, unique1 as unique1_3, unique2 as unique2_3, two as two_3, four as four_3, ten as ten_3, twenty as twenty_3, hundred as hundred_3, thousand as thousand_3, twothousand as twothousand_3, fivethous as fivethous_3, tenthous as tenthous_3, odd as odd_3, even as even_3, repeat('xyzzyxydlkadlkndvlelfzzy', 2) as widec from onek limit 10; alter table wide add column nullc1 int; alter table wide add column nullc2 int; drop table widec; create table widec as select * from wide; select firstc, to_hex(unique1_1), unique2_1, to_hex(unique1_2) from widec; -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote: Hah ... I have a theory. I will bet that you recently added some column(s) to the source table using ALTER TABLE ADD COLUMN and no default value, so that the added columns were nulls and no table rewrite happened. And that these troublesome rows predate that addition, but contained no nulls before that. And that they are the only rows that, in addition to the above conditions, contain data fields wide enough to require out-of-line toasting. These conditions together are enough to break the assumption in toast_insert_or_update that the old and new tuples must have the same value of t_hoff. Wow! Good catch. This is trivial to fix, now that we know there's a problem --- the function is only using that assumption to save itself a couple lines of code. Penny wise, pound foolish :-( No doubt the assumption was true when the code was written, but still. Hve a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 11:03:45PM +0100, hubert depesz lubaczewski wrote: looking for some other info. will post as soon as i'll gather it, but that will be in utc morning :( I looked closer at the rows that got -1 xobject_id. $ select magic_id, count(*) from qqq where xobject_id = -1 group by 1 order by 1; magic_id | count --+--- 30343295 | 2 30408831 | 3 30539903 | 1 30605439 | 2 30670975 | 3 30802047 | 1 30867583 | 1 30933119 | 1 31195263 | 2 31260799 | 1 31326335 | 1 31588479 | 3 31654015 | 1 31719551 | 1 31785087 | 3 31850623 | 4 31981695 | 2 32047231 | 2 32112767 | 1 32309375 | 1 32374911 | 1 32440447 | 1 32505983 | 2 (23 rows) So, I checked original counts for these magic_id: $ select magic_id, count(*) from sss.xobjects where magic_id in (30343295,30408831,30539903,30605439,30670975,30802047,30867583,30933119,31195263,31260799,31326335,31588479,31654015,31719551,31785087,31850623,31981695,32047231,32112767,32309375,32374911,32440447,32505983) group by 1 order by 1; magic_id | count --+--- 30343295 | 1 30408831 | 1 30539903 | 1 30605439 | 1 30670975 | 1 30802047 | 1 30867583 | 1 30933119 | 1 31195263 | 1 31260799 | 1 31326335 | 1 31588479 | 1 31654015 | 1 31719551 | 1 31785087 | 1 31850623 | 1 31981695 | 1 32047231 | 1 32112767 | 1 32309375 | 1 32374911 | 1 32440447 | 1 32505983 | 1 (23 rows) $ select min(magic_id), max(magic_id), count(distinct magic_id), sum( case when magic_id between 30343295 and 32505983 then 1 else 0 end ) as count_in_range from sss.xobjects; min | max| count | count_in_range --+--+--+ 1000 | 37830834 | 32030523 |2079327 (1 row) So, the ids are not clustered. at least the magic_id. but since these are more or less chronological, it means that these rows were added to db quite some time apart: magic_id | creation_tsz --+ 30343295 | 2011-05-28 00:57:36+00 30408831 | 2011-05-30 01:51:09+00 30539903 | 2011-06-02 04:06:20+00 30605439 | 2011-06-03 18:23:06+00 30670975 | 2011-06-05 16:49:49+00 30802047 | 2011-06-08 16:46:22+00 30867583 | 2011-06-10 01:39:41+00 30933119 | 2011-06-11 19:48:07+00 31195263 | 2011-06-18 00:33:24+00 31260799 | 2011-06-20 01:49:46+00 31326335 | 2011-06-21 17:53:41+00 31588479 | 2011-06-28 07:07:19+00 31654015 | 2011-06-29 20:30:52+00 31719551 | 2011-07-01 09:50:54+00 31785087 | 2011-07-03 03:42:02+00 31850623 | 2011-07-05 05:02:27+00 31981695 | 2011-07-08 04:49:21+00 32047231 | 2011-07-09 21:59:25+00 32112767 | 2011-07-11 16:53:10+00 32309375 | 2011-07-15 21:52:31+00 32374911 | 2011-07-17 19:20:34+00 32440447 | 2011-07-19 03:13:21+00 32505983 | 2011-07-20 16:15:38+00 So, Let's see how the bad rows (-1) look, in comparison to good ones, in copy of the table: $ select xobject_id, magic_id, creation_tsz from qqq where magic_id in (30343295,30408831,30539903,30605439,30670975,30802047,30867583,30933119,31195263,31260799,31326335,31588479,31654015,31719551,31785087,31850623,31981695,32047231,32112767,32309375,32374911,32440447,32505983) order by 2, 1; xobject_id | magic_id | creation_tsz +--+ -1 | 30343295 | 2011-05-28 17:24:58+00 -1 | 30343295 | 2011-05-28 04:57:23+00 33695980 | 30343295 | 2011-05-28 00:57:36+00 -1 | 30408831 | 2011-05-31 04:25:44+00 -1 | 30408831 | 2011-05-31 01:03:03+00 -1 | 30408831 | 2011-05-30 14:52:29+00 33761515 | 30408831 | 2011-05-30 01:51:09+00 -1 | 30539903 | 2011-06-02 05:05:08+00 33892588 | 30539903 | 2011-06-02 04:06:20+00 -1 | 30605439 | 2011-06-05 05:13:01+00 -1 | 30605439 | 2011-06-04 03:22:08+00 33958124 | 30605439 | 2011-06-03 18:23:06+00 -1 | 30670975 | 2011-06-05 17:54:06+00 -1 | 30670975 | 2011-06-06 13:59:01+00 -1 | 30670975 | 2011-06-06 14:44:22+00 34023662 | 30670975 | 2011-06-05
Re: [GENERAL] Strange problem with create table as select * from table;
On 11/04/2011 01:17 PM, hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 11:03:45PM +0100, hubert depesz lubaczewski wrote: looking for some other info. will post as soon as i'll gather it, but that will be in utc morning :( I looked closer at the rows that got -1 xobject_id. Does it tell you anything? You are very thorough. I don't know enough about Postgres internals to be much help there. All I can point out is the problem seemed to appear over roughly a two month period 5/28/11-7/20/11. Any new code or procedures rolled out then? Any bug reports? Given the turnover on this table it seems important everything is 'normal' since 7/20/11. Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 01:43:55PM -0700, Adrian Klaver wrote: Does it tell you anything? You are very thorough. I hate mysteries. Especially the ones that break stuff. I don't know enough about Postgres internals to be much help there. All I can point out is the problem seemed to appear over roughly a two month period 5/28/11-7/20/11. Any new code or procedures rolled out then? Any bug reports? not as far as I know, but I'll check with guys. Still - app changes shouldn't cause something like this in db? Given the turnover on this table it seems important everything is 'normal' since 7/20/11. yes, but we can't do pg_reorg of the table (it has quite a lot of bloat) Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On 11/04/2011 01:47 PM, hubert depesz lubaczewski wrote: On Fri, Nov 04, 2011 at 01:43:55PM -0700, Adrian Klaver wrote: Does it tell you anything? You are very thorough. I hate mysteries. Especially the ones that break stuff. Know the feeling. I don't know enough about Postgres internals to be much help there. All I can point out is the problem seemed to appear over roughly a two month period 5/28/11-7/20/11. Any new code or procedures rolled out then? Any bug reports? not as far as I know, but I'll check with guys. Still - app changes shouldn't cause something like this in db? Then it wouldn't be a bug:) Seriously, what should happen and what does happen is very often different. Given the turnover on this table it seems important everything is 'normal' since 7/20/11. yes, but we can't do pg_reorg of the table (it has quite a lot of bloat) Probably should have been clearer here. I would look at what information is available for changes on 7/20-21/11. Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: OK. So based on it all, it looks like for some rows, first two columns got mangled. Good detective work. So now we at least have a believable theory about *what* is happening (something is stomping the first 8 data bytes of these particular rows), if not *why*. You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? Another thing we probably should ask at this point is whether you have any nonstandard software loaded into your server, like auto_explain or pg_stat_statements or some such. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 05:49:44PM -0400, Tom Lane wrote: You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? Sure. Testing. Another thing we probably should ask at this point is whether you have any nonstandard software loaded into your server, like auto_explain or pg_stat_statements or some such. No. Nothing like this. Just base Pg. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
I wrote: Good detective work. So now we at least have a believable theory about *what* is happening (something is stomping the first 8 data bytes of these particular rows), if not *why*. Scratch that: something is stomping the first *six* bytes of data. On a hunch I converted the original and bad xobject_id and magic_id values to hex, and look at the pattern that pops out: badxi | badmi | origxi | origmi --+-+-+- | 1e2007f | 215a0f2 | 1e27862 | 1e2007f | 215da81 | 1e2b1f1 | 1e2007f | 215330e | 1e20a86 | 1e5007f | 2184b11 | 1e52281 | 1e5007f | 218597f | 1e530ef | 1e5007f | 2184e4f | 1e525bf | 1e6007f | 21939f6 | 1e61166 | 1e6007f | 21a1054 | 1e6e7c4 | 1e6007f | 219d7de | 1e6af4e | 1e6007f | 219d9f6 | 1e6b166 | 1e8007f | 21b3861 | 1e80fd1 | 1e8007f | 21b361d | 1e80d8d | 1f0007f | 223bde0 | 1f09528 | 1f0007f | 223a81a | 1f07f62 I'm assuming this is little-endian hardware, so the low-order half of magic_id is adjacent to xobject_id. We can see that in each case the first six bytes are being overwritten with ff ff ff ff 7f 00, while the high-order half of magic_id remains unchanged. Not sure what it means yet, but this seems like confirmation of the idea that something's stomping on the data while it passes through CREATE TABLE AS. BTW, did you try the separate INSERT/SELECT yet? Does that show corruption? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 06:18:55PM -0400, Tom Lane wrote: BTW, did you try the separate INSERT/SELECT yet? Does that show corruption? pg_dump --inserts is still working. i did create table (like), insert into ... select and it also shows the problem, as I showed (with other data) in email: 2003200312.ga4...@depesz.com Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 05:49:44PM -0400, Tom Lane wrote: You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? i'm running the pg_dump (it will take some time, so don't hold your breath), but at the same time - I can select these rows, correctly, with normal SELECT from table (xobjects table). Doesn't it disprove this theory? Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: On Fri, Nov 04, 2011 at 05:49:44PM -0400, Tom Lane wrote: You said that pg_dump does not show the corruption. That could be because the data is coming out through the COPY code path instead of the SELECT code path. Could you try a pg_dump with --inserts (which will fetch the data with SELECTs) and see if it shows corrupt data? i'm running the pg_dump (it will take some time, so don't hold your breath), but at the same time - I can select these rows, correctly, with normal SELECT from table (xobjects table). Doesn't it disprove this theory? Well, we don't know what the triggering condition is for the corruption, so it's premature to draw conclusions on what will or won't cause it. I was wondering if selecting the entire table was necessary. A different line of thought is that there's something about these specific source rows, and only these rows, that makes them vulnerable to corruption during INSERT/SELECT. Do they by any chance contain any values that are unusual elsewhere in your table? One thing I'm wondering about right now is the nulls bitmap --- so do these rows have nulls (or not-nulls) in any place that's unusual elsewhere? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thursday, November 03, 2011 1:25:58 am hubert depesz lubaczewski wrote: Hi We have pretty weird situation, which seems to be impossible, but perhaps you'll notice something that will let me fix the problem. System: SunOS 5.11 snv_130 Pg: PostgreSQL 8.4.7 on i386-pc-solaris2.11, compiled by cc: Sun C 5.10 SunOS_i386 2009/06/03, 64-bit I also verified that there are no concurrent updates that would set xobject_id to -1, so it's not a problem of isolation. During the night I repeated the procedure and the rows that got duplicated seem to be the same - at the very least - the same magic_id. Does above seem sensible for anyone? Any suggestions on what could be broken? Do the xobject_id values have other negative numbers or is -1 just a special case? The only thing I can think of is a corrupted index on xobject_id. Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 07:00:30AM -0700, Adrian Klaver wrote: I also verified that there are no concurrent updates that would set xobject_id to -1, so it's not a problem of isolation. During the night I repeated the procedure and the rows that got duplicated seem to be the same - at the very least - the same magic_id. Does above seem sensible for anyone? Any suggestions on what could be broken? Do the xobject_id values have other negative numbers or is -1 just a special case? The only thing I can think of is a corrupted index on xobject_id. minimal xobject_id in source table is 1000. index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: $ explain select xobject_id, count(*) from sss.xobjects group by 1 having count(*) 1; QUERY PLAN - GroupAggregate (cost=11718280.34..12682724.26 rows=35070688 width=4) Filter: (count(*) 1) - Sort (cost=11718280.34..11805957.06 rows=35070688 width=4) Sort Key: xobject_id - Seq Scan on xobjects (cost=0.00..5884815.88 rows=35070688 width=4) (5 rows) Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On 3 November 2011 15:15, hubert depesz lubaczewski dep...@depesz.com wrote: Do the xobject_id values have other negative numbers or is -1 just a special case? The only thing I can think of is a corrupted index on xobject_id. minimal xobject_id in source table is 1000. index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: Actually, it does. A sequential scan doesn't use the index. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 03:19:36PM +0100, Alban Hertroys wrote: On 3 November 2011 15:15, hubert depesz lubaczewski dep...@depesz.com wrote: Do the xobject_id values have other negative numbers or is -1 just a special case? The only thing I can think of is a corrupted index on xobject_id. minimal xobject_id in source table is 1000. index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: Actually, it does. A sequential scan doesn't use the index. sure. but so isn't create table as. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: I was just going to ask you to check that. Weird as can be. Does plain old SELECT COUNT(*) show a difference between the two tables? Do you get similar misbehavior if you break the CREATE TABLE AS into a CREATE and an INSERT/SELECT? Also, please note the rowcount returned by INSERT/SELECT and see how it matches up with the tables afterwards. Does turning synchronize_seqscans off affect the behavior? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 10:55:20AM -0400, Tom Lane wrote: index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: I was just going to ask you to check that. Weird as can be. Does plain old SELECT COUNT(*) show a difference between the two tables? yes. the xobjects table is written to quite a lot and turning it off is not an option. Do you get similar misbehavior if you break the CREATE TABLE AS into a CREATE and an INSERT/SELECT? Also, please note the rowcount returned by INSERT/SELECT and see how it matches up with the tables afterwards. Does turning synchronize_seqscans off affect the behavior? interesting. will test and let you know. the test will take couple of hours, though. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thursday, November 03, 2011 7:15:22 am hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 07:00:30AM -0700, Adrian Klaver wrote: I also verified that there are no concurrent updates that would set xobject_id to -1, so it's not a problem of isolation. During the night I repeated the procedure and the rows that got duplicated seem to be the same - at the very least - the same magic_id. Does above seem sensible for anyone? Any suggestions on what could be broken? Do the xobject_id values have other negative numbers or is -1 just a special case? The only thing I can think of is a corrupted index on xobject_id. minimal xobject_id in source table is 1000. So just to be clear there is and never has been a -1 value for xobject_id in the source table? So a select count(*) from sss.xobjects where xobject_id = -1 on the source table yields 0? depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 08:04:19AM -0700, Adrian Klaver wrote: So just to be clear there is and never has been a -1 value for xobject_id in the source table? yes. min value of xobject_id is 1000, and we had trigger in place on the table which logged all inserts/updates/deletes and the value -1 never showed up (At least in the last couple of days, during which i was making the copies). So a select count(*) from sss.xobjects where xobject_id = -1 on the source table yields 0? yes, that's correct. both using index, and usingf seq scan. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On 3 November 2011 09:25, hubert depesz lubaczewski dep...@depesz.com wrote: All looks good. pg_dump of the table also doesn't show any strange problems, and is duplicate free. But: $ create table zzz as select * from sss.xobjects; SELECT $ select xobject_id, count(*) from zzz group by 1 having count(*) 1 order by 2 desc; xobject_id | count +--- -1 | 40 (1 row) Can you verify that these queries both do actually use a sequential scan, and not, for some reason, an index scan? Just to rule out the index corruption scenario. You (or someone near you) might have disabled seqscans, for example. Another thought: Is it possible that xobject_id just happens to be used internally as a hidden field by Postgres or by an extension? That would be another explanation for seeing -1 or duplicates in that column. If that's the case, I would have expected an error on creation of that table. And lastly, is this behaviour after copying a table into a new one reproducible or did it happen just once? -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thursday, November 03, 2011 8:05:38 am hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 08:04:19AM -0700, Adrian Klaver wrote: So just to be clear there is and never has been a -1 value for xobject_id in the source table? yes. min value of xobject_id is 1000, and we had trigger in place on the table which logged all inserts/updates/deletes and the value -1 never showed up (At least in the last couple of days, during which i was making the copies). So a select count(*) from sss.xobjects where xobject_id = -1 on the source table yields 0? yes, that's correct. both using index, and usingf seq scan. Hmmm. Now we await the results of the tests Tom suggested. Just a thought, any other strange behavior, hiccups in the database over the past couple of days? Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 04:21:37PM +0100, Alban Hertroys wrote: On 3 November 2011 09:25, hubert depesz lubaczewski dep...@depesz.com wrote: All looks good. pg_dump of the table also doesn't show any strange problems, and is duplicate free. But: $ create table zzz as select * from sss.xobjects; SELECT $ select xobject_id, count(*) from zzz group by 1 having count(*) 1 order by 2 desc; xobject_id | count +--- -1 | 40 (1 row) Can you verify that these queries both do actually use a sequential scan, and not, for some reason, an index scan? Just to rule out the index corruption scenario. You (or someone near you) might have disabled seqscans, for example. yes, i tested it with explain. Another thought: Is it possible that xobject_id just happens to be used internally as a hidden field by Postgres or by an extension? That would be another explanation for seeing -1 or duplicates in that column. no. it's not a special field. just plain old something_id, with underscore, so it is not special in any way. If that's the case, I would have expected an error on creation of that table. And lastly, is this behaviour after copying a table into a new one reproducible or did it happen just once? fully reproductible, as i mentioned at the end of my original mail. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 08:23:01AM -0700, Adrian Klaver wrote: On Thursday, November 03, 2011 8:05:38 am hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 08:04:19AM -0700, Adrian Klaver wrote: So just to be clear there is and never has been a -1 value for xobject_id in the source table? yes. min value of xobject_id is 1000, and we had trigger in place on the table which logged all inserts/updates/deletes and the value -1 never showed up (At least in the last couple of days, during which i was making the copies). So a select count(*) from sss.xobjects where xobject_id = -1 on the source table yields 0? yes, that's correct. both using index, and usingf seq scan. Hmmm. Now we await the results of the tests Tom suggested. Just a thought, any other strange behavior, hiccups in the database over the past couple of days? no. it's doing it's job without problems. other tests are running, but simple question - how to get number of rows affected from psql? create table xxx as select * from xobjects; returns just: SELECT Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: other tests are running, but simple question - how to get number of rows affected from psql? create table xxx as select * from xobjects; returns just: SELECT We fixed that in 9.0, but 8.4 won't provide the count (unless you care to patch it). That's why I suggested doing a separate insert/select. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thursday, November 03, 2011 8:30:34 am hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 08:23:01AM -0700, Adrian Klaver wrote: On Thursday, November 03, 2011 8:05:38 am hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 08:04:19AM -0700, Adrian Klaver wrote: So just to be clear there is and never has been a -1 value for xobject_id in the source table? yes. min value of xobject_id is 1000, and we had trigger in place on the table which logged all inserts/updates/deletes and the value -1 never showed up (At least in the last couple of days, during which i was making the copies). So a select count(*) from sss.xobjects where xobject_id = -1 on the source table yields 0? yes, that's correct. both using index, and usingf seq scan. Hmmm. Now we await the results of the tests Tom suggested. Just a thought, any other strange behavior, hiccups in the database over the past couple of days? no. it's doing it's job without problems. other tests are running, but simple question - how to get number of rows affected from psql? See Toms answer create table xxx as select * from xobjects; returns just: SELECT The thing that has me puzzled is shown below from you original post: $ select xobject_id, magic_id from zzz where magic_id in ( 32440447, 32047231, 32505983); xobject_id | magic_id +-- 35858705 | 32505983 35793169 | 32440447 -1 | 32440447 -1 | 32047231 -1 | 32505983 -1 | 32505983 35399951 | 32047231 -1 | 32047231 (8 rows) Looks like multiple tuples of the same row where magic_id(s) of 32505983 and 32047231 where touched three times and 32440447 twice. For some reason the original table is seeing only the most recent version while the CREATE AS is pulling it and past versions. Some sort of visibility problem, exactly what is beyond me at this point. Best regards, depesz -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 10:55:20AM -0400, Tom Lane wrote: hubert depesz lubaczewski dep...@depesz.com writes: index on xobject_id might be corrupted, but it doesn't explain that I don't see duplicates with group_by/having query on xobjects, which uses seqscan: I was just going to ask you to check that. Weird as can be. Does plain old SELECT COUNT(*) show a difference between the two tables? Do you get similar misbehavior if you break the CREATE TABLE AS into a CREATE and an INSERT/SELECT? Also, please note the rowcount returned by INSERT/SELECT and see how it matches up with the tables afterwards. Does turning synchronize_seqscans off affect the behavior? So, did some tests: $ select count(*) from sss.xobjects; count -- 35179058 (1 row) $ create table qqq as select * from sss.xobjects; SELECT $ select count(*) from qqq; count -- 35179631 (1 row) $ select count(*) from qqq where xobject_id = -1; count --- 40 (1 row) $ drop table qqq; DROP TABLE $ select count(*) from sss.xobjects; count -- 35182687 (1 row) $ create table qqq ( like sss.xobjects ); CREATE TABLE $ insert into qqq select * from sss.xobjects; INSERT 0 35182962 $ select count(*) from qqq; count -- 35182962 (1 row) $ select count(*) from qqq where xobject_id = -1; count --- 40 (1 row) $ drop table qqq; DROP TABLE $ set synchronize_seqscans = off; SET $ create table qqq as select * from sss.xobjects; SELECT $ select count(*) from qqq; count -- 35185653 (1 row) $ select count(*) from qqq where xobject_id = -1; count --- 40 (1 row) $ drop table qqq; DROP TABLE $ create table qqq ( like sss.xobjects ); CREATE TABLE $ insert into qqq select * from sss.xobjects; INSERT 0 35188896 $ select count(*) from qqq; count -- 35188896 (1 row) $ select count(*) from qqq where xobject_id = -1; count --- 40 (1 row) $ drop table qqq; DROP TABLE as you can see counts of rows in created table are more or less sensible, but whatever method I used - create table as, insert into, using sychronized_scans (initially) or not (later) - copy of the table, as long as it's in database, has 40 those -1 rows. one note - maybe it wasn't clear from my original mail - when I did pg_dump of the xobjects table, it didn't have -1 rows. -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thursday, November 03, 2011 1:03:12 pm hubert depesz lubaczewski wrote: On Thu, Nov 03, 2011 at 10:55:20AM -0400, Tom Lane wrote: So, did some tests: as you can see counts of rows in created table are more or less sensible, but whatever method I used - create table as, insert into, using sychronized_scans (initially) or not (later) - copy of the table, as long as it's in database, has 40 those -1 rows. At least it is consistent:) Characteristics of those forty rows. 1) creation_tsz clustered? 2) magic_id in a range? 3) Non -1 xobject_id in a range? 4) Any other field show a similarity? one note - maybe it wasn't clear from my original mail - when I did pg_dump of the xobjects table, it didn't have -1 rows. -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
Adrian Klaver adrian.kla...@gmail.com writes: On Thursday, November 03, 2011 1:03:12 pm hubert depesz lubaczewski wrote: as you can see counts of rows in created table are more or less sensible, but whatever method I used - create table as, insert into, using sychronized_scans (initially) or not (later) - copy of the table, as long as it's in database, has 40 those -1 rows. At least it is consistent:) Yeah, it wasn't clear before, but it looks like there are specific rows that there's something wrong with. Characteristics of those forty rows. 1) creation_tsz clustered? 2) magic_id in a range? 3) Non -1 xobject_id in a range? 4) Any other field show a similarity? I would like to know the ctid's of the -1 rows in the copied table, along with the ctid's of the rows they share magic_ids with, and the ctid's of the rows with those same magic_ids in the original. I'm wondering whether the affected rows are physically clustered ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
I would like to know the ctid's of the -1 rows in the copied table, along with the ctid's of the rows they share magic_ids with, and the ctid's of the rows with those same magic_ids in the original. I'm wondering whether the affected rows are physically clustered ... i tried: create table qqq as select cmax as o_cmax, xmax as o_xmax, cmin as o_cmin, xmin as o_xmin, ctid as o_ctid, * from sss.xobjects; but the resulting table didn't have -1 values: $ select xobject_id, count(*) from qqq group by 1 having count(*) 1; xobject_id | count +--- (0 rows) $ select o_cmax,o_xmax,o_cmin,o_xmin,o_ctid, xobject_id, order_id from qqq where xobject_id = -1; o_cmax | o_xmax | o_cmin | o_xmin | o_ctid | xobject_id | order_id ++++++-- (0 rows) i'm checking now something else, but later will try to get better grasp on those bad rows. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski dep...@depesz.com writes: i tried: create table qqq as select cmax as o_cmax, xmax as o_xmax, cmin as o_cmin, xmin as o_xmin, ctid as o_ctid, * from sss.xobjects; but the resulting table didn't have -1 values: Oh, that's pretty interesting ... suggests that the targetlist has to be exactly select *. We have some minor optimizations for that case, though nothing that could result in extra rows AFAICS. Are there any dropped columns in the original table? Try select * from pg_attribute where attrelid = 'sss.xobjects'::regclass and attisdropped; regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Strange problem with create table as select * from table;
On Thu, Nov 03, 2011 at 06:02:04PM -0400, Tom Lane wrote: select * from pg_attribute where attrelid = 'sss.xobjects'::regclass and attisdropped; no dropped columns. looking for some other info. will post as soon as i'll gather it, but that will be in utc morning :( Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general