Re: [GENERAL] Addled index

2013-03-15 Thread Steve Crawford

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:

We've run into a strange index problem.

At some point an index went bad, and queries using it stopped returning
any data.

For example, there are two tables: A (id, name) and B (id, name, a_id).
B.a_id is a foreign key to A. The name columns in both tables contain
identical values where A.id = B.a_id. The A.name column has a unique
constraint and an additional index on it.

At some point, queries like [select id from A where name = 'petya']
started returning no rows, even though a row with the name 'petya'
exists in A.

But the query [select a_id from B where name = 'petya'] returns the A.id,
and [select * from A where id = ] returns the row.

The only way we have found to fix this is to recreate the index.

How can we avoid this situation?


What version??

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Addled index

2013-03-16 Thread Oleg Alexeev
On 16 March 2013 01:21, Steve Crawford wrote:

>>  What version??
>
>
The first failure was on 9.1.? (a table with at least 10,000,000 rows, about
20% of which are modified every day).

The second failure was two days ago on 9.2.3 (a table with 120,000 rows, less
than 0.5% of which are modified every day).


-- 
Oleg V Alexeev
E:oalex...@gmail.com


Re: [GENERAL] Addled index

2013-03-16 Thread Alban Hertroys
On Mar 16, 2013, at 9:33, Oleg Alexeev  wrote:

> The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% 
> every day modifications)
> 
> Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less 
> than 0.5% every day modifications)


Perhaps the name you're not finding is spelled differently from what you're
typing, due to collation?

If there's actually something wrong with the database, it looks a bit like your
tables and your indexes are getting out of sync somehow, which normally wouldn't
be possible. I'm mostly guessing, but perhaps one of the following has something
to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a
file-system that might be rolling back file-system transactions? Is this
database perhaps a replicated node?
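
One way to check whether a table and its index really disagree is to run the same lookup once with index access discouraged and once with default planner settings. The following is only a rough sketch: the lowercase table name a and the index name a_name_idx are hypothetical stand-ins for the objects described earlier in the thread, not names taken from the real schema.

```sql
-- Hypothetical names: table "a", column "name", index "a_name_idx".
BEGIN;
-- Discourage index access for this transaction only, so the planner
-- falls back to a sequential scan of the table itself.
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
-- This parameter exists in 9.2 and later; drop this line on 9.1.
SET LOCAL enable_indexonlyscan = off;
SELECT id FROM a WHERE name = 'petya';
ROLLBACK;

-- Same lookup with default planner settings; EXPLAIN shows whether
-- the index is actually being used.
EXPLAIN SELECT id FROM a WHERE name = 'petya';
SELECT id FROM a WHERE name = 'petya';

-- If only the sequential scan finds the row, rebuilding the index
-- is the repair already described above.
REINDEX INDEX a_name_idx;
```

If both variants return the same rows, the index is probably not the culprit and the collation question becomes more interesting.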

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [GENERAL] Addled index

2013-03-16 Thread Tom Lane
Alban Hertroys  writes:
> If there's actually something wrong with the database, it looks a bit like
> your tables and your indexes get out of sync somehow, which normally wouldn't
> be possible. I'm mostly guessing, but perhaps one of the below has something
> to do with it:
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on a
> file-system that might be rolling back file-system transactions? Is this
> database perhaps a replicated node?

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation.  You've provided no information that would suggest what,
though.  Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from?  Some distro (whose)?
  If they're self-built, what compiler and configuration settings did
  you use?
* what platform is this?  I would not rule out kernel bugs or flaky
  hardware.
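
Most of the questions above can be answered from within psql. This is a minimal sketch, assuming the table is named a (a hypothetical lowercase form of table A from the original report); where the executables came from and what platform they run on still has to be answered by hand.

```sql
-- Server version and build information (compiler, architecture).
SELECT version();

-- Encoding, collation and ctype of the current database.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();

-- Parameters that differ from the built-in defaults.
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;

-- Data type and declared length of each column of the (hypothetical) table "a".
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'a';
```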

regards, tom lane




Re: [GENERAL] Addled index

2013-03-16 Thread Oleg Alexeev
On 16 March 2013 14:32, Alban Hertroys  wrote:

> Perhaps the name you're not finding is spelled differently than what
> you're typing, due to collation?
>
> If there's actually something wrong with the database, it looks a bit like
> your tables and your indexes get out of sync somehow, which normally
> wouldn't be possible. I'm mostly guessing, but perhaps one of the below has
> something to do with it:
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on
> a file-system that might be rolling back file-system transactions? Is this
> database perhaps a replicated node?
>
>
>
Oh, this is not about short experiments. :)

Both failed queries are part of a 24/7 application. At some point, one of the
queries started returning an empty result for an existing key. We recreated
the index, and the same query went back to working normally.

Yes, fsync is off.

Both failed indexes are btree indexes.

The database is located on a software md RAID 1 array built from two SSDs,
with an ext4 filesystem. The database is a master node.

-- 
Oleg V Alexeev
E:oalex...@gmail.com


Re: [GENERAL] Addled index

2013-03-16 Thread Oleg Alexeev
On 16 March 2013 19:10, Tom Lane  wrote:

> More generally: since we're not hearing this type of complaint from
> other people, there must be something pretty unusual about your
> installation.  You've provided no information that would suggest what,
> though.  Aside from Alban's questions, some other things come to mind:
>
> * is that a plain text column, or some other data type?
> * what collation/ctype is your database using?
> * what nondefault parameter settings are you using?
> * where did you get the Postgres executables from?  Some distro (whose)?
>   If they're self-built, what compiler and configuration settings did
>   you use?
> * what platform is this?  I would not rule out kernel bugs or flaky
>   hardware.
>
> regards, tom lane
>

* they are varchar columns, 256 and 32 characters long
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
checkpoint_completion_target = 0.7
* postgres 9.2.3, installed from the yum repository for version 9.2
* 64-bit CentOS 6, installed and updated from the yum repository

-- 
Oleg V Alexeev
E:oalex...@gmail.com


Re: [GENERAL] Addled index

2013-03-16 Thread Tom Lane
Oleg Alexeev  writes:
> * it is varchar columns, 256 and 32 symbols length
> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> checkpoint_completion_target = 0.7
> * postgres 9.2.3 installed via yum repository for version 9.2
> * 64 bit Centos 6, installed and updated from yum repository

fsync off?  Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

> Database located on software md raid 1 based on two SSD disks array. Ext4
> filesystem. Database is master node.

Meh.  I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there.  I'd certainly suggest making
sure you are on a *recent* kernel.  If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about.  But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

regards, tom lane




Re: [GENERAL] Addled index

2013-03-18 Thread Greg Jaskiewicz

On 17 Mar 2013, at 04:30, Tom Lane  wrote:

>> Database located on software md raid 1 based on two SSD disks array. Ext4
>> filesystem. Database is master node.
> 
> Meh.  I quote from the RHEL6 documentation (Storage Administration
> Guide, Chapter 20: Solid-State Disk Deployment Guidelines):
> 
>> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
>> recommended for use on SSDs.
> 
> https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html
> 
> The part of the docs I'm looking at only asserts that performance is
> bad, but considering that it's a deprecated combination, it may well be
> that there are data-loss bugs in there.  I'd certainly suggest making
> sure you are on a *recent* kernel.  If that doesn't help, reconsider
> your filesystem choices.
> 
Yeah, I don't think I'd ever consider using software RAID for SSDs a good
idea.



Re: [GENERAL] Addled index

2013-03-18 Thread Oleg Alexeev
On 17 March 2013 08:30, Tom Lane  wrote:

> Oleg Alexeev  writes:
> > * it is varchar columns, 256 and 32 symbols length
> > * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> > * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> > commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> > checkpoint_completion_target = 0.7
> > * postgres 9.2.3 installed via yum repository for version 9.2
> > * 64 bit Centos 6, installed and updated from yum repository
>
> fsync off?  Have you had any power failures or other system crashes?
> ext4 is *way* more prone than ext3 was to corrupt data when fsync is
> disabled, because it caches and reorders writes much more aggressively.


I think fsync=off was a really bad idea.
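
For reference, a rough sketch of how the setting could be checked and reverted. It assumes fsync = on has first been set by hand in postgresql.conf. The synchronous_commit line is a suggestion of mine rather than something proposed in this thread: as far as I know, it trades a small window of possibly lost recent commits for speed without risking an inconsistent database, which is not the case for fsync = off.

```sql
-- Current durability-related settings.
SHOW fsync;
SHOW full_page_writes;
SHOW synchronous_commit;

-- After editing postgresql.conf to set fsync = on, ask the server
-- to re-read its configuration; fsync does not require a restart.
SELECT pg_reload_conf();
SHOW fsync;

-- Assumption: if the goal of fsync = off was faster commits, this
-- per-session setting is a less dangerous way to get some of that.
SET synchronous_commit = off;
```

Turning fsync back on does not repair damage that has already happened, so indexes that were already affected would still need to be rebuilt.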



-- 
Oleg V Alexeev
E:oalex...@gmail.com