On Wednesday, July 28, 2010 07:08:42 pm you wrote:
Le 28/07/2010 17:16, Sean Brown a écrit :
On Wednesday, July 28, 2010 10:56:27 am you wrote:
Le 28/07/2010 16:50, Sean Brown a écrit :
Does PgPool have an issue with the order of results from a query
being returned in a different order?
With the ongoing issue of our cluster falling out, we just had one
member fall out again and again, no error in PostgreSQLs log is
recorded. PgPool reports the possible last query was:
select * from carts where cart_id in (11835,11824,11819)
Specifically the error in PgPools log is:
read_kind_from_backend: 1 th kind D does not match with master or
majority connection kind ^@
2010-07-28 10:21:23 ERROR: pid 28242: kind mismatch among backends.
Possible last query was: select * from carts where cart_id in
(11835,11824,11819) kind details are: 1[D]
If I run that query on the remaining node and on the one that just
fell out, I get the same 3 results, but the orders of the records
are different.
The query you show doesn't ask for a specific order (no ORDER BY
clause), so each backend can send the data in whatever order they
prefer.
That I understand, what I am wondering is if this will cause PgPool to
believe there is an error in the information returned from the
backends, even if the only difference is the difference in order?
If it does, I assume the best way to deal with it is add order by's to
every query passed to pgpool?
That's actually really good questions, and I don't have an answer to
them. CC back the list to know if other people have an answer to give us.
No, pgpool does not care about the order of data returned by SELECT.
More details...
PostgreSQL returns data packet like this:
T(descriptions about tuple)
D(actual one raw tuple data)
D
D
C(indicates succeeded in sending data)
where each single capital letter is packet kind(see PostgreSQL docs
for more details). What pgpool actually does is, checking the packet
kind, not content of tuple data. So as long as same number of tuples
are returned from each backend, pgpool is happy.
I'm not sure what pgpool version Sean's uses, it seems it's a little
bit old (from the error message I guess). Also, majority connection
kind ^@ looks strange. ^@ = 0x00, which is not valid kind at all.
It seems something unusual is going on...
If self cantained test case is provided, I will be able to look into
this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Ok I have some more information. The cluster fell apart again yesterday.
Again, there was nothing out of the ordinary in the postgresql logs leading
upto the nodes falling out but this was logged in the pgpool log
First node fell out at 10:36am:
2010-08-02 10:36:47 DEBUG: pid 16699: AsciiRow: 24 th field size does not match
between master(16777216) and 2 th backend(0)
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 0
th backend ^@ NUM_BACKENDS: 3
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 1
th backend ^@ NUM_BACKENDS: 3
2010-08-02 10:36:47 DEBUG: pid 16699: read_kind_from_backend: read kind from 2
th backend D NUM_BACKENDS: 3
2010-08-02 10:36:47 ERROR: pid 16699: read_kind_from_backend: 2 th kind D does
not match with master or majority connection kind ^@
2010-08-02 10:36:47 ERROR: pid 16699: kind mismatch among backends. Possible
last query was: select * from carts where cart_id in
(12979,12984,12987,12986,12982,12981) kind details are: 2[D]
2010-08-02 10:36:47 LOG: pid 16699: notice_backend_error: 2 fail over
request from pid 16699
2010-08-02 10:36:47 DEBUG: pid 24933: failover_handler called
Second node fell out at 1:56pm:
2010-08-02 13:56:21 DEBUG: pid 19720: AsciiRow: len: 1 data: t
2010-08-02 13:56:21 DEBUG: pid 19720: read_kind_from_backend: read kind from 0
th backend ^@ NUM_BACKENDS: 3
2010-08-02 13:56:21 DEBUG: pid 19720: read_kind_from_backend: read kind from 1
th backend D NUM_BACKENDS: 3
2010-08-02 13:56:21 ERROR: pid 19720: read_kind_from_backend: 1 th kind D does
not match with master or majority connection kind ^@
2010-08-02 13:56:21 ERROR: pid 19720: kind mismatch among backends. Possible
last query was: select * from carts where cart_id in (13019,13018) kind
details are: 1[D]
2010-08-02 13:56:21 LOG: pid 19720: notice_backend_error: 1 fail over
request from pid 19720
2010-08-02 13:56:21 DEBUG: pid 24933: failover_handler called
I notice two things, first that we died both times on a query of the carts
table, but this could just be a coincidence. Second, we have the ^@ showing up
as a kind again, except while two nodes returned ^@ when the first node fell
out, only one returned ^@ when the second fell out, even though it was one of
the nodes that returned ^@ that morning.
I have PgPool 2.3.2 and PostgreSQL 8.4.4